These data were accessed from Inside Airbnb on September 2, 2017. The version of the data that we will use in this class can be found here.
Inside Airbnb is an independent, non-commercial set of tools and data that allows you to explore how Airbnb is really being used in cities around the world.
By analyzing publicly available information about a city’s Airbnb listings, Inside Airbnb provides filters and key metrics so you can see how Airbnb is being used to compete with the residential housing market.
Inside Airbnb provides some visualizations of the NYC Airbnb data here, where you can see maps showing type of room, activity, availability, and listings per host for all NYC Airbnb listings.
After both the “listings.csv.gz” and “listings.csv” files were downloaded, the following code was used to create the provided dataset:
library(dplyr)
# Uncompress and load the detailed NYC airbnb data
# But only keep the 2 variables of interest
zz <- gzfile("listings.csv.gz", "rt")
df1 <- read.csv(zz, header = TRUE) %>%
  select(id, review_scores_location)
# Read in the summary data for NYC
df2 <- read.csv("listings.csv", stringsAsFactors = FALSE) %>%
  mutate(last_review = as.Date(last_review, format = "%Y-%m-%d"))
# Combine the two datasets
df <- inner_join(df1, df2, by = "id")
# Save the data
save(df, file = "nyc_airbnb.RData")
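Because the two files are combined with inner_join, only listings whose id appears in both files end up in the saved data. The following is a minimal sketch of that behavior on two small, invented tables; the names detailed, summary_tbl, joined, and unmatched are hypothetical and are not part of the original code.

```r
library(dplyr)

# Toy stand-ins for the detailed and summary tables (all values are invented)
detailed <- data.frame(
  id = c(1, 2, 3),
  review_scores_location = c(10, 9, NA)
)
summary_tbl <- data.frame(
  id = c(2, 3, 4),
  price = c(150, 80, 60)
)

# inner_join keeps only the ids that are present in both tables
joined <- inner_join(detailed, summary_tbl, by = "id")

# anti_join lists rows of the first table that have no match in the second
unmatched <- anti_join(detailed, summary_tbl, by = "id")

print(joined)
print(unmatched)
```

On the real files, the same anti_join check on df1 and df2 would show whether any listings were dropped by the join.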
The resulting R data file nyc_airbnb.RData contains a single data frame, df, with 40,753 rows of data on 17 variables:
id: listing id
review_scores_location: 0-5 stars converted into a 0-10 scale
name: listing name
host_id: host id
host_name: host name
neighbourhood_group: NYC borough
neighbourhood: NYC neighborhood
latitude: listing latitude
longitude: listing longitude
room_type: type of listing (Entire home/apt, Private room, Shared room)
price: listing price
minimum_nights: required minimum nights stay
number_of_reviews: total number of reviews
last_review: date of last review
reviews_per_month: average number of reviews per month
calculated_host_listings_count: total number of listings for this host
availability_365: number of days listing is available out of 365
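A file written with save() is read back with load(), which restores each object under the name it was saved with. The sketch below shows that round trip on a small invented data frame written to a temporary file, so it runs without the real data; the file name nyc_airbnb_toy.RData and the values in it are hypothetical.

```r
# Toy stand-in for the real data frame (invented values), saved the same way
df <- data.frame(
  id = c(1L, 2L),
  price = c(150L, 80L),
  last_review = as.Date(c("2017-08-30", NA))
)
path <- file.path(tempdir(), "nyc_airbnb_toy.RData")
save(df, file = path)
rm(df)

# load() restores the saved objects by name and returns those names
loaded <- load(path)
print(loaded)

# Check the dimensions and column types of the restored data frame
print(dim(df))
str(df)
```

With the real file in the working directory, load("nyc_airbnb.RData") followed by dim(df) should report the 40,753 rows and 17 variables described above.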