Boston’s Waterfront

Planning a vacation in Boston or Seattle?

4 Interesting things to learn from Airbnb’s April 2020 open data

Donglin Chen
6 min read · May 11, 2020


Introduction

Travelers around the world have been using Airbnb for rental accommodations to save money or to look for different experiences.

As a software engineer who loves to travel and is currently venturing into the data science field, I am interested in analyzing the Airbnb datasets to answer the following 4 questions, and in building a price model to predict the rental price for a property that might be listed on Airbnb.

1. What kinds of property types were available in the listings, and how many listings were available for each type?

2. What were the listed prices in Boston and Seattle? Which price range had the most listings?

3. Where could you find low-priced or high-quality Airbnb rentals in Boston?

4. How did prices change across the months of the year?

The 4 interesting things to learn

1. What kinds of property types were available in the listings, and how many listings were available for each type?

The graphs below (Figure 1) show the top 8 property types by number of listings in Boston and Seattle.

In Boston, 62.76% of the rentals listed on Airbnb were apartments. Houses ranked second with only 16%, and condominiums ranked third with only 8%.

In Seattle, things were different. Houses and apartments had similar shares of total listings, at 31% and 29% respectively, while guest suites, townhouses, condominiums, and serviced apartments each accounted for between 5.44% and 11.7%.

Compared to Boston, Seattle had about twice the number of rental properties available on Airbnb. In addition, Seattle had a wider variety of property types to choose from.

Figure 1 — Top 8 listings by property type
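Shares like the ones above could be computed with a few lines of pandas. This is only a minimal sketch: it assumes the listings are loaded into a DataFrame with a `property_type` column (the column name and the helper `property_type_shares` are assumptions, not taken from the post), and it uses a tiny made-up sample instead of the real data.

```python
import pandas as pd


def property_type_shares(listings: pd.DataFrame, top_n: int = 8) -> pd.Series:
    """Return the percentage share of the top_n most common property types."""
    # value_counts(normalize=True) gives fractions sorted from most to least common.
    shares = listings["property_type"].value_counts(normalize=True) * 100
    return shares.head(top_n).round(2)


# Made-up sample, NOT the real Airbnb data.
sample = pd.DataFrame(
    {"property_type": ["Apartment"] * 6 + ["House"] * 3 + ["Condominium"]}
)
top_types = property_type_shares(sample)
print(top_types)
```

Running the same function on the Boston and Seattle listings separately would give the two rankings that Figure 1 compares.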

2. What were the listed prices in Boston and Seattle? Which price range had the most listings?

The figures below show the number of Airbnb listings by price range.

Seattle’s listing prices tended to follow a roughly normal distribution, with most listings priced at around $100. In Boston, prices fluctuated more across the whole range we selected. There were roughly equal numbers of listings priced around $100, $150, and $200.

As a result, prices in Boston had a slightly higher median, average, and standard deviation. It seemed harder for a traveler to compare listings and find a good rental in Boston than in Seattle.

Figure 2 — Number of Airbnb listings by price range
Boston Airbnb list prices:
Average price: $149.03
Median price: $130.00
Price standard deviation: $90.87

Seattle Airbnb list prices:
Average price: $138.48
Median price: $115.00
Price standard deviation: $85.26
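Summary statistics like these can be reproduced along the following lines. The sketch assumes that prices arrive as strings such as "$1,250.00" and need to be converted to numbers first; that format, and the helper names `parse_price` and `price_summary`, are assumptions for illustration. The sample values are made up.

```python
import pandas as pd


def parse_price(prices: pd.Series) -> pd.Series:
    """Convert strings such as "$1,250.00" to floats."""
    return prices.str.replace(r"[$,]", "", regex=True).astype(float)


def price_summary(prices: pd.Series) -> dict:
    """Return the average, median, and sample standard deviation of prices."""
    return {
        "mean": prices.mean(),
        "median": prices.median(),
        "std": prices.std(),  # pandas uses the sample standard deviation (ddof=1)
    }


# Made-up sample, NOT the real Airbnb data.
sample = pd.Series(["$100.00", "$150.00", "$1,250.00"])
summary = price_summary(parse_price(sample))
print(summary)
```

Note how a single expensive listing pulls the average well above the median, which is one reason to report both.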

3. Where could you find low-priced or high-quality Airbnb rentals in Boston?

Now that we know prices in Boston fluctuate more, we can explore the data further to find where the low-priced or high-quality rentals are.

Figure 3 shows that Dorchester had the highest number of listings, and its average price ranked in the bottom three, which means it was cheaper than almost all other neighborhoods. The neighborhood with the second most listings was Allston-Brighton, which was also cheaper than most other neighborhoods. Back Bay, however, ranked third in number of listings (8%), and its average price ranked in the top four.

Boston had only half as many listings as Seattle, but because a large number of them were in cheaper neighborhoods, the overall average price was only a little higher than Seattle’s.

If you were only looking to save money while traveling in Boston, you could check out the listings located in the Dorchester or Allston-Brighton area. But if you were looking for a balance between high quality and price, then the Back Bay area could be the place to consider.

Figure 3 — Top 25 neighborhoods by number of listings, and their average prices
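A ranking of neighborhoods by listing count, with the average price alongside, might be built as follows. This is a sketch under assumptions: the column names `neighbourhood_cleansed` and `price` and the helper `neighborhood_summary` are illustrative, prices are assumed to be numeric already, and the sample rows are made up.

```python
import pandas as pd


def neighborhood_summary(listings: pd.DataFrame, top_n: int = 25) -> pd.DataFrame:
    """Count listings and average the price per neighborhood, busiest first."""
    summary = (
        listings.groupby("neighbourhood_cleansed")["price"]
        .agg(n_listings="count", avg_price="mean")
        .sort_values("n_listings", ascending=False)
        .head(top_n)
    )
    # Share of all listings that fall in each neighborhood, in percent.
    summary["share_pct"] = (summary["n_listings"] / len(listings) * 100).round(2)
    return summary


# Made-up sample, NOT the real Airbnb data.
sample = pd.DataFrame(
    {
        "neighbourhood_cleansed": ["Dorchester"] * 3 + ["Back Bay"] * 2,
        "price": [80.0, 90.0, 100.0, 200.0, 300.0],
    }
)
summary = neighborhood_summary(sample)
print(summary)
```

Sorting by `avg_price` instead of `n_listings` would give the price ranking used to call a neighborhood cheap or expensive.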

4. How did prices change across the months of the year?

Figure 4 shows that prices in Boston and Seattle followed a similar pattern across the months of the year, although at different scales.

Not surprisingly, prices were highest during June, July, and August, when most people take their summer vacations.

Figure 4 — Average price change by month
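A monthly average like the one plotted in Figure 4 could be derived from per-day price records. The sketch below assumes a calendar-style table with a `date` column and a numeric `price` column; those column names and the helper `average_price_by_month` are assumptions, and the rows are made up.

```python
import pandas as pd


def average_price_by_month(calendar: pd.DataFrame) -> pd.Series:
    """Average the daily prices within each calendar month (1 to 12)."""
    months = pd.to_datetime(calendar["date"]).dt.month
    return calendar.groupby(months)["price"].mean()


# Made-up sample, NOT the real Airbnb data.
sample = pd.DataFrame(
    {
        "date": ["2020-06-01", "2020-06-15", "2020-12-01"],
        "price": [150.0, 170.0, 100.0],
    }
)
monthly = average_price_by_month(sample)
print(monthly)
```

Computing this series once per city gives the two curves that can then be compared for shape and scale.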

A pricing model to predict Airbnb price in Boston

Listing prices in Boston fluctuated more across the whole price range, while listing prices in Seattle seemed to follow a normal distribution, so it would be relatively harder to work out a proper price in Boston. It would be interesting to build a simple price model that predicts the listing price based on some common Airbnb listing factors, such as property type, neighborhood, number of guests, bedrooms, bathrooms, and review scores. This model could also help anyone who is interested in posting a rental on Airbnb and is wondering how to set a price in line with market value.

When building a model, it is important to preprocess the raw Boston listings dataset so that it can be properly fed into training. I took the following approaches to clean and encode the listing data:

  1. Removed abnormal listing prices. Some records had unusual prices that, after investigation, seemed to be mistakes made during data collection or in the initial posting. For example, property 475254 had a list price of $6,000, but following the listing URL to the site, the price actually showed $146. I decided to keep only listings with prices between $10 and $500.
  2. Removed unusual property types that accounted for less than one percent of the total number of listings. Rare properties such as castles, barns, or boats would hurt the model’s accuracy on typical listings.
  3. Removed rare neighborhoods that had only a few (fewer than 10) listings.
  4. Encoded categorical values as numerical values.
  5. Imputed missing values with different strategies, for example filling them with the median or with 0.
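The five steps above can be sketched in pandas as follows. Several details are assumptions rather than the post's actual code: the column names, the choice of one-hot encoding via `pd.get_dummies` for step 4, and which column gets the median versus 0 in step 5. The helper name `preprocess` and the sample rows are made up for illustration.

```python
import pandas as pd


def preprocess(
    listings: pd.DataFrame,
    min_type_share: float = 0.01,
    min_neighborhood_count: int = 10,
) -> pd.DataFrame:
    """Filter, encode, and impute a listings table (illustrative only)."""
    # 1. Keep only listings priced between $10 and $500 (inclusive).
    df = listings[listings["price"].between(10, 500)]

    # 2. Drop property types below the minimum share of listings.
    type_share = df["property_type"].value_counts(normalize=True)
    common_types = type_share[type_share >= min_type_share].index
    df = df[df["property_type"].isin(common_types)]

    # 3. Drop neighborhoods with too few listings.
    hood_count = df["neighbourhood_cleansed"].value_counts()
    common_hoods = hood_count[hood_count >= min_neighborhood_count].index
    df = df[df["neighbourhood_cleansed"].isin(common_hoods)]

    # 4. Encode categorical columns (one-hot encoding is an assumption here).
    encoded = pd.get_dummies(df, columns=["property_type", "neighbourhood_cleansed"])

    # 5. Impute missing values: median for bedrooms, 0 for review scores (assumed).
    encoded["bedrooms"] = encoded["bedrooms"].fillna(encoded["bedrooms"].median())
    encoded["review_scores_rating"] = encoded["review_scores_rating"].fillna(0)
    return encoded


# Made-up sample, NOT the real Airbnb data.
sample = pd.DataFrame(
    {
        "price": [100.0, 120.0, 6000.0, 200.0, 150.0],
        "property_type": ["Apartment", "Apartment", "Apartment", "House", "Apartment"],
        "neighbourhood_cleansed": ["Dorchester"] * 3 + ["Back Bay", "Dorchester"],
        "bedrooms": [1.0, None, 2.0, 3.0, 3.0],
        "review_scores_rating": [90.0, None, 95.0, 80.0, 85.0],
    }
)
# Thresholds are loosened so that the tiny sample exercises every step.
clean = preprocess(sample, min_type_share=0.3, min_neighborhood_count=2)
print(clean)
```

With the default thresholds the function applies the one percent and 10-listing cutoffs described in the list.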

After this preprocessing, I used the scikit-learn package to split the data (80/20) into training and test datasets, then fed the normalized training data into a LinearRegression model to train the price model.

After the price model was trained, I randomly sampled 40 listings from the test dataset and used the model to predict their prices. Figure 5 compares the predicted prices against the true listing prices for the same listings.
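The split, training, and sampling steps could look roughly like this in scikit-learn. The sketch runs on synthetic data so that it is self-contained. The features, coefficients, and random seeds are made up, and `StandardScaler` is only one possible reading of "normalized", so treat it as an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed listings (NOT the real Airbnb data).
rng = np.random.default_rng(0)
n = 500
X = rng.uniform(1, 6, size=(n, 3))  # e.g. guests, bedrooms, bathrooms
y = 40 + 20 * X[:, 0] + 15 * X[:, 1] + 10 * X[:, 2] + rng.normal(0, 5, size=n)

# 80/20 split into training and test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The scaler is fitted on the training data only, then applied before regression.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

# Randomly sample 40 test listings and compare predicted with true prices.
sample_idx = rng.choice(len(X_test), size=40, replace=False)
predicted = model.predict(X_test[sample_idx])
actual = y_test[sample_idx]

r2 = model.score(X_test, y_test)  # R^2 on the full test set
print(f"R^2 on test data: {r2:.3f}")
```

Putting the scaler inside the pipeline keeps test data out of the scaling statistics, which avoids leaking information from the test set into training.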

Some listings on Airbnb might be priced away from market value, so the price model could be used to double-check whether a listing is priced properly for the market.

If you are interested in a particular Airbnb listing in Boston and the model predicts a much higher price than the actual listing price, then you may have found yourself to be a really smart shopper!

Figure 5 — Comparison of model-predicted prices and true listing prices for 40 randomly sampled test listings
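The "smart shopper" check described above can be made concrete with a small helper. This is a hypothetical illustration: the function name `flag_possible_bargains` and the 20% threshold are assumptions, not part of the original analysis, and the prices are made up.

```python
import numpy as np


def flag_possible_bargains(actual, predicted, min_gap=0.2):
    """Flag listings whose predicted price exceeds the listed price by min_gap or more.

    min_gap is a fraction of the listed price (0.2 means 20%); it is an
    arbitrary illustrative threshold.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return (predicted - actual) / actual >= min_gap


# Made-up prices, NOT real model output.
actual_prices = [100, 150, 200]
predicted_prices = [130, 155, 190]
flags = flag_possible_bargains(actual_prices, predicted_prices)
print(flags.tolist())
```

A flagged listing is only a candidate worth a closer look, since a simple linear model can be wrong about any individual listing.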

The data for this post can be found at https://www.kaggle.com/airbnb/boston and https://www.kaggle.com/airbnb/seattle/data

The source code I used to analyze the data and build a simple linear regression model is available at https://github.com/donglinchen/airbnb_analysis.
