8) Do Hosts Discriminate against Black Guests on Airbnb?
E-mail: econometrics.methods@gmail.com
Last updated: 11-1-2020
Edelman et al. (2017) found that guests with distinctively Black-sounding names are 16% less likely to be accepted on Airbnb than guests with White-sounding names. This result is not a mere correlation: the variable race was randomized. The only difference between the Black and White guest profiles is the name; in every other respect, the guests are identical.
Let’s open the dataset of Edelman et al. (2017). Each row is an Airbnb property in July 2015. The sample consists of all properties in Baltimore, Dallas, Los Angeles, St. Louis, and Washington, DC.
import numpy as np
import pandas as pd
# Display at most 3 decimal places in DataFrame output
pd.set_option('display.precision', 3)
# Data from Edelman et al. (2017)
path = "https://github.com/causal-methods/Data/raw/master/"
df = pd.read_csv(path + "Airbnb.csv")
df.head(5)
| | host_response | response_date | number_of_messages | automated_coding | latitude | longitude | bed_type | property_type | cancellation_policy | number_guests | ... | los_angeles | sl | dc | total_guests | raw_black | prop_black | any_black | past_guest_merge | filled_september | pr_filled |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Yes | 2015-07-19 08:26:17 | 2.0 | 1.0 | 34.081 | -118.270 | Real Bed | House | Flexible | 3.0 | ... | 1 | 0 | 0 | 11.0 | 0.0 | 0.0 | 0.0 | matched (3) | 1 | 0.412 |
| 1 | No or unavailable | 2015-07-14 14:13:39 | NaN | 1.0 | 38.911 | -77.020 | NaN | House | Moderate | 2.0 | ... | 0 | 0 | 1 | 167.0 | 0.0 | 0.0 | 0.0 | matched (3) | 1 | 0.686 |
| 2 | Request for more info (Can you verify? How man... | 2015-07-20 16:24:08 | 2.0 | 0.0 | 34.005 | -118.481 | Pull-out Sofa | Apartment | Strict | 1.0 | ... | 1 | 0 | 0 | 19.0 | 0.0 | 0.0 | 0.0 | matched (3) | 0 | 0.331 |
| 3 | I will get back to you | 2015-07-20 06:47:38 | NaN | 0.0 | 34.092 | -118.282 | NaN | House | Strict | 8.0 | ... | 1 | 0 | 0 | 41.0 | 0.0 | 0.0 | 0.0 | matched (3) | 0 | 0.536 |
| 4 | Message not sent | . | NaN | 1.0 | 38.830 | -76.897 | Real Bed | House | Strict | 2.0 | ... | 0 | 0 | 1 | 28.0 | 0.0 | 0.0 | 0.0 | matched (3) | 1 | 0.555 |
5 rows × 104 columns
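Before replicating the paper, it helps to get a sense of the sample. A minimal sketch: report the total number of rows and columns and the counts for the three city dummies that are visible in the head() output above ('los_angeles', 'sl', and 'dc').
# Quick orientation: sample size and counts for the city dummies
# visible in the head() output above
print(df.shape)
print(df[['los_angeles', 'sl', 'dc']].sum())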
The chart below shows that Black guests receive fewer “Yes” responses from hosts than White guests. Somebody might argue that the results of Edelman et al. (2017) are driven by differences in the type of host response, such as conditional acceptances or non-responses. For example, one could argue that Black guest profiles are more likely to be treated as fake accounts and categorized as spam. Note, however, that the discrimination result is driven by clear “Yes” and “No” answers, not by the intermediate responses.
# Data for bar chart
count = pd.crosstab(df["graph_bins"], df["guest_black"])

import plotly.graph_objects as go

node = ['Conditional No', 'Conditional Yes', 'No',
        'No Response', 'Yes']

fig = go.Figure(data=[
    go.Bar(name='Guest is white', x=node, y=count[0]),
    go.Bar(name='Guest is African American', x=node, y=count[1])])

fig.update_layout(barmode='group',
                  title_text='Host Responses by Race',
                  font=dict(size=18))
fig.show()
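To see this point directly, one can drop all intermediate responses and compare acceptance rates only among hosts who gave a clear answer. A minimal sketch, assuming "graph_bins" stores the same literal labels 'Yes' and 'No' used in the chart:
# Keep only clear "Yes"/"No" answers and compare acceptance by race
# (guest_black: 0 = White-sounding name, 1 = Black-sounding name)
clear = df[df['graph_bins'].isin(['Yes', 'No'])].copy()
clear['clear_yes'] = (clear['graph_bins'] == 'Yes').astype(int)
print(clear.groupby('guest_black')['clear_yes'].mean())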
Let’s replicate the main results of Edelman et al. (2017).
import statsmodels.api as sm

df['const'] = 1

# Column 1
# The default missing='drop' of statsmodels doesn't apply
# to the cluster variable. Therefore, it is necessary to drop
# the missing values like below to get the clustered standard
# errors.
df1 = df.dropna(subset=['yes', 'guest_black', 'name_by_city'])
reg1 = sm.OLS(df1['yes'], df1[['const', 'guest_black']])
res1 = reg1.fit(cov_type='cluster',
                cov_kwds={'groups': df1['name_by_city']})

# Column 2
vars2 = ['yes', 'guest_black', 'name_by_city',
         'host_race_black', 'host_gender_M']
df2 = df.dropna(subset=vars2)
reg2 = sm.OLS(df2['yes'], df2[['const', 'guest_black',
                               'host_race_black', 'host_gender_M']])
res2 = reg2.fit(cov_type='cluster',
                cov_kwds={'groups': df2['name_by_city']})

# Column 3
vars3 = ['yes', 'guest_black', 'name_by_city',
         'host_race_black', 'host_gender_M',
         'multiple_listings', 'shared_property',
         'ten_reviews', 'log_price']
df3 = df.dropna(subset=vars3)
reg3 = sm.OLS(df3['yes'], df3[['const', 'guest_black',
                               'host_race_black', 'host_gender_M',
                               'multiple_listings', 'shared_property',
                               'ten_reviews', 'log_price']])
res3 = reg3.fit(cov_type='cluster',
                cov_kwds={'groups': df3['name_by_city']})

columns = [res1, res2, res3]
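Before formatting the full table, the headline estimate can be read straight off the fitted results. A quick check for column 1: the coefficient on "guest_black" and its cluster-robust standard error.
# Coefficient on guest_black in column 1 and its clustered standard error
print(res1.params['guest_black'], res1.bse['guest_black'])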
# Library to produce publication-quality regression
# tables in LaTeX, HTML, etc.
!pip install stargazer
In column 1, White-sounding names are accepted about 49% of the time, whereas Black-sounding names are accepted around 41% of the time. Therefore, a Black-sounding name carries a penalty of roughly 8 percentage points. This result is remarkably robust to the addition of control variables in columns 2 and 3.
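A quick cross-check of this reading, using the column-1 estimation sample "df1" created above: the raw acceptance rates by guest race should match the intercept (White-sounding names) and the intercept plus the guest_black coefficient (Black-sounding names).
# Raw acceptance rates by guest race in the column-1 sample
# (0 = White-sounding name, 1 = Black-sounding name)
print(df1.groupby('guest_black')['yes'].mean())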
# Settings for a nice table
from stargazer.stargazer import Stargazer
stargazer = Stargazer(columns)
stargazer.title('The Impact of Race on Likelihood of Acceptance')
stargazer
| Dependent variable: yes | (1) | (2) | (3) |
|---|---|---|---|
| const | 0.488*** | 0.497*** | 0.755*** |
| | (0.012) | (0.013) | (0.067) |
| guest_black | -0.080*** | -0.080*** | -0.087*** |
| | (0.017) | (0.017) | (0.017) |
| host_gender_M | | -0.050*** | -0.048*** |
| | | (0.014) | (0.014) |
| host_race_black | | 0.069*** | 0.093*** |
| | | (0.023) | (0.023) |
| log_price | | | -0.062*** |
| | | | (0.013) |
| multiple_listings | | | 0.062*** |
| | | | (0.015) |
| shared_property | | | -0.068*** |
| | | | (0.017) |
| ten_reviews | | | 0.120*** |
| | | | (0.013) |
| Observations | 6,235 | 6,235 | 6,168 |
| R2 | 0.006 | 0.010 | 0.040 |
| Adjusted R2 | 0.006 | 0.009 | 0.039 |
| Residual Std. Error | 0.496 (df=6233) | 0.495 (df=6231) | 0.488 (df=6160) |
| F Statistic | 21.879*** (df=1; 6233) | 15.899*** (df=3; 6231) | 35.523*** (df=7; 6160) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | |
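If you need the table as LaTeX or HTML source rather than as notebook output, the stargazer object can render it directly. A minimal sketch, assuming the render_latex() and render_html() methods of the stargazer package:
# Export the formatted table as LaTeX and HTML strings
latex_table = stargazer.render_latex()
html_table = stargazer.render_html()
print(latex_table[:300])  # preview the start of the LaTeX source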
The table below presents summary statistics for the hosts and properties. In a well-randomized experiment, the mean values of the control variables should be statistically indistinguishable between the treatment group and the control group.
control = ['host_race_white', 'host_race_black', 'host_gender_F',
           'host_gender_M', 'price', 'bedrooms', 'bathrooms',
           'number_of_reviews', 'multiple_listings', 'any_black',
           'tract_listings', 'black_proportion']
df.describe()[control].T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| host_race_white | 6392.0 | 0.634 | 0.482 | 0.0 | 0.00 | 1.00 | 1.000 | 1.000 |
| host_race_black | 6392.0 | 0.078 | 0.269 | 0.0 | 0.00 | 0.00 | 0.000 | 1.000 |
| host_gender_F | 6392.0 | 0.376 | 0.485 | 0.0 | 0.00 | 0.00 | 1.000 | 1.000 |
| host_gender_M | 6392.0 | 0.298 | 0.457 | 0.0 | 0.00 | 0.00 | 1.000 | 1.000 |
| price | 6302.0 | 181.108 | 1280.228 | 10.0 | 75.00 | 109.00 | 175.000 | 100000.000 |
| bedrooms | 6242.0 | 3.177 | 2.265 | 1.0 | 2.00 | 2.00 | 4.000 | 16.000 |
| bathrooms | 6285.0 | 3.169 | 2.264 | 1.0 | 2.00 | 2.00 | 4.000 | 16.000 |
| number_of_reviews | 6390.0 | 30.869 | 72.505 | 0.0 | 2.00 | 9.00 | 29.000 | 1208.000 |
| multiple_listings | 6392.0 | 0.326 | 0.469 | 0.0 | 0.00 | 0.00 | 1.000 | 1.000 |
| any_black | 6390.0 | 0.282 | 0.450 | 0.0 | 0.00 | 0.00 | 1.000 | 1.000 |
| tract_listings | 6392.0 | 9.514 | 9.277 | 1.0 | 2.00 | 6.00 | 14.000 | 53.000 |
| black_proportion | 6378.0 | 0.140 | 0.203 | 0.0 | 0.03 | 0.05 | 0.142 | 0.984 |
The balance tests (t-tests) below show that the properties assigned Black-sounding and White-sounding guest names are statistically indistinguishable: none of the p-values reject equality of means at conventional significance levels.
result = []
for var in control:
    # Do the t-test and save the p-value
    pvalue = sm.OLS(df[var], df[['const', 'guest_black']],
                    missing='drop').fit().pvalues[1]
    result.append(pvalue)

ttest = df.groupby('guest_black').agg([np.mean])[control].T
ttest['p-value'] = result
ttest
| | | guest_black = 0.0 | guest_black = 1.0 | p-value |
|---|---|---|---|---|
| host_race_white | mean | 0.643 | 0.626 | 0.154 |
| host_race_black | mean | 0.078 | 0.078 | 0.972 |
| host_gender_F | mean | 0.381 | 0.372 | 0.439 |
| host_gender_M | mean | 0.298 | 0.299 | 0.896 |
| price | mean | 166.429 | 195.815 | 0.362 |
| bedrooms | mean | 3.178 | 3.176 | 0.962 |
| bathrooms | mean | 3.172 | 3.167 | 0.927 |
| number_of_reviews | mean | 30.709 | 31.030 | 0.860 |
| multiple_listings | mean | 0.321 | 0.330 | 0.451 |
| any_black | mean | 0.287 | 0.277 | 0.382 |
| tract_listings | mean | 9.494 | 9.538 | 0.848 |
| black_proportion | mean | 0.141 | 0.140 | 0.919 |
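The t-tests above check balance one variable at a time. A complementary omnibus check, not part of the original table, is to regress the treatment indicator on all the controls at once; under successful randomization the controls should have essentially no joint explanatory power. A minimal sketch:
# Omnibus balance check: regress guest_black on all controls jointly.
# Under randomization, the F-test of the controls should not reject.
bal = df.dropna(subset=['guest_black'] + control)
omnibus = sm.OLS(bal['guest_black'], bal[['const'] + control]).fit()
print(omnibus.f_pvalue)  # p-value of the joint F-test on the controls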
Exercises
1| To the best of my knowledge, the three most important empirical papers in the literature on racial discrimination are Bertrand & Mullainathan (2004), Oreopoulos (2011), and Edelman et al. (2017). These three papers use field experiments to capture causality and rule out confounding factors. Search the Internet and return a reference list of experimental papers about racial discrimination.
2| Tell me about a topic that you are passionate about. Return a reference list of experimental papers on that topic.
3| Somebody argues that specific names drive the results of Edelman et al. (2017). In the tables below, you can see that only a handful of different names represent Black and White guests. How can this criticism be refuted? What can you do to show that the results are not driven by specific names?
female = df['guest_gender']=='female'
df[female].groupby(['guest_race', 'guest_first_name'])['yes'].mean()
guest_race guest_first_name
black Lakisha 0.433
Latonya 0.370
Latoya 0.442
Tamika 0.482
Tanisha 0.413
white Allison 0.500
Anne 0.567
Kristen 0.486
Laurie 0.508
Meredith 0.498
Name: yes, dtype: float64
male = df['guest_gender']=='male'
df[male].groupby(['guest_race', 'guest_first_name'])['yes'].mean()
guest_race guest_first_name
black Darnell 0.412
Jamal 0.354
Jermaine 0.379
Kareem 0.436
Leroy 0.371
Rasheed 0.409
Tyrone 0.377
white Brad 0.419
Brent 0.494
Brett 0.466
Greg 0.467
Jay 0.581
Todd 0.448
Name: yes, dtype: float64
4| Is there any potential research question that can be explored based on the table below? Justify.
pd.crosstab(index=[df['host_gender_F'], df['host_race']],
            columns=[df['guest_gender'], df['guest_race']],
            values=df['yes'], aggfunc='mean')
| host_gender_F | host_race | guest: female, black | guest: female, white | guest: male, black | guest: male, white |
|---|---|---|---|---|---|
| 0 | UU | 0.400 | 0.542 | 0.158 | 0.381 |
| 0 | asian | 0.319 | 0.378 | 0.474 | 0.511 |
| 0 | black | 0.444 | 0.643 | 0.419 | 0.569 |
| 0 | hisp | 0.464 | 0.571 | 0.375 | 0.478 |
| 0 | mult | 0.568 | 0.727 | 0.408 | 0.357 |
| 0 | unclear | 0.444 | 0.500 | 0.444 | 0.333 |
| 0 | unclear_three votes | 0.476 | 0.392 | 0.368 | 0.367 |
| 0 | white | 0.383 | 0.514 | 0.386 | 0.449 |
| 1 | UU | 0.444 | 0.250 | 0.333 | 0.750 |
| 1 | asian | 0.429 | 0.607 | 0.436 | 0.460 |
| 1 | black | 0.603 | 0.537 | 0.397 | 0.446 |
| 1 | hisp | 0.391 | 0.667 | 0.292 | 0.389 |
| 1 | unclear | 0.600 | 0.556 | 0.125 | 0.400 |
| 1 | unclear_three votes | 0.387 | 0.583 | 0.312 | 0.657 |
| 1 | white | 0.450 | 0.494 | 0.370 | 0.476 |
5| In Edelman et al. (2017), the variable “name_by_city” was used to cluster the standard errors. How was the variable “name_by_city” created from the other variables? Show the code.
6| Use the data from Edelman et al. (2017) to test the homophily hypothesis that hosts might prefer guests of the same race. Produce a nice table using the library Stargazer. Interpret the results.
7| It is well known that socioeconomic status is correlated with race. Fryer & Levitt (2004) showed that distinctively African American names are correlated with lower socioeconomic status. Edelman et al. (2017: 17) clearly state: “Our findings cannot identify whether the discrimination is based on race, socioeconomic status, or a combination of these two.” Propose an experimental design to disentangle the effect of race from socioeconomic status. Explain your assumptions and describe the procedures in detail.
References
Bertrand, Marianne, and Sendhil Mullainathan. (2004). Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. American Economic Review, 94 (4): 991-1013.
Edelman, Benjamin, Michael Luca, and Dan Svirsky. (2017). Racial Discrimination in the Sharing Economy: Evidence from a Field Experiment. American Economic Journal: Applied Economics, 9 (2): 1-22.
Fryer, Roland G., Jr., and Steven D. Levitt. (2004). The Causes and Consequences of Distinctively Black Names. Quarterly Journal of Economics, 119 (3): 767-805.
Oreopoulos, Philip. (2011). Why Do Skilled Immigrants Struggle in the Labor Market? A Field Experiment with Thirteen Thousand Resumes. American Economic Journal: Economic Policy, 3 (4): 148-71.