8) Do Hosts Discriminate against Black Guests on Airbnb?
E-mail: econometrics.methods@gmail.com
Last updated: 11-1-2020
Edelman et al. (2017) found that guests with distinctively Black-sounding names are 16% less likely to be accepted on Airbnb than guests with White-sounding names. This result is not a mere correlation: the variable race was randomized. The only difference between the Black and White guest profiles is the name; in every other respect, the guests are identical.
Let’s open the dataset of Edelman et al. (2017). Each row is an Airbnb property in July 2015. The sample consists of all properties in Baltimore, Dallas, Los Angeles, St. Louis, and Washington, DC.
import numpy as np
import pandas as pd
# Display at most 3 decimal places in DataFrame output
pd.set_option('display.precision', 3)
# Data from Edelman et al. (2017)
path = "https://github.com/causal-methods/Data/raw/master/"
df = pd.read_csv(path + "Airbnb.csv")
df.head(5)
| | host_response | response_date | number_of_messages | automated_coding | latitude | longitude | bed_type | property_type | cancellation_policy | number_guests | ... | los_angeles | sl | dc | total_guests | raw_black | prop_black | any_black | past_guest_merge | filled_september | pr_filled |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Yes | 2015-07-19 08:26:17 | 2.0 | 1.0 | 34.081 | -118.270 | Real Bed | House | Flexible | 3.0 | ... | 1 | 0 | 0 | 11.0 | 0.0 | 0.0 | 0.0 | matched (3) | 1 | 0.412 |
| 1 | No or unavailable | 2015-07-14 14:13:39 | NaN | 1.0 | 38.911 | -77.020 | NaN | House | Moderate | 2.0 | ... | 0 | 0 | 1 | 167.0 | 0.0 | 0.0 | 0.0 | matched (3) | 1 | 0.686 |
| 2 | Request for more info (Can you verify? How man... | 2015-07-20 16:24:08 | 2.0 | 0.0 | 34.005 | -118.481 | Pull-out Sofa | Apartment | Strict | 1.0 | ... | 1 | 0 | 0 | 19.0 | 0.0 | 0.0 | 0.0 | matched (3) | 0 | 0.331 |
| 3 | I will get back to you | 2015-07-20 06:47:38 | NaN | 0.0 | 34.092 | -118.282 | NaN | House | Strict | 8.0 | ... | 1 | 0 | 0 | 41.0 | 0.0 | 0.0 | 0.0 | matched (3) | 0 | 0.536 |
| 4 | Message not sent | . | NaN | 1.0 | 38.830 | -76.897 | Real Bed | House | Strict | 2.0 | ... | 0 | 0 | 1 | 28.0 | 0.0 | 0.0 | 0.0 | matched (3) | 1 | 0.555 |
5 rows × 104 columns
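Before replicating the paper, it helps to get a sense of the sample. A minimal sketch: report the total number of rows and columns and the counts for the three city dummies that are visible in the head() output above ('los_angeles', 'sl', and 'dc').
# Quick orientation: sample size and counts for the city dummies
# visible in the head() output above
print(df.shape)
print(df[['los_angeles', 'sl', 'dc']].sum())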
The chart below shows that Black guests receive fewer “Yes” responses from hosts than White guests. Somebody might argue that the results of Edelman et al. (2017) are driven by differences in the type of host response, such as conditional acceptances or non-responses. For example, one could argue that Black guest profiles are more likely to be treated as fake accounts and categorized as spam. Note, however, that the discrimination result is driven by clear “Yes” and “No” answers, not by the intermediate responses.
# Data for bar chart
count = pd.crosstab(df["graph_bins"], df["guest_black"])

import plotly.graph_objects as go

node = ['Conditional No', 'Conditional Yes', 'No',
        'No Response', 'Yes']

fig = go.Figure(data=[
    go.Bar(name='Guest is white', x=node, y=count[0]),
    go.Bar(name='Guest is African American', x=node, y=count[1])])

fig.update_layout(barmode='group',
                  title_text='Host Responses by Race',
                  font=dict(size=18))
fig.show()
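To see this point directly, one can drop all intermediate responses and compare acceptance rates only among hosts who gave a clear answer. A minimal sketch, assuming "graph_bins" stores the same literal labels 'Yes' and 'No' used in the chart:
# Keep only clear "Yes"/"No" answers and compare acceptance by race
# (guest_black: 0 = White-sounding name, 1 = Black-sounding name)
clear = df[df['graph_bins'].isin(['Yes', 'No'])].copy()
clear['clear_yes'] = (clear['graph_bins'] == 'Yes').astype(int)
print(clear.groupby('guest_black')['clear_yes'].mean())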
Let’s replicate the main results of Edelman et al. (2017).
import statsmodels.api as sm

df['const'] = 1

# Column 1
# The default missing='drop' of statsmodels doesn't apply
# to the cluster variable. Therefore, it is necessary to drop
# the missing values like below to get the clustered standard
# errors.
df1 = df.dropna(subset=['yes', 'guest_black', 'name_by_city'])
reg1 = sm.OLS(df1['yes'], df1[['const', 'guest_black']])
res1 = reg1.fit(cov_type='cluster',
                cov_kwds={'groups': df1['name_by_city']})

# Column 2
vars2 = ['yes', 'guest_black', 'name_by_city',
         'host_race_black', 'host_gender_M']
df2 = df.dropna(subset=vars2)
reg2 = sm.OLS(df2['yes'], df2[['const', 'guest_black',
                               'host_race_black', 'host_gender_M']])
res2 = reg2.fit(cov_type='cluster',
                cov_kwds={'groups': df2['name_by_city']})

# Column 3
vars3 = ['yes', 'guest_black', 'name_by_city',
         'host_race_black', 'host_gender_M',
         'multiple_listings', 'shared_property',
         'ten_reviews', 'log_price']
df3 = df.dropna(subset=vars3)
reg3 = sm.OLS(df3['yes'], df3[['const', 'guest_black',
                               'host_race_black', 'host_gender_M',
                               'multiple_listings', 'shared_property',
                               'ten_reviews', 'log_price']])
res3 = reg3.fit(cov_type='cluster',
                cov_kwds={'groups': df3['name_by_city']})

columns = [res1, res2, res3]
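Before formatting the full table, the headline estimate can be read straight off the fitted results. A quick check for column 1: the coefficient on "guest_black" and its cluster-robust standard error.
# Coefficient on guest_black in column 1 and its clustered standard error
print(res1.params['guest_black'], res1.bse['guest_black'])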
# Library to produce publication-quality regression
# tables in LaTeX, HTML, etc.
!pip install stargazer
In column 1, White-sounding names are accepted about 49% of the time, whereas Black-sounding names are accepted around 41% of the time. Therefore, a Black-sounding name carries a penalty of roughly 8 percentage points. This result is remarkably robust to the addition of control variables in columns 2 and 3.
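A quick cross-check of this reading, using the column-1 estimation sample "df1" created above: the raw acceptance rates by guest race should match the intercept (White-sounding names) and the intercept plus the guest_black coefficient (Black-sounding names).
# Raw acceptance rates by guest race in the column-1 sample
# (0 = White-sounding name, 1 = Black-sounding name)
print(df1.groupby('guest_black')['yes'].mean())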
# Settings for a nice table
from stargazer.stargazer import Stargazer
stargazer = Stargazer(columns)
stargazer.title('The Impact of Race on Likelihood of Acceptance')
stargazer
| Dependent variable: yes | (1) | (2) | (3) |
|---|---|---|---|
| const | 0.488*** | 0.497*** | 0.755*** |
| | (0.012) | (0.013) | (0.067) |
| guest_black | -0.080*** | -0.080*** | -0.087*** |
| | (0.017) | (0.017) | (0.017) |
| host_gender_M | | -0.050*** | -0.048*** |
| | | (0.014) | (0.014) |
| host_race_black | | 0.069*** | 0.093*** |
| | | (0.023) | (0.023) |
| log_price | | | -0.062*** |
| | | | (0.013) |
| multiple_listings | | | 0.062*** |
| | | | (0.015) |
| shared_property | | | -0.068*** |
| | | | (0.017) |
| ten_reviews | | | 0.120*** |
| | | | (0.013) |
| Observations | 6,235 | 6,235 | 6,168 |
| R2 | 0.006 | 0.010 | 0.040 |
| Adjusted R2 | 0.006 | 0.009 | 0.039 |
| Residual Std. Error | 0.496 (df=6233) | 0.495 (df=6231) | 0.488 (df=6160) |
| F Statistic | 21.879*** (df=1; 6233) | 15.899*** (df=3; 6231) | 35.523*** (df=7; 6160) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | |
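If you need the table as LaTeX or HTML source rather than as notebook output, the stargazer object can render it directly. A minimal sketch, assuming the render_latex() and render_html() methods of the stargazer package:
# Export the formatted table as LaTeX and HTML strings
latex_table = stargazer.render_latex()
html_table = stargazer.render_html()
print(latex_table[:300])  # preview the start of the LaTeX source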
The table below presents summary statistics for the hosts and properties. In a well-randomized experiment, the mean values of the control variables should be statistically indistinguishable between the treatment group and the control group.
control = ['host_race_white', 'host_race_black', 'host_gender_F',
           'host_gender_M', 'price', 'bedrooms', 'bathrooms',
           'number_of_reviews', 'multiple_listings', 'any_black',
           'tract_listings', 'black_proportion']
df.describe()[control].T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| host_race_white | 6392.0 | 0.634 | 0.482 | 0.0 | 0.00 | 1.00 | 1.000 | 1.000 |
| host_race_black | 6392.0 | 0.078 | 0.269 | 0.0 | 0.00 | 0.00 | 0.000 | 1.000 |
| host_gender_F | 6392.0 | 0.376 | 0.485 | 0.0 | 0.00 | 0.00 | 1.000 | 1.000 |
| host_gender_M | 6392.0 | 0.298 | 0.457 | 0.0 | 0.00 | 0.00 | 1.000 | 1.000 |
| price | 6302.0 | 181.108 | 1280.228 | 10.0 | 75.00 | 109.00 | 175.000 | 100000.000 |
| bedrooms | 6242.0 | 3.177 | 2.265 | 1.0 | 2.00 | 2.00 | 4.000 | 16.000 |
| bathrooms | 6285.0 | 3.169 | 2.264 | 1.0 | 2.00 | 2.00 | 4.000 | 16.000 |
| number_of_reviews | 6390.0 | 30.869 | 72.505 | 0.0 | 2.00 | 9.00 | 29.000 | 1208.000 |
| multiple_listings | 6392.0 | 0.326 | 0.469 | 0.0 | 0.00 | 0.00 | 1.000 | 1.000 |
| any_black | 6390.0 | 0.282 | 0.450 | 0.0 | 0.00 | 0.00 | 1.000 | 1.000 |
| tract_listings | 6392.0 | 9.514 | 9.277 | 1.0 | 2.00 | 6.00 | 14.000 | 53.000 |
| black_proportion | 6378.0 | 0.140 | 0.203 | 0.0 | 0.03 | 0.05 | 0.142 | 0.984 |
The balance tests (t-tests) below show that the properties assigned Black-sounding and White-sounding guest names are statistically indistinguishable: none of the p-values reject equality of means at conventional significance levels.
result = []
for var in control:
    # Do the t-test and save the p-value
    pvalue = sm.OLS(df[var], df[['const', 'guest_black']],
                    missing='drop').fit().pvalues[1]
    result.append(pvalue)

ttest = df.groupby('guest_black').agg([np.mean])[control].T
ttest['p-value'] = result
ttest
| | | guest_black = 0.0 | guest_black = 1.0 | p-value |
|---|---|---|---|---|
| host_race_white | mean | 0.643 | 0.626 | 0.154 |
| host_race_black | mean | 0.078 | 0.078 | 0.972 |
| host_gender_F | mean | 0.381 | 0.372 | 0.439 |
| host_gender_M | mean | 0.298 | 0.299 | 0.896 |
| price | mean | 166.429 | 195.815 | 0.362 |
| bedrooms | mean | 3.178 | 3.176 | 0.962 |
| bathrooms | mean | 3.172 | 3.167 | 0.927 |
| number_of_reviews | mean | 30.709 | 31.030 | 0.860 |
| multiple_listings | mean | 0.321 | 0.330 | 0.451 |
| any_black | mean | 0.287 | 0.277 | 0.382 |
| tract_listings | mean | 9.494 | 9.538 | 0.848 |
| black_proportion | mean | 0.141 | 0.140 | 0.919 |
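The t-tests above check balance one variable at a time. A complementary omnibus check, not part of the original table, is to regress the treatment indicator on all the controls at once; under successful randomization the controls should have essentially no joint explanatory power. A minimal sketch:
# Omnibus balance check: regress guest_black on all controls jointly.
# Under randomization, the F-test of the controls should not reject.
bal = df.dropna(subset=['guest_black'] + control)
omnibus = sm.OLS(bal['guest_black'], bal[['const'] + control]).fit()
print(omnibus.f_pvalue)  # p-value of the joint F-test on the controls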
Exercises
1| To the best of my knowledge, the three most important empirical papers in the literature on racial discrimination are Bertrand & Mullainathan (2004), Oreopoulos (2011), and Edelman et al. (2017). These three papers use field experiments to capture causality and rule out confounding factors. Search the Internet and return a reference list of experimental papers about racial discrimination.
2| Tell me about a topic that you are passionate about. Return a reference list of experimental papers on that topic.
3| Somebody argues that specific names drive the results of Edelman et al. (2017). In the tables below, you can see that only a handful of different names represent Black and White guests. How can this criticism be refuted? What can you do to show that the results are not driven by specific names?
female = df['guest_gender']=='female'
df[female].groupby(['guest_race', 'guest_first_name'])['yes'].mean()
guest_race guest_first_name
black Lakisha 0.433
Latonya 0.370
Latoya 0.442
Tamika 0.482
Tanisha 0.413
white Allison 0.500
Anne 0.567
Kristen 0.486
Laurie 0.508
Meredith 0.498
Name: yes, dtype: float64
male = df['guest_gender']=='male'
df[male].groupby(['guest_race', 'guest_first_name'])['yes'].mean()
guest_race guest_first_name
black Darnell 0.412
Jamal 0.354
Jermaine 0.379
Kareem 0.436
Leroy 0.371
Rasheed 0.409
Tyrone 0.377
white Brad 0.419
Brent 0.494
Brett 0.466
Greg 0.467
Jay 0.581
Todd 0.448
Name: yes, dtype: float64
4| Is there any potential research question that can be explored based on the table below? Justify.
pd.crosstab(index=[df['host_gender_F'], df['host_race']],
            columns=[df['guest_gender'], df['guest_race']],
            values=df['yes'], aggfunc='mean')
| host_gender_F | host_race | guest: female, black | guest: female, white | guest: male, black | guest: male, white |
|---|---|---|---|---|---|
| 0 | UU | 0.400 | 0.542 | 0.158 | 0.381 |
| 0 | asian | 0.319 | 0.378 | 0.474 | 0.511 |
| 0 | black | 0.444 | 0.643 | 0.419 | 0.569 |
| 0 | hisp | 0.464 | 0.571 | 0.375 | 0.478 |
| 0 | mult | 0.568 | 0.727 | 0.408 | 0.357 |
| 0 | unclear | 0.444 | 0.500 | 0.444 | 0.333 |
| 0 | unclear_three votes | 0.476 | 0.392 | 0.368 | 0.367 |
| 0 | white | 0.383 | 0.514 | 0.386 | 0.449 |
| 1 | UU | 0.444 | 0.250 | 0.333 | 0.750 |
| 1 | asian | 0.429 | 0.607 | 0.436 | 0.460 |
| 1 | black | 0.603 | 0.537 | 0.397 | 0.446 |
| 1 | hisp | 0.391 | 0.667 | 0.292 | 0.389 |
| 1 | unclear | 0.600 | 0.556 | 0.125 | 0.400 |
| 1 | unclear_three votes | 0.387 | 0.583 | 0.312 | 0.657 |
| 1 | white | 0.450 | 0.494 | 0.370 | 0.476 |
5| In Edelman et al. (2017), the variable “name_by_city” was used to cluster the standard errors. How was the variable “name_by_city” created from the other variables? Show the code.
6| Use the data from Edelman et al. (2017) to test the homophily hypothesis that hosts might prefer guests of the same race. Produce a nice table using the library Stargazer. Interpret the results.
7| It is well known that socioeconomic status is correlated with race. Fryer & Levitt (2004) showed that distinctively African American names are correlated with lower socioeconomic status. Edelman et al. (2017: 17) clearly state: “Our findings cannot identify whether the discrimination is based on race, socioeconomic status, or a combination of these two.” Propose an experimental design to disentangle the effect of race from socioeconomic status. Explain your assumptions and describe the procedures in detail.
References
Bertrand, Marianne, and Sendhil Mullainathan. (2004). Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. American Economic Review, 94 (4): 991-1013.
Edelman, Benjamin, Michael Luca, and Dan Svirsky. (2017). Racial Discrimination in the Sharing Economy: Evidence from a Field Experiment. American Economic Journal: Applied Economics, 9 (2): 1-22.
Fryer, Roland G., Jr., and Steven D. Levitt. (2004). The Causes and Consequences of Distinctively Black Names. Quarterly Journal of Economics, 119 (3): 767-805.
Oreopoulos, Philip. (2011). Why Do Skilled Immigrants Struggle in the Labor Market? A Field Experiment with Thirteen Thousand Resumes. American Economic Journal: Economic Policy, 3 (4): 148-71.