# Tannenbaum Paper
## PISA vs. Survey Data
The Tannenbaum paper aimed to validate survey measures of social capital via their correlation with wallet reporting rates. Since PISA education scores are not explicit, direct measurements of social capital, it is not obvious that they could also serve to validate the survey measures referenced in the paper. As the table below shows, however, the correlations with the survey measures are surprisingly consistent between wallet reporting rates and PISA scores. This suggests that the two contain not only a similar amount of information about social capital but also a similar type of information.
import pandas as pd
import plotly.express as px
import plotly.io as pio

pio.templates.default = "plotly_white"
pio.renderers.default = "sphinx_gallery"

import statsmodels.api as sm

# Data import
survey_cols = [
    "general_trust",
    "GPS_trust",
    "general_morality",
    "MFQ_genmorality",
    "civic_cooperation",
    "GPS_posrecip",
    "GPS_altruism",
    "stranger1",
]
cat_cols = [
    "country",
    "response",
    "male",
    "above40",
    "computer",
    "coworkers",
    "other_bystanders",
    "institution",
    "cond",
    "security_cam",
    "security_guard",
    "local_recipient",
    "no_english",
    "understood_situation",
]
sc_cols = [
    "log_gdp",
    "log_tfp",
    "gee",
    "letter_grading",
]

# Import Tannenbaum data
df = pd.read_csv(
    "../data/tannenbaum_data.csv",
    dtype={col: "category" for col in cat_cols},
)

# Import PISA data
pisa = pd.read_csv("../data/pisa_data.csv").rename(columns={"mean_score": "pisa_score"})
df = df.merge(pisa, how="left", on="country")

# Columns we want to see correlations for.
cols_for_country_avg_corr = ["response", "pisa_score"] + survey_cols
df_corr = df.copy().astype({"response": int})

# Calculate country averages for these measures
country_avg_data = df_corr.groupby("country")[cols_for_country_avg_corr].mean()

# Compute the correlation matrix
comprehensive_corr_matrix = country_avg_data.corr()

# Show correlations of interest
comprehensive_corr_matrix.columns = pd.MultiIndex.from_product(
    [["Correlation (r)"], comprehensive_corr_matrix.columns]
)
comprehensive_corr_matrix.iloc[:2, 2:]
| Correlation (r) | general_trust | GPS_trust | general_morality | MFQ_genmorality | civic_cooperation | GPS_posrecip | GPS_altruism | stranger1 |
|---|---|---|---|---|---|---|---|---|
| response | 0.603736 | 0.023510 | 0.612047 | 0.461323 | 0.391755 | 0.050279 | -0.214705 | 0.645001 |
| pisa_score | 0.633428 | 0.122152 | 0.659558 | 0.364130 | 0.395210 | -0.156832 | -0.159437 | 0.665572 |
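One way to make the consistency claim concrete is to correlate the two rows of the table: if wallet reporting rates and PISA scores rank the survey measures similarly, the two correlation profiles should themselves be highly correlated. A minimal sketch, with the values hard-coded from the table above:

```python
import numpy as np

# Correlations of each survey measure with wallet reporting rates and with
# PISA scores, copied (rounded) from the table above, in column order:
# general_trust, GPS_trust, general_morality, MFQ_genmorality,
# civic_cooperation, GPS_posrecip, GPS_altruism, stranger1.
r_response = np.array([0.6037, 0.0235, 0.6120, 0.4613, 0.3918, 0.0503, -0.2147, 0.6450])
r_pisa = np.array([0.6334, 0.1222, 0.6596, 0.3641, 0.3952, -0.1568, -0.1594, 0.6656])

# Correlate the two correlation profiles: a value near 1 means both
# criteria agree on which survey measures track social capital.
profile_similarity = np.corrcoef(r_response, r_pisa)[0, 1]
print(f"{profile_similarity:.2f}")
```

The profile correlation comes out around 0.96, which matches the visual similarity of the two facet plots below.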
# Reshape dataframe for graphing ease.
df_reshaped = country_avg_data.reset_index().melt(
    id_vars=["country", "response", "pisa_score"]
)

# Calculate sample size for each survey measure and wallet report rates
ens_wallet = pd.DataFrame(
    {
        col: country_avg_data[["response", col]].dropna().shape[0]
        for col in survey_cols
    },
    index=["N"],
)

# Wallet report rate vs survey measure facet plot.
fig = px.scatter(
    df_reshaped,
    x="value",
    y="response",
    facet_col="variable",
    facet_col_wrap=4,
    trendline="ols",
    facet_col_spacing=0.06,
    facet_row_spacing=0.15,
)
fig.update_xaxes(showline=True, linecolor="darkgray")
fig.update_yaxes(showline=True, linecolor="darkgray")
fig.for_each_annotation(
    lambda a: a.update(
        text=a.text.split("=")[-1]
        + " (N="
        + str(ens_wallet.loc["N", a.text.split("=")[-1]])
        + ")"
    )
)
fig.show()
The facet plot above replicates Figure 3 from Tannenbaum and can be compared with the plot below, which puts PISA scores instead of wallet reporting rates on the y-axes.
# Calculate sample size for each survey measure and PISA
ens_pisa = pd.DataFrame(
    {
        col: country_avg_data[["pisa_score", col]].dropna().shape[0]
        for col in survey_cols
    },
    index=["N"],
)

# PISA vs Survey measure facet plot
fig = px.scatter(
    df_reshaped,
    x="value",
    y="pisa_score",
    facet_col="variable",
    facet_col_wrap=4,
    trendline="ols",
    facet_col_spacing=0.06,
    facet_row_spacing=0.15,
)
fig.update_xaxes(showline=True, linecolor="darkgray")
fig.update_yaxes(showline=True, linecolor="darkgray")
fig.for_each_annotation(
    lambda a: a.update(
        text=a.text.split("=")[-1]
        + " (N="
        + str(ens_pisa.loc["N", a.text.split("=")[-1]])
        + ")"
    )
)
fig.show()
## PISA as a Predictor of Economic and Institutional Performance
To address the second topic of the Tannenbaum paper, we ask: how well do PISA scores (compared with lost-wallet reporting rates) explain variation in economic development?

The second part of the paper uses four measures of "Economic and Institutional Performance": GDP per capita (log_gdp), productivity (log_tfp), government effectiveness (gee), and letter grade efficiency (letter_grading). If PISA scores are treated as a fifth measure of the same sort, wallet reporting rates turn out to be an equally effective predictor of them. When paired with any of the survey measures of social capital, the coefficient on wallet reporting rates is always statistically significant at p<0.01, and the fitted model's \(R^2\) exceeds that of most of the other models.
### Regression Results
The table below replicates part of Table 2 from Tannenbaum and adds two new columns (Model 9 and Model 10), which report the corresponding OLS results with PISA scores as the outcome variable. Models 7 and 8 are recreated here to show that the table is generated by the same process that produced Table 2 in Tannenbaum.
import pandas as pd
import statsmodels.api as sm
from great_tables import GT

# Data import
survey_cols = [
    "general_trust",
    "GPS_trust",
    "general_morality",
    "MFQ_genmorality",
    "civic_cooperation",
    "GPS_posrecip",
    "GPS_altruism",
    "stranger1",
]
econ_cols = [
    "log_gdp",
    "log_tfp",
    "gee",
    "letter_grading",
]
df = pd.read_csv(
    "../data/tannenbaum_data.csv",
)

# Add PISA data
pisa = pd.read_csv("../data/pisa_data.csv")
pisa = pisa.loc[pisa["year"] == 2015]
pisa = pisa.groupby("country")["pisa_score"].mean().reset_index()
df = df.merge(pisa, how="left", on="country")

# p-value stars to award to each parameter coef estimate.
def stars(p):
    if p > 0.1:
        return ""
    elif p > 0.05:
        return "*"
    elif p > 0.01:
        return "**"
    else:
        return "***"

# Run regression for each survey measure.
def get_model_results_no_pred(survey_measure, econ_measure):
    regression_df = (
        df.groupby("country")[[econ_measure, survey_measure]].mean().dropna()
    )
    y = regression_df[econ_measure]
    X = regression_df[[survey_measure]]
    # Standardize predictors
    X_std = (X - X.mean()) / X.std()
    X_std = sm.add_constant(X_std)
    model = sm.OLS(y, X_std)
    results = model.fit(cov_type="HC1")  # Robust standard errors, same as in Tannenbaum
    result_df = (
        pd.DataFrame(
            {
                "param": pd.Series(
                    [
                        f"{v:.3f}{stars(p)}"
                        for v, p in zip(results.params[1:], results.pvalues[1:])
                    ],
                    index=results.params.index[1:],
                ),
                "se": results.bse[1:].apply(lambda x: f"({x:.3f})"),
            }
        )
        .astype({"param": object, "se": object})
        .stack()
    )
    result_df.loc[("<i>N</i>", "")] = X.shape[0]
    result_df.loc[("<i>R</i><sup>2</sup>", "")] = f"{results.rsquared:.3f}"
    result_df = result_df.reset_index()
    result_df["measure"] = survey_measure
    return result_df

# Run regression for each survey measure with wallet reporting rate as an
# additional predictor.
def get_model_results(survey_measure, econ_measure):
    regression_df = (
        df.groupby("country")[["response", econ_measure, survey_measure]]
        .mean()
        .dropna()
    )
    y = regression_df[econ_measure]
    X = regression_df[[survey_measure, "response"]]
    # Standardize predictors
    X_std = (X - X.mean()) / X.std()
    X_std = sm.add_constant(X_std)
    model = sm.OLS(y, X_std)
    results = model.fit(cov_type="HC1")  # Robust standard errors, same as in Tannenbaum
    result_df = (
        pd.DataFrame(
            {
                "param": pd.Series(
                    [
                        f"{v:.3f}{stars(p)}"
                        for v, p in zip(results.params[1:], results.pvalues[1:])
                    ],
                    index=results.params.index[1:],
                ),
                "se": results.bse[1:].apply(lambda x: f"({x:.3f})"),
            }
        )
        .astype({"param": object, "se": object})
        .stack()
    )
    result_df.loc[("<i>N</i>", "")] = X.shape[0]
    result_df.loc[("<i>R</i><sup>2</sup>", "")] = f"{results.rsquared:.3f}"
    result_df = result_df.reset_index()
    result_df["measure"] = survey_measure
    return result_df

model_7_results = [
    get_model_results_no_pred(col, "letter_grading") for col in survey_cols
]
model_7 = pd.concat(model_7_results)
model_7 = model_7.rename(columns={0: "Model 7"})

model_8_results = [get_model_results(col, "letter_grading") for col in survey_cols]
model_8 = pd.concat(model_8_results)
model_8 = model_8.rename(columns={0: "Model 8"})

model_9_results = [get_model_results_no_pred(col, "pisa_score") for col in survey_cols]
model_9 = pd.concat(model_9_results)
model_9 = model_9.rename(columns={0: "Model 9"})

model_10_results = [get_model_results(col, "pisa_score") for col in survey_cols]
model_10 = pd.concat(model_10_results)
model_10 = model_10.rename(columns={0: "Model 10"})

# Combine results and make pretty.
display_df = (
    model_7.merge(model_8, on=["level_0", "level_1", "measure"], how="right")
    .merge(model_9, on=["level_0", "level_1", "measure"], how="left")
    .merge(model_10, on=["level_0", "level_1", "measure"], how="right")
    .iloc[:, [0, 2, 3, 4, 5, 6]]
)
# Blank out repeated row labels so each appears only once.
display_df.loc[:, "level_0"] = display_df.loc[:, "level_0"].where(
    display_df.loc[:, "level_0"] != display_df.loc[:, "level_0"].shift(), ""
)
(
    GT(display_df)
    .tab_header(title="TABLE 2.—PREDICTIVE VALUE OF WALLET REPORTING RATES")
    .tab_stub(rowname_col="level_0", groupname_col="measure")
    .tab_spanner(label="Letter grade efficiency", columns=["Model 7", "Model 8"])
    .tab_spanner(label="PISA Score", columns=["Model 9", "Model 10"])
    .tab_options(
        table_body_hlines_style="none",
    )
    .cols_align(align="center", columns=["Model 7", "Model 8", "Model 9", "Model 10"])
)
TABLE 2.—PREDICTIVE VALUE OF WALLET REPORTING RATES
(Models 7 and 8 predict letter grade efficiency; Models 9 and 10 predict PISA score.)

| | Model 7 | Model 8 | Model 9 | Model 10 |
|---|---|---|---|---|
| **general_trust** | | | | |
| general_trust | 0.077* | -0.013 | 26.665*** | 9.099 |
| | (0.041) | (0.040) | (4.461) | (6.632) |
| response | | 0.148*** | | 24.956*** |
| | | (0.050) | | (5.819) |
| N | 39 | 39 | 32 | 32 |
| R² | 0.078 | 0.263 | 0.455 | 0.656 |
| **GPS_trust** | | | | |
| GPS_trust | -0.016 | -0.018 | 3.309 | 4.477 |
| | (0.050) | (0.039) | (7.499) | (3.642) |
| response | | 0.125*** | | 31.574*** |
| | | (0.041) | | (3.851) |
| N | 36 | 36 | 29 | 29 |
| R² | 0.003 | 0.213 | 0.007 | 0.628 |
| **general_morality** | | | | |
| general_morality | 0.080** | -0.012 | 25.366*** | 9.868** |
| | (0.036) | (0.047) | (4.857) | (4.125) |
| response | | 0.150*** | | 25.321*** |
| | | (0.055) | | (3.718) |
| N | 38 | 38 | 32 | 32 |
| R² | 0.083 | 0.268 | 0.412 | 0.669 |
| **MFQ_genmorality** | | | | |
| MFQ_genmorality | 0.118*** | 0.069* | 13.271* | 2.465 |
| | (0.041) | (0.041) | (7.993) | (3.818) |
| response | | 0.107*** | | 31.432*** |
| | | (0.032) | | (3.616) |
| N | 35 | 35 | 31 | 31 |
| R² | 0.219 | 0.360 | 0.110 | 0.656 |
| **civic_cooperation** | | | | |
| civic_cooperation | 0.089* | 0.038 | 19.470*** | 2.093 |
| | (0.045) | (0.059) | (5.560) | (4.312) |
| response | | 0.130** | | 30.727*** |
| | | (0.054) | | (4.505) |
| N | 37 | 37 | 31 | 31 |
| R² | 0.100 | 0.283 | 0.235 | 0.633 |
| **GPS_posrecip** | | | | |
| GPS_posrecip | 0.009 | 0.003 | 3.255 | 1.411 |
| | (0.040) | (0.044) | (7.924) | (4.321) |
| response | | 0.125*** | | 31.325*** |
| | | (0.042) | | (3.888) |
| N | 36 | 36 | 29 | 29 |
| R² | 0.001 | 0.209 | 0.007 | 0.617 |
| **GPS_altruism** | | | | |
| GPS_altruism | -0.033 | -0.006 | 0.699 | 3.908 |
| | (0.037) | (0.037) | (8.108) | (5.177) |
| response | | 0.124*** | | 31.803*** |
| | | (0.040) | | (3.547) |
| N | 36 | 36 | 29 | 29 |
| R² | 0.015 | 0.209 | 0.000 | 0.625 |
| **stranger1** | | | | |
| stranger1 | 0.091** | 0.001 | 28.408*** | 12.071*** |
| | (0.036) | (0.062) | (4.186) | (4.536) |
| response | | 0.140** | | 23.480*** |
| | | (0.061) | | (4.237) |
| N | 39 | 39 | 32 | 32 |
| R² | 0.110 | 0.263 | 0.508 | 0.687 |