Data from several sources were joined together into a merged dataset. We use 2016 year to build the model. Main outcome is crude death rate for each state, candidate predictors are law strength, unemployment rate, sleep time, smoking, self-reported health, overweight, population, poverty, mental health, leisure physical activity, drinking, disability and diabetes for each state. Two models were generated using criteria-based model selection and stepwise regression. Models were compared by BIC, adjusted R square, Cp, etc. Models were examined by distribution of residuals (QQ plot, residuals vs fitted value), outliers. We also used cross-validation to compare the two models.

Comments:
Strong correlations can be seen among several pairs of variables. To minimize multi-collinearity, model should be selected carefully.
Most of the variates follow approximately normal distributions.
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | |
|---|---|---|---|---|---|---|---|---|
| diabetes | * | |||||||
| disability | * | * | * | * | * | * | ||
| drinking | * | * | * | |||||
| law_strength_2016_only | * | * | * | * | * | * | * | |
| leisure_physical_activities | * | * | * | * | ||||
| mental_health | ||||||||
| overweight | ||||||||
| population | ||||||||
| poverty | ||||||||
| self_reported_health | * | * | * | |||||
| sleep | ||||||||
| smoking | * | * | * | * | * | * | ||
| unemployment_rate | * | * | * | * | * | * |
Comments:
It seems that “Law Strength”, “Smoking”, “Disability”, and “Unemployment Rate” are all strong predictors. “Leisure & Physical Activities” appears in larger models.

Comments:
The model with four predictors seems to have the highest adjusted R-squared value, lowest BIC and Cp values, and has a high regression mean sum of squares and low residual mean sum of squares. The six-predictor model has similar model statistics as the four-predictor model, with the exception of the higher BIC value.
Model from stepwise selection:
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 10.4708267 | 4.6327477 | 2.260176 | 0.0289302 |
| drinking | -0.6333038 | 0.3807451 | -1.663327 | 0.1035197 |
| law_strength_2016_only | -0.1759851 | 0.0299607 | -5.873866 | 0.0000006 |
| leisure_physical_activities | -0.3321040 | 0.1482859 | -2.239619 | 0.0303397 |
| self_reported_health | 0.3079383 | 0.2168874 | 1.419807 | 0.1628735 |
| smoking | 0.4285323 | 0.1605796 | 2.668659 | 0.0106983 |
| unemployment_rate | 1.3501531 | 0.4602038 | 2.933815 | 0.0053517 |
| r.squared | adj.r.squared | sigma | statistic | p.value | df | |
|---|---|---|---|---|---|---|
| value | 0.7730692 | 0.7414044 | 2.475718 | 24.41418 | 0 | 7 |
Comments:
Drinking, law strength, and leisure all seem to be negatively associated with firearm crude death rate. Self-rated health, smoking, and unemployment rate are all positively associated with the firearm crude death rate.
Criteria based model:
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -3.8337969 | 5.0898242 | -0.7532278 | 0.4552359 |
| disability | 0.3050451 | 0.1719631 | 1.7738989 | 0.0828465 |
| law_strength_2016_only | -0.1683490 | 0.0320384 | -5.2545962 | 0.0000039 |
| smoking | 0.2598411 | 0.1371695 | 1.8943066 | 0.0646220 |
| unemployment_rate | 1.2927511 | 0.4552585 | 2.8395980 | 0.0067578 |
| r.squared | adj.r.squared | sigma | statistic | p.value | df | |
|---|---|---|---|---|---|---|
| value | 0.7565902 | 0.7349538 | 2.506406 | 34.96835 | 0 | 5 |
Comments:
Disability, smoking, and unemployment rate are all positively associated with the firearm crude death rate. Law strength is negatively associated with the firearm crude death rate.
Criteria-based model (4 predictors)

Stepwise model (6 predictors)

Comments: For both models, no severe outliers were observed. Residuals seem to follow a normal distribution. Residuals don’t hold constant over fitted values, but are still clustered around zero and not severely biased.

Comments:
Our four-predictor model seems to have a slightly lower root-mean-squared error (RMSE) value than the six-predictor model. They are both better than the trivial model of y ~ 1. For parsimony, we would ultimately choose the four-predictor model.
Four-predictor model (disability, smoking, unemployment rate, law strength) seems to perform a little better than six-predictor model (drinking, law strength, leisure, self-rated health, smoking and unemployment rate). Model diagnosis shows that the residuals of both models agree with the underlying assumption. Cross validation shows a little higher rmse for six-predictor model than four-predictor model. Disability, smoking, unemployment rate positively associate with the firearm crude death rate. Law strength negatively associate with crude death rate.