Title: | Causal Inference with Super Learner and Deep Neural Networks |
---|---|
Description: | Functions to estimate Conditional Average Treatment Effects (CATE) and Population Average Treatment Effects on the Treated (PATT) from experimental or observational data using the Super Learner (SL) ensemble method and deep neural networks. The package first provides functions to implement meta-learners such as the Single-learner (S-learner) and Two-learner (T-learner) described in Künzel et al. (2019) <doi:10.1073/pnas.1804597116> for estimating the CATE. The S- and T-learner are each estimated using the SL ensemble method and deep neural networks. It then provides functions to implement the Ottoboni and Poulos (2020) <doi:10.1515/jci-2018-0035> PATT-C estimator to obtain the PATT from experimental data with noncompliance by using the SL ensemble method and deep neural networks. |
Authors: | Nguyen K. Huynh [aut, cre] |
Maintainer: | Nguyen K. Huynh <[email protected]> |
License: | GPL-3 |
Version: | 0.0.104 |
Built: | 2025-02-25 05:48:29 UTC |
Source: | https://github.com/hknd23/deeplearningcausal |
Train a model on the group exposed to treatment, with compliance as the binary outcome variable and the covariates as predictors.
complier_mod( exp.data, complier.formula, treat.var, ID = NULL, SL.learners = c("SL.glmnet", "SL.xgboost", "SL.ranger", "SL.nnet", "SL.glm") )
exp.data |
list object of experimental data. |
complier.formula |
formula to fit compliance model (c ~ x) using complier variable and covariates |
treat.var |
string specifying the binary treatment variable |
ID |
string for name of identifier variable. |
SL.learners |
vector of strings for ML classifier algorithms. If left
|
model object of trained model.
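The training step described above can be sketched in base R. This is a minimal illustration, not the package's implementation: a logistic regression (`glm`) stands in for the SL ensemble, and the simulated data frame `exp_sim` with its columns `age`, `treat` and `complier` is hypothetical.

```r
# Simulated stand-in for exp.data; every name below is hypothetical.
set.seed(1)
n <- 200
exp_sim <- data.frame(
  age   = round(runif(n, 18, 70)),
  treat = rbinom(n, 1, 0.5)
)
# Compliance is simulated for every unit for simplicity;
# only the treated rows are used to fit the model.
exp_sim$complier <- rbinom(n, 1, plogis(-1 + 0.03 * exp_sim$age))

# Keep only the units exposed to treatment.
treated <- exp_sim[exp_sim$treat == 1, ]

# Logistic regression as a simple substitute for the SL ensemble.
complier_fit <- glm(complier ~ age, data = treated, family = binomial())

# Fitted compliance probabilities for the treated group.
p_hat <- predict(complier_fit, type = "response")
```

The fitted object plays the role of the trained compliance model that is later used to predict compliance among controls.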
Predict compliance for the control group in experimental data.
complier_predict(complier.mod, exp.data, treat.var, compl.var)
complier.mod |
output from trained ensemble superlearner model |
exp.data |
|
treat.var |
string specifying the binary treatment variable |
compl.var |
string specifying binary complier variable |
data.frame object with true compliers, predicted compliers in the control group, and all compliers (actual + predicted).
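The shape of the returned data frame can be illustrated with a base R sketch. It assumes a logistic regression in place of the SL ensemble and a 0.5 probability cutoff for classifying controls; the cutoff, the simulated data and all column names are assumptions made for illustration only.

```r
set.seed(2)
n <- 300
exp_sim <- data.frame(
  age   = round(runif(n, 18, 70)),
  treat = rbinom(n, 1, 0.5)
)
# Compliance is only observed under treatment; it is NA for controls.
latent <- rbinom(n, 1, plogis(-1 + 0.03 * exp_sim$age))
exp_sim$complier <- ifelse(exp_sim$treat == 1, latent, NA)

# Compliance model trained on the treated group only.
fit <- glm(complier ~ age, data = exp_sim[exp_sim$treat == 1, ],
           family = binomial())

# Predict compliance for controls (0.5 cutoff is an assumption).
is_control <- exp_sim$treat == 0
p_control  <- predict(fit, newdata = exp_sim[is_control, ], type = "response")
predicted  <- rep(NA_integer_, n)
predicted[is_control] <- as.integer(p_control > 0.5)

# True compliers, predicted compliers, and their union.
compliers <- data.frame(
  treat              = exp_sim$treat,
  true_complier      = exp_sim$complier,
  predicted_complier = predicted,
  all_compliers      = ifelse(is_control, predicted, exp_sim$complier)
)
```

Each row carries either an observed compliance status (treated units) or a predicted one (controls), and `all_compliers` combines the two.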
Shortened version of survey response data that incorporates a vignette survey experiment. The vignette describes an international crisis between countries A and B. After reading this vignette, respondents are randomly assigned to the control group or to one of two treatments: a policy prescription for the crisis from a strong (populist) leader or from a centrist (non-populist) leader. The respondents are then asked whether they are willing to support the policy decision to fight a war against country A, which is the dependent variable.
data(exp_data)
exp_data
A data frame with 257 rows and 12 columns:
Gender.
Age of participant.
Monthly household income.
Religious denomination.
Importance of religion in life.
Educational level of participant.
Political ideology of participant.
Employment status of participant.
Marital status of participant.
Concern about job loss.
Binary treatment measure of leader type.
Binary outcome measure for willingness to fight war.
...
Yadav and Mukherjee (2024)
Extended experiment data with 514 observations
data(exp_data_full)
exp_data_full
A data frame with 514 rows and 12 columns:
Gender.
Age of participant.
Monthly household income.
Religious denomination.
Importance of religion in life.
Educational level of participant.
Political ideology of participant.
Employment status of participant.
Marital status of participant.
Concern about job loss.
Binary treatment measure of leader type.
Binary outcome measure for willingness to fight war.
...
Yadav and Mukherjee (2024)
metalearner_deepneural implements the S-learner and T-learner for estimating CATE using deep neural networks. The resilient backpropagation (Rprop) algorithm is used for training the neural networks.
metalearner_deepneural( data, cov.formula, treat.var, meta.learner.type, stepmax = 1e+05, nfolds = 5, algorithm = "rprop+", hidden.layer = c(4, 2), linear.output = FALSE, binary.outcome = FALSE )
data |
|
cov.formula |
formula description of the model y ~ x(list of covariates). |
treat.var |
string for the name of treatment variable. |
meta.learner.type |
string specifying the meta-learner type: "S.Learner" or "T.Learner". |
stepmax |
maximum number of steps for training model. |
nfolds |
number of folds for cross-validation. Currently supports up to 5 folds. |
algorithm |
a string for the algorithm for the neural network. Default set to "rprop+". |
hidden.layer |
vector of integers specifying layers and number of neurons. |
linear.output |
logical specifying regression (TRUE) or classification (FALSE) model. |
binary.outcome |
logical specifying whether the predicted outcome variable takes binary values or proportions. |
metalearner_deepneural object of predicted outcome values and CATEs estimated by the meta-learners for each observation.
    # load dataset
    data(exp_data)
    # estimate CATEs with S Learner
    set.seed(123456)
    slearner_nn <- metalearner_deepneural(
      cov.formula = support_war ~ age + income + employed + job_loss,
      data = exp_data,
      treat.var = "strong_leader",
      meta.learner.type = "S.Learner",
      stepmax = 2e+9,
      nfolds = 5,
      algorithm = "rprop+",
      hidden.layer = c(1),
      linear.output = FALSE,
      binary.outcome = FALSE
    )
    print(slearner_nn)

    set.seed(123456)
    # estimate CATEs with T Learner
    tlearner_nn <- metalearner_deepneural(
      cov.formula = support_war ~ age + income + employed + job_loss,
      data = exp_data,
      treat.var = "strong_leader",
      meta.learner.type = "T.Learner",
      stepmax = 1e+9,
      nfolds = 5,
      algorithm = "rprop+",
      hidden.layer = c(2, 1),
      linear.output = FALSE,
      binary.outcome = FALSE
    )
    print(tlearner_nn)
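For readers unfamiliar with the S-learner, the idea can be shown with a base R sketch. It is not the package's neural-network implementation: a linear model (`lm`) replaces the network, there is no cross-validation, and the simulated data and names (`sim`, `x`, `treat`, `y`) are hypothetical. A single model is fit with the treatment as a feature, and the CATE for each unit is the difference between its predictions with treatment switched on and off.

```r
set.seed(3)
n <- 400
sim <- data.frame(x = rnorm(n), treat = rbinom(n, 1, 0.5))
# Simulated outcome; the true effect is 1 + 0.5 * x by construction.
sim$y <- 0.3 * sim$x + sim$treat * (1 + 0.5 * sim$x) + rnorm(n, sd = 0.5)

# S-learner: one model, treatment entered as a covariate.
s_fit <- lm(y ~ x * treat, data = sim)

# Predict every unit under treatment and under control.
under_treat   <- transform(sim, treat = 1)
under_control <- transform(sim, treat = 0)
cate_s <- predict(s_fit, newdata = under_treat) -
  predict(s_fit, newdata = under_control)
```

`cate_s` holds one estimated effect per observation, which is the same kind of per-observation output the function above is documented to return.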
metalearner_ensemble implements the S-learner and T-learner for estimating CATE using the super learner ensemble method. The super learner in this case includes the following machine learning algorithms: extreme gradient boosting, glmnet (elastic net regression), random forest and neural nets.
metalearner_ensemble( data, cov.formula, treat.var, meta.learner.type, SL.learners = c("SL.glmnet", "SL.xgboost", "SL.ranger", "SL.nnet"), nfolds = 5, binary.outcome = FALSE )
data |
|
cov.formula |
formula description of the model y ~ x(list of covariates) |
treat.var |
string for the name of treatment variable. |
meta.learner.type |
string specifying the meta-learner type: "S.Learner" or "T.Learner". |
SL.learners |
vector for super learner ensemble that includes extreme gradient boosting, glmnet, random forest, and neural nets. |
nfolds |
number of folds for cross-validation. Currently supports up to 5 folds. |
binary.outcome |
logical specifying whether the predicted outcome variable takes binary values or proportions. |
metalearner_ensemble object of predicted outcome values and CATEs estimated by the meta-learners for each observation.
    # load dataset
    data(exp_data)
    # load SuperLearner package
    library(SuperLearner)
    # estimate CATEs with S Learner
    set.seed(123456)
    slearner <- metalearner_ensemble(
      cov.formula = support_war ~ age + income + employed + job_loss,
      data = exp_data,
      treat.var = "strong_leader",
      meta.learner.type = "S.Learner",
      SL.learners = c("SL.glm"),
      nfolds = 5,
      binary.outcome = FALSE
    )
    print(slearner)

    # estimate CATEs with T Learner
    set.seed(123456)
    tlearner <- metalearner_ensemble(
      cov.formula = support_war ~ age + income + employed + job_loss,
      data = exp_data,
      treat.var = "strong_leader",
      meta.learner.type = "T.Learner",
      SL.learners = c("SL.xgboost", "SL.ranger", "SL.nnet"),
      nfolds = 5,
      binary.outcome = FALSE
    )
    print(tlearner)
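The T-learner differs from the S-learner in that it fits two separate outcome models, one per treatment arm. The base R sketch below illustrates this under stated assumptions: `lm` stands in for the super learner ensemble, no cross-validation is performed, and the simulated data and names (`sim`, `x`, `treat`, `y`) are hypothetical.

```r
set.seed(4)
n <- 400
sim <- data.frame(x = rnorm(n), treat = rbinom(n, 1, 0.5))
# Simulated outcome; the true effect is 1 + 0.5 * x by construction.
sim$y <- 0.3 * sim$x + sim$treat * (1 + 0.5 * sim$x) + rnorm(n, sd = 0.5)

# T-learner: separate outcome models for treated and control units.
fit_treated <- lm(y ~ x, data = sim[sim$treat == 1, ])
fit_control <- lm(y ~ x, data = sim[sim$treat == 0, ])

# CATE per unit: difference between the two models' predictions.
cate_t <- predict(fit_treated, newdata = sim) -
  predict(fit_control, newdata = sim)
```

Because each arm gets its own model, the T-learner can capture effect heterogeneity without an explicit interaction term, at the cost of fitting each model on fewer observations.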
Train a model on the group exposed to treatment, with compliance as the binary outcome variable and the covariates as predictors.
neuralnet_complier_mod( complier.formula, exp.data, treat.var, algorithm = "rprop+", hidden.layer = c(4, 2), ID = NULL, stepmax = 1e+08 )
complier.formula |
formula for complier variable as outcome and covariates (c ~ x) |
exp.data |
|
treat.var |
string for treatment variable. |
algorithm |
string for algorithm for training neural networks. Default set to the resilient backpropagation with weight backtracking ('rprop+'). Other algorithms include 'backprop', 'rprop-', 'sag', or 'slr' (see the neuralnet package documentation). |
hidden.layer |
vector for specifying hidden layers and number of neurons. |
ID |
string for identifier variable |
stepmax |
maximum number of steps. |
trained complier model object
Create counterfactual datasets in the population for compliers and noncompliers. Then predict potential outcomes using the trained model from neuralnet_response_model.
neuralnet_pattc_counterfactuals( pop.data, neuralnet.response.mod, ID = NULL, cluster = NULL, binary.outcome = FALSE )
pop.data |
population data. |
neuralnet.response.mod |
trained model from neuralnet_response_model. |
ID |
string for identifier variable. |
cluster |
string for clustering variable (currently unused). |
binary.outcome |
logical specifying whether the predicted outcome variable takes binary values or proportions. |
data.frame of predicted outcomes of response variable from counterfactuals.
Predict compliance for the control group in experimental data.
neuralnet_predict(neuralnet.complier.mod, exp.data, treat.var, compl.var)
neuralnet.complier.mod |
results from neuralnet_complier_mod |
exp.data |
|
treat.var |
string for treatment variable |
compl.var |
string for compliance variable |
data.frame object with true compliers, predicted compliers in the control group, and all compliers (actual + predicted).
Model responses from all compliers (actual + predicted) in experimental data using a neural network.
neuralnet_response_model( response.formula, exp.data, neuralnet.compliers, compl.var, algorithm = "rprop+", hidden.layer = c(4, 2), stepmax = 1e+08 )
response.formula |
formula for response variable and covariates (y ~ x) |
exp.data |
|
neuralnet.compliers |
|
compl.var |
string of compliance variable |
algorithm |
neural network algorithm, default set to "rprop+". |
hidden.layer |
vector specifying hidden layers and number of neurons. |
stepmax |
maximum number of steps for training model. |
trained response model object
Create counterfactual datasets in the population for compliers and noncompliers. Then predict potential outcomes from counterfactuals.
pattc_counterfactuals( pop.data, response.mod, ID = NULL, cluster = NULL, binary.outcome = FALSE )
pop.data |
population dataset |
response.mod |
trained model from response_model |
ID |
string for identifier variable |
cluster |
string for clustering variable |
binary.outcome |
logical specifying whether predicted outcomes are proportions or binary (0-1). |
data.frame object of predicted outcomes of counterfactual groups.
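One plausible reading of the counterfactual step is sketched below in base R. It assumes, without confirmation from the text, that the response model takes a 0/1 complier indicator as a predictor, so that two copies of the population covariates can be scored with the indicator set to 1 and to 0. The `glm` stand-in, the simulated data and all names (`exp_sim`, `pop_sim`, `complier`) are hypothetical.

```r
set.seed(5)
# Hypothetical training data for a stand-in response model.
n_exp <- 300
exp_sim <- data.frame(
  age      = runif(n_exp, 18, 70),
  complier = rbinom(n_exp, 1, 0.5)
)
exp_sim$y <- rbinom(n_exp, 1,
                    plogis(-1 + 0.02 * exp_sim$age + 0.8 * exp_sim$complier))
resp_fit <- glm(y ~ age + complier, data = exp_sim, family = binomial())

# Hypothetical population covariates.
pop_sim <- data.frame(age = runif(500, 18, 70))

# Two counterfactual copies of the population.
pop_compliers    <- transform(pop_sim, complier = 1)
pop_noncompliers <- transform(pop_sim, complier = 0)

# Predicted outcomes as proportions.
counterfactuals <- data.frame(
  Y_hat1 = predict(resp_fit, newdata = pop_compliers, type = "response"),
  Y_hat0 = predict(resp_fit, newdata = pop_noncompliers, type = "response")
)

# Binary (0-1) version, using an assumed 0.5 cutoff.
counterfactuals_bin <- as.data.frame(
  lapply(counterfactuals, function(p) as.integer(p > 0.5))
)
```

The two data frames mirror the proportion and binary forms of output that binary.outcome is documented to switch between.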
pattc_deepneural estimates the Population Average Treatment Effect on the Treated from experimental data with noncompliers using deep neural networks.
pattc_deepneural( response.formula, exp.data, pop.data, treat.var, compl.var, compl.algorithm = "rprop+", response.algorithm = "rprop+", compl.hidden.layer = c(4, 2), response.hidden.layer = c(4, 2), compl.stepmax = 1e+08, response.stepmax = 1e+08, ID = NULL, cluster = NULL, binary.outcome = FALSE, bootstrap = FALSE, nboot = 1000 )
response.formula |
formula of response variable as outcome and covariates (y ~ x) |
exp.data |
|
pop.data |
|
treat.var |
string for treatment variable. |
compl.var |
string for compliance variable |
compl.algorithm |
string for algorithm to train neural network for compliance model. Default set to "rprop+". |
response.algorithm |
string for algorithm to train neural network for response model. Default set to "rprop+". |
compl.hidden.layer |
vector for specifying hidden layers and number of neurons in complier model. |
response.hidden.layer |
vector for specifying hidden layers and number of neurons in response model. |
compl.stepmax |
maximum number of steps for complier model |
response.stepmax |
maximum number of steps for response model |
ID |
string for identifier variable |
cluster |
string for cluster variable. |
binary.outcome |
logical specifying whether the predicted outcome variable takes binary values or proportions. |
bootstrap |
logical for bootstrapped PATT-C. |
nboot |
number of bootstrapped samples |
pattc_deepneural class object of results of t test as PATT-C estimate.
    # load datasets
    data(exp_data) # experimental data
    data(pop_data) # population data
    # specify models and estimate PATTC
    set.seed(123456)
    pattc_neural <- pattc_deepneural(
      response.formula = support_war ~ age + female + income + education +
        employed + married + hindu + job_loss,
      exp.data = exp_data,
      pop.data = pop_data,
      treat.var = "strong_leader",
      compl.var = "compliance",
      compl.algorithm = "rprop+",
      response.algorithm = "rprop+",
      compl.hidden.layer = c(4, 2),
      response.hidden.layer = c(4, 2),
      compl.stepmax = 1e+09,
      response.stepmax = 1e+09,
      ID = NULL,
      cluster = NULL,
      binary.outcome = FALSE
    )
    print(pattc_neural)

    pattc_neural_boot <- pattc_deepneural(
      response.formula = support_war ~ age + female + income + education +
        employed + married + hindu + job_loss,
      exp.data = exp_data,
      pop.data = pop_data,
      treat.var = "strong_leader",
      compl.var = "compliance",
      compl.algorithm = "rprop+",
      response.algorithm = "rprop+",
      compl.hidden.layer = c(4, 2),
      response.hidden.layer = c(4, 2),
      compl.stepmax = 1e+09,
      response.stepmax = 1e+09,
      ID = NULL,
      cluster = NULL,
      binary.outcome = FALSE,
      bootstrap = TRUE,
      nboot = 2000
    )
    print(pattc_neural_boot)
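The text does not say how the bootstrapped PATT-C is computed, so the following is only a generic sketch of a nonparametric bootstrap for a difference in mean predicted outcomes. The vectors `y1_hat` and `y0_hat` are simulated placeholders for counterfactual predictions, and the percentile interval is an assumption rather than the package's documented method.

```r
set.seed(6)
n <- 500
# Placeholder predictions for the two counterfactual groups.
y1_hat <- rnorm(n, mean = 0.6, sd = 0.1)
y0_hat <- rnorm(n, mean = 0.5, sd = 0.1)

nboot <- 1000
# Resample units with replacement and recompute the mean difference.
boot_est <- replicate(nboot, {
  idx <- sample.int(n, replace = TRUE)
  mean(y1_hat[idx]) - mean(y0_hat[idx])
})

# Point estimate and 95% percentile interval.
point_est <- mean(y1_hat) - mean(y0_hat)
boot_ci   <- quantile(boot_est, c(0.025, 0.975))
```

A larger nboot narrows the Monte Carlo error of the interval endpoints but does not change the underlying sampling uncertainty.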
pattc_ensemble estimates the Population Average Treatment Effect on the Treated from experimental data with noncompliers using the super learner ensemble that includes extreme gradient boosting, glmnet (elastic net regression), random forest and neural nets.
pattc_ensemble( response.formula, exp.data, pop.data, treat.var, compl.var, compl.SL.learners = c("SL.glmnet", "SL.xgboost", "SL.ranger", "SL.nnet", "SL.glm"), response.SL.learners = c("SL.glmnet", "SL.xgboost", "SL.ranger", "SL.nnet", "SL.glm"), ID = NULL, cluster = NULL, binary.outcome = FALSE, bootstrap = FALSE, nboot = 1000 )
response.formula |
formula for the effects of covariates on outcome variable (y ~ x). |
exp.data |
|
pop.data |
|
treat.var |
string for binary treatment variable. |
compl.var |
string for binary compliance variable. |
compl.SL.learners |
vector of names of ML algorithms used for compliance model. |
response.SL.learners |
vector of names of ML algorithms used for response model. |
ID |
string for name of identifier. (currently not used) |
cluster |
string for name of cluster variable. (currently not used) |
binary.outcome |
logical specifying whether the predicted outcome variable takes binary values or proportions. |
bootstrap |
logical for bootstrapped PATT-C. |
nboot |
number of bootstrapped samples. Only used with bootstrap = TRUE. |
pattc_ensemble object of results of t test as PATT-C estimate.
    # load datasets
    data(exp_data_full) # full experimental data
    data(exp_data) # experimental data
    data(pop_data) # population data
    # attach SuperLearner (model will not recognize learner if package is not loaded)
    library(SuperLearner)
    set.seed(123456)
    # specify models and estimate PATTC
    pattc <- pattc_ensemble(
      response.formula = support_war ~ age + income + education +
        employed + job_loss,
      exp.data = exp_data_full,
      pop.data = pop_data,
      treat.var = "strong_leader",
      compl.var = "compliance",
      compl.SL.learners = c("SL.glm", "SL.nnet"),
      response.SL.learners = c("SL.glm", "SL.nnet"),
      ID = NULL,
      cluster = NULL,
      binary.outcome = FALSE
    )
    print(pattc)

    pattc_boot <- pattc_ensemble(
      response.formula = support_war ~ age + income + education +
        employed + job_loss,
      exp.data = exp_data_full,
      pop.data = pop_data,
      treat.var = "strong_leader",
      compl.var = "compliance",
      compl.SL.learners = c("SL.glm", "SL.nnet"),
      response.SL.learners = c("SL.glm", "SL.nnet"),
      ID = NULL,
      cluster = NULL,
      binary.outcome = FALSE,
      bootstrap = TRUE,
      nboot = 1000
    )
    print(pattc_boot)
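Since the return value is described as the results of a t test, the final step can be illustrated with `stats::t.test` on two vectors of predicted outcomes. The vectors below are simulated placeholders, and the use of the default Welch two-sample test is an assumption; the text does not state which variant the package applies.

```r
set.seed(7)
# Placeholder counterfactual predictions (hypothetical values).
y1_hat <- rnorm(500, mean = 0.6, sd = 0.1)
y0_hat <- rnorm(500, mean = 0.5, sd = 0.1)

# Welch two-sample t test on the predicted outcomes.
tt <- t.test(y1_hat, y0_hat)

# Difference in means, read here as the PATT-C point estimate.
pattc_est <- unname(tt$estimate[1] - tt$estimate[2])
pattc_ci  <- tt$conf.int
```

The confidence interval in `tt$conf.int` refers to the difference in means, so it can be reported alongside `pattc_est` directly.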
World Values Survey (WVS) data for India in 2022. The variables drawn from the WVS India data match the covariates from the India survey experiment sample.
data(pop_data)
pop_data
A data frame with 846 rows and 13 columns:
Respondent’s Sex.
Age of respondent.
Income group of household.
Religious denomination.
Importance of religion in respondent’s life.
Educational level of respondent.
Political ideology of respondent.
Employment status and full-time employee.
Marital status of respondent.
Concern about job loss.
Binary (Yes/No) outcome measure for willingness to fight war.
Binary measure of preference for strong leader.
...
Haerpfer, C., Inglehart, R., Moreno, A., Welzel, C., Kizilova, K., Diez-Medrano J., M. Lagos, P. Norris, E. Ponarin & B. Puranen et al. (eds.). 2020. World Values Survey: Round Seven – Country-Pooled Datafile. Madrid, Spain & Vienna, Austria: JD Systems Institute & WVSA Secretariat. <doi.org/10.14281/18241.1>
Extended World Values Survey (WVS) data for India in 1995, 2001, 2006, 2012, and 2022.
data(pop_data_full)
pop_data_full
A data frame with 11,813 rows and 13 columns:
Respondent’s Sex.
Age of respondent.
Income group of household.
Religious denomination.
Importance of religion in respondent’s life.
Educational level of respondent.
Political ideology of respondent.
Employment status and full-time employee.
Marital status of respondent.
Concern about job loss.
Binary (Yes/No) outcome measure for willingness to fight war.
Binary measure of preference for strong leader.
...
Haerpfer, C., Inglehart, R., Moreno, A., Welzel, C., Kizilova, K., Diez-Medrano J., M. Lagos, P. Norris, E. Ponarin & B. Puranen et al. (eds.). 2020. World Values Survey: Round Seven – Country-Pooled Datafile. Madrid, Spain & Vienna, Austria: JD Systems Institute & WVSA Secretariat. <doi.org/10.14281/18241.1>
Print method for metalearner_deepneural
## S3 method for class 'metalearner_deepneural' print(x, ...)
x |
|
... |
additional parameter |
list of model results
Print method for metalearner_ensemble
## S3 method for class 'metalearner_ensemble' print(x, ...)
x |
|
... |
additional parameter |
list of model results
Print method for pattc_deepneural
## S3 method for class 'pattc_deepneural' print(x, ...)
x |
|
... |
additional parameter |
list of model results
Print method for pattc_ensemble
## S3 method for class 'pattc_ensemble' print(x, ...)
x |
|
... |
additional parameter |
list of model results
Train response model (response variable as outcome and covariates) from all compliers (actual + predicted) in experimental data using SL ensemble.
response_model( response.formula, exp.data, compl.var, exp.compliers, family = "binomial", ID = NULL, SL.learners = c("SL.glmnet", "SL.xgboost", "SL.ranger", "SL.nnet", "SL.glm") )
response.formula |
formula to fit the response model (y ~ x) using binary outcome variable and covariates |
exp.data |
experimental dataset. |
compl.var |
string specifying binary complier variable |
exp.compliers |
|
family |
string for |
ID |
string for identifier variable. |
SL.learners |
vector of names of ML algorithms used for ensemble model. |
trained response model.
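As a rough illustration of the response-model step, the base R sketch below fits a binary outcome model on the rows flagged as compliers (actual + predicted). A logistic regression replaces the SL ensemble, and the simulated data and names (`exp_sim`, `all_compliers`, `age`, `y`) are hypothetical.

```r
set.seed(8)
n <- 400
exp_sim <- data.frame(age = runif(n, 18, 70))
# Stand-in for the combined actual + predicted complier flag.
exp_sim$all_compliers <- rbinom(n, 1, 0.7)
# Simulated binary response.
exp_sim$y <- rbinom(n, 1, plogis(-1 + 0.02 * exp_sim$age))

# Restrict the training data to compliers.
compliers_only <- exp_sim[exp_sim$all_compliers == 1, ]

# Binomial family matches the documented default family = "binomial".
resp_fit <- glm(y ~ age, data = compliers_only, family = binomial())
```

The fitted object is the kind of trained response model that a later counterfactual prediction step would consume.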