Modified Causal Forests#

Welcome to the documentation of mcf, the Python package implementing the Modified Causal Forest introduced by Lechner (2018). This package allows you to estimate heterogeneous treatment effects for binary and multiple treatments from experimental or observational data. Additionally, mcf offers the capability to learn optimal policy allocations.

If you’re new to the mcf package, we recommend following these steps:

Installation Guide: Learn how to install mcf on your system.
Usage Example: Explore a simple example to quickly understand how to apply mcf to your data.
Getting started: Dive into a more detailed example to get a better feel for working with mcf.

For those seeking further information:

The User Guide offers explanations on additional features of the mcf package and provides several example scripts.
Check out the API for details on interacting with the mcf package.
The Algorithm Reference provides a technical description of the methods used in the package.

Installation Guide#

The current version of mcf is compatible with Python 3.11. You can install mcf from PyPI using:

pip install mcf

For a smoother experience and to avoid conflicts with other packages, we strongly recommend using a virtual environment based on conda. First, install conda as described here. Next, follow the steps below in your Anaconda Prompt (Windows) or terminal (macOS and Linux):

Set up and activate a conda environment named mcf-env:

conda create -n mcf-env

conda activate mcf-env

Install Python 3.11:

conda install python=3.11

Finally, install mcf in this environment using pip:

pip install mcf

Note: Install packages with conda install first and use pip install only afterwards. On Windows, if you plan to use Spyder as your IDE, run conda install spyder before pip install mcf to reduce the risk of errors during installation.
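
After installation, it can be useful to confirm that the active environment matches the requirements stated above. The following is a minimal sketch that uses only the Python standard library. The helper names python_is_compatible and installed_version are illustrative and not part of mcf, and the check assumes that "compatible" means exactly Python 3.11, as stated in this guide.

```python
import sys
from importlib import metadata


def python_is_compatible(required=(3, 11), actual=None):
    """Return True if the major.minor Python version equals `required`."""
    if actual is None:
        actual = sys.version_info[:2]
    return tuple(actual[:2]) == tuple(required)


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: this guide targets Python 3.11 exactly.
    print("Python 3.11 interpreter:", python_is_compatible())
    # Prints None if mcf is not installed in the active environment.
    print("mcf version:", installed_version("mcf"))
```

If the second line prints None, mcf is not installed in the environment that is currently active, which often means that mcf-env was not activated before running pip install mcf.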

Usage Example#

To demonstrate how to use mcf, let’s simulate some data and apply the Modified Causal Forest:

import numpy as np
import pandas as pd

from mcf import ModifiedCausalForest
from mcf import OptimalPolicy
from mcf import McfOptPolReport

def simulate_data(n: int, seed: int) -> pd.DataFrame:
    """
    Simulate data with a binary treatment 'd', outcome 'y', unordered control
    variable 'female' and two ordered controls 'x1', 'x2'.

    Parameters:
    - n (int): Number of observations in the simulated data.
    - seed (int): Seed for the random number generator.

    Returns:
    pd.DataFrame: Simulated data in a Pandas DataFrame.

    """
    rng = np.random.default_rng(seed)

    d = rng.integers(low=0, high=1, size=n, endpoint=True)
    female = rng.integers(low=0, high=1, size=n, endpoint=True)
    x_ordered = rng.normal(size=(n, 2))
    y = (x_ordered[:, 0] +
        x_ordered[:, 1] * (d == 1) +
        0.5 * female +
        rng.normal(size=n))

    data = {"y": y, "d": d, "female": female}

    for i in range(x_ordered.shape[1]):
        data["x" + str(i + 1)] = x_ordered[:, i]

    return pd.DataFrame(data)

df = simulate_data(n=100, seed=1234)

# Create an instance of class ModifiedCausalForest:
my_mcf = ModifiedCausalForest(
    var_y_name="y",
    var_d_name="d",
    var_x_name_ord=["x1", "x2"],
    var_x_name_unord=["female"],
    _int_show_plots=False
)

# Train the Modified Causal Forest on the simulated data and predict treatment
# effects in-sample:
my_mcf.train(df)
results = my_mcf.predict(df)

# The 'results' dictionary contains the estimated treatment effects:
print(results.keys())

print(results["ate"])  # Average Treatment Effect (ATE)
print(results["ate_se"])  # Standard Error (SE) of the ATE

# DataFrame with Individualized Average Treatment Effects (IATEs) and potential outcomes
print(results["iate_data_df"])


# Create an instance of class OptimalPolicy:
my_optimal_policy = OptimalPolicy(
    var_d_name="d",
    var_polscore_name=["Y_LC0_un_lc_pot", "Y_LC1_un_lc_pot"],
    var_x_name_ord=["x1", "x2"],
    var_x_name_unord=["female"]
    )

# Learn an optimal policy rule using the predicted potential outcomes
alloc_df = my_optimal_policy.solve(results["iate_data_df"])

# Evaluate the optimal policy rule on the simulated data:
my_optimal_policy.evaluate(alloc_df, results["iate_data_df"])

# Compare the optimal policy rule to the observed and a random allocation:
print(alloc_df)

# Produce a PDF report that summarizes the most important results
my_report = McfOptPolReport(mcf=my_mcf, optpol=my_optimal_policy,
                            outputfile='mcf_report')
my_report.report()
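
Because the data in this example are simulated, the true effects are known and can serve as a benchmark for the estimates. In the outcome equation of simulate_data, treatment adds x2 to the outcome, so the true IATE of each observation equals its x2 value and the true ATE is E[x2] = 0. The sketch below makes this explicit by generating both potential outcomes. The helper simulate_potential_outcomes is hypothetical and not part of mcf, and it draws its own random numbers, so its values will not match those of simulate_data draw for draw.

```python
import numpy as np


def simulate_potential_outcomes(n, seed):
    """Simulate both potential outcomes under the example's outcome equation.

    Assumes y = x1 + x2 * d + 0.5 * female + noise, so that
    y(0) = x1 + 0.5 * female + noise and y(1) = y(0) + x2.
    """
    rng = np.random.default_rng(seed)
    female = rng.integers(low=0, high=1, size=n, endpoint=True)
    x_ordered = rng.normal(size=(n, 2))
    noise = rng.normal(size=n)

    y0 = x_ordered[:, 0] + 0.5 * female + noise  # outcome without treatment
    y1 = y0 + x_ordered[:, 1]                    # outcome with treatment
    return y0, y1, x_ordered


y0, y1, x_ordered = simulate_potential_outcomes(n=100_000, seed=1234)
true_iate = y1 - y0            # equals x2 for every observation
true_ate = true_iate.mean()    # sample analogue of E[x2] = 0

print(f"Sample mean of the true effects: {true_ate:.4f}")
```

With the df from the example above, df["x2"] plays the role of the true IATE. With only 100 observations, as in the example, the estimated effects can deviate noticeably from these true values.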

For a more detailed example, see the Getting started section.
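
To build intuition for what a policy allocation is, consider the simplest possible rule: assign every observation to the treatment with the highest predicted potential outcome. The sketch below illustrates this idea with NumPy. It is not a description of what OptimalPolicy.solve does internally; the function best_score_allocation and the score values are hypothetical and chosen only for illustration.

```python
import numpy as np


def best_score_allocation(policy_scores):
    """Assign each observation to the treatment with the highest score.

    policy_scores: array of shape (n_obs, n_treatments); column j holds the
    predicted potential outcome under treatment j.
    """
    scores = np.asarray(policy_scores, dtype=float)
    return scores.argmax(axis=1)


# Hypothetical predicted potential outcomes for four observations
# (column 0: no treatment, column 1: treatment).
scores = np.array([
    [0.2, 0.9],
    [1.1, 0.4],
    [0.0, 0.3],
    [0.7, 0.6],
])

allocation = best_score_allocation(scores)
print(allocation)  # [1 0 1 0]

# Average predicted outcome under this rule versus treating everyone.
value_rule = scores[np.arange(len(scores)), allocation].mean()
value_treat_all = scores[:, 1].mean()
print(value_rule > value_treat_all)  # True
```

In the usage example, the two columns passed as var_polscore_name take the role of the score columns in this sketch.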

Source code and contributing#

The Python source code is available on GitHub. If you have questions, want to report bugs, or have feature requests, please use the issue tracker.

References#

Bodory H, Busshoff H, Lechner M. High Resolution Treatment Effects Estimation: Uncovering Effect Heterogeneities with the Modified Causal Forest. Entropy. 2022; 24(8):1039.
Lechner M. Modified Causal Forests for Estimating Heterogeneous Causal Effects. 2018.
Lechner M, Mareckova J. Modified Causal Forest. 2022.

License#

mcf is distributed under the MIT License.