Learning an optimal policy ========================== Different types of policy rules ------------------------------- The Modified Causal Forest refines treatment assignment mechanisms by defining the objective of an assignment rule. Its :py:class:`~optpolicy_functions.OptimalPolicy` class offers two different methods to compute assignment algorithms, namely the policy tree and the best-scores methods. The policy tree method is based on the work of `Zhou, Athey, and Wager (2022) `_ , modifying their approach with respect to policy score calculation, definition of constraints, and handling of multi-valued features. In contrast, the best-score method simply assigns units to the treatment with the highest individual potential outcome. The **mcf** algorithm compares the two methods with random treatment allocations to demonstrate the increase in reward. The following sections demonstrate how to implement these methods for policy learning. The :doc:`Algorithm reference <../algorithm_reference/optimal-policy_algorithm>` provides more details on the computational algorithms. Policy Trees ------------ Generating data ~~~~~~~~~~~~~~~ The code below creates artificial example data for training and prediction. .. code-block:: python import os from mcf.example_data_functions import example_data from mcf.optpolicy_functions import OptimalPolicy from mcf.reporting import McfOptPolReport # Creating data. training_df, prediction_df, name_dict = example_data( obs_y_d_x_iate=1000, obs_x_iate=1000, no_treatments=3) Estimating an optimal policy tree ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Next, we initialize an instance by calling the :py:class:`~optpolicy_functions.OptimalPolicy` class to estimate an optimal policy tree of depth two. .. code-block:: python # Initializing a class instance. myoptp = OptimalPolicy(gen_method='policy tree', var_polscore_name=('y_pot0', 'y_pot1', 'y_pot2'), var_x_name_ord=('x_cont0', 'x_ord0'), pt_depth_tree_1=2, pt_depth_tree_2=0, gen_outpath=os.getcwd() + '/out') Estimating sequentially optimal policy trees ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Alternatively, we can create an instance to estimate sequentially optimal policy trees. This may lead to a reduction in runtime by, for instance, creating two sequentially optimal trees of depth 2+1 instead of a single optimal tree of depth 3. .. code-block:: python # Initializing a class instance. myoptp = OptimalPolicy(gen_method='policy tree', var_polscore_name=('y_pot0', 'y_pot1', 'y_pot2'), var_x_name_ord=('x_cont0', 'x_ord0'), pt_depth_tree_1=2, pt_depth_tree_2=1, gen_outpath=os.getcwd() + '/out') Estimating a constrained optimal policy tree ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The following code block creates a constrained optimal policy tree, considering a constraint with respect to the treatment shares. .. code-block:: python # Initializing a class instance. myoptp = OptimalPolicy(gen_method='policy tree', var_polscore_name=('y_pot0', 'y_pot1', 'y_pot2'), var_x_name_ord=('x_cont0', 'x_ord0'), pt_depth_tree_1=2, pt_depth_tree_2=0, other_max_shares=(0.2, 0.8, 0), gen_outpath=os.getcwd() + '/out') Solve, allocate, and evaluate methods ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ After initializing a class instance, we use it to solve for an optimal allocation rule, to allocate units to treatment states, and to evaluate the allocations with potential outcome data. .. code-block:: python # Solve, allocate, and evaluate methods. alloc_train_df, _, _ = myoptp.solve(training_df, data_title='training') results_eva_train, _ = myoptp.evaluate(alloc_train_df, training_df, data_title='training') alloc_pred_df, _ = myoptp.allocate(prediction_df, data_title='prediction') results_eva_pred, _ = myoptp.evaluate(alloc_pred_df, prediction_df, data_title='prediction') Reporting ~~~~~~~~~ Finally, the code creates a PDF report. Please note that the program saves by default information like summary statistics and leaf information for the policy tree in a folder in the current working directory. .. code-block:: python # Creating a PDF report. my_report = McfOptPolReport( optpol=myoptp, outputfile='Report_OptP_' + 'policy tree') my_report.report() Best Policy Scores ------------------ The following code demonstrates how to obtain a policy rule based on the best-score method. Estimating a policy rule using the best-score method ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python import os from mcf.example_data_functions import example_data from mcf.optpolicy_functions import OptimalPolicy from mcf.reporting import McfOptPolReport .. code-block:: python # Creating data. training_df, prediction_df, name_dict = example_data( obs_y_d_x_iate=1000, obs_x_iate=1000, no_treatments=3) .. code-block:: python # Initializing a class instance. myoptp = OptimalPolicy(gen_method='best_policy_score', var_polscore_name=('y_pot0', 'y_pot1', 'y_pot2'), var_x_name_ord=('x_cont0', 'x_ord0'), gen_outpath=os.getcwd() + '/out') .. code-block:: python # Solve, allocate, and evaluate methods. alloc_train_df, _, _ = myoptp.solve(training_df, data_title='training') results_eva_train, _ = myoptp.evaluate(alloc_train_df, training_df, data_title='training') alloc_pred_df, _ = myoptp.allocate(prediction_df, data_title='prediction') results_eva_pred, _ = myoptp.evaluate(alloc_pred_df, prediction_df, data_title='prediction') .. code-block:: python # Creating a PDF report. my_report = McfOptPolReport( optpol=myoptp, outputfile='Report_OptP_' + 'best_policy_score') my_report.report()