
Calibration

What Is It

Calibration (often implemented via raking) adjusts the weights in a sample so that the sample is representative of some other population of interest. That might be a survey that's meant to represent the population of a place, or a treatment sample that should look like some larger group or like a control group.

The basic idea is to define a set of "moments" that you want your sample to match (population controls, the share of individuals with certain education levels, the share with income in certain bins, etc.) and then reweight your sample so that it matches each of those moments. See the Accelerated Entropy Balancing repository for more details on how it works.
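As a toy illustration of the idea (plain numpy, not the package API; the 25%/40% numbers are made up): if degree-holders are underrepresented in the sample relative to the target population, rescaling each group's weights makes the weighted share hit the target. Calibration generalizes this to many moments at once.

import numpy as np

#   Toy example: 25% of the sample has a degree, but the target share is 40%
rng = np.random.default_rng(0)
has_degree = rng.binomial(1, 0.25, size=1_000)
target_share = 0.40

#   Start from equal base weights and rescale each group so the
#   weighted share of degree-holders hits the target
w = np.ones(has_degree.shape[0])
current_share = w[has_degree == 1].sum() / w.sum()
w[has_degree == 1] *= target_share / current_share
w[has_degree == 0] *= (1 - target_share) / (1 - current_share)

print(w[has_degree == 1].sum() / w.sum())  # -> 0.40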

Why Use It

Calibration can help:

  • Address frame bias - A group is overrepresented in the sampling frame, such as when we accidentally oversample a group with a certain characteristic (e.g., high-income households)
  • Address nonresponse bias - Different groups respond at different rates, and the results would be biased without accounting for that, as in this example (or almost any critique of polling)
  • Increase precision - Use auxiliary information to reduce variance; see Little and Vartivarian (2005)
  • Support causal estimates when comparing treatment and control groups - Reweight one group (treatment) to match the characteristics of another group (control); see Hainmueller (2012)

Implementation

This package uses Carl Sanders's Accelerated Entropy Balancing package to implement calibration via entropy balancing. This implementation is faster and more robust than other available tools (at least in my anecdotal experience): it converges reliably, and it can produce "best possible" weights when exact convergence isn't achievable.
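For reference, the underlying optimization (this is the standard entropy balancing setup from Hainmueller (2012); solver details in the accelerated package may differ) keeps the new weights as close as possible to the base weights in an entropy sense while hitting the moment targets exactly:

\min_{w}\ \sum_{i} w_i \log\!\left(\frac{w_i}{q_i}\right)
\quad \text{subject to} \quad
\sum_{i} w_i\, c_r(x_i) = m_r \quad (r = 1,\dots,R),
\qquad \sum_{i} w_i = 1,\ w_i > 0

where q_i are the base weights, c_r(x_i) are the moment functions built from the formula, and m_r are the target moments.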

Key advantages:

  1. Handles large datasets efficiently
  2. Converges robustly even with challenging constraints
  3. Supports bounded weights for practical applications where exact convergence isn't possible (e.g., slightly conflicting constraints)

Advantages 2 and 3 can be incredibly important in practice. Many surveys weight to state x race x age x gender cells (or something like that). With 51 states (50 states + DC), 3 race/ethnicity groups, 5 age bins, and 2 genders, that's 51 x 3 x 5 x 2 = 1,530 moment constraints. If you start adding other dimensions (race x income bins, race x county, or income x county), the number of moments grows even larger.
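The back-of-the-envelope count (not package code; the extra income-bin margin is hypothetical):

#   Counting moment constraints for full state x race x age x gender cells
states = 51        # 50 states + DC
race_groups = 3
age_bins = 5
genders = 2
print(states * race_groups * age_bins * genders)            # 1530

#   Add a hypothetical race x income-bin margin (3 x 10 cells) and it keeps growing
income_bins = 10
print(states * race_groups * age_bins * genders
      + race_groups * income_bins)                          # 1560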

Plus, if you're doing replicate weights, you have to repeat this many times (for example, 160 times in the CPS ASEC). We've found other tools to be impractically slow at that scale and/or to have convergence failures (i.e., the solver just doesn't finish and you don't get a weight at the end).
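For replicate weights, the natural pattern is to loop the same calibration over each replicate column. A minimal sketch reusing the Moment/Calibration API from the tutorial below; the data frame name (df_sample) and the replicate column names (repwt_0 ... repwt_159, repwt_final_*) are hypothetical:

from survey_kit.calibration.calibration import Calibration

#   Assumes df_sample (the data to reweight), a Moment object m, and
#   replicate base weights repwt_0 ... repwt_159 already exist
for r in range(160):
    c_rep = Calibration(
        df=df_sample,
        moments=m,
        weight=f"repwt_{r}",
        final_weight=f"repwt_final_{r}",
    )
    c_rep.run(min_obs=5, bounds=(0.001, 1000))
    df_sample = c_rep.get_final_weights(df_sample)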

API

See the full Calibration API documentation

Example/Tutorial

from pathlib import Path
from survey_kit.utilities.random import RandomData
from survey_kit.utilities.formula_builder import FormulaBuilder
from survey_kit.calibration.moment import Moment
from survey_kit.calibration.calibration import Calibration
from survey_kit.utilities.dataframe import summary
import narwhals as nw
from survey_kit import logger

# %%
logger.info("Generating data for weighting")
n_rows = 100_000
df_population = (
    RandomData(n_rows=n_rows, seed=12332151)
    .index("index")
    .integer("v_1", 1, 10)
    .np_distribution("v_f_continuous_0", "normal", loc=10, scale=2)
    .np_distribution("v_f_continuous_1", "normal", loc=10, scale=2)
    .np_distribution("v_f_continuous_2", "normal", loc=10, scale=2)
    .float("v_extra", -1, 2)
    .np_distribution("weight_0", "normal", loc=10, scale=1)
    .np_distribution("weight_1", "normal", loc=10, scale=1)
    .integer("year", 2016, 2021)
    .integer("month", 1, 12)
    .to_df()
    .lazy()
)

df_treatment = (
    RandomData(n_rows=n_rows, seed=894654)
    .index("index")
    .integer("v_1", 1, 10)
    #   Intentionally set the loc/scale different from the population above
    .np_distribution("v_f_continuous_0", "normal", loc=11, scale=4)
    .np_distribution("v_f_continuous_1", "normal", loc=11, scale=4)
    .np_distribution("v_f_continuous_2", "normal", loc=11, scale=4)
    .float("v_extra", -1, 2)
    .np_distribution("weight_0", "normal", loc=10, scale=1)
    .np_distribution("weight_1", "normal", loc=10, scale=1)
    .integer("year", 2016, 2021)
    .integer("month", 1, 12)
    .to_df()
    .lazy()
)


# %%
logger.info("Weighting 'function'")
f = FormulaBuilder(df=df_population, constant=False)
f.continuous(columns=["v_1", "v_f_continuous_*", "v_f_p2_*"])
#   f.simple_interaction(columns=["v_1","v_f_continuous_0"])

logger.info("Define the target moments that the weighting will match")
logger.info("   This can be a dataset or a single row of pop controls")
m = Moment(
    df=df_population,
    formula=f.formula,
    weight="weight_0",
    index="index",
    #    by=["year"],
    rescale=True,
)

logger.info("You can save/reload moments if you want")
# m.save("/my/path/moment")
# m_loaded = Moment.load("/my/path/moment")

# %%
#   Calibrate the data in df_treatment to the moment above
c = Calibration(
    df=df_treatment,
    moments=m,
    weight="weight_1",
    final_weight="weight_final"
)

c.run(
    #   Drop a moment if there are too few observations
    min_obs=5,
    #   If exact convergence isn't achievable, set bounds on the weights:
    #   final weight = base weight * ratio, where the bounds apply to the
    #   ratio, giving "best possible" weights
    bounds=(0.001, 1000)
)

#   Merge the final weights back on the treatment data
df_treatment = c.get_final_weights(df_treatment)

# %%
logger.info("'Population' estimates")
_ = summary(df_population, weight="weight_0")

# %%
logger.info("\n\n'Treatment', original weights")
_ = summary(df_treatment, weight="weight_1")

# %%
logger.info("\n\n'Treatment', calibrated")
_ = summary(df_treatment, weight="weight_final")