Synthetic producer panels and loss attribution • FCIP Calibrated and Synthetic Data Catalogue

Introduction

Two questions recur across FCIP research programs: which peril caused an indemnity, and where within a county is the insured exposure located. Neither is answered directly by public records. Indemnities are reported at the insured-unit level, whereas causes of loss are reported only for pools of units. Location is reported no finer than the county, although index-insurance analysis requires placement on RMA’s rainfall-index (RI) grids (Tsiboe, Tack, & Yu, 2023). This article documents the collection that resolves both: a record-linkage procedure attributing unit-level indemnities to perils, and synthetic pseudo-producer panels locatable on RI grids under two alternative treatments of placement uncertainty. The assets are project-agnostic inputs for peril-specific product design, index insurance, loss attribution, and producer-level policy simulation.

Loss attribution as constrained record linkage

Let u index insured units within a pool j, where a pool is defined by the identifier fields common to the experience and cause-of-loss files, and let c index cause-of-loss records with indemnities I_{jc}. The linkage assigns records to units through a cascade of increasingly coarse identification steps, each logically implied by pool accounting rather than assumed:

Exact financial identification. A unit and a record match one-to-one on rounded liability, premium, subsidy, and indemnity; ties on either side are excluded so no record can be assigned twice.
Sole claimant. If a pool contains exactly one indemnified unit, all of the pool’s records belong to it by adding-up.
Sole peril. If a pool contains exactly one record, that record explains every claimant in the pool.
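
The cascade can be illustrated on toy data. The sketch below is a simplified illustration, not the catalogue's implementation: the column names (pool, unit, cause, indemnity), the status labels, and the helper resolve_pool() are hypothetical, and the exact-financial step is reduced to matching on rounded indemnity alone rather than on liability, premium, subsidy, and indemnity jointly.

```r
# Toy unit-level experience: one row per insured unit (hypothetical columns).
units <- data.frame(
  pool      = c("A", "A", "B", "B", "C", "C", "D", "D"),
  unit      = c("A1", "A2", "B1", "B2", "C1", "C2", "D1", "D2"),
  indemnity = c(100, 0, 40, 60, 30, 30, 20, 80),
  stringsAsFactors = FALSE
)
# Toy cause-of-loss records: one row per pool-by-cause record.
records <- data.frame(
  pool      = c("A", "A", "B", "C", "C", "D", "D"),
  cause     = c("drought", "hail", "drought", "drought", "heat", "hail", "drought"),
  indemnity = c(70, 30, 100, 45, 15, 20, 80),
  stringsAsFactors = FALSE
)

# Assign a resolution status to every indemnified unit in one pool.
resolve_pool <- function(u, r) {
  claim  <- u[u$indemnity > 0, , drop = FALSE]
  status <- rep("unresolved", nrow(claim))
  # Step 1 (simplified): one-to-one match on rounded indemnity;
  # values tied on either side are excluded.
  ku <- round(claim$indemnity)
  kr <- round(r$indemnity)
  uniq_u <- !(ku %in% ku[duplicated(ku)])
  uniq_r <- kr[!(kr %in% kr[duplicated(kr)])]
  status[uniq_u & ku %in% uniq_r] <- "exact"
  # Step 2: a single claimant owns every record in the pool.
  if (nrow(claim) == 1) status[status == "unresolved"] <- "sole_claimant"
  # Step 3: a single record explains every claimant in the pool.
  if (nrow(r) == 1) status[status == "unresolved"] <- "sole_peril"
  data.frame(unit = claim$unit, status = status, stringsAsFactors = FALSE)
}

status <- do.call(rbind, lapply(unique(units$pool), function(p) {
  resolve_pool(units[units$pool == p, ], records[records$pool == p, ])
}))
status
```

In this toy input, pool A is resolved by the sole-claimant rule, pool B by the sole-peril rule, pool D by exact matching, and pool C stays unresolved because its two claimants tie on indemnity.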

Units resolved by the cascade receive attributed cause shares

s_{uc} = \frac{I_{uc}}{\sum_{c'} I_{uc'}},

where I_{uc} is the indemnity of the records matched to unit u under cause c; shares sum to one within a unit. Units with an indemnity that the cascade cannot resolve (those in ambiguous pools) receive the pool’s residual shares, that is, the distribution of indemnity over causes among the pool’s unmatched records. This proration is unbiased under exchangeability of unresolved claimants within a pool. Every unit carries a resolution-status code so analysts can report attribution quality and test robustness to the prorated subset.
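
As a rough illustration of the share formula and the residual proration, the following sketch computes s_{uc} for matched records and assigns pool-level residual shares to unresolved claimants. All data values and column names are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Records already matched to units by the cascade (hypothetical toy data).
matched <- data.frame(
  unit      = c("A1", "A1", "B1"),
  cause     = c("drought", "hail", "drought"),
  indemnity = c(70, 30, 40),
  stringsAsFactors = FALSE
)
# s_uc = I_uc / sum over c' of I_uc'
matched$share <- matched$indemnity /
  ave(matched$indemnity, matched$unit, FUN = sum)

# One ambiguous pool: its unmatched records and its unresolved claimants.
unmatched <- data.frame(
  cause     = c("drought", "heat"),
  indemnity = c(45, 15),
  stringsAsFactors = FALSE
)
unresolved <- c("C1", "C2")

# Residual shares: indemnity distribution over causes among unmatched records.
residual <- tapply(unmatched$indemnity, unmatched$cause, sum)
residual <- residual / sum(residual)

# Every unresolved claimant in the pool receives the same residual shares.
prorated <- expand.grid(unit = unresolved, cause = names(residual),
                        stringsAsFactors = FALSE)
prorated$share <- as.numeric(residual[prorated$cause])
```

Here unit A1 receives shares of 0.7 (drought) and 0.3 (hail), while C1 and C2 each receive the pool's residual split of 0.75 (drought) and 0.25 (heat).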

A specialization of the same cascade identifies prevented-planting indemnities, restricting attention to pools covered by the prevented-planting provision and containing at least one prevented stage, and recovering the 5- and 10-percent buy-up elections from stage codes. The resulting verdicts and shares support research on the provision’s utilization and design (Tsiboe & Turner, 2025).

Pseudo-producer panels

Attributed unit experience is joined to the calibrated yields (see Calibrating sub-county yields from insurance transactions) and projected prices, pooled into pseudo-producers on the pool-by-contract dimensions, and the revealed coverage election is recovered from the liability ratio:

\hat{\theta}_{i} = \min\!\left(0.85,\; \max\!\left(0.50,\; 0.05 \left\lfloor \frac{L_{i}/L^{pot}_{i}}{0.05} \right\rfloor \right)\right),

where L_{i} is realized liability, L^{pot}_{i} = L_{i}/\theta_{i} is the potential (full-coverage) liability accumulated before pooling, and the floor-and-bound operators bin the ratio to the legislated 5-percentage-point coverage grid between 50 and 85 percent.
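
A minimal sketch of the binning rule follows. The function name revealed_coverage() and the small tolerance eps are assumptions added for illustration: the tolerance is not part of the formula above, but without it floating-point division can place a ratio of exactly 0.70 just below the bin edge (0.7 / 0.05 evaluates slightly under 14).

```r
# Hypothetical helper: bin the liability ratio to the 5-point coverage grid.
# eps is an illustrative guard against floating-point error at bin edges.
revealed_coverage <- function(liability, potential_liability, eps = 1e-9) {
  ratio  <- liability / potential_liability
  binned <- 0.05 * floor(ratio / 0.05 + eps)
  # Bound to [0.50, 0.85] and round away representation noise.
  round(pmin(0.85, pmax(0.50, binned)), 2)
}

theta_hat <- revealed_coverage(
  liability           = c(70, 73, 40, 95),
  potential_liability = c(100, 100, 100, 100)
)
theta_hat
# expected on this toy input: 0.70 0.70 0.50 0.85
```

The second value shows the floor at work (0.73 falls to 0.70), and the last two show the lower and upper bounds.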

Locating producers on index grids

The economic stakes of placement are established by the farm-level evidence on index insurance. Evaluating a broad range of weather- and area-based products against more than 190,000 dryland row-crop yield observations spanning over 7,000 Kansas farms from 1973 to 2018, Tsiboe, Tack, and Yu (2023) find substantial basis risk across agroclimatic indices, limited ability to reduce income variability under fair pricing, and general underperformance relative to area-based yield products, with growth-stage-specific heat indices for corn and soybeans the notable exception. Measuring that basis risk for FCIP participants requires knowing where within a county insured exposure sits relative to the index grid, which the public records do not reveal.

County-to-grid placement is therefore inherently uncertain: the public records identify the county, while RI contracts are written on sub-county grid cells. Placement probabilities \pi_{ig} (the probability that producer i’s exposure lies in grid g) are constructed from the USDA National Agricultural Statistics Service Cropland Data Layer (CDL), a 30-meter satellite-derived classification of U.S. cropland (Boryan et al., 2011). A multi-year CDL stack is masked to each county, pixel classifications are mapped to RMA commodity definitions, and per-pixel crop-cover weights are accumulated on a fine climate-data grid co-registered with the CDL imagery. The gridded weights are then mapped onto the official RMA rainfall-index grid and normalized within each county and commodity combination so that \sum_{g} \pi_{ig} = 1. The probabilities thus measure where a commodity’s cultivated area actually lies within a county, rather than assuming exposure is spread uniformly across it. Two panel constructions treat the remaining uncertainty in complementary ways:
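
The final normalization step can be sketched as below, assuming the crop-cover weights have already been accumulated onto RI grids. The county codes, grid labels, weights, and column names are toy values, not the layout of the released files.

```r
# Toy crop-cover weights by county, commodity, and RI grid (hypothetical).
weights <- data.frame(
  county    = c("20001", "20001", "20001", "20003", "20003"),
  commodity = c("corn", "corn", "corn", "corn", "corn"),
  grid      = c("g1", "g2", "g3", "g2", "g4"),
  weight    = c(120, 60, 20, 10, 30),
  stringsAsFactors = FALSE
)

# Normalize within each county-commodity combination so probabilities sum to one.
weights$prob <- weights$weight /
  ave(weights$weight, weights$county, weights$commodity, FUN = sum)

# Check the adding-up constraint for every county-commodity combination.
check <- aggregate(prob ~ county + commodity, data = weights, FUN = sum)
```

In the first toy county, grid g1 holds 120 of 200 weight units and so receives probability 0.6, compared with 1/3 under a uniform-spread assumption.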

Probability-weighted apportionment. Each producer is divided fractionally across its candidate grids; any exposure quantity x_{i} contributes

x_{g} = \sum_{i} \pi_{ig}\, x_{i}

to grid g. Aggregate exposure is conserved exactly and within-county placement risk is integrated out, the appropriate benchmark when placement noise is a nuisance parameter.
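
Conservation follows because \sum_{g} x_{g} = \sum_{i} x_{i} \sum_{g} \pi_{ig} = \sum_{i} x_{i}. A small sketch with toy probabilities and exposures (all values hypothetical) shows the apportionment as a matrix product:

```r
# pi_mat[i, g]: placement probabilities, rows = producers, columns = grids.
pi_mat <- rbind(p1 = c(g1 = 0.6, g2 = 0.3, g3 = 0.1),
                p2 = c(g1 = 0.0, g2 = 0.5, g3 = 0.5))

# Exposure quantity per producer, e.g. liability (toy values).
x <- c(p1 = 1000, p2 = 400)

# x_g = sum_i pi_ig * x_i, computed as t(pi_mat) %*% x.
x_g <- drop(crossprod(pi_mat, x))
x_g

# Aggregate exposure is conserved because each row of pi_mat sums to one.
isTRUE(all.equal(sum(x_g), sum(x)))
```

On these inputs the grid totals are 600, 500, and 300, which add up to the 1,400 of producer-level exposure.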

Sampled whole-grid assignment. For research in which placement risk is the object of study (e.g., basis risk in index insurance), each producer is placed entirely in one grid per replication,

G^{(r)}_{i} \sim \text{Categorical}\!\left(\pi_{i1}, \dots, \pi_{iG}\right),

independently across producers, with 1,000 published replications. The random-number contract is explicit: replication r is drawn under seed s_0 + r for a fixed master seed s_0, so any single replication is reproducible in isolation and the replication set can be extended without disturbing existing draws. An integrity key accompanies the assignments so that any change in the underlying producer set invalidates, rather than silently contaminates, previously drawn replications.
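
The seed contract can be sketched as follows. This is an illustration of the s_0 + r convention only: the master seed value, the function name draw_replication(), and the toy probabilities are assumptions, and the sketch is not expected to reproduce the published draws, which also depend on the generator settings used to create them.

```r
# Toy placement probabilities: rows are producers, columns are grids.
pi_mat <- rbind(p1 = c(g1 = 0.6, g2 = 0.3, g3 = 0.1),
                p2 = c(g1 = 0.0, g2 = 0.5, g3 = 0.5))

# s0 is a hypothetical master seed; the published value is not stated here.
draw_replication <- function(pi_mat, r, s0 = 12345L) {
  set.seed(s0 + r)  # replication r is always drawn under seed s0 + r
  grids <- colnames(pi_mat)
  g <- vapply(seq_len(nrow(pi_mat)), function(i) {
    # One categorical draw per producer: the whole producer lands in one grid.
    sample(grids, size = 1, prob = pi_mat[i, ])
  }, character(1))
  names(g) <- rownames(pi_mat)
  g
}

# Replication 5 is the same whether drawn alone or after other replications.
rep5_first <- draw_replication(pi_mat, r = 5)
invisible(lapply(1:20, function(r) draw_replication(pi_mat, r)))
rep5_again <- draw_replication(pi_mat, r = 5)
identical(rep5_first, rep5_again)  # TRUE
```

Because each replication resets the generator to its own seed, adding replications beyond the existing set leaves earlier draws untouched.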

Limitations

Pseudo-producers are aggregates and inherit the representative-producer interpretation; the attribution is exact only for cascade-resolved units, and prorated shares are pool-level approximations whose error is bounded by pool heterogeneity. Placement probabilities reflect crop-cover intensity, not verified field locations, so grid-level results should be reported with across-replication dispersion. These are the standard caveats of ecological inference, made explicit here by the resolution codes and the replication design.
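
One way to report across-replication dispersion is to compute the grid-level statistic within each replication and then summarize over replications. The sketch below simulates its own assignments from toy probabilities instead of reading the published files, so every value and name in it is hypothetical.

```r
set.seed(1)  # seed for this toy simulation only
grids  <- c("g1", "g2", "g3")
x      <- c(1000, 400, 250)  # exposure per producer (toy values)
pi_mat <- rbind(c(0.6, 0.3, 0.1),
                c(0.0, 0.5, 0.5),
                c(0.2, 0.2, 0.6))
n_rep  <- 200

# totals[g, r]: total exposure placed in grid g under replication r.
totals <- vapply(seq_len(n_rep), function(r) {
  g <- vapply(seq_along(x), function(i) {
    sample(grids, size = 1, prob = pi_mat[i, ])
  }, character(1))
  vapply(grids, function(k) sum(x[g == k]), numeric(1))
}, numeric(length(grids)))

# Report the mean together with the across-replication standard deviation.
summary_g <- data.frame(
  grid = grids,
  mean = rowMeans(totals),
  sd   = apply(totals, 1, sd)
)
summary_g
```

The same pattern applies to any grid-level outcome: compute it once per replication, then report its spread alongside its mean.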

Data availability

The assets are distributed in a single public release (collection synthetic_fcip): unit-level prevented-planting identification (pp_units.rds), full-taxonomy cause shares (unit_cause_shares.rds), placement probabilities (crop_ri_candidates.rds), the apportioned panel (synthetic_data.rds), the grid-free panel (agent_panel.rds), and the sampled assignments. The sampled assignments are bundled 100 replications per archive (agent_grid_assignments_<lo>_<hi>.zip, e.g. agent_grid_assignments_0001_0100.zip), and the integrity key is included in every archive so that any single archive is self-contained; download only the replication range an analysis requires. The underlying county-level crop-cover weights from which the placement probabilities are built are released separately (countySpatialWeights.rds, repository ftsiboe/rAgroClimate, tag county_spatial_weights) for users who need weights on the soil or climate grids rather than the RI grid. The following call downloads the grid-free panel and the first replication archive from the synthetic_fcip release, then merges replication 1 onto the panel:

# Download the grid-free panel and the first archive of sampled assignments
piggyback::pb_download(
  file = c("agent_panel.rds", "agent_grid_assignments_0001_0100.zip"),
  dest = tempdir(),
  repo = "ftsiboe/USFarmSafetyNetLab", tag = "synthetic_fcip")

# Extract the replication files from the archive
utils::unzip(file.path(tempdir(), "agent_grid_assignments_0001_0100.zip"),
             exdir = tempdir())

# Load the panel and replication 1, then attach the grid assignment by agent_uid
agents <- readRDS(file.path(tempdir(), "agent_panel.rds"))
rep1   <- readRDS(file.path(tempdir(), "agent_grid_assignments", "rep_001.rds"))
agents_r1 <- merge(agents, rep1, by = "agent_uid")

Recommended citation

Tsiboe, F. (2026). Synthetic producer panels and loss attribution. In FCIP calibrated and synthetic data catalogue. https://ftsiboe.github.io/rfcipCalibrate/articles/synthetic-fcip.html

Data users should additionally cite Tsiboe, Turner, and Yu (2025).

Disclaimer

This product uses data provided by USDA/RMA but is neither endorsed by nor affiliated with USDA or the U.S. Government.

References

Boryan, C., Yang, Z., Mueller, R., & Craig, M. (2011). Monitoring US agriculture: The US Department of Agriculture, National Agricultural Statistics Service, Cropland Data Layer program. Geocarto International, 26(5), 341–358. https://doi.org/10.1080/10106049.2011.562309

Miranda, M. J. (1991). Area-yield crop insurance reconsidered. American Journal of Agricultural Economics, 73(2), 233–242. https://doi.org/10.2307/1242708

Tsiboe, F., Tack, J., & Yu, J. (2023). Farm-level evaluation of area- and agroclimatic-based index insurance. Journal of the Agricultural and Applied Economics Association, 2(4), 616–633. https://doi.org/10.1002/jaa2.77

Tsiboe, F., & Turner, D. (2025). Incorporating buy-up price loss coverage into the United States farm safety net. Applied Economic Perspectives and Policy, 47. https://doi.org/10.1002/aepp.13536

Tsiboe, F., Turner, D., & Yu, J. (2025). Utilizing large-scale insurance data sets to calibrate sub-county level crop yields. Journal of Risk and Insurance, 92(1), 139–165. https://doi.org/10.1111/jori.12494

U.S. Department of Agriculture, Risk Management Agency (USDA-RMA). (n.d.). Cause of loss, Summary of Business, and rainfall index program information. https://www.rma.usda.gov