survey-kit-data
survey-kit-data loads public U.S. survey, administrative, and economic data into
Polars LazyFrame and DataFrame objects.
The package is meant to remove the repetitive parts of working with public data: finding the right source files, downloading them with appropriate cache checks, parsing agency-specific file layouts, and returning tables that are ready for analysis.
Project Scope
The package focuses on practical loaders for public datasets whose source files are useful but inconvenient: Excel workbooks with changing layouts, CSV downloads, public APIs, survey microdata files, and agency data organized around fiscal or reporting periods.
The goal is to provide a small, predictable Python API that returns usable Polars tables while preserving enough source context for validation and debugging.
What The Package Does
Each loader generally follows the same pattern:
- Locate or accept source URLs for public files or APIs.
- Download the source data, or reuse an existing cache.
- Parse source-specific layouts into Polars tables.
- Apply light cleanup such as parsing dates, casting numeric columns, and standardizing geography IDs.
- Cache parsed outputs as Parquet files for faster repeat calls.
The package deliberately avoids becoming a full metadata catalog. Loader outputs are documented enough to make the tables usable, while source-file audit columns are optional where they would otherwise clutter normal analysis.
Current Data Sources
| Source | Examples |
|---|---|
| Census | CPS ASEC, ACS5 API helpers |
| Federal Reserve | FRED state panels, SCF |
| BLS | Consumer Expenditure Survey |
| USDA | SNAP national, state, and county/substate history |
| DOL | Weekly UI claims and insured unemployed characteristics |
| HHS | TANF/AFDC caseload workbooks |
See Data Sources for loader-level details.