The problem
A typical 30-minute run produces ~5,000 quantity samples across HR, distance, energy, step count, running power, stride length, ground contact, vertical oscillation, and speed streams. Encoded as one dict per sample, each carryingdevice, sourceRevision, uuid, startDate, endDate, value, unit, and metadata:
device and sourceRevision repeated 5,000 times.
The columnar shape
Each per-stream block becomes:dur— included only if any sample in the stream has non-zero duration. HR samples are typically instant; step-count samples are typically 3-second windows.src— array of source-index integers, one per sample. Omitted entirely if every sample uses source0(the common case for single-device users). When present, references entries in the top-levelsourcesarray.metadata— a sparse object keyed by sample-index-as-string, only present for samples that have metadata. Most samples have none.
What’s gone, and what’s preserved
| Information | Status |
|---|---|
Per-sample value (the actual number) | Preserved verbatim |
Per-sample startDate | Preserved as integer seconds offset from workout startDate |
Per-sample endDate | Preserved when distinct from startDate, as dur |
Per-sample unit | Hoisted once per stream (always the same per stream anyway) |
Per-sample device / sourceRevision | Hoisted to top-level sources, referenced by src |
Per-sample uuid | Dropped. UUIDs are HealthKit-internal handles; no consumer of an export uses them. |
Per-sample metadata | Preserved via sparse map when present, omitted otherwise |
Why integer-second offsets
Apple’s quantity samples are second-aligned in practice — the Apple Watch writes samples at second boundaries. Storing them as offsets from the workout’sstartDate:
- Drops ~22 bytes per timestamp (
"2026-05-09T16:11:55Z") to typically 2-4 bytes (5,120,1800). - Makes the data trivially plottable:
for (i, t) in enumerate(stream.t): print(workout_start + t, stream.v[i]). - Negative offsets are allowed (a sample whose
startDateis just before the workout window — rare, but valid).
Example: a real 33-min run
| Encoding | File size |
|---|---|
| Original per-sample-dict JSON | 2.58 MB |
| After provenance hoist (sources/src) | 760 KB |
| After columnar (this format) | 94 KB |
sampleEncoding: "columnar-v1" is the discriminator. Future versions of the encoding will bump the suffix; consumers should check it before parsing.
Per-stream rules summary
For each entry insamples.*:
unitis always present and constant for the streamtis required (integer seconds, may be empty if no samples)vis required (same length ast)duris included iff any sample has non-zero duration; same length astsrcis included iff any sample’s source differs from0; same length astmetadatais included iff any sample has metadata; keyed by string-form sample index
samples dict entirely.