Session Recording Analysis
Sample failed-conversion replays, reconstruct each one as an event sequence, and have the LLM run a fixed set of 12 behavioral checks to surface friction the aggregate detectors can't see. The daily batch is not a step inside the hourly pipeline: it runs on its own schedule and its own queue.
▸Inputs · analyzes
- Failed-conversion sessions that carry frustration signals (rage / dead clicks, JS errors), sampled from PostHog replays
- A rolling 7-day window (-7d); falls back to older recordings when recent ones are unavailable (e.g. replay billing limits)
- Store context (AOV, conversion rate, device split) for grounding (optional, best-effort)
⚙Process
- Sample up to 15 failed sessions (minimum 5, configurable) and reconstruct each session's event sequence
- LLM runs 12 standardized behavioral checks over the whole batch
- Emit session_recording_friction signals from the failed checks
- Daily batch only: then run diagnosis on all undiagnosed signals and dispatch alerts
✦Outputs · generates
- session_recording_friction signals → posthog_signals
- A per-batch frictionScore (0–100 = % of checks that failed)
- Daily batch only: persisted diagnoses + dispatched alerts
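The sampling rule in the Inputs and Process cards can be sketched roughly as follows. This is a minimal illustration, not the actual `sampleFailedConversionRecordings` implementation: the `Recording` shape, the `sampleFailedSessions` name, the newest-first ordering, and the reading of "minimum 5" as the threshold below which the fallback to older replays kicks in are all assumptions.

```typescript
// Hypothetical record shape; the real replay schema is not shown in this document.
interface Recording {
  sessionId: string;
  startedAt: Date;
  converted: boolean;
  rageClicks: number;
  deadClicks: number;
  jsErrors: number;
}

interface SampleOptions {
  windowDays: number; // rolling window, 7 in the text above
  limit: number; // upper bound on the sample, 15 in the text above
  minimum: number; // assumed: below this many recent sessions, fall back to older ones
}

const DAY_MS = 24 * 60 * 60 * 1000;

// A session qualifies only if it carries at least one frustration signal.
function hasFrustrationSignal(r: Recording): boolean {
  return r.rageClicks > 0 || r.deadClicks > 0 || r.jsErrors > 0;
}

function sampleFailedSessions(
  recordings: Recording[],
  now: Date,
  opts: SampleOptions = { windowDays: 7, limit: 15, minimum: 5 },
): Recording[] {
  const cutoff = now.getTime() - opts.windowDays * DAY_MS;

  // Failed conversions with frustration signals, newest first.
  const candidates = recordings
    .filter((r) => !r.converted && hasFrustrationSignal(r))
    .sort((a, b) => b.startedAt.getTime() - a.startedAt.getTime());

  const recent = candidates.filter((r) => r.startedAt.getTime() >= cutoff);
  if (recent.length >= opts.minimum) {
    return recent.slice(0, opts.limit);
  }

  // Fallback: too few recent replays, so top up with older ones.
  // Candidates are sorted newest first, so recent sessions still come first.
  return candidates.slice(0, opts.limit);
}
```

Taking the newest sessions first is a deterministic stand-in for whatever sampling strategy the real engine uses; a random draw from the window would fit the same interface.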
Two ways recordings get analyzed
The same sampling + LLM engine is invoked from two places on two schedules. They share code
(sampleFailedConversionRecordings → analyzeRecordingBatch) but differ in scope and what happens after.
flowchart TB
subgraph DAILY["Daily batch · own queue, ~2am · store-wide"]
direction LR
DCRON["Cloud Scheduler
/recordings/analyze-daily"] --> DSAMPLE["Sample failed
sessions
(-7d, up to 15)"] --> DLLM["LLM: 12
behavioral
checks"] --> DSIG["friction signals
→ posthog_signals"] --> DDIAG["Diagnose
undiagnosed
signals"] --> DALERT["Dispatch
alerts"]
end
subgraph HOURLY["Inline Stage 3b · inside the hourly pipeline · top product page only"]
direction LR
HTOP["Top product_
performance
signal"] --> HSAMPLE["Sample failed
sessions on
that URL"] --> HLLM["LLM: 12
behavioral
checks"] --> HSIG["friction signals → posthog_signals
cluster with the product signal"]
end
DAILY ~~~ HOURLY
style DAILY fill:#fce4ec,stroke:#E91E63
style HOURLY fill:#fff3e0,stroke:#FF9800
The daily batch flow
Enqueued once per project onto the session-recording-analysis queue; runs independently of the hourly pipeline.
flowchart LR
SAMPLE["sampleFailedConversionRecordings
-7d · limit 15 · min 5
fallback to older replays"]
CTX["queryStoreMetrics
AOV · CVR · device split"]
LLM["analyzeRecordingBatch
12 behavioral checks → frictionScore"]
PERSIST["Persist session_recording_friction
→ posthog_signals"]
DIAG["diagnoseSignals + persistDiagnoses
(all active undiagnosed)"]
ALERT["dispatchAlertsForNewDiagnoses"]
SAMPLE --> LLM
CTX --> LLM
LLM --> PERSIST --> DIAG --> ALERT
style PERSIST fill:#fff3e0,stroke:#FF9800
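The flow in the diagram above can be sketched as a single orchestration function. Only the names `sampleFailedConversionRecordings`, `queryStoreMetrics`, and `analyzeRecordingBatch` come from the diagram; every signature here is an assumption, and `persistSignals` and `diagnoseAndAlert` are hypothetical wrappers (the latter standing in for `diagnoseSignals`, `persistDiagnoses`, and `dispatchAlertsForNewDiagnoses`). The sketch also assumes that "best-effort" and "non-blocking" mean a failure in that step does not fail the batch.

```typescript
// Assumed dependency signatures; injected so the flow can be exercised in isolation.
interface DailyBatchDeps {
  sampleFailedConversionRecordings(projectId: string): Promise<string[]>; // session IDs
  queryStoreMetrics(projectId: string): Promise<Record<string, number>>;
  analyzeRecordingBatch(
    sessionIds: string[],
    storeContext: Record<string, number> | null,
  ): Promise<{ frictionScore: number; signals: string[] }>;
  persistSignals(projectId: string, signals: string[]): Promise<void>; // hypothetical
  diagnoseAndAlert(projectId: string): Promise<void>; // hypothetical wrapper
}

interface DailyBatchResult {
  analyzed: number;
  frictionScore: number | null;
  followUpFailed: boolean;
}

async function runDailyBatch(
  projectId: string,
  deps: DailyBatchDeps,
): Promise<DailyBatchResult> {
  const sessionIds = await deps.sampleFailedConversionRecordings(projectId);
  if (sessionIds.length === 0) {
    return { analyzed: 0, frictionScore: null, followUpFailed: false };
  }

  // Store context is optional: if the metrics query fails, analyze without it.
  let storeContext: Record<string, number> | null = null;
  try {
    storeContext = await deps.queryStoreMetrics(projectId);
  } catch {
    storeContext = null;
  }

  const analysis = await deps.analyzeRecordingBatch(sessionIds, storeContext);
  await deps.persistSignals(projectId, analysis.signals);

  // Diagnosis and alerting run after persistence; a failure here is recorded
  // but does not undo or fail the already-persisted friction signals.
  let followUpFailed = false;
  try {
    await deps.diagnoseAndAlert(projectId);
  } catch {
    followUpFailed = true;
  }

  return {
    analyzed: sessionIds.length,
    frictionScore: analysis.frictionScore,
    followUpFailed,
  };
}
```

Passing the dependencies in as an object is a choice made for the sketch so that each step can be stubbed; the real job may call these functions directly.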
job_type: daily_batch) and the diagnosis / alert steps are non-blocking.
The 12 behavioral checks
Each check returns a pass flag, a quantified pattern (e.g. “5/12 sessions…”), specific session IDs +
timestamps, replay URLs, and recommendations. frictionScore is the percentage of checks that failed (0–100).
flowchart TB
subgraph INTERACTION["Interaction"]
C1["1 · navigationFriction"]
C2["2 · formInteractionClarity"]
C3["3 · ctaResponsiveness"]
C4["4 · variantSelectionFlow"]
C5["5 · checkoutProgression"]
end
subgraph TECHNICAL["Technical (co-observation only)"]
C6["6 · performanceImpact"]
C7["7 · errorImpact"]
end
subgraph CONTEXT["Context & intent"]
C8["8 · mobileSpecificIssues"]
C9["9 · trustBarriers"]
C10["10 · decisionParalysis"]
C11["11 · loadingFeedback"]
C12["12 · exitTrigger"]
end
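The check result shape and the score described above can be sketched as follows. The check names are taken from the diagram; the `CheckResult` field names, the `computeFrictionScore` helper, rounding to the nearest integer, and returning 0 for an empty batch are assumptions made for illustration.

```typescript
// The 12 check names, in the order shown in the diagram.
const CHECK_NAMES = [
  "navigationFriction",
  "formInteractionClarity",
  "ctaResponsiveness",
  "variantSelectionFlow",
  "checkoutProgression",
  "performanceImpact",
  "errorImpact",
  "mobileSpecificIssues",
  "trustBarriers",
  "decisionParalysis",
  "loadingFeedback",
  "exitTrigger",
] as const;

type CheckName = (typeof CHECK_NAMES)[number];

// Hypothetical field names for what each check returns.
interface CheckResult {
  check: CheckName;
  pass: boolean;
  pattern: string; // quantified pattern, e.g. "5/12 sessions…"
  evidence: { sessionId: string; timestamp: string; replayUrl: string }[];
  recommendations: string[];
}

// frictionScore = percentage of checks that failed, on a 0-100 scale.
function computeFrictionScore(results: Pick<CheckResult, "pass">[]): number {
  if (results.length === 0) return 0; // assumed: no checks means no measured friction
  const failed = results.filter((r) => !r.pass).length;
  return Math.round((failed / results.length) * 100);
}
```

For example, a batch in which 3 of the 12 checks fail scores 25 under this definition.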
Why it works this way
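The sample is drawn from failures only, so there is no converting baseline to compare against: anything present
in nearly every failed session (classically a high JS-error count) may be just as common in successful sessions,
and is consistent with population-wide noise, not causation. The technical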
checks (performanceImpact, errorImpact) are therefore treated as co-observations: the LLM records
what it sees but is barred from naming a technical artifact as the primary cause — or emitting a recommendation
from it — unless a corresponding measured signal exists in the same cluster. This keeps the most open-ended LLM
step in the system from inventing root causes the aggregate detectors never corroborated.
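The co-observation rule can be sketched as a post-processing gate over the failed checks. This is an illustration under stated assumptions, not the system's actual enforcement mechanism (which may live in the prompt, in code, or both): the `Finding` and `MeasuredSignal` shapes, the `role` labels, and the signal kinds in `CORROBORATING_KINDS` are all hypothetical.

```typescript
const TECHNICAL_CHECKS = new Set(["performanceImpact", "errorImpact"]);

// Hypothetical shapes for an LLM finding and an aggregate-detector signal.
interface Finding {
  check: string;
  pass: boolean;
  recommendations: string[];
}

interface MeasuredSignal {
  clusterId: string;
  kind: string;
}

// Hypothetical mapping: which measured signal kinds corroborate each technical check.
const CORROBORATING_KINDS: Record<string, string[]> = {
  performanceImpact: ["performance_regression"],
  errorImpact: ["js_error_spike"],
};

interface GatedFinding extends Finding {
  role: "primary_candidate" | "co_observation";
}

function gateTechnicalFindings(
  findings: Finding[],
  clusterId: string,
  measured: MeasuredSignal[],
): GatedFinding[] {
  return findings
    .filter((f) => !f.pass)
    .map((f): GatedFinding => {
      // Non-technical checks are never demoted by this gate.
      if (!TECHNICAL_CHECKS.has(f.check)) {
        return { ...f, role: "primary_candidate" };
      }
      const kinds = CORROBORATING_KINDS[f.check] ?? [];
      const corroborated = measured.some(
        (s) => s.clusterId === clusterId && kinds.includes(s.kind),
      );
      // Uncorroborated technical findings are kept as observations only:
      // they cannot be the primary cause and carry no recommendations.
      return corroborated
        ? { ...f, role: "primary_candidate" }
        : { ...f, role: "co_observation", recommendations: [] };
    });
}
```

Keeping the demoted finding in the output, rather than dropping it, matches the text above: the LLM still records what it saw, but the observation cannot drive a diagnosis on its own.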