Data Health Scans
Getting Started with Data Health Scans.
Data Health Scan Workflows
Example: Pre-Analysis Validation
Scenario: Before running cohort analysis on a new dataset, you want to check for issues that might affect results.
Your Data (with problems)
| customer_id | month | revenue |
|---|---|---|
| alice@co.com | 2024-01 | 100 |
| Alice@co.com | 2024-02 | 100 |
| bob | 2024-01 | $50 |
| bob | 2024-01 | 50 |
| carol | 2024-02 | 200 |
| carol | | 200 |
| dave | 2024-03 | -75 |
Step-by-Step Setup
- Open Jetti and select your data sheet
- Report type: Data Flows
- Run all checks
Expected Results
Quality Score: 45/100 (Poor)
Issues Found:
| Issue | Severity | Details |
|---|---|---|
| Duplicate row | High | Rows 4 and 5 are the same record (bob, 2024-01, 50) once "$50" is read as 50 |
| Inconsistent IDs | High | "alice@co.com" and "Alice@co.com" - same person? |
| Text in number column | Medium | "$50" should be numeric |
| Missing value | Medium | Row 7 has empty month |
| Negative value | Low | dave has -75 revenue - intentional? |
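For readers who want to see what these five checks amount to, here is a minimal Python sketch that reproduces the findings on the sample data. It is not Jetti's implementation: the `scan` and `to_number` helpers and the `ROWS` list are hypothetical names introduced for illustration, row 7 is entered with an empty month as the Missing value finding describes, and the sketch does not attempt to compute the Quality Score, since the scoring formula is not given here.

```python
# Sample data as (customer_id, month, revenue) strings.
# Sheet row 1 is the header, so the first data row is row 2.
ROWS = [
    ("alice@co.com", "2024-01", "100"),
    ("Alice@co.com", "2024-02", "100"),
    ("bob", "2024-01", "$50"),
    ("bob", "2024-01", "50"),
    ("carol", "2024-02", "200"),
    ("carol", "", "200"),
    ("dave", "2024-03", "-75"),
]


def to_number(text):
    """Return a float if text parses as a number, else None."""
    try:
        return float(text)
    except ValueError:
        return None


def scan(rows, first_row=2):
    """Return (issue, detail) tuples; detail is a sheet row or a list of IDs."""
    issues = []
    seen = set()        # normalized records already encountered
    ids_by_lower = {}   # lowercase ID -> every spelling seen for it

    for offset, (cid, month, revenue) in enumerate(rows):
        row = first_row + offset
        ids_by_lower.setdefault(cid.lower(), set()).add(cid)

        if month.strip() == "":
            issues.append(("Missing value", row))

        number = to_number(revenue)
        if number is None:
            issues.append(("Text in number column", row))
            # Retry without the currency symbol so later checks still run.
            number = to_number(revenue.replace("$", ""))
        if number is not None and number < 0:
            issues.append(("Negative value", row))

        record = (cid, month, number)
        if record in seen:
            issues.append(("Duplicate row", row))
        seen.add(record)

    for spellings in ids_by_lower.values():
        if len(spellings) > 1:
            issues.append(("Inconsistent IDs", sorted(spellings)))
    return issues


if __name__ == "__main__":
    for issue, detail in scan(ROWS):
        print(issue, detail)
```

The duplicate check compares rows after the revenue has been parsed, which is why row 5 is reported as a duplicate of row 4 even though row 4 stores the value as "$50".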
How to Fix
- Duplicates: Remove row 5 (exact duplicate)
- Inconsistent IDs: Standardize to lowercase: =LOWER(A2)
- Text numbers: Remove the $ symbol and convert the result to a number: =VALUE(SUBSTITUTE(C2,"$",""))
- Missing values: Fill in or remove row 7
- Negative values: Verify if intentional (refund?) or data error
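The same fixes can be sketched outside the spreadsheet. The Python below is an illustration only, under stated assumptions: `clean` and `ROWS` are hypothetical names, rows with an empty month are removed rather than filled in, and negative revenue is kept but returned separately for manual review, since only the data owner can say whether it is a refund or an error.

```python
# Sample data as (customer_id, month, revenue) strings, header row omitted.
ROWS = [
    ("alice@co.com", "2024-01", "100"),
    ("Alice@co.com", "2024-02", "100"),
    ("bob", "2024-01", "$50"),
    ("bob", "2024-01", "50"),
    ("carol", "2024-02", "200"),
    ("carol", "", "200"),
    ("dave", "2024-03", "-75"),
]


def clean(rows):
    """Return (cleaned_rows, needs_review) after applying the fixes above."""
    cleaned = []
    needs_review = []
    seen = set()
    for cid, month, revenue in rows:
        if month.strip() == "":
            continue  # assumption: drop rows with a missing month
        record = (
            cid.lower(),                        # standardize IDs to lowercase
            month,
            float(revenue.replace("$", "")),    # strip "$" and convert to a number
        )
        if record in seen:
            continue  # skip duplicates, keeping the first occurrence
        seen.add(record)
        cleaned.append(record)
        if record[2] < 0:
            needs_review.append(record)  # kept, but flagged for a human to verify
    return cleaned, needs_review


if __name__ == "__main__":
    cleaned, needs_review = clean(ROWS)
    for record in cleaned:
        print(record)
    print("Verify manually:", needs_review)
```

On the sample data this leaves five rows, with both alice entries sharing one lowercase ID and the dave row flagged for verification.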
Re-run After Fixes
After cleaning, run the health scan again. Target: 90+ score before proceeding to analysis.