Scenario: Before running cohort analysis on a new dataset, you want to check for issues that might affect results.

| customer_id | month | revenue |
|---|---|---|
| alice@co.com | 2024-01 | 100 |
| Alice@co.com | 2024-02 | 100 |
| bob | 2024-01 | $50 |
| bob | 2024-01 | 50 |
| carol | 2024-02 | 200 |
| carol | | 200 |
| dave | 2024-03 | -75 |

Quality Score: 45/100 (Poor)

Issues Found:

| Issue | Severity | Details |
|---|---|---|
| Duplicate row | High | Rows 4 and 5 are identical once "$50" is read as 50 (bob, 2024-01, 50) |
| Inconsistent IDs | High | "alice@co.com" and "Alice@co.com" — same person? |
| Text in number column | Medium | "$50" should be numeric |
| Missing value | Medium | Row 7 has an empty month |
| Negative value | Low | dave has -75 revenue — intentional? |
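
The checks behind the issue list can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the health scan's actual implementation: the `scan` function and its issue labels are hypothetical, row numbers count the header as row 1, the second carol row is taken to have an empty month (as the issue list states), and no attempt is made to reproduce the 45/100 score.

```python
import re

# Sample rows as raw text: (customer_id, month, revenue).
ROWS = [
    ("alice@co.com", "2024-01", "100"),
    ("Alice@co.com", "2024-02", "100"),
    ("bob", "2024-01", "$50"),
    ("bob", "2024-01", "50"),
    ("carol", "2024-02", "200"),
    ("carol", "", "200"),
    ("dave", "2024-03", "-75"),
]

NUMERIC = re.compile(r"-?\d+(\.\d+)?")


def scan(rows):
    """Hypothetical scan: return (issue, detail) pairs for the five checks."""
    issues = []
    seen = set()       # normalized rows already encountered
    spellings = {}     # lowercased ID -> set of original spellings
    # Sheet rows start at 2 because row 1 is the header.
    for sheet_row, (cid, month, revenue) in enumerate(rows, start=2):
        where = f"row {sheet_row}"
        if not month.strip():
            issues.append(("missing value", where))
        if not NUMERIC.fullmatch(revenue.strip()):
            issues.append(("text in number column", where))
        amount = revenue.strip().lstrip("$")
        if amount.startswith("-"):
            issues.append(("negative value", where))
        # Compare rows after normalizing case and the currency symbol.
        key = (cid.lower(), month, amount)
        if key in seen:
            issues.append(("duplicate row", where))
        seen.add(key)
        spellings.setdefault(cid.lower(), set()).add(cid)
    for variants in spellings.values():
        if len(variants) > 1:
            issues.append(("inconsistent IDs", ", ".join(sorted(variants))))
    return issues


for issue, detail in scan(ROWS):
    print(f"{issue}: {detail}")
```

Duplicates are compared after normalization, which is why the `$50` and `50` rows for bob count as the same record.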
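
Cleaning formulas:

=LOWER(A2) normalizes the casing of the customer ID.

=SUBSTITUTE(C2,"$","") strips the currency symbol; wrap it in VALUE() if the result must be a number rather than text.

After cleaning, run the health scan again. Target: a score of 90+ before proceeding to analysis.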
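
For data that lives outside a spreadsheet, the same two fixes can be approximated in Python. This is a minimal sketch, assuming rows are held as tuples of text; the `clean` helper is hypothetical, and it also drops exact duplicates, which the formulas alone do not do. The empty month and the negative revenue are left untouched because they need a human decision.

```python
# Sample rows as raw text: (customer_id, month, revenue).
ROWS = [
    ("alice@co.com", "2024-01", "100"),
    ("Alice@co.com", "2024-02", "100"),
    ("bob", "2024-01", "$50"),
    ("bob", "2024-01", "50"),
    ("carol", "2024-02", "200"),
    ("carol", "", "200"),
    ("dave", "2024-03", "-75"),
]


def clean(rows):
    """Lowercase IDs, strip "$" from revenue, keep the first copy of each row."""
    cleaned = []
    seen = set()
    for cid, month, revenue in rows:
        # str.lower mirrors LOWER(); str.replace mirrors SUBSTITUTE().
        row = (cid.lower(), month, revenue.replace("$", ""))
        if row not in seen:
            seen.add(row)
            cleaned.append(row)
    return cleaned


for row in clean(ROWS):
    print(row)
```

Revenue stays as text here, matching what SUBSTITUTE returns; convert it with `float()` once the remaining issues are resolved.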