Guided story
What India's AQI Can See, And What It Can't
AQI is a score from a monitored place, not a map of the air everyone breathes. India's public raw archive is real and deeper than the dashboard suggests, but the inspectable PM2.5 history is late, uneven and city-heavy.
AQI is a score, not the air
India's AQI is the number most people see first. CPCB's official National AQI material describes it as a score built from pollutant sub-indices. In plain English, each pollutant gets converted into its own index score, then the highest available sub-index becomes the overall AQI. The pollutant behind that highest score is reported as the dominant pollutant: the one whose sub-index drives the overall AQI at that place and time. The same AQI number can therefore be driven by different pollutants in different cities or seasons. CPCB's public categories run from Good and Satisfactory to Moderate, Poor, Very Poor and Severe.
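The "highest available sub-index wins" rule can be sketched in a few lines of Python. This is an illustration, not CPCB's implementation: the function name and the sample sub-index values are hypothetical, and the category bands are the commonly published CPCB ranges (0-50 Good through 401-500 Severe). It assumes sub-indices have already been computed and only shows the final selection step.

```python
# Illustrative sketch only: the sub-index values below are made up.
# Upper bound of each category band, in ascending order.
CATEGORIES = [
    (50, "Good"),
    (100, "Satisfactory"),
    (200, "Moderate"),
    (300, "Poor"),
    (400, "Very Poor"),
    (500, "Severe"),
]


def overall_aqi(sub_indices):
    """Return (aqi, dominant_pollutant, category) from available sub-indices.

    Pollutants whose sub-index is None (not available) are skipped.
    """
    available = {p: v for p, v in sub_indices.items() if v is not None}
    if not available:
        return None, None, None
    dominant = max(available, key=available.get)
    aqi = available[dominant]
    # First band whose upper bound covers the score; anything higher is Severe.
    category = next((name for upper, name in CATEGORIES if aqi <= upper), "Severe")
    return aqi, dominant, category


example = {"PM2.5": 212, "PM10": 168, "NO2": 41, "O3": None}
print(overall_aqi(example))
```

In this made-up example PM2.5 is the dominant pollutant. A different city could show the same overall score with PM10 or ozone in that role.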
That is useful design. A single score is easier to read than a table of PM2.5, PM10, nitrogen dioxide, sulphur dioxide, carbon monoxide, ozone and other pollutants. But it is also a compression. AQI tells you the public signal from a monitor at a place and time. It does not show every pollutant cell, every missing instrument, every station's history, or whether a public file has enough PM2.5 days for trend work.
This article starts underneath the score. The daily repository audit checked 598 CPCB station records. A station-year is one station in one calendar year, and the article separates a station-year with any PM2.5 from a station-year with enough days to support trend work. In 2009, CPCB listed 25 daily station-year files in the audited archive, but only 1 had usable PM2.5. In 2017, there were 504 listed station-year files, but only 80 had usable PM2.5. By 2025, the same audit found 579 listed files and 540 usable PM2.5 station-years. The lesson is simple: files, rows and pollutant cells are different layers of public evidence.
Listed files and usable PM2.5 history diverge
CPCB daily raw repository · listed files, usable PM2.5 years and files with no PM2.5 · 2009-2025
The daily repository grows to 579 listed station-year files by 2025, but the usable PM2.5 history catches up much later.
This is the article's core chart because it separates three layers: a listed daily file, a station-year with any usable PM2.5, and a listed file with zero PM2.5. The odd 2017 jump is real, but it is a repository-listing finding: CPCB lists 504 daily station-year files, only 80 have any usable PM2.5, and 424 have none. By 2025 the gap is much smaller, with 579 listed files and 540 usable PM2.5 station-years.
What raw data can the public download?
CPCB's public raw repository is broader than a daily CSV archive. On 29 June 2026, a direct probe checked 598 station records against the public dataRepository/file-path endpoint for raw data at 15-minute, hourly, 8-hour and daily frequencies. It returned 4,346 listed 15-minute raw files, 4,367 hourly files, 4,396 8-hour files and 4,455 daily files.
That is a real public surface. It matters because the official system is not only an AQI card on a dashboard. It also exposes file paths that can be reproduced by script. But a listed path is still only a listing. It says CPCB has a raw file path for a station, frequency and year. It does not prove that the CSV has rows. It does not prove PM2.5 or PM10 is present. It does not prove the year is complete enough for a trend.
A narrow spot-check confirmed that sub-daily files are not imaginary. For Sanjay Palace, Agra in 2025, the same public download endpoint returned parseable CSVs at all four frequencies: 35,040 rows at 15-minute frequency, 8,760 hourly rows, 1,095 8-hour rows and 365 daily rows. Each file had PM2.5 and PM10 columns with many non-missing values. Treat that as a proof of access, not a completeness audit for every sub-daily file.
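Those four row counts are exactly what a full 365-day year produces at each frequency, which can be checked with simple arithmetic. The short Python sketch below uses hypothetical variable names. A matching row count only shows that the file has a full set of timestamp rows. It says nothing about whether the cells are filled.

```python
DAYS_IN_2025 = 365  # 2025 is not a leap year

# Number of rows one station should produce per day at each frequency.
READINGS_PER_DAY = {
    "15-minute": 24 * 4,
    "hourly": 24,
    "8-hour": 3,
    "daily": 1,
}

expected_rows = {freq: n * DAYS_IN_2025 for freq, n in READINGS_PER_DAY.items()}
print(expected_rows)
```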
The AQI repository is a different case. The CPCB interface has a /repository/aqi route and app-code strings for AQI repository listing and Excel export. In this run, I could not reproduce a stable public AQI audit endpoint comparable to the raw file-path call, and several dashboard routes used an encrypted browser proxy. So the raw repository is treated here as verified by script. The AQI repository is treated as visible in the official interface, but it is not used for locked historical numbers.
CPCB's raw archive extends beyond daily files
Public raw repository probe · file listings at 15-minute, hourly, 8-hour and daily frequency · 598 station records
CPCB's verified raw endpoint lists thousands of files at 15-minute, hourly, 8-hour and daily frequency.
The file-list probe returned 4,346 listed 15-minute raw files, 4,367 hourly files, 4,396 8-hour files and 4,455 daily files across 598 station records. A separate spot-check for Sanjay Palace, Agra in 2025 confirmed that sub-daily files can be downloaded and parsed, but this chart itself counts listings only. It does not validate row completeness or pollutant cells inside every sub-daily file.
How much PM evidence is actually visible?
The deeper audit downloaded and checked 4,455 daily raw CSVs. Across those files, it found 916,083 non-missing PM2.5 station-days and 872,770 non-missing PM10 station-days. A station-day is one monitoring station with one non-missing pollutant value on one date. It is a coverage unit, not a pollution concentration: station-days measure how much public observation exists, not whether pollution rose or fell.
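As a rough illustration of the station-day unit, the Python sketch below counts non-missing values in one pollutant column of a daily CSV. It is not the audit's actual code. The column names, the sample rows and the rule that any finite number counts as non-missing are all assumptions made for the example.

```python
import csv
import io
import math

# Hypothetical daily file for one station; values are invented.
SAMPLE = """Timestamp,PM2.5,PM10,NO2
2025-01-01,88.4,151.0,32.1
2025-01-02,,140.2,30.0
2025-01-03,NA,,28.7
2025-01-04,61.9,NA,
"""


def count_station_days(csv_text, column):
    """Count rows where `column` holds a finite number."""
    count = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(column) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue  # blank cell or a text marker such as "NA"
        if math.isfinite(value):
            count += 1
    return count


for pollutant in ("PM2.5", "PM10", "NO2"):
    print(pollutant, count_station_days(SAMPLE, pollutant))
```

The sample file has four rows, but only two PM2.5 station-days and two PM10 station-days. Summing such counts across stations gives totals of the kind reported above.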
The 2017 spike in listed files is the chart's warning label. CPCB lists 504 daily station-year files for 2017, up from 50 in 2016. But only 80 of the 2017 files have any usable PM2.5, and 424 have zero usable PM2.5. Many are CSV shells with daily timestamp rows and blank particulate columns. They are public files, but not public PM2.5 evidence.
The inspectable particle record becomes much stronger after 2020. PM2.5 station-days rise from 15,384 in 2017 to 74,169 in 2020, 153,321 in 2023 and 173,983 in 2025. PM10 follows the same broad pattern in the audited daily files. Do not read that as worsening air. It means more public observations are visible in the repository.
Usable PM observations become broad only after 2020
Downloaded CPCB daily files · non-missing PM2.5 and PM10 station-days · 2009-2025
The audited daily files contain 916,083 non-missing PM2.5 station-days and 872,770 PM10 station-days, with most depth arriving after 2020.
A station-day is one station with one non-missing pollutant value on one date. PM2.5 station-days rise from 15,384 in 2017 to 74,169 in 2020 and 173,983 in 2025. That is a rise in inspectable observations, not a rise in pollution concentration. The same logic applies to PM10, which is plotted beside PM2.5 as a second particle-coverage check.
When is a station-year ready for trend work?
A station-year with one PM2.5 day is visible. It is not a serious yearly record. That is why this audit counts four thresholds: any PM2.5 day, 30 or more days, 180 or more days and 300 or more days.
The distinction is large in the older archive. In 2017, 80 station-years had any PM2.5 day, 76 had at least 30 days, 42 had at least 180 days and only 23 had at least 300 days. By 2025, the four counts were 540, 536, 490 and 444. The gap narrows, but it does not disappear.
The 300-day line is not an official CPCB standard. It is an audit threshold for readers who want a fuller annual history. The point is practical. A dashboard can show a station. A file list can show a year. A trend analyst still has to ask how many actual pollutant days sit inside that station-year.
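The four-threshold count is easy to reproduce once each station-year has a PM2.5 day count. The Python sketch below uses hypothetical station names and day counts. Only the thresholds (1, 30, 180 and 300 days) come from the audit described above.

```python
# Minimum non-missing PM2.5 days needed to pass each threshold.
THRESHOLDS = {"any": 1, "30+": 30, "180+": 180, "300+": 300}


def threshold_counts(days_by_station_year):
    """Count station-years that meet each minimum-day threshold."""
    return {
        label: sum(1 for days in days_by_station_year.values() if days >= minimum)
        for label, minimum in THRESHOLDS.items()
    }


# Invented example: (station, year) -> non-missing PM2.5 days.
sample = {
    ("Station A", 2025): 341,
    ("Station B", 2025): 212,
    ("Station C", 2025): 45,
    ("Station D", 2025): 1,
    ("Station E", 2025): 0,
}

print(threshold_counts(sample))
```

Each threshold is a subset of the one before it, so the counts can only shrink from left to right. That is the pattern in the 2017 figures (80, 76, 42, 23) and the 2025 figures (540, 536, 490, 444).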
Trend-ready PM2.5 years lag basic visibility
Downloaded CPCB daily files · station-years by minimum count of non-missing PM2.5 days · 2009-2025
In 2025, 540 PM2.5 station-years had any data, but only 444 had at least 300 non-missing days.
This chart separates minimal visibility from fuller annual evidence. In 2017, 80 station-years had any PM2.5, but only 23 reached 300 days. By 2025 the gap is smaller, but it still tells analysts not to treat any-data counts as ready-made trend panels.
Where does the station map go thin?
The station network measures monitored India. It does not measure all of urban India. The CPCB/OAQ station hierarchy used here has 335 distinct city names. Census 2011 counts 4,041 statutory towns. On that denominator, 3,706 statutory towns, or about 92 percent (91.7 percent before rounding), are outside this station-list comparison.
This is a place-visibility denominator. It is not a population-exposure estimate. It does not mean 92 percent of Indians lack AQI data. It also does not mean every statutory town needs its own monitor. A statutory town is a legally urban place in the Census frame, such as a municipality or cantonment board. Some towns are small, some are near larger airsheds, and station-list names do not map perfectly to Census names.
The narrower finding is still important. AQI is a monitor-based public signal. Before we even ask whether a file has PM2.5 cells, many legally urban places are outside the public station-list frame used in this audit.
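The arithmetic behind the 92 percent figure is short enough to check directly. The Python sketch below makes one simplifying assumption: each of the 335 station-list city names is treated as exactly one statutory town. As noted above, the names do not map perfectly, so this is a rough denominator check rather than a precise match.

```python
station_list_cities = 335   # distinct city names in the CPCB/OAQ hierarchy
statutory_towns = 4041      # Census 2011 statutory towns

# Simplification: one city name is treated as one statutory town.
outside = statutory_towns - station_list_cities
share_outside_pct = round(outside / statutory_towns * 100, 1)

print(outside, share_outside_pct)
```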
The station list covers only a fraction of urban places
CPCB/OAQ city names compared with Census 2011 statutory towns
The CPCB/OAQ hierarchy has 335 city names, while Census 2011 counts 4,041 statutory towns.
This chart is a geography warning, not a monitor-siting prescription. It says about 92 percent of statutory towns are outside this station-list comparison, but it does not say 92 percent of Indians lack AQI data. It also does not say every statutory town needs a monitor, because airsheds and settlement size matter.
Which cities have the deepest PM2.5 archive?
The usable PM2.5 archive is city-heavy. New Delhi has 73,348 PM2.5 station-days in this audit across 25 stations with any usable PM2.5. Mumbai has 41,502 across 28 such stations. Delhi has 39,328, Bengaluru 25,311 and Hyderabad 22,663.
That is not an air-quality ranking. A city can rank high because it has more stations, older stations, fewer missing PM2.5 cells, or all three. A lower-ranked city may be cleaner, dirtier or simply less visible in this public archive.
For national claims, this concentration matters. The public historical archive is strongest where the monitoring network is dense and older. It is weakest where stations arrived late, publish thin histories, or never appear in the public station list used here.
Deep PM2.5 history is concentrated in a few cities
Downloaded CPCB daily files · non-missing PM2.5 station-days by city · 2009-2025
New Delhi, Mumbai and Delhi dominate the usable PM2.5 station-day archive.
New Delhi has 73,348 usable PM2.5 station-days, Mumbai has 41,502 and Delhi has 39,328 in the audited daily files. These counts combine station density, station age and missing-cell patterns. They tell us where the public historical archive is deepest, not where air is cleanest or dirtiest.
When did stations first become usable?
First usable PM2.5 year is stricter than first listed file year. It asks when a station first has any non-missing PM2.5 value in the daily repository files.
Only 200 of the 598 station records had usable PM2.5 before 2020. The recent record is much stronger: 508 station-years had usable PM2.5 in 2023, 524 in 2024 and 540 in 2025. Across the full daily audit, 396 stations had at least 1,000 usable PM2.5 days.
That makes the archive good for many current comparisons and weaker for long historical panels. A station appearing in an older repository year does not automatically give you a PM2.5 trend from that year. Sometimes the file exists, but the particle cells do not.
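One way to compute a first usable PM2.5 year is to take, for each station, the earliest year with at least one non-missing PM2.5 day. The Python sketch below uses invented stations and day counts, and the function name is hypothetical. It shows why a station with early listed files can still have a late first usable year.

```python
def first_usable_year(days_by_station_year):
    """Map each station to the earliest year with at least one PM2.5 day."""
    first = {}
    for (station, year), days in days_by_station_year.items():
        if days > 0:
            first[station] = min(year, first.get(station, year))
    return first


# Invented example: (station, year) -> non-missing PM2.5 days.
sample = {
    ("Station A", 2017): 0,    # file listed, PM2.5 column blank
    ("Station A", 2018): 0,
    ("Station A", 2021): 298,
    ("Station B", 2019): 12,
    ("Station B", 2020): 340,
    ("Station C", 2018): 0,    # never has usable PM2.5
}

first = first_usable_year(sample)
before_2020 = sum(1 for year in first.values() if year < 2020)
print(first, before_2020)
```

Station A has files from 2017, but its first usable year is 2021. Station C never appears in the result at all.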
Most station PM2.5 histories begin late
Downloaded CPCB daily files · first year with any non-missing PM2.5 value, by station
Only 200 of 598 station records had usable PM2.5 before 2020.
First usable year means the first year with any non-missing PM2.5 daily value in the downloaded repository files. It is not the station's commissioning date and not necessarily the first year the station operated. The chart shows why recent PM2.5 comparisons are much easier than long historical panels.
Where do file years outrun PM2.5 years?
Some station histories look long until you count usable PM2.5 years. Airoli in Navi Mumbai has 17 listed daily years but only 5 usable PM2.5 years, a gap of 12. City Railway Station and Sanegurava Halli in Bengaluru have 11 listed years and zero usable PM2.5 years in the audited daily files.
Across all 598 station records, 40 had no usable PM2.5 in daily files. That splits into 22 records with listed daily files but no usable PM2.5, and 18 records with no listed daily repository file in the audit.
This does not prove a station failed. It says the public daily repository does not support a PM2.5 history for those station-years. For a public audit, that is the difference that matters.
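The gap measure is a subtraction: listed daily years minus years with any usable PM2.5. The short Python sketch below applies it to the three stations named above, using the counts reported in this article. The variable names are illustrative.

```python
# (listed daily years, years with any usable PM2.5), as reported in the text.
stations = {
    "Airoli, Navi Mumbai": (17, 5),
    "City Railway Station, Bengaluru": (11, 0),
    "Sanegurava Halli, Bengaluru": (11, 0),
}

gaps = {name: listed - usable for name, (listed, usable) in stations.items()}

# Largest gap first.
ranked = sorted(gaps.items(), key=lambda item: item[1], reverse=True)
for name, gap in ranked:
    print(f"{name}: {gap} listed years without usable PM2.5")
```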
Long file histories can hide short PM2.5 records
Downloaded CPCB daily files · listed file years minus years with any usable PM2.5, by station
Some stations have long listed file histories but little or no usable PM2.5 history.
Airoli in Navi Mumbai has 17 listed daily years but only 5 usable PM2.5 years. City Railway Station and Sanegurava Halli in Bengaluru have 11 listed years and zero usable PM2.5 years in the audited daily files. These are concrete examples of the file-versus-pollutant gap.
What does a file with rows but no PM look like?
Some files are not empty. They still do not help with PM2.5 or PM10 history. The examples in this chart have daily rows and zero usable PM2.5 and PM10 days.
Rows are a middle layer of evidence. A file can contain timestamps and gas pollutants such as NO, NO2, SO2, CO or ozone while the PM columns are blank or unusable. For AQI, different pollutants can matter on different days. For a particle-pollution history, PM2.5 and PM10 cells have to be present.
This is why the article keeps file listing, rows and pollutant cells separate. If those are collapsed into one word, the archive looks cleaner than it is.
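The three layers can be kept apart in code as well. The Python sketch below sorts a downloaded CSV into one of three states: no rows, rows without PM values, or rows with PM values. The column names, the sample files and the state labels are assumptions for illustration, not the audit's actual schema.

```python
import csv
import io
import math


def has_value(raw):
    """True if the cell parses as a finite number."""
    try:
        return math.isfinite(float((raw or "").strip()))
    except ValueError:
        return False


def classify_file(csv_text, pm_columns=("PM2.5", "PM10")):
    """Place a listed file in one of three evidence layers."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return "listed, no rows"
    if any(has_value(row.get(col)) for row in rows for col in pm_columns):
        return "rows with PM values"
    return "rows, no PM values"


# Invented samples: a header-only file, a gas-only shell, and a file with PM.
EMPTY = "Timestamp,PM2.5,PM10,NO2\n"
SHELL = "Timestamp,PM2.5,PM10,NO2\n2017-01-01,,,31.5\n2017-01-02,NA,NA,29.0\n"
WITH_PM = "Timestamp,PM2.5,PM10,NO2\n2025-01-01,88.4,,32.1\n"

for label, text in (("empty", EMPTY), ("shell", SHELL), ("with_pm", WITH_PM)):
    print(label, "->", classify_file(text))
```

The shell file would pass a check for listed files and a check for rows, yet it contributes nothing to a particle-pollution history.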
Daily rows can exist while PM values are blank
Selected CPCB daily files with rows and zero usable PM2.5 or PM10 values
Some daily files contain rows but zero usable PM2.5 and PM10 days.
These representative examples span full-year and partial-year CSVs. A file can contain timestamps and non-PM pollutant columns while both particle columns are blank or unusable. That is why row counts are a middle layer of evidence, not the final answer for PM history.
What does OAQ add?
OAQ is useful because it mirrors current CPCB data in a cleaner API shape. The inspected CPCB snapshot was generated on 23 June 2026 at 17:40 IST. It had 598 station records. At that moment, 203 records had at least one of the eight main pollutant fields, 139 had all eight, 185 had PM2.5 and 183 had PM10. The other 395 records had no main pollutant value in that snapshot.
That is latest visibility, not deep history. It tells you what the mirror saw at one moment. It does not replace the CPCB repository audit of daily historical files.
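The snapshot counts follow an any/all pattern that can be sketched in Python. Everything about the record shape here is an assumption: the field keys are hypothetical stand-ins for the eight main pollutant fields, and the three sample records are invented. The real OAQ response may be structured differently.

```python
# Hypothetical keys standing in for the eight main pollutant fields.
MAIN_FIELDS = ("pm25", "pm10", "no2", "so2", "co", "o3", "nh3", "pb")


def summarize_snapshot(records):
    """Count records with any, all, or none of the main pollutant fields."""
    with_any = sum(
        1 for r in records if any(r.get(f) is not None for f in MAIN_FIELDS)
    )
    with_all = sum(
        1 for r in records if all(r.get(f) is not None for f in MAIN_FIELDS)
    )
    return {
        "records": len(records),
        "any_main": with_any,
        "all_main": with_all,
        "no_main": len(records) - with_any,
    }


# Invented records: one complete, one PM2.5 only, one with no pollutant values.
sample = [
    {"station": "A", **{f: 1.0 for f in MAIN_FIELDS}},
    {"station": "B", "pm25": 42.0},
    {"station": "C"},
]

print(summarize_snapshot(sample))
```

The same subtraction explains the article's figures: 598 records minus 203 with at least one main pollutant leaves 395 with none.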
The station metadata snapshot also has useful public fields. All 598 records had coordinates, station type, a latest-seen timestamp and an agency or operator suffix in the station name. It did not expose clean commissioning or activation dates. The public can inspect where a station is and who appears to operate it. It cannot cleanly inspect every station's start date from this metadata alone.
The latest OAQ mirror is a snapshot, not history
OAQ CPCB latest snapshot · generated 23 Jun 2026 at 17:40 IST
In the inspected OAQ snapshot, 598 CPCB station records were present, but 395 had no main pollutant value at that moment.
OAQ is useful because it mirrors current CPCB data in a cleaner API shape. In the 23 June 2026 snapshot, 203 records had at least one of the eight main pollutant fields and 139 had all eight. That makes OAQ valuable for latest visibility and metadata, but not a replacement for CPCB's historical repository audit.
How should you read these numbers?
Read this as a visibility audit, not as a pollution trend. The daily audit used CPCB's public raw repository to list, download and inspect station-year CSVs. The frequency audit counted raw file listings at 15-minute, hourly, 8-hour and daily frequencies. The sub-daily spot-check proved those downloads can work for one station-year. The PM2.5 and PM10 completeness numbers come from downloaded daily files only.
The public surfaces are different. The all-India AQI portal is the current public signal. Advance search is a human-oriented dashboard and export surface. The CPCB raw repository endpoint is reproducible and audited here. OAQ is a clean latest mirror: a current copy of what a data source sees at one time, shaped for easier access. It helps with current CPCB visibility, but it does not replace CPCB's historical repository. The AQI repository is visible in the CPCB interface and app bundle, but a stable public AQI audit endpoint was not reproduced for this article.
The caveats are real. This audit does not validate calibration, siting, QA flags, monitor downtime or whether a station represents a whole city. The statutory-town comparison is a place denominator, not a population denominator. The 300-day threshold is an audit choice for trend-readiness, not an official rule.
What India can say with confidence is that public AQI sits on a real monitoring system, and CPCB exposes more raw public data than casual dashboard browsing suggests. What it cannot honestly say from these files alone is that the public can inspect a long, even, all-India PM2.5 record.