Good data, good governance

Data is concrete knowledge, and without it, decision makers can only rely on guessing.

Data is ‘in’. The media provides clear percentages and cool visualisations for every kind of statistic, from cricket to elections. Census and survey reports are widely discussed by the general public. Open data is in great demand among researchers, marketing professionals, civil society, and administrators alike. And Big Data is the new buzzword in the software and computing industry.


This is a wonderful development; data is concrete knowledge, and without it, decision makers can only rely on guessing. This applies most of all to governance and public policy, where the stakes are highest.

The Government of India has come a long way in collecting data and making it publicly available. First came the RTI Act, which established the basic principle that public data should be in the public domain by default. The National Data Sharing and Accessibility Policy gave shape to this principle. Now, there is the remarkable Open Data Portal, which to date hosts nearly 4000 data sets from over 50 departments of the Government of India.

There are three broad methods of data collection used by the Government. The first, and simplest, is compiling records of past events. The second is regular sample surveys such as the NSSO, NFHS, DLHS, AHS, SRS, etc., together with the decadal Census of India. The third is the real-time monitoring of schemes through field reporting.

When using past records for decision-making, one has to be sure that the data fields used are still relevant. For example, in a price index, are the items included actually what people spend the most on? The 68th round of the NSSO household expenditure survey in 2011-12 raised exactly this question: expenditure on cereals and pulses was falling, while that on eggs and dairy products was increasing. As a result, the composition of the Wholesale Price Index is under modification.

Just as surveys work as reality checks for simple data collection, the Census of India works as a reality check for surveys. All survey sample frames are based on the Census enumeration frame, i.e. the geographical net of enumeration blocks, each having roughly the same population of about 500. This frame gets renewed every decade with the fresh Census. Also, projections and assumptions used since the last Census can now be replaced with the actual facts, creating a new set of baseline data. For example, the population of NCT Delhi had been projected at 18.7 million for 2011 on the basis of 2001 data. The actual figure from Census 2011 came out to be 16.7 million. This happened because several unanticipated factors arose in the intervening decade, such as the ban on industrial units in residential areas and large-scale slum removal.
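To put the size of that correction in perspective, the gap between the projection and the Census count can be worked out directly. The short sketch below uses only the two Delhi figures quoted above; the function name is purely illustrative.

```python
# Figures quoted in the paragraph above, in millions of people.
projected_2011 = 18.7  # projection for 2011 made from 2001 data
actual_2011 = 16.7     # figure reported by Census 2011


def projection_error(projected, actual):
    """Return (absolute error, error as a percentage of the actual count)."""
    absolute = projected - actual
    percent = 100.0 * absolute / actual
    return absolute, percent


absolute, percent = projection_error(projected_2011, actual_2011)
print(f"Overestimate: {absolute:.1f} million ({percent:.1f}% of the actual count)")
```

An overestimate of about 2 million people, roughly 12 per cent of the actual population, is large enough to distort any plan sized on the projected figure.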

When data sets are so large that they can only be interpreted with high computing power, they are known as Big Data. The Census of India is, in a sense, Big Data, since it covers many aspects of information about a very large population. It can come up with complete surprises, revealing hitherto unknown facts that often need urgent policy intervention. One way this happens is when a pattern is too subtle to show up in ordinary experience. For example, the 1991 Census brought out the falling child sex ratio, which had not been expected to be so critical. It led to the Prenatal Diagnostic Techniques (PNDT) Act being enacted in 1994. Another way the Census comes up with surprises is that it counts very rare events too, which would not be detected in smaller samples. This happened in the 2011 Census, which detected a small but definite number of manual scavengers across the country, controverting the claims of public agencies that the practice had been wiped out. As a result, a targeted plan for abolishing this practice and rehabilitating the scavengers is now under way.
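The point about rare events can be illustrated with a rough calculation of how likely a sample is to contain even one case. This is only a sketch: the prevalence of 1 in 100,000 is a hypothetical figure chosen for illustration, not a Census result, and the formula assumes simple random sampling in which every person is drawn independently.

```python
def detection_probability(prevalence, sample_size):
    """Chance that a simple random sample contains at least one case.

    Assumes each sampled person is an independent draw with the same
    probability `prevalence` of being a case (a simplification).
    """
    return 1.0 - (1.0 - prevalence) ** sample_size


# Hypothetical prevalence: 1 case per 100,000 people (not a Census figure).
prevalence = 1 / 100_000

for sample_size in (1_000, 100_000, 10_000_000):
    p = detection_probability(prevalence, sample_size)
    print(f"sample of {sample_size:>10,}: P(at least one case) = {p:.3f}")
```

Under these assumptions a small sample will almost always miss the phenomenon entirely, whereas a full enumeration does not depend on the luck of the draw at all.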

The multifaceted data sets from the Census also help in detecting possible linkages between different aspects of data. For example, it has been well established by research across several Censuses that there is a strong correlation between high female literacy and low fertility rates. This has important implications for cross-departmental policy interventions, and points to the necessity of close association between academicians and policy makers so that insight can be turned into successful change on the ground.

The Census, and other surveys, can only be accurate as long as the public perceives them to be neutral and not directly tied to any Government scheme. Otherwise, responses would be coloured by the anticipation of future benefits. In general, the Census is a byword for reliability, thanks to its legal backing, its confidentiality, and its multilateral style of working, which involves the Central and State Governments as well as academia and civil society. However, if different groups start competing to increase their numbers, the exercise gets vitiated. This is what happened in the 2001 Census in Nagaland, where the numbers were inflated to such an extent that the results were ultimately rejected; consequently, the universally accepted 2011 Census in Nagaland showed a negative growth rate! This had far-reaching effects for Nagaland throughout the decade: misallocation of resources leading to wastage, the delimitation of constituencies being set aside by the court, and much public unrest.

The third aspect of data collection, real-time monitoring of field activities, has a crucial role to play in ensuring not only that the schemes themselves function well, but also that policy making is at all times in touch with reality. A plethora of technological solutions is being used for this purpose. The e-Mamta MCH monitoring system under the NRHM is a good example. SMS-based reporting and grievance redressal is being used in health schemes in Andhra Pradesh, policing in Kerala, and midday meals in Uttar Pradesh. Such applications are limited only by the imagination of the scheme administrators. Further, when such data is placed in the public domain, it goes a long way towards ensuring transparency and accountability.

Thus, an interactive combination of all three methods of data collection (records, surveys, and real-time monitoring) is necessary for successful policymaking and implementation. Data also needs to be studied not only by the department concerned but also by other government agencies, researchers, and civil society, since all of them can contribute in different ways to extracting the information hidden in the data. Their findings must be taken into account as well. Only then can correct decisions, which are likely to work out well in the long term, be made.

Photo: Katey Nicosia