Primary and Secondary Sources of Data | Census, NSS, Digital Research

Contents

Introduction: Nature of Data in Social Research
Primary Sources of Data
Secondary Sources of Data
Introduction to Big Data at National Level
Census of India: The Grand Count
National Sample Survey (NSS): The Deep Dive
New Opportunities in Sociological Research (Kim)
Introduction to Digital Research
Digital Ethnography: The Heart of Digital Research

Introduction: Nature of Data in Social Research

When sociologists try to understand how society works, they can’t just rely on guesses or personal opinions. They need data, which is simply the raw material of their research. Think of data like the ingredients in a recipe: without them, you can’t cook up any meaningful conclusions.

What is data, really?

According to Nicholas Walliman (in Research Methods: The Basics), data can be numbers, words, images, sounds, or even people’s gestures. It’s anything that gives us clues about the social world. But raw data alone isn’t very useful—it’s like having a pile of bricks without a plan. The researcher must organize, interpret, and make sense of it.

Primary vs. secondary sources: a simple way to understand them

Primary sources are firsthand accounts or original materials collected by the researcher themselves, like doing interviews, running a survey, or observing people in their daily lives. The History of the Census of India offers an example: the actual census forms filled out by households were primary data for the officials who collected them. A researcher who later works from the published census tables is using secondary data.

Secondary sources are interpretations or analyses made by someone else. A book like S. Deshpande’s Contemporary India: A Sociological View is a secondary source—it takes census data, historical records, and other studies and offers a fresh sociological interpretation. Both types are valuable, but a good researcher knows which one they need for their questions.

The golden rules: reliability, validity, and context

You can’t just trust any data blindly. Two big ideas matter here:

Reliability means: if someone else repeats your method, would they get similar results? For example, if you ask people about their income, will they give the same answer next week? Unreliable data is shaky ground.

Validity asks: are you actually measuring what you think you’re measuring? Asking how many hours someone watches TV might not truly capture their “social isolation”—that’s a validity problem.

And finally, context is everything. Yeunchul Kim, in his article on new opportunities for sociological research, reminds us that data doesn’t exist in a vacuum. A statistic from 1930s India means something very different from the same number today. Likewise, digital data (like social media posts) has its own context, such as location, timing, and digital culture, which Sarah Pink, Heather Horst, John Postill, Larissa Hjorth, Tania Lewis, and Jo Tacchi emphasize in Digital Ethnography. Ignoring context is like reading one sentence of a novel and claiming you understand the whole story.

So, in short: data is the starting point, but smart researchers handle it with care—checking where it came from, whether it’s trustworthy, and what real-world situation it belongs to.

Primary Sources of Data

Primary sources are simply the data that you—the researcher—go out and collect yourself, with your own hands and eyes. It’s firsthand material, gathered specifically to answer your own research questions.

What kinds of primary data exist?

There are two main flavors, and they serve different purposes:

Qualitative data is all about depth and meaning. It tries to capture how people actually feel, think, and behave in real life. Common types include:

Interviews – sitting down with someone and having a real conversation.

Ethnography – immersing yourself in a community or culture for a long time. As Pink and her co-authors explain in Digital Ethnography, this can also happen online, like studying how people interact in gaming communities or on Instagram.

Participant observation – not just watching people but actually joining in their activities.

Case studies – zooming in on one person, one village, one school, or one event in intense detail.

Quantitative data is all about numbers and measurements. It helps you see patterns across many people. Common types include:

Surveys and structured questionnaires – asking the same set of questions to a large group.

Experiments – changing one thing to see its effect on something else.

Statistical measures – averages, percentages, and correlations.

The good parts (strengths):

Primary data is original: you’re the first to touch it. It lets you dive deep into a topic and capture a rich, detailed context that no one else has recorded before. If you want to understand why a neighborhood feels unsafe even though crime rates are low, primary data (like talking to residents) is the most direct way to find out.

The not-so-good parts (limitations):

But honestly, collecting your own data is hard work. It takes a lot of time—sometimes months or years. It can be expensive, especially if you need to travel or pay participants. Sometimes you simply can’t access the people or places you want to study. And there are always ethical concerns: you must protect people’s privacy, get their consent, and make sure you’re not harming anyone. As Kim notes in his Journal of Asian Sociology piece, even well-meaning researchers can accidentally cross ethical lines if they’re not careful.

Secondary Sources of Data

Secondary sources are the opposite. Here, you’re not collecting anything yourself. Instead, you’re using data that someone else has already gathered, sometimes years or decades ago.

What counts as secondary data?

Lots of things:

Government reports

Census data (for example, the published population tables collected first by British and later by Indian officials, whose development the History of the Census of India traces)

NSS (National Sample Survey) reports

Academic journal articles and books

Old letters, diaries, or records kept by institutions

The good parts (strengths):

Secondary data is a lifesaver if you don’t have much money or time. It’s cost-effective—often free or cheap. It gives you access to large-scale information that you could never collect on your own, like national employment figures. And it offers a historical perspective—you can compare how things have changed over 50 or 100 years. Satish Deshpande, in Contemporary India: A Sociological View, leans heavily on secondary data to show how caste and class patterns have shifted across generations.

The not-so-good parts (limitations):

The biggest problem is that you have no control over how the data was collected. Maybe the original survey asked leading questions. Maybe the census undercounted poor neighborhoods. There could be bias baked into the numbers that you can’t remove. And sometimes data is simply outdated: using a 1980s survey to understand today’s gig economy would be misleading at best. You also can’t go back and ask follow-up questions, the way you could with primary data. In short, you’re trusting a stranger’s homework, and that’s always a little risky.

Introduction to Big Data at National Level

When we talk about “big data” at the level of an entire country, we’re talking about massive, systematic efforts to count and describe millions of people. In India, two giants stand out: the Census of India and the National Sample Survey (NSS). These are not just piles of numbers—they’re mirrors held up to the nation.

Census of India: The Grand Count

Historical evolution and significance

The Census of India didn’t start after independence; it has a long history going back to the 1870s under British rule. As the Government of India’s History of the Census of India (pages 1–10) explains, the early censuses were often messy and sometimes biased, but they marked the first real attempt to count every single person across a vast and diverse land. After 1947, the census became a massive democratic exercise, held once every decade. Why is it so significant? Because a census isn’t just a headcount: it’s a statement that every person matters enough to be counted.

What does it cover?

The census casts a very wide net:

Population – total numbers, age, sex, rural/urban distribution.

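Caste – a deeply sensitive but important category, especially for understanding social justice and reservation policies. Since independence, the census has enumerated Scheduled Castes and Scheduled Tribes rather than every caste.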

Literacy and education – who can read and write, and up to what level.

Housing – what kind of homes people live in, whether they have electricity, water, toilets.

Migration – who moved from where to where, and why.

How is it used?

For policymakers, the census is like a GPS. Without it, you don’t know where to build schools, how many hospital beds to plan for, or which districts need more food subsidies. For sociological researchers like Deshpande (in Contemporary India: A Sociological View), the census is pure gold—it allows them to track how caste composition changes over decades, how literacy gaps between men and women are shrinking (or not), and whether urbanization is speeding up or slowing down.

National Sample Survey (NSS): The Deep Dive

If the census is a broad map, the NSS is a magnifying glass. Unlike the census, which tries to count everyone, the NSS selects a smaller, carefully chosen sample of households and asks them very detailed questions.

What does it cover?

The NSS runs on a regular cycle—usually every few years—and covers topics that the census cannot go deep into:

Employment – who is working, in what kind of job, for how many hours, and for what pay.

Consumption – what do people eat, wear, and spend money on? This data underpins official poverty lines and estimates of how many people live below them.

Health – how often do people get sick, where do they go for treatment, how much do they pay.

Education – enrollment rates, dropout rates, quality of schooling.
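One way consumption data becomes a poverty figure is the headcount ratio: the share of people whose per-capita consumption falls below a poverty line. The sketch below assumes invented households and an invented poverty line of 1,500 rupees per person per month; it is not an official line. Real NSS-based estimates also apply survey weights, which this simplified version ignores, and the function name is hypothetical.

```python
def headcount_ratio(households, poverty_line):
    """Share of persons (not households) whose monthly per-capita consumption is below the line."""
    poor_persons = 0
    total_persons = 0
    for size, monthly_consumption in households:
        per_capita = monthly_consumption / size
        total_persons += size
        if per_capita < poverty_line:
            poor_persons += size  # every member of a poor household is counted as poor
    return poor_persons / total_persons

# Hypothetical sample: (household size, total monthly consumption in rupees).
survey_households = [(4, 5600), (5, 12000), (3, 3000), (2, 5000)]

print(headcount_ratio(survey_households, poverty_line=1500))
```

Counting persons rather than households matters: large poor households would be understated if each household counted once.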

Why is it so important?

The NSS is the backbone of our understanding of socio-economic trends. Want to know if farmer incomes are rising or falling? NSS data. Want to see whether rural electrification has actually improved lives? NSS data. Want to measure the gap in health spending between rich and poor? NSS data again. Without it, we’d be flying blind.

The challenges (and they are real)

But neither the census nor the NSS is perfect. Let’s be honest:

Data reliability – Can you really count 1.4 billion people without missing millions? The answer is no. Homeless populations, migrant workers, and people in conflict zones are often undercounted. In surveys, people might lie (about income, about caste, about drinking habits) because they feel embarrassed or scared.

Political influence – Numbers can be weapons. Governments have been known to manipulate census or survey findings—rushing them, suppressing them, or tweaking definitions to make things look better than they are. As Kim argues in his Journal of Asian Sociology article, even “official” data carries the fingerprints of power.

Representation – A sample survey is only as good as its sampling method. If certain groups—like nomadic tribes or people without permanent addresses—are left out, the data claims to represent “all Indians” but doesn’t. That’s a quiet but serious form of exclusion.
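The representation problem can be shown with a toy simulation. The sketch assumes a made-up population of 1,000 adults in which people without a permanent address are far more likely to work informally, and a sampling frame that only reaches people with an address. It uses plain simple random sampling, which is much simpler than the NSS's actual design; the point is only that a frame which misses a group stays biased however many people you sample from it.

```python
import random

# Hypothetical population of 1,000 adults (numbers invented for illustration).
# 900 have a permanent address (40% in informal work); 100 do not (90% informal).
population = (
    [{"has_address": True, "informal_work": i < 360} for i in range(900)]
    + [{"has_address": False, "informal_work": i < 90} for i in range(100)]
)

def share_informal(people):
    """Fraction of people in informal work."""
    return sum(p["informal_work"] for p in people) / len(people)

# The sampling frame is the list a survey draws from; here it misses anyone without an address.
frame = [p for p in population if p["has_address"]]

rng = random.Random(42)            # fixed seed so the draw is repeatable
sample = rng.sample(frame, k=200)  # simple random sample from the frame

print(f"true share in whole population: {share_informal(population):.1%}")
print(f"share in the sampling frame:    {share_informal(frame):.1%}")
print(f"estimate from the sample:       {share_informal(sample):.1%}")
```

Larger samples drawn from this frame would settle near 40%, not the true 45%.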

In short, big national-level data like the Census and NSS are incredibly powerful tools. But wise researchers use them with open eyes, asking not just “what does the data say?” but also “who collected this, how, and what might they have missed?”

New Opportunities in Sociological Research (Kim)

For a long time, sociologists did things the old-school way: they interviewed people, handed out surveys, or sat in a village corner taking notes. All of that is still valuable. But as Yeunchul Kim argues in his Journal of Asian Sociology article (Vol. 48, no. 39, pp. 343–358), something new has arrived. Big data and computational tools are opening doors that used to be firmly shut.

What’s changing? The rise of big data analytics in sociology

Think about the digital breadcrumbs we all leave behind every single day: Google searches, Uber rides, Amazon purchases, Tinder swipes, Twitter rants, Netflix binges. That’s not just noise—that’s data. And it’s massive, messy, and full of patterns waiting to be found. Big data analytics is simply the use of powerful computers and clever algorithms to make sense of these giant piles of information.

Kim’s main argument is that sociologists can no longer afford to ignore these new sources. If you want to understand how people actually behave—not just how they say they behave in an interview—digital footprints are incredibly revealing.

Integration: computational methods meet traditional approaches

This isn’t about throwing away the old tools. Kim is very clear about that. The real magic happens when you combine computational methods with traditional sociology. For example:

Use network analysis (computational) to map who follows whom on social media, then do in-depth interviews (traditional) with a few key people to understand why those connections matter.

Analyze millions of tweets for sentiment during an election (computational), then run a small focus group (traditional) to understand the emotions behind the numbers.

It’s not either/or. It’s both/and.
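As a rough illustration of the computational half of the tweet example, here is a minimal lexicon-based sentiment count. The two word lists and the posts are tiny hypothetical stand-ins; actual studies use validated lexicons or trained classifiers, and this sketch is not Kim's method.

```python
import re

# Tiny hypothetical word lists; real studies use validated lexicons or trained classifiers.
POSITIVE = {"hope", "proud", "win", "great", "trust"}
NEGATIVE = {"angry", "fraud", "fear", "corrupt", "lose"}

def sentiment_score(post):
    """Positive word count minus negative word count for one post."""
    words = re.findall(r"[a-z']+", post.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def summarize(posts):
    """Count how many posts lean positive, negative, or neither."""
    scores = [sentiment_score(p) for p in posts]
    return {
        "positive": sum(s > 0 for s in scores),
        "negative": sum(s < 0 for s in scores),
        "neutral": sum(s == 0 for s in scores),
    }

posts = [
    "So proud of our candidate, great rally today!",
    "This election feels like a fraud. Angry and tired.",
    "Polling booth opens at 7am.",
]
print(summarize(posts))  # {'positive': 1, 'negative': 1, 'neutral': 1}
```

Word counting like this misses sarcasm, irony, and local slang, which is exactly why the focus group step is still needed.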

What new opportunities does this create?

Kim highlights several exciting possibilities:

Studying social networks at scale – In the past, mapping a social network meant asking a small group of people to name their friends. Now you can watch millions of connections in real time on LinkedIn or Instagram. Who is central? Who is isolated? How does information (or misinformation) spread?

Analyzing digital footprints – Every click, every pause before hitting “send,” every deleted draft tells a story. Digital footprints allow researchers to study behavior without interrupting it. People act naturally because they don’t know they’re being watched (which is both a superpower and an ethical nightmare).

Detecting large-scale behavioral patterns – Want to know if unemployment leads to depression? Traditionally, you’d survey a few hundred people. Now you might analyze search trends for “symptoms of sadness” or “how to find a therapist” across entire cities or countries. The scale is breathtaking.
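The questions "Who is central? Who is isolated?" can be answered on a small scale with a few lines of code. The sketch assumes a hypothetical list of "who follows whom" pairs and uses follower count (in-degree) as a simple centrality measure; it is one measure among many, and dedicated libraries such as NetworkX offer others like betweenness centrality.

```python
# Hypothetical "who follows whom" edges: (follower, followed).
follows = [
    ("asha", "ravi"), ("meena", "ravi"), ("john", "ravi"),
    ("ravi", "asha"), ("meena", "asha"),
]
accounts = {"asha", "ravi", "meena", "john", "farah"}  # farah has no ties at all

def follower_counts(edges, nodes):
    """In-degree: how many accounts follow each account."""
    counts = {n: 0 for n in nodes}
    for _follower, followed in edges:
        counts[followed] += 1
    return counts

def isolates(edges, nodes):
    """Accounts that neither follow nor are followed by anyone."""
    connected = {account for edge in edges for account in edge}
    return sorted(nodes - connected)

counts = follower_counts(follows, accounts)
most_central = max(counts, key=counts.get)
print("most followed:", most_central, counts[most_central])  # most followed: ravi 3
print("isolated:", isolates(follows, accounts))              # isolated: ['farah']
```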

But wait—there are serious risks too

Kim doesn’t wear rose-colored glasses. He warns that these new opportunities come with real dangers:

Privacy concerns – When researchers can access your location history, your shopping list, or your private messages, where is the line? Just because data exists doesn’t mean we have the right to use it. Kim argues that sociology risks becoming creepy if it doesn’t take privacy extremely seriously.

Ethical dilemmas – Traditional research requires informed consent. But how do you get consent from millions of people whose tweets you’re analyzing? Do you even need consent if the tweets are “public”? There’s no clear answer yet, and different countries have different rules. As Kim notes, ethics committees are struggling to keep up.

Data overload – More data isn’t always better. Sometimes it’s just more noise. Researchers can easily drown in terabytes of information without finding anything meaningful. Knowing what to ignore is just as important as knowing what to study. As Nicholas Walliman (in Research Methods: The Basics) might say, data without a clear question is just clutter.

Kim concludes that big data is not a replacement for traditional sociology. It’s an addition—a powerful, tricky, exciting addition. The best sociologists of the future will be bilingual: fluent in both the language of human stories (interviews, observation, case studies) and the language of digital traces (algorithms, networks, analytics). But they’ll also carry a strong sense of responsibility, because with so much data comes the power to harm as well as to heal.

Introduction to Digital Research

Think about how much of our lives now happens online: scrolling through Instagram, arguing on Twitter, shopping on Amazon, learning on YouTube, or just texting friends. If society has moved online, then research has to move online too. That’s exactly what digital research is all about: studying human behavior, culture, and relationships in digital spaces.

Digital Ethnography: The Heart of Digital Research

Traditional ethnography means packing your bags, moving to a village or a neighborhood, and living with people for months or years. But as Sarah Pink, Heather Horst, John Postill, Larissa Hjorth, Tania Lewis, and Jo Tacchi explain in Digital Ethnography: Principles & Practice (pages 1–18), you can now do ethnography in a Facebook group, a gaming server on Discord, a TikTok trend, or a Reddit forum. It’s the same deep curiosity—just a different field site.

What does digital ethnography study?

Pretty much any human activity that happens through screens:

Online communities – from parenting forums to fanfiction writers to crypto traders.

Social media – how people present themselves, how they argue, how they seek validation through likes and shares.

Digital practices – how we shop, date, work, protest, mourn, or even pray using apps and platforms.

The three core principles (according to Sarah Pink and her team)

Immersion – You can’t understand a digital world from the outside. You must dive in. That means creating an account, following the conversations, learning the inside jokes, understanding the unspoken rules. If you’re studying a gaming community, you probably need to play the game yourself.

Reflexivity – A fancy word with a simple meaning: constantly asking yourself, “How am I shaping what I’m seeing?” Your own biases, your own identity, and your own presence in that online space change things. If you join a political chat group as a researcher, people might act differently. A reflexive researcher admits this instead of pretending it doesn’t happen.

Ethical sensitivity – Online spaces are tricky. Is a public tweet fair game for research? What about a private WhatsApp group? People might not even know they’re being studied. Digital ethnographers must think carefully about consent, anonymity, and whether they’re invading spaces that feel private to their members.

What tools do digital researchers use?

It’s not just lurking and taking notes anymore:

Online surveys – reaching thousands of people quickly through platforms like Google Forms or SurveyMonkey.

Digital archives – old chat logs, deleted websites, preserved tweets. Historians of the future will love these.

Social media analytics – counting likes, shares, retweets, hashtags, and network connections. Who talks to whom? Which ideas go viral?
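Counting hashtags is one of the simplest forms of social media analytics. The sketch assumes the posts have already been collected (how you gather them depends on each platform's rules and tools, which are not covered here); the posts and the function name are hypothetical. Tags are lower-cased so that differently capitalized versions of the same tag are counted together.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")  # captures the tag text after '#'

def top_hashtags(posts, n=3):
    """Count hashtags case-insensitively and return the n most common as (tag, count) pairs."""
    counts = Counter(tag.lower() for post in posts for tag in HASHTAG.findall(post))
    return counts.most_common(n)

posts = [
    "Join us at the lake this Sunday #SaveTheLake #Solidarity",
    "Day 12 of the cleanup and still going #savethelake",
    "Great match tonight #cricket",
    "#Solidarity with everyone cleaning the shore #SaveTheLake",
]
print(top_hashtags(posts, n=2))  # [('savethelake', 3), ('solidarity', 2)]
```

Raw counts show which ideas spread, not why; answering "why" still calls for the immersion and interviews described above.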

Where is all this actually useful?

Youth culture – How do teenagers use Instagram stories to build status? What does “cancel culture” mean to a 16-year-old? Digital ethnography lets researchers see youth worlds that adults rarely enter.

Political mobilization – Remember the protests on social media? Digital research can trace exactly how a hashtag becomes a movement, who the key influencers are, and when outrage turns into action.

Digital inequality – Not everyone has equal access. Some people have fiber-optic broadband; others struggle with a patchy 2G connection on last year’s phone. Some are fluent in digital skills; others feel lost. Digital research shines a light on these gaps—who is included and who is left behind in the digital age.

As Kim points out in his Journal of Asian Sociology piece, digital research opens exciting new doors that traditional methods simply can’t reach. But it also comes with its own headaches: fast-changing platforms, data that can disappear overnight, and the constant challenge of figuring out what “real” even means online. Still, for anyone trying to understand society today, ignoring the digital world is no longer an option.
