Statistical Methodology

Many readers ask how I got the statistics that I present in the different articles at Asian-Nation. In other words, readers want to know how valid and reliable these numbers are. This page explains the data and methodology I use and will hopefully show you that these statistics are in fact trustworthy.


Data Source

Whether the statistics relate to Asian American demographics, intermarriage, population characteristics, etc., there are some elements that they have in common. The first is the source: the U.S. Census Bureau. For those of you who are not familiar with the Census Bureau, it is the federal government agency charged with conducting the national census every ten years. It also conducts smaller surveys on a more frequent basis. Read the Bureau's statement on data quality to learn about how it ensures its data are valid and reliable.

The specific data source that I generally use is the five-percent Public Use Microdata Samples (PUMS) (note: large 5 MB download) or the American Community Survey (ACS). Both the PUMS and ACS represent a stratified random sample created by subsampling all households that received the census long form. The five-percent PUMS (a 1 in 20 sample) has a sample size of about 14 million respondents while the 2006 ACS is a 1 in 100 sample with a sample size of about 3 million respondents.
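As a quick sanity check on those sample sizes, multiplying each sample by the inverse of its sampling fraction should land close to the total U.S. population. The Python sketch below does only that arithmetic. The function name is made up for illustration, and it ignores the record-level weights that the actual files supply, so treat it as a rough check rather than a population estimate.

```python
# Rough consistency check: respondents times the inverse sampling fraction
# should be near the size of the population the sample stands in for.
# The respondent counts are the approximate figures quoted above.
samples = {
    "5% PUMS (1 in 20)": (14_000_000, 20),
    "2006 ACS (1 in 100)": (3_000_000, 100),
}


def implied_population(respondents, one_in_n):
    """Scale a sample count up by the inverse of its sampling fraction."""
    return respondents * one_in_n


for name, (respondents, one_in_n) in samples.items():
    total = implied_population(respondents, one_in_n)
    print(f"{name}: about {total:,} people")
```

Both results come out in the neighborhood of 280 to 300 million, which is consistent with the size of the U.S. population in those years.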

As such, both the PUMS and ACS samples represent as accurate a picture of the U.S. population as possible. Very few datasets can match the PUMS in terms of sample size, comprehensiveness, coast-to-coast national coverage, and overall reliability, while the ACS provides the most up-to-date reliable Census dataset available. That is why both are the overwhelming choices for general demographic research among social scientists.


Creating The Statistics

However, PUMS and ACS data do not come as pre-made tables or existing analyses, as most other Census data do. Instead, both datasets are in "raw" form: they contain only data in a spreadsheet-like format and require a statistical analysis program to tabulate. I use SPSS, but other choices include SAS, Stata, Minitab, or Microsoft Excel. Using one of these programs is standard procedure for anybody tabulating statistics, since it is obviously much faster than doing it by hand with paper and a calculator.


In most cases, each racial/ethnic group represented in the tables consists of respondents who gave that particular racial/ethnic identity as their first choice. In other words, the group includes those who identify as just "Chinese" (for example) as well as those who identify as "Chinese White," or "Chinese White Hispanic," etc. as long as "Chinese" was their first racial/ethnic identity response.

Also, in most cases, the sample only includes those who are at least 25 years of age. This is standard practice in demographic research, since it limits the sample to the target population: adults who are old enough to have attained the outcomes being measured. In other words, you can't expect a five-year-old to have a college degree yet, so he and others like him who are not part of the target population are excluded.
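The two selection rules above (first racial/ethnic response, and age 25 or older) can be sketched in a few lines of Python. This is only an illustration, not the SPSS syntax used for the site: the records, field names, and helper functions are all hypothetical.

```python
# Hypothetical respondent records; field names are illustrative only.
respondents = [
    {"id": 1, "age": 34, "race_responses": ["Chinese"]},
    {"id": 2, "age": 41, "race_responses": ["Chinese", "White"]},
    {"id": 3, "age": 29, "race_responses": ["White", "Chinese"]},
    {"id": 4, "age": 5, "race_responses": ["Chinese"]},
]

MIN_AGE = 25  # lower age bound described in the text


def first_identity(person):
    """Return the first racial/ethnic identity the respondent listed."""
    return person["race_responses"][0]


def in_group(person, group):
    """True if the person is old enough and listed `group` first."""
    return person["age"] >= MIN_AGE and first_identity(person) == group


chinese_sample = [p["id"] for p in respondents if in_group(p, "Chinese")]
print(chinese_sample)
```

Here respondents 1 and 2 are kept. Respondent 3 listed "White" first, and respondent 4 is under 25, so both are left out.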

In creating statistics that summarize demographic characteristics (such as in the Demographics, Model Minority, and Population articles), I define the various measures as follows:

Socioeconomic Measure: Explanation

Not Proficient in English: Respondent does not speak English at all or speaks English but "not well."

Less than High School: Respondent does not have a high school diploma or equivalent.

College Degree: Respondent has at least a college degree.

Advanced Degree: Respondent has a professional (law or medical) or doctoral degree.

Median Personal Income: Half of all personal incomes for this racial/ethnic group are above this value and half are below it.

Median Family Income: Half of all family incomes for this racial/ethnic group are above this value and half are below it.

Living in Poverty: Respondent lives below the poverty line (based on federal government calculations of marital status, number of dependent children, and income).

Public Assistance: Respondent receives some form of cash welfare benefit(s).

Married, Spouse Present: Respondent is married and lives with spouse in the same household.

In Labor Force: Respondent is in the labor force (currently employed or actively looking for employment).

High Skill Occupation: Respondent has an executive, professional, technical, or upper management occupation.

Median SEI Score: Occupational prestige score for respondent's occupation. Half of all SEI scores for this racial/ethnic group are above this value and half are below it.
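To make the definitions above concrete, here is a minimal Python sketch of how a percentage measure and a median measure could be computed for one group. The four records, the field names, and the `percent` helper are invented for illustration, and the sketch treats every respondent equally instead of applying the weights that come with the real data.

```python
from statistics import median

# Hypothetical records for one racial/ethnic group (illustrative values only).
group = [
    {"education": "college", "personal_income": 52000},
    {"education": "high_school", "personal_income": 31000},
    {"education": "advanced", "personal_income": 88000},
    {"education": "less_than_hs", "personal_income": 18000},
]

# "College Degree" means at least a college degree, so advanced degrees count.
COLLEGE_OR_MORE = {"college", "advanced"}


def percent(records, predicate):
    """Unweighted percentage of records for which predicate(record) is true."""
    if not records:
        return 0.0
    matches = sum(1 for r in records if predicate(r))
    return 100.0 * matches / len(records)


pct_college = percent(group, lambda r: r["education"] in COLLEGE_OR_MORE)
pct_less_than_hs = percent(group, lambda r: r["education"] == "less_than_hs")

# Median: half of the incomes fall above this value and half below it.
median_income = median(r["personal_income"] for r in group)

print(pct_college, pct_less_than_hs, median_income)
```

With an even number of records, `statistics.median` returns the midpoint of the two middle incomes.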


Calculating Intermarriage Statistics

However, the statistics that most readers want to know more about are the ones that describe intermarriage rates among Asian Americans, especially within the "USR + USR Only" model that includes only marriages in which both spouses are U.S.-raised. Here is how I calculate those particular numbers.

Once I import the raw data into SPSS, I narrow down the sample to (1) all respondents of any race who are married and then (2) all respondents of any race who are 1.5 generation (immigrated to the U.S. at age 13 or younger) or U.S.-born. Now comes the tricky part. At this point, the data is still organized by single respondent -- a single person has his/her own line of variables.

In order to organize the data by married couple, I first split the dataset into two files -- one includes only husbands and the other includes only wives. Second, I rename the variables in the wife file so that they do not collide with the husbands' variables. Then, using the unique serial numbers that match up husbands and wives, I recombine the two files into one merged data file (using SPSS's "Merge Files, Add Variables" function) so that each husband and wife share one line of variables, instead of having their own separate lines.
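The filter, split, rename, and merge steps just described can be sketched in plain Python. This is a stand-in for the SPSS procedure, not a copy of it: the records and field names are hypothetical, and the sketch assumes that a serial number identifies exactly one married couple.

```python
# Hypothetical person-level records; all field names are illustrative.
people = [
    {"serial": 101, "sex": "M", "married": True, "us_born": True,
     "age_at_arrival": None, "race": "Chinese"},
    {"serial": 101, "sex": "F", "married": True, "us_born": False,
     "age_at_arrival": 9, "race": "White"},
    {"serial": 102, "sex": "M", "married": True, "us_born": True,
     "age_at_arrival": None, "race": "White"},
    {"serial": 102, "sex": "F", "married": True, "us_born": False,
     "age_at_arrival": 27, "race": "Korean"},
    {"serial": 103, "sex": "F", "married": False, "us_born": True,
     "age_at_arrival": None, "race": "Filipino"},
]


def is_us_raised(person):
    """U.S.-born, or 1.5 generation (arrived at age 13 or younger)."""
    arrival = person["age_at_arrival"]
    return person["us_born"] or (arrival is not None and arrival <= 13)


# Step 1: keep only married, U.S.-raised respondents.
eligible = [p for p in people if p["married"] and is_us_raised(p)]

# Step 2: split into husbands and wives, keyed by the shared serial number.
husbands = {p["serial"]: p for p in eligible if p["sex"] == "M"}

# Step 3: rename the wife's variables so they do not overwrite the husband's.
wives = {
    p["serial"]: {f"wife_{key}": value for key, value in p.items() if key != "serial"}
    for p in eligible
    if p["sex"] == "F"
}

# Step 4: merge on serial so each couple occupies a single record.
couples = [
    {**husband, **wives[serial]}
    for serial, husband in husbands.items()
    if serial in wives
]

print(couples)
```

Only couple 101 survives: the wife in household 102 arrived at age 27, so she is filtered out in step 1 and her husband has no match in step 4, which mirrors the "USR + USR Only" restriction.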

Now that the data is organized by married couples, it is much easier to see on one line the race/ethnicity of the husband and that of the wife. Then I recode each possible combination of husband's and wife's race/ethnicity, and finally run cross tabulations to get the final proportions. For readers who see the statistics about interracial marriage and still doubt that they are valid or reliable, I also present a list of similar studies that arrive at similar data and patterns.
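The recoding and cross-tabulation step can be sketched as follows, again in Python and not in SPSS. The four couples are made up purely to show the mechanics, the function name is hypothetical, and the resulting percentages are not real intermarriage rates.

```python
from collections import Counter

# Hypothetical couple-level records, one line per married couple.
couples = [
    {"husband_race": "Chinese", "wife_race": "Chinese"},
    {"husband_race": "Chinese", "wife_race": "White"},
    {"husband_race": "Chinese", "wife_race": "Chinese"},
    {"husband_race": "White", "wife_race": "Chinese"},
]

# Recode: count each (husband, wife) combination.
combinations = Counter((c["husband_race"], c["wife_race"]) for c in couples)


def wife_distribution(records, husband_race):
    """Percent of husbands of one race married to wives of each race."""
    counts = Counter(
        c["wife_race"] for c in records if c["husband_race"] == husband_race
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {race: 100.0 * n / total for race, n in counts.items()}


chinese_husbands = wife_distribution(couples, "Chinese")
print(combinations)
print(chinese_husbands)
```

Running the same function with the roles of husband and wife swapped would give the corresponding row percentages for wives.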

However, in the end, some people will refuse to believe anything that does not fit with their own perception of reality, no matter how many "facts," stats, or empirical research you present to them. In fact, sociologists call this process making an "exception fallacy" -- concluding that one's own personal observations can be generalized to the aggregate level. This happens when people say, "Those stats can't be true because that's not what I see around me" when in fact, they can't accept that what they see around them is only one localized, particular sample, rather than a national, aggregate-level population.

In large metro areas such as Los Angeles/Orange County, New York City, etc. that contain a large Asian American population, interracial marriage rates tend to be lower because the pool of potential marriage partners who are Asian American is obviously larger. Therefore, people who say, "The interracial marriage rate that I see all around me is lower" probably live in these large metro areas, and that is what they are perceiving. But again, they only represent one particular sample location. The national, aggregate-level coverage of Census data is indisputable.

I hope these descriptions show that the data I present throughout Asian-Nation are valid and reliable. If you would like to learn more about the practice and methods of demographic research, be sure to browse through the books that I list in the left column under "Recommended for Further Reading." And as always, feel free to contact me if you have any further questions.



Author Citation

Copyright © 2001- by C.N. Le. Some rights reserved. Creative Commons License

Suggested reference: Le, C.N. . "Statistical Methodology" Asian-Nation: The Landscape of Asian America. <methodology.shtml> ().

