The Search for Significance: A Crash Course in Statistical Significance Using ACS 2007

If we told you that the American Community Survey (ACS) found that 26 percent of Hoosier women between the ages of 35 and 44 had a bachelor's degree or more compared to just 23 percent of men, how would you know whether that is a real difference in educational attainment (that is, a statistically significant finding) or just a result of random sampling error? This article provides a brief tutorial on calculating statistical significance for those who want to use ACS data accurately without becoming statisticians.¹

Margins of Error

As with any survey, margins of error are critical—particularly as the size of the population in question decreases (because that typically increases the margin of error). A large margin of error makes the survey estimate less reliable, which can negatively affect your analysis and comparisons. The ACS reports the margin of error for the 90 percent confidence level. Therefore, if we look at the first row in Table 1, we can say that we're 90 percent confident that the number of Hoosier men between the ages of 25 and 34 is between 422,281 and 427,919 (that range—which is the estimate plus or minus the margin of error—is known as the confidence interval). In other words, there's only a 10 percent chance that the actual number of men in that age group falls outside of that range.

Table 1: Educational Attainment and Confidence Intervals for Indiana Men and Women, 2007

Subject	Male Estimate	Male Margin of Error	Male Confidence Interval	Female Estimate	Female Margin of Error	Female Confidence Interval
Population 25 to 34 years	425,100	+/-2,819	422,281-427,919	412,591	+/-2,879	409,712-415,470
Percent high school graduate or higher	86.7	+/-0.8	85.9-87.5	89.3	+/-0.7	88.6-90.0
Percent bachelor's degree or higher	23.4	+/-1.0	22.4-24.4	28.1	+/-1.0	27.1-29.1
Population 35 to 44 years	447,489	+/-2,440	445,049-449,929	444,091	+/-2,585	441,506-446,676
Percent high school graduate or higher	87.1	+/-0.9	86.2-88.0	90.5	+/-0.8	89.7-91.3
Percent bachelor's degree or higher	22.8	+/-0.9	21.9-23.7	25.7	+/-0.9	24.8-26.6
Population 45 to 64 years	796,162	+/-2,157	794,005-798,319	824,930	+/-2,596	822,334-827,526
Percent high school graduate or higher	88.1	+/-0.5	87.6-88.6	89.0	+/-0.5	88.5-89.5
Percent bachelor's degree or higher	24.2	+/-0.6	23.6-24.8	21.5	+/-0.7	20.8-22.2
Population 65 years and over	328,860	+/-1,151	327,709-330,011	464,296	+/-1,431	462,865-465,727
Percent high school graduate or higher	75.3	+/-1.2	74.1-76.5	73.7	+/-0.9	72.8-74.6
Percent bachelor's degree or higher	18.9	+/-0.9	18.0-19.8	11.1	+/-0.6	10.5-11.7

Source: IBRC, using data from the U.S. Census Bureau American Community Survey
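
The confidence interval arithmetic described above can be sketched in a few lines of Python. The function name `confidence_interval` is an illustrative choice, not part of any Census Bureau tool; the figures come from the first row of Table 1.

```python
def confidence_interval(estimate, margin_of_error):
    """Return the (lower, upper) bounds: the estimate minus and plus the margin of error."""
    return estimate - margin_of_error, estimate + margin_of_error


# Indiana men ages 25 to 34 (first row of Table 1)
lower, upper = confidence_interval(425_100, 2_819)
print(f"{lower:,} to {upper:,}")  # 422,281 to 427,919
```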

One might think that this is all the information we need to determine statistical significance: As long as the confidence intervals of two numbers don't overlap, we're good to go, right? Unfortunately, it is a bit more complex than that, and the Census Bureau discourages the use of confidence intervals alone to determine a value's statistical significance. Instead, we should calculate z-scores, which are standardized figures that allow us to make comparisons.
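
To see why overlap alone can mislead, consider the high school attainment figures for the population 65 and over in Table 1. The sketch below is illustrative Python with hypothetical function names, and it borrows the z-score calculation covered in the next section. The two confidence intervals overlap, yet the z-score still exceeds 1.645, the threshold for the 90 percent confidence level.

```python
import math

Z_90 = 1.645  # multiplier behind the ACS 90 percent margins of error


def intervals_overlap(est_a, moe_a, est_b, moe_b):
    """True when the two 90 percent confidence intervals share any values."""
    return (est_a - moe_a) <= (est_b + moe_b) and (est_b - moe_b) <= (est_a + moe_a)


def z_score(est_a, moe_a, est_b, moe_b):
    """Difference of the estimates divided by the standard error of that difference."""
    se_a = moe_a / Z_90
    se_b = moe_b / Z_90
    return (est_a - est_b) / math.sqrt(se_a**2 + se_b**2)


# Percent high school graduate or higher, population 65 years and over
male = (75.3, 1.2)
female = (73.7, 0.9)

print(intervals_overlap(*male, *female))  # True: 74.1-76.5 overlaps 72.8-74.6
print(round(z_score(*male, *female), 2))  # 1.75, beyond the 1.645 threshold
```

Relying on overlap alone would have missed a difference that passes the test at the 90 percent confidence level.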

Three Steps to Determining Significance

The first step in determining statistical significance is to convert the margin of error into a standard error. This calculation varies depending on whether we are using numbers directly from published ACS tables or have done some intermediate calculations on our own, such as calculating a percentage. Since our data do not contain any derived estimates, all we need to do for this step is divide the margin of error by 1.645.²
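
As a rough sketch of this first step, the Python function below divides a published margin of error by the appropriate factor. The name `standard_error` and the `year` parameter are illustrative assumptions; the 1.645 and 1.65 divisors come from note 2.

```python
def standard_error(margin_of_error, year=2007):
    """Convert a published ACS 90 percent margin of error to a standard error.

    Applies only to estimates taken directly from published tables,
    not to derived estimates such as percentages you compute yourself.
    """
    divisor = 1.645 if year >= 2006 else 1.65
    return margin_of_error / divisor


print(round(standard_error(2_819)))   # 1714 (men ages 25 to 34)
print(round(standard_error(0.8), 3))  # 0.486 (percent high school graduate or higher)
```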

The second step is to calculate the z-score itself (see Table 2). If we let A represent the male estimates, use B for the female estimates and use SE(A) and SE(B) for the standard errors of those respective estimates, the formula is as follows:

$$Z = \frac{A - B}{\sqrt{SE(A)^2 + SE(B)^2}}$$

Table 2: Comparing Male and Female Educational Attainment Z-Scores for Indiana, 2007

Subject	Male (A) Estimate	Male (A) Margin of Error	Male (A) Standard Error	Female (B) Estimate	Female (B) Margin of Error	Female (B) Standard Error	Z-Score Comparing Male and Female Populations*
Population 25 to 34 years	425,100	2,819	1,714	412,591	2,879	1,750	5.11
Percent high school graduate or higher	86.7	0.8	0.486	89.3	0.7	0.426	-4.02
Percent bachelor's degree or higher	23.4	1.0	0.608	28.1	1.0	0.608	-5.47
Population 35 to 44 years	447,489	2,440	1,483	444,091	2,585	1,571	1.57
Percent high school graduate or higher	87.1	0.9	0.547	90.5	0.8	0.486	-4.64
Percent bachelor's degree or higher	22.8	0.9	0.547	25.7	0.9	0.547	-3.75
Population 45 to 64 years	796,162	2,157	1,311	824,930	2,596	1,578	-14.02
Percent high school graduate or higher	88.1	0.5	0.304	89.0	0.5	0.304	-2.09
Percent bachelor's degree or higher	24.2	0.6	0.365	21.5	0.7	0.426	4.82
Population 65 years and over	328,860	1,151	700	464,296	1,431	870	-121.32
Percent high school graduate or higher	75.3	1.2	0.729	73.7	0.9	0.547	1.75
Percent bachelor's degree or higher	18.9	0.9	0.547	11.1	0.6	0.365	11.86

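Note: All z-scores are significant at the 99 percent confidence level except those for the population 35 to 44 years and for percent high school graduate or higher among those 45 to 64 years and 65 years and over.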
Source: IBRC, using data from the U.S. Census Bureau American Community Survey
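
The first two steps can be combined in a short Python sketch. It assumes the formula z = (A - B) / sqrt(SE(A)^2 + SE(B)^2), which reproduces the z-scores in Table 2; the function name and the tuple layout are illustrative choices.

```python
import math


def z_score(est_a, moe_a, est_b, moe_b, divisor=1.645):
    """z = (A - B) / sqrt(SE(A)^2 + SE(B)^2), with SE = margin of error / divisor."""
    se_a = moe_a / divisor
    se_b = moe_b / divisor
    return (est_a - est_b) / math.sqrt(se_a**2 + se_b**2)


# (label, male estimate, male MOE, female estimate, female MOE) from Table 2
rows = [
    ("Population 25 to 34 years", 425_100, 2_819, 412_591, 2_879),
    ("Percent bachelor's degree or higher, 25 to 34", 23.4, 1.0, 28.1, 1.0),
    ("Percent bachelor's degree or higher, 35 to 44", 22.8, 0.9, 25.7, 0.9),
]

for label, a, moe_a, b, moe_b in rows:
    print(f"{label}: {z_score(a, moe_a, b, moe_b):.2f}")
# Prints 5.11, -5.47 and -3.75, matching Table 2
```

A negative z-score simply means the female estimate (B) is larger than the male estimate (A); only the absolute value matters for the significance test.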

Here's an important note for Excel users: Percentage data downloaded from American FactFinder are formatted as percents (22.8%), which Excel stores in decimal form (0.228). The margins of error, however, are stored as regular numbers (0.9). Mixing those two formats yields meaningless z-scores. Therefore, always convert percentages to numeric format (22.8) so they are in the same units as the margin of error before calculating the z-score.
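
The effect of mixing units is easy to demonstrate outside of Excel. The Python sketch below is illustrative only (the variable and function names are assumptions); it uses the bachelor's degree figures for ages 35 to 44 and computes the z-score once with the estimates left in decimal form and once with them converted to percentage points.

```python
import math

SE = 0.9 / 1.645  # standard error of both estimates, in percentage points


def z_score(a, b, se_a, se_b):
    """z = (A - B) / sqrt(SE(A)^2 + SE(B)^2)."""
    return (a - b) / math.sqrt(se_a**2 + se_b**2)


# Estimates left in decimal form while the standard errors are in percentage points
mixed = z_score(0.228, 0.257, SE, SE)

# Estimates converted to percentage points first
consistent = z_score(0.228 * 100, 0.257 * 100, SE, SE)

print(round(mixed, 2))       # -0.04, a meaningless result
print(round(consistent, 2))  # -3.75, matching Table 2
```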

The third step is to use the z-score to determine if the difference between the genders is significant or if random chance can explain the difference. Table 3 provides the z-score thresholds with their corresponding confidence levels. Essentially, the larger the absolute value of the z-score, the more confident we are that a real difference in the estimates exists. Looking back at Table 2, we find that most of the values (nine of the twelve) are significant at the 99 percent level, which means that we're 99 percent sure that the difference is not due to random chance.

Table 3: Z-Scores and Levels of Significance

If …	Then the difference between A and B is …
z < -1.645 or z > 1.645	Significant at the 90 percent confidence level
z < -1.96 or z > 1.96	Significant at the 95 percent confidence level
z < -2.576 or z > 2.576	Significant at the 99 percent confidence level

Source: U.S. Census Bureau American Community Survey
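
The lookup in Table 3 can be sketched as a small Python function. The name `significance_level` and its return values are illustrative; the thresholds are those listed in Table 3.

```python
def significance_level(z):
    """Return the highest Table 3 confidence level (in percent) that z meets, or None."""
    thresholds = [(2.576, 99), (1.96, 95), (1.645, 90)]
    for cutoff, level in thresholds:
        if abs(z) > cutoff:
            return level
    return None


print(significance_level(-3.75))  # 99   (bachelor's degree or higher, ages 35 to 44)
print(significance_level(-2.09))  # 95   (high school graduate or higher, ages 45 to 64)
print(significance_level(1.75))   # 90   (high school graduate or higher, 65 and over)
print(significance_level(1.57))   # None (population 35 to 44 years)
```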

For more information, download the Census Bureau's instructions on statistical testing and ACS available at www.census.gov/programs-surveys/acs/guidance.html.

Notes

1. Data in this article are extracted from Table S1501 in the 2007 American Community Survey dataset, available via American FactFinder at https://data.census.gov.
2. The denominator is 1.645 for ACS data from 2006 and later; for ACS data from 2005 or earlier, use 1.65. For the Census Bureau's recommended calculations for derived estimates, visit http://census.gov/programs-surveys/acs/guidance.html.

Rachel Justis, Geodemographic Analyst
Indiana Business Research Center, Kelley School of Business, Indiana University