Data Detectives: American Community Survey release of multiyear data

Monday, December 8, 2008

On December 9, 2008, the Census Bureau will release the first set of three-year American Community Survey data for all geographies with populations of 20,000 or more. The release will provide the first look at detailed socioeconomic and housing characteristics for geographies with populations between 20,000 and 64,999 since Census 2000. The type of data released and geographies covered can be found here.

Different from a point-in-time estimate

Before I talk about multiyear estimates, it’s important to understand the concept of a period estimate because all ACS estimates are period estimates.

The ACS produces period estimates of socioeconomic and housing characteristics. It is designed to provide estimates that describe the average characteristics of an area over a specific time period. In the case of ACS one-year estimates, the period is the calendar year. For example, the 2007 ACS data describe the population and housing characteristics of an area from January 1, 2007 through December 31, 2007, not for any specific day within the year.

A period estimate is different from a point-in-time estimate. A point-in-time estimate is designed to measure characteristics as of a certain date or narrow time period. For example, the purpose of the decennial census is to count the population living in the United States on a specific date, which is traditionally April 1. Although decennial census data are actually collected over several months, they are designed to provide a snapshot of the U.S. population as of April 1.

Understanding Multiyear Estimates in the American Community Survey

A multiyear estimate is simply a period estimate that encompasses more than one calendar year. In the case of ACS multiyear estimates, the period is either three or five calendar years.

While a one-year estimate includes information collected from independent monthly samples over a 12-month period, a three-year estimate represents data collected from independent samples over a 36-month period, and a five-year estimate includes data collected over a 60-month period. For example, the 2005-2007 ACS three-year estimates describe the population and housing characteristics of an area for the period January 1, 2005 through December 31, 2007, not for any specific day, month, or year within that time period.

The types of ACS estimates published for a particular area or population group are based on established population thresholds. Geographic areas with at least 65,000 people will receive one-, three-, and five-year ACS estimates. Areas with 20,000 or more people will receive three- and five-year estimates. There are a few exceptions to this rule, however. ZIP code tabulation areas, census tracts, and block groups, regardless of their population size, will receive only five-year estimates. Areas with fewer than 20,000 people, down to the block group level, will also receive only five-year estimates.
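The thresholds above can be written down as a small lookup. This is only a sketch in Python of the rule as described in this post; the function name and the geography-type strings are my own invention, not anything the Census Bureau publishes.

```python
def acs_estimates_available(population, geography_type="county"):
    """Return the ACS estimate periods (in years) published for an area.

    Encodes the thresholds described in this post. The geography_type
    labels are hypothetical strings chosen for this sketch.
    """
    # These geographies receive only five-year estimates, whatever their size.
    five_year_only = {"zcta", "census tract", "block group"}
    if geography_type.lower() in five_year_only:
        return [5]
    if population >= 65000:
        return [1, 3, 5]
    if population >= 20000:
        return [3, 5]
    return [5]


print(acs_estimates_available(70000))                   # large county
print(acs_estimates_available(25000, "place"))          # mid-sized place
print(acs_estimates_available(70000, "census tract"))   # exception applies
```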

ACS estimates based on data collected from 2005-2007 should not be labeled "2006" or "2007" estimates. Multiyear estimates do not represent any one year or the midpoint of a period. The correct labeling for a multiyear estimate is: "The child poverty rate for the 2005-2007 period was X percent."

Perhaps it is obvious, but multiyear estimates must be used when no one-year estimate is available. Unless a geographic area has a population of 65,000 or more, that geography will be reliant on multiyear estimates.

Multiyear estimates should also be used when analyzing data for small population groups because of the higher margins of error associated with them. An example of a small population group could be "Families with Female Householder with own Children under 18". The choice is more than simply one-year versus multiyear estimates, however, because for many areas there will also be the choice of which multiyear estimate to use, three- or five-year.

For small areas, only five-year estimates are released, but for larger areas, each annual release will provide one-, three-, and five-year estimates. For example, in 2010, there will be three sets of commuting data for San Diego County – one-year estimates for 2009, three-year estimates reflecting 2007-2009, and five-year estimates for the period of 2005-2009. Users need to decide which is the most appropriate for their needs.

In making this choice, one needs to consider the tradeoff between currency and reliability. The one-year estimates for an area reflect the most current data, but they tend to have higher margins of error than the three- and five-year estimates because they are based on a smaller sample.

The three-year and five-year estimates for an area have larger samples and smaller margins of error than the one-year estimates, but they are less current because the larger samples include data that were collected in earlier years. The main advantage of using multiyear estimates is the increased statistical reliability for smaller geographic areas and small population groups.

There are no hard-and-fast rules on choosing between one-, three-, and five-year data, but the margins of error provided with ACS data can help data users decide on the tradeoff between currency and reliability.
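One common way to put the published margins of error to work is to convert each one to a coefficient of variation (the standard error as a percentage of the estimate) and compare across the one-, three-, and five-year figures. The Python sketch below assumes the margins of error are published at the 90 percent confidence level, so the standard error is the margin of error divided by 1.645. The estimates and margins of error are made-up numbers for illustration, and what counts as "reliable enough" is left to the user.

```python
Z_90 = 1.645  # multiplier for a 90 percent confidence level


def standard_error(moe):
    """Convert a 90 percent margin of error to a standard error."""
    return moe / Z_90


def coefficient_of_variation(estimate, moe):
    """Standard error expressed as a percentage of the estimate."""
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    return 100.0 * standard_error(moe) / abs(estimate)


# Hypothetical (estimate, margin of error) pairs for one area.
candidates = {
    "1-year": (1200, 450),
    "3-year": (1150, 260),
    "5-year": (1100, 200),
}

for label, (estimate, moe) in candidates.items():
    cv = coefficient_of_variation(estimate, moe)
    print(f"{label}: estimate={estimate}, MOE=+/-{moe}, CV={cv:.1f}%")
```

A lower coefficient of variation means a more reliable estimate, so the printout makes the currency-versus-reliability tradeoff visible side by side.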

Only compare the same type of estimate:
1-year estimates to other 1-year estimates
3-year estimates to other 3-year estimates
5-year estimates to other 5-year estimates

When comparing estimates from two multiyear periods, it is easier to make comparisons between non-overlapping periods. This is because the difference between two estimates of overlapping periods is driven by the non-overlapping years. To illustrate what I mean, consider the 2005-2007 period and the 2007-2009 period estimates. Both contain the year 2007. Thus, the difference between the 2005-2007 and 2007-2009 estimates is determined by how the 2005 and 2006 data differ from the 2008 and 2009 data.
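A little arithmetic shows why the shared year drops out. The Python sketch below makes a simplifying assumption: it treats a three-year estimate as the plain average of three annual values. The actual ACS pools and reweights the monthly samples, so this is an approximation of the idea, not the Bureau's method, and the annual values are invented.

```python
def period_average(annual, start, end):
    """Plain average of annual values from start through end, inclusive."""
    years = range(start, end + 1)
    return sum(annual[y] for y in years) / len(years)


# Hypothetical annual values of some rate, in percent.
annual = {2005: 10.0, 2006: 11.0, 2007: 12.0, 2008: 14.0, 2009: 15.0}

early = period_average(annual, 2005, 2007)
late = period_average(annual, 2007, 2009)
difference = late - early

# The same difference computed from the non-overlapping years only.
non_overlap = ((annual[2008] + annual[2009]) - (annual[2005] + annual[2006])) / 3

print(f"2007-2009 minus 2005-2007: {difference:.3f}")
print(f"from non-overlapping years only: {non_overlap:.3f}")

# Changing the shared year leaves the difference untouched.
altered = dict(annual)
altered[2007] = 99.0
altered_difference = (period_average(altered, 2007, 2009)
                      - period_average(altered, 2005, 2007))
print(f"after changing 2007: {altered_difference:.3f}")
```

All three printed values agree, because the 2007 term appears in both averages and cancels when one is subtracted from the other.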

In this example, the simplest comparison is between the 2005-2007 estimate and the 2008-2010 estimate, which do not include any overlapping years.
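The comparison guidance above (same type of estimate, preferably non-overlapping periods) can be turned into a quick check. This Python sketch uses a hypothetical `Period` class and helper names of my own; it also builds the period label recommended earlier for multiyear estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """An ACS estimate period given by its first and last calendar year."""
    start: int
    end: int

    @property
    def length(self):
        return self.end - self.start + 1

    @property
    def label(self):
        # One-year estimates carry a single year; multiyear ones carry the range.
        if self.length == 1:
            return str(self.start)
        return f"{self.start}-{self.end}"


def same_type(a, b):
    """True when both periods span the same number of years."""
    return a.length == b.length


def overlaps(a, b):
    """True when the two periods share at least one calendar year."""
    return a.start <= b.end and b.start <= a.end


def comparison_note(a, b):
    if not same_type(a, b):
        return f"{a.label} vs {b.label}: different estimate types, do not compare"
    if overlaps(a, b):
        return f"{a.label} vs {b.label}: same type but overlapping, interpret with care"
    return f"{a.label} vs {b.label}: same type, non-overlapping"


print(comparison_note(Period(2005, 2007), Period(2007, 2009)))
print(comparison_note(Period(2005, 2007), Period(2008, 2010)))
print(comparison_note(Period(2007, 2007), Period(2005, 2007)))
```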

Global differences exist between the ACS and Census 2000. These include differences in residence rules, universes, and reference periods. For example, the ACS uses a "two-month" residence rule, which counts anyone living for more than two months in the sample unit when the unit is interviewed. Census 2000, on the other hand, used a "usual residence" rule, defined as the place where a person lives or stays most of the time.

The reference periods of the ACS and Census 2000 also differ. For example, the ACS asks respondents to report their income for the 12 months preceding the interview date, while Census 2000 asked for a respondent’s income in calendar year 1999.

Also, as discussed earlier, the ACS produces period estimates whereas Census 2000 data are interpreted to be a snapshot of April 1, 2000.

The Census Bureau subject matter specialists have considered all of these differences and have determined that for most population and housing subjects, comparisons can be made. Further information about comparing measures from the ACS and Census 2000 can be found here.

There are other subtleties of ACS data that I won't touch on here, such as controlling to county population estimates.

The ACS Web Site is offering handbooks providing "user-friendly information about the ACS and the new multiyear estimates... Each handbook targets a specific user group including first time ACS data users."

The ACS Compass Presentations, from which this post was partially purloined, can be found here.

Data Analysis and User Education Branch: 301.763.3655
