Share Button

Census-Bureau-Logo

The United States Census is rich with information that can be used by economist to understand demographic and social trends.  A popular way of presenting data from the U.S. Census data by plotting it on a map.  The goal of this post is obtain data from the U.S. Census via an API connection and plotting the data on a map using R. The reason for accessing data via an API as opposed to going to the U.S. Census website is speed, ease of querying the database, and repeatbility which I will demonstrate through a few examples.  Demonstrating R’s mapping feature provides a good start to learning how to map data in general.

ACCESSING DATA VIA THE CENSUS API

In order to access the Census API using R you need to download and install the ACS package in addition to going to requesting a secret key from the U.S. Census. This document contains the steps that got me started and contains more detailed information than is found in this post if you are interested in getting an in-depth knowledge of the ACS package. Once you’ve installed the package and requested your API key from the Census you are ready to get started, type the following commands to gain access to the U.S. Census database in addition to installing other libraries used in this post to manipulate data and create the heat maps:

library(acs)
library(sqldf)
library(ggplot2)
library(maps)

api.key.install(key=”your secret key here”)

One of the great features of the being able to access the Census API via the ACS package in R is that one can search for keywords in all the tables using a function called acs.lookup(). The following code searches for tables that contain the keyword ‘bachelor’ in order to find data tables that contain data about educational attainment.

acs.lookup(endyear = 2012, span = 5, dataset = “acs”, keyword = “bachelor”, case.sensitive=F )

Here is a sample of the output from the acs.lookup() function:

table.name
PLACE OF BIRTH BY EDUCATIONAL ATTAINMENT IN THE UNITED STATES
POVERTY STATUS IN THE PAST 12 MONTHS OF INDIVIDUALS BY SEX BY EDUCATIONAL ATTAINMENT

The next step is grab the data from one of the tables of interest. In order to grab the data from the table we must specify the geography of interest. In other words do we want to look at poverty status table by state, by county, by city, by congressional district, or by zip code or a combination of these geographies.  Please refer to the numerous ways of slicing Census data in the ACS documentation or the link provided above for further details. In this post we want to look at poverty per state, so we use the geo.make() function to create this geography before accessing the data.

states = geo.make(state=”*”)

Once the geography is created, we can use that object in the acs.fetch function to actually retrieve the data from the table for the latest year available of 2012 using the col.names = “pretty” to retrieve the table with textual names as opposed to coded variable names.

 census_poverty = acs.fetch(endyear = 2011, span = 5, geo = states, table.name=”POVERTY STATUS IN THE PAST 12 MONTHS OF INDIVIDUALS BY SEX BY EDUCATIONAL ATTAINMENT”,col.names=”pretty”)

In order to create a plot I export the Census data object created with the acs.fetch function as a csv then reimport it as a data frame.  There is probably a more direct way to do this but I haven’t found it.

census_poverty <- read.csv(“census_poverty.csv”, header=TRUE)

#Removing ‘.’ from dataframe names and replacing with ‘_’.
for (i in 1:length(colnames(census_poverty))) {
colnames(census_poverty)[i] <- gsub(‘[.]’, ‘_’, colnames(census_poverty)[i])
}

I finally calculated the poverty rate from the R data frame census_poverty created from the csv import of the Census data.
census_poverty$poverty_rate <- census_poverty$POVERTY_STATUS_IN_THE_PAST_12_MONTHS_OF_INDIVIDUALS_BY_SEX_BY_EDUCATIONAL_ATTAINMENT___Income_in_the_past_12_months_below_poverty_level__/census_poverty$POVERTY_STATUS_IN_THE_PAST_12_MONTHS_OF_INDIVIDUALS_BY_SEX_BY_EDUCATIONAL_ATTAINMENT___Total__

CREATING HEAT MAPS

I like to clean up the look of ggplots with the following theme:

my_theme <- theme(
panel.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
axis.title.x = element_text(colour = “black”,size = 12),
axis.text.x = element_text(colour = “black”, size = 12),
axis.title.y = element_text(colour = “black”,size = 12),
axis.text.y = element_text(colour = “black”, size = 12),
plot.title = element_text(colour = “black”, size = 16, face = “bold”),
axis.ticks = element_blank(),
axis.text.y = element_blank()

)

Then I import the example map data in order to get the data needed to create a map in ggplot2:

example(map_data)

Then I merge the map_data (called choro) with other data sets that I created from the Census along with the data on poverty created in this post using the sqldf package, one can use the merge function instead of the sqldf library, but it is worth learning the sqldf package if one wants to do more extensive data management using SQL statements in R, though I wouldn’t discount the data manipulation capabilities in R.

output1 <- sqldf(“SELECT lower(census_edu_hisp.X) as region, census_edu_hisp.hispanic_ba,census_edu_white. white_ba, census_poverty.poverty_rate
FROM census_edu_hisp
LEFT JOIN census_edu_white
ON census_edu_hisp.X = census_edu_white.X
LEFT JOIN census_poverty
ON census_edu_hisp.X = census_poverty.X”)

output2 <- sqldf(“SELECT *
FROM output1
LEFT JOIN choro
ON output1.region = choro.region “)

Finally, the data is ready for plotting on a heat map with the following code:

qplot(long,lat,data = output2,group = group, fill = poverty_rate, geom = “polygon”) +
scale_fill_gradient(low=’white’, high = muted(‘red’)) +
ggtitle(“% of Population Living in Poverty”) +
my_theme

The final product is this map that shows the % of people living in poverty by state:

CLICK FOR LARGER IMAGE

Povert Rates

Share Button