Aleksandr Andreev <aleksandr.andreev <at> duke.edu> writes: > > Greetings! > > I am trying plot some data on a map in R. Here's the scenario. > > I have a variable called probworkinghealthy which contains a predicted > probability of employment for every individual in my sample (about > 100,000 observations). > I have another variable, called a001ter, which contains the subject of > residency in the Russian Federation (akin to a US state) for every > individual in the sample. > I have a shape file with the boundaries of all the subjects, called > russia.shp. > > I can plot boxplots of the probability by Federal Subject using > plot(probworkinghealthy ~ a001ter). I can also plot the map using > plot(russia.shp)
So what you need to do is to aggregate the input data frame for individuals, and assign the summary value, here mean, to the correct shapefile geometries. Probably most of what you need is in the maptools package. There is a question about the IDs used to identify the Federal Subject geometries in the shapefile. Assuming that they are named as in the individual data frame, something like: FS <- readShapePoly("russia.shp", IDvar="a001ter") will associate the geometries in the SpatialPolygonsDataFrame with the correct ID - change the IDvar argument value to suit the shapefile. If you do not have IDs in the shapefile to match the unique values of a001ter, you will need to create them. >From there, create a new data frame with row names matching the unique IDs: agg1 <- tapply(probworkinghealthy, a001ter, mean) agg2 <- data.frame(mean_probworkinghealthy=agg1, row.names=names(agg1)) and check row.names(agg2) and sapply(slot(FS, "polygons"), function(x) slot(x, "ID")) for obvious blunders. Merge using: FS1 <- spCbind(FS, agg2) (usually takes a number of tries to get the matching correct). The reason for the complications with IDs is that it is easy to merge data with geometries in the wrong order, and having to provide positive matching should help prevent this. > > Now, I would like to plot the mean probability of employment (i.e. > mean(probworkinghealthy)) on a map of Russia using color coding all > the Federal Subjects. Does anyone know how to do something like that? Then spplot(FS1, "mean_probworkinghealthy") gives a map with default class intervals and colour palette. I suggest that you also consider assigning a coordinate reference system to the geometries as Russia has a large extent, and the correct interpretation of the geometry coordinates may matter. Roger PS. You could consider following up on the R-sig-geo list, or exploring information in the Spatial Task View, or in the list archives. > > Much appreciated, > > Aleks Andreev > Duke University > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.