Hello,

Hierarchical factors are a very common data structure. For instance, one 
might have municipalities within states within countries within 
continents. Other examples include occupational codes, biological 
species, software types (R within statistical software within analytical 
software), etc.

Such data structures commonly use hierarchical coding systems. For 
example, the 2007 North American Industry Classification System (NAICS) 
<http://www.census.gov/cgi-bin/sssd/naics/naicsrch?chart=2007> has twenty 
2-digit codes (e.g., 42 = Wholesale Trade), within each of these 
varying numbers of 3-digit codes (e.g., 423 = Merchant Wholesalers, 
Durable Goods), then varying numbers of 4-digit codes (e.g., 4231 = Motor 
Vehicle and Motor Vehicle Parts and Supplies Merchant Wholesalers), then 
varying numbers of 5-digit codes, and finally varying numbers of 6-digit 
codes. From the lowest level (longest code) one can readily read off all the 
higher levels. For example, 441222 is "Boat Dealers" who are part of 
44122, "Motorcycle, Boat, and Other Motor Vehicle Dealers," which is 
part of 4412 (Other Motor Vehicle Dealers), which is part of 441 (Motor 
Vehicle and Parts Dealers), which is part of 44 (Retail Trade). (The US 
Census Bureau has extended the 6-digit NAICS to an even more 
fine-grained 10-digit system.)
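
To make the prefix structure concrete, here is a rough base-R sketch of 
how the higher levels could be read off a code by truncation. The 
function name naics_levels and the column names are made up for 
illustration, and apart from 441222 the input codes are just sample 
values; this is only a sketch of the idea, not a complete solution.

```r
# Hypothetical helper (not from any package): derive every ancestor level
# of a hierarchical code by truncating it to each prefix length.
naics_levels <- function(codes, widths = 2:6) {
  codes <- as.character(codes)
  # One column per level; a code shorter than a given width gets NA there.
  levels <- lapply(widths, function(w) {
    ifelse(nchar(codes) >= w, substr(codes, 1, w), NA_character_)
  })
  names(levels) <- paste0("digits", widths)
  # Store each level as a factor so it can be used for grouping.
  as.data.frame(levels, stringsAsFactors = TRUE)
}

# 441222 is the "Boat Dealers" code from above; the others are sample inputs.
h <- naics_levels(c("441222", "441221", "423110"))
print(h)

# Counts at the 2-digit level, as an example of aggregating upward.
print(table(h$digits2))
```

Each column is an ordinary factor, so functions such as table() or 
tapply() can aggregate at whichever level is needed, but nothing here 
records or checks the nesting itself.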

I haven't seen any R packages or sample code that handle this kind of 
data, but I don't want to reinvent the wheel and would rather stand on 
the shoulders of you giants. Is there any package or other R-based 
software out there that handles this kind of data structure?

     Thanks,
     Marsh Feldman






______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
