Thanks for getting back so quickly, Ista,

I was actually casting about for any examples of R software that deals 
with this kind of structure. But your question is a good one. Here are a 
few things I'd like to be able to do:

    * Store data in R at the finest level of detail but easily refer to
      higher levels of aggregation. If the data include such higher
      levels, this is trivial, but otherwise I'd like to aggregate
      fairly easily. The following is not functioning code, but it
      should give you the idea:

        Start with a data frame (call it d) whose row.names are the
        6-digit NAICS codes and whose columns hold various variables;
        assume one is named employment.
        d[, "employment"]          # Would print all employment data
        d["441222", "employment"]  # Would print only Boat Dealer employment
        d["44", "employment"]      # Would print total employment for Retail Trade
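
One way the prefix lookup above might be approximated in current R is sketched below. The helper name naics_total and the employment figures are made up for illustration; the only assumption about the data is that the row names are 6-digit NAICS codes stored as character strings.

```r
# Toy data: 6-digit NAICS codes as row names (employment figures are made up).
d <- data.frame(
  employment = c(120, 80, 45, 300),
  row.names  = c("441221", "441222", "441310", "423110")
)

# Hypothetical helper: total of one column over all rows whose code
# starts with the given prefix (a full 6-digit code matches a single row).
naics_total <- function(data, prefix, column) {
  keep <- startsWith(rownames(data), prefix)
  sum(data[keep, column])
}

naics_total(d, "441222", "employment")  # Boat Dealers only
naics_total(d, "44", "employment")      # every code beginning with 44
```

Matching on the code prefix means no separate 2-digit column has to be stored; the aggregate is computed on demand from the finest level.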

    * Recursive nesting. I'm not sure how to convey this except with
      examples. Suppose the data frame also has a "wages" column with
      average weekly wages in the industry, and the industry code is
      also a factor variable (industry). So a simple analysis of
      variance might look like:

                     w <- aov(wages ~ industry, d)

      But now what I'd like to do is to break this down within 2-digit
      sectors. Assuming the data frame has another variable, industry2,
      this would look like:

                     w <- aov(wages ~ industry2/industry, d)

      But what if we either (a) don't want to bother creating separate
      variables for each level of aggregation in industry or (b) want
      to extend the model formula language to include various nesting
      strategies? This might look like:

                     w <- aov(wages ~ industry//*)
                     # Nest all meaningful levels, i.e., expand to
                     # industry2/industry3/industry4/industry5/industry6.
                     # If the coding system skips some levels, R is smart
                     # enough to omit the skipped levels.

                     w <- aov(wages ~ industry//levels 2,4,6)
                     # I'm using "//" as a hypothetical extension to the
                     # model language that is followed by a "levels" keyword
                     # and then a list of levels within the hierarchy.
                     # This example would expand to
                     # aov(wages ~ industry2/industry4/industry6)

      One could extend this last example to include a notation allowing
      the analysis to be repeated at varying levels of depth (e.g.,
      industry||2,6 would repeat the ANOVA for industry2 and industry6).
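
The "//levels" expansion is hypothetical syntax, but its effect can be roughly imitated today by deriving the level variables from the code and pasting the formula together. The helper names add_levels and nested_formula below are invented for this sketch, and the wage figures are made up.

```r
# Hypothetical helper: add one factor column per requested code length,
# e.g. industry2, industry4, industry6.
add_levels <- function(data, codes, prefix, digits) {
  for (n in digits) {
    data[[paste0(prefix, n)]] <- factor(substr(codes, 1, n))
  }
  data
}

# Hypothetical helper: build a nested formula such as
# wages ~ industry2/industry4/industry6 from a vector of code lengths.
nested_formula <- function(response, prefix, digits) {
  as.formula(paste(response, "~", paste0(prefix, digits, collapse = "/")))
}

# Toy data (wages are made up).
d <- data.frame(
  industry = c("441221", "441222", "441310", "423110", "423120", "424110"),
  wages    = c(610, 655, 540, 820, 790, 700),
  stringsAsFactors = FALSE
)
d <- add_levels(d, d$industry, "industry", c(2, 4, 6))
f <- nested_formula("wages", "industry", c(2, 4, 6))
f
```

With enough observations per industry, the resulting formula could be passed on as aov(f, data = d); the toy data here has only one row per 6-digit code, so it is meant to show the expansion, not a meaningful fit.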

    * Since the factor hierarchy is completely nested (i.e., every
      6-digit industry is below a 5-digit industry), a single function
      can operate on the codes recursively. Three variants come to mind.
      In the first, we'd use some kind of apply function to drill down
      to a certain level and return a list of results, one for each level:

                   means <- drill(wages, industry, mean)
                   # Would return a list. The first component would be a
                   # vector of mean wages for industries at the 2-digit
                   # level, the second a vector for the 3-digit level, etc.

                   means <- drill(wages, industry, mean, maxlvl = 3)
                   # Would stop at the 3rd level of the hierarchy (4-digit
                   # code). One could also imagine a maxdigits option as an
                   # alternative (maxdigits = y means stop at the y-digit
                   # level).
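
A minimal sketch of what such a drill function might look like follows. It assumes the codes are character strings and that the hierarchy levels are simply the 2- through 6-digit prefixes; only the maxdigits variant is shown, and the wage values are made up.

```r
# Hypothetical drill(): apply FUN to x grouped by each prefix length of the
# codes, returning one named vector per level of the hierarchy.
drill <- function(x, codes, FUN, digits = 2:6, maxdigits = max(digits)) {
  digits <- digits[digits <= maxdigits]
  out <- lapply(digits, function(n) tapply(x, substr(codes, 1, n), FUN))
  names(out) <- paste0("digits", digits)
  out
}

wages    <- c(600, 700, 500, 800)   # made-up values
industry <- c("441221", "441222", "441310", "423110")

means <- drill(wages, industry, mean, maxdigits = 3)
means$digits2   # mean wages by 2-digit prefix
means$digits3   # mean wages by 3-digit prefix
```

A coding system that skips levels could be handled by passing a different digits vector, for example digits = c(2, 4, 6).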

    Second, suppose we have a data frame like d, only this time it's a
    time series (each row is a different date). Now we might want to
    generate vectors of the rate of change in employment at each
    industry level. It might look like:

         rate <- function(x) diff(x) / head(x, -1)   # change relative to the previous period
         rates <- list()
         i <- 1
         for (j in levels(industry)) {
             # The (hypothetical) levels function parses the hierarchical
             # factor into the various levels of its coding system
             rates[[i]] <- rate(employment[, level(industry) == j])
             # The (hypothetical) level function selects a particular one
             # of these levels
             i <- i + 1
         }
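
For comparison, here is a sketch of the same idea using only functions that exist in base R. It assumes the time series is a matrix with one row per date and one column per 6-digit code; aggregate_level is an invented helper name and the numbers are made up.

```r
# Toy time series: rows are dates, columns are 6-digit industries
# (made-up numbers).
employment <- cbind(
  "441221" = c(100, 110, 121),
  "441222" = c(50, 50, 55),
  "423110" = c(200, 220, 242)
)

# Period-over-period rate of change for each column of a matrix.
rate <- function(m) diff(m) / m[-nrow(m), , drop = FALSE]

# Hypothetical helper: sum the columns that share the same n-digit prefix.
aggregate_level <- function(m, n) {
  t(rowsum(t(m), group = substr(colnames(m), 1, n)))
}

# One matrix of rates per level of the hierarchy (2-digit and 6-digit here).
rates <- lapply(c(2, 6), function(n) rate(aggregate_level(employment, n)))
names(rates) <- c("digits2", "digits6")
rates$digits2
```

Employment is summed to each level before the rate is taken, so the 2-digit rates describe the growth of the sector total and are not an average of the 6-digit rates.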

    A third variant would be a genuinely recursive function that keeps
    on calling itself at each level of the factor until it has either
    reached a pre-specified depth or exhausted all levels of the factor.
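
A rough sketch of this recursive variant is given below. The function name walk_levels and its arguments are invented for illustration, and it assumes character codes in which each additional digit adds one level; the wage values are made up.

```r
# Hypothetical recursive walker: apply FUN to x within each group at the
# current prefix length, then recurse one digit deeper until max_digits
# (by default the full code length) is reached.
walk_levels <- function(x, codes, FUN, digits = 2,
                        max_digits = max(nchar(codes))) {
  groups <- split(seq_along(codes), substr(codes, 1, digits))
  lapply(groups, function(idx) {
    node <- list(value = FUN(x[idx]))
    if (digits < max_digits) {
      node$children <- walk_levels(x[idx], codes[idx], FUN,
                                   digits + 1, max_digits)
    }
    node
  })
}

wages    <- c(600, 700, 500, 800)   # made-up values
industry <- c("441221", "441222", "441310", "423110")

tree <- walk_levels(wages, industry, mean, max_digits = 4)
tree[["44"]]$value
tree[["44"]]$children[["441"]]$children[["4412"]]$value
```

The result is a nested list that mirrors the code hierarchy, with each node holding the value for its prefix and, above the stopping depth, a children element for the next digit.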

I hope this gives you a good idea of the sorts of things one might do 
with hierarchical factors.

     Marsh Feldman



On 5/3/2010 9:57 AM, Ista Zahn wrote:
> Hi Marshell,
> What exactly do you mean by "handles this kind of data structure"?
> What do you want R to do?
>
> Best,
> Ista
>
> On Mon, May 3, 2010 at 9:44 AM, Marshall Feldman<ma...@uri.edu>  wrote:
>    
>> Hello,
>>
>> Hierarchical factors are a very common data structure. For instance, one
>> might have municipalities within states within countries within
>> continents. Other examples include occupational codes, biological
>> species, software types (R within statistical software within analytical
>> software), etc.
>>
>> Such data structures commonly use hierarchical coding systems. For
>> example, the 2007 North American Industry Classification System (NAICS)
>> <http://www.census.gov/cgi-bin/sssd/naics/naicsrch?chart=2007> has twenty
>> two-digit codes (e.g., 42 = Wholesale trade), within each of these
>> varying numbers of 3-digit codes (e.g., 423 = Merchant wholesalers,
>> durable goods), then varying numbers of 4-digit codes (4231 = Motor
>> Vehicle and Motor Vehicle Parts and Supplies Merchant Wholesalers), then
>> varying numbers of five-digit codes, varying numbers of six-digit codes,
>> etc. At the lowest level (longest code) one can readily tell all the
>> higher levels. For example, 441222 is "Boat Dealers" who are part of
>> 44122, "Motorcycle, Boat, and Other Motor Vehicle Dealers," which is
>> part of 4412 (Other Motor Vehicle Dealers), which is part of 441 (Motor
>> Vehicle and Parts Dealers), which is part of 44 (Retail Trade). (The US
>> Census Bureau has extended the 6-digit NAICS to an even more
>> fine-grained 10-digit system.)
>>
>> I haven't seen any R packages or sample code that handles this kind of
>> data, but I don't want to reinvent the wheel and would rather stand on
>> the shoulders of you giants. Is there any package or other R-based
>> software out there that handles this kind of data structure?
>>
>>      Thanks,
>>      Marsh Feldman
>>
>>
>>
>>
>>
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>      
>
>
>    

-- 
Dr. Marshall Feldman, PhD
Director of Research and Academic Affairs
Center for Urban Studies and Research
The University of Rhode Island
email: marsh @ uri .edu (remove spaces)


      Contact Information:


        Kingston:

202 Hart House
Charles T. Schmidt Labor Research Center
The University of Rhode Island
36 Upper College Road
Kingston, RI 02881-0815
tel. (401) 874-5953
fax: (401) 874-5511


        Providence:

206E Shepard Building
URI Feinstein Providence Campus
80 Washington Street
Providence, RI 02903-1819
tel. (401) 277-5218
fax: (401) 277-5464
