Dear list,

I came up with a two functions that flatten arbitrary deeply nested lists (as long as they're named; not tested for unnamed) and environments (see attachment; 'flatten_examples.txt' contains some examples).

The paradigm is somewhat similar to that implemented in 'unlist()', yet extends it. I would have very much liked to build upon the superfast functionality of 'unlist()', but there are some obstacles to this (see these two related posts at r-devel from today: https://stat.ethz.ch/pipermail/r-devel/2011-May/061070.html and https://stat.ethz.ch/pipermail/r-devel/2011-May/061071.html).

Therefore, I had to use a recursive looping paradigm. Yet, if anyone has some suggestions on how to speed things up (maybe some Rcpp-people feel "called upon"?!? ;-)), I'd appreciate any pointers. Yet I do hope that what I came up with is at least of some value for those that posted similar questions on how to flexibly flatten nested objects in the past (that's why I'm also referring to this older post below; I also build upon the code provided by Henrique Dallazuanna and Mark Heckmann).

Best regards,
Janko

PS: Maybe this should rather go into a blog-post, but I don't have one yet ;-)

On 19.05.2011 22:16, Janko Thyson wrote:
From: Mark Heckmann <mark.heckmann_at_gmx.de <mailto:mark.heckmann_at_gmx.de?Subject=Re:%20[R]%20how%20to%20flatten%20a%20list%20to%20the%20same%20level?>>
Date: Sat, 09 Jan 2010 13:49:15 +0100

Henrique,

thanks for the code!! It works out fine for vectors. I forgot to mention I also have dataframes as list elements. Thus I want the structure of the list element to be kept intact.

I tried an recursive approach (which unfortunately resulted in some more code) which works.

.getNonListElements <- function(x, env){

        if(class(x)=="list") {
                for(i in seq(along=x)) .getNonListElements(x[[i]], env) # call
recursively
        } else {
                res<- get("res", envir = env)              # get res from other 
env
                res<- c(res, list(x))                                # add one 
list element
                assign("res", res, envir=env)         # assign back to env
        }

}

flattenList <- function(l){

        res<- list()                                         # make list object
        env<- environment()                                  # get current env  
     
        .getNonListElements(l, env)             # search for non list elements 
recursively
        return(res)

}

l <- list(DF=data.frame(A=c(1,2)), vec=c("a", "b")) l <- list(l,l)

> flattenList(l)

[[1]]

   A
1 1
2 2

[[2]]
[1] "a" "b"

[[3]]

   A
1 1
2 2

[[4]]
[1] "a" "b"

I am not sure if one can avoid the wrapper function or still use rapply to simplify the code. I do not know how. One more thing I would like to add are the objects names to the generated list. But I did not succeed in that.

Mark

Am 08.01.2010 um 18:29 schrieb Henrique Dallazuanna:

>  Try something about like this:
>
>  split(unlist(l), rep(1:length(idx<- rapply(l, length)), idx))
>
>  On Fri, Jan 8, 2010 at 1:35 PM, Mark Heckmann<mark.heckmann_at_gmx.de>
>  wrote:
>>  I have a nested list l like:
>>
>>  l<- list(A=c(1,2,3), B=c("a", "b"))
>>  l<- list(l,l, list(l,l))
>>
>>  I want the list to be unlisted, but not on the lowest level of each
>>  "branch".
>>  I want the lowest level of each list branch to remain as it is.
>>  So unlist or unlist(rec=F) do not work here as the level of nesting
>>  may
>>  differ on the elements.
>>  The result should look like:
>>
>>  $A
>>  [1] 1 2 3
>>
>>  $B
>>  [1] "a" "b"
>>
>>  $A
>>  [1] 1 2 3
>>
>>  $B
>>  [1] "a" "b"
>>
>>  $A
>>  [1] 1 2 3
>>
>>  $B
>>  [1] "a" "b"
>>
>>  $A
>>  [1] 1 2 3
>>
>>  $B
>>  [1] "a" "b"
>>
>>  Any ideas?
>>  TIA!
>>
>>  Mark
>>
>>
>>  
------------------------------------------------------------------------------
>>  Mark Heckmann
>>  Dipl. Wirt.-Ing. cand. Psych.
>>  Vorstraße 93 B01
>>  28359 Bremen
>>  Blog:www.markheckmann.de
>>  R-Blog:http://ryouready.wordpress.com
>>
>>  ______________________________________________
>>  R-help_at_r-project.org mailing list
>>  https://stat.ethz.ch/mailman/listinfo/r-help
>>  PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
>>  and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> -- > Henrique Dallazuanna
>  Curitiba-Paraná-Brasil
>  25° 25' 40" S 49° 16' 22" O

------------------------------------------------------------------------------

Mark Heckmann
Dipl. Wirt.-Ing. cand. Psych.
Vorstraße 93 B01
28359 Bremen
Blog: www.markheckmann.de
R-Blog: http://ryouready.wordpress.com
--
------------------------------------------------------------------------

*Janko Thyson*
janko.thy...@ku-eichstaett.de <mailto:janko.thy...@ku-eichstaett.de>

Catholic University of Eichstätt-Ingolstadt
Ingolstadt School of Management
Statistics and Quantitative Methods
Auf der Schanz 49
D-85049 Ingolstadt

www.wfi.edu/lsqm <http://www.wfi.edu/lsqm>

Fon: +49 841 937-1923
Fax: +49 841 937-1965

This e-mail and any attachment is for authorized use by the intended
recipient(s) only. It may contain proprietary material, confidential
information and/or be subject to legal privilege. It should not be
copied, disclosed to, retained or used by any other party.
If you are not an intended recipient then please promptly delete this
e-mail and any attachment and all copies and inform the sender.



--
------------------------------------------------------------------------

*Janko Thyson*
janko.thy...@googlemail.com <mailto:janko.thy...@googlemail.com>

Jesuitenstraße 3
D-85049 Ingolstadt

Mobile: +49 (0)176 83294257

This e-mail and any attachment is for authorized use by the intended
recipient(s) only. It may contain proprietary material, confidential
information and/or be subject to legal privilege. It should not be
copied, disclosed to, retained or used by any other party.
If you are not an intended recipient then please promptly delete this
e-mail and any attachment and all copies and inform the sender.

#' Coerce Environment to List (Recursively).
#'
#' Recursively coerces an \code{environment} to a \code{list}.  
#'
#' @param src A an \code{environment} that should be coerced.
#' @param ... Further args.
#' @return A named \code{list} that corresponds to the recursively coerced
#' initial \code{environment}.
#' @callGraphPrimitives
#' @author Janko Thyson \email{janko.thyson.rstuff@@googlemail.com}
#' @seealso \code{\link{flatten}}
#' @example inst/examples/envirAsList.R
envirAsList <- function(
    src, 
    ...
){
    if(class(src) == "environment"){
        envir   <- new.env()  
        src     <- as.list(src)
        # LOOP OVER ELEMENTS
        out <- lapply(seq(along=src), function(x.src){
                envir$names <- c(envir$names, names(src[x.src]))
                # RECURSIVE FLATTENING
                out <- envirAsList(src[[x.src]])
                return(out)
            })
        names(out) <- envir$names
        # /
    } else {
        out <- src 
    }
    return(out)
}
#' Flatten (Nested) Lists or Environments.
#'
#' Flatten \code{lists} or \code{environments} according to specifications
#' made via arg \code{start.after} and/or arg \code{stop.at}. When keeping 
#' the defaults, the function will traverse arg \code{src} (if \code{src} is 
#' an \code{environment}, it is coerced to a \code{list} 
#' via \code{\link{envirAsList}} first) to retrieve the values at the 
#' respective bottom layers/bottom elements. These values are arranged in a 
#' named \code{list} where the respective names can be interpreted as the
#' the paths to the retrieved values. See examples.   
#'
#' @param src A named (arbitrary deeply nested) \code{list} or an 
#' \code{environment} that should be flattened.
#' @param start.after An \code{integer} specifying the layer after which to 
#' start the flattening. \code{NULL} means to start at the very top. See
#' examples.
#' @param stop.at An \code{integer} specifying the layer at which to stop
#' the flattening. \code{NULL} means there is not stop criterion.
#' @delim.path A \code{character} (length: 1) specifying how the names of 
#' the resulting flattened list should be pasted.
#' @param .do.debug If \code{TRUE}, print information that might be helpful
#' for debugging.
#' @param ... Further args.
#' @return A named \code{list} that features the desired degree of flattening.
#' @callGraphPrimitives
#' @author Janko Thyson \email{janko.thyson.rstuff@@googlemail.com}
#' @seealso \code{\link{envirAsList}}
#' @example inst/examples/flatten.R
flatten <- function(
    src, 
    start.after=NULL, 
    stop.at=NULL, 
    delim.path="/",
    do.warn=TRUE,
    .do.debug=FALSE,
    ...
){
    #---------------------------------------------------------------------------
    # VALIDATE
    #---------------------------------------------------------------------------
    
    if(!is.list(src) & !is.environment(src)){
        stop("Arg 'src' must be a 'list' or an 'environment'.")
    }
    if(!is.null(start.after) & !is.null(stop.at)){
        if(start.after == 1& stop.at == 1){
            msg <- c(
                "Invalid specification:",
                paste("* start.after: ", start.after, sep=""),
                paste("* stop.at:     ", stop.at, sep="")
            )
            stop(cat(msg, sep="\n"))
        }
    }
    # /VALIDATE ----------
    
    #---------------------------------------------------------------------------
    # INNER FUNCTIONS
    #---------------------------------------------------------------------------
    
    .startAfterInner <- function(
        envir,
        nms,
        out.1,
        do.reset=FALSE,
        ...
    ){
        .do.debug <- envir$.do.debug
        idx.diff <- diff(c(envir$start.after, length(envir$counter)))
        if(.do.debug){
            cat(c("", "+++", ""), sep="\n")
            #                print("+++")
            cat("names:", sep="\n")
            print(names(out.1))
            cat("envir$counter:", sep="\n")
            print(envir$counter)
            cat("idx.diff:", sep="\n")
            print(idx.diff)            
        }
        # UPDATE IF DEGREE OF NESTEDNESS EXCEEDS START CRITERION
        if(idx.diff > 0){
            idx.cutoff      <- (
                length(envir$counter)-idx.diff+1):length(envir$counter
            ) 
            idx.left        <- envir$counter[-idx.cutoff]
            nms.1           <- nms[idx.cutoff]
            names(out.1)    <- paste(nms.1, collapse="/") 
            # UPDATE SRC
            idx.append <- sapply(envir$history, function(x.hist){
                    all(idx.left == x.hist)        
                })
            if(.do.debug){
                cat("idx.cutoff:", sep="\n")
                print(idx.cutoff)
                cat("idx.left:", sep="\n")
                print(idx.left)
                cat("idx.append:", sep="\n")
                print(idx.append)
                cat("names remaining:", sep="\n")
                print(names(out.1))
            }
            if(any(idx.append)){                                          
                envir$src[[idx.left]] <- append(envir$src[[idx.left]], 
                    values=out.1)                    
            } else {
                envir$src[[idx.left]] <- out.1
                # UPDATE HISTORY
                envir$history <- c(envir$history, list(idx.left))
            }
            envir$out <- envir$src
            # /             
        } 
        if(idx.diff < 0){
            envir$out <- envir$src
#            if(envir$do.warn & !envir$do.block.warning){
#                warning(paste("Argument 'start.after=", envir$start.after, 
#                    "' exceeds maximum degree of nestedness (=", 
#                    envir$start.after + idx.diff, ").", sep=""))
#                envir$do.block.warning <- TRUE
#            }
        }
        # /
        # RESET
        if(do.reset){
            envir$nms       <- envir$nms[-length(envir$nms)]
            envir$counter   <- envir$counter[-length(envir$counter)]
        }
        # /
        return(TRUE)
    }
    
    .updateOutInner <- function(
        envir,
        out.1,
        do.reset=FALSE,
        ...
    ){
        .do.debug <- envir$.do.debug
        # UPDATE OUT
        out.0           <- get("out", envir = envir)
        out             <- c(out.0, out.1)
        envir$out       <- out
        # /
        # RESET
        if(do.reset){
            envir$nms       <- envir$nms[-length(envir$nms)]
            envir$counter   <- envir$counter[-length(envir$counter)]
        }
        # /
        return(TRUE)
    }
    
    .flattenInner <- function(
        x, 
        envir, 
        ...
    ){
        .do.debug <- envir$.do.debug
        if( (class(x)=="list" & length(x) != 0) |
            (class(x) == "environment" & length(x) != 0)
            ){
            if(class(x) == "environment"){
                x <- as.list(x)
            }
            # UPDATE
            envir$counter.history <- c(envir$counter.history, 
list(envir$counter))
            # EXIT IF DEGREE EXCEEDS CUTOFF
            if(!is.null(envir$stop.at)){
                if(length(envir$counter) > envir$stop.at){ 
                    # THIS
                    nms             <- get("nms", envir=envir)
                    if(.do.debug){
                        cat("names:", sep="\n")
                        print(paste(nms, collapse=envir$delim.path))
                    }
                    out.1           <- list(x)
                    names(out.1)    <- paste(nms, collapse=envir$delim.path)
                    # /
                    # DECISION ON FLATTENING
                    if(!is.null(envir$start.after)){
                        .startAfterInner(envir=envir, nms=nms, out.1=out.1, 
                            do.reset=TRUE)
                        return(NULL)
                        #                    }
                        # /
                    } else {
                        .updateOutInner(envir=envir, out.1=out.1, do.reset=TRUE)
                        return(NULL)
                    }
                }
            }
            # /
            # LOOP OVER ELEMENTS
            for(i in seq(along=x)){
                # UPDATE COUNTER
                envir$counter <- c(envir$counter, i)
                # UPDATE NAMES
                assign("nms", c(get("nms", envir=envir), names(x[i])), 
envir=envir)
                # RECURSIVE FLATTENING
                .flattenInner(x[[i]], envir) # call  recursively
                # RESET COUNTER
                if(i == length(x)){
                    envir$nms       <- envir$nms[-length(envir$nms)]
                    envir$counter   <- envir$counter[-length(envir$counter)]
                }
                # /
            }
            # /
        } else {
            # THIS
            nms             <- get("nms", envir=envir)
            if(.do.debug){
                cat("names:", sep="\n")
                print(paste(nms, collapse=envir$delim.path))
            }
            out.1           <- list(x)
            names(out.1)    <- paste(nms, collapse=envir$delim.path)
            # /
            # DECISION ON FLATTENING
            if(!is.null(envir$start.after)){
                .startAfterInner(envir=envir, nms=nms, out.1=out.1)
            } else {
                .updateOutInner(envir=envir, out.1=out.1)
            }
            if(.do.debug){
                cat("out.1:", sep="\n")
                print(out.1)
            }
            # RESET
            envir$nms       <- envir$nms[-length(envir$nms)]
            envir$counter   <- envir$counter[-length(envir$counter)]
            # /
        }
        return(TRUE)
    }
    
    # /INNER FUNCTIONS ----------
    
    #---------------------------------------------------------------------------
    # ACTUAL PROCESSING
    #---------------------------------------------------------------------------
    
    # COERCE TO LIST
    if(class(src) == "environment"){
        src <- envirAsList(src=src)
    }
    # /
    # PRESERVE ORIGINAL (just in case)
    src.0               <- src
    out                 <- list()
    # ENVIR
    envir               <- new.env()
    envir$.do.debug     <- .do.debug
    envir$counter       <- NULL
    envir$counter.history <- NULL
    envir$delim.path    <- delim.path
    envir$do.warn       <- do.warn
    envir$do.block.warning    <- FALSE
    envir$history       <- NULL
    envir$nms           <- NULL
    envir$out           <- list()
    envir$src           <- src
    envir$start.after   <- start.after
    if(!is.null(stop.at)){
        stop.at.0 <- stop.at
        if(stop.at == 1){
            return(src)
        } else {
            stop.at <- stop.at - 1
        }
    }
    envir$stop.at       <- stop.at
    # /
    # APPLY INNER
    .flattenInner(src, envir)
    
    if(envir$do.warn){
        max.length <- max(sapply(envir$counter.history, function(x){
            length(x)        
        }))
#        if(!envir$do.block.warning){
        if(!is.null(start.after)){            
            if(start.after > max.length){                        
                warning(paste("Argument 'start.after=", start.after, 
                    "' exceeds maximum degree of sublayer nestedness (=", 
                    max.length, ").", sep=""))
            }
        }
        if(!is.null(stop.at)){
            if(stop.at.0 > max.length){
                warning(paste("Argument 'stop.at=", stop.at.0, 
                    "' exceeds maximum degree of sublayer nestedness (=", 
                    max.length, ").", sep=""))    
            }
        }
    }
    
    out <- envir$out
    
    # /ACTUAL PROCESSING ----------
    
    return(out)    
}
#-------------------------------------------------------------------------------
# SIMPLE LISTS
#-------------------------------------------------------------------------------

src <- list(DF=data.frame(A=c(1,2)), vec=c("a", "b")) 
src <- list(src,src)

flatten(src) 

# /SIMPLE LISTS ----------

#-------------------------------------------------------------------------------
# NESTED LISTS
#-------------------------------------------------------------------------------

src <- list(a=list(a.1=list(a.1.1=list(a.1.1.1=NA), a.1.2=5), 
    a.2=list(a.2.1=list())), b=NULL)

flatten(src)
flatten(src, start.after=1)
flatten(src, start.after=1, .do.debug=TRUE)
flatten(src, start.after=2)
flatten(src, start.after=3)
flatten(src, start.after=4)

flatten(src, stop.at=1)
flatten(src, stop.at=2)
flatten(src, stop.at=3)
flatten(src, stop.at=4)

flatten(src, start.after=1, stop.at=1)
flatten(src, start.after=1, stop.at=2)
flatten(src, start.after=1, stop.at=3)
flatten(src, start.after=1, stop.at=4)
flatten(src, start.after=2, stop.at=4)

# /NESTED LISTS ----------

#-------------------------------------------------------------------------------
# ENVIRONMENTS
#-------------------------------------------------------------------------------

envir <- new.env()
envir$a <- new.env()
envir$a$a.1 <- new.env()
envir$a$a.1$a.1.1 <- new.env()
envir$a$a.1$a.1.1$a.1.1.1 <- NA
envir$a$a.1$a.1.2 <- 5
envir$a$a.2 <- new.env()
envir$a$a.2$a.2.1 <- list()
envir$b <- NULL

envirAsList(src=envir)

flatten(envir)
flatten(envir, start.after=1)
flatten(envir, start.after=1, .do.debug=TRUE)
flatten(envir, start.after=2)
flatten(envir, start.after=3)
flatten(envir, start.after=4)

flatten(envir, stop.at=1)
flatten(envir, stop.at=2)
flatten(envir, stop.at=3)
flatten(envir, stop.at=4)

flatten(envir, start.after=1, stop.at=1)
flatten(envir, start.after=1, stop.at=2)
flatten(envir, start.after=1, stop.at=3)
flatten(envir, start.after=1, stop.at=4)

# /ENVIRONMENTS ----------
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to