Dear list,
I came up with a two functions that flatten arbitrary deeply nested
lists (as long as they're named; not tested for unnamed) and
environments (see attachment; 'flatten_examples.txt' contains some
examples).
The paradigm is somewhat similar to that implemented in 'unlist()', yet
extends it. I would have very much liked to build upon the superfast
functionality of 'unlist()', but there are some obstacles to this (see
these two related posts at r-devel from today:
https://stat.ethz.ch/pipermail/r-devel/2011-May/061070.html and
https://stat.ethz.ch/pipermail/r-devel/2011-May/061071.html).
Therefore, I had to use a recursive looping paradigm. Yet, if anyone has
some suggestions on how to speed things up (maybe some Rcpp-people feel
"called upon"?!? ;-)), I'd appreciate any pointers. Yet I do hope that
what I came up with is at least of some value for those that posted
similar questions on how to flexibly flatten nested objects in the past
(that's why I'm also referring to this older post below; I also build
upon the code provided by Henrique Dallazuanna and Mark Heckmann).
Best regards,
Janko
PS: Maybe this should rather go into a blog-post, but I don't have one
yet ;-)
On 19.05.2011 22:16, Janko Thyson wrote:
From: Mark Heckmann <mark.heckmann_at_gmx.de
<mailto:mark.heckmann_at_gmx.de?Subject=Re:%20[R]%20how%20to%20flatten%20a%20list%20to%20the%20same%20level?>>
Date: Sat, 09 Jan 2010 13:49:15 +0100
Henrique,
thanks for the code!! It works out fine for vectors. I forgot to
mention I also have dataframes as list elements. Thus I want the
structure of the list element to be kept intact.
I tried an recursive approach (which unfortunately resulted in some
more code) which works.
.getNonListElements <- function(x, env){
if(class(x)=="list") {
for(i in seq(along=x)) .getNonListElements(x[[i]], env) # call
recursively
} else {
res<- get("res", envir = env) # get res from other
env
res<- c(res, list(x)) # add one
list element
assign("res", res, envir=env) # assign back to env
}
}
flattenList <- function(l){
res<- list() # make list object
env<- environment() # get current env
.getNonListElements(l, env) # search for non list elements
recursively
return(res)
}
l <- list(DF=data.frame(A=c(1,2)), vec=c("a", "b")) l <- list(l,l)
> flattenList(l)
[[1]]
A
1 1
2 2
[[2]]
[1] "a" "b"
[[3]]
A
1 1
2 2
[[4]]
[1] "a" "b"
I am not sure if one can avoid the wrapper function or still use
rapply to simplify the code. I do not know how. One more thing I would
like to add are the objects names to the generated list. But I did not
succeed in that.
Mark
Am 08.01.2010 um 18:29 schrieb Henrique Dallazuanna:
> Try something about like this:
>
> split(unlist(l), rep(1:length(idx<- rapply(l, length)), idx))
>
> On Fri, Jan 8, 2010 at 1:35 PM, Mark Heckmann<mark.heckmann_at_gmx.de>
> wrote:
>> I have a nested list l like:
>>
>> l<- list(A=c(1,2,3), B=c("a", "b"))
>> l<- list(l,l, list(l,l))
>>
>> I want the list to be unlisted, but not on the lowest level of each
>> "branch".
>> I want the lowest level of each list branch to remain as it is.
>> So unlist or unlist(rec=F) do not work here as the level of nesting
>> may
>> differ on the elements.
>> The result should look like:
>>
>> $A
>> [1] 1 2 3
>>
>> $B
>> [1] "a" "b"
>>
>> $A
>> [1] 1 2 3
>>
>> $B
>> [1] "a" "b"
>>
>> $A
>> [1] 1 2 3
>>
>> $B
>> [1] "a" "b"
>>
>> $A
>> [1] 1 2 3
>>
>> $B
>> [1] "a" "b"
>>
>> Any ideas?
>> TIA!
>>
>> Mark
>>
>>
>>
------------------------------------------------------------------------------
>> Mark Heckmann
>> Dipl. Wirt.-Ing. cand. Psych.
>> Vorstraße 93 B01
>> 28359 Bremen
>> Blog:www.markheckmann.de
>> R-Blog:http://ryouready.wordpress.com
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
------------------------------------------------------------------------------
Mark Heckmann
Dipl. Wirt.-Ing. cand. Psych.
Vorstraße 93 B01
28359 Bremen
Blog: www.markheckmann.de
R-Blog: http://ryouready.wordpress.com
--
------------------------------------------------------------------------
*Janko Thyson*
janko.thy...@ku-eichstaett.de <mailto:janko.thy...@ku-eichstaett.de>
Catholic University of Eichstätt-Ingolstadt
Ingolstadt School of Management
Statistics and Quantitative Methods
Auf der Schanz 49
D-85049 Ingolstadt
www.wfi.edu/lsqm <http://www.wfi.edu/lsqm>
Fon: +49 841 937-1923
Fax: +49 841 937-1965
This e-mail and any attachment is for authorized use by the intended
recipient(s) only. It may contain proprietary material, confidential
information and/or be subject to legal privilege. It should not be
copied, disclosed to, retained or used by any other party.
If you are not an intended recipient then please promptly delete this
e-mail and any attachment and all copies and inform the sender.
--
------------------------------------------------------------------------
*Janko Thyson*
janko.thy...@googlemail.com <mailto:janko.thy...@googlemail.com>
Jesuitenstraße 3
D-85049 Ingolstadt
Mobile: +49 (0)176 83294257
This e-mail and any attachment is for authorized use by the intended
recipient(s) only. It may contain proprietary material, confidential
information and/or be subject to legal privilege. It should not be
copied, disclosed to, retained or used by any other party.
If you are not an intended recipient then please promptly delete this
e-mail and any attachment and all copies and inform the sender.
#' Coerce Environment to List (Recursively).
#'
#' Recursively coerces an \code{environment} to a \code{list}.
#'
#' @param src A an \code{environment} that should be coerced.
#' @param ... Further args.
#' @return A named \code{list} that corresponds to the recursively coerced
#' initial \code{environment}.
#' @callGraphPrimitives
#' @author Janko Thyson \email{janko.thyson.rstuff@@googlemail.com}
#' @seealso \code{\link{flatten}}
#' @example inst/examples/envirAsList.R
envirAsList <- function(
src,
...
){
if(class(src) == "environment"){
envir <- new.env()
src <- as.list(src)
# LOOP OVER ELEMENTS
out <- lapply(seq(along=src), function(x.src){
envir$names <- c(envir$names, names(src[x.src]))
# RECURSIVE FLATTENING
out <- envirAsList(src[[x.src]])
return(out)
})
names(out) <- envir$names
# /
} else {
out <- src
}
return(out)
}
#' Flatten (Nested) Lists or Environments.
#'
#' Flatten \code{lists} or \code{environments} according to specifications
#' made via arg \code{start.after} and/or arg \code{stop.at}. When keeping
#' the defaults, the function will traverse arg \code{src} (if \code{src} is
#' an \code{environment}, it is coerced to a \code{list}
#' via \code{\link{envirAsList}} first) to retrieve the values at the
#' respective bottom layers/bottom elements. These values are arranged in a
#' named \code{list} where the respective names can be interpreted as the
#' the paths to the retrieved values. See examples.
#'
#' @param src A named (arbitrary deeply nested) \code{list} or an
#' \code{environment} that should be flattened.
#' @param start.after An \code{integer} specifying the layer after which to
#' start the flattening. \code{NULL} means to start at the very top. See
#' examples.
#' @param stop.at An \code{integer} specifying the layer at which to stop
#' the flattening. \code{NULL} means there is not stop criterion.
#' @delim.path A \code{character} (length: 1) specifying how the names of
#' the resulting flattened list should be pasted.
#' @param .do.debug If \code{TRUE}, print information that might be helpful
#' for debugging.
#' @param ... Further args.
#' @return A named \code{list} that features the desired degree of flattening.
#' @callGraphPrimitives
#' @author Janko Thyson \email{janko.thyson.rstuff@@googlemail.com}
#' @seealso \code{\link{envirAsList}}
#' @example inst/examples/flatten.R
flatten <- function(
src,
start.after=NULL,
stop.at=NULL,
delim.path="/",
do.warn=TRUE,
.do.debug=FALSE,
...
){
#---------------------------------------------------------------------------
# VALIDATE
#---------------------------------------------------------------------------
if(!is.list(src) & !is.environment(src)){
stop("Arg 'src' must be a 'list' or an 'environment'.")
}
if(!is.null(start.after) & !is.null(stop.at)){
if(start.after == 1& stop.at == 1){
msg <- c(
"Invalid specification:",
paste("* start.after: ", start.after, sep=""),
paste("* stop.at: ", stop.at, sep="")
)
stop(cat(msg, sep="\n"))
}
}
# /VALIDATE ----------
#---------------------------------------------------------------------------
# INNER FUNCTIONS
#---------------------------------------------------------------------------
.startAfterInner <- function(
envir,
nms,
out.1,
do.reset=FALSE,
...
){
.do.debug <- envir$.do.debug
idx.diff <- diff(c(envir$start.after, length(envir$counter)))
if(.do.debug){
cat(c("", "+++", ""), sep="\n")
# print("+++")
cat("names:", sep="\n")
print(names(out.1))
cat("envir$counter:", sep="\n")
print(envir$counter)
cat("idx.diff:", sep="\n")
print(idx.diff)
}
# UPDATE IF DEGREE OF NESTEDNESS EXCEEDS START CRITERION
if(idx.diff > 0){
idx.cutoff <- (
length(envir$counter)-idx.diff+1):length(envir$counter
)
idx.left <- envir$counter[-idx.cutoff]
nms.1 <- nms[idx.cutoff]
names(out.1) <- paste(nms.1, collapse="/")
# UPDATE SRC
idx.append <- sapply(envir$history, function(x.hist){
all(idx.left == x.hist)
})
if(.do.debug){
cat("idx.cutoff:", sep="\n")
print(idx.cutoff)
cat("idx.left:", sep="\n")
print(idx.left)
cat("idx.append:", sep="\n")
print(idx.append)
cat("names remaining:", sep="\n")
print(names(out.1))
}
if(any(idx.append)){
envir$src[[idx.left]] <- append(envir$src[[idx.left]],
values=out.1)
} else {
envir$src[[idx.left]] <- out.1
# UPDATE HISTORY
envir$history <- c(envir$history, list(idx.left))
}
envir$out <- envir$src
# /
}
if(idx.diff < 0){
envir$out <- envir$src
# if(envir$do.warn & !envir$do.block.warning){
# warning(paste("Argument 'start.after=", envir$start.after,
# "' exceeds maximum degree of nestedness (=",
# envir$start.after + idx.diff, ").", sep=""))
# envir$do.block.warning <- TRUE
# }
}
# /
# RESET
if(do.reset){
envir$nms <- envir$nms[-length(envir$nms)]
envir$counter <- envir$counter[-length(envir$counter)]
}
# /
return(TRUE)
}
.updateOutInner <- function(
envir,
out.1,
do.reset=FALSE,
...
){
.do.debug <- envir$.do.debug
# UPDATE OUT
out.0 <- get("out", envir = envir)
out <- c(out.0, out.1)
envir$out <- out
# /
# RESET
if(do.reset){
envir$nms <- envir$nms[-length(envir$nms)]
envir$counter <- envir$counter[-length(envir$counter)]
}
# /
return(TRUE)
}
.flattenInner <- function(
x,
envir,
...
){
.do.debug <- envir$.do.debug
if( (class(x)=="list" & length(x) != 0) |
(class(x) == "environment" & length(x) != 0)
){
if(class(x) == "environment"){
x <- as.list(x)
}
# UPDATE
envir$counter.history <- c(envir$counter.history,
list(envir$counter))
# EXIT IF DEGREE EXCEEDS CUTOFF
if(!is.null(envir$stop.at)){
if(length(envir$counter) > envir$stop.at){
# THIS
nms <- get("nms", envir=envir)
if(.do.debug){
cat("names:", sep="\n")
print(paste(nms, collapse=envir$delim.path))
}
out.1 <- list(x)
names(out.1) <- paste(nms, collapse=envir$delim.path)
# /
# DECISION ON FLATTENING
if(!is.null(envir$start.after)){
.startAfterInner(envir=envir, nms=nms, out.1=out.1,
do.reset=TRUE)
return(NULL)
# }
# /
} else {
.updateOutInner(envir=envir, out.1=out.1, do.reset=TRUE)
return(NULL)
}
}
}
# /
# LOOP OVER ELEMENTS
for(i in seq(along=x)){
# UPDATE COUNTER
envir$counter <- c(envir$counter, i)
# UPDATE NAMES
assign("nms", c(get("nms", envir=envir), names(x[i])),
envir=envir)
# RECURSIVE FLATTENING
.flattenInner(x[[i]], envir) # call recursively
# RESET COUNTER
if(i == length(x)){
envir$nms <- envir$nms[-length(envir$nms)]
envir$counter <- envir$counter[-length(envir$counter)]
}
# /
}
# /
} else {
# THIS
nms <- get("nms", envir=envir)
if(.do.debug){
cat("names:", sep="\n")
print(paste(nms, collapse=envir$delim.path))
}
out.1 <- list(x)
names(out.1) <- paste(nms, collapse=envir$delim.path)
# /
# DECISION ON FLATTENING
if(!is.null(envir$start.after)){
.startAfterInner(envir=envir, nms=nms, out.1=out.1)
} else {
.updateOutInner(envir=envir, out.1=out.1)
}
if(.do.debug){
cat("out.1:", sep="\n")
print(out.1)
}
# RESET
envir$nms <- envir$nms[-length(envir$nms)]
envir$counter <- envir$counter[-length(envir$counter)]
# /
}
return(TRUE)
}
# /INNER FUNCTIONS ----------
#---------------------------------------------------------------------------
# ACTUAL PROCESSING
#---------------------------------------------------------------------------
# COERCE TO LIST
if(class(src) == "environment"){
src <- envirAsList(src=src)
}
# /
# PRESERVE ORIGINAL (just in case)
src.0 <- src
out <- list()
# ENVIR
envir <- new.env()
envir$.do.debug <- .do.debug
envir$counter <- NULL
envir$counter.history <- NULL
envir$delim.path <- delim.path
envir$do.warn <- do.warn
envir$do.block.warning <- FALSE
envir$history <- NULL
envir$nms <- NULL
envir$out <- list()
envir$src <- src
envir$start.after <- start.after
if(!is.null(stop.at)){
stop.at.0 <- stop.at
if(stop.at == 1){
return(src)
} else {
stop.at <- stop.at - 1
}
}
envir$stop.at <- stop.at
# /
# APPLY INNER
.flattenInner(src, envir)
if(envir$do.warn){
max.length <- max(sapply(envir$counter.history, function(x){
length(x)
}))
# if(!envir$do.block.warning){
if(!is.null(start.after)){
if(start.after > max.length){
warning(paste("Argument 'start.after=", start.after,
"' exceeds maximum degree of sublayer nestedness (=",
max.length, ").", sep=""))
}
}
if(!is.null(stop.at)){
if(stop.at.0 > max.length){
warning(paste("Argument 'stop.at=", stop.at.0,
"' exceeds maximum degree of sublayer nestedness (=",
max.length, ").", sep=""))
}
}
}
out <- envir$out
# /ACTUAL PROCESSING ----------
return(out)
}
#-------------------------------------------------------------------------------
# SIMPLE LISTS
#-------------------------------------------------------------------------------
src <- list(DF=data.frame(A=c(1,2)), vec=c("a", "b"))
src <- list(src,src)
flatten(src)
# /SIMPLE LISTS ----------
#-------------------------------------------------------------------------------
# NESTED LISTS
#-------------------------------------------------------------------------------
src <- list(a=list(a.1=list(a.1.1=list(a.1.1.1=NA), a.1.2=5),
a.2=list(a.2.1=list())), b=NULL)
flatten(src)
flatten(src, start.after=1)
flatten(src, start.after=1, .do.debug=TRUE)
flatten(src, start.after=2)
flatten(src, start.after=3)
flatten(src, start.after=4)
flatten(src, stop.at=1)
flatten(src, stop.at=2)
flatten(src, stop.at=3)
flatten(src, stop.at=4)
flatten(src, start.after=1, stop.at=1)
flatten(src, start.after=1, stop.at=2)
flatten(src, start.after=1, stop.at=3)
flatten(src, start.after=1, stop.at=4)
flatten(src, start.after=2, stop.at=4)
# /NESTED LISTS ----------
#-------------------------------------------------------------------------------
# ENVIRONMENTS
#-------------------------------------------------------------------------------
envir <- new.env()
envir$a <- new.env()
envir$a$a.1 <- new.env()
envir$a$a.1$a.1.1 <- new.env()
envir$a$a.1$a.1.1$a.1.1.1 <- NA
envir$a$a.1$a.1.2 <- 5
envir$a$a.2 <- new.env()
envir$a$a.2$a.2.1 <- list()
envir$b <- NULL
envirAsList(src=envir)
flatten(envir)
flatten(envir, start.after=1)
flatten(envir, start.after=1, .do.debug=TRUE)
flatten(envir, start.after=2)
flatten(envir, start.after=3)
flatten(envir, start.after=4)
flatten(envir, stop.at=1)
flatten(envir, stop.at=2)
flatten(envir, stop.at=3)
flatten(envir, stop.at=4)
flatten(envir, start.after=1, stop.at=1)
flatten(envir, start.after=1, stop.at=2)
flatten(envir, start.after=1, stop.at=3)
flatten(envir, start.after=1, stop.at=4)
# /ENVIRONMENTS ----------
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.