Re: [R] How might I work with a data.frame where each physical row represents several logical rows?

Jeff Newmiller Wed, 04 Jan 2017 23:31:51 -0800

Below is my interpretation of one way to achieve your stated goal. I don'tknow what analysis you plan to perform, but this seems unlikely to be bemy preferred analysis format (I think I would probably analyze subsets ofthe records related to specific parts of the transactions).


library(dplyr)
library(tidyr)


fifteenminutes <- 15 * 60

# for education, show intermediate results
# strip out square brackets
View(   test_data
    %>% mutate( values = sub( "^\\[(.*)\\]$", "\\1", values ) )
    )

# split the single column into multiple columns
View(   test_data
    %>% mutate( values = sub( "^\\[(.*)\\]$", "\\1", values ) )
    %>% separate( values, paste0( "value", 0:3 ), ", *" )
    )

# pull separate value columns into one column called value, with a new
# column vcol to hold the name of the original column
View(   test_data
    %>% mutate( values = sub( "^\\[(.*)\\]$", "\\1", values ) )
    %>% separate( values, paste0( "value", 0:3 ), ", *" )
    %>% gather( vcol, value, c( value0, value1, value2, value3 ) )
    )

# create a timestamp column for the individual values
View(   test_data
    %>% mutate( values = sub( "^\\[(.*)\\]$", "\\1", values ) )
    %>% separate( values, paste0( "value", 0:3 ), ", *" )
    %>% gather( vcol, value, c( value0, value1, value2, value3 ) )
    %>% mutate( timestamp = start
                          + fifteenminutes
                            * as.numeric( sub( "value", "", vcol ) ) )
    )

# remove the old vcol column now that timestamp column is created
View(   test_data
    %>% mutate( values = sub( "^\\[(.*)\\]$", "\\1", values ) )
    %>% separate( values, paste0( "value", 0:3 ), ", *" )
    %>% gather( vcol, value, c( value0, value1, value2, value3 ) )
    %>% mutate( timestamp = start
                          + fifteenminutes
                            * as.numeric( sub( "value", "", vcol ) ) )
    %>% select( -vcol )
    )

# unite several columns that currently distinguish various rows
View(   test_data
    %>% mutate( values = sub( "^\\[(.*)\\]$", "\\1", values ) )
    %>% separate( values, paste0( "value", 0:3 ), ", *" )
    %>% gather( vcol, value, c( value0, value1, value2, value3 ) )
    %>% mutate( timestamp = start
                          + fifteenminutes
                            * as.numeric( sub( "value", "", vcol ) ) )
    %>% select( -vcol )
    %>% unite( mname, mtype, nic, tcp_state, limit_type, value_type, name )
    )

# spread values out into separate columns
test_data2 <- (   test_data
              %>% mutate( values = sub( "^\\[(.*)\\]$", "\\1", values ) )
              %>% separate( values, paste0( "value", 0:3 ), ", *" )

%>% gather( vcol, value, c( value0, value1, value2, value3 ))%>% mutate( timestamp = start + fifteenminutes * as.numeric(sub( "value", "", vcol ) ) )

              %>% select( -vcol )

%>% unite( mname, mtype, nic, tcp_state, limit_type,value_type, name )

              %>% spread( mname, value )
              )

View( test_data2 )


On Wed, 4 Jan 2017, David Wolfskill wrote:

On Wed, Jan 04, 2017 at 08:33:46PM -0800, David Winsemius wrote:

 ...
Perhaps something like this:

# function to read the values in 'values':
 parse_values <- function(x) {scan(text= gsub( "\\[|\\]","",x), sep=",") }

# the apply function reads line-by-line
 new_dat <- apply(test_data, 1, function(d) data.frame( as.list(d[!names(d)  %in% 
"values"]), nvals <- parse_values(d['values']) ) )


Hmmm.... OK; that looks a lot better than the stuff that was coming to
my mind -- thanks! :-)

...
# Could suppress the report from scan by adding quiet = TRUE
# now take this list of 4 line data.frames and "rbind" them
# If you wanted these to remain character you would use stringsAsFactors=FALSE 
in the data.frame call

new_df <- do.call("rbind", new_dat)


Aye.

...

(I will also end up collecting all of the records for a given timestamp
and hostname, and creating one very wide record with all of the data
from the set of records thus found.  I already have (yes, Perl) code to
do this -- though if there's a reasonable way to avoid that, I'm
interested.)


I thought you wanted the data in long form.


Sorry; I'm not understanding what you mean: My background is a lot more
toward systems administration than statistical analysis.

The repository I'm using has a rather large number of individual metrics
from a given server -- each provided on a separate row.  (That's why one
of the columns is called "name" -- it provides the (base) "name" of the
metric that corresponds to the "values" on the given row.)  I'll plan to
assemble the rows for a given server & timestamp into a single row --
thuse, I would have the tcp_connection_count for the "last ACK" state
and for the "fin_wait_2" state, as well as CpuSystem, CpuUser, CpuIdle,
... for the given server & timestamp on a single row (eventually).

...


Thanks again!

Peace,
david
--
David H. Wolfskill                              r...@catwhisker.org
Epistemology for post-truthers: How do we select parts of reality to ignore?

See http://www.catwhisker.org/~david/publickey.gpg for my public key.


---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How might I work with a data.frame where each physical row represents several logical rows?

Reply via email to