Hello!

I have a time stamp field ('time') in a dataset ('dt') with values like "18:10",
"19:43", ...
I need to split the time field into hours and minutes and add both as new
columns to the dataset.
We are able to do this in bash+awk, but would like to stay within the R codebase as
much as possible.

For now we are using this solution:

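 # as.character() guards against 'time' having been read in as a factor
 tstamp <- strsplit(as.character(dt$time), ":")

 # constructing hours field (vapply returns a plain vector, not a list column)
 dt$hr  <- vapply(tstamp, function(v) v[1], character(1))

 # constructing minutes field
 dt$m   <- vapply(tstamp, function(v) v[2], character(1))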

It works fine on a sample (small and simple) dataset.

But with real data of several million records, it seems impractical
to make two separate passes over the tstamp list.
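One way to avoid calling an R function once per element is to work on the whole `time` vector with vectorised regular-expression functions. The sketch below assumes every value has the form `H:MM` or `HH:MM` (exactly one colon); the small sample data frame is hypothetical, and converting to integer is an added choice that the original code does not make. Each `sub()` call is a single vectorised pass, so two of them are typically much cheaper than two `lapply()` loops over a list with millions of elements.

```r
# Hypothetical sample data; the real 'dt' has millions of rows.
dt <- data.frame(time = c("18:10", "19:43", "7:05"), stringsAsFactors = FALSE)

# Assumes exactly one ":" per value.
# Drop the colon and everything after it to get the hour.
dt$hr <- as.integer(sub(":.*$", "", dt$time))
# Drop everything up to and including the colon to get the minute.
dt$m  <- as.integer(sub("^.*:", "", dt$time))

str(dt)
```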

We have tried this construction instead:

dt[c('hr', 'm')] <- strsplit(dt$time, ":")

But the R process consumes all of the system's memory (8 GB), starts
swapping while executing this statement, and hangs for so long
that we have never had the patience to wait for the result.

Is there a simple and efficient way to assign several dataset columns at once
with values computed from other columns?
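As a sketch of assigning several columns in one statement: the right-hand side can be a data frame with as many columns as there are names on the left. The example splits the column once with `strsplit()`, flattens the result with `unlist()`, and reshapes it into a two-column matrix. It assumes every `time` value splits into exactly two pieces (otherwise the matrix rows would be misaligned); the sample data and the conversion to integer are illustrative assumptions.

```r
# Hypothetical sample data; the real 'dt' has millions of rows.
dt <- data.frame(time = c("18:10", "19:43", "7:05"), stringsAsFactors = FALSE)

# One split over the whole column; fixed = TRUE skips regex matching.
pieces <- unlist(strsplit(as.character(dt$time), ":", fixed = TRUE))

# Assumes exactly two pieces per value: fill row-wise into an n x 2 matrix.
parts <- matrix(as.integer(pieces), ncol = 2, byrow = TRUE)

# Assign both new columns at once from a two-column data frame.
dt[c("hr", "m")] <- as.data.frame(parts)

str(dt)
```

Building the matrix from a single `unlist()` avoids both the per-element function calls and the list-valued right-hand side of the original `dt[c('hr', 'm')] <- strsplit(...)` attempt.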


R-egards,
Alex


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
