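Try this. First we read the input a line at a time into L, dropping the header. Then we use strapply to match each line against the given pattern. It passes the backreferences (the portions of the pattern within parentheses) to the function (defined via a formula), whose implicit arguments are x, y and z. For each line that function returns two values: the date and time with "00:00:00" appended, so that a line with no time still gets one (any text left over after a complete time is ignored when parsing), and the frequency. simplify = rbind stacks these into a two-column matrix, and in the next statement we convert the first column to chron and the second to numeric. See R News 4/1 for more about dates and times.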
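As a quick illustration of what the three parenthesised groups capture, here is a minimal base R sketch. It uses regexec and regmatches rather than strapply, so it only mimics the matching step and is not how gsubfn works internally; the names sample_lines and groups are made up for this illustration, and the two sample lines are the first two data rows.

```r
# Base R illustration only (not the gsubfn approach itself): show what each
# parenthesised group of the pattern captures for two sample lines.
pat <- "(../../....) (..:..:..){0,1} *([0-9.]+)"
sample_lines <- c("01/09/2009 59.036", "01/09/2009 00:00:01 58.035")

# regexec() returns the full match followed by one element per group.
m <- regmatches(sample_lines, regexec(pat, sample_lines))

# Drop the full match and keep the three groups, one row per input line.
groups <- t(sapply(m, function(g) g[-1]))
colnames(groups) <- c("x", "y", "z")  # the names strapply's formula uses
groups
```

The y column is empty for the midnight line, which is why the function pastes "00:00:00" onto the end before the string is parsed. The full code using strapply follows.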

library(gsubfn) # strapply
library(chron) # as.chron

Lines <- "DATETIME FREQ
01/09/2009 59.036
01/09/2009 00:00:01 58.035
01/09/2009 00:00:02 53.035
01/09/2009 00:00:03 47.033
01/09/2009 00:00:04 52.03
01/09/2009 00:00:05 55.025"

# Lines is a character string, not a file name, so read it through a
# textConnection; [-1] drops the header line
L <- readLines(textConnection(Lines))[-1]

# groups: date, optional time, frequency
pat <- "(../../....) (..:..:..){0,1} *([0-9.]+)"
s <- strapply(L, pat, ~ c(paste(x, y, "00:00:00"), z), simplify = rbind)

# month/day/year; use "%d/%m/%Y %H:%M:%S" if the dates are day/month/year
fmt <- "%m/%d/%Y %H:%M:%S"
DF <- data.frame(Time = as.chron(s[,1], fmt), Freq = as.numeric(s[,2]))
DF

The final output looks like this:

> DF
                 Time   Freq
1 (01/09/09 00:00:00) 59.036
2 (01/09/09 00:00:01) 58.035
3 (01/09/09 00:00:02) 53.035
4 (01/09/09 00:00:03) 47.033
5 (01/09/09 00:00:04) 52.030
6 (01/09/09 00:00:05) 55.025

If the times are unique you could consider making a zoo object out of it by replacing the DF <- statement with:

library(zoo)
z <- zoo(as.numeric(s[,2]), as.chron(s[,1], fmt))

See the three vignettes in the zoo package.

On Mon, Oct 5, 2009 at 5:14 PM, esp <davidgary...@gmail.com> wrote:
>
> Date-Time-Stamp input method to correctly interpret user-specific
> formats: coding is 90% there - based on example at
> http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html
> ...anyone got the last 10% please?
>
> CONTEXT:
>
> Data is received where one of the columns is a datetimestamp. At midnight,
> the value represented as text in this column consists of just the date part,
> e.g. "01/09/2009". At other times, the value in the column contains both
> date and time e.g. "01/09/2009 00:00:01". The goal is to read it into R as
> an appropriate data type, where for example date arithmetic can be
> performed. As far as I can tell, the most appropriate such data type is
> POSIXct. The trick then is to read in the datetimestamps in the data as
> this type.
>
> PROBLEM:
>
> POSIXct defaults to a text representation almost but not quite like my
> received data. The main difference is that the POSIXct date part is in
> reverse order, e.g. "2009-09-01".
> It is possible to define a different
> format where date and time parts look like my data but when encountering
> datetimestamps where only the date part is present (as in the case of my
> midnight data) then this is interpreted as NA i.e. undefined.
>
> SOLUTION (ALMOST):
>
> There is a workaround (based on example at
> http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html). It is possible to
> define a class then read the data in as this class. For such a class it is
> possible to define a class method, in terms of a function, for translating a
> text (character string) representation into a value. In that function, one
> can use a conditional expression to treat midnight datetimestamps
> differently from those at other times of day. The example below does that.
> In order to apply this function over all of the datetimestamp values in the
> column, it is necessary to use something like R's 'sapply' function.
>
> SNAG:
>
> The function below implements this approach. A datetimestamp with only the
> date part, including leading zeroes, is always length 10 (characters). It
> correctly interprets the datetimestamp values, but unfortunately translates
> them into what appear to be numeric type. I am actually uncertain precisely
> what is happening, as I am very new to R and have most certainly stretched
> myself in writing this code. I think perhaps it returns a list and
> something associated with this aspect makes it "forget" the data type is
> POSIXct or at least how such a type should be displayed as text or what to
> do about it.
>
> PLEA:
>
> Please, can anyone give any help whatsoever, however tenuous?
>
> CODE, DATA & RESULTS:
>
> Function to Read required data, intended to make the datetime column of the
> data (example given further below) into POSIXct values:
> <<<
> spot_frequency_readin <- function(file,nrows=-1) {
>
>   # create temp class
>   setClass("t_class2_", representation("character"))
>   setAs("character", "t_class2_", function(from) {sapply(from, function(x) {
>       if (nchar(x)==10) {
>         as.POSIXct(strptime(x,format="%d/%m/%Y"))
>       }
>       else {
>         as.POSIXct(strptime(x,format="%d/%m/%Y %H:%M:%S"))
>       }
>     }
>     )
>   }
>   )
>
>   #(for format symbols, see "R Reference Card")
>
>   # read the file (TSV)
>   file <- read.delim(file, header=TRUE, comment.char = "", nrows=nrows,
>     as.is=FALSE, col.names=c("DATETIME", "FREQ"), colClasses=c("t_class2_",
>     "numeric") )
>
>   # remove it now that we are done with it
>   removeClass("t_class2_")
>
>   return(file)
> }
>>>>
> This appears to work as regards processing each row of data correctly,
> but the values returned look like numeric equivalents of POSIXct, as opposed
> to the expected character-based (string) equivalents:
>
>
> Example Data:
> <<<
> DATETIME FREQ
> 01/09/2009 59.036
> 01/09/2009 00:00:01 58.035
> 01/09/2009 00:00:02 53.035
> 01/09/2009 00:00:03 47.033
> 01/09/2009 00:00:04 52.03
> 01/09/2009 00:00:05 55.025
>>>>
>
>
> Example Function Call:
> <<<
>> spot = spot_frequency_readin("mydatafile.txt",4)
>>>>
>
>
> Result of Example Function Call:
> <<<
>> spot[1]
> DATETIME
>
> 1 1251759600
> 2 1251759601
> 3 1251759602
> 4 1251759603
>>>>
>
>
> What I ideally wanted to see (whether or not the time part of the
> datetimestamp at midnight was displayed):
> <<<
>> spot[1]
> DATETIME
>
> 01/09/2009 00:00:00
> 01/09/2009 00:00:01
> 01/09/2009 00:00:02
> 01/09/2009 00:00:03
> 01/09/2009 00:00:04
>>>>
>
>
> For the function as defined above using 'sapply'
>> spot[,1]
> 01/09/2009 01/09/2009 00:00:01 01/09/2009 00:00:02 01/09/2009
> 00:00:03
> 1251759600 1251759601 1251759602
> 1251759603
>
> This was unexpected - it
> seems to have displayed the datetimestamp values
> both as per my defined character-string representation and as numeric
> values.
>
> Alternatively if I replace the 'sapply' by a 'lapply' then I get something
> closer to what I expect. It is at least what looks like R's default text
> representation for POSIXct datetimes, even if it is not in my preferred
> format.
> <<<
>> spot[,1]
>
> [[1]]
> [1] "2009-09-01 BST"
>
> [[2]]
> [1] "2009-09-01 00:00:01 BST"
>
> [[3]]
> [1] "2009-09-01 00:00:02 BST"
>
> [[4]]
> [1] "2009-09-01 00:00:03 BST"
>>>>
>
> --
> View this message in context:
> http://www.nabble.com/Date-Time-Stamp-input-method-for-user-specific-formats-tp25757018p25757018.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.