Hi all,
I've noticed that as.Date can't be applied directly to a Spark DataFrame, so I
created the following UDF and used dapply to convert an integer column "aa"
(a day count) to a date with origin 1960-01-01.
change_date <- function(df) {
  as.POSIXlt(as.Date(df$aa, origin = "1960-01-01", tz = "UTC"))
}
customSchema <- structType(structField("rc", "integer"),
                           ....
                           structField("change_date(x)", "timestamp"))
rollup_1_t <- dapply(rollup_1, function(x) {
  x <- cbind(x, change_date(x))
}, schema = customSchema)
It works on a small dataset, but on a big dataset it takes forever to finish:
head(rollup_1_t) never returns a result.
I guess this is because change_date runs as R code, so every partition of the
Spark DataFrame has to be converted back to an R data.frame first, which is slow
and could fail. Is there a better solution?
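One alternative I'm considering (a sketch, not yet tried on the full data) is to keep the conversion on the Spark side with the built-in Spark SQL function date_add(), called through SparkR's expr(), so no rows are shipped to R workers the way dapply ships them. The toy data frame, the new column names aa_date and aa_ts, and the local session below are assumptions for illustration only; it assumes Spark 2.0 or later, where sparkR.session(), expr() and dapply() are available.

```r
library(SparkR)

# Start (or reuse) a local Spark session; assumes SparkR is installed.
sparkR.session(master = "local[1]")

# Hypothetical stand-in for rollup_1: "aa" holds days since 1960-01-01.
rollup_1 <- createDataFrame(data.frame(rc = 1:3, aa = c(0L, 1L, 366L)))

# Evaluate the date arithmetic in the JVM with Spark SQL's date_add(),
# which accepts a column for the number of days. No R UDF is involved.
rollup_1_t <- withColumn(rollup_1, "aa_date",
                         expr("date_add(to_date('1960-01-01'), aa)"))

# Cast to timestamp if a timestamp column is needed, as in customSchema above.
rollup_1_t <- withColumn(rollup_1_t, "aa_ts",
                         cast(rollup_1_t$aa_date, "timestamp"))

head(rollup_1_t)
```

Because the expression is compiled by Spark rather than executed row by row in R, it should scale like any other column operation; whether it fully fixes the slowness on the real table is something I would still need to measure.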
Thanks,
Ye