Hi, that is great! Thanks.

On Mon, Jan 20, 2014 at 12:10 PM, Jim Lemon <j...@bitwrit.com.au> wrote:

> On 01/20/2014 11:44 AM, Bill wrote:
>
>> I am trying to read a csv file with a date-time field. There are many rows
>> with the same date but different times. I first want to clear the times so
>> that rows from the same day have the same date-time field (called Date).
>> There is another field called Text and I want to collapse all the records
>> with the same date so that there is only one record for this date and with
>> a text field that contains all the strings from all the corresponding text
>> fields. At the same time I want to create a new field that has the count of
>> how many records were collapsed for each date. There is a third field
>> called Tw.ID and I was trying to use tapply on this field to do this. Later
>> I will create a DocumentTermMatrix with the tm package on this dataframe.
>> In the code below I have not figured out how to collapse the data so that
>> there is only one record for each date and I don't really have a good way
>> to add in a count field. Can anyone make any suggestions?
>> Thanks.
>>
>> install.packages(c("tm"))
>> library(tm)
>> y.df=read.csv("YHOO3000.csv", header=TRUE)
>> y.df$Date= as.POSIXlt( y.df$Date)
>> ysub14.df=y.df
>> ysub14.df$Date=y.df$Date -14*3600 #I pushed the record times back a little here.
>> ysub14.df$Date=as.Date(ysub14.df$Date, "%Y-%m-%d")
>> # might want to use groups<- unstack(data.frame(ysub14.df$Text,ysub14.df$Date))
>> # to put all the tweets for one day into a group. This makes a list
>> # I think, with the name of the list being the Date and
>> # the tweets for that date being stored in a vector.
>> countgroup2=tapply(ysub14.df$Tw.ID,ysub14.df$Date,length)
>>
> Hi Bill,
> Here is one way:
>
> # get some date-time strings
> dates<-paste("2014-01-",10:15," ",sample(0:23,20),
>  ":",sample(0:59,20),":",sample(0:59,20),sep="")
> # function to return stupid text
> sillytext<-function(n) {
>  return(paste(sample(letters[1:26],n),sep="",collapse=""))
> }
> # get the stupid text
> ttext<-sapply(rep(10,20),sillytext)
> # make the data frame
> y.df<-data.frame(dates,ttext)
> # convert the date-time strings to dates
> y.df$dates<-
>  as.Date(format(as.Date(dates,"%Y-%m-%d %H:%M:%S"),
>  "%Y-%m-%d"),"%Y-%m-%d")
> library(prettyR)
> # stretch out all the text strings for each day
> y2.df<-stretch_df(y.df,"dates","ttext")
> # get the dimension of the resulting data frame
> ydim<-dim(y2.df)
> # function to count the NAs
> nna<-function(x) return(sum(is.na(x)))
> # add a column with a count of _not_ NAs
> y2.df$nrec<-
>  (ydim[2]-1)-apply(as.matrix(y2.df[,2:ydim[2]]),1,nna)
>
> Jim
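
For comparison, the original question (one record per date, with the texts joined and a count of the records collapsed) can probably also be handled in base R alone. The following is only a minimal sketch under assumptions: the three-row data frame is a made-up stand-in for YHOO3000.csv, the column names Date, Text and Tw.ID are taken from the question, the Count column and the daily.df name are hypothetical, and the 14-hour shift from the original code is left out.

```r
# Hypothetical stand-in for the poster's CSV; the real file is not available here.
y.df <- data.frame(
  Date  = c("2014-01-10 09:15:00", "2014-01-10 17:40:12", "2014-01-11 08:05:30"),
  Text  = c("first tweet", "second tweet", "third tweet"),
  Tw.ID = c(101, 102, 103),
  stringsAsFactors = FALSE
)

# Drop the time of day so that rows from the same day share one key.
# Only the leading "%Y-%m-%d" part of each string is parsed.
y.df$Date <- as.Date(y.df$Date, format = "%Y-%m-%d")

# One row per date, with all Text values pasted into a single string.
texts <- aggregate(Text ~ Date, data = y.df, FUN = paste, collapse = " ")

# One row per date, with the number of records that were collapsed.
counts <- aggregate(Tw.ID ~ Date, data = y.df, FUN = length)
names(counts)[2] <- "Count"  # assumed column name, not from the thread

# Join the two summaries on Date.
daily.df <- merge(texts, counts, by = "Date")
print(daily.df)
```

Each row of daily.df then holds one date, the combined text for that date and the record count, which should be a workable starting point for building a corpus with tm.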