Re: [R] collapsing records

Bill Wed, 22 Jan 2014 01:32:26 -0800

Hi
That is great!
Thanks


On Mon, Jan 20, 2014 at 12:10 PM, Jim Lemon <j...@bitwrit.com.au> wrote:

> On 01/20/2014 11:44 AM, Bill wrote:
>
>> I am trying to read a csv file with a date-time field. There are many rows
>> with the same date but different times. I first want to clear the times so
>> that rows from the same day have the same date-time field (called Date).
>> There is another field called Text and I want to collapse all the records
>> with the same date so that there is only one record for this date and with
>> a text field that contains all the strings from all the corresponding text
>> fields. At the same time I want to create a new field that has the count
>> of how many records were collapsed for each date. There is a third field
>> called Tw.ID and I was trying to use tapply on this field to do this.
>> Later I will create a DocumentTermMatrix with the tm package on this
>> dataframe.
>> In the code below I have not figured out how to collapse the data so that
>> there is only one record for each date and I don't really have a good way
>> to add in a count field. Can anyone make any suggestions?
>> Thanks.
>>
>> install.packages(c("tm"))
>> library(tm)
>> y.df=read.csv("YHOO3000.csv", header=TRUE)
>> y.df$Date= as.POSIXlt( y.df$Date)
>> ysub14.df=y.df
>> # I pushed the record times back a little here.
>> ysub14.df$Date=y.df$Date -14*3600
>> ysub14.df$Date=as.Date(ysub14.df$Date, "%Y-%m-%d")
>> # might want to use
>> # groups<-unstack(data.frame(ysub14.df$Text,ysub14.df$Date))
>> # to put all the tweets for one day into a group. This makes a list
>> # I think, with the name of the list being the Date and
>> # the tweets for that date being stored in a vector.
>> countgroup2=tapply(ysub14.df$Tw.ID,ysub14.df$Date,length)
>>
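
The date handling described above (push the times back 14 hours, then drop the time of day) can be sketched in base R roughly as follows. This is only an illustration: the three sample rows are invented stand-ins for YHOO3000.csv, and treating the timestamps as UTC is an assumption, not something stated in the post.

```r
# Hypothetical sample rows standing in for YHOO3000.csv;
# only the column names (Date, Text, Tw.ID) come from the post.
y.df <- data.frame(
  Date = c("2014-01-10 09:15:00", "2014-01-10 23:40:00", "2014-01-11 13:05:00"),
  Text = c("first", "second", "third"),
  Tw.ID = 1:3,
  stringsAsFactors = FALSE
)
# Parse the strings as POSIXct; the UTC time zone is an assumption
stamp <- as.POSIXct(y.df$Date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
# Shift each timestamp back 14 hours, then keep only the calendar date
y.df$Date <- as.Date(stamp - 14 * 3600, tz = "UTC")
```

POSIXct is used here instead of POSIXlt because it is stored as a plain numeric vector, which tends to behave more predictably as a data frame column.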
> Hi Bill,
> Here is one way:
>
> # get some date-time strings
> dates<-paste("2014-01-",10:15," ",sample(0:23,20),
>  ":",sample(0:59,20),":",sample(0:59,20),sep="")
> # function to return stupid text
> sillytext<-function(n) {
>  return(paste(sample(letters[1:26],n),sep="",collapse=""))
> }
> # get the stupid text
> ttext<-sapply(rep(10,20),sillytext)
> # make the data frame
> y.df<-data.frame(dates,ttext)
> # convert the date-time strings to dates
> y.df$dates<-
>  as.Date(format(as.Date(dates,"%Y-%m-%d %H:%M:%S"),
>  "%Y-%m-%d"),"%Y-%m-%d")
> library(prettyR)
> # stretch out all the text strings for each day
> y2.df<-stretch_df(y.df,"dates","ttext")
> # get the dimension of the resulting data frame
> ydim<-dim(y2.df)
> # function to count the NAs
> nna<-function(x) return(sum(is.na(x)))
> # add a column with a count of _not_ NAs
> y2.df$nrec<-
>  (ydim[2]-1)-apply(as.matrix(y2.df[,2:ydim[2]]),1,nna)
>
> Jim
>
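
For comparison, the same collapse (one row per date, all the text pasted together, plus a record count) can probably also be done in base R with tapply alone. This is a minimal sketch, not Jim's method: the sample rows and the names text.by.date, count.by.date and collapsed.df are made up for illustration, and it assumes Date has already been reduced to a plain Date column.

```r
# Hypothetical data using the column names from the original post
y.df <- data.frame(
  Date = as.Date(c("2014-01-10", "2014-01-10", "2014-01-11")),
  Text = c("first tweet", "second tweet", "third tweet"),
  Tw.ID = c(101, 102, 103),
  stringsAsFactors = FALSE
)
# One string per date: paste all Text values for that date together
text.by.date <- tapply(y.df$Text, y.df$Date, paste, collapse = " ")
# Number of records that were collapsed into each date
count.by.date <- tapply(y.df$Tw.ID, y.df$Date, length)
# Assemble one row per date; the tapply names are the dates as strings
collapsed.df <- data.frame(
  Date = as.Date(names(text.by.date)),
  Text = as.vector(text.by.date),
  Count = as.vector(count.by.date),
  stringsAsFactors = FALSE
)
```

Both tapply calls group on the same Date column, so their results come back in the same order and can be combined column by column. The Text column then holds one document per day, which should be a convenient shape for building a corpus with tm.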
