I am not sure exactly what your data represents. For example, from looking at the data it appears that user1 and user2 have been logged on for about 4 days; is that what the data is saying? If you are keeping track of users, why not write out a file that has the start/end time for each user's session? The first time you see a user, put an entry in a table, and as soon as they no longer show up in your sample, write out a record for them. With that information it is easy to create a report of the number of unique people over time.
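Here is a rough sketch of that bookkeeping in base R, in case it helps. It assumes each sample stays in effect until the next one (the same idea as na.locf()), and that a session ends at the first sample in which the user no longer appears. The function names samples_to_sessions() and unique_users_per_day(), the host labels, and the two hostB rows are all made up for illustration; only the hostA samples come from your post.

```r
tz <- "US/Eastern"

# hostA: the samples from the original post; the host label is hypothetical.
hostA <- data.frame(
  host = "hostA",
  users = c("user1 & user2", "user1 & user2", "user1 & user2",
            "user1 & user2", "user1 & user2", "user1 & user2 & user3"),
  datetime = as.POSIXct(c(1175210190, 1175313886, 1175528960,
                          1175529724, 1175532242, 1175546045),
                        origin = "1970-01-01", tz = tz),
  stringsAsFactors = FALSE
)

# hostB: invented rows, only here to show de-duplication across hosts.
hostB <- data.frame(
  host = "hostB",
  users = c("user2 & user4", "user2"),
  datetime = as.POSIXct(c("2007-04-01 09:00:00", "2007-04-01 10:00:00"),
                        tz = tz),
  stringsAsFactors = FALSE
)

all_samples <- rbind(hostA, hostB)

# Turn the samples of ONE host into one row per user session.
samples_to_sessions <- function(samples) {
  samples <- samples[order(samples$datetime), ]
  user_sets <- strsplit(samples$users, " & ", fixed = TRUE)
  n <- nrow(samples)
  rows <- list()
  for (u in unique(unlist(user_sets))) {
    present <- vapply(user_sets, function(s) u %in% s, logical(1))
    runs <- rle(present)                 # consecutive runs of present/absent
    ends <- cumsum(runs$lengths)
    starts <- ends - runs$lengths + 1
    for (k in which(runs$values)) {
      # The session ends at the first sample without the user; if the user
      # is still there in the last sample, use the last sample time.
      end_idx <- min(ends[k] + 1, n)
      rows[[length(rows) + 1]] <- data.frame(
        host = samples$host[1], user = u,
        start = samples$datetime[starts[k]],
        end = samples$datetime[end_idx],
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, rows)
}

# Count distinct users per calendar day across all sessions (all hosts).
unique_users_per_day <- function(sessions, tz = "US/Eastern") {
  per_session <- lapply(seq_len(nrow(sessions)), function(i) {
    days <- seq(as.Date(sessions$start[i], tz = tz),
                as.Date(sessions$end[i], tz = tz), by = "day")
    data.frame(day = days, user = sessions$user[i], stringsAsFactors = FALSE)
  })
  user_days <- unique(do.call(rbind, per_session))  # one row per (day, user)
  counts <- tapply(user_days$user, format(user_days$day),
                   function(u) length(unique(u)))
  data.frame(day = as.Date(names(counts)), users = as.integer(counts))
}

sessions <- do.call(rbind, lapply(split(all_samples, all_samples$host),
                                  samples_to_sessions))
daily <- unique_users_per_day(sessions)
print(sessions)
print(daily)
```

Because the sessions from every host are stacked before the daily count, a user who is logged on to two machines on the same day is counted once. If you would rather count a user only on days with an actual sample, drop the seq() expansion and use the sample dates directly.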

On Tue, Jan 11, 2011 at 10:47 AM, Jason Edgecombe <ja...@rampaginggeek.com> wrote:
> Hello,
>
> I have logging information for multiple machines, which I am trying to
> summarize and graph. So far, I process each host individually, but I would
> like to summarize the user count across multiple hosts. I want to answer the
> question "how many unique users logged in on a certain day across a group of
> machines"?
>
> I'm not quite sure how to scale the data frame and analysis to summarize
> multiple hosts, though. I'm still getting a feel for using R.
>
> Here is a snippet of data for one host. The user_count column is generated
> from the users column using my custom function "usercount()". The samples
> are taken roughly once per minute and only unique samples are recorded
> (i.e. use na.locf() to uncompress the data). Samples may occur twice in the
> same minute and are rarely aligned on the same time.
>
> Here is the original data before I turn it into a zoo series and run
> na.locf() over it so I can aggregate a single host by day. I'm open to a
> better way.
>
> > foo
>                   users            datetime user_count
> 1         user1 & user2 2007-03-29 19:16:30          2
> 2         user1 & user2 2007-03-31 00:04:46          2
> 3         user1 & user2 2007-04-02 11:49:20          2
> 4         user1 & user2 2007-04-02 12:02:04          2
> 5         user1 & user2 2007-04-02 12:44:02          2
> 6 user1 & user2 & user3 2007-04-02 16:34:05          3
>
> > dput(foo)
> structure(list(users = c("user1 & user2", "user1 & user2", "user1 & user2",
> "user1 & user2", "user1 & user2", "user1 & user2 & user3"), datetime =
> structure(c(1175210190,
> 1175313886, 1175528960, 1175529724, 1175532242, 1175546045), class =
> c("POSIXt",
> "POSIXct"), tzone = "US/Eastern"), user_count = c(2, 2, 2, 2,
> 2, 3)), .Names = c("users", "datetime", "user_count"), row.names = c(NA,
> 6L), class = "data.frame")
>
>
> Thanks,
> Jason
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?