I have (time series) data extracted from a repository that is stored such that each record is for an hour, but each record contains an ordered set of values throughout the hour. In the following exmaple, I'll show sets of 4, depicting 0, 15, 30, and 45 minutes after each "start" point (respectively):
test_data <- structure(list(start = c(1482793200L, 1482793200L, 1482793200L, 1482793200L, 1482793200L, 1482793200L, 1482793200L, 1482793200L, 1482793200L, 1482793200L, 1482793200L, 1482793200L, 1482793200L, 1482793200L, 1482793200L), hostname = c("c001.example.net", "c001.example.net", "c001.example.net", "c001.example.net", "c001.example.net", "c001.example.net", "c001.example.net", "c001.example.net", "c001.example.net", "c001.example.net", "c001.example.net", "c161.example.net", "c161.example.net", "c161.example.net", "c161.example.net"), mtype = c("health", "health", "net", "net", "net", "net", "net", "net", "net", "sys", "sys", "net", "sys", "sys", "sys"), limit_type = c("fill", "serve", "", "", "", "", "", "", "", "", "", "", "", "", ""), hw = c(1.16, 1.16, 1.16, 1.16, 1.16, 1.16, 1.16, 1.16, 1.16, 1.16, 1.16, 1.21, 1.21, 1.21, 1.21), fw = c("2017Q1.1.1", "2017Q1.1.1", "2017Q1.1.1", "2017Q1.1.1", "2017Q1.1.1", "2017Q1.1.1", "2017Q1.1.1", "2017Q1.1.1", "2017Q1.1.1", "2017Q1.1.1", "2017Q1.1.1", "2016Q4.2.13", "2016Q4.2.13", "2016Q4.2.13", "2016Q4.2.13"), tcp_state = c("", "", "", "", "closed", "closing", "fin_wait_2", "last_ack", "syn_rcvd", "", "", "", "", "", ""), value_type = c("limit", "limit", "", "", "", "", "", "", "", "", "", "", "", "", ""), nic = c("all", "all", "", "", "", "", "", "", "", "", "", "mce0", "", "", ""), name = c("in_download_window", "in_download_window", "tcpOutSegs", "tcpRetransSegs", "tcp_connection_count", "tcp_connection_count", "tcp_connection_count", "tcp_connection_count", "tcp_connection_count", "CpuSystem", "CpuUser", "HCOutOctets", "CpuIdle", "CpuSystem", "CpuUser"), values = c("[0.0, 0.0, 0.0, 0.0]", "[0.0, 0.0, 0.0, 0.0]", "[260410.94547698632, 258469.54433635762, 260579.2186617577, 258763.2815145043]", "[18436.311524062934, 18248.952271420356, 18201.62259198662, 17818.39529178736]", "[5.0, 3.0, 3.0, 3.0]", "[3.0, 3.0, 2.0, 2.0]", "[670.0, 677.0, 685.0, 729.0]", "[1162.0, 1192.0, 1148.0, 1110.0]", "[25.0, 60.0, 71.0, 33.0]", "[11.0, 10.0, 11.0, 10.0]", "[2.0, 2.0, 2.0, 2.0]", "[7.873191635959294E9, 7.7377184658927E9, 7.876630519328283E9, 7.714521544912713E9]", "[70.0, 70.0, 70.0, 70.0]", "[27.0, 26.0, 27.0, 26.0]", "[4.0, 4.0, 4.0, 4.0]" )), .Names = c("start", "hostname", "mtype", "limit_type", "hw", "fw", "tcp_state", "value_type", "nic", "name", "values" ), class = "data.frame", row.names = c(NA, -15L)) So of the 15 rows in the above example, row 8 (which depicts the TCP connection counts in the "last ACK" state) has the values: * 1162.0 * 1192.0 * 1148.0 * 1110.0 It seems to me that what is wanted is for each of the existing rows to be replaced by a set of 4 (in this case) rows, where the other columns are the same (save for "start", which is a timestamp, and should be adjusted for the respective times). I'm fairly sure I can write code to do that, but it would end up being something like Perl implemented in R, which seems fairly grotesque: I can't help but think that there ought to be a ... more elegant approach in R (which is why I am sking for help). (I will also end up collecting all of the records for a given timestamp and hostname, and creating one very wide record with all of the data from the set of records thus found. I already have (yes, Perl) code to do this -- though if there's a reasonable way to avoid that, I'm interested.) Once that's all done, I'll be examining various columns, subsetting by attributes of the systems being compared -- but I already have code to do that (that makes use of a different -- and rather more fragile -- approach for extracting the data from its repository). Thanks! Peace, david -- David H. Wolfskill r...@catwhisker.org Epistemology for post-truthers: How do we select parts of reality to ignore? See http://www.catwhisker.org/~david/publickey.gpg for my public key.
signature.asc
Description: PGP signature
______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.