Hi Peter & others,

Thanks (Peter) - that gets me really close to what I was hoping for.

The one problem I have is that the "cut" approach breaks the data into equal-width intervals based on the range of the "Target" values, rather than on their frequency. In other words, if the data ranged from 0 to 50, it would be separated into 0-5, 5-10 and so on, regardless of how many observations fall within each category. However, I want to get the data into deciles. The code I am using for the "cut" approach (incorporating Peter's) is:

    read_data <- read.table("C:/Sample table.txt", header = TRUE)
    read_data$DEC <- with(read_data, cut(Target, breaks=10, labels=1:10))
    L <- split(read_data, read_data$DEC)

This means that I can get separate data frames, such as L$'10', which comes out tidy but contains only 2 data items (the sample has 63 rows, so each decile should have 6+ data items):

       Actual    Target DEC
    9   0.572 0.3778386  10
    31  0.299 0.3546606  10

If I try to adjust this to get deciles using cut2(), I can break the data into deciles as follows:

    library(Hmisc)  # provides cut2()
    read_data <- read.table("C:/Sample table.txt", header = TRUE)
    # (labels=1:10 seems to have no effect here)
    read_data$DEC <- with(read_data, cut2(read_data$Target, g=10), labels=1:10)
    L <- split(read_data, read_data$DEC)

However, this time, while the data is broken into evenly sized data frames, the labels for the separate data frames are unusable, e.g.:

    $`[ 0.26477, 0.37784]`
       Actual    Target                 DEC
    6   0.243 0.2650960 [ 0.26477, 0.37784]
    9   0.572 0.3778386 [ 0.26477, 0.37784]
    10 -0.049 0.3212681 [ 0.26477, 0.37784]
    15  0.780 0.2778518 [ 0.26477, 0.37784]
    31  0.299 0.3546606 [ 0.26477, 0.37784]
    33  0.105 0.2647676 [ 0.26477, 0.37784]

Could anyone suggest a way of rearranging this to make the labels usable again? Sample data is reattached: http://n4.nabble.com/file/n1585427/Sample_table.txt (Sample_table.txt).

Thanks,
Guy

Peter Ehlers wrote:
>
> On 2010-03-08 8:47, Guy Green wrote:
>>
>> Hello,
>> I have a set of data with two columns: "Target" and "Actual".
>> A http://n4.nabble.com/file/n1584647/Sample_table.txt Sample_table.txt is
>> attached but the data looks like this:
>>
>>   Actual    Target
>>   -0.125    0.016124906
>>   0.135     0.120799865
>>   ...       ...
>>   ...       ...
>>
>> I want to be able to break the data into tables based on quantiles in the
>> "Target" column. I can see (using cut2, and also quantile) how to get the
>> barrier points between the different quantiles, and I can see how I would
>> achieve this if I was just looking to split up a vector. However I am
>> trying to break up the whole table based on those quantiles, not just the
>> vector.
>>
>> However I would like to be able to break the table into ten separate
>> tables, each with both "Actual" and "Target" data, based on the "Target"
>> data deciles:
>>
>> top_decile = ...(top decile of "read_data", based on Target data)
>> next_decile = ...and so on...
>> bottom_decile = ...
>
> I would just add a factor variable indicating to which decile
> a particular observation belongs:
>
>   dat$DEC <- with(dat, cut(Target, breaks=10, labels=1:10))
>
> If you really want to have separate data frames you can then
> split on the decile:
>
>   L <- split(dat, dat$DEC)
>
>  -Peter Ehlers
> --
> Peter Ehlers
> University of Calgary
>

--
View this message in context: http://n4.nabble.com/Help-with-Hmisc-cut2-split-and-quantile-tp1584647p1585427.html
Sent from the R help mailing list archive at Nabble.com.
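
For the decile-labelling question above, one possible approach is sketched here in base R only. It is an illustration under assumptions, not a tested answer for the attached file: the data frame `demo` is a made-up stand-in for Sample_table.txt (63 random rows), and the idea is to pass quantile-based break points to cut() so that the bins are equal-frequency and can still carry the labels 1 to 10. The second part shows that interval-style factor levels can be overwritten after the fact, which may also work for the factor returned by cut2().

```r
# Hypothetical stand-in for the attached sample: 63 rows, two numeric columns.
set.seed(1)
demo <- data.frame(Actual = rnorm(63), Target = runif(63))

# Decile boundaries computed from the data, so the bins are
# equal-frequency rather than equal-width.
brks <- quantile(demo$Target, probs = seq(0, 1, by = 0.1))

# include.lowest = TRUE keeps the minimum Target value in decile 1.
demo$DEC <- cut(demo$Target, breaks = brks, labels = 1:10,
                include.lowest = TRUE)

# Alternative: keep the interval labels first, then overwrite the levels.
dec_raw <- cut(demo$Target, breaks = brks, include.lowest = TRUE)
levels(dec_raw) <- seq_along(levels(dec_raw))

# One data frame per decile, addressable as L$`1` ... L$`10`.
L <- split(demo, demo$DEC)
sapply(L, nrow)  # rows per decile
```

Note that this assumes the decile boundaries are all distinct; with heavily tied Target values, cut() would stop with an error about non-unique breaks.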