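Try as.numeric(read_data$DEC). This should turn DEC into a numeric variable that you can work with: as.numeric() on a factor returns the integer codes of its levels, so the ten interval labels become 1 to 10.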
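
Here is a rough sketch of the whole sequence, not run on your file. The data frame is made-up stand-in data with the same column names, and I have used base R quantile() plus cut() in place of cut2() so that it runs without Hmisc; that substitution is only for illustration. The as.numeric() step applies in the same way to the factor that cut2(Target, g=10) returns.

```r
# Stand-in for the poster's table: 63 rows, values are synthetic (assumption).
set.seed(1)
read_data <- data.frame(Actual = rnorm(63), Target = runif(63, 0, 0.4))

# Equal-frequency break points: the empirical deciles of Target.
brks <- quantile(read_data$Target, probs = seq(0, 1, by = 0.1))

# cut() on those breaks gives a factor whose levels are interval labels.
read_data$DEC <- cut(read_data$Target, breaks = brks, include.lowest = TRUE)

# Replace the interval labels by the level codes 1..10.
read_data$DEC <- as.numeric(read_data$DEC)

# One data frame per decile, named "1" to "10".
L <- split(read_data, read_data$DEC)
names(L)
sapply(L, nrow)   # rows per decile
```

With this, L[["10"]] is the data frame for the top decile and L[["1"]] the bottom one.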

hth
David Freedman
CDC, Atlanta

Guy Green wrote:
>
> Hi Peter & others,
>
> Thanks (Peter) - that gets me really close to what I was hoping for.
>
> The one problem I have is that the "cut" approach breaks the data into
> intervals based on the absolute value of the "Target" data, rather than
> their frequency. In other words, if the data ranged from 0 to 50, the
> data would be separated into 0-5, 5-10 and so on, regardless of the
> frequency within those categories. However I want to get the data into
> deciles.
>
> The code that does this (incorporating Peter's) is:
>
> read_data=read.table("C:/Sample table.txt", head = T)
> read_data$DEC <- with(read_data, cut(Target, breaks=10, labels=1:10))
> L <- split(read_data, read_data$DEC)
>
> This means that I can get separate data frames, such as L$'10', which
> comes out tidy, but only containing 2 data items (the sample has 63 rows,
> so each decile should have 6+ data items):
>
>    Actual    Target DEC
> 9   0.572 0.3778386  10
> 31  0.299 0.3546606  10
>
> If I try to adjust this to get deciles using cut2(), I can break the data
> into deciles as follows:
>
> read_data=read.table("C:/Sample table.txt", head = T)
> read_data$DEC <- with(read_data, cut2(read_data$Target, g=10), labels=1:10)
> L <- split(read_data, read_data$DEC)
>
> However this time, while the data is broken into even data frames, the
> labels for the separate data frames are unuseable, e.g.:
>
> $`[ 0.26477, 0.37784]`
>    Actual    Target                  DEC
> 6   0.243 0.2650960 [ 0.26477, 0.37784]
> 9   0.572 0.3778386 [ 0.26477, 0.37784]
> 10 -0.049 0.3212681 [ 0.26477, 0.37784]
> 15  0.780 0.2778518 [ 0.26477, 0.37784]
> 31  0.299 0.3546606 [ 0.26477, 0.37784]
> 33  0.105 0.2647676 [ 0.26477, 0.37784]
>
> Could anyone suggest a way of rearranging this to make the labels useable
> again? Sample data is reattached
> http://n4.nabble.com/file/n1585427/Sample_table.txt Sample_table.txt .
>
> Thanks,
> Guy
>
>
> Peter Ehlers wrote:
>>
>> On 2010-03-08 8:47, Guy Green wrote:
>>>
>>> Hello,
>>> I have a set of data with two columns: "Target" and "Actual". A
>>> http://n4.nabble.com/file/n1584647/Sample_table.txt Sample_table.txt is
>>> attached but the data looks like this:
>>>
>>> Actual Target
>>> -0.125 0.016124906
>>> 0.135 0.120799865
>>> ... ...
>>> ... ...
>>>
>>> I want to be able to break the data into tables based on quantiles in
>>> the "Target" column. I can see (using cut2, and also quantile) how to
>>> get the barrier points between the different quantiles, and I can see
>>> how I would achieve this if I was just looking to split up a vector.
>>> However I am trying to break up the whole table based on those
>>> quantiles, not just the vector.
>>>
>>> However I would like to be able to break the table into ten separate
>>> tables, each with both "Actual" and "Target" data, based on the
>>> "Target" data deciles:
>>>
>>> top_decile = ...(top decile of "read_data", based on Target data)
>>> next_decile = ...and so on...
>>> bottom_decile = ...
>>
>> I would just add a factor variable indicating to which decile
>> a particular observation belongs:
>>
>> dat$DEC <- with(dat, cut(Target, breaks=10, labels=1:10))
>>
>> If you really want to have separate data frames you can then
>> split on the decile:
>>
>> L <- split(dat, dat$DEC)
>>
>> -Peter Ehlers
>> --
>> Peter Ehlers
>> University of Calgary
>>

--
View this message in context: http://n4.nabble.com/Help-with-Hmisc-cut2-split-and-quantile-tp1584647p1585503.html
Sent from the R help mailing list archive at Nabble.com.