I'm sorry if this has come across as a homework assignment! I was trying to provide a simple example. There are actually 38323 rows of data; each row is an observation of the percent that each of those veg types occupies in a spatial unit, where each row adds up to 90 and the values differ from row to row. I need a way to categorize the data so that I can reduce the number of unique observations.
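A minimal sketch of one way to do this row-wise categorization in base R, not a definitive solution. It assumes the rule from the original question (lowest value in a row becomes 10, middle 30, highest 50) and breaks ties by column order, which is purely an assumption since the thread does not say how ties should be handled. The names `veg`, `row_ranks`, `veg_new`, `pattern`, and `pattern_counts` are made up for illustration.

```r
# Toy data from the original question; the real data has 38323 rows.
veg <- data.frame(tree  = c(32, 23, 49),
                  shrub = c(11, 41, 23),
                  grass = c(47, 26, 18))

# Rank the values within each row: 1 = lowest, 3 = highest.
# ties.method = "first" is an assumption: ties are broken by column order.
# apply() over rows returns its result transposed, hence the t().
row_ranks <- t(apply(veg, 1, rank, ties.method = "first"))

# Map each rank to its category value (10 = low, 30 = medium, 50 = high)
# and rebuild a data frame with the original column names.
cat_values <- c(10, 30, 50)
veg_new <- as.data.frame(matrix(cat_values[row_ranks],
                                nrow = nrow(veg),
                                dimnames = list(NULL, names(veg))))

# Label each row with its pattern (for example "Med/Low/High") and
# count how many rows fall into each pattern.
cat_labels <- c("Low", "Med", "High")
pattern <- apply(row_ranks, 1,
                 function(r) paste(cat_labels[r], collapse = "/"))
pattern_counts <- table(pattern)
```

With three columns there are at most six distinct patterns, so `pattern_counts` summarizes all rows in a handful of categories.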
So instead of 38323 unique observations, I can reduce this to
X number of High/Med/Low
X number of Med/Low/High
X number of Low/High/Med
etc. for all combinations.

I hope this makes it clearer. Thank you all for your responses,
JC

On Sun, May 29, 2022 at 1:16 PM Avi Gross via R-help <r-help@r-project.org> wrote:

> Tom,
>
> You may have a very different impression of what was asked! LOL!
>
> Unless Janet clarifies what seems a bit like a homework assignment, it
> seems to be a fairly simple and straightforward assignment with exactly
> three rows/columns and asking how to replace the variables, in a sense, by
> finding the high and low and perhaps thus identifying the medium, but to do
> this for each row without changing the order of the resulting data.frame.
>
> I note most techniques people have used focus on columns, not rows, but an
> all-numeric data.frame can be transposed, or converted to a matrix and
> later converted back.
>
> If this is HW, the question becomes what has been taught so far and is
> supposed to be used in solving it. Can they make their own functions
> perhaps to be called three times, once per row or column, to replace that
> row/column, or can they use some form of loop to iterate over the columns?
> Does it need to sort of be done in place or can they create gradually a
> second data.frame and then move the pointer to it and lots of other similar
> ideas.
>
> I am not sure, other than as a HW assignment, why this transformation
> would need to be done but of course, there may well be a reason.
>
> I note that the particular example shown just happens to create almost a
> magic square as the sum of rows and columns and the major diagonal happen
> to be 90, albeit the reverse diagonal is all 50's.
>
> Again, there are many solutions imaginable but the goal may be more
> specific and I shudder to supply one given that too often questions here
> are not detailed enough and are misunderstood. In this case, I thought I
> understood until I saw what Tom wrote! LOL!
>
> I will add this. Is it guaranteed that no two items in the same row are
> ever equal or is there some requirement for how to handle a tie? And note
> there are base R functions called min() and max() and you can ask for
> things like:
>
> if ( current == min(mydata[1,])) ...
>
>
> -----Original Message-----
> From: Tom Woolman <twool...@ontargettek.com>
> To: Janet Choate <jsc....@gmail.com>
> Cc: r-help@r-project.org
> Sent: Sun, May 29, 2022 3:42 pm
> Subject: Re: [R] categorizing data
>
>
> Some ideas:
>
> You could create a cluster model with k=3 for each of the 3 variables,
> to determine what constitutes high/medium/low centroid values for each
> of the 3 types of plant types. Centroid values could then be used as the
> upper/lower boundary ranges for high/med/low.
>
> Or utilize a histogram for each variable, and use quantiles or
> densities, etc. to determine the natural breaks for the high/med/low
> ranges for each of the IVs.
>
>
> On 2022-05-29 15:28, Janet Choate wrote:
> > Hi R community,
> > I have a data frame with three variables, where each row adds up to 90.
> > I want to assign a category of low, medium, or high to the values in
> > each row - where the lowest value per row will be set to 10, the medium
> > value set to 30, and the high value set to 50 - so each row still adds
> > up to 90.
> >
> > For example:
> > Data: Orig
> > tree  shrub  grass
> > 32    11     47
> > 23    41     26
> > 49    23     18
> >
> > Data: New
> > tree  shrub  grass
> > 30    10     50
> > 10    50     30
> > 50    30     10
> >
> > I am not attaching any code here as I have not been able to write
> > anything effective! appreciate help with this!
> > thank you,
> > JC
> >
> > --
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

--
Tague Team Lab Manager
1005 Bren Hall
UCSB, Santa Barbara, CA.