Re: [R] Adding SORT to UNIQUE

Duncan Murdoch Tue, 21 Dec 2021 10:14:11 -0800

On 21/12/2021 12:53 p.m., Duncan Murdoch wrote:

On 21/12/2021 12:29 p.m., Jeff Newmiller wrote:

It is a very rational choice, not a design flaw. I don't like every choice they 
have made for that class, but this one is very solid, and treating data frames 
as lists of columns consistently helps all of us.

I think outlawing matrix notation is a really bad idea.  It makes code
harder to read, and makes it much harder to switch to matrices, which
sometimes gives a huge speed boost to code.


For example, John Fox posted an example that showed that operations on
whole columns of dataframes are about twice as fast using list notation
as using matrix notation. But for operating on whole rows,

... or on individual elements ...

matrices are about 100 times faster than dataframes.  You shouldn't use
notation that makes the switch to matrices more difficult.
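
A rough way to check the row-wise claim yourself, as a sketch only. The
data, the sizes, and the names m, df, row_sums_df and row_sums_m are made
up for illustration and are not taken from John Fox's example; the actual
ratio will depend on the machine and the R version.

```r
# Hypothetical data: 2000 rows by 10 numeric columns, once as a matrix
# and once as a data frame.
set.seed(1)
m  <- matrix(rnorm(2000 * 10), nrow = 2000)
df <- as.data.frame(m)

# Sum each row, indexing one row at a time with matrix notation.
row_sums_df <- numeric(nrow(df))
t_df <- system.time(
  for (i in seq_len(nrow(df))) row_sums_df[i] <- sum(df[i, ])
)

row_sums_m <- numeric(nrow(m))
t_m <- system.time(
  for (i in seq_len(nrow(m))) row_sums_m[i] <- sum(m[i, ])
)

# Same indexing code and same answers; only the container differs.
stopifnot(isTRUE(all.equal(row_sums_df, row_sums_m)))
print(c(data.frame = t_df[["elapsed"]], matrix = t_m[["elapsed"]]))
```

Because both loops use x[i, ], moving from the data frame to the matrix
needs no change to the indexing code, which is the point being made above.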

Duncan Murdoch


On December 21, 2021 9:02:56 AM PST, Duncan Murdoch <murdoch.dun...@gmail.com> 
wrote:

On 21/12/2021 11:59 a.m., Jeff Newmiller wrote:

Intuitive, perhaps, but noticeably slower. And it doesn't work on tibbles by 
design. Data frames are lists of columns.
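
A small sketch of the tibble behaviour being referred to, assuming the
tibble package is installed (the block does nothing otherwise). The
object tb and its contents are hypothetical.

```r
# Assumes the tibble package is available; skipped if it is not.
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::tibble(x = c("b", "a", "b"))  # hypothetical data

  # Matrix notation on a tibble returns another tibble rather than
  # dropping to a plain vector.
  print(is.data.frame(tb[, 1]))

  # List notation extracts the column itself, so sort(unique()) applies.
  print(sort(unique(tb[["x"]])))
}
```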


That's just one of the design flaws in tibbles, but not the worst one.

Duncan Murdoch


On December 21, 2021 8:38:35 AM PST, Duncan Murdoch <murdoch.dun...@gmail.com> 
wrote:

On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:

On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:

Thanks for the reply.

sort(unique(Data[1]))
Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
decreasing)) :
        undefined columns selected


That's the wrong syntax:  Data[1] is not "column one of Data".  Use
Data[[1]] for that, so

       sort(unique(Data[[1]]))
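
To see the difference on a toy example (a sketch; this Data is a
made-up stand-in for the poster's file, and the exact error text may vary
between R versions):

```r
# Hypothetical stand-in for the poster's Data.
Data <- data.frame(id  = c("10", "2", "1", "2"),
                   grp = c("b", "a", "b", "a"),
                   stringsAsFactors = FALSE)

# Data[1] is still a data frame, and so is unique(Data[1]); sort() is not
# designed for that. Any error is captured as text instead of stopping.
res <- tryCatch(sort(unique(Data[1])),
                error = function(e) conditionMessage(e))
print(res)

# Data[[1]] is the column vector itself, which sort() handles.
sorted_ids <- sort(unique(Data[[1]]))
print(sorted_ids)
```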


Actually, I'd probably recommend

     sort(unique(Data[, 1]))

instead.  This treats Data as a matrix rather than as a list.
Dataframes are lists that look like matrices, but to me the matrix
aspect is usually more intuitive.
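
A short sketch of what that looks like in practice, with a made-up Data
and a hypothetical matrix M built from it:

```r
Data <- data.frame(id = c(3, 1, 3), grp = c("b", "a", "b"))  # hypothetical

# For a plain data frame, selecting one column with matrix notation drops
# to a vector, so it matches the list-style extraction.
stopifnot(identical(Data[, 1], Data[[1]]))
print(sort(unique(Data[, 1])))

# The same expression works unchanged after switching to a matrix.
M <- as.matrix(Data["id"])
print(sort(unique(M[, 1])))
```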

Duncan Murdoch


I think Rui already pointed out the typo in the quoted text below...

Duncan Murdoch


The recommended syntax did not work, as listed above.

What I want is the sort of distinct column output. Again, the column may
be text or numbers. This is a huge analysis effort with data coming at
me from many different sources.


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com


On 12/21/21 11:07 AM, Duncan Murdoch wrote:

On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:

Thanks everyone for the replies.

It is clear one either needs to write a function or put the unique
entries into another dataframe.

It seems odd R cannot sort a list of unique column entries with ease.
Python and SQL can do it with ease.


I've seen several responses that looked pretty simple.  It's hard to
beat sort(unique(x)), though there's a fair bit of confusion about
what you actually want.  Maybe you should post an example of the code
you'd use in Python?

Duncan Murdoch


QUESTION
Is there a simpler means, other than the unique function, to capture
distinct column entries and then sort that list?




On 12/20/21 5:53 PM, Rui Barradas wrote:

Hello,

Inline.

At 21:18 on 20/12/21, Stephen H. Dawson, DSL via R-help wrote:

Thanks.

sort(unique(Data[[1]]))

This syntax provides row numbers, not column values.


This is not right.
The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
extracts the column vector.

As for my previous answer, it did not address the question: I
misinterpreted it as asking how to sort in numeric order when the data
is not numeric. Here is a hopefully complete answer, still using
package stringr.


cols_to_sort <- 1:4

Data2 <- lapply(Data[cols_to_sort], \(x){
        stringr::str_sort(unique(x), numeric = TRUE)
})


Or using Avi's suggestion of writing a function to do all the work and
simplify the lapply loop later,


unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
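
If stringr is not available, the same pattern can be sketched in base R.
Note this swaps stringr::str_sort() for base sort(), which has no
numeric = TRUE option, so text such as "10" will sort before "2". The
helper name unisort_base and the toy Data are hypothetical.

```r
# Hypothetical stand-in for Data, with one numeric and one text column.
Data <- data.frame(n = c(10, 2, 10),
                   s = c("pear", "apple", "pear"),
                   stringsAsFactors = FALSE)

# One helper applied to each selected column. The result is a named list,
# since columns can have different numbers of unique values.
unisort_base <- function(vec, ...) sort(unique(vec), ...)
Data2 <- lapply(Data[c("n", "s")], unisort_base)
str(Data2)
```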


Hope this helps,

Rui Barradas




On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:

Hi,


Running a simple syntax set to review entries in dataframe columns.
Here is the working code.

Data <- read.csv("./input/Source.csv", header=T)
describe(Data)
summary(Data)
unique(Data[1])
unique(Data[2])
unique(Data[3])
unique(Data[4])

I would like to sort the unique entries. The data in the various
columns are not only numbers, but also text. I realize 1 and
10 will not sort properly when the column is not defined as a number,
but I want to see what I have in the columns viewed as sorted.
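
The 1-versus-10 point can be illustrated with a made-up vector (x is
hypothetical, standing in for a column read as text):

```r
x <- c("10", "2", "1", "2")               # hypothetical column read as text

as_text   <- sort(unique(x))              # compares character by character
as_number <- sort(unique(as.numeric(x)))  # compares by value

print(as_text)    # "1" "10" "2"
print(as_number)  # 1 2 10
```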

QUESTION
What is the best process to sort unique output, please?


Thanks.


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


