On 13/11/2018 12:58 PM, William Dunlap wrote:
You also asked about doing this for the rows of a matrix.  unique() gives
the unique rows, but match() operates per element, not per row.  You can
use split() with one group per row to turn the matrix into a list of rows,
which match() can then compare.

     > m <- cbind( A=c(i=5,ii=5,iii=5,iv=4,v=4,vi=4), B=c(2,3,2,2,2,2) )
     > unique(m)
        A B
    i  5 2
    ii 5 3
    iv 4 2
     > match(m, unique(m)) # bad
      [1] 1 1 1 3 3 3 4 5 4 4 4 4
     > asRows <- function(x) split(x, seq_len(NROW(x))) # convert to list of rows
     > match(asRows(m), unique(asRows(m)))
    [1] 1 2 1 3 3 3
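A possible alternative, not part of Bill's reply and sketched here only as an assumption, is to collapse each row into a single character key with paste() and then match the keys, so no list of rows is needed.  The helper name rowKeys is hypothetical, the separator "\r" is assumed never to occur in the data, and because numbers are converted with as.character() (about 15 significant digits), doubles that differ only beyond that precision would get the same key.

```r
# Hypothetical helper: build one character key per row of a matrix or data frame.
# Assumes "\r" never appears inside the formatted values.
rowKeys <- function(x) do.call(paste, c(as.data.frame(x), sep = "\r"))

m <- cbind(A = c(i = 5, ii = 5, iii = 5, iv = 4, v = 4, vi = 4),
           B = c(2, 3, 2, 2, 2, 2))
keys <- rowKeys(m)
match(keys, unique(keys))  # index into unique(m) for each row of m
match(keys, keys)          # index of the first equal row within m itself
```

The same helper can be applied to a data frame, since as.data.frame() leaves a data frame unchanged and row names never enter the keys.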


For data frames, unique() works on rows but match() works on columns, and
converting to a list of rows does not quite work, because unique() looks at
the row names.  A modification of asRows works around that:

     > d <- data.frame(m)
     > unique(d)
        A B
    i  5 2
    ii 5 3
    iv 4 2
     > match(d, unique(d))
    [1] NA NA
     > asRows <- function(x) lapply(split(x, seq_len(NROW(x))), as.list)
     > match(asRows(d), unique(asRows(d)))
    [1] 1 2 1 3 3 3
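If the NA-for-first-occurrence form from the original question is wanted for data-frame rows, one way (assumed here, not taken from the thread) is to match the list of rows against itself and blank out the non-duplicates with duplicated(), which ignores row names.  The variable names rows, firstIdx and dupOf are only illustrative.

```r
m <- cbind(A = c(i = 5, ii = 5, iii = 5, iv = 4, v = 4, vi = 4),
           B = c(2, 3, 2, 2, 2, 2))
d <- data.frame(m)

# Bill's helper: one-row data frames, converted to plain lists to drop row names
asRows <- function(x) lapply(split(x, seq_len(NROW(x))), as.list)
rows <- asRows(d)

firstIdx <- match(rows, rows)                   # first row equal to each row
dupOf <- ifelse(duplicated(d), firstIdx, NA)    # NA where the row is a first occurrence
firstIdx  # 1 2 1 4 4 4
dupOf     # NA NA 1 NA 4 4
```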


Thanks!  That's very nice.


Is this the sort of issue that Hadley's vectors package is addressing?
I don't know; hopefully someone else will respond...

Duncan Murdoch


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Nov 13, 2018 at 2:15 AM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

    On 13/11/2018 12:35 AM, Pages, Herve wrote:

        Hi,

        On 11/12/18 17:08, Duncan Murdoch wrote:

            The duplicated() function gives TRUE if an item in a vector
            (or row in a matrix, etc.) is a duplicate of an earlier item.
            But what I would like to know is which item it duplicates.

            For example,

            v <- c("a", "b", "b", "a")
            duplicated(v)

            returns

            [1] FALSE FALSE  TRUE  TRUE

            What I want is a fast way to calculate

               [1] NA NA 2 1

            or (equally useful to me)

               [1] 1 2 2 1

            The result should have the property that if result[i] == j,
            then v[i] == v[j], at least for i != j.

            Does this already exist somewhere, or is it easy to write?


        I generally use match() for that:

           > v <- c("a", "b", "b", "a")
           > match(v, v)
           [1] 1 2 2 1
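For the vector case, a small sketch of turning match(v, v) into the first form Duncan asked for (NA for first occurrences) combines it with duplicated(); the variable names idx and dupOf are only illustrative.

```r
v <- c("a", "b", "b", "a")
idx <- match(v, v)                       # position of the first element equal to each v[i]
dupOf <- ifelse(duplicated(v), idx, NA)  # NA where v[i] is itself a first occurrence
idx    # 1 2 2 1
dupOf  # NA NA 2 1
# The property asked for: v[i] == v[idx[i]] for every i
stopifnot(all(v == v[idx]))
```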


    Yes, this is perfect.  Thanks to you (and the private answer I
    received that suggested the same).

    Duncan Murdoch

    ______________________________________________
    R-help@r-project.org <mailto:R-help@r-project.org> mailing list --
    To UNSUBSCRIBE and more, see
    https://stat.ethz.ch/mailman/listinfo/r-help
    <https://stat.ethz.ch/mailman/listinfo/r-help>
    PLEASE do read the posting guide
    http://www.R-project.org/posting-guide.html
    <http://www.R-project.org/posting-guide.html>
    and provide commented, minimal, self-contained, reproducible code.


