[R] Fwd: Selecting rows from a DF where the value in a selected column matches any element of a vector.

Andrew Hoerner Sat, 12 Apr 2014 15:39:07 -0700

Thanks Sarah! That worked!

And you are quite right about the absence of parentheses and "EC07_A1$" 's.
I apologize for sending that code snip -- I am not quite sure how I managed
to do it, since I had already fixed those problems and changed the code in
order to get the error message I posted.
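
For anyone reading this thread later, here is a minimal sketch of the difference between `==` and `%in%` discussed in the quoted exchange below. The five-row data frame and the shortened code vectors are made-up stand-ins for illustration, not the real Economic Census data.

```r
# Toy stand-in for EC07_A1 (assumed values; the real file has ~3 million rows).
EC07_A1 <- data.frame(
  GEO_ID = c("01000US", "04000US06", "99999US", "01000US", "04000US06"),
  SECTOR = c("33", "99", "42", "81", "32"),
  stringsAsFactors = FALSE
)
geo_codes    <- c("01000US", "04000US06")
sector_codes <- c("32", "33", "42")

# `==` compares element by element and recycles the shorter vector, so
# each row is tested against just one code (and R warns when the lengths
# do not divide evenly). The warning is suppressed here on purpose.
recycled <- suppressWarnings(EC07_A1$SECTOR == sector_codes)

# Wrapping the comparison in any() collapses it to a single TRUE/FALSE,
# which then selects every row or no rows at all.
all_or_nothing <- any(recycled)

# `%in%` tests each row against the whole set of codes and returns one
# logical value per row, which is what `[` needs for row selection.
keep <- EC07_A1$GEO_ID %in% geo_codes & EC07_A1$SECTOR %in% sector_codes

ECwork <- EC07_A1[keep, ]
print(ECwork)
```

In this toy data only the first and last rows have both a wanted GEO_ID and a wanted SECTOR, so those are the rows `ECwork` keeps.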


Apropos of nothing in particular, before I could successfully implement
your fix, I also had to learn another new thing. When saving a CSV file
with write.table, if you use sep=", " (that's double-quote comma space
double-quote) R puts the space _inside_ the quotation marks around
character variables. I'm not sure I would call that a bug, but I bet more
people are surprised by it than expect it.
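
A small sketch of one way to sidestep the issue, assuming the aim is simply a CSV file that reads back cleanly: use a plain "," separator (no space) and read the codes back as character. The data frame and temporary file names below are invented for illustration. The second file is written with sep=", " only so the raw lines can be inspected side by side.

```r
# Made-up two-row example; column names mirror those used in the thread.
df <- data.frame(
  GEO_ID = c("01000US", "04000US06"),
  SECTOR = c("32", "42"),
  stringsAsFactors = FALSE
)

# Write with a plain comma separator (no trailing space).
tmp_plain <- tempfile(fileext = ".csv")
write.table(df, tmp_plain, sep = ",", row.names = FALSE)

# For comparison, write the same data with sep = ", " and look at the
# raw text of both files to see where the space ends up.
tmp_space <- tempfile(fileext = ".csv")
write.table(df, tmp_space, sep = ", ", row.names = FALSE)
print(readLines(tmp_plain))
print(readLines(tmp_space))

# Read the plain-comma file back, keeping the codes as character strings
# so they are not converted to numbers.
back <- read.csv(tmp_plain, colClasses = "character")
```

With the plain separator, the values read back should match what was written, so a later `%in%` test against quoted codes behaves as expected.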

Again, many thanks!

Andrew


On Sat, Apr 12, 2014 at 6:04 AM, Sarah Goslee <sarah.gos...@gmail.com> wrote:

> You need %in% instead.
>
> This is untested, but something like this should work:
>
>
> ECwork  <-  EC07_A1[ EC07_A1$GEO_ID %in% c("01000US", "04000US06",
>       "33000US488", "31000US41860", "31400US4186036084", "05000US06001",
>       "E6000US0600153000") &
>       EC07_A1$SECTOR %in% c("32", "33", "42", "44", "45", "51", "54", "61",
>       "71", "81"), ]
>
> (Note that your original code snippet had a shortage of ) and didn't
> specify the data frame from which to take the columns.)
>
> Sarah
>
> On Sat, Apr 12, 2014 at 8:36 AM, Andrew Hoerner <ahoer...@rprogress.org>
> wrote:
> > Dear Folks--
> > I have a file with 3 million-odd rows of data from the 2007 U.S. Economic
> > Census. I am trying to pare it down to a subset of rows that both (1) has
> > any one of a vector of NAICS economic sector codes, and (2) also has any
> > one of a vector of geographic ID codes.
> >
> > Here is the code I am trying to use.
> >
> > ECwork  <-  EC07_A1[ any(GEO_ID == c("01000US", "04000US06", "33000US488",
> > "31000US41860", "31400US4186036084", "05000US06001", "E6000US0600153000") &
> >       any(SECTOR == c("32", "33", "42", "44", "45", "51", "54", "61", "71",
> > "81"), ]
> >
> > I get back the following error:
> >
> > Warning message:
> > In EC07_A1$SECTOR == c("32", "33", "42", "44", "45", "51", "54",  :
> >   longer object length is not a multiple of shorter object length
> >
> > I see what R is doing.  Instead of comparing each element of the column
> > SECTOR to the row vector of codes, and returning a logical vector of the
> > length of SECTOR with rows marked as TRUE that match any of the codes, it
> > is lining my code list up with SECTOR as a column vector and doing
> > element-by-element testing, and then recycling the code list over three
> > million rows. But I am not sure how to make it do what I want -- test the
> > sector code in each row against the vector of codes I am looking for. I
> > would be grateful if anyone could suggest an alternative that would achieve
> > my ends.
> >
> > Oh, and I would add, if there is a way of correctly doing this with
> > the extract function [], I would like to know what it is. If not, I guess
> > I'd like to know that too.
> >
> > Sincerely, Andrew Hoerner
> >
> > --
> > J. Andrew Hoerner
> > Director, Sustainable Economics Program
> > Redefining Progress
> > (510) 507-4820
> >
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>



-- 
J. Andrew Hoerner
Director, Sustainable Economics Program
Redefining Progress
(510) 507-4820





______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
