Re: [R] How to remove all rows that have a numeric in the first (or any) column

Jeff Newmiller Tue, 14 Sep 2021 22:44:29 -0700

FWIW I use them quite frequently, but not for the purpose of storing 
heterogeneous data... rather for holding complex objects of the same class.
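A minimal base R sketch of that kind of use, assuming only the built-in mtcars data (the names fits, models and wt_slope are made up for illustration). One lm() fit per cylinder group is stored in a list column wrapped in I(), and a map-style function then pulls one coefficient out of each fit:

```r
# One lm() fit per cylinder group: every element has the same class, "lm"
fits <- lapply(split(mtcars, mtcars$cyl),
               function(d) lm(mpg ~ wt, data = d))

# I() keeps the list intact as a single data.frame column
models <- data.frame(cyl = as.integer(names(fits)), fit = I(unname(fits)))

# Work on the list column one element at a time with a map-style function
models$wt_slope <- vapply(models$fit, function(m) coef(m)[["wt"]], numeric(1))
models[, c("cyl", "wt_slope")]
```

Because every element has the same class, the same accessor applies to each one.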


On September 14, 2021 10:25:54 PM PDT, Avi Gross via R-help 
<r-help@r-project.org> wrote:
>My apologies. My reply was to Andrew, not Gregg.
>
>Enough damage for one night. Here is hoping we finally understood a question 
>that could have been better phrased. list columns are not normally considered 
>common data structures but quite possibly will be more as time goes on and the 
>tools to handle them become better or at least better understood.
>
>
>-----Original Message-----
>From: R-help <r-help-boun...@r-project.org> On Behalf Of Avi Gross via R-help
>Sent: Wednesday, September 15, 2021 1:23 AM
>To: R-help@r-project.org
>Subject: Re: [R] How to remove all rows that have a numeric in the first (or 
>any) column
>
>You are correct, Gregg, I am aware of that trick of wrapping something in I()
>so that it is kept as is rather than converted.
>
>And you can indeed use base R to play with the contents of beta as defined
>above. Here is a sort of incremental demo:
>
>> sapply(mydf$beta, is.numeric)
>[1] FALSE  TRUE  TRUE FALSE
>> !sapply(mydf$beta, is.numeric)
>[1]  TRUE FALSE FALSE  TRUE
>> keeping <- !sapply(mydf$beta, is.numeric)
>> mydf[keeping, ]
># A tibble: 2 x 2
>  alpha beta
>  <int> <list>
>1     1 <chr [1]>
>2     4 <chr [1]>
>> str(mydf[keeping, ])
>tibble [2 x 2] (S3: tbl_df/tbl/data.frame)
> $ alpha: int [1:2] 1 4
> $ beta :List of 2
>  ..$ : chr "Hello"
>  ..$ : chr "bye"
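One caveat about the sapply() idiom above, sketched in base R under the assumption that the list column can end up empty (for example after earlier filtering removed every row). In that case sapply() returns an empty list instead of a logical vector, and negating it fails. vapply() with a declared result type does not have this problem. empty_col is a made-up name.

```r
# Hypothetical empty list column, e.g. after earlier filtering removed every row
empty_col <- list()

# sapply() has nothing to simplify, so it hands back an empty list
res_sapply <- sapply(empty_col, is.numeric)
is.list(res_sapply)                          # TRUE
negated <- try(!res_sapply, silent = TRUE)   # negating a list signals an error
inherits(negated, "try-error")               # TRUE

# vapply() always returns the declared type, here logical(0)
res_vapply <- vapply(empty_col, is.numeric, logical(1))
identical(!res_vapply, logical(0))           # TRUE, safe to use as a row index
```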
>
>Now for the bad news. The original request was for ANY column. One way to do
>it, neither efficient nor the best, would be to loop over the names of all the
>columns and, starting with the original data.frame, whittle it down column by
>column until nothing numeric is left anywhere.
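A minimal base R sketch of that column-by-column whittling. drop_numeric_rows() is a hypothetical helper, and it deliberately inspects only list columns: an ordinary atomic numeric column such as alpha = 1:4 would otherwise flag every row and empty the result. The test data matches the two-list-column example further down this thread.

```r
# Hypothetical helper: for each list column in turn, drop the rows whose
# element is numeric, then continue with whatever rows are left.
drop_numeric_rows <- function(df) {
  for (col in names(df)) {
    if (!is.list(df[[col]])) next   # assumption: atomic columns are skipped
    is_num <- vapply(df[[col]], is.numeric, logical(1))
    df <- df[!is_num, , drop = FALSE]
  }
  df
}

mydf <- data.frame(alpha = I(list("first", 2, 3.3, "Last")),
                   beta  = I(list(1, "second", 3.3, "Lasting")))
kept <- drop_numeric_rows(mydf)
kept$alpha[[1]]   # "Last"
```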
>
>If I were using dplyr, I wonder whether rowwise() offers a nice way to
>evaluate across a row.
>
>Using your technique I made the following data.frame:
>
>mydf <- data.frame(alpha=I(list("first", 2, 3.3, "Last")),
>                   beta=I(list(1, "second", 3.3, "Lasting")))
>
>> mydf
>  alpha    beta
>1 first       1
>2     2  second
>3   3.3     3.3
>4  Last Lasting
>
>Do we agree that only the fourth row should be kept, since each of the others
>has one or two numeric values?
>
>Here is some code I cobbled together that seems to work:
>
>library(dplyr)  # rowwise(), mutate(), filter() and %>%
>
>rowwise(mydf) %>%
>  mutate(alphazoid=!is.numeric(unlist(alpha)),
>         betazoid=!is.numeric(unlist(beta))) %>%
>  filter(alphazoid & betazoid) -> result
>
>str(result)
>print(result)
>result[[1,1]]
>result[[1,2]]
>as.data.frame(result)
>
>The results below show that only the fourth row was kept:
>
>> rowwise(mydf) %>%
>+   mutate(alphazoid=!is.numeric(unlist(alpha)),
>+          betazoid=!is.numeric(unlist(beta))) %>%
>+   filter(alphazoid & betazoid) -> result
>> str(result)
>rowwise_df [1 x 4] (S3: rowwise_df/tbl_df/tbl/data.frame)
> $ alpha    :List of 1
>  ..$ : chr "Last"
>  ..- attr(*, "class")= chr "AsIs"
> $ beta     :List of 1
>  ..$ : chr "Lasting"
>  ..- attr(*, "class")= chr "AsIs"
> $ alphazoid: logi TRUE
> $ betazoid : logi TRUE
> - attr(*, "groups")= tibble [1 x 1] (S3: tbl_df/tbl/data.frame)
>  ..$ .rows: list<int> [1:1]
>  .. ..$ : int 1
>  .. ..@ ptype: int(0)
>> print(result)
># A tibble: 1 x 4
># Rowwise:
>  alpha     beta      alphazoid betazoid
>  <I<list>> <I<list>> <lgl>     <lgl>
>1 <chr [1]> <chr [1]> TRUE      TRUE
>> result[[1,1]]
>[[1]]
>[1] "Last"
>
>> result[[1,2]]
>[[1]]
>[1] "Lasting"
>
>> as.data.frame(result)
>  alpha    beta alphazoid betazoid
>1  Last Lasting      TRUE     TRUE
>
>Of course, the temporary alphazoid and betazoid columns can easily be removed
>afterwards, for example with select(-alphazoid, -betazoid).
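For comparison, a base R sketch (not the dplyr approach above) that gets the same answer in one pass and creates no helper columns. It builds one logical mask per column, ORs the masks together across columns with Reduce(), and keeps the rows where nothing was numeric. numeric_mask() is a hypothetical helper, and as an assumption it only flags elements of list columns.

```r
# Hypothetical helper: TRUE where a list-column element is numeric;
# atomic columns never flag a row
numeric_mask <- function(col) {
  if (is.list(col)) vapply(col, is.numeric, logical(1))
  else rep(FALSE, length(col))
}

mydf <- data.frame(alpha = I(list("first", 2, 3.3, "Last")),
                   beta  = I(list(1, "second", 3.3, "Lasting")))

masks <- lapply(mydf, numeric_mask)   # one logical vector per column
any_numeric <- Reduce(`|`, masks)     # row-wise OR across all columns
result <- mydf[!any_numeric, , drop = FALSE]
```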
>
>From: Andrew Simmons <akwsi...@gmail.com>
>Sent: Wednesday, September 15, 2021 12:44 AM
>To: Avi Gross <avigr...@verizon.net>
>Cc: Gregg Powell via R-help <r-help@r-project.org>
>Subject: Re: [R] How to remove all rows that have a numeric in the first (or 
>any) column
>
>I'd like to point out that base R can handle a list as a data frame column;
>you just have to give the list class "AsIs". So in your example
>
>temp <- list("Hello", 1, 1.1, "bye")
>data.frame(alpha = 1:4, beta = I(temp))
>
>means that column "beta" will still be a list.
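A quick base R check of that point (without_I and with_I are illustrative names): without I() the list is spread over several columns, while with I() it stays a single list column of class "AsIs".

```r
temp <- list("Hello", 1, 1.1, "bye")

without_I <- data.frame(alpha = 1:4, beta = temp)     # list spread into columns
with_I    <- data.frame(alpha = 1:4, beta = I(temp))  # list kept as one column

ncol(without_I)      # 5
ncol(with_I)         # 2
class(with_I$beta)   # "AsIs"
```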
>
>On Wed, Sep 15, 2021, 00:40 Avi Gross via R-help <r-help@r-project.org> wrote:
>
>Calling something a data.frame does not make it a data.frame.
>
>The abbreviated object shown below is a list of singletons. If it is a column
>in a larger object that is a data.frame, then it is a list column, which is
>valid but can be ticklish to handle in base R, less so in the tidyverse.
>
>For example, if I try to make a data.frame the normal way, the list is split
>into one column per element and each value is repeated down every row. Not
>what was expected. I think some tidyverse functionality does better.
>
>Like this:
>
>library(tidyverse)
>temp=list("Hello", 1, 1.1, "bye")
>
>Now making a data.frame has an odd result:
>
>> mydf=data.frame(alpha=1:4, beta=temp)
>> mydf
>  alpha beta..Hello. beta.1 beta.1.1 beta..bye.
>1     1        Hello      1      1.1        bye
>2     2        Hello      1      1.1        bye
>3     3        Hello      1      1.1        bye
>4     4        Hello      1      1.1        bye
>
>But a tibble handles it:
>
>> mydf=tibble(alpha=1:4, beta=temp)
>> mydf
># A tibble: 4 x 2
>  alpha beta
>  <int> <list>
>1     1 <chr [1]>
>2     2 <dbl [1]>
>3     3 <dbl [1]>
>4     4 <chr [1]>
>
>So if the data does look like this, with a list column, access can be tricky:
>subsetting a list with [] returns a list, and you need [[]] to get an element.
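A small base R illustration of the [] versus [[]] difference, using a plain list shaped like the beta column:

```r
beta <- list("Hello", 1, 1.1, "bye")

beta[2]                 # a list of length 1 that contains 1
is.numeric(beta[2])     # FALSE: the container is a list
beta[[2]]               # the element itself, 1
is.numeric(beta[[2]])   # TRUE
```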
>
>I found a somewhat odd solution like this:
>
>mydf %>%
>   filter(!map_lgl(beta, is.numeric)) -> mydf2
>
># A tibble: 2 x 2
>  alpha beta
>  <int> <list>
>1     1 <chr [1]>
>2     4 <chr [1]>
>
>When I saved that result into mydf2, I got this.
>
>Original:
>
>> str(mydf)
>tibble [4 x 2] (S3: tbl_df/tbl/data.frame)
> $ alpha: int [1:4] 1 2 3 4
> $ beta :List of 4
>  ..$ : chr "Hello"
>  ..$ : num 1
>  ..$ : num 1.1
>  ..$ : chr "bye"
>
>Output when any row with a numeric is removed:
>
>> str(mydf2)
>tibble [2 x 2] (S3: tbl_df/tbl/data.frame)
> $ alpha: int [1:2] 1 4
> $ beta :List of 2
>  ..$ : chr "Hello"
>  ..$ : chr "bye"
>
>So good luck if you try variations on your code inspired by what I show. I am
>sure there are many better ways, but I repeat: it can be tricky.
>
>-----Original Message-----
>From: R-help <r-help-boun...@r-project.org> On Behalf Of Jeff Newmiller
>Sent: Tuesday, September 14, 2021 11:54 PM
>To: Gregg Powell <g.a.pow...@protonmail.com>
>Cc: Gregg Powell via R-help <r-help@r-project.org>
>Subject: Re: [R] How to remove all rows that have a numeric in the first (or
>any) column
>
>You cannot apply vectorized operators to list columns... you have to use a map 
>function like sapply or purrr::map_lgl to obtain a logical vector by running 
>the function once for each list element:
>
>sapply( VPN_Sheet1$HVA, is.numeric )
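Applied end to end, that might look like the base R sketch below. The small VPN_Sheet1 here is made-up stand-in data with placeholder names (the real sheet is not available in the thread); only the HVA column name comes from the question. The list column is built with I(), as Andrew describes elsewhere in the thread.

```r
# Made-up stand-in: HVA is a list column mixing names and row counters
VPN_Sheet1 <- data.frame(id = 1:6,
                         HVA = I(list("Name A", 1, "Name B",
                                      "Name C", 2, "Name D")))

# One logical per list element, then keep only the non-numeric rows
is_num <- sapply(VPN_Sheet1$HVA, is.numeric)
VPN_Sheet1 <- VPN_Sheet1[!is_num, , drop = FALSE]
unlist(VPN_Sheet1$HVA)
```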
>
>On September 14, 2021 8:38:35 PM PDT, Gregg Powell <g.a.pow...@protonmail.com>
>wrote:
>>Here is the output:
>>
>>> str(VPN_Sheet1$HVA)
>>List of 2174
>> $ : chr "Email: f...@fffffffffff.com"
>> $ : num 1
>> $ : chr "Eloisa Libas"
>> $ : chr "Percival Esquejo"
>> $ : chr "Louchelle Singh"
>> $ : num 2
>> $ : chr "Charisse Anne Tabarno, RN"
>> $ : chr "Sol Amor Mucoy"
>> $ : chr "Josan Moira Paler"
>> $ : num 3
>> $ : chr "Anna Katrina V. Alberto"
>> $ : chr "Nenita Velarde"
>> $ : chr "Eunice Arrances"
>> $ : num 4
>> $ : chr "Catherine Henson"
>> $ : chr "Maria Carla Daya"
>> $ : chr "Renee Ireine Alit"
>> $ : num 5
>> $ : chr "Marol Joseph Domingo - PS"
>> $ : chr "Kissy Andrea Arriesgado"
>> $ : chr "Pia B Baluyut, RN"
>> $ : num 6
>> $ : chr "Gladys Joy Tan"
>> $ : chr "Frances Zarzua"
>> $ : chr "Fairy Jane Nery"
>> $ : num 7
>> $ : chr "Gladys Tijam, RMT"
>> $ : chr "Sarah Jane Aramburo"
>> $ : chr "Eve Mendoza"
>> $ : num 8
>> $ : chr "Gloria Padolino"
>> $ : chr "Joyce Pearl Javier"
>> $ : chr "Ayza Padilla"
>> $ : num 9
>> $ : chr "Walfredson Calderon"
>> $ : chr "Stephanie Anne Militante"
>> $ : chr "Rennua Oquilan"
>> $ : num 10
>> $ : chr "Neil John Nery"
>> $ : chr "Maria Reyna Reyes"
>> $ : chr "Rowella Villegas"
>> $ : num 11
>> $ : chr "Katelyn Mendiola"
>> $ : chr "Maria Riza Mariano"
>> $ : chr "Marie Vallianne Carantes"
>> $ : num 12
>>
>>‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>
>>On Tuesday, September 14th, 2021 at 8:32 PM, Jeff Newmiller
>><jdnew...@dcn.davis.ca.us> wrote:
>>
>>> An atomic column of data by design has exactly one mode, so if any
>>> values are non-numeric then the entire column will be non-numeric.
>>> What does
>>>
>>> str(VPN_Sheet1$HVA)
>>>
>>> tell you? It is likely either a factor or character data.
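A short base R illustration of the single-mode point. The last part uses an alternative technique not proposed in this thread, shown only as a sketch: for a character column, try converting each value with as.numeric() and treat values that do not become NA as numeric-looking. hva_chr is a made-up vector.

```r
# Combining strings and numbers in an atomic vector coerces everything to character
hva_chr <- c("Name A", 1, "Name B", 2)
class(hva_chr)        # "character"
is.numeric(hva_chr)   # FALSE: one answer for the whole column

# as.numeric() gives NA (with a warning) for text that is not a number.
# For a factor column, convert with as.character() first, because
# as.numeric() on a factor returns the level codes.
looks_numeric <- !is.na(suppressWarnings(as.numeric(hva_chr)))
hva_chr[!looks_numeric]   # "Name A" "Name B"
```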
>>>
>>> On September 14, 2021 7:01:53 PM PDT, Gregg Powell via R-help
>>> <r-help@r-project.org> wrote:
>>>
>>> > Stuck on this problem - How does one remove all rows in a dataframe
>>> > that have a numeric in the first (or any) column?
>>> >
>>> > Seems straightforward - but I'm having trouble.
>>> >
>>> > I've attempted to use:
>>> >
>>> > VPN_Sheet1 <- VPN_Sheet1[!is.numeric(VPN_Sheet1$HVA),]
>>> >
>>> > and
>>> >
>>> > VPN_Sheet1 <- VPN_Sheet1[!is.integer(VPN_Sheet1$HVA),]
>>> >
>>> > Neither works, and neither throws an error.
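A minimal base R sketch of why both attempts run silently yet remove nothing, using a made-up df. is.numeric() and is.integer() test the whole list column once and return a single FALSE, so the row index is a single TRUE, which is recycled and selects every row.

```r
# Made-up stand-in for the question's data: HVA is a list column
df <- data.frame(id = 1:4, HVA = I(list("Name A", 1, "Name B", 2)))

is.numeric(df$HVA)                # FALSE: one answer for the whole list
is.integer(df$HVA)                # FALSE as well
nrow(df[!is.numeric(df$HVA), ])   # 4: the single TRUE is recycled to every row

# A per-element test yields one logical per row instead
nrow(df[!sapply(df$HVA, is.numeric), ])   # 2
```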
>>> >
>>> > class(VPN_Sheet1$HVA) returns:
>>> >
>>> > [1] "list"
>>> >
>>> > So, the HVA column is a list.
>>> >
>>> > Data looks like the attached screen grab -
>>> >
>>> > The ONLY rows I need to delete are the rows where there is a numeric in
>>> > the HVA column.
>>> >
>>> > There are some 5000+ rows in the actual data.
>>> >
>>> > Would be grateful for a solution to this problem.
>>> >
>>> > How to get R to detect whether the value in column 1 is a number so the
>>> > rows with the number values can be deleted?
>>> >
>>> > Thanks in advance to any and all willing to help on this problem.
>>> >
>>> > Gregg Powell
>>> >
>>> > Sierra Vista, AZ
>>>
>>> --
>>> Sent from my phone. Please excuse my brevity.
>
>______________________________________________
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.