Re: [R] Duplicate names in the pivot column

phil Sun, 29 Mar 2020 08:47:10 -0700

Thank you very much, Jim and Jeff. Both of your solutions work splendidly.


Philip


On 2020-03-29 02:25, Jim Lemon wrote:

Hi Phil,

Sorry it's not in the environment you are using, but perhaps this will help:


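# count how often each name occurs in the pivot column
taby<-table(df$y)
ynames<-names(taby)
for(yval in seq_along(taby)) {
 if(taby[yval] > 1) {
  # show the replacement names, then number the duplicates 1, 2, ...
  cat(paste(ynames[yval],1:taby[yval],sep=""),"\n")
  df$y[which(df$y ==ynames[yval])]<-paste(ynames[yval],1:taby[yval],sep="")
 }
}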

Jim
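
The loop above can also be written without an explicit loop. Here is a minimal base R sketch of that idea, assuming the goal is the same as in Jim's code: number every name that occurs more than once in the column and leave unique names alone. The helper name `tag_dups` is hypothetical and does not appear in the thread.

```r
# tag_dups() is a hypothetical helper name, not code from the thread.
tag_dups <- function(x) {
  n   <- ave(seq_along(x), x, FUN = length)     # occurrences of each value
  idx <- ave(seq_along(x), x, FUN = seq_along)  # running index within each value
  ifelse(n > 1, paste0(x, idx), x)              # leave unique values untouched
}

y <- c("A", "B", "C", "B", "D", "C")
tagged <- tag_dups(y)
print(tagged)
```

Like the loop, this numbers duplicates across the whole column, so with several time periods the numbering runs across periods rather than restarting in each one.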

On Sun, Mar 29, 2020 at 12:19 PM <p...@philipsmith.ca> wrote:

I have a problem involving inefficient coding. My code works, but in my
actual application it takes a very long time to execute. I have included
a reprex here that uses the same code, but with a much smaller-scale
application.

The data frame I am working with (df in my reprex) is in long form and I
want to change it to wide form. My problem is that the pivot column,
column 2 in my reprex, has some duplicate strings, so the pivot doesn't
work well (df1 in my reprex). I want to find all the duplicates and tag
them so they are no longer duplicates. My code succeeds (df3 in my
reprex). But in the real application there can be over 100 "cases" and
the for loops grind on far too long.

I encounter this problem frequently in the datasets I use, so I am
looking for a general solution that is as efficient as possible. Any
help will be much appreciated.

Philip

``` r
library(tidyverse)
df <- data.frame(time=c(1,1,1,1,1,1,2,2,2,2,2,2),
                 y=c("A","B","C","B","D","C","A","B","C","B","D","C"),
                 z=sample(1:100,12,replace=TRUE),stringsAsFactors=FALSE)

df1 <- pivot_wider(df,id_cols=1,names_from=y,values_from=z)
#> Warning: Values in `z` are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list(z = list)` to suppress this warning.
#> * Use `values_fn = list(z = length)` to identify where the duplicates arise
#> * Use `values_fn = list(z = summary_fun)` to summarise duplicates
fixcol <- function(dfm,cases,per,s,tag) {
   # dfm is the data frame
   # s is the target column number, containing character names
   # tag is a string to be added to a duplicate name
   # cases is the number of rows for a single time period
   # per is the number of time periods
   # all time periods must have the same number of rows
   for (k in 1:per) {
     for (i in (1+(k-1)*cases):(k*cases-1)) {
       for (j in (i+1):(k*cases)) {
         if (dfm[j,s]==dfm[i,s]) { # found a duplicate
           dfm[j,s] <- paste0(dfm[i,s],tag) # fix the duplicate
         }
       }
     }
   }
   return(dfm)
}
df2 <- fixcol(df,6,2,2,"_dup")
df3 <- pivot_wider(df2,id_cols=1,names_from=y,values_from=z)
```
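
The nested loops in `fixcol` compare rows pairwise within each time period, which is why they slow down as the number of cases grows. A grouped, vectorized version might look like the sketch below. It assumes that `time` identifies the period, and it numbers repeats as `_dup1`, `_dup2`, and so on, which is a naming choice of this sketch rather than the exact output of `fixcol`. A fixed `z` replaces `sample()` so the result is reproducible.

```r
library(dplyr)
library(tidyr)

# Same layout as the reprex, but with a deterministic z (an assumption
# made here so the output can be checked).
df <- data.frame(
  time = rep(1:2, each = 6),
  y = rep(c("A", "B", "C", "B", "D", "C"), times = 2),
  z = 1:12,
  stringsAsFactors = FALSE
)

df_tagged <- df %>%
  group_by(time, y) %>%
  mutate(k = row_number()) %>%   # 1 for the first occurrence, 2 for the next, ...
  ungroup() %>%
  mutate(y = if_else(k > 1, paste0(y, "_dup", k - 1), y)) %>%
  select(-k)

wide <- pivot_wider(df_tagged, id_cols = time, names_from = y, values_from = z)
print(wide)
```

Because the numbering restarts within each `time` group, the periods do not need to have the same number of rows, and no `cases` or `per` arguments are required.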

<sup>Created on 2020-03-28 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


