Re: [R] Make 2nd col of 2-col df into header row of same df then adjust col1 data display

Boris Steipe Thu, 18 Dec 2014 02:31:07 -0800

What you are describing sounds like a very spreadsheet-y thing. 

- The information is already IN your dataframe, and easy to get out by 
subsetting. Depending on your use case, that may actually be the "best".


- If the number of CaseIDs is large, I would use a hash of lists (if the data 
is sparse), or a hash of named vectors if it's not sparse. Lookup is O(1), so 
that may be the best. (Cf. the package hash and the explanations there.) 

- If it must be the spreadsheet-y thing, you could make a matrix with rownames 
and colnames taken from unique() of the respective columns of your dataframe. 
Instead of 1 and NA I probably would use TRUE/FALSE. 

- If it takes less time to wait for the results than to look up how apply() 
works, you can write a simple loop to populate your matrix. Otherwise apply() 
is more compact and usually no slower. 

- You could even use a loop to build the data structure, checking for every 
cbind() whether the value in column 1 already exists in the table, but that's 
terrible and would make a kitten die somewhere on every iteration.
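
For the matrix option, here is a rough sketch using the data from the original 
post. The names df, cases, types, viol and viol2 are made up for illustration, 
and the columns are only the violation types that actually occur in the data 
(not the full list of 14). It shows both the simple loop and a vectorized fill 
that indexes the matrix with a two-column character matrix; treat it as one 
possible way to do it, not the definitive one.

```r
# Data from the original post
CaseID <- c('1015285', '1005317', '1012281', '1015285', '1015285',
            '1007183', '1008833', '1015315', '1015322', '1015285')
Primary.Viol.Type <- c('AS.Age', 'HS.Hours', 'HS.Hours', 'HS.Hours',
                       'RK.Records_CL', 'OT.Overtime', 'OT.Overtime',
                       'OT.Overtime', 'V.Poster_Other', 'V.Poster_Other')
df <- data.frame(CaseID, Primary.Viol.Type, stringsAsFactors = FALSE)

# Row and column names come from unique() of the two columns
cases <- unique(df$CaseID)
types <- unique(df$Primary.Viol.Type)

# Logical matrix: one row per case, one column per violation type
viol <- matrix(FALSE, nrow = length(cases), ncol = length(types),
               dimnames = list(cases, types))

# Option 1: simple loop over the rows of the dataframe
for (i in seq_len(nrow(df))) {
  viol[df$CaseID[i], df$Primary.Viol.Type[i]] <- TRUE
}

# Option 2: no loop; each row of the index matrix is a (rowname, colname) pair
viol2 <- matrix(FALSE, nrow = length(cases), ncol = length(types),
                dimnames = list(cases, types))
viol2[cbind(df$CaseID, df$Primary.Viol.Type)] <- TRUE
```

If all 14 types must appear as columns, the full vector of type names could be 
used for the colnames instead of unique().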

All of these are possible, and you haven't told us enough about what you want 
to achieve to figure out what the "best" is. If you choose one of the options 
and need help with the code, let us know.
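
To make the lookup-style options above a bit more concrete, here is a minimal 
sketch with the data from the original post. Note the assumptions: instead of 
the hash package it uses a base R environment as the hash (environments are 
hashed, so lookup by key is fast), and the names df, hits, by.case, case.hash 
and from.hash are invented for this example.

```r
# Data from the original post
CaseID <- c('1015285', '1005317', '1012281', '1015285', '1015285',
            '1007183', '1008833', '1015315', '1015322', '1015285')
Primary.Viol.Type <- c('AS.Age', 'HS.Hours', 'HS.Hours', 'HS.Hours',
                       'RK.Records_CL', 'OT.Overtime', 'OT.Overtime',
                       'OT.Overtime', 'V.Poster_Other', 'V.Poster_Other')
df <- data.frame(CaseID, Primary.Viol.Type, stringsAsFactors = FALSE)

# Plain subsetting: all violation types recorded for one case
hits <- df$Primary.Viol.Type[df$CaseID == '1015285']

# Named list keyed by CaseID, one character vector per case
by.case <- split(df$Primary.Viol.Type, df$CaseID)

# Environment used as a hash table (a stand-in for the hash package)
case.hash <- new.env(hash = TRUE)
for (id in names(by.case)) {
  assign(id, by.case[[id]], envir = case.hash)
}
from.hash <- get('1015285', envir = case.hash)
```

All three lookups return the same four violation types for case '1015285'; 
which one is appropriate depends on how many cases there are and how often 
they are queried.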

Cheers,
B.





On Dec 17, 2014, at 10:15 PM, bcrombie <bcrom...@utk.edu> wrote:

> # I have a dataframe that contains 2 columns:
> CaseID  <- c('1015285',
> '1005317',
> '1012281',
> '1015285',
> '1015285',
> '1007183',
> '1008833',
> '1015315',
> '1015322',
> '1015285')
> 
> Primary.Viol.Type <- c('AS.Age',
> 'HS.Hours',
> 'HS.Hours',
> 'HS.Hours',
> 'RK.Records_CL',
> 'OT.Overtime',
> 'OT.Overtime',
> 'OT.Overtime',
> 'V.Poster_Other',
> 'V.Poster_Other')
> 
> PViol.Type.Per.Case.Original <- data.frame(CaseID,Primary.Viol.Type)
> 
> # CaseIDs can be repeated because there can be up to 14 Primary.Viol.Types
> # per CaseID.
> 
> # I want to transform this dataframe into one that has 15 columns, where the
> # first column is CaseID, and the rest are the 14 primary viol. types.  The
> # CaseID column will contain a list of the unique CaseIDs (no replicates) and
> # for each of their rows, there will be a “1” under a column corresponding to
> # a primary violation type recorded for that CaseID.  So, technically, there
> # could be zero to 14 “1”s in a CaseID’s row.
> 
> # For example, the row for CaseID '1015285' above would have a “1” under
> # “AS.Age”, “HS.Hours”, “RK.Records_CL”, and “V.Poster_Other”, but have "NA"
> # under the rest of the columns.
> 
> PViol.Type <- c("CaseID",
>                 "BW.BackWages",
>                 "LD.Liquid_Damages",
>                 "MW.Minimum_Wage",
>                 "OT.Overtime",
>                 "RK.Records_FLSA",
>                 "V.Poster_Other",
>                 "AS.Age",
>                 "BW.WHMIS_BackWages",
>                 "HS.Hours",
>                 "OA.HazOccupationAg",
>                 "ON.HazOccupationNonAg",
>                 "R3.Reg3AgeOccupation",
>                 "RK.Records_CL",
>                 "V.Other")
> 
> PViol.Type.Columns <- t(data.frame(PViol.Type))
> 
> # What is the best way to do this in R?
> 
> 
> 
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Make-2nd-col-of-2-col-df-into-header-row-of-same-df-then-adjust-col1-data-display-tp4700878.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
