Re: [R] Make 2nd col of 2-col df into header row of same df then adjust col1 data display

Boris Steipe Thu, 18 Dec 2014 08:03:36 -0800

"Make a table that looks like..." sounds like a use case that would benefit 
from some reflection.
Anyway, at least don't put your IDs *in* the "table"; use them as row names 
instead.


# Your data
CaseID  <- c('1015285',
'1005317',
'1012281',
'1015285',
'1015285',
'1007183',
'1008833',
'1015315',
'1015322',
'1015285')

Primary.Viol.Type <- c('AS.Age',
'HS.Hours',
'HS.Hours',
'HS.Hours',
'RK.Records_CL',
'OT.Overtime',
'OT.Overtime',
'OT.Overtime',
'V.Poster_Other',
'V.Poster_Other')

# the code
uID <- unique(CaseID)
uVT <- unique(Primary.Viol.Type)

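# empty matrix: one row per unique case, one column per violation type
m <- matrix(NA, nrow=length(uID), ncol=length(uVT), dimnames=list(uID, uVT))

# mark each recorded (case, violation type) pair with a 1
for (i in seq_along(CaseID)) {
    m[CaseID[i], Primary.Viol.Type[i]] <- 1
}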


# the result
        AS.Age HS.Hours RK.Records_CL OT.Overtime V.Poster_Other
1015285      1        1             1          NA              1
1005317     NA        1            NA          NA             NA
1012281     NA        1            NA          NA             NA
1007183     NA       NA            NA           1             NA
1008833     NA       NA            NA           1             NA
1015315     NA       NA            NA           1             NA
1015322     NA       NA            NA          NA              1
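
For what it's worth, the loop above can probably be replaced by a single 
indexed assignment. This is only a minimal sketch, assuming the same two 
vectors as above; the name m2 is made up here so that it does not overwrite m.

```r
# Same example data as above, written on fewer lines
CaseID <- c('1015285', '1005317', '1012281', '1015285', '1015285',
            '1007183', '1008833', '1015315', '1015322', '1015285')
Primary.Viol.Type <- c('AS.Age', 'HS.Hours', 'HS.Hours', 'HS.Hours',
                       'RK.Records_CL', 'OT.Overtime', 'OT.Overtime',
                       'OT.Overtime', 'V.Poster_Other', 'V.Poster_Other')

uID <- unique(CaseID)
uVT <- unique(Primary.Viol.Type)

# m2 is a hypothetical name; it has the same shape and dimnames as m
m2 <- matrix(NA, nrow = length(uID), ncol = length(uVT),
             dimnames = list(uID, uVT))

# A two-column character matrix used as an index is matched against the
# dimnames, so each row of cbind() names one (row, column) cell to set.
m2[cbind(CaseID, Primary.Viol.Type)] <- 1

m2
```

Printing m2 should give the same grid as the loop, since both set exactly 
the cells named by the (CaseID, Primary.Viol.Type) pairs.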



B.



On Dec 18, 2014, at 8:09 AM, Crombie, Burnette N <bcrom...@utk.edu> wrote:

> I want to achieve a table that looks like a grid of 1's for all cases in a 
> survey.  I'm an R beginner and don't have a clue how to do all the things you 
> just suggested.  I really appreciate the time you took to explain all of 
> those options, though.  -- BNC
> 
> -----Original Message-----
> From: Boris Steipe [mailto:boris.ste...@utoronto.ca] 
> Sent: Thursday, December 18, 2014 5:29 AM
> To: Crombie, Burnette N
> Cc: r-help@r-project.org
> Subject: Re: [R] Make 2nd col of 2-col df into header row of same df then 
> adjust col1 data display
> 
> What you are describing sounds like a very spreadsheet-y thing. 
> 
> - The information is already IN your dataframe, and easy to get out by 
> subsetting. Depending on your use case, that may actually be the "best". 
> 
> - If the number of CaseIDs is large, I would use a hash of lists (if the data 
> is sparse), or hash of named vectors if it's not sparse. Lookup is O(1) so 
> that may be the best. (Cf package hash, and explanations there). 
> 
> - If it must be the spreadsheet-y thing, you could make a matrix with 
> rownames and colnames taken from unique() of your respective dataframe. 
> Instead of 1 and NA I probably would use TRUE/FALSE. 
> 
> - If it takes less time to wait for the results than to look up how apply() 
> works, you can write a simple loop to populate your matrix. Otherwise apply() 
> is much faster. 
> 
> - You could even use a loop to build the data structure, checking for every 
> cbind() whether the value in column 1 already exists in the table - but 
> that's terrible and would make a kitten die somewhere on every iteration.
> 
> All of these are possible, and you haven't told us enough about what you want 
> to achieve to figure out what the "best" is. If you choose one of the options 
> and need help with the code, let us know.
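
Two of the options listed above can be sketched in base R. This is a rough 
illustration only, assuming the two vectors from the original post; the 
names by.case and has.viol are made up for the example, and split() stands 
in for the hash package mentioned above rather than demonstrating it.

```r
# Same example data as in the original post
CaseID <- c('1015285', '1005317', '1012281', '1015285', '1015285',
            '1007183', '1008833', '1015315', '1015322', '1015285')
Primary.Viol.Type <- c('AS.Age', 'HS.Hours', 'HS.Hours', 'HS.Hours',
                       'RK.Records_CL', 'OT.Overtime', 'OT.Overtime',
                       'OT.Overtime', 'V.Poster_Other', 'V.Poster_Other')

# Lookup option: a named list holding one character vector of violation
# types per case (by.case is a hypothetical name)
by.case <- split(Primary.Viol.Type, CaseID)
by.case[["1015285"]]              # the types recorded for one case

# Spreadsheet-y option with TRUE/FALSE instead of 1/NA: cross-tabulate
# the pairs, then test which counts are above zero
has.viol <- unclass(table(CaseID, Primary.Viol.Type)) > 0
has.viol["1015285", "HS.Hours"]   # look up a single cell by name
```

Note that table() sorts the row and column labels, so the grid will not be 
in order of first appearance the way a matrix built from unique() is.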
> 
> Cheers,
> B.
> 
> 
> 
> 
> 
> On Dec 17, 2014, at 10:15 PM, bcrombie <bcrom...@utk.edu> wrote:
> 
>> # I have a dataframe that contains 2 columns:
>> CaseID  <- c('1015285',
>> '1005317',
>> '1012281',
>> '1015285',
>> '1015285',
>> '1007183',
>> '1008833',
>> '1015315',
>> '1015322',
>> '1015285')
>> 
>> Primary.Viol.Type <- c('AS.Age',
>> 'HS.Hours',
>> 'HS.Hours',
>> 'HS.Hours',
>> 'RK.Records_CL',
>> 'OT.Overtime',
>> 'OT.Overtime',
>> 'OT.Overtime',
>> 'V.Poster_Other',
>> 'V.Poster_Other')
>> 
>> PViol.Type.Per.Case.Original <- data.frame(CaseID,Primary.Viol.Type)
>> 
>> # CaseID's can be repeated because there can be up to 14
>> # Primary.Viol.Type's per CaseID.
>> 
>> # I want to transform this dataframe into one that has 15 columns,
>> # where the first column is CaseID, and the rest are the 14 primary
>> # viol. types.  The CaseID column will contain a list of the unique
>> # CaseID's (no replicates) and for each of their rows, there will be a
>> # "1" under a column corresponding to a primary violation type recorded
>> # for that CaseID.  So, technically, there could be zero to 14 "1's" in a
>> # CaseID's row.
>> 
>> # For example, the row for CaseID '1015285' above would have a "1"
>> # under "AS.Age", "HS.Hours", "RK.Records_CL", and "V.Poster_Other", but
>> # have "NA" under the rest of the columns.
>> 
>> PViol.Type <- c("CaseID",
>>               "BW.BackWages",
>>          "LD.Liquid_Damages",
>>          "MW.Minimum_Wage",
>>          "OT.Overtime",
>>          "RK.Records_FLSA",
>>          "V.Poster_Other",
>>          "AS.Age",
>>          "BW.WHMIS_BackWages",
>>          "HS.Hours",
>>          "OA.HazOccupationAg",
>>          "ON.HazOccupationNonAg",
>>          "R3.Reg3AgeOccupation",
>>          "RK.Records_CL",
>>          "V.Other")
>> 
>> PViol.Type.Columns <- t(data.frame(PViol.Type))
>> 
>> # What is the best way to do this in R?
>> 
>> 
>> 
>> 
>> --
>> View this message in context: 
>> http://r.789695.n4.nabble.com/Make-2nd-col-of-2-col-df-into-header-row-of-same-df-then-adjust-col1-data-display-tp4700878.html
>> Sent from the R help mailing list archive at Nabble.com.
>> 
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 

