Re: [R] reshaping data

Mia Bengtsson Fri, 21 May 2010 03:40:49 -0700

Thank you Dennis and Henrique for your help!

Both solutions work! I just need to find a way of removing the empty "cells" 
from the final "long" dataframe since they are not NAs.
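
A minimal sketch of dropping those empty cells, assuming the long data frame has a column named value (as melt() produces) and that the padding is empty or whitespace-only strings rather than NA. The toy data frame below is made up for illustration:

```r
# Toy long data frame standing in for the reshaped result (hypothetical data).
longdf <- data.frame(
  species = c("red", "red", "red", "green", "green", "green",
              "blue", "blue", "blue"),
  value   = c("A", "B", "C", "D", "", "", "E", "F", " "),
  stringsAsFactors = FALSE
)

# Treat NA, empty, and whitespace-only strings as missing, then drop those rows.
is_blank <- is.na(longdf$value) | trimws(longdf$value) == ""
long_clean <- longdf[!is_blank, ]
rownames(long_clean) <- NULL
long_clean
```

The same logical index works if the padding turns out to be NA after all, since is.na() is checked first.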


Maybe there is an easier way of doing this if the data is not treated as a 
dataframe? The original data file, which is derived from another program (mothur), 
is a text file with the following format:

red \t A,B,C
green \t D
blue \t E,F

The first column, "species", is separated from the "sequences" (A, B, C...) by a 
tab, and the "sequences" are separated from each other by commas.

I imported it into R as what I thought was a dataframe using:

# Read the raw lines, turn the tab into a comma, then parse as CSV
test1 <- readLines("path/test")
test2 <- gsub(pattern = "\t", replacement = ",", x = test1)
test3 <- textConnection(test2)
test.df <- read.csv(test3, header = FALSE)

Should I rather have imported it as something else if I want to reshape it into 
a list as described previously?
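
One possible alternative, sketched here under assumptions: keep the data as a named list (one element per species) and let stack() produce the long form, so no padded data frame is ever created. The three lines below stand in for readLines("path/test"), and the column names sequence and species are chosen only for illustration:

```r
# Stand-in for readLines("path/test"): species name, a tab,
# then comma-separated sequence names (hypothetical sample data).
lines <- c("red\tA,B,C", "green\tD", "blue\tE,F")

# Split each line at the tab into the species name and the sequence string.
parts   <- strsplit(lines, "\t", fixed = TRUE)
species <- trimws(vapply(parts, `[`, character(1), 1))
seqstr  <- trimws(vapply(parts, `[`, character(1), 2))

# Split the sequence strings at commas; the result is a ragged list.
seqs <- strsplit(seqstr, ",", fixed = TRUE)
names(seqs) <- species

# stack() turns the named list into a two-column long data frame.
long <- stack(seqs)
names(long) <- c("sequence", "species")
long
```

Because each list element holds only the sequences that actually exist, there are no empty cells to remove afterwards.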

Thanks a million!

/ Mia Bengtsson


On May 21, 2010, at 2:15 AM, Dennis Murphy wrote:

> Hi:
> 
> 
> On Thu, May 20, 2010 at 10:13 AM, Mia Bengtsson <mia.bengts...@bio.uib.no> 
> wrote:
> Hello,
> 
> I am a relatively new R-user who has a lot to learn. I have a large dataset 
> that is in the following dataframe format:
> 
> red             A       B       C
> green   D
> blue    E       F
> 
> This isn't a data frame in R - if it were, it would have NA (or at least "" 
> or " ") padding at the end of each row.
> Data frames are not ragged arrays. To have this type of structure in R, the 
> data would have to be in a list.
> 
> This matters because Henrique's solution with reshape() assumes a data frame 
> as input. A similar solution
> would be to use melt() in the reshape package, something like
> 
> library(reshape)
> longdf <- melt(yourdf, id.var = 'species')
> longdf
> 
> If you have NA padding, the way to get rid of them in the reshaped data frame 
> is (with the above approach)
> 
> subset(longdf, !is.na(value), select = -variable)
> 
> If the padding is with blanks, then Henrique's solution works here, too.
> 
> HTH,
> Dennis
> 
> 
> Where red, green and blue are "species" names and A, B and C are observations 
> (corresponding to DNA sequences). Each observation can only belong to one 
> species. I would like to list the observations in one column, with the 
> species they belong to in the next. Like this:
> 
> A       red
> B       red
> C       red
> D       green
> E       blue
> F       blue
> 
> I have tried using reshape() and stack() but I cannot get my head around it. 
> Any help is highly appreciated!
> 
> Thanks in advance,
> __________________________________
> 
> Mia Bengtsson, PhD-student
> Department of Biology
> University of Bergen
> +47 55584715
> +47 97413634
> mia.bengts...@bio.uib.no
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

