Re: [R] frequency, count rows, data for heat map

Jan van der Laan Thu, 26 Aug 2010 00:03:51 -0700

Please reply to the r-help list and not only to me personally. That way
others can also help, or perhaps benefit from the answers.


You can use strsplit to remove the last part of the strings. strsplit
returns a list of character vectors, from each of which you (if I
understand you correctly) only want the first element. I use laply from
the plyr package for this, although there are probably other ways of
doing it.

library(plyr)
dat$V3 <- laply(strsplit(as.character(dat$V1), '_'), function(l) l[1])
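
As one of those other ways, here is a minimal base R sketch that does not
need plyr. It strips the first "_" and everything after it with sub(). The
IDs are copied from mock.txt; the names ids, prefix and prefix2 are only
for illustration.

```r
# Example IDs taken from mock.txt; `ids` is an illustrative name
ids <- c("1079_17891", "1079_14794", "111_463428", "5576_315871")

# Drop the first "_" and everything after it
prefix <- sub("_.*$", "", ids)

# Same result with strsplit, using base vapply instead of plyr::laply
prefix2 <- vapply(strsplit(ids, "_", fixed = TRUE),
                  function(l) l[1], character(1))

# Both approaches should agree
stopifnot(identical(prefix, prefix2))
prefix
```

On the real data this would be something like
dat$V3 <- sub("_.*$", "", as.character(dat$V1)).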

After that you can use daply as I showed in my previous post
[daply(dat, V3 ~ V2, nrow)] or use the methods suggested by Dennis
Murphy to build your table.
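
If you prefer to stay in base R, table() is one way to build the same kind
of count matrix. This is only a sketch and not necessarily what Dennis
suggested. The small data frame below is a stand-in for dat after V3 has
been added, with a few values from mock.txt; counts and m are illustrative
names.

```r
# Stand-in for dat with the V3 column already added (values from mock.txt)
dat <- data.frame(
  V3 = c("1079", "1079", "1079", "111", "111", "5576"),
  V2 = c("9629367|ref|NC_001803.1|", "9629367|ref|NC_001803.1|",
         "9629357|ref|NC_001802.1|", "9627186|ref|NC_001539.1|",
         "9627742|ref|NC_001623.1|", "9627186|ref|NC_001539.1|"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per sample, one column per virus;
# combinations that never occur get a count of 0
counts <- table(dat$V3, dat$V2)

# Drop the "table" class to get a plain integer matrix with dimnames
m <- unclass(counts)

# heatmap(m, scale = "none")  # uncomment to draw the heat map
m
```

The matrix m can be written out with write.table(m, "SummarizedData.txt",
sep="\t", col.names=NA), as in your own script.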

Regards,

Jan



On Thu, Aug 26, 2010 at 1:41 AM, Trip Sweeney <tripswee...@gmail.com> wrote:
> Jan,
> Thanks for responding to my post to the listserv about arranging a data
> matrix for a heat map.
> I am still a beginner, so below is the code I used for the matrix; I have
> not yet learned how to input a 'data.frame' (which I need to know to use
> your code). The code below works, and the mock.txt file is attached.
> There is one thing, though. The input in column 1 of mock.txt is tricky:
> I need it to sum per unique ID based on the characters before the "_".
> So, for example, the current script calls 1079_17891 and 1079_14794
> unique when I want them tallied together, since they are both part of
> the same 1079 sample. Occasionally a sample has three characters before
> the "_", like 111_463428 in mock.txt. The substring after the "_" is of
> variable length. In the end, there should be one row for 1079, one for
> 111, and one for 5576.
> Can you help me with this modification of the code? Any advice much
> appreciated. Sincerely, Trip
>
> dat<-read.table('mock.txt',sep="\t")
> sumData=matrix(NA,nrow=length(unique(dat[,1])),ncol=length(unique(dat[,2])))
> rownames(sumData)<-unique(dat[,1])
> colnames(sumData)<-unique(dat[,2])
>
> for (i in 1:dim(sumData)[1]){
>   for(j in 1:dim(sumData)[2]){
>      sumData[i,j]<-sum(dat[,1]==unique(dat[,1])[i] & dat[,2]==unique(dat[,2])[j])
>   }
> }
>
> write.table(sumData,"SummarizedData.txt",sep="\t",col.names=NA)
>



On Wed, Aug 25, 2010 at 4:53 PM, rtsweeney <tripswee...@gmail.com> wrote:
>
> Hi all,
> I have read posts of heat map creation but I am one step prior --
> Here is what I am trying to do and wonder if you have any tips?
> We are trying to map sequence reads from tumors to viral genomes.
>
> Example input file :
> 111     abc
> 111     sdf
> 111     xyz
> 1079   abc
> 1079   xyz
> 1079   xyz
> 5576   abc
> 5576   sdf
> 5576   sdf
>
> How many xyz's are there for 1079 and 111? How many abc's, etc?
> How many times did reads from sample (1079) align to virus xyz?
> In some cases there are thousands per virus in a given sample, sometimes one.
> The original file (two columns by tens of thousands of rows; 20 MB) is
> text file (tab delimited).
>
> Output file:
>        abc  sdf  xyz
> 111      1    1    1
> 1079     1    0    2
> 5576     1    2    0
>
> Or, other ways to generate this data so I can then use it for heat map
> creation?
>
> Thanks for any help you may have,
>
> rtsweeney
> palo alto, ca
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/frequency-count-rows-data-for-heat-map-tp2338363p2338363.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

1079_346        281416490|ref|NC_013643.1|
1079_346        281416323|ref|NC_013646.1|
1079_378        9629367|ref|NC_001803.1|
1079_588        30984428|ref|NC_004812.1|
1079_1292       9629367|ref|NC_001803.1|
1079_3956       9629357|ref|NC_001802.1|
1079_4736       9629357|ref|NC_001802.1|
1079_7732       21427641|ref|NC_004015.1|
1079_7855       118197620|ref|NC_008584.1|
1079_8618       32453484|ref|NC_004928.1|
1079_11540      10140926|ref|NC_002531.1|
1079_14794      9629367|ref|NC_001803.1|
1079_15738      109255272|ref|NC_008168.1|
1079_17891      299778956|ref|NC_014260.1|
1079_18414      157781212|ref|NC_009823.1|
1079_18414      157781216|ref|NC_009824.1|
1079_20312      9629367|ref|NC_001803.1|
1079_20497      9629357|ref|NC_001802.1|
1079_26750      9629367|ref|NC_001803.1|
1079_27926      9628113|ref|NC_001659.1|
1079_27926      9628113|ref|NC_001659.1|
1079_28033      84662653|ref|NC_007710.1|
1079_30020      47835019|ref|NC_004333.2|
1079_30371      9629367|ref|NC_001803.1|
1079_35750      50313241|ref|NC_001491.2|
1079_35750      50313241|ref|NC_001491.2|
111_463428      56694721|ref|NC_006560.1|
111_464636      114680053|ref|NC_008349.1|
111_464636      9627742|ref|NC_001623.1|
111_465190      9627186|ref|NC_001539.1|
111_467613      51557483|ref|NC_006151.1|
111_467613      51557483|ref|NC_006151.1|
111_467975      9627742|ref|NC_001623.1|
111_467975      114680053|ref|NC_008349.1|
111_467975      23577820|ref|NC_004323.1|
111_469706      21426072|ref|NC_004003.1|
111_469706      21426072|ref|NC_004003.1|
111_469793      146261990|ref|NC_001826.2|
111_470996      203454602|ref|NC_011273.1|
111_473637      281415946|ref|NC_013650.1|
111_473637      203458877|ref|NC_011269.1|
111_473637      109393216|ref|NC_008207.1|
111_473637      203457352|ref|NC_011272.1|
111_473637      203460520|ref|NC_011270.1|
111_473637      29566511|ref|NC_004687.1|
111_473637      204305660|ref|NC_011271.1|
5576_315871     168804017|ref|NC_010356.1|
5576_316443     9629198|ref|NC_001781.1|
5576_324191     148727082|ref|NC_009541.1|
5576_327936     9629267|ref|NC_001798.1|
5576_327936     9629267|ref|NC_001798.1|
5576_327936     9629267|ref|NC_001798.1|
5576_330546     216905965|ref|NC_011645.1|
5576_333512     57659681|ref|NC_006659.1|
5576_333512     57753428|ref|NC_006634.1|
5576_333512     57659681|ref|NC_006659.1|
5576_353878     20522096|ref|NC_003795.1|
5576_354562     9627186|ref|NC_001539.1|
5576_354577     19718363|ref|NC_003461.1|
5576_358444     48696722|ref|NC_005881.1|
5576_358444     48696722|ref|NC_005881.1|
5576_366975     9629178|ref|NC_001753.1|
5576_368020     239505241|ref|NC_012783.1|
5576_371413     48696722|ref|NC_005881.1|
5576_371413     48696722|ref|NC_005881.1|
5576_375881     48696722|ref|NC_005881.1|
