I have read that in parallel processing the I/O on a given file should be
done by a single process.
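A minimal sketch of the single-writer idea, in case it is useful. It uses the
base parallel package as a stand-in for doMC/foreach (both fork, so
Unix-alikes only). gettable1() below is a made-up stub, not your real
function, and the toy ids, the block size of 4 and the temp file are only
placeholders. The workers just build strings; the master process is the only
one that touches the connection.

```r
library(parallel)  # ships with R; mclapply() forks workers (Unix-alikes only)

# Hypothetical stand-in for gettable1(): returns the fields of one output line.
gettable1 <- function(id) c(id, nchar(id))

uniq.Seq <- paste0("seq", 1:10)          # toy input
out_file <- tempfile(fileext = ".txt")   # placeholder output path

con <- file(out_file, open = "wt")
writeLines(paste("ReadID", "Len", sep = "\t"), con)  # header, written once

# Work through the ids in blocks so only one block of lines is held in memory.
blocks <- split(uniq.Seq, ceiling(seq_along(uniq.Seq) / 4))
for (ids in blocks) {
  # Workers only return strings; they never write to the connection.
  lines <- mclapply(ids,
                    function(id) paste(gettable1(id), collapse = "\t"),
                    mc.cores = 2)
  writeLines(unlist(lines), con)         # only the master process writes
}
close(con)
```

The block size is the knob that trades memory for parallel overhead: larger
blocks mean fewer rounds but more lines held in memory at once.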
Suggestion: have each worker process write its own file, then combine the
results with cat or similar.
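Something along these lines, as a rough sketch only. Again it assumes the
base parallel package instead of doMC, a made-up gettable1() stub, and
hypothetical file names (part_001.txt and so on). Each worker opens its own
connection, so nothing is shared, and the parts are appended in order at the
end (the R equivalent of cat part_*.txt).

```r
library(parallel)  # ships with R; mclapply() forks workers (Unix-alikes only)

# Hypothetical stand-in for gettable1(): returns the fields of one output line.
gettable1 <- function(id) c(id, nchar(id))

uniq.Seq  <- paste0("seq", 1:10)   # toy input
n_workers <- 2
out_dir   <- tempfile("parts_")    # placeholder output directory
dir.create(out_dir)

# One contiguous block of ids per worker.
blocks <- split(uniq.Seq, cut(seq_along(uniq.Seq), n_workers, labels = FALSE))

part_files <- mclapply(seq_along(blocks), function(k) {
  part <- file.path(out_dir, sprintf("part_%03d.txt", k))
  con  <- file(part, open = "wt")  # opened inside the worker, private to it
  on.exit(close(con))
  for (id in blocks[[k]]) {
    writeLines(paste(gettable1(id), collapse = "\t"), con)
  }
  part                             # return the path of this worker's file
}, mc.cores = n_workers)

# Combine: header first, then each part in block order.
final <- file.path(out_dir, "all_SeqID.txt")
writeLines(paste("ReadID", "Len", sep = "\t"), final)
for (p in unlist(part_files)) file.append(final, p)
```

Compared with a single writer, this keeps memory use flat regardless of the
number of sequences, at the cost of a final concatenation pass.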
Hope it helps
mario
On 04-Jul-11 11:58, Ramzi TEMANNI wrote:
Hi
I'm processing sequencing data, trying to collapse the locations of each
unique sequence and write the results to a file (storing them in a table
would require at least 10 GB of memory).
So I wrote a function that, given a sequence ID, provides the line to be
stored:
library(doMC) # load library
registerDoMC(12) # set the number of CPUs
fileConn <- file(paste(fq_file,"_SeqID.txt",sep=""), open = "at") # open connection
writeLines(paste("ReadID","Freq","Seq","LOC_UG","Nb_UG_Seq",sep="\t"), fileConn) # write header
foreach(i=1:length(uniq.Seq)) %dopar% # for each unique sequence
{
  writeLines(paste(gettable1(uniq.Seq[i]),collapse=" "), fileConn) # write the results line
}
close(fileConn)
The code executes, but the problem is that some lines are weird.
The header and a lot of lines are OK:
ReadID Freq Seq LOC_UG Nb_UG_Seq
HWI-EA332_0036:5:16:9530:21025#ATGC/1 XXXXXXXXXXXXXXXXXXXX 2 XXXXX_10130:489:+,XXXXX_10130:489:+ 2
HWI-EA332_0036:5:117:6674:4940#ATGC/1 XXXXXXXXXXXXXXXXXXXX 1 XXXXX:432:-,XXXXX:432:- 2
HWI-EA332_0036:5:62:15592:7375#ATGC/1 XXXXXXXXXXXXXXXXXXXX 2 XXXXX_22660:253:+,XXXXX_22660:253:+ 2
HWI-EA332_0036:5:110:14349:8422#ATGC/1 XXXXXXXXXXXXXXXXXXXX 4 XXXXX_13806:399:+,XXXXX_13806:399:+,XXXXX_27263:481:+,XXXXX_27263:481:+ 4
Others look weird:
HWI-EA332_0036:5:17:1400ReadID Freq Seq LOC_UG Nb_UG_Seq
HWI-EA332_0036:5:61:7734:4201ReadID Freq Seq LOC_UG Nb_UG_Seq
HWI-EA332_0036:5:117:5361:10666#ATGReadID Freq Seq LOC_UG
Nb_UG_Seq
HWI-EA332_0036:5:115:7421:20664#ATGC/1 GATCReadID Freq Seq
LOC_UG Nb_UG_Seq
HWI-EA332_0036:5:175:95:- 2
HWI-EA332_0036:5JCVI_35536:444:+ 2
XXXXXXXXX 1 XXXXX_22484:571:-,XXXXX_22484:571:- 2
Is this because one process starts writing before another has finished?
Is there a way to solve this problem?
Any suggestions would be greatly appreciated.
Thanks and have a nice day.
Best,
Ramzi TEMANNI
http://www.linkedin.com/in/ramzitemanni
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Ing. Mario Valle
Data Analysis and Visualization Group | http://www.cscs.ch/~mvalle
Swiss National Supercomputing Centre (CSCS) | Tel: +41 (91) 610.82.60
v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax: +41 (91) 610.82.82