Hi,
Try this:
lines1<- readLines(textConnection("gene1 or1|1234 or3|56 or4|793
gene4 or2|347
gene5 or3|23 or7|123456789")) 


lines2<-readLines(textConnection(">or1|1234
ATCGGATTCAGG
>or2|347
GAACCTATCGGGGGGGGAATTTATATATTTTA
>or3|56
ATCGGAGATATAACCAATC
>or3|23
AAAATTAACAAGAGAATAGACAAAAAAA
>or4|793
ATCTCTCTCCTCTCTCTCTAAAAA
>or7|123456789
ACGTGTGTACCCCC")) 

# Pair each header with its sequence (assumes exactly two lines per record).
lines2New <- unlist(lapply(split(lines2, (seq_along(lines2)-1) %/% 2 + 1),
  function(x) paste(x, collapse="\n")), use.names=FALSE)
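
The grouping above takes exactly two lines per record. If a sequence in file2.txt is wrapped over several lines, as the second record in the original post appears to be, the pairs would shift. A minimal sketch of one alternative, assuming the file starts with a header line and every header begins with ">"; the helper name collapse_records is made up for illustration:

```r
# Hypothetical helper: returns one string per record, "header\nsequence",
# with wrapped sequence lines joined into a single line.
collapse_records <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]            # drop empty lines
  grp <- cumsum(startsWith(lines, ">"))    # a new group starts at every header
  unlist(lapply(split(lines, grp), function(x)
    paste(x[1], paste(x[-1], collapse = ""), sep = "\n")), use.names = FALSE)
}

wrapped <- c(">or2|347", "GAACCTATCGGGGGGGGAATTTA", "TATATTTTA",
             ">or3|56", "ATCGGAGATATAACCAATC")
recs <- collapse_records(wrapped)
cat(recs, sep = "\n")
```

The result could be used in place of lines2New, since each element has the same "header\nsequence" shape.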


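# For each row, write the records matching its IDs to <gene>.txt.
# Note: write.table() also writes the column name "x" as the first line.
res <- lapply(lines1, function(x) {
  x1 <- strsplit(x, " ")[[1]]
  x1New <- x1[-1]
  x2 <- gsub(">(.*)\\n.*", "\\1", lines2New)
  lines3 <- lines2New[match(x1New, x2)]
  write.table(lines3, paste0(x1[1], ".txt"), row.names=FALSE, quote=FALSE)
})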
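
Because write.table() puts the column name "x" on the first line of each file, writeLines() may be closer to the output asked for. A sketch under some assumptions: the fields of each row are separated by spaces or tabs (the real file1.txt is described as tab-delimited), each record is already a single "header\nsequence" string, and IDs without a matching record are skipped. The names write_gene_files and out_dir are hypothetical, and the example writes to a temporary directory:

```r
# Hypothetical helper: writes one <gene>.txt per row and returns the file paths.
write_gene_files <- function(rows, records, out_dir = tempdir()) {
  # Header of each record, without the leading ">"
  ids <- sub("^>", "", vapply(strsplit(records, "\n", fixed = TRUE), `[`, "", 1))
  invisible(lapply(rows, function(row) {
    fields <- strsplit(trimws(row), "[ \t]+")[[1]]   # gene name, then IDs
    hits <- records[match(fields[-1], ids)]
    hits <- hits[!is.na(hits)]                       # skip IDs with no record
    out <- file.path(out_dir, paste0(fields[1], ".txt"))
    writeLines(hits, out)                            # no header line is added
    out
  }))
}

rows <- c("gene4\tor2|347", "gene5 or3|23 or7|123456789")
records <- c(">or2|347\nGAACCTATCGGGGGGGGAATTTATATATTTTA",
             ">or3|23\nAAAATTAACAAGAGAATAGACAAAAAAA",
             ">or7|123456789\nACGTGTGTACCCCC")
paths <- write_gene_files(rows, records)
readLines(paths[[2]])
```

With real input, rows would come from readLines("file1.txt") and out_dir would point at the folder where the files should go.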


Attached is one of the files generated by the code.
A.K.


Hi all, 

I have two input files. The first file (file1.txt) contains entries in the 
following tab-delimited format: 

gene1   or1|1234        or3|56  or4|793 
gene4   or2|347 
gene5   or3|23  or7|123456789 

....... 
.. 


The second file (file2.txt) contains, for each identifier in the first file, 
a header line followed by some additional features, such as: 

>or1|1234 
ATCGGATTCAGG 
>or2|347 
GAACCTATCGGGGGGGGAATTTA 
TATATTTTA 
>or3|56 
ATCGGAGATATAACCAATC 
>or3|23 
AAAATTAACAAGAGAATAGACAAAAAAA 
>or4|793 
ATCTCTCTCCTCTCTCTCTAAAAA 
>or7|123456789 
ACGTGTGTACCCCC 

.... 
.. 

From these two files, I want to extract the entries whose headers match the 
identifiers in each row of file1, and name each output file after the first 
column of that row. For example, in the above case, 3 output files would be 
generated. 

The first output file would be named "gene1.txt" and contain: 

>or1|1234 
ATCGGATTCAGG 
>or3|56 
ATCGGAGATATAACCAATC 
>or4|793 
ATCTCTCTCCTCTCTCTCTAAAAA 

The second output file would be named "gene4.txt" and contain: 

>or2|347 
GAACCTATCGGGGGGGGAATTTATATATTTTA 

The third output file would be named "gene5.txt" and contain: 

>or3|23 
AAAATTAACAAGAGAATAGACAAAAAAA 
>or7|123456789 
ACGTGTGTACCCCC 

Any help in solving the problem is highly appreciated. Thanks in advance. 
x
>or1|1234
ATCGGATTCAGG
>or3|56
ATCGGAGATATAACCAATC
>or4|793
ATCTCTCTCCTCTCTCTCTAAAAA
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
