Hi Marc,

Thanks a lot for your advice.

As far as I know, the gff3 file is the only way to get the latest Gmax
annotation build from Phytozome (http://www.phytozome.net/). It is now
publicly available at:

ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Gmax/annotation/Gmax_189_gene_exons.gff3.gz

The reason I didn't provide 'exonRankAttributeName' is that there are no
explicit numbers in that gff3 file that indicate the exon rank directly.
For example:

Gm01 phytozome8_0 gene 27643 27977 . - . ID=Glyma01g00210;Name=Glyma01g00210
Gm01 phytozome8_0 mRNA 27643 27977 . - . ID=PAC:26325839;Name=Glyma01g00210.1;pacid=26325839;longest=1;Parent=Glyma01g00210
Gm01 phytozome8_0 exon 27913 27977 . - . ID=PAC:26325839.exon.1;Parent=PAC:26325839;pacid=26325839
Gm01 phytozome8_0 CDS 27913 27977 . - 0 ID=PAC:26325839.CDS.1;Parent=PAC:26325839;pacid=26325839
Gm01 phytozome8_0 exon 27643 27811 . - . ID=PAC:26325839.exon.2;Parent=PAC:26325839;pacid=26325839
Gm01 phytozome8_0 CDS 27643 27811 . - 1 ID=PAC:26325839.CDS.2;Parent=PAC:26325839;pacid=26325839

The ID attribute does seem to encode the rank (I see *.exon.1, *.exon.2),
so I guess I can extract that information manually into an extra column
and pass it to 'makeTranscriptDbFromGFF'.
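
Something like the following base R sketch is what I have in mind for the
extraction step. The helper name 'exon_rank_from_id' is just made up for
illustration, and it assumes every exon ID ends in ".exon.<number>" as in
the lines above. It only covers parsing the rank out of the ID, not how the
result is then handed to makeTranscriptDbFromGFF.

```r
# Hypothetical helper (not part of GenomicFeatures): parse the trailing
# number from IDs such as "PAC:26325839.exon.1".
# Assumption: exon IDs always end in ".exon.<rank>"; anything else gives NA.
exon_rank_from_id <- function(ids) {
  is_exon_id <- grepl("\\.exon\\.[0-9]+$", ids)
  rank <- rep(NA_integer_, length(ids))
  rank[is_exon_id] <- as.integer(
    sub("^.*\\.exon\\.([0-9]+)$", "\\1", ids[is_exon_id])
  )
  rank
}

# The two exon IDs from the example records above.
ids <- c("PAC:26325839.exon.1", "PAC:26325839.exon.2")
exon_rank_from_id(ids)
```

IDs that do not match the pattern (for example the *.CDS.1 ones) come back
as NA, so they are easy to spot before building the extra column.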

By the way, is this required? It looks like GenomicFeatures tries to infer
the exon rank if I don't provide that information, so at first I thought
'exonRankAttributeName' was optional.

Thanks again

Tengfei





On Fri, Feb 8, 2013 at 6:08 PM, Marc Carlson <mcarl...@fhcrc.org> wrote:

> Hi Tengfei,
>
> Yes that looks like an oversight.  Thanks for reporting that!  I will
> extend makeTxDbPackage so that it's more accommodating of these newer
> transcriptDbs.  If you want to help me out, you could call saveDb() on your
> gmax189 object and send me the .sqlite file that you save it to.
>
> Also, if you have any alternate options for importing your data (other
> than using GFF or GTF): I think you probably should consider it.  The file
> specifications for these filetypes are missing key details and so you can
> very easily get a "legal" GFF or GTF file that is actually missing
> important details from its contents.  For example, they can commonly lack
> information about the order of the exons for a given transcript, which can
> render them difficult (or impossible) to use for transcript work.   But for
> these specifications, that information is "optional".
>
>
>   Marc
>
>
>
>
> On 02/06/2013 09:46 PM, Tengfei Yin wrote:
>
>> Dear all,
>>
>> I am trying to build a txdb object from a gff3 file for soybean data and
>> turn it into a package. The code I used looks like this:
>>
>> gmax189 <- makeTranscriptDbFromGFF("~/Gmax_189_gene_exons.gff3",
>>                                     format = "gff3",
>>                                     species = "Glycine max",
>>                                     dataSource = "http://www.phytozome.org/")
>> makeTxDbPackage(txdb = gmax189,
>>                  version = "0.9.1",
>>                  maintainer = "Tengfei Yin",
>>                  author = "Tengfei Yin",
>>                  destDir=".",
>>                  license="Artistic-2.0")
>>
>> Error message:
>> Error in gsub("_", "", pkgName) :
>>    error in evaluating the argument 'x' in selecting a method for function
>> 'gsub': Error: object 'pkgName' not found
>>
>>
>> Looks like my dataSource should be either BioMart or UCSC, otherwise no
>> pkgname will be produced in function .makePackageName?
>>
>> Or should I build annotation package in some other ways?
>>
>> Thanks a lot
>>
>> Tengfei
>>
>> my sessionInfo
>>
>> > sessionInfo()
>> R Under development (unstable) (2013-01-21 r61728)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=C                 LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] GenomicFeatures_1.11.8 AnnotationDbi_1.21.10  Biobase_2.19.2
>> [4] GenomicRanges_1.11.28  IRanges_1.17.31        BiocGenerics_0.5.6
>>
>> loaded via a namespace (and not attached):
>>  [1] biomaRt_2.15.0     Biostrings_2.27.10 bitops_1.0-5       BSgenome_1.27.1
>>  [5] DBI_0.2-5          RCurl_1.95-3       Rsamtools_1.11.15  RSQLite_0.11.2
>>  [9] rtracklayer_1.19.9 stats4_3.0.0       tools_3.0.0        XML_3.95-0.1
>> [13] zlibbioc_1.5.0
>>
>>
>>
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>



-- 
Tengfei Yin
MCDB PhD student
1620 Howe Hall, 2274,
Iowa State University
Ames, IA,50011-2274
