On 06/07/2013 09:39 PM, Steffen Durinck wrote:
Hi Martin,
The original behaviour is offered through bmHeader = FALSE in the getBM query.
Below is the long story why this change came about (it would be good to hear
which solution is preferred by others):
Hi Steffen -- thanks for the response. I saw the bmHeader flag but the
documentation made it sound like something I'd use if the request failed:

    TRUE. This should only be switched off if the default
    behavior results in errors, setting to off might still be
    able to retrieve your data in that case

but from the description below it sounds like it is appropriate and safe for
within-database queries when listAttributes() shows that there is a one-to-one
relationship between the 'name' attributes used in the query and the
corresponding 'description' of the attributes.
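
A rough sketch of that check, for illustration only. `descriptions_are_unique()` is a hypothetical helper, not part of biomaRt, and the toy `attrs` table stands in for the data frame returned by listAttributes(mart), which has 'name' and 'description' columns. The `attr_a` / `attr_b` rows and their shared description are made up to show the failing case.

```r
# Hypothetical helper: TRUE if none of the queried attribute names has a
# description that is shared with another attribute in the table.
descriptions_are_unique <- function(attrs, queried) {
    stopifnot(all(queried %in% attrs$name))
    idx <- match(queried, attrs$name)
    # flag every row whose description occurs more than once in the table
    dup <- duplicated(attrs$description) |
        duplicated(attrs$description, fromLast = TRUE)
    !any(dup[idx])
}

# Toy stand-in for listAttributes(mart); the last two rows are invented.
attrs <- data.frame(
    name = c("affy_hg_u95av2", "hgnc_symbol", "chromosome_name", "band",
             "attr_a", "attr_b"),
    description = c("Affy HG U95AV2 probeset", "HGNC symbol",
                    "Chromosome Name", "Band",
                    "Shared description", "Shared description"),
    stringsAsFactors = FALSE
)

# Unique descriptions: bmHeader = FALSE should be safe for these attributes
descriptions_are_unique(attrs, c("affy_hg_u95av2", "hgnc_symbol", "band"))
# 'attr_a' shares its description with 'attr_b', so this check fails
descriptions_are_unique(attrs, c("hgnc_symbol", "attr_a"))
```

The first call returns TRUE and the second FALSE. This says nothing about column order in multi-dataset queries, which is a separate concern.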
Martin
In most cases getBM returns the result in the order of the attributes in the
input query. So what getBM used to do is make the attributes vector the column
names of the query result. This return order is, however, not preserved in
instances where one does a query over multiple datasets, e.g. mouse and human.
In that case one cannot predict the order of the result, and this would make
the column names not match the actually returned fields. So there was a push
for getBM to use the header information provided by the BioMart service, which
is available upon request. This ensures that the column names are always correct.
The downside though is that the column names returned by the BioMart service
are not the attribute name but its description, so instead of a column name
'affy_hg_u95av2' we get 'Affy HG U95AV2 probeset'. To keep the column naming
as it used to be, I then would map the attribute description back to the
attribute name and use the corresponding attribute name as column name for
the query result. This worked until I discovered that the attribute
descriptions are not unique, so there is no one-to-one mapping from a
description to an attribute name, and this made the getBM code crash. I then
decided that the best thing to do is to use, by default, the headers provided
by the BioMart service to ensure queries never crash due to problems on the R
side. And to enable attribute naming as it originally was done I added the
bmHeader=FALSE option. This will be correct in most uses except for queries
across multiple datasets.
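
The header-to-name mapping described above might look roughly like the sketch below. This is not the biomaRt implementation. `headers_to_names()` is a hypothetical helper, `attrs` is a toy stand-in for the output of listAttributes(mart), and the `attr_a` / `attr_b` rows with a shared description are invented to show the ambiguous case.

```r
# Hypothetical helper: rename description-style headers to attribute names,
# but only when every header identifies exactly one attribute. Otherwise the
# BioMart headers are kept and a warning is issued instead of failing.
headers_to_names <- function(df, attrs) {
    hits <- lapply(colnames(df),
                   function(h) attrs$name[attrs$description == h])
    n_hits <- vapply(hits, length, integer(1))
    if (all(n_hits == 1L)) {
        colnames(df) <- unlist(hits)
    } else {
        warning("ambiguous or unknown headers; keeping BioMart descriptions")
    }
    df
}

# Toy stand-in for listAttributes(mart); the last two rows are invented.
attrs <- data.frame(
    name = c("affy_hg_u95av2", "hgnc_symbol", "attr_a", "attr_b"),
    description = c("Affy HG U95AV2 probeset", "HGNC symbol",
                    "Shared description", "Shared description"),
    stringsAsFactors = FALSE
)

# A result as it would come back with description headers
res <- data.frame("1939_at", "TP53", stringsAsFactors = FALSE)
colnames(res) <- c("Affy HG U95AV2 probeset", "HGNC symbol")
colnames(headers_to_names(res, attrs))

# A header that matches two attributes cannot be mapped back
amb <- data.frame("x", stringsAsFactors = FALSE)
colnames(amb) <- "Shared description"
colnames(suppressWarnings(headers_to_names(amb, attrs)))
```

The first result gets the column names 'affy_hg_u95av2' and 'hgnc_symbol'. The second keeps 'Shared description' because the header matches more than one attribute.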
Best,
Steffen
On Fri, Jun 7, 2013 at 5:31 PM, Martin Morgan <mtmor...@fhcrc.org> wrote:
Hi Steffen --
getBM now returns the 'description' rather than 'name' of biomaRt columns,
e.g.,
    library(biomaRt)
    mart <- useMart("ensembl")
    datasets <- listDatasets(mart)
    mart <- useDataset("hsapiens_gene_ensembl", mart)
    df <- getBM(attributes=c("affy_hg_u95av2", "hgnc_symbol",
                             "chromosome_name", "band"),
                filters="affy_hg_u95av2",
                values=c("1939_at", "1503_at", "1454_at"),
                mart=mart)
returns
    > df ## devel
      Affy HG U95AV2 probeset HGNC symbol Chromosome Name   Band
    1                 1939_at        TP53              17  p13.1
    2                 1503_at       BRCA2              13  q13.1
    3                 1454_at       SMAD3              15 q22.33
rather than
    > df ## release
      affy_hg_u95av2 hgnc_symbol chromosome_name   band
    1        1939_at        TP53              17  p13.1
    2        1503_at       BRCA2              13  q13.1
    3        1454_at       SMAD3              15 q22.33
This makes it difficult to access columns via df$... (breaking code in at
least a couple of packages) and it is a little confusing to ask for
'affy_hg_u95av2' but get 'Affy HG U95AV2 probeset'. I wonder if the original
behaviour could be offered, either as an option or as a similarly named
function, or (my preference) the new behavior could be provided by something
like getBiomart() -- fancy function name for fancy column names?
Martin
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel