On 06/15/2013 07:46 AM, Steffen Durinck wrote:
Hi Martin,
I see that this change has caused confusion and forced people to change their
code, so I've changed the default value of bmHeader to FALSE (biomaRt 2.17.2).
Column naming will be as it used to be.
When a query fails, or when one is in doubt about the column naming, this
parameter can be set to TRUE to make the query work and get the header as given
by the BioMart server.
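A minimal sketch of that workflow, assuming a failed query surfaces as an ordinary R error: try the name-based columns first and retry with the server header only on failure. The helper get_bm_with_fallback() and its query argument are hypothetical and not part of biomaRt; only getBM() and its bmHeader argument come from the package.

```r
# Hypothetical helper, not part of biomaRt: query with name-based columns
# (bmHeader = FALSE) and retry with the server-provided header if that fails.
# 'query' defaults to biomaRt::getBM and is only looked up when actually used,
# so a stand-in function can be passed for offline experiments.
get_bm_with_fallback <- function(..., query = biomaRt::getBM) {
  tryCatch(
    query(..., bmHeader = FALSE),
    error = function(e) {
      message("Retrying with bmHeader = TRUE: ", conditionMessage(e))
      query(..., bmHeader = TRUE)
    }
  )
}

# Intended use (needs a network connection and a mart object):
# df <- get_bm_with_fallback(attributes = c("affy_hg_u95av2", "hgnc_symbol"),
#                            filters = "affy_hg_u95av2",
#                            values = "1939_at", mart = mart)
```

With the fallback path the columns carry the BioMart descriptions rather than the attribute names, so downstream code should not assume either form.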
Thanks Steffen, this seems (to me!) like a good compromise. Martin
Cheers,
Steffen
On Tue, Jun 11, 2013 at 5:43 PM, Martin Morgan <mtmor...@fhcrc.org> wrote:
On 06/07/2013 09:39 PM, Steffen Durinck wrote:
Hi Martin,
The original behaviour is offered through bmHeader = FALSE in the getBM query.
Below is the long story why this change came about (it would be good to hear
which solution is preferred by others):
Hi Steffen -- thanks for the response. I saw the bmHeader flag but the
documentation made it sound like something I'd use if the request failed
TRUE. This should only be switched off if the default
behavior results in errors, setting to off might still be
able to retrieve your data in that case
but from the description below it sounds like it is appropriate and safe for
within-database queries when listAttributes() shows that there is a
one-to-one relationship between the 'name' attributes used in the query and
the corresponding 'description' of the attributes.
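One way to make that check concrete is sketched below. It assumes a table shaped like the result of listAttributes(), with 'name' and 'description' columns; the table here is a toy stand-in with made-up rows (dup_a and dup_b are invented), and is_one_to_one() is a hypothetical helper, not a biomaRt function.

```r
# Toy stand-in for listAttributes(mart); rows are illustrative only.
attrs <- data.frame(
  name = c("affy_hg_u95av2", "hgnc_symbol", "dup_a", "dup_b"),
  description = c("Affy HG U95AV2 probeset", "HGNC symbol",
                  "Same label", "Same label"),
  stringsAsFactors = FALSE
)

# TRUE when every requested name is present in the table and its description
# is not shared with any other attribute.
is_one_to_one <- function(requested, attr_table) {
  shared <- attr_table$description[duplicated(attr_table$description)]
  rows <- attr_table[attr_table$name %in% requested, ]
  all(requested %in% attr_table$name) && !any(rows$description %in% shared)
}

is_one_to_one(c("affy_hg_u95av2", "hgnc_symbol"), attrs)
is_one_to_one(c("hgnc_symbol", "dup_a"), attrs)
```

The first call covers attributes with unique descriptions; the second includes an attribute whose description is shared, which is the situation described as unsafe.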
Martin
In most cases getBM returns the result in the order of the attributes in the
input query. So what getBM used to do is make the attributes vector the column
names of the query result. This return order is, however, not preserved in
instances where one does a query over multiple datasets, e.g. mouse and human.
In that case one cannot predict the order of the result, and this would make
the column names not match the actually returned fields. So there was a push
for getBM to use the header information provided by the BioMart service, which
is available upon request. This ensures that the column names are always
correct. The downside, though, is that the column names returned by the BioMart
service are not the attribute name but its description, so instead of a column
name 'affy_hg_u95av2' we get 'Affy HG U95AV2 probeset'. To keep the column
naming as it used to be, I would then map the attribute description back to the
attribute name and use the corresponding attribute name as the column name for
the query result. This worked until I discovered that the attribute
descriptions are not unique, so there is no one-to-one mapping from a
description to an attribute name, and this made the getBM code crash. I then
decided that the best thing to do is to use, by default, the headers provided
by the BioMart service to ensure queries never crash due to problems on the R
side. And to enable attribute naming as it originally was done I added the
bmHeader=FALSE option. This will be correct in most uses except for queries
across multiple datasets.
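The description-to-name mapping described above can be illustrated with a small base R sketch. This is not the biomaRt implementation: rename_from_header() is a hypothetical helper, the attribute table is a toy stand-in for listAttributes() output, and restricting the lookup to the requested attributes is an assumption made here to keep the example small.

```r
# Toy stand-in for the relevant rows of listAttributes(mart).
attr_table <- data.frame(
  name = c("hgnc_symbol", "chromosome_name"),
  description = c("HGNC symbol", "Chromosome Name"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rename description-style columns back to attribute
# names, looking descriptions up among the requested attributes only, and
# stopping with a clear message when the mapping is ambiguous or incomplete.
rename_from_header <- function(result, requested, attr_table) {
  lookup <- attr_table[attr_table$name %in% requested, ]
  if (anyDuplicated(lookup$description)) {
    stop("ambiguous descriptions among requested attributes")
  }
  idx <- match(names(result), lookup$description)
  if (anyNA(idx)) stop("header contains an unknown description")
  names(result) <- lookup$name[idx]
  result
}

# A result whose columns come back in a different order than requested.
result <- data.frame("Chromosome Name" = "17", "HGNC symbol" = "TP53",
                     check.names = FALSE, stringsAsFactors = FALSE)
renamed <- rename_from_header(result, c("hgnc_symbol", "chromosome_name"),
                              attr_table)
names(renamed)
```

Because the names are matched by description rather than by position, the columns are labelled correctly even though the return order differs from the request order.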
Best,
Steffen
On Fri, Jun 7, 2013 at 5:31 PM, Martin Morgan <mtmor...@fhcrc.org> wrote:
Hi Steffen --
getBM now returns the 'description' rather than 'name' of biomaRt
columns, e.g.,
library(biomaRt)
mart <- useMart("ensembl")
datasets <- listDatasets(mart)
mart <- useDataset("hsapiens_gene_ensembl", mart)
df <- getBM(attributes=c("affy_hg_u95av2", "hgnc_symbol",
                         "chromosome_name", "band"),
            filters="affy_hg_u95av2",
            values=c("1939_at", "1503_at", "1454_at"),
            mart=mart)
returns
> df ## devel
  Affy HG U95AV2 probeset HGNC symbol Chromosome Name   Band
1                 1939_at        TP53              17  p13.1
2                 1503_at       BRCA2              13  q13.1
3                 1454_at       SMAD3              15 q22.33
rather than
> df ## release
  affy_hg_u95av2 hgnc_symbol chromosome_name   band
1        1939_at        TP53              17  p13.1
2        1503_at       BRCA2              13  q13.1
3        1454_at       SMAD3              15 q22.33
This makes it difficult to access columns via df$... (breaking code in at
least a couple of packages) and it is a little confusing to ask for
'affy_hg_u95av2' but get 'Affy HG U95AV2 probeset'. I wonder if the original
behaviour could be offered, either as an option or as a similarly named
function, or (my preference) the new behavior could be provided by something
like getBiomart() -- fancy function name for fancy column names?
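To make the df$... difficulty concrete, here is a small base R sketch that rebuilds two columns of the devel result shown above by hand (no BioMart query is made). The positional rename at the end assumes the column order is known, which holds only for the single-dataset case.

```r
# Hand-built copy of two columns from the devel output above.
df <- data.frame(
  "Affy HG U95AV2 probeset" = c("1939_at", "1503_at", "1454_at"),
  "HGNC symbol" = c("TP53", "BRCA2", "SMAD3"),
  check.names = FALSE, stringsAsFactors = FALSE
)

df$affy_hg_u95av2                 # NULL: the old column name no longer exists
df[["Affy HG U95AV2 probeset"]]   # names with spaces work with [[ ]]
df$`HGNC symbol`                  # or with backticks after $

# Restoring the attribute names by position (assumes the order is known).
renamed <- setNames(df, c("affy_hg_u95av2", "hgnc_symbol"))
renamed$hgnc_symbol
```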
Martin
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel