On 06/15/2013 07:46 AM, Steffen Durinck wrote:
Hi Martin,

I see this change leads to confusion and forces people to change their code, so
I've changed (biomaRt 2.17.2) the default value of bmHeader to FALSE; column
naming will be as it used to be.
When a query fails, or one is in doubt about the column naming, this parameter
can be set to TRUE to make the query work and get the header as given by the
BioMart server.

Thanks Steffen, this seems (to me!) like a good compromise. Martin


Cheers,
Steffen


On Tue, Jun 11, 2013 at 5:43 PM, Martin Morgan <mtmor...@fhcrc.org> wrote:

    On 06/07/2013 09:39 PM, Steffen Durinck wrote:

        Hi Martin,

        The original behaviour is offered through bmHeader = FALSE in the getBM
        query.
        Below is the long story of why this change came about (it would be good
        to hear which solution is preferred by others):


    Hi Steffen -- thanks for the response. I saw the bmHeader flag but the
    documentation made it sound like something I'd use if the request failed


               TRUE.  This should only be switched off if the default
               behavior results in errors, setting to off might still be
               able to retrieve your data in that case

    but from the description below it sounds like it is appropriate and safe for
    within-database queries when listAttributes() shows that there is a
    one-to-one relationship between the 'name' attributes used in the query and
    the corresponding 'description' of the attributes.

    Martin
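
The check described above can be sketched in a few lines of base R. This is only an illustration: `has_unique_descriptions` is a hypothetical helper (not part of biomaRt), and `attrs` is a made-up stand-in for the data frame returned by listAttributes(), assumed to have 'name' and 'description' columns.

```r
# Made-up stand-in for listAttributes(mart); the last two rows are invented
# to show a description shared by two attribute names.
attrs <- data.frame(
  name = c("affy_hg_u95av2", "hgnc_symbol", "chromosome_name", "band",
           "example_attr_a", "example_attr_b"),
  description = c("Affy HG U95AV2 probeset", "HGNC symbol", "Chromosome Name",
                  "Band", "Example description", "Example description"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when every requested attribute is known and its
# description is not shared with any other attribute in the table.
has_unique_descriptions <- function(attributes, attr_table) {
  idx <- match(attributes, attr_table$name)
  if (any(is.na(idx))) return(FALSE)
  descr <- attr_table$description[idx]
  counts <- table(attr_table$description)
  all(counts[descr] == 1)
}

has_unique_descriptions(c("affy_hg_u95av2", "hgnc_symbol"), attrs)
has_unique_descriptions(c("hgnc_symbol", "example_attr_a"), attrs)
```

With a real mart one would presumably pass listAttributes(mart) in place of `attrs`; whether a one-to-one mapping is sufficient for a given query is the question raised in the message above, not something this sketch settles.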


        In most cases getBM returns the result in the order of the attributes in
        the input query.  So what getBM used to do was make the attributes vector
        the column names of the query result.  This return order is, however, not
        preserved when one queries across multiple datasets, e.g. mouse and
        human.  In that case one cannot predict the order of the result, and the
        column names would not match the fields actually returned.  So there was
        a push for getBM to use the header information provided by the BioMart
        service, which is available upon request.  This ensures that the column
        names are always correct.

        The downside, though, is that the column names returned by the BioMart
        service are not the attribute names but their descriptions, so instead of
        a column name 'affy_hg_u95av2' we get 'Affy HG U95AV2 probeset'.  To keep
        the column naming as it used to be, I would then map the attribute
        description back to the attribute name and use the corresponding
        attribute name as the column name for the query result.  This worked
        until I discovered that the attribute descriptions are not unique, so
        there is no one-to-one mapping from a description to an attribute name,
        and this made the getBM code crash.  I then decided that the best thing
        to do is to use, by default, the headers provided by the BioMart service
        to ensure queries never crash due to problems on the R side.

        And to enable attribute naming as it originally was done, I added the
        bmHeader=FALSE option.  This will be correct in most uses except for
        queries across multiple datasets.

        Best,
        Steffen
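
To make the description-to-name mapping problem concrete, here is a minimal sketch in base R. It is not the biomaRt implementation: `names_from_header` is a hypothetical helper, and `attrs` is a made-up attribute table with 'name' and 'description' columns. Where a description matches exactly one attribute name it is mapped back; where it is ambiguous or unknown, the description is kept rather than raising an error.

```r
# Made-up attribute table; the last two rows share one description.
attrs <- data.frame(
  name = c("affy_hg_u95av2", "hgnc_symbol",
           "example_attr_a", "example_attr_b"),
  description = c("Affy HG U95AV2 probeset", "HGNC symbol",
                  "Example description", "Example description"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate header descriptions to attribute names,
# falling back to the description when the mapping is not one-to-one.
names_from_header <- function(header, attr_table) {
  vapply(header, function(h) {
    hits <- as.character(attr_table$name[attr_table$description == h])
    if (length(hits) == 1L) hits else h
  }, character(1), USE.NAMES = FALSE)
}

names_from_header(c("Affy HG U95AV2 probeset", "HGNC symbol",
                    "Example description"), attrs)
```

Falling back to the description is only one possible design; it avoids a crash on ambiguous headers at the cost of mixed naming styles in the result.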



        On Fri, Jun 7, 2013 at 5:31 PM, Martin Morgan <mtmor...@fhcrc.org> wrote:

             Hi Steffen --

             getBM now returns the 'description' rather than 'name' of biomaRt
             columns, e.g.,

                   mart <- useMart("ensembl")
                   datasets <- listDatasets(mart)
                   mart <- useDataset("hsapiens_gene_ensembl", mart)
                   df <- getBM(attributes=c("affy_hg_u95av2", "hgnc_symbol",
                                            "chromosome_name", "band"),
                               filters="affy_hg_u95av2",
                               values=c("1939_at", "1503_at", "1454_at"),
                               mart=mart)

             returns

              > df ## devel
                Affy HG U95AV2 probeset HGNC symbol Chromosome Name   Band
             1                 1939_at        TP53              17  p13.1
             2                 1503_at       BRCA2              13  q13.1
             3                 1454_at       SMAD3              15 q22.33

             rather than

              > df  ## release
                affy_hg_u95av2 hgnc_symbol chromosome_name   band
             1        1939_at        TP53              17  p13.1
             2        1503_at       BRCA2              13  q13.1
             3        1454_at       SMAD3              15 q22.33

             This makes it difficult to access columns via df$... (breaking code
             in at least a couple of packages) and it is a little confusing to
             ask for 'affy_hg_u95av2' but get 'Affy HG U95AV2 probeset'. I wonder
             if the original behaviour could be offered, either as an option or
             as a similarly named function, or (my preference) the new behavior
             could be provided by something like getBiomart() -- fancy function
             name for fancy column names?
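
For code that relies on df$<attribute name>, one defensive option is to rename the result columns positionally after the query. The sketch below uses only base R and a mock of the 'devel' result shown above. `restore_attribute_names` is a hypothetical helper, not a biomaRt function, and it assumes a single-dataset query whose columns come back in the order the attributes were requested.

```r
# Hypothetical helper: assign the requested attribute names to the result
# columns by position, refusing to guess when the column count differs.
restore_attribute_names <- function(df, attributes) {
  if (ncol(df) != length(attributes)) {
    stop("result has ", ncol(df), " columns but ", length(attributes),
         " attributes were requested")
  }
  names(df) <- attributes
  df
}

# Mock of the 'devel' result (values copied from the output above).
df <- data.frame(
  "Affy HG U95AV2 probeset" = c("1939_at", "1503_at", "1454_at"),
  "HGNC symbol" = c("TP53", "BRCA2", "SMAD3"),
  "Chromosome Name" = c(17, 13, 15),
  "Band" = c("p13.1", "q13.1", "q22.33"),
  check.names = FALSE, stringsAsFactors = FALSE
)

df <- restore_attribute_names(
  df, c("affy_hg_u95av2", "hgnc_symbol", "chromosome_name", "band"))
df$hgnc_symbol
```

The positional assumption is exactly what can fail for queries spanning several datasets, so this is a workaround for simple queries rather than a general fix.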

             Martin
             --
             Computational Biology / Fred Hutchinson Cancer Research Center
             1100 Fairview Ave. N.
             PO Box 19024 Seattle, WA 98109

             Location: Arnold Building M1 B861
             Phone: (206) 667-2793









_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
