Hadley,

It’s sometimes amazing the mistakes I can make. No, it did not do what I 
wanted, which was
read_xml(str_c(with_ns_xml, collapse = ""))
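
For anyone following along, a minimal sketch of why the separator matters. Base paste() stands in for str_c() here so the sketch does not depend on a particular stringr version, and the "TRUE" separator is an assumption that mimics collapse = TRUE being coerced to a string; it is not taken from the stringr documentation.

```r
library(xml2)

with_ns_xml <- c("<?xml version=\"1.0\" ?>",
                 "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
                 "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
                 "</WorkSet>")

# collapse is the separator placed between elements, not an on/off switch
good <- paste(with_ns_xml, collapse = "")
# Assumption: this mimics a TRUE that has been coerced to the string "TRUE"
bad <- paste(with_ns_xml, collapse = "TRUE")

# In 'bad' the text TRUE sits between the XML declaration and the root element
substr(bad, 1, 40)

doc <- read_xml(good)                                  # parses
res <- tryCatch(read_xml(bad), error = function(e) e)  # capture the parse error
inherits(res, "error")
```

With an empty separator the four lines become one well-formed document; with any other text as the separator, stray characters land before the root element and the parser gives up.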

Reproducible example follows:
library(stringr)
library(xml2)
## Given the correct argument value for collapse, the next two lines work
no_ns <- read_xml(str_c(no_ns_xml, collapse = ""))
with_ns <- read_xml(str_c(with_ns_xml, collapse = ""))
## The next line finds the node in the XML without a namespace
xml_find_all(no_ns, "//WorkSet//Description")
## With a namespace designated in the XML
## Neither of the next two work, though I thought the second should
xml_find_all(with_ns, "//WorkSet//Description")
xml_find_all(with_ns, "/WorkSet//Description", ns = xml_ns(with_ns))
## Using xml_ns_strip() works as predicted
xml_find_all(xml_ns_strip(with_ns), "//WorkSet//Description")
## I was surprised to find the incorrect namespace value did not matter
xml_find_all(no_ns, "//WorkSet//Description", ns = xml_ns(with_ns))
## This also seems to ignore the namespace argument value
xml_find_all(xml_ns_strip(with_ns), "/WorkSet//Description", ns = 
xml_ns(with_ns))


Full output follows:
> ## Given the correct argument value for collapse, the next two lines work
> no_ns <- read_xml(str_c(no_ns_xml, collapse = ""))
> with_ns <- read_xml(str_c(with_ns_xml, collapse = ""))
> ## The next line finds the node in the XML without a namespace
> xml_find_all(no_ns, "//WorkSet//Description")
{xml_nodeset (1)}
[1] <Description>MFIA 9-Plex (CharlesRiver)</Description>
> ## With a namespace designated in the XML
> ## Neither of the next two work, though I thought the second should
> xml_find_all(with_ns, "//WorkSet//Description")
{xml_nodeset (0)}
> xml_find_all(with_ns, "/WorkSet//Description", ns = xml_ns(with_ns))
{xml_nodeset (0)}
> ## Using xml_ns_strip() works as predicted
> xml_find_all(xml_ns_strip(with_ns), "//WorkSet//Description")
{xml_nodeset (1)}
[1] <Description>MFIA 9-Plex (CharlesRiver)</Description>
> ## I was surprised to find the incorrect namespace value did not matter
> xml_find_all(no_ns, "//WorkSet//Description", ns = xml_ns(with_ns))
{xml_nodeset (1)}
[1] <Description>MFIA 9-Plex (CharlesRiver)</Description>
> ## This also seems to ignore the namespace argument value
> xml_find_all(xml_ns_strip(with_ns), "/WorkSet//Description", ns = xml_ns(with_ns))
{xml_nodeset (1)}
[1] <Description>MFIA 9-Plex (CharlesRiver)</Description>
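
A sketch of the prefixed form that the last example in ?xml2::xml_find_all points toward. It assumes that xml_ns() reports the default (unprefixed) namespace under a generated prefix, d1 in this case, and that each element name in the XPath has to carry that prefix. The ns argument is only consulted to resolve prefixes that appear in the path, which would explain why passing it alongside an unprefixed path changed nothing above.

```r
library(xml2)

with_ns_xml <- c("<?xml version=\"1.0\" ?>",
                 "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
                 "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
                 "</WorkSet>")
with_ns <- read_xml(paste(with_ns_xml, collapse = ""))

# List the namespaces in the document; the default namespace gets a
# generated prefix (expected to be d1 for this document)
ns <- xml_ns(with_ns)
names(ns)

# An unprefixed path looks for elements in no namespace and finds nothing
unprefixed <- xml_find_all(with_ns, "/WorkSet//Description", ns = ns)
length(unprefixed)

# Prefixing every element name with d1 matches the namespaced nodes
found <- xml_find_all(with_ns, "/d1:WorkSet//d1:Description", ns = ns)
xml_text(found)
```

This leaves the document untouched, whereas xml_ns_strip() removes the default namespace declarations from it.
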
R. Mark Sharp, Ph.D.
msh...@txbiomed.org





> On Jan 31, 2017, at 5:52 PM, Hadley Wickham <h.wick...@gmail.com> wrote:
>
> I think you want
>
> x <- read_xml('<?xml version="1.0" ?>
>  <WorkSet xmlns="http://labkey.org/etl/xml">
>  <Description>MFIA 9-Plex (CharlesRiver)</Description>
> </WorkSet>')
>
> The collapse argument doesn't do what you think it does.
>
> Hadley
>
> On Tue, Jan 31, 2017 at 5:36 PM, Mark Sharp <msh...@txbiomed.org> wrote:
>> Hadley,
>>
>> Thank you. I am able to get the xml_ns_strip() function to work with my file 
>> directly so I will likely be able to reach my immediate goal.
>>
>> However, I still have had no success with understanding the namespace 
>> problem. I am not able to use read_xml() using the object I generated for 
>> the reproducible example, which is simply a character vector of length 4 
>> having the contents of the XML file as produced by readLines(). I then used 
>> dput() to define the structure. The resulting structure apparently is not to 
>> the liking of read_xml(). I have reproduced the necessary code here for your 
>> convenience. The error is below.
>>
>> ##
>> library(xml2)
>> library(stringr)
>> with_ns_xml <- c("<?xml version=\"1.0\" ?>",
>>                 "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
>>                 "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
>>                 "</WorkSet>")
>> ## without str_c() collapse it also complains of a vector of length > 1.
>> read_xml(str_c(with_ns_xml, collapse = TRUE))
>> ## produces the following error message.
>> Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html 
>> = as_html,  :
>>  Start tag expected, '<' not found [4]
>>
>> I have similar issues with xml2::xml_find_all
>> xml_find_all(str_c(with_ns_xml, collapse = TRUE), "/WorkSet//Description")
>>
>> ## Produces the following error message.
>> Error in UseMethod("xml_find_all") :
>>  no applicable method for 'xml_find_all' applied to an object of class 
>> "character"
>>
>>
>>
>> R. Mark Sharp, Ph.D.
>> msh...@txbiomed.org
>>
>>
>>
>>
>>
>>> On Jan 31, 2017, at 4:27 PM, Hadley Wickham <h.wick...@gmail.com> wrote:
>>>
>>> See the last example in ?xml2::xml_find_all or use 
>>> xml2::xml_ns_strip()
>>>
>>> Hadley
>>>
>>> On Tue, Jan 31, 2017 at 9:43 AM, Mark Sharp <msh...@txbiomed.org> wrote:
>>>> I am trying to read a series of XML files that use a namespace and I have 
>>>> failed, thus far, to discover the proper syntax. I have a reproducible 
>>>> example below. I have two XML character strings defined: one without a 
>>>> namespace and one with. I show that I can successfully extract the node 
>>>> using the XML string without the namespace and fail when using the XML 
>>>> string with the namespace.
>>>>
>>>> Mark
>>>> PS I am having the same problem with the xml2 package and am hoping 
>>>> understanding one will help with the other.
>>>>
>>>> ##
>>>> library(XML)
>>>> ## The first XML text (no_ns_xml) does not have a namespace defined
>>>> no_ns_xml <- c("<?xml version=\"1.0\" ?>", "<WorkSet>",
>>>>              "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
>>>>              "</WorkSet>")
>>>> l_no_ns_xml <- xmlTreeParse(no_ns_xml, asText = TRUE, getDTD = FALSE,
>>>>                             useInternalNodes = TRUE)
>>>> ## The node is found
>>>> getNodeSet(l_no_ns_xml, "/WorkSet//Description")
>>>>
>>>> ## The second XML text (with_ns_xml) has a namespace defined
>>>> with_ns_xml <- c("<?xml version=\"1.0\" ?>",
>>>>                "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
>>>>                "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
>>>>                "</WorkSet>")
>>>>
>>>> l_with_ns_xml <- xmlTreeParse(with_ns_xml, asText = TRUE, getDTD = FALSE,
>>>>                               useInternalNodes = TRUE)
>>>> ## The node is not found
>>>> getNodeSet(l_with_ns_xml, "/WorkSet//Description")
>>>> ## I attempt to provide the namespace, but fail.
>>>> ns <- "http://labkey.org/etl/xml"
>>>> names(ns)[1] <- "xmlns"
>>>> getNodeSet(l_with_ns_xml, "/WorkSet//Description", namespaces = ns)
>>>>
>>>> R. Mark Sharp, Ph.D.
>>>> Director of Data Science Core
>>>> Southwest National Primate Research Center
>>>> Texas Biomedical Research Institute
>>>> P.O. Box 760549
>>>> San Antonio, TX 78245-0549
>>>> Telephone: (210)258-9476
>>>> e-mail: msh...@txbiomed.org
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> CONFIDENTIALITY NOTICE: This e-mail and any files and/or...{{dropped:10}}
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide 
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>>> --
>>> http://hadley.nz
>>
>> CONFIDENTIALITY NOTICE: This e-mail and any files and/or attachments 
>> transmitted, may contain privileged and confidential information and is 
>> intended solely for the exclusive use of the individual or entity to whom it 
>> is addressed. If you are not the intended recipient, you are hereby notified 
>> that any review, dissemination, distribution or copying of this e-mail 
>> and/or attachments is strictly prohibited. If you have received this e-mail 
>> in error, please immediately notify the sender stating that this 
>> transmission was misdirected; return the e-mail to sender; destroy all paper 
>> copies and delete all electronic copies from your system without disclosing 
>> its contents.
>
>
>
> --
> http://hadley.nz

