+1...

Cheers,
Chris

On 12/20/12 4:23 AM, "Michael McCandless" <luc...@mikemccandless.com> wrote:

>Hi Oleg,
>
>UIMA could be useful for extracting text from XML (I'm not familiar
>enough with it...), but I think we should still fix Tika's own XML
>extraction.
>
>Mike McCandless
>
>http://blog.mikemccandless.com
>
>On Thu, Dec 20, 2012 at 6:14 AM, Oleg Tikhonov <o...@apache.org> wrote:
>> Hi Mike,
>>
>> Maybe consider using UIMA ("the rule engine")?
>>
>> BR,
>> Oleg
>>
>> On Thu, Dec 20, 2012 at 1:05 PM, Michael McCandless (JIRA)
>> <j...@apache.org> wrote:
>>
>>> [https://issues.apache.org/jira/browse/TIKA-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>>>
>>> Michael McCandless updated TIKA-1048:
>>> -------------------------------------
>>>
>>>     Attachment: TIKA-1048.patch
>>>
>>> Patch w/ failing test ... I'm not sure where/how to best fix this yet ...
>>>
>>> > XMLParser should add whitespace between elements
>>> > ------------------------------------------------
>>> >
>>> >                 Key: TIKA-1048
>>> >                 URL: https://issues.apache.org/jira/browse/TIKA-1048
>>> >             Project: Tika
>>> >          Issue Type: Bug
>>> >          Components: parser
>>> >            Reporter: Michael McCandless
>>> >             Fix For: 1.3
>>> >
>>> >         Attachments: TIKA-1048.patch
>>> >
>>> >
>>> > If the incoming XML is compact (i.e., doesn't have whitespace between
>>> > elements), I think we should somehow add whitespace between elements
>>> > when extracting text?
>>>
>>> --
>>> This message is automatically generated by JIRA.
>>> If you think it was sent incorrectly, please contact your JIRA
>>> administrators
>>> For more information on JIRA, see: http://www.atlassian.com/software/jira
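
To illustrate the behaviour described in TIKA-1048, here is a minimal sketch using the JDK's plain SAX API rather than Tika's XMLParser. The names WhitespaceBetweenElements, TextCollector and extractText are made up for this example. It is not the attached patch and not necessarily how Tika should fix it. It only shows how text from compact XML runs together, and one possible way to pad element boundaries.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of Tika.
public class WhitespaceBetweenElements {

    /** Collects character data and optionally pads element boundaries with one space. */
    static final class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private final boolean padElements;

        TextCollector(boolean padElements) {
            this.padElements = padElements;
        }

        // Append a single space unless the buffer is empty or already ends in whitespace.
        private void pad() {
            if (padElements && text.length() > 0
                    && !Character.isWhitespace(text.charAt(text.length() - 1))) {
                text.append(' ');
            }
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            pad();
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            pad();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        String text() {
            return text.toString().trim();
        }
    }

    /** Parses the XML string and returns its text content. */
    static String extractText(String xml, boolean padElements) {
        try {
            TextCollector collector = new TextCollector(padElements);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), collector);
            return collector.text();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String compact = "<doc><title>Hello</title><body>world</body></doc>";
        System.out.println(extractText(compact, false)); // words run together
        System.out.println(extractText(compact, true));  // words separated
    }
}
```

One caveat with this approach: padding every element boundary also splits words that are broken up by inline markup (for example, an element wrapping part of a word), so a real fix would probably need to be more selective.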