Re: [jira] [Updated] (TIKA-1048) XMLParser should add whitespace between elements

Mattmann, Chris A (388J) Thu, 20 Dec 2012 12:52:36 -0800

+1...

Cheers,
Chris


On 12/20/12 4:23 AM, "Michael McCandless" <[email protected]>
wrote:

>Hi Oleg,
>
>UIMA could be useful for extracting text from XML (I'm not familiar
>enough with it...), but I think we should still fix Tika's own XML
>extraction.
>
>Mike McCandless
>
>http://blog.mikemccandless.com
>
>On Thu, Dec 20, 2012 at 6:14 AM, Oleg Tikhonov <[email protected]> wrote:
>> Hi Mike,
>>
>> Maybe consider using UIMA ("the rule engine")?
>>
>> BR,
>> Oleg
>>
>>
>>
>> On Thu, Dec 20, 2012 at 1:05 PM, Michael McCandless (JIRA)
>> <[email protected]> wrote:
>>
>>>
>>>      [ https://issues.apache.org/jira/browse/TIKA-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>>
>>> Michael McCandless updated TIKA-1048:
>>> -------------------------------------
>>>
>>>     Attachment: TIKA-1048.patch
>>>
>>> Patch w/ failing test ... I'm not sure where/how best to fix this yet ...
>>>
>>> > XMLParser should add whitespace between elements
>>> > ------------------------------------------------
>>> >
>>> >                 Key: TIKA-1048
>>> >                 URL: https://issues.apache.org/jira/browse/TIKA-1048
>>> >             Project: Tika
>>> >          Issue Type: Bug
>>> >          Components: parser
>>> >            Reporter: Michael McCandless
>>> >             Fix For: 1.3
>>> >
>>> >         Attachments: TIKA-1048.patch
>>> >
>>> >
>>> > If the incoming XML is compact (i.e. doesn't have whitespace between
>>> > elements), I think we should somehow add whitespace between elements
>>> > when extracting text?
>>>
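To make the problem concrete: with compact input such as <doc><a>hello</a><b>world</b></doc>, naively concatenating the SAX character data yields "helloworld". The following is a minimal sketch of one possible fix, using only the JDK's built-in SAX parser. It is not the TIKA-1048 patch and not Tika's actual XMLParser code; the class WhitespaceTextExtractor and its methods are hypothetical names for illustration. It appends a single space at each element boundary unless the extracted text already ends in whitespace.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; not part of Tika's API.
public class WhitespaceTextExtractor {

    /** Collects character data, inserting one space at element boundaries. */
    static class SpacingHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        // Add a space only if the text so far doesn't already end in
        // whitespace, so indented XML doesn't get doubled separators.
        private void separate() {
            int n = text.length();
            if (n > 0 && !Character.isWhitespace(text.charAt(n - 1))) {
                text.append(' ');
            }
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            separate();
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            separate();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver one text node in several chunks; just append them.
            text.append(ch, start, length);
        }

        String getText() {
            return text.toString().trim();
        }
    }

    /** Parses the XML string and returns its text with element boundaries spaced. */
    public static String extract(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            SpacingHandler handler = new SpacingHandler();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.getText();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extract("<doc><a>hello</a><b>world</b></doc>")); // prints "hello world"
    }
}
```

One trade-off of this approach: it also splits inline markup, so <p>wo<b>rl</b>d</p> becomes "wo rl d". A more careful fix might add whitespace only around elements that are known or assumed to be block-level.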
