Ronan lanore created TIKA-2301:
----------------------------------

             Summary: Tika add char to txt file when parsing with '-J -t' options
                 Key: TIKA-2301
                 URL: https://issues.apache.org/jira/browse/TIKA-2301
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.14, 1.11
            Reporter: Ronan lanore
            Priority: Minor


Create a file with a text editor, with the content:

{code}
Tika txt content
{code}

The file has no trailing newline.
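
Many editors append a final newline on save, so one way to be sure the test file has none is to write the bytes directly. A minimal Python sketch; the helper name `write_without_trailing_newline` and the output location in the current directory are illustrative and not part of the original report:

```python
from pathlib import Path


def write_without_trailing_newline(path: Path, text: str = "Tika txt content") -> int:
    """Write text as raw bytes with no final newline; return the byte count."""
    data = text.encode("ascii")
    path.write_bytes(data)  # write_bytes adds nothing after the payload
    return len(data)


if __name__ == "__main__":
    size = write_without_trailing_newline(Path("tika-content.txt"))
    print(size)
```

Comparing the printed size with the "Content-Length" that Tika reports is a quick check of whether the parsed file really lacked a final newline.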

Parse it with

{code}
java -jar tika-app-1.14.jar -t ~/Documents/git/system/nodejs/searchEs/test/ressources/txtFiles/tika-content.txt
{code}

The result has two "\n" at the end of the file. Why?

Parse it with '-J' and '-t':

{code}
java -jar tika-app-1.14.jar -J -t ~/Documents/git/system/nodejs/searchEs/test/ressources/txtFiles/tika-content.txt
{code}

Result: 

{code}
[{"Content-Encoding":"ISO-8859-1","Content-Length":"17","Content-Type":"text/plain; charset\u003dISO-8859-1","X-Parsed-By":["org.apache.tika.parser.DefaultParser","org.apache.tika.parser.txt.TXTParser"],"X-TIKA:content":"\n\n\n\n\n\n\n\n\n\nTika txt content\n\n","X-TIKA:parse_time_millis":"64","resourceName":"tika-content.txt"}]
{code}

A lot of '\n' characters are added at the beginning of "X-TIKA:content".
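
To quantify the padding, the JSON output can be parsed and the newlines at each end of "X-TIKA:content" counted. A minimal Python sketch; `newline_padding` is a hypothetical helper name, and the sample string is a shortened copy of the output above that keeps only two fields:

```python
import json


def newline_padding(rmeta_json: str) -> tuple:
    """Return (leading, trailing) newline counts for the first document's content."""
    content = json.loads(rmeta_json)[0]["X-TIKA:content"]
    leading = len(content) - len(content.lstrip("\n"))
    trailing = len(content) - len(content.rstrip("\n"))
    return leading, trailing


# Raw string so that \n stays a JSON escape until json.loads decodes it.
sample = r'[{"X-TIKA:content":"\n\n\n\n\n\n\n\n\n\nTika txt content\n\n","resourceName":"tika-content.txt"}]'
print(newline_padding(sample))
```

Calling `content.strip("\n")` on the client side would hide the symptom, but it is only a workaround and does not explain where the extra characters come from.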

The same happens with tika-server when using the "/rmeta/text" path.





