[ https://issues.apache.org/jira/browse/TIKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047095#comment-14047095 ]

Tilman Hausherr commented on TIKA-1300:
---------------------------------------

{quote}
Make sure to delete handful of infected files
{quote}
I hope that current antivirus software detects these files. Did the
"digitalcorpora" site include them on purpose, or were these (government) files
already infected at the time they were collected?

> Switch default PDFBox parser to NonSequentialParser
> ---------------------------------------------------
>
>                 Key: TIKA-1300
>                 URL: https://issues.apache.org/jira/browse/TIKA-1300
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Minor
>             Fix For: 1.7
>
>         Attachments: tika_1_6_ClassicsVsNonSeq.zip
>
>
> On TIKA-1298, [~tilman] recommended switching Tika's default to the 
> NonSequentialParser. We added a parameter to use the NonSequentialParser in 
> TIKA-1201, and there's some good discussion there about the benefits.
> Is the community in favor of switching the default now?



--
This message was sent by Atlassian JIRA
(v6.2#6252)
