[ https://issues.apache.org/jira/browse/TIKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047095#comment-14047095 ]
Tilman Hausherr commented on TIKA-1300:
---------------------------------------

{quote}
Make sure to delete handful of infected files
{quote}
I hope that current antivirus sw detects these files. Is this on purpose from that "digitalcorpora" site, or were these (government) files already infected at the time they were collected?

> Switch default PDFBox parser to NonSequentialParser
> ---------------------------------------------------
>
>                 Key: TIKA-1300
>                 URL: https://issues.apache.org/jira/browse/TIKA-1300
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Minor
>             Fix For: 1.7
>
>         Attachments: tika_1_6_ClassicsVsNonSeq.zip
>
>
> On TIKA-1298, [~tilman] recommended switching Tika's default to the
> NonSequentialParser. We added a parameter to use the NonSequentialParser in
> TIKA-1201, and there's some good discussion there about the benefits.
> Is the community in favor of switching the default now?

--
This message was sent by Atlassian JIRA
(v6.2#6252)