[ https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923414#comment-17923414 ]
Tim Allison commented on TIKA-4375:
-----------------------------------

I did notice several handfuls of documents that are no longer detected as comma-delimited (273 files) or tab-delimited (944 files); they are now detected as just plain text files. I took a look at these two:

{{commoncrawl3/HZ/HZ57NY54I7QJIK5CB7U7TAEVAJRR6R2N}}
{{commoncrawl3/SA/SAUXGJWDMQ2YMUW4XC7O366UAPVTCYLL}}

The problem is that they start with a header line that contains no commas, followed by the actual CSV content. This also happens in 3.1.0. I'm not sure what the best way to handle this is, but I don't think it is a showstopper.

> Regression tests for 2.9.3 release
> ----------------------------------
>
>                 Key: TIKA-4375
>                 URL: https://issues.apache.org/jira/browse/TIKA-4375
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: 43R5U3BXJUDJXDZ25OAE33ZU47362WLV.zip,
> LTWA2JGVJGJ5RVKHTUX6SDS4NTL5UJVQ-p139.pdf, RYT4H6OCPKZPFG3YK5PGLETS6Q3SBUDV,
> reports-tika-2.9.3-rc1.tgz, tika-2.9.2-v-tika-2.9.3-reports.tgz
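To illustrate why a delimiter-free header line can push a file from "comma" to plain text, here is a minimal sketch, assuming a detector that requires every non-empty line to contain the same non-zero number of delimiters. This is a hypothetical heuristic written only for illustration, not Tika's actual CSV detection logic; the function `looks_delimited` and the sample data are made up.

```python
# Hypothetical strict delimiter-consistency check, for illustration only.
# It is NOT Tika's detection code, and it ignores quoting for simplicity.


def looks_delimited(text: str, delimiter: str, min_rows: int = 2) -> bool:
    """Return True if every non-empty line has the same, non-zero delimiter count."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < min_rows:
        return False
    counts = {line.count(delimiter) for line in lines}
    # Exactly one distinct count, and that count must not be zero.
    return len(counts) == 1 and 0 not in counts


# A well-formed CSV body: every line has two commas.
plain_csv = "id,name,score\n1,alice,90\n2,bob,85\n"

# The same body preceded by a single header line with no commas.
with_preamble = "Exported scores for 2024\n" + plain_csv

print(looks_delimited(plain_csv, ","))      # True
print(looks_delimited(with_preamble, ","))  # False: the first line has zero commas
```

Because a check like this treats the whole file uniformly, a single leading line without the delimiter is enough to make an otherwise regular CSV body look like plain text.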