[ 
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923414#comment-17923414
 ] 

Tim Allison commented on TIKA-4375:
-----------------------------------

I did notice a fair number of documents that are no longer detected as 
"comma" (273 files) or "tab" (944 files) delimited; they are now detected as 
just plain text files. I took a look at these two:

{{commoncrawl3/HZ/HZ57NY54I7QJIK5CB7U7TAEVAJRR6R2N}}
{{commoncrawl3/SA/SAUXGJWDMQ2YMUW4XC7O366UAPVTCYLL}}

The problem is that they have a header line without commas, followed by the 
actual csv content. This also happens in 3.1.0. I'm not sure what the best way 
to handle this is, but I don't think this is a showstopper.
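
To illustrate the failure mode described above, here is a minimal sketch in Python. It is not Tika's actual detection logic; it assumes a simplified heuristic in which a file counts as delimited only if every non-empty line contains the same nonzero number of delimiters. The function names (looks_delimited, looks_delimited_skipping_preamble) and the max_skip parameter are hypothetical, and the sample data is made up. Under that assumption, a single comma-free title line ahead of otherwise valid CSV is enough to make the file fall back to plain text, and tolerating a short preamble is one possible (untested) way to handle it.

```python
def looks_delimited(text, delimiter=",", min_rows=2):
    """Hypothetical heuristic: True if every non-empty line has the
    same nonzero number of delimiter characters."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < min_rows:
        return False
    counts = {ln.count(delimiter) for ln in lines}
    # Exactly one distinct count, and that count must be > 0.
    return len(counts) == 1 and counts.pop() > 0


def looks_delimited_skipping_preamble(text, delimiter=",", max_skip=1):
    """Hypothetical variant: retry the check after dropping up to
    max_skip leading lines, to tolerate a title/preamble line."""
    lines = text.splitlines()
    for skip in range(max_skip + 1):
        if looks_delimited("\n".join(lines[skip:]), delimiter):
            return True
    return False


# Made-up sample data: a clean CSV, and the same CSV with a title line.
clean = "id,name,score\n1,alice,90\n2,bob,85\n"
with_title = "Quarterly results\n" + clean

print(looks_delimited(clean))                         # strict check on clean CSV
print(looks_delimited(with_title))                    # strict check with title line
print(looks_delimited_skipping_preamble(with_title))  # tolerant check
```

The trade-off with skipping leading lines is that it raises the risk of false positives on ordinary text that happens to contain consistently punctuated lines, so any real fix would need to be checked against the regression corpus.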

> Regression tests for 2.9.3 release
> ----------------------------------
>
>                 Key: TIKA-4375
>                 URL: https://issues.apache.org/jira/browse/TIKA-4375
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: 43R5U3BXJUDJXDZ25OAE33ZU47362WLV.zip, 
> LTWA2JGVJGJ5RVKHTUX6SDS4NTL5UJVQ-p139.pdf, RYT4H6OCPKZPFG3YK5PGLETS6Q3SBUDV, 
> reports-tika-2.9.3-rc1.tgz, tika-2.9.2-v-tika-2.9.3-reports.tgz
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
