Tilman Hausherr created TIKA-4278:
-------------------------------------

Summary: TextAndCSVParser doesn't detect semicolon separated file
Key: TIKA-4278
URL: https://issues.apache.org/jira/browse/TIKA-4278
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 2.9.2
Reporter: Tilman Hausherr
I ran the code from the attached Stack Overflow question, and it indeed doesn't detect semicolon-separated files. The reason is this line in {{TextAndCSVParser.java}}:
{code:java}
private static final char[] DEFAULT_DELIMITERS = new char[]{',', '\t'};
{code}
This array is later used by {{CSVSniffer}}. For some reason, the other delimiters (pipe, colon and semicolon) aren't in it, although they are in {{CHAR_TO_STRING_DELIMITER_MAP}}. I modified {{DEFAULT_DELIMITERS}}, and now semicolon-separated files are detected.

Can I fix this by adding the missing delimiters, or is there a reason I missed?
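To illustrate why the array matters: a sniffer can only ever pick a delimiter it is asked to try, so a semicolon file scored against {',', '\t'} yields no consistent split. The sketch below is a hypothetical, deliberately simplified sniffer, not Tika's {{CSVSniffer}} algorithm; the class {{DelimiterSniffSketch}}, its methods, and {{PROPOSED_DELIMITERS}} are illustrative names. {{PROPOSED_DELIMITERS}} assumes the fix simply appends the semicolon, pipe and colon already known to {{CHAR_TO_STRING_DELIMITER_MAP}}.

```java
import java.util.List;

/**
 * Hypothetical, simplified delimiter sniffer used only to illustrate TIKA-4278.
 * It is NOT Tika's CSVSniffer: it picks the candidate that splits every line
 * into the same number (> 1) of fields, preferring the most fields.
 */
class DelimiterSniffSketch {

    // Mirrors the DEFAULT_DELIMITERS array quoted in the issue.
    static final char[] CURRENT_DELIMITERS = {',', '\t'};

    // Assumed fix: also try the delimiters CHAR_TO_STRING_DELIMITER_MAP already knows.
    static final char[] PROPOSED_DELIMITERS = {',', '\t', ';', '|', ':'};

    /** Returns the best delimiter, or '\0' if no candidate gives a consistent column count. */
    static char sniff(List<String> lines, char[] candidates) {
        char best = '\0';
        int bestFields = 1; // a "split" into one field means the delimiter never occurs
        for (char c : candidates) {
            int fields = -1;
            boolean consistent = true;
            for (String line : lines) {
                int n = countFields(line, c);
                if (fields == -1) {
                    fields = n;
                } else if (n != fields) {
                    consistent = false;
                    break;
                }
            }
            // Strict '>' means earlier candidates win ties.
            if (consistent && fields > bestFields) {
                best = c;
                bestFields = fields;
            }
        }
        return best;
    }

    // Naive field count (no quote handling): occurrences of the delimiter + 1.
    static int countFields(String line, char delimiter) {
        int count = 1;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delimiter) {
                count++;
            }
        }
        return count;
    }

    static String describe(char c) {
        return c == '\0' ? "none detected" : "'" + c + "'";
    }

    public static void main(String[] args) {
        // Made-up sample rows in a semicolon-separated layout.
        List<String> semicolonFile = List.of(
                "name;city;zip",
                "Alice;Berlin;10115",
                "Bob;Paris;75001");
        System.out.println("current:  " + describe(sniff(semicolonFile, CURRENT_DELIMITERS)));
        System.out.println("proposed: " + describe(sniff(semicolonFile, PROPOSED_DELIMITERS)));
    }
}
```

Running {{main}} prints {{current:  none detected}} followed by {{proposed: ';'}}. Appending the new delimiters after ',' and '\t' keeps the existing candidates first, which matters in this sketch because ties go to the earlier entry; whether the real {{CSVSniffer}} is order-sensitive would need to be checked.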