beamliu created TIKA-3714:
-----------------------------
Summary: Cannot retrieve a file whose path contains non-ASCII characters
Key: TIKA-3714
URL: https://issues.apache.org/jira/browse/TIKA-3714
Project: Tika
Issue Type: Bug
Components: server
Affects Versions: 2.3.0
Reporter: beamliu
Steps to reproduce:
Call the REST endpoint to detect the file's media type. The file exists in the file system.
{code:java}
curl --verbose -X PUT http://localhost:9998/detect/stream \
  -H "fetcherName: minio-data" \
  -H "fetchKey: 中文.docx"
{code}
However, the fetchKey header is not processed correctly: its value does not
reach the server intact, which leads to a FileNotFound exception.
According to the HTTP/1.1 RFC, non-US-ASCII characters cannot be sent in HTTP
headers, but the current tika-pipes mechanism
(https://cwiki.apache.org/confluence/display/TIKA/tika-pipes#FileSystemEmitter)
uses HTTP headers to carry the file path. File paths very commonly contain
non-ASCII characters.
Suggestion: support HTTP parameters for fetcherName and fetchKey. HTTP
parameters can carry non-ASCII characters correctly when percent-encoded.
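
To illustrate the suggestion, here is a minimal client-side sketch in Python of how the two values might be sent as percent-encoded query parameters. It assumes the server would accept fetcherName and fetchKey as query parameters, which is the change proposed in this issue and not existing behaviour. The helper names build_detect_url and is_ascii_safe are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://localhost:9998/detect/stream"  # endpoint from the report


def build_detect_url(fetcher_name: str, fetch_key: str) -> str:
    """Return the endpoint URL with both values as query parameters.

    Assumption: the server reads fetcherName/fetchKey from the query
    string. That is the proposal in this issue, not current behaviour.
    """
    # urlencode percent-encodes non-ASCII characters as UTF-8 bytes.
    query = urlencode({"fetcherName": fetcher_name, "fetchKey": fetch_key})
    return f"{BASE_URL}?{query}"


def is_ascii_safe(value: str) -> bool:
    """Return True if the value can be written as plain US-ASCII."""
    try:
        value.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


url = build_detect_url("minio-data", "中文.docx")

# The raw key is not ASCII, but the encoded URL is.
print(is_ascii_safe("中文.docx"))  # False
print(is_ascii_safe(url))          # True
print(url)

# Decoding the query string recovers the original key unchanged.
decoded_key = parse_qs(urlsplit(url).query)["fetchKey"][0]
print(decoded_key == "中文.docx")  # True
```

The encoded URL contains only US-ASCII characters, so it passes through HTTP tooling unchanged, and the server can decode the query string as UTF-8 to recover the original path.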