beamliu created TIKA-3714:
-----------------------------

             Summary: cannot retrieve file correctly which contains non ascii 
char in path
                 Key: TIKA-3714
                 URL: https://issues.apache.org/jira/browse/TIKA-3714
             Project: Tika
          Issue Type: Bug
          Components: server
    Affects Versions: 2.3.0
            Reporter: beamliu


Steps to reproduce:

Call the REST endpoint to detect the media type of a file that exists in the 
file system.
{code:java}
curl --verbose -X PUT http://localhost:9998/detect/stream \
  -H "fetcherName: minio-data" \
  -H "fetchKey: 中文.docx"
{code}
However, the fetchKey header is not processed correctly: the key does not 
reach the server intact, which leads to a FileNotFound exception.

According to the HTTP/1.1 RFC, header field values should be limited to 
US-ASCII characters, so non-ASCII characters cannot be sent reliably in HTTP 
headers. The current mechanism in tika-pipes 
(https://cwiki.apache.org/confluence/display/TIKA/tika-pipes#FileSystemEmitter) 
nevertheless uses HTTP headers to carry the file path, and it is very common 
for a file path to contain non-ASCII characters.
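
To illustrate the problem, here is a minimal sketch of what can happen to such a key in a header. It assumes the client sends the key as UTF-8 bytes and the recipient decodes header bytes as ISO-8859-1. That decoding step is an assumption for illustration, not confirmed behavior of tika-server. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class HeaderCharsetDemo {

    // Returns true only if every character of the value fits in US-ASCII.
    static boolean isUsAscii(String value) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(value);
    }

    // Simulates a recipient that reads UTF-8 header bytes as ISO-8859-1
    // (an assumption; actual behavior depends on the HTTP stack in use).
    static String decodeAsLatin1(String value) {
        byte[] wireBytes = value.getBytes(StandardCharsets.UTF_8);
        return new String(wireBytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The fetchKey from the report, written with Unicode escapes so the
        // source file itself stays ASCII-only.
        String fetchKey = "\u4e2d\u6587.docx";

        System.out.println("US-ASCII only: " + isUsAscii(fetchKey));
        System.out.println("Survives Latin-1 decoding: "
                + decodeAsLatin1(fetchKey).equals(fetchKey));
    }
}
```

Under these assumptions the key that arrives differs from the key that was sent, so a fetcher looking it up would not find the file.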

 

Suggestion: support HTTP query parameters for fetcherName and fetchKey. Query 
parameters can be percent-encoded, so they can carry non-ASCII characters 
correctly.
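
As a rough sketch of the suggestion, the key can be percent-encoded as UTF-8 on the client and decoded on the server, so only ASCII travels over the wire. The query parameter names mirror the existing header names. The class and helper names are hypothetical, and tika-server accepting these parameters is the proposed behavior, not something it is known to do today.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class FetchKeyQueryDemo {

    // Builds an ASCII-only query string; both values are percent-encoded as UTF-8.
    static String buildQuery(String fetcherName, String fetchKey) {
        return "fetcherName=" + URLEncoder.encode(fetcherName, StandardCharsets.UTF_8)
                + "&fetchKey=" + URLEncoder.encode(fetchKey, StandardCharsets.UTF_8);
    }

    // Reverses the encoding, as a server-side handler would for each parameter value.
    static String decode(String encodedValue) {
        return URLDecoder.decode(encodedValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The fetchKey from the report, written with Unicode escapes.
        String fetchKey = "\u4e2d\u6587.docx";

        String query = buildQuery("minio-data", fetchKey);
        // Prints: fetcherName=minio-data&fetchKey=%E4%B8%AD%E6%96%87.docx
        System.out.println(query);

        // The decoded value matches the original key exactly.
        System.out.println(decode("%E4%B8%AD%E6%96%87.docx").equals(fetchKey));
    }
}
```

The same percent-encoded value could in principle also be carried in a header, but that would require the server to decode it explicitly.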



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
