On 01.08.2017 04:26, Bruce Huang wrote:
> Hi all,
> We have placed a file named 檔名.txt into
> the \apache-tomcat-8.0.43\webapps\Apps folder. Our client app can
> retrieve the file by an HTTP GET request to a URL such as
> http://192.168.1.1/Apps/檔名.txt (檔名 is two Chinese characters)
This is one of those cases where it can get very confusing very quickly, because of the
multiple opportunities for things to get encoded/decoded or not, and to be /seen/ as
encoded/decoded or not. (Such as : are we really seeing the above URL as you meant to send
it, or are we seeing some other form, as encoded by the email systems in-between ?)
Strictly speaking, according to the relevant Internet HTTP RFCs (and which ones are
relevant can be yet another confusing matter), you MAY NOT include the above Chinese
characters directly in a URL string. The set of characters/bytes allowed in a URL string
is very restrictive, and in any case does not include even the individual bytes which
would result from encoding the above Unicode characters as UTF-8.
(See : https://tools.ietf.org/html/rfc3986#section-2)
Before you send out this URL from the client, you would have to :
- encode the above Chinese characters as a UTF-8 byte sequence. Both characters are in the
Basic Multilingual Plane, so each one takes 3 bytes in UTF-8, which makes 6 bytes in total.
- then, for each of the 6 bytes, you would have to check whether it is within the range of
bytes allowed in a URL, and if not, /that/ byte should be encoded/escaped as a "%xy"
3-character ASCII sequence, where "xy" is the byte value in hexadecimal. (There are many
existing functions to do that).
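As a rough illustration of the two client-side steps above, here is a minimal C sketch. The function names utf8_encode, is_path_safe and percent_encode_path are made up for this example and are not part of your client code. It assumes that the path is not already %-encoded (a literal '%' gets escaped as %25) and that '/' should be left alone as the path separator.

```c
#include <stddef.h>

/* Encode one Unicode code point as UTF-8.
   Returns the number of bytes written to out (1..4), or 0 if cp is invalid. */
int utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* RFC 3986 "unreserved" characters, plus '/' (assumption: it is kept
   unescaped because it separates path segments). */
int is_path_safe(unsigned char b)
{
    return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') ||
           (b >= '0' && b <= '9') ||
           b == '-' || b == '.' || b == '_' || b == '~' || b == '/';
}

/* Percent-encode a UTF-8 encoded path into dst.
   Returns 0 on success, -1 if dst is too small. */
int percent_encode_path(const char *src, char *dst, size_t dst_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    if (dst_size == 0)
        return -1;
    for (; *src != '\0'; src++) {
        unsigned char b = (unsigned char)*src;
        size_t need = is_path_safe(b) ? 1 : 3;

        if (n + need >= dst_size)          /* keep room for the final '\0' */
            return -1;
        if (need == 1) {
            dst[n++] = (char)b;
        } else {
            dst[n++] = '%';
            dst[n++] = hex[b >> 4];
            dst[n++] = hex[b & 0x0F];
        }
    }
    dst[n] = '\0';
    return 0;
}
```

Under these assumptions, feeding percent_encode_path the UTF-8 bytes of /Apps/檔名.txt should produce /Apps/%E6%AA%94%E5%90%8D.txt, which contains only ASCII characters allowed in a URL path.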
Then on the server side receiving this URL, the opposite transformation should
take place :
- the first step would be to "%-decode" the URL string, to restore the original bytes
which the client wanted to send. To my knowledge, all HTTP servers do that.
- then, the server and the application would have to /assume/ that URLs received from your
clients are always Unicode, UTF-8 encoded. That is (still) not the default in HTTP (the
default is still ISO-8859-1). (And there is no mechanism in the current RFCs that allows
either client or server to indicate, in the request itself, what character set the request
URL really is written in, or should be).
But you can force Tomcat to assume this, see :
http://tomcat.apache.org/tomcat-8.0-doc/config/http.html#Common_Attributes
--> URIEncoding
(and there is also "useBodyEncodingForURI", but that does not apply in your
particular case)
- the next step would thus be for the application (e.g. the default servlet), to /assume/
that this URL is Unicode/UTF-8, and decode this into a corresponding internal Unicode string.
- and then comes the step of looking for the corresponding file in the filesystem, by the
name you got from the previous step. And depending on the OS and the filesystem, this may
be character-set-agnostic or not, and may be case-agnostic or not.
(But your problem is currently not that it does not find the file; it is that the HTTP
request itself gets rejected as invalid. So your request URI contains bytes which the
server considers - rightly or not - as invalid in a URL.)
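To make the server-side steps more concrete, the following C sketch shows the %-decoding followed by the UTF-8 interpretation. This is not Tomcat's code (Tomcat is written in Java); the names hex_value, percent_decode and utf8_decode are hypothetical and only illustrate the principle.

```c
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Step 1: "%-decode" src into raw bytes in dst (NUL-terminated).
   Returns the number of bytes written, or -1 on a malformed escape
   or if dst is too small. */
long percent_decode(const char *src, unsigned char *dst, size_t dst_size)
{
    size_t n = 0;

    if (dst_size == 0)
        return -1;
    while (*src != '\0') {
        unsigned char b;

        if (*src == '%') {
            int hi = hex_value(src[1]);
            int lo = (hi < 0) ? -1 : hex_value(src[2]);

            if (hi < 0 || lo < 0)
                return -1;                 /* "%" not followed by 2 hex digits */
            b = (unsigned char)(hi * 16 + lo);
            src += 3;
        } else {
            b = (unsigned char)*src++;
        }
        if (n + 1 >= dst_size)             /* keep room for the final '\0' */
            return -1;
        dst[n++] = b;
    }
    dst[n] = '\0';
    return (long)n;
}

/* Step 2: /assume/ the bytes are UTF-8 and decode one code point from s
   (len bytes available). Returns the bytes consumed, or 0 if invalid. */
int utf8_decode(const unsigned char *s, size_t len, unsigned long *cp)
{
    unsigned long c, min;
    int need, i;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        c = s[0] & 0x1F; need = 1; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        c = s[0] & 0x0F; need = 2; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        c = s[0] & 0x07; need = 3; min = 0x10000;
    } else {
        return 0;                          /* not a valid lead byte */
    }
    if (len < (size_t)need + 1)
        return 0;
    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                      /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* reject overlong forms, surrogates and out-of-range values */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return need + 1;
}
```

Decoding /Apps/%E6%AA%94%E5%90%8D.txt this way gives back 16 bytes, and the six bytes after /Apps/ decode to the two code points U+6A94 and U+540D. Read as ISO-8859-1 instead, the same six bytes would be taken as six separate characters, which is why the charset assumption on the server side matters.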
[rant]
In other words, it is no wonder that developers (of servers as well as of
applications) get confused from time to time, and maybe unwittingly introduce bugs when
trying to handle URLs and/or content that is anything other than English.
In that respect, the HTTP protocols are still hopelessly outdated and obnoxious when
handling the vast number of languages which are in use in today's real-life Internet.
And it is a never-ending wonder to me why whoever is in charge of these things has
apparently not yet made a serious attempt at publishing a new set of coordinated HTTP (and
HTML, and CGI, and Javascript etc.) versions which would make Unicode/UTF-8 the default
charset/encoding (for URLs as well as for text content), instead of the long-obsolete
ASCII and ISO-8859-1 character sets. I would bet that millions of useless work-hours would
be saved worldwide every year by such a change.
[end of rant]
> When it was on Tomcat v8.0.23, everything worked fine. However, after we
> migrated to v8.0.43, the client app receives a response with
> HTTP 400 Bad Request.
Most probably, that was a correction in Tomcat, which previously did not properly reject
some URLs which are invalid according to the existing (deficient) RFCs.
> The code that our client app uses is below. It looks
> like it doesn't encode the URL path and only translates whitespace
> to %20.
Exactly. Your app has to encode that URL properly before issuing the request.
> Is there any way we can configure Tomcat 8.0.43 to make this
> case work as usual (as on Tomcat v8.0.23), since there are lots of client
> apps deployed?
If "as usual" was wrong and/or could cause security issues, your chances are slim, and you
will have to update your app.
> SpaceToTwenty(szServerPath, szBuf, MAXURLSIZE);
> memset(szServerPath, 0, MAXURLSIZE);
> strcpy(szServerPath, szBuf);
> memset(szSendBuf, 0, SEND_BUF_SIZE);
> // the buffer for sending to the socket
> sprintf(szSendBuf, "GET %s HTTP/1.1\r\nHost:%s\r\n"
>         "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"
>         "Accept-Language: zh-tw,en-us;q=0.7,en;q=0.3\r\n"
>         "Accept-Encoding: gzip, deflate\r\n"
>         "Connection: keep-alive\r\n\r\n",
>         szServerPath, szServerIP);
> LOG(LOG_ERROR, "[DL_Download] szServerPath: %s, szServerIP: %s,",
>     szServerPath, szServerIP);
> // create a socket for sending request
> SockCtx = SOCKET_Create(szServerIP, iServerPort, bSSLEnable, CERTF_PATH);
> if (SockCtx == NULL)
> {
>     LOG(LOG_ERROR, "[DL_Download] Socket Create Error!!!\n");
>     iReturn = _ERROR;
>     goto FUNC_EXIT;
> }
> SOCKET_Send(SockCtx, szSendBuf, strlen(szSendBuf));
> memset(szRecvBuf, 0, RECV_BUF_SIZE);
> iRecvBytes = SOCKET_Recv(SockCtx, szRecvBuf, sizeof(szRecvBuf));
> if (iRecvBytes <= 0)
> {
>     LOG(LOG_ERROR, "[DL_Download] Socket Recv Error!!! iRecvBytes = %d\n", iRecvBytes);
>     iReturn = _ERROR;
>     goto FUNC_EXIT;
> }
> memset(szHttpStatus, 0, sizeof(szHttpStatus));
> strncpy(szHttpStatus, szRecvBuf, strstr(szRecvBuf, "\r\n") - szRecvBuf);
> // here it will receive the HTTP 400 Bad Request on the tomcat v8.0.43
> // the szHttpStatus is 400
> if (strstr(szHttpStatus, "200 OK") == NULL)
> {
>     LOG(LOG_ERROR, "[DL_Download] Http Status != 200, Status = %s\n", szHttpStatus);
>     iReturn = _ERROR;
>     goto FUNC_EXIT;
> }
> int SpaceToTwenty(char* szSrc, char* szDst, int iLen)
> {
>     int iReturn = _SUCCESS;
>     char* c1;
>     char* c2;
>     char* c;
>     int new_string_length = 0;
>     for (c = szSrc; *c != '\0'; c++)
>     {
>         if (*c == ' ')
>             new_string_length += 2;
>         new_string_length++;
>     }
>     if (new_string_length >= iLen)
>         func_exit(_ERROR);
>     memset(szDst, 0, iLen);
>     for (c1 = szSrc, c2 = szDst; *c1 != '\0'; c1++)
>     {
>         if (*c1 == ' ')
>         {
>             c2[0] = '%';
>             c2[1] = '2';
>             c2[2] = '0';
>             c2 += 3;
>         }
>         else
>         {
>             *c2 = *c1;
>             c2++;
>         }
>     }
>     *c2 = '\0';
> FUNC_EXIT:
>     return iReturn;
> }
> Thanks,
> Bruce