On 01.08.2017 04:26, Bruce Huang wrote:
Hi all,

We have placed a file named 檔名.txt into
the \apache-tomcat-8.0.43\webapps\Apps folder, and our client app retrieves
the file with an HTTP GET request to a URL such as
http://192.168.1.1/Apps/檔名.txt (檔名 is two Chinese characters, meaning "file name")

This is one of those cases where things can get very confusing very quickly, because there are multiple opportunities for things to get encoded/decoded or not, and to be /seen/ as encoded/decoded or not. (For example: are we really seeing the above URL as you meant to send it, or some other form of it, as encoded by the email systems in between?)

Strictly speaking, according to the relevant Internet HTTP RFCs (and which ones are relevant can be yet another confusing matter), you MAY NOT include the above Chinese characters directly in a URL string. The set of characters/bytes allowed in a URL string is very restrictive, and in any case does not include even the individual bytes which would result from encoding the above Unicode characters as UTF-8.
(See: https://tools.ietf.org/html/rfc3986#section-2)
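To make the restriction concrete, here is a minimal C sketch (the function names are hypothetical) that checks a request path against the RFC 3986 path grammar: unreserved characters, sub-delims, ":", "@", "/" and well-formed "%XY" escapes. It is a simplified illustration of why raw UTF-8 bytes fail, not Tomcat's actual request-line parser, which may be stricter or more lenient in places.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" */
static bool is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
}

/* RFC 3986 section 2.2: sub-delims */
static bool is_sub_delim(unsigned char c)
{
    return c != '\0' && strchr("!$&'()*+,;=", c) != NULL;
}

/* Returns true if every byte of an absolute path matches the RFC 3986 path
   grammar: pchar = unreserved / pct-encoded / sub-delims / ":" / "@",
   with "/" as the segment separator. */
static bool is_valid_uri_path(const char *path)
{
    assert(path != NULL);
    for (const unsigned char *p = (const unsigned char *)path; *p != '\0'; p++) {
        if (*p == '%') {
            /* pct-encoded = "%" HEXDIG HEXDIG */
            if (!isxdigit(p[1]) || !isxdigit(p[2]))
                return false;
            p += 2;
        } else if (!is_unreserved(*p) && !is_sub_delim(*p) &&
                   *p != ':' && *p != '@' && *p != '/') {
            return false; /* e.g. a space, or any byte >= 0x80 */
        }
    }
    return true;
}
```

With this check, "/Apps/%E6%AA%94%E5%90%8D.txt" passes, while the same path carrying the six raw UTF-8 bytes of 檔名 fails on the first of them, 0xE6.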

Before you send out this URL from the client, you would have to:
- encode the above Chinese characters as a UTF-8 byte sequence. Characters like these (Unicode code points between U+0800 and U+FFFF) take exactly 3 bytes each in UTF-8, so 6 bytes in total.
- then, for each of the 6 bytes, check whether it is within the range of bytes allowed in a URL, and if not, escape /that/ byte as a "%XY" 3-character ASCII sequence, XY being the byte's value in hexadecimal. (There are many existing functions to do that.) Since every byte of a multi-byte UTF-8 sequence is 0x80 or above, all 6 bytes will need escaping.
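As a worked example of these two steps, the sketch below (hypothetical helper names, assuming the file name is available as Unicode code points; 檔 is U+6A94 and 名 is U+540D) encodes each character to UTF-8 and then escapes every byte outside the RFC 3986 unreserved set:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Step 1: encode one Unicode code point as UTF-8 into out[0..3].
   Returns the number of bytes written (1 to 4). Surrogates are not rejected. */
static int utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Step 2: copy RFC 3986 unreserved bytes as-is and write every other byte
   as "%XY"; any byte >= 0x80 is therefore escaped. Returns the number of
   UTF-8 bytes produced, or 0 if out (out_size bytes) is too small. */
static size_t escape_code_points(const unsigned long *cps, size_t count,
                                 char *out, size_t out_size)
{
    size_t nbytes = 0, used = 0;

    if (out_size == 0)
        return 0;
    for (size_t i = 0; i < count; i++) {
        unsigned char bytes[4];
        int n = utf8_encode(cps[i], bytes);

        for (int j = 0; j < n; j++) {
            unsigned char b = bytes[j];
            int plain = (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') ||
                        (b >= '0' && b <= '9') ||
                        (b != '\0' && strchr("-._~", b) != NULL);
            size_t need = plain ? 1 : 3;

            if (used + need >= out_size)       /* no room for it plus the NUL */
                return 0;
            if (plain)
                out[used] = (char)b;
            else
                snprintf(out + used, 4, "%%%02X", (unsigned)b);
            used += need;
        }
        nbytes += (size_t)n;
    }
    out[used] = '\0';
    return nbytes;
}
```

For 檔名 this yields 6 bytes, written out as "%E6%AA%94%E5%90%8D". In practice the path is usually already held as a UTF-8 byte string, in which case only the second step applies.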

Then on the server side receiving this URL, the opposite transformation should take place:
- the first step would be to "%-decode" the URL string, to restore the original bytes which the client wanted to send. To my knowledge, all HTTP servers do that.

- then, the server and the application would have to /assume/ that URLs received from your clients are always Unicode, UTF-8 encoded. That is (still) not the default in HTTP (the default is still ISO-8859-1). (And there is no mechanism in the current RFCs that allows either client or server to indicate, in the request itself, which character set the request URL is really written in, or should be.)
But you can force Tomcat to assume this; see the URIEncoding attribute at:
http://tomcat.apache.org/tomcat-8.0-doc/config/http.html#Common_Attributes
(There is also "useBodyEncodingForURI", but that does not apply in your particular case.)
- the next step would thus be for the application (e.g. the default servlet) to /assume/ that this URL is Unicode/UTF-8, and decode it into a corresponding internal Unicode string.
- then comes the step of looking for the corresponding file in the filesystem, by the name obtained in the previous step. Depending on the OS and the filesystem, this may or may not be character-set-agnostic, and may or may not be case-sensitive.
(But your problem currently is not that the file is not found; it is that the HTTP request itself gets rejected as invalid. So your request URI contains bytes which the server considers, rightly or not, to be invalid in a URL.)
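The two server-side decoding steps can be sketched in C as follows. Tomcat does this internally (governed by URIEncoding), so this is only an illustration with hypothetical function names: first %-decode the path into raw bytes, then interpret those bytes as UTF-8 code points. Had the bytes instead been read as ISO-8859-1, each of the 6 bytes would have become a separate, wrong character.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Step 1: "%-decode" src into raw bytes in dst (dst_size bytes, not
   NUL-terminated). Returns the byte count, or -1 on a malformed escape
   or if dst is too small. */
static long percent_decode(const char *src, unsigned char *dst, size_t dst_size)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    for (const unsigned char *p = (const unsigned char *)src; *p != '\0'; p++) {
        int byte = *p;

        if (*p == '%') {
            int hi = hex_value(p[1]);
            int lo = (hi < 0) ? -1 : hex_value(p[2]);  /* never reads past NUL */

            if (lo < 0)
                return -1;
            byte = hi * 16 + lo;
            p += 2;
        }
        if (n >= dst_size)
            return -1;
        dst[n++] = (unsigned char)byte;
    }
    return (long)n;
}

/* Step 2: decode one UTF-8 sequence from s (len bytes available) into *cp.
   Returns the sequence length, or -1 if the bytes are not well-formed UTF-8. */
static int utf8_decode(const unsigned char *s, size_t len, unsigned long *cp)
{
    int n;
    unsigned long min;

    if (len == 0)
        return -1;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        n = 2; *cp = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        n = 3; *cp = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        n = 4; *cp = s[0] & 0x07; min = 0x10000;
    } else {
        return -1;                         /* continuation byte or 0xF8..0xFF */
    }
    if ((size_t)n > len)
        return -1;                         /* truncated sequence */
    for (int i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    /* reject overlong forms, UTF-16 surrogates and values above U+10FFFF */
    if (*cp < min || (*cp >= 0xD800 && *cp <= 0xDFFF) || *cp > 0x10FFFF)
        return -1;
    return n;
}
```

Decoding "/Apps/%E6%AA%94%E5%90%8D.txt" gives 16 bytes, and the UTF-8 decoder turns bytes 6 to 11 (counting from 0) back into U+6A94 and U+540D.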

[rant]
In other words, it is no wonder that developers (of servers as well as of applications) get confused from time to time, and sometimes unwittingly introduce bugs when trying to handle URLs and/or content in anything other than English. In that respect, the HTTP protocols are still hopelessly outdated and obnoxious when it comes to handling the vast number of languages in use on today's real-life Internet.

And it is a never-ending wonder to me why whoever is in charge of these things has apparently not yet made a serious attempt at publishing a new, coordinated set of HTTP (and HTML, and CGI, and Javascript etc.) versions which would make Unicode/UTF-8 the default charset/encoding (for URLs as well as for text content), instead of the long-obsolete ASCII and ISO-8859-1 character sets. I would bet that millions of wasted work-hours would be saved worldwide every year by such a change.
[end of rant]



When we were on Tomcat v8.0.23, everything worked fine. However, after we
migrated to v8.0.43, the client app receives an HTTP 400 Bad Request
response.

Most probably, that was a correction in Tomcat, which previously did not properly reject some URLs which are invalid according to the existing (deficient) RFCs.

The code our client app uses is below. It looks like it does not
encode the URL path and only translates spaces
to %20.

Exactly. Your app has to encode that URL properly before issuing the request.
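A possible drop-in replacement for SpaceToTwenty() is sketched below. EncodeUrlPath() is a hypothetical name, and its return values (0 / -1) stand in for the application's own _SUCCESS/_ERROR. It assumes szServerPath already holds the bare path as UTF-8 bytes and has not been percent-encoded yet; if the client stores file names in another encoding (Big5, for instance), they would first have to be converted to UTF-8, e.g. with iconv(). It keeps '/' and the RFC 3986 unreserved characters and escapes every other byte as "%XY".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical replacement for SpaceToTwenty(). Percent-encodes every byte
   of szSrc except '/' and the RFC 3986 unreserved characters, writing the
   NUL-terminated result to szDst (iLen bytes).
   Assumes szSrc is the bare path, as UTF-8 bytes, not yet percent-encoded.
   Returns 0 on success, -1 if szDst is too small. */
int EncodeUrlPath(const char *szSrc, char *szDst, int iLen)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t used = 0;

    assert(szSrc != NULL && szDst != NULL);
    if (iLen <= 0)
        return -1;
    for (const unsigned char *p = (const unsigned char *)szSrc; *p != '\0'; p++) {
        unsigned char c = *p;
        int keep = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                   (c >= '0' && c <= '9') || strchr("-._~/", c) != NULL;
        size_t need = keep ? 1 : 3;

        if (used + need >= (size_t)iLen)   /* leave room for the final NUL */
            return -1;
        if (keep) {
            szDst[used++] = (char)c;
        } else {
            szDst[used++] = '%';
            szDst[used++] = hex[c >> 4];   /* e.g. 0xE6 -> "E6" */
            szDst[used++] = hex[c & 0x0F];
        }
    }
    szDst[used] = '\0';
    return 0;
}
```

At the call site, SpaceToTwenty(szServerPath, szBuf, MAXURLSIZE) would become EncodeUrlPath(szServerPath, szBuf, MAXURLSIZE), with the result checked before building the request. Because it also escapes '%', '?' and '=', it must be applied to the bare path only, and only once; a query string would need separate handling.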


Is there any way to configure Tomcat 8.0.43 to make this case work as
before (as on Tomcat v8.0.23), since there are lots of client apps
already deployed?


If "as usual" was wrong and/or could cause security issues, your chances are slim, and you will have to update your app.


         // SpaceToTwenty() escapes spaces only; all other bytes, including
         // the raw (non-ASCII) bytes of the Chinese file name, are sent as-is
         SpaceToTwenty(szServerPath, szBuf, MAXURLSIZE);
         memset(szServerPath, 0, MAXURLSIZE);
         strcpy(szServerPath, szBuf);

         // the buffer for sending to the socket; snprintf() cannot overrun it
         memset(szSendBuf, 0, SEND_BUF_SIZE);
         snprintf(szSendBuf, SEND_BUF_SIZE,
                  "GET %s HTTP/1.1\r\n"
                  "Host: %s\r\n"
                  "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"
                  "Accept-Language: zh-tw,en-us;q=0.7,en;q=0.3\r\n"
                  "Accept-Encoding: gzip, deflate\r\n"
                  "Connection: keep-alive\r\n\r\n",
                  szServerPath, szServerIP);

         LOG(LOG_ERROR, "[DL_Download] szServerPath: %s, szServerIP: %s",
             szServerPath, szServerIP);

         // create a socket for sending request
         SockCtx = SOCKET_Create(szServerIP, iServerPort, bSSLEnable, CERTF_PATH);
         if (SockCtx == NULL)
         {
                 LOG(LOG_ERROR, "[DL_Download] Socket Create Error!!!\n");
                 iReturn = _ERROR;
                 goto FUNC_EXIT;
         }

         SOCKET_Send(SockCtx, szSendBuf, strlen(szSendBuf));

         // keep the last byte free so szRecvBuf is always NUL-terminated
         memset(szRecvBuf, 0, RECV_BUF_SIZE);
         iRecvBytes = SOCKET_Recv(SockCtx, szRecvBuf, RECV_BUF_SIZE - 1);
         if (iRecvBytes <= 0)
         {
                 LOG(LOG_ERROR, "[DL_Download] Socket Recv Error!!! iRecvBytes = %d\n",
                     iRecvBytes);
                 iReturn = _ERROR;
                 goto FUNC_EXIT;
         }

         // copy the status line, even if "\r\n" is missing or the line is too long
         memset(szHttpStatus, 0, sizeof(szHttpStatus));
         {
                 const char *pEol = strstr(szRecvBuf, "\r\n");
                 size_t nLen = pEol ? (size_t)(pEol - szRecvBuf) : strlen(szRecvBuf);

                 if (nLen >= sizeof(szHttpStatus))
                         nLen = sizeof(szHttpStatus) - 1;
                 memcpy(szHttpStatus, szRecvBuf, nLen);
         }

         // here it will receive the HTTP 400 Bad Request on Tomcat v8.0.43
         // (the status code in szHttpStatus is 400)
         if (strstr(szHttpStatus, "200 OK") == NULL)
         {
                 LOG(LOG_ERROR, "[DL_Download] Http Status != 200, Status = %s\n",
                     szHttpStatus);
                 iReturn = _ERROR;
                 goto FUNC_EXIT;
         }



// Escapes spaces as "%20"; every other byte, including non-ASCII bytes,
// is copied unchanged.
int SpaceToTwenty(char* szSrc, char* szDst, int iLen)
{
         int iReturn = _SUCCESS;
         char* c1;
         char* c2;
         char* c;
         int new_string_length = 0;

         // compute the encoded length first (each space grows by 2 bytes)
         for (c = szSrc; *c != '\0'; c++)
         {
                 if (*c == ' ')
                         new_string_length += 2;
                 new_string_length++;
         }

         if (new_string_length >= iLen)
                 func_exit(_ERROR);

         memset(szDst, 0, iLen);
         for (c1 = szSrc, c2 = szDst; *c1 != '\0'; c1++)
         {
                 if (*c1 == ' ')
                 {
                         c2[0] = '%';
                         c2[1] = '2';
                         c2[2] = '0';
                         c2 += 3;
                 }
                 else
                 {
                         *c2 = *c1;
                         c2++;
                 }
         }
         *c2 = '\0';

FUNC_EXIT:
         return iReturn;
}




Thanks,

Bruce



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
