Re: The GET request encounters 400 Bad Request from a URL with Chinese words on Tomcat 8.0.43

tomcat Tue, 01 Aug 2017 05:11:00 -0700

On 01.08.2017 04:26, Bruce Huang wrote:

Hi all,


We have placed a file named 檔名.txt into
the \apache-tomcat-8.0.43\webapps\Apps folder. And our client app can
retrieve the file by an HTTP GET request from the URL, for example,
http://192.168.1.1/Apps/檔名.txt (The 檔名 are two Chinese words)

This is one of those cases where it can get very confusing very quickly, because of the multiple opportunities for things to get encoded/decoded or not, and to be /seen/ as encoded/decoded or not. (Such as : are we really seeing the above URL as you meant to send it, or are we seeing some other form, as encoded by the email systems in-between ?)

Strictly speaking, according to the relevant Internet HTTP RFCs (and which ones are relevant can be yet another confusing matter), you MAY NOT include the above Chinese characters directly in a URL string. The set of characters/bytes allowed in a URL string is very restrictive, and in any case does not include even the individual bytes which would result from encoding the above Unicode characters as UTF-8.

(See : https://tools.ietf.org/html/rfc3986#section-2)
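
To make that restriction concrete, here is a minimal sketch in C of a character-set check based on section 2 of that RFC. The function name url_has_only_allowed_chars is made up for this illustration. It is not Tomcat's actual check, and it only looks at which characters appear, not at whether the URL is well-formed.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if every byte of s is a character that
 * RFC 3986 allows to appear somewhere in a URI (unreserved, reserved,
 * or '%' introducing an escape). It does not validate URI structure. */
bool url_has_only_allowed_chars(const char *s)
{
    static const char allowed_punct[] = "-._~:/?#[]@!$&'()*+,;=%";

    for (; *s != '\0'; s++) {
        unsigned char b = (unsigned char)*s;
        bool alnum = (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') ||
                     (b >= '0' && b <= '9');

        if (!alnum && strchr(allowed_punct, b) == NULL)
            return false;   /* space, control char, or any byte >= 0x80 */
    }
    return true;
}
```

With this check, a path containing the raw UTF-8 bytes of 檔名 fails (every one of those bytes is 0x80 or above), while the %-escaped form of the same path passes.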

Before you send out this URL from the client, you would have to :

- encode the above Chinese characters as a UTF-8 byte sequence. For these two characters that is 3 bytes each, so 6 bytes in total.

- then, for each of the 6 bytes, you would have to check if it is within the range of bytes allowed in a URL, and if not, /that/ byte should be encoded/escaped as a "%xy" 3-character ASCII sequence. (There are many existing functions to do that.)
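
The two client-side steps above can be sketched in C roughly as follows. This is only an illustration under some assumptions: the name percent_encode_path is hypothetical, the input is assumed to already be a UTF-8 byte string (a path held in another encoding such as Big5 would have to be converted first), and the input must not already contain %-escapes, since '%' itself gets escaped.

```c
#include <stddef.h>

/* Hypothetical helper: percent-encode a URL path that is already a
 * UTF-8 byte string. Unreserved characters and '/' are copied as-is;
 * every other byte becomes "%XY". Returns 0 on success, -1 if dst
 * (dst_size bytes, including the terminator) is too small. */
int percent_encode_path(const char *src, char *dst, size_t dst_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t out = 0;

    if (dst_size == 0)
        return -1;

    for (; *src != '\0'; src++) {
        unsigned char b = (unsigned char)*src;
        int keep = (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') ||
                   (b >= '0' && b <= '9') ||
                   b == '-' || b == '.' || b == '_' || b == '~' || b == '/';
        size_t need = keep ? 1 : 3;

        if (out + need >= dst_size)   /* keep room for the '\0' */
            return -1;
        if (keep) {
            dst[out++] = (char)b;
        } else {
            dst[out++] = '%';
            dst[out++] = hex[b >> 4];
            dst[out++] = hex[b & 0x0F];
        }
    }
    dst[out] = '\0';
    return 0;
}
```

For the path in question, the UTF-8 bytes of 檔名 are E6 AA 94 E5 90 8D, so the encoded request path becomes /Apps/%E6%AA%94%E5%90%8D.txt. A space is handled by the same rule and comes out as %20.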


Then on the server side receiving this URL, the opposite transformation should 
take place :

- the first step would be to "%-decode" the URL string, to restore the original bytes which the client wanted to send. To my knowledge, all HTTP servers do that.

- then, the server and the application would have to /assume/ that URLs received from your clients are always Unicode, UTF-8 encoded. That is (still) not the default in HTTP (the default is still ISO-8859-1). (And there is no mechanism in the current RFCs that allows either client or server to indicate, in the request itself, what character set the request URL really is written in, or should be.)

But you can force Tomcat to assume this, see :
http://tomcat.apache.org/tomcat-8.0-doc/config/http.html#Common_Attributes
--> URIEncoding
(and there is also "useBodyEncodingForURI", but that does not apply in your 
particular case)

- the next step would thus be for the application (e.g. the default servlet), to /assume/ that this URL is Unicode/UTF-8, and decode this into a corresponding internal Unicode string.

- and then comes the step of looking for the corresponding file in the filesystem, by the name you got from the previous step. And depending on the OS and the filesystem, this may be character-set-agnostic or not, and may be case-agnostic or not.

(But your problem is currently not that it does not find the file; it is that the HTTP request itself gets rejected as invalid. So your request URI contains bytes which the server considers - rightly or not - as invalid in a URL.)
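
As an illustration of the server-side steps above, here is a simplified sketch in C. It is not how Tomcat implements this (Tomcat is Java). The names percent_decode and utf8_to_code_points are hypothetical, and the UTF-8 decoder is deliberately minimal: it rejects truncated sequences and bad continuation bytes, but not overlong forms or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Step 1: turn each "%XY" back into one byte. Returns the number of
 * bytes written to dst (not counting the '\0'), or -1 on a malformed
 * escape or if dst is too small. */
long percent_decode(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    if (dst_size == 0)
        return -1;
    while (*src != '\0') {
        if (out + 1 >= dst_size)
            return -1;
        if (*src == '%') {
            int hi = hex_value(src[1]);
            int lo = (hi < 0) ? -1 : hex_value(src[2]);
            if (lo < 0)
                return -1;
            dst[out++] = (char)(hi * 16 + lo);
            src += 3;
        } else {
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
    return (long)out;
}

/* Step 2: interpret the bytes as UTF-8 and produce Unicode code points.
 * Returns the number of code points, or -1 on error. */
long utf8_to_code_points(const char *s, uint32_t *cp, size_t cp_cap)
{
    size_t n = 0;

    while (*s != '\0') {
        unsigned char b = (unsigned char)*s++;
        uint32_t c;
        int extra;

        if (b < 0x80)                { c = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { c = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { c = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { c = b & 0x07; extra = 3; }
        else return -1;

        while (extra-- > 0) {
            unsigned char t = (unsigned char)*s;
            if ((t & 0xC0) != 0x80)
                return -1;              /* also catches a premature '\0' */
            c = (c << 6) | (t & 0x3F);
            s++;
        }
        if (n >= cp_cap)
            return -1;
        cp[n++] = c;
    }
    return (long)n;
}
```

Feeding /Apps/%E6%AA%94%E5%90%8D.txt through both steps gives 16 bytes and then 12 code points, with U+6A94 and U+540D where the file name starts. A server that assumed ISO-8859-1 instead would turn each of those six bytes into a separate character, and the file lookup would then use the wrong name.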


[rant]

In other words and basically, no wonder that developers (of servers as well as of applications) get confused from time to time, and maybe unwittingly introduce bugs when trying to handle URLs and/or content that is anything other than English.
In that respect, the HTTP protocols are still hopelessly outdated and obnoxious when handling the vast number of languages which are in use in today's real-life Internet.

And it is a never-ending wonder to me why whoever is in charge of these things has apparently not yet made a serious attempt at publishing a new set of coordinated HTTP (and HTML, and CGI, and Javascript etc.) versions which would make Unicode/UTF-8 the default charset/encoding (for URLs as well as for text content), instead of the long-obsolete ASCII and ISO-8859-1 character sets. I would bet that millions of useless work-hours would be saved worldwide every year by such a change.

[end of rant]


When it was on Tomcat v8.0.23, everything worked fine. However, since we
migrated to v8.0.43, the client app receives a response with
HTTP 400 Bad Request.

Most probably, that was a correction in Tomcat, which previously did not properly reject some URLs which are invalid according to the existing (deficient) RFCs.


The code that our client app uses is below. It looks like it doesn't
encode the URL path and only translates whitespace to %20.


Exactly. Your app has to encode that URL properly before issuing the request.


Is there any way to configure Tomcat 8.0.43 so that this case works
as before (as on Tomcat v8.0.23), since there are lots of client
apps deployed?

If "as usual" was wrong and/or could cause security issues, your chances are slim, and you will have to update your app.

         SpaceToTwenty(szServerPath, szBuf, MAXURLSIZE);
         memset(szServerPath, 0, MAXURLSIZE);
         strcpy(szServerPath, szBuf);

         memset(szSendBuf, 0, SEND_BUF_SIZE);
         // the buffer for sending to the socket
         sprintf(szSendBuf, "GET %s HTTP/1.1\r\nHost:%s\r\n"
                            "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"
                            "Accept-Language: zh-tw,en-us;q=0.7,en;q=0.3\r\n"
                            "Accept-Encoding: gzip, deflate\r\n"
                            "Connection: keep-alive\r\n\r\n",
                            szServerPath, szServerIP);
         LOG(LOG_ERROR, "[DL_Download] szServerPath: %s, szServerIP: %s,",
                         szServerPath, szServerIP);
         // create a socket for sending request
         SockCtx = SOCKET_Create(szServerIP, iServerPort, bSSLEnable, CERTF_PATH);
         if (SockCtx == NULL)
         {
                 LOG(LOG_ERROR, "[DL_Download] Socket Create Error!!!\n");
                 iReturn = _ERROR;
                 goto FUNC_EXIT;
         }

         SOCKET_Send(SockCtx, szSendBuf, strlen(szSendBuf));
         memset(szRecvBuf, 0, RECV_BUF_SIZE);
         iRecvBytes = SOCKET_Recv(SockCtx, szRecvBuf, sizeof(szRecvBuf));
         if (iRecvBytes <= 0)
         {
                 LOG(LOG_ERROR, "[DL_Download] Socket Recv Error!!! iRecvBytes = %d\n",iRecvBytes);
                 iReturn = _ERROR;
                 goto FUNC_EXIT;
         }

         memset(szHttpStatus, 0, sizeof(szHttpStatus));
         strncpy(szHttpStatus, szRecvBuf, strstr(szRecvBuf, "\r\n") - szRecvBuf);
         // here it will receive the HTTP 400 Bad Request on the tomcat v8.0.43
         // the szHttpStatus is 400
         if (strstr(szHttpStatus, "200 OK") == NULL)
         {
                 LOG(LOG_ERROR, "[DL_Download] Http Status != 200, Status = %s\n",szHttpStatus);
                 iReturn = _ERROR;
                 goto FUNC_EXIT;
         }



int SpaceToTwenty(char* szSrc, char* szDst, int iLen)
{
         int iReturn = _SUCCESS;
         char* c1;
         char* c2;
         char* c;
         int new_string_length = 0;

         for (c = szSrc; *c != '\0'; c++)
         {
                 if (*c == ' ')
                         new_string_length += 2;
                 new_string_length++;
         }

         if (new_string_length >= iLen)
                 func_exit(_ERROR);

         memset(szDst, 0, iLen);
         for (c1 = szSrc, c2 = szDst; *c1 != '\0'; c1++)
         {
                 if (*c1 == ' ')
                 {
                         c2[0] = '%';
                         c2[1] = '2';
                         c2[2] = '0';
                         c2 += 3;
                 }
                 else
                 {
                         *c2 = *c1;
                         c2++;
                 }
         }
         *c2 = '\0';
FUNC_EXIT:
         return iReturn;
}




Thanks,

Bruce



