[ https://issues.apache.org/jira/browse/HTTPCORE-757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761800#comment-17761800 ]
ASF subversion and git services commented on HTTPCORE-757:
----------------------------------------------------------
Commit fa6dbd47bdb222fedc7f50ee8be6c7399d2d8603 in httpcomponents-core's branch
refs/heads/HTTPCORE-757 from Oleg Kalnichevski
[ https://gitbox.apache.org/repos/asf?p=httpcomponents-core.git;h=fa6dbd47b ]
HTTPCORE-757: AbstractCharDataConsumer fails to correctly handle incomplete
UTF8 encoded data split across multiple data packets
> AbstractCharDataConsumer jams up with incomplete UTF-8 data
> -----------------------------------------------------------
>
> Key: HTTPCORE-757
> URL: https://issues.apache.org/jira/browse/HTTPCORE-757
> Project: HttpComponents HttpCore
> Issue Type: Bug
> Affects Versions: 5.2.2
> Reporter: Simon White
> Priority: Major
> Fix For: 5.2.3, 5.3-alpha1
>
>
> While streaming UTF-8-encoded data with the async HTTP client, we observed
> the following behaviour:
> * After several minutes of consuming from our stream, the client jammed up
> permanently and did not recover without a restart
> Upon closer inspection, we realised that `AbstractCharDataConsumer` (which we
> were extending to parse our data) was receiving incomplete UTF-8 characters
> from the end of the stream (i.e. the last character in the stream was
> multi-byte and we hadn't yet received all bytes for it), and this was causing
> it to go into an infinite loop on the following code:
> {code:java}
> @Override
> public final void consume(final ByteBuffer src) throws IOException {
>     final CharsetDecoder charsetDecoder = getCharsetDecoder();
>     while (src.hasRemaining()) {
>         checkResult(charsetDecoder.decode(src, charBuffer, false));
>         doDecode(false);
>     }
> }
> {code}
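The stall described above can be reproduced with the JDK's `CharsetDecoder` alone. The following is a minimal sketch, not HttpCore code: the class name `IncompleteUtf8Demo` and the helper `leftoverAfterDecode` are hypothetical. It shows that when a chunk ends in the middle of a multi-byte sequence and `endOfInput` is `false`, `decode` returns `UNDERFLOW` but leaves the trailing bytes unconsumed, so a loop conditioned only on `src.hasRemaining()` makes no further progress.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only uses standard java.nio.charset APIs.
public class IncompleteUtf8Demo {

    /** Decodes the chunk once and returns the number of bytes left unconsumed. */
    static int leftoverAfterDecode(final byte[] chunk) {
        final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        final ByteBuffer src = ByteBuffer.wrap(chunk);
        final CharBuffer out = CharBuffer.allocate(16);
        // endOfInput = false: the decoder assumes more bytes may follow.
        final CoderResult result = decoder.decode(src, out, false);
        if (!result.isUnderflow()) {
            throw new IllegalStateException("Unexpected result: " + result);
        }
        // The incomplete trailing sequence stays in src.
        return src.remaining();
    }

    public static void main(final String[] args) {
        // 'a' followed by the first two bytes of U+20AC (E2 82 AC).
        final byte[] chunk = {0x61, (byte) 0xE2, (byte) 0x82};
        System.out.println(leftoverAfterDecode(chunk)); // prints 2
    }
}
```

Calling `decode` again on the same buffer returns the same result, which is consistent with the infinite loop reported in the quoted code.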
> This was fairly time-consuming to figure out and required us to go deep into
> the brain of the library.
> We don't know how this could be improved exactly, but a couple of thoughts:
> * If this class expects a completely valid text string in the buffer with no
> trailing bytes:
> ** Then it should throw some exception once it detects that it's failing to
> completely process the buffer
> ** And the caller could deal with this somehow (either by catching this
> exception and waiting for more data, or otherwise ensuring that the input is
> valid before calling the consumer - though it's not clear how it could do
> that without also having knowledge of the encoding)
> ** Alternatively, the caller could simply bubble up the exception and let us
> know that we shouldn't be using this class when there is only partial data.
> That would also have helped us to diagnose the issue
> * OTOH if this class is expected to be able to handle partially complete
> input:
> ** Then it should store the trailing unprocessable bytes into a buffer, and
> prepend them to the beginning of the next input (hopefully resulting in a
> valid UTF-8 string, though it would also have to handle the case where it
> didn't)
> ** This was roughly how we solved the issue on our side - we extended
> `AbstractBinDataConsumer` instead and handled the encoding ourselves
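The second option in the list above (keep the trailing bytes and prepend them to the next input) could look roughly like the sketch below. This is an illustration under stated assumptions, not the committed fix and not the reporter's workaround: `CarryOverDecoder`, `pending`, `sink` and `decodeChunks` are hypothetical names, and the class is standalone rather than an `AbstractBinDataConsumer` subclass. A stream that ends with a truncated sequence is reported as an error in `finish()`.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpCore.
public class CarryOverDecoder {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    private final CharBuffer chars = CharBuffer.allocate(1024);
    private final StringBuilder sink = new StringBuilder();
    // Bytes of an incomplete sequence left over from the previous chunk.
    private ByteBuffer pending = ByteBuffer.allocate(0);

    /** Decodes one chunk, carrying over any incomplete trailing sequence. */
    public void consume(final ByteBuffer src) throws CharacterCodingException {
        // Prepend the leftover bytes to the new chunk.
        final ByteBuffer in = ByteBuffer.allocate(pending.remaining() + src.remaining());
        in.put(pending).put(src);
        in.flip();
        decode(in, false);
        // Whatever the decoder could not consume waits for the next chunk.
        pending = ByteBuffer.allocate(in.remaining());
        pending.put(in);
        pending.flip();
    }

    /** Signals end of stream; throws if the stream ends mid-character. */
    public String finish() throws CharacterCodingException {
        decode(pending, true);
        CoderResult result;
        do {
            result = decoder.flush(chars);
            drain();
        } while (result.isOverflow());
        return sink.toString();
    }

    private void decode(final ByteBuffer in, final boolean endOfInput)
            throws CharacterCodingException {
        while (true) {
            final CoderResult result = decoder.decode(in, chars, endOfInput);
            drain();
            if (result.isError()) {
                result.throwException();
            }
            if (result.isUnderflow()) {
                return; // no more complete characters available in 'in'
            }
            // Overflow: the output buffer was drained above, so decode again.
        }
    }

    private void drain() {
        chars.flip();
        sink.append(chars);
        chars.clear();
    }

    /** Convenience wrapper for the usage example; wraps the checked exception. */
    public static String decodeChunks(final byte[]... chunks) {
        final CarryOverDecoder d = new CarryOverDecoder();
        try {
            for (final byte[] chunk : chunks) {
                d.consume(ByteBuffer.wrap(chunk));
            }
            return d.finish();
        } catch (final CharacterCodingException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

For example, `CarryOverDecoder.decodeChunks(new byte[] {0x61, (byte) 0xE2}, new byte[] {(byte) 0x82, (byte) 0xAC, 0x62})` splits U+20AC across two chunks and still yields `"a\u20ACb"`. Copying into a fresh buffer on every call keeps the sketch short; a real consumer would more likely reuse one buffer and `compact()` it.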