On 7 September 2011 00:43, Richard L. Barnes <[email protected]> wrote:
>>> Section 5.6, "Note that a particular text frame might include a partial 
>>> UTF-8 sequence, however the whole message MUST contain valid UTF-8"
>>> This requirement is meaningless, since the concept of a "message" is not 
>>> defined here.  Suggest going back to a requirement that a frame MUST 
>>> contain valid UTF-8 (i.e., that it breaks at code-point boundaries).
>>
>> No, please. This has already been discussed.
>>
>> Imagine I must send a very big WS UTF-8 message and, due to maximum frame
>> size requirements (it is still unclear how such a requirement is
>> "negotiated"), I need to split it into N frames. This feature would work
>> at the very core of the transport layer.
>>
>> Probably I have a function that splits the whole WS message into
>> chunks of N bytes (I say "bytes" because I know the max frame size
>> in *bytes*), so such a function just counts N bytes from the WS message
>> and generates a frame. Please don't force such a function to be
>> Unicode/UTF-8 aware. Please, no.
>
> Clearly it already has to be WebSocket aware, and it already has to read the 
> opcode in order to distinguish data frames from control frames.  Adding on a 
> requirement to break at code point boundaries does not seem hugely onerous.  
> It's three lines of C:

It is more difficult than that.

For example, I currently fragment frames when a buffer fills up.
During the filling I'm not looking at the bytes and don't care whether it
is a text or binary frame. If I have to fragment on a UTF-8 character
boundary, then I'll have to a) handle text and binary differently, b)
actually inspect the bytes rather than just bulk-copy them, and c) deal
with residue bytes left over that could not be put into the fragment.

This would be a throw-it-all-out-and-start-again kind of change for me.
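For illustration only, here is a minimal C sketch of what points a) to c) involve for a buffer-filling fragmenter, assuming that text payloads are well-formed UTF-8. It is not the three-line version mentioned in the quoted message, and the names `utf8_safe_cut`, `carry_residue`, `flush_full_buffer` and the `send_fn` callback are hypothetical, not taken from any existing WebSocket implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the largest n <= len such that buf[0..n)
 * does not end partway through a UTF-8 sequence. Assumes well-formed
 * UTF-8; malformed input is passed through unchanged. */
static size_t utf8_safe_cut(const unsigned char *buf, size_t len)
{
    size_t i = len;

    /* Step back over at most three trailing continuation bytes (10xxxxxx). */
    while (i > 0 && len - i < 3 && (buf[i - 1] & 0xC0) == 0x80)
        i--;
    if (i == 0)
        return len;                 /* no lead byte found: not valid UTF-8 */

    unsigned char lead = buf[i - 1];
    size_t need;                    /* total length of the sequence at i - 1 */
    if (lead < 0x80)
        need = 1;
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return len;                 /* invalid lead byte: leave it alone */

    /* Cut just before the lead byte if its sequence is incomplete. */
    return (len - (i - 1) >= need) ? len : i - 1;
}

/* Hypothetical helper for point c): once buf[0..n) has been sent, move the
 * residue (used - n bytes) to the front of the buffer and return the new
 * fill level, so the next bulk copy resumes after it. */
static size_t carry_residue(unsigned char *buf, size_t used, size_t n)
{
    memmove(buf, buf + n, used - n);
    return used - n;
}

/* Hypothetical callback standing in for whatever actually writes a frame. */
typedef void (*send_fn)(const unsigned char *data, size_t len, void *ctx);

/* Hypothetical flush step for a full buffer. Binary data may be cut
 * anywhere (point a); text is cut on a code-point boundary, which needs a
 * look at the last few bytes (point b). A buffer of at least 4 bytes always
 * yields a non-empty text fragment for valid UTF-8. */
static size_t flush_full_buffer(unsigned char *buf, size_t used, int is_text,
                                send_fn send_fragment, void *ctx)
{
    size_t n = is_text ? utf8_safe_cut(buf, used) : used;
    send_fragment(buf, n, ctx);          /* emit before moving any bytes */
    return carry_residue(buf, used, n);  /* at most 3 residue bytes for text */
}
```

Note that the boundary check only examines the last one to four bytes of the buffer, so the extra cost is less a full scan than the separate text path and the residue bookkeeping of point c), which is exactly the structural change described above.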
_______________________________________________
Gen-art mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/gen-art
