Richard L. Barnes wrote:

Section 5.6, "Note that a particular text frame might include a partial UTF-8 
sequence, however the whole message MUST contain valid UTF-8"
This requirement is meaningless, since the concept of a "message" is not 
defined here.  Suggest going back to a requirement that a frame MUST contain valid UTF-8 
(i.e., that it breaks at code-point boundaries).
Please, no. This has already been discussed.

Imagine I must send a very large WS UTF-8 message and, due to maximum frame
size requirements (it is still unclear how such a requirement is
"negotiated"), I need to split it into N frames. This feature would work
at the very core of the transport layer.

I would probably have a function that splits the whole WS message into
chunks of N bytes (I say "bytes" because I know the maximum frame size
in *bytes*), so such a function just counts N bytes from the WS message
and generates a frame. Please don't force such a function to be
Unicode/UTF-8 aware.
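A minimal sketch of the byte-counting splitter described above, in C to match the snippet later in this thread. `split_by_bytes` is a hypothetical name, and it only computes frame payload lengths rather than building real WebSocket frames. Because it counts octets only, it can end a text frame in the middle of a multi-octet character: splitting "héllo" (6 octets in UTF-8) into 2-octet frames leaves the lead octet 0xC3 of 'é' at the end of the first frame.

```c
#include <stddef.h>

/* Hypothetical helper: split a message of `len` octets into frames of at
 * most `max_frame` octets (max_frame must be at least 1), writing each
 * frame's payload length into `lengths`. It counts octets only and knows
 * nothing about UTF-8. Returns the number of frames written. */
size_t split_by_bytes(size_t len, size_t max_frame,
                      size_t *lengths, size_t max_frames)
{
    size_t n = 0;
    while (len > 0 && n < max_frames) {
        size_t chunk = len < max_frame ? len : max_frame;
        lengths[n++] = chunk;   /* this frame carries `chunk` octets */
        len -= chunk;
    }
    return n;
}
```

This is exactly the behaviour the text asks to keep: simple, and independent of the payload's encoding.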

Clearly it already has to be WebSocket-aware, and it already has to read the 
opcode in order to distinguish data frames from control frames.  Adding a 
requirement to break at code-point boundaries does not seem hugely onerous.  
It's three lines of C:

#include <stdint.h>

/* Assumes more than DESIRED_FRAME_LENGTH octets remain in the message. */
uint8_t *new_frame_start = old_frame_start + DESIRED_FRAME_LENGTH;
if ((opcode & 0x0f) == 0x01) {                 /* If this is a text frame */
   while (new_frame_start > old_frame_start &&
          (*new_frame_start & 0xc0) == 0x80) { /* While inside a code point */
       new_frame_start--;                      /* Back up one octet */
   }
   /* new_frame_start is now at the beginning of a code point */
}
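The fragment above can also be wrapped into a self-contained function. This is a sketch under assumptions: the hypothetical `utf8_frame_length` returns how many octets of the remaining text to put into the next frame, given a maximum payload size, and for valid UTF-8 it never cuts a multi-octet sequence in half.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: payload length of the next text frame.
 * `msg` points at the first unsent octet, `remaining` octets are left,
 * and `max_frame` (at least 1) is the largest allowed payload. */
size_t utf8_frame_length(const uint8_t *msg, size_t remaining, size_t max_frame)
{
    if (remaining <= max_frame)
        return remaining;                /* everything fits in one frame */

    size_t cut = max_frame;              /* msg[cut] would start the next frame */
    while (cut > 0 && (msg[cut] & 0xc0) == 0x80)
        cut--;                           /* back up over continuation octets */

    /* cut == 0 means max_frame is smaller than one character (or the input
     * is not valid UTF-8); fall back to a plain byte split. */
    return cut > 0 ? cut : max_frame;
}
```

A sender would call it repeatedly, advancing `msg` by the returned length and decreasing `remaining` accordingly; binary frames would simply use `max_frame`. With "héllo" and a 2-octet limit, the first frame carries only "h", because cutting after 0xC3 would split 'é'.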

In contrast, *not* requiring breaking at UTF-8 code points means that clients 
can't do any meaningful validation on text frames.

That is not entirely true. One can check whether a complete UTF-8 sequence was received at the end of a frame and, if not, buffer the trailing partial sequence (at most three octets) so that validation continues with the next frame.
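The buffering approach can be sketched as follows, assuming a hypothetical helper `utf8_incomplete_tail` that reports how many octets at the end of a frame belong to a UTF-8 sequence that has not finished yet. Since a UTF-8 sequence is at most four octets long, at most three octets ever need to be carried over.

```c
#include <stddef.h>
#include <stdint.h>

/* Expected total length of a UTF-8 sequence given its lead octet,
 * or 0 if the octet cannot start a sequence. */
static size_t utf8_seq_len(uint8_t lead)
{
    if (lead < 0x80) return 1;              /* 0xxxxxxx: ASCII */
    if ((lead & 0xe0) == 0xc0) return 2;    /* 110xxxxx */
    if ((lead & 0xf0) == 0xe0) return 3;    /* 1110xxxx */
    if ((lead & 0xf8) == 0xf0) return 4;    /* 11110xxx */
    return 0;                               /* continuation or invalid octet */
}

/* Hypothetical helper: number of trailing octets of `buf` that form an
 * unfinished sequence and should be kept for the next frame (0 to 3). */
size_t utf8_incomplete_tail(const uint8_t *buf, size_t len)
{
    /* Look back at most 3 octets for the lead octet of the last sequence. */
    for (size_t back = 1; back <= 3 && back <= len; back++) {
        uint8_t c = buf[len - back];
        if ((c & 0xc0) == 0x80)
            continue;                       /* continuation octet, keep looking */
        size_t need = utf8_seq_len(c);
        return need > back ? back : 0;      /* still missing octets? */
    }
    return 0;   /* no lead octet found; leave it to the validator to reject */
}
```

The receiver would validate the first `len - tail` octets of the frame, then prepend the saved `tail` octets to the next frame's payload.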

There are also other kinds of invalid UTF-8, such as overlong sequences and invalid bytes. These can also be detected early, without waiting for the whole message.
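To illustrate early detection, here is a minimal sketch of an incremental validator, assuming hypothetical names (`utf8_stream`, `utf8_feed`, `utf8_finish`). Its state survives between calls, so each frame can be fed as it arrives. It rejects invalid lead octets and unexpected continuation octets immediately, and overlong encodings, UTF-16 surrogates, and code points above U+10FFFF as soon as the offending sequence is complete.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical incremental UTF-8 validator state, carried across frames.
 * Initialise with {0} at the start of each message. */
struct utf8_stream {
    uint32_t cp;    /* code point being decoded */
    uint32_t min;   /* smallest code point allowed for this sequence length */
    int need;       /* continuation octets still expected */
};

/* Feed one frame's payload. Returns false at the first invalid octet. */
bool utf8_feed(struct utf8_stream *s, const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t c = buf[i];
        if (s->need == 0) {
            if (c < 0x80) continue;                                           /* ASCII */
            else if ((c & 0xe0) == 0xc0) { s->cp = c & 0x1f; s->need = 1; s->min = 0x80; }
            else if ((c & 0xf0) == 0xe0) { s->cp = c & 0x0f; s->need = 2; s->min = 0x800; }
            else if ((c & 0xf8) == 0xf0) { s->cp = c & 0x07; s->need = 3; s->min = 0x10000; }
            else return false;          /* stray continuation or 0xF8..0xFF */
        } else {
            if ((c & 0xc0) != 0x80) return false;    /* sequence cut short */
            s->cp = (s->cp << 6) | (c & 0x3f);
            if (--s->need == 0) {
                if (s->cp < s->min) return false;                      /* overlong */
                if (s->cp >= 0xd800 && s->cp <= 0xdfff) return false;  /* surrogate */
                if (s->cp > 0x10ffff) return false;                    /* out of range */
            }
        }
    }
    return true;
}

/* Call once the final frame of the message has been fed. */
bool utf8_finish(const struct utf8_stream *s)
{
    return s->need == 0;   /* no sequence left half-finished */
}
```

A frame that ends mid-character simply leaves `need` non-zero, so partial sequences across frame boundaries are handled without any extra buffering.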

Which means you might as well get rid of text frames entirely.


_______________________________________________
Gen-art mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/gen-art
