About the approach, I see one possible drawback: with this API, we
can't work on a partial string, and we have to keep the whole string
in memory. Depending on the usage, that could be a problem (when
processing large blocks, for example).
On Fri, Sep 04, 2015 at 03:17:31PM +1000, Damien Miller wrote:
>
> +/*
> + * Attempt to encode a UCS character as a UTF-8 sequence. Returns the number
> + * of characters used or -1 on error (insufficient space or bad code).
> + */
> +static int
> +encode_utf8(u_int32_t c, char *s, size_t slen)
> +{
> + size_t i, need;
> + u_char h;
> +
> + if (c < 0x80) {
> + if (slen >= 1) {
> + s[0] = (char)c;
> + }
> + return 1;
I think an error (-1) should be returned here if slen < 1. As written,
the function returns 1 even though nothing was stored in s.
> + } else if (c < 0x800) {
> + need = 2;
> + h = 0xc0;
> + } else if (c < 0x10000) {
> + need = 3;
> + h = 0xe0;
> + } else if (c < 0x200000) {
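Shouldn't this be c <= 0x10FFFF instead of c < 0x200000? The comment
below says code points above U+10FFFF are invalid, but this test
accepts values up to 0x1FFFFF.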
> + need = 4;
> + h = 0xf0;
> + } else {
> + /* Invalid code point > U+10FFFF */
> + return -1;
> + }
> + if (need > slen)
> + return -1;
> + for (i = 0; i < need; i++) {
> + s[i] = (i == 0 ? h : 0x80);
> + s[i] |= (c >> (need - i - 1) * 6) & 0x3f;
> + }
> + return need;
> +}
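
To make the two inline comments concrete, here is a rough sketch of
what I have in mind. It is only an illustration, not a replacement
diff: the name encode_utf8_checked is hypothetical, and I use the
standard uint32_t and unsigned char types instead of u_int32_t and
u_char so that it compiles on its own. The only intended behavior
changes are the slen check in the ASCII branch and the upper bound
of the 4-byte branch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch only: hypothetical variant of encode_utf8() from the patch.
 * Returns the number of bytes written to s, or -1 on error
 * (insufficient space or code point above U+10FFFF).
 */
static int
encode_utf8_checked(uint32_t c, char *s, size_t slen)
{
	size_t i, need;
	unsigned char h, b;

	if (c < 0x80) {
		/* ASCII: fail instead of claiming a byte was written. */
		if (slen < 1)
			return -1;
		s[0] = (char)c;
		return 1;
	} else if (c < 0x800) {
		need = 2;
		h = 0xc0;
	} else if (c < 0x10000) {
		need = 3;
		h = 0xe0;
	} else if (c <= 0x10ffff) {
		/* Upper bound matches the "Invalid code point" comment. */
		need = 4;
		h = 0xf0;
	} else {
		/* Invalid code point > U+10FFFF */
		return -1;
	}
	if (need > slen)
		return -1;
	for (i = 0; i < need; i++) {
		/* Lead byte carries the length prefix, the rest 10xxxxxx. */
		b = (i == 0) ? h : 0x80;
		b |= (c >> ((need - i - 1) * 6)) & 0x3f;
		s[i] = (char)b;
	}
	return (int)need;
}
```

With this variant, a call with slen == 0 fails for every input, and
0x110000 is rejected while 0x10FFFF still encodes to 4 bytes.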
--
Sebastien Marie