Package: librecode0
Version: 3.6-12
Severity: normal

According to the info page, recode_perform_task() should report the
error code RECODE_UNTRANSLATABLE in task->error_so_far if the input
contains characters that cannot be represented in the output charset.

However, it reports RECODE_INVALID_INPUT when translating certain
characters from UTF-8 to Latin-1, even though the input is valid UTF-8.

Below is an example C program that shows the bug.  It tries to translate
the string "á ç  α ζ" from UTF-8 into Latin-1.  The á and ç work fine,
but it chokes on the alpha (as it should, because Latin-1 doesn't
contain an alpha).  However, the error code it leaves in
task->error_so_far is 4 (== RECODE_INVALID_INPUT) instead of
3 (== RECODE_UNTRANSLATABLE).

This bug makes it impossible to distinguish between invalid input
(which, in a user application, should be reported as an error) and
characters that simply cannot be represented in the target charset
(which could, for example, be replaced by a ?).

#include <stdio.h>
#include <stdbool.h>
#include <recodext.h>
#include <string.h>

int
main (void)
{
    /* utf8 test string: 2 chars (a-acute, c-cedilla) representable in
     * latin1, followed by 2 chars (alpha, zeta) that cannot be
     * represented in latin1 */
    char greek_utf_str[] = "\303\241 \303\247  \316\261 \316\266";
    char buf[100] = "";

    RECODE_OUTER outer = recode_new_outer (false);
    RECODE_REQUEST request = recode_new_request (outer);
    RECODE_TASK task;
    bool success;

    if (!recode_scan_request (request, "utf-8..latin1")) {
        fprintf (stderr, "cannot scan request \"utf-8..latin1\"\n");
        return 1;
    }

    task = recode_new_task (request);
    task->input.buffer = &(greek_utf_str[0]);
    task->input.cursor = task->input.buffer;
    /* strlen() rather than sizeof, so the trailing NUL is not recoded */
    task->input.limit  = task->input.buffer + strlen(greek_utf_str);
    task->output.buffer = &(buf[0]);
    task->output.cursor = task->output.buffer;
    task->output.limit = task->output.buffer + sizeof(buf);

    success = recode_perform_task (task);

    printf("task completed (success=%d) with error %i\n",
           success, (int) task->error_so_far);
    printf("output buffer: ");
    /* walk a local pointer so task->output.buffer itself is left intact */
    for (const char *p = task->output.buffer; p < task->output.cursor; p++) {
        printf("%02X ", (unsigned char) *p);
    }
    printf("\n");

    return 0;
}


-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.14.3
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)

Versions of packages librecode0 depends on:
ii  libc6                         2.3.5-6    GNU C Library: Shared libraries an

librecode0 recommends no packages.

-- no debconf information
