Jakub Narebski writes:
> I think the problem is not with aligning, otherwise we would simply get
> bad alignment, and not visible corruption. The ACTUAL PROBLEM is most
> probably because of concatenating strings marked as UTF-8 and strings
> not marked as UTF-8. Strange things happen then in Perl.
> One solution would be to force conversion to UTF-8 on input via "open"
> pragma (e.g. "use open ':encoding(UTF-8)';"). But there is no
> UTF-8-with_fallback encoding available - we would have to write one, and
> install it as module (or fake it via Perl trickery). This mechanism is
> almost the
Junio C Hamano writes:
> Shin Kojima writes:
>
>> Offset positions should not be counted by byte length, but by actual
>> character length.
>> ...
>> # escape tabs (convert tabs to spaces)
>> sub untabify {
>> -my $line = shift;
>> +my $line = to_utf8(shift);
>>
>> while ((my $po
> ideally we should be able to say "function X takes non-UTF8 and
> works on it", "function Y takes UTF8 and works on it", and "function
> Z takes non-UTF8 and gives UTF8 data back" for each function
> clearly, not "function W can take either UTF8 or any other garbage
> and tries to return UTF8".
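
The separation of roles described above can be illustrated with a minimal sketch. It is not gitweb's actual code: bytes_to_chars and untabify_chars are hypothetical helpers introduced only for this example, the tab stop of 8 is an assumption, and the input is assumed to be valid UTF-8 (no fallback encoding is attempted). One function turns raw bytes into a character string, and the other accepts character strings only and computes tab offsets in characters.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical "function Z": takes raw bytes, returns a character string.
# Assumes valid UTF-8 input; no fallback encoding is attempted here.
sub bytes_to_chars {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes);
}

# Hypothetical "function Y": expects an already decoded character string
# and expands each tab to the next multiple of $tabstop columns.
# index() and substr() count characters here, not bytes.
sub untabify_chars {
    my ($line, $tabstop) = @_;
    $tabstop //= 8;    # assumed default tab stop
    while ((my $pos = index($line, "\t")) != -1) {
        my $count = $tabstop - ($pos % $tabstop);
        substr($line, $pos, 1, ' ' x $count);
    }
    return $line;
}

my $raw   = "\xc3\xa9\tx";           # the bytes of "é", a tab, and "x"
my $chars = bytes_to_chars($raw);    # decode once, at the input boundary

print length($raw),   "\n";          # 4 (bytes)
print length($chars), "\n";          # 3 (characters)

my $expanded = untabify_chars($chars);
print length($expanded), "\n";       # 9: "é", 7 spaces, "x"

# Encode again only when writing the result out.
print encode('UTF-8', $expanded), "\n";
```

If the raw byte string were passed to untabify_chars instead, the tab would be found at offset 2 rather than 1, only 6 spaces would be inserted, and the decoded result would be one column short. That is the byte-versus-character offset problem the patch is concerned with.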
Shin Kojima writes:
> Offset positions should not be counted by byte length, but by actual
> character length.
> ...
> # escape tabs (convert tabs to spaces)
> sub untabify {
> - my $line = shift;
> + my $line = to_utf8(shift);
>
> while ((my $pos = index($line, "\t")) != -1) {
Offset positions should not be counted by byte length, but by actual
character length.
>5183 # We need to untabify lines before split()'ing them;
>5184 # otherwise offsets would be invalid.
Horizontal tab is not the only case we need to consider. Please excuse
me for using yo