On Tue, Dec 24, 2013 at 10:31:37PM +0000, Thorsten Glaser wrote:
> Strake dixit:
>
> >Use wchar.h functions and a sane libc, e.g. musl, which has a pure
> >UTF-8 C locale, which ISO C explicitly allows [1].
> >
> >The 8-bit clarity that POSIX wants [1] seems nonsense to me, as one
> >can use byte
Strake dixit:
>Use wchar.h functions and a sane libc, e.g. musl, which has a pure
>UTF-8 C locale, which ISO C explicitly allows [1].
>
>The 8-bit clarity that POSIX wants [1] seems nonsense to me, as one
>can use byte functions for that, but I may be wrong.
^^
Not always, see
On Tue, Dec 24, 2013 at 01:07:10PM -0500, Strake wrote:
> On 24/12/2013, Silvan Jegen wrote:
> > So I guess the question boils down to whether you would rather use
> > libutf or the standardized, POSIX-locale-dependent wchar.h functions for
> > the UTF-8 conversion. I see one advantage of the wch
On 24/12/2013, Silvan Jegen wrote:
> So I guess the question boils down to whether you would rather use
> libutf or the standardized, POSIX-locale-dependent wchar.h functions for
> the UTF-8 conversion. I see one advantage of the wchar.h functions:
> If we use them we could avoid adding an extern
On Tue, Dec 24, 2013 at 05:20:08PM +0100, Silvan Jegen wrote:
> So I guess the question boils down to whether you would rather use
> libutf or the standardized, POSIX-locale-dependent wchar.h functions for
> the UTF-8 conversion. I see one advantage of the wchar.h functions:
> If we use them we co
On Thu, Nov 28, 2013 at 12:45:40PM +0200, sin wrote:
> On Tue, Nov 26, 2013 at 12:01:01PM -0800, Silvan Jegen wrote:
> > If you would rather not take this version, what approach would
> > you take for the character set mapping when using UTF-8? A hashmap-,
> > or B-tree-based solution or someth
On Sat, Nov 30, 2013 at 12:38:21PM +0100, Silvan Jegen wrote:
> BTW, the most recently updated version of
> the library seems to be at https://github.com/cls/libutf/commits/master
> and not at http://git.suckless.org/libutf/ for some reason.
I'll rebase the github repo and push it at some point so
Silvan Jegen dixit:
>That sounds reasonable but requires that we convert UTF-8 to UTF-32
>which should not be strictly necessary when we only map one UTF-8 value
>to another.
Arrgh, no. UTF-8 and UTF-32/UCS-4 are encodings of numerical Unicode
codepoints. When working with text documents, you alw
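To illustrate the point that UTF-8 is only one encoding of a numeric codepoint, here is a minimal sketch of decoding a single UTF-8 sequence into its codepoint value. The name utf8_decode is hypothetical; it is not taken from libutf or wchar.h, and it is not the code discussed in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only (not libutf, not wchar.h).
 * Decodes one UTF-8 sequence from s (at most n bytes) into *cp.
 * Returns the number of bytes consumed, or 0 if the input is
 * truncated or not valid UTF-8. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c, min;

    if (n == 0)
        return 0;

    if (s[0] < 0x80) {                    /* 0xxxxxxx: plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {   /* 110xxxxx: 2-byte sequence */
        len = 2; c = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {   /* 1110xxxx: 3-byte sequence */
        len = 3; c = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {   /* 11110xxx: 4-byte sequence */
        len = 4; c = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;                         /* stray continuation or invalid lead byte */
    }

    if (n < len)
        return 0;                         /* sequence is cut off */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                     /* not a 10xxxxxx continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;

    *cp = c;
    return len;
}
```

Called on the two bytes 0xC3 0xA4, this returns 2 and stores 0xE4 (U+00E4) in *cp. The same number is what UTF-32 would store directly as one 32-bit unit.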
On Thu, Nov 28, 2013 at 12:45:40PM +0200, sin wrote:
> On Tue, Nov 26, 2013 at 12:01:01PM -0800, Silvan Jegen wrote:
> > Hi
> >
> > This is a braindead and incomplete implementation of tr that only
> > works for one-byte encodings. Do you think it makes sense to use this
> > implementation as some
On Thu, Nov 28, 2013 at 01:24:40PM -0500, Strake wrote:
> [..]
>
> > UTF-32 is an encoding that is identical to the unicode point as far as
> > I know. So what I am thinking is that one would either use the UTF-8
> > representation of the Unicode point as an index, or the unicode point
> > itself.
On Thu, Nov 28, 2013 at 07:01:17PM +0000, Thorsten Glaser wrote:
> Silvan Jegen dixit:
>
> >If I understand correctly you would use mmap to allocate a sparse
> >memory area into which we could then directly index (either using
> >UTF-8 or UTF-32 indices), right? Since mmap needs a file descriptor
On Thu, Nov 28, 2013 at 8:21 PM, Gregor Best wrote:
>> [...]
>> anon = (char*)mmap(NULL, 4096, PROT_READ|PROT_WRITE,
>> MAP_ANON|MAP_SHARED, -1, 0);
>>
>> that probably means it may not be that portable after all. Thanks for
>> making me aware of it in any case.
>> [...]
>
> *BSD has
> [...]
> anon = (char*)mmap(NULL, 4096, PROT_READ|PROT_WRITE,
> MAP_ANON|MAP_SHARED, -1, 0);
>
> that probably means it may not be that portable after all. Thanks for
> making me aware of it in any case.
> [...]
*BSD has it, and one of the Gentoo machines I have access to has it to
Silvan Jegen dixit:
>If I understand correctly you would use mmap to allocate a sparse
>memory area into which we could then directly index (either using
>UTF-8 or UTF-32 indices), right? Since mmap needs a file descriptor
I think that wouldn’t help much.
>Sadly, I do not follow. I recognize tha
On 28/11/2013, Silvan Jegen wrote:
> On Thu, Nov 28, 2013 at 11:45:33AM -0500, Strake wrote:
>> > (either using UTF-8 or UTF-32 indices), right?
>>
>> I meant Unicodepoints; those are just Unicodecs.
>
> UTF-32 is an encoding that is identical to the unicode point as far as
> I know. So what I am
On Thu, Nov 28, 2013 at 11:45:33AM -0500, Strake wrote:
> > (either using UTF-8 or UTF-32 indices), right?
>
> I meant Unicodepoints; those are just Unicodecs.
UTF-32 is an encoding that is identical to the unicode point as far as
I know. So what I am thinking is that one would either use the UTF
On 28/11/2013, Silvan Jegen wrote:
> If I understand correctly you would use mmap to allocate a sparse
> memory area into which we could then directly index
Yes.
> (either using UTF-8 or UTF-32 indices), right?
I meant Unicodepoints; those are just Unicodecs.
> Since mmap needs a file descript
Thanks for the comments!
On Tue, Nov 26, 2013 at 11:40 PM, Thorsten Glaser wrote:
> Strake dixit:
>>On 26/11/2013, Silvan Jegen wrote:
>>> If you would rather not take this version, what approach would
>>> you take for the character set mapping when using UTF-8?
>>
>>On Linux, one can easily
On Tue, Nov 26, 2013 at 12:01:01PM -0800, Silvan Jegen wrote:
> Hi
>
> This is a braindead and incomplete implementation of tr that only
> works for one-byte encodings. Do you think it makes sense to use this
> implementation as some kind of stopgap-measure until we have a more
> robust version of
Strake dixit:
>On 26/11/2013, Silvan Jegen wrote:
>> If you would rather not take this version, what approach would
>> you take for the character set mapping when using UTF-8?
>
>On Linux, one can easily make a sparse array with 1-page granularity
>with mmap, and so simply use a (wchar_t [])
On 26/11/2013, Silvan Jegen wrote:
> If you would rather not take this version, what approach would
> you take for the character set mapping when using UTF-8?
On Linux, one can easily make a sparse array with 1-page granularity
with mmap, and so simply use a (wchar_t []) or (Rune []), but I'm
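A rough sketch of the sparse-array idea, under stated assumptions: the platform provides anonymous mappings (MAP_ANON or MAP_ANONYMOUS, which is not guaranteed everywhere), the table is indexed by codepoint, and uint32_t stands in for wchar_t or Rune. The trmap_* names are hypothetical and do not come from any existing tr implementation.

```c
#define _DEFAULT_SOURCE          /* assumption: glibc, exposes MAP_ANONYMOUS */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS   /* some systems only define one of the two names */
#endif

#define NCODEPOINTS 0x110000u    /* U+0000 .. U+10FFFF */

/* Hypothetical helper: reserve one zero-filled slot per codepoint.
 * The kernel backs a page with memory only once it is touched, so an
 * almost empty table stays cheap. Returns NULL on failure. */
uint32_t *trmap_new(void)
{
    void *p = mmap(NULL, NCODEPOINTS * sizeof(uint32_t),
                   PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
    return p == MAP_FAILED ? NULL : (uint32_t *)p;
}

/* Record "from maps to to". The value is stored as to + 1 so that a
 * zero slot keeps meaning "no mapping". Out-of-range input is ignored. */
void trmap_set(uint32_t *map, uint32_t from, uint32_t to)
{
    if (from < NCODEPOINTS && to < NCODEPOINTS)
        map[from] = to + 1;
}

/* Look up a codepoint; unmapped or out-of-range values pass through. */
uint32_t trmap_get(const uint32_t *map, uint32_t cp)
{
    if (cp < NCODEPOINTS && map[cp] != 0)
        return map[cp] - 1;
    return cp;
}

/* Release the table obtained from trmap_new. */
void trmap_free(uint32_t *map)
{
    munmap(map, NCODEPOINTS * sizeof(uint32_t));
}
```

With a table m from trmap_new(), calling trmap_set(m, 0x61, 0x41) makes trmap_get(m, 0x61) return 0x41, while any codepoint that was never set comes back unchanged. Because the index is a codepoint, UTF-8 input would have to be decoded before the lookup and re-encoded afterwards.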
Hi
This is a braindead and incomplete implementation of tr that only
works for one-byte encodings. Do you think it makes sense to use this
implementation as some kind of stopgap-measure until we have a more
robust version of tr?
If you would rather not take this version, what approach would
y