Brent Dax wrote:

> Leopold Toetsch:
> # 2) For bitmaps I would provide a bitlist.c with functions for setting
> # and testing bits. This bitlist would be based on list, so it
> # should be
> # fast enough and had no limits WRT unicode chars.
>
> Note that the "Bitmaps" used by rx are only bitmaps within US-ASCII, to
> keep size down.

I know.

> Instead, high bit characters are
> encoded in a separate string. It seems to me that this behavior isn't
> useful in the general case.

I would use a list of int32 with items_per_chunk = 8 (or whatever is optimal for $architecture), so that one chunk holds 256 bits.
Testing for ASCII would then be as cheap as it is currently.
For Unicode chars the list would just expand, possibly leaving sparse holes. This would take more space, but would probably be faster in the general case.
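
To make the idea concrete, here is a rough sketch of what such a bitlist could look like. It is only an illustration: it uses a plain array of chunk pointers instead of the real list code, and all names (BitList, bitlist_new, bitlist_set, bitlist_test, bitlist_free) are made up for this example. Chunk 0 always exists, so a test on an ASCII char is a bounds check plus a shift and a mask; higher chunks are allocated on first use, and a NULL entry stands for a sparse hole.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ITEMS_PER_CHUNK 8                      /* 8 x int32 = 256 bits */
#define BITS_PER_CHUNK  (ITEMS_PER_CHUNK * 32)

/* Hypothetical layout: chunks[i] covers bits i*256 .. i*256+255. */
typedef struct {
    uint32_t **chunks;   /* a NULL entry is a sparse hole (all bits 0) */
    size_t     n_chunks;
} BitList;

/* Create a bitlist with chunk 0 preallocated; returns NULL on failure. */
BitList *bitlist_new(void)
{
    BitList *bl = calloc(1, sizeof *bl);
    if (!bl)
        return NULL;
    bl->chunks = calloc(1, sizeof *bl->chunks);
    if (bl->chunks)
        bl->chunks[0] = calloc(ITEMS_PER_CHUNK, sizeof(uint32_t));
    if (!bl->chunks || !bl->chunks[0]) {
        free(bl->chunks);
        free(bl);
        return NULL;
    }
    bl->n_chunks = 1;
    return bl;
}

/* Set the bit for code point cp; returns 0 on success, -1 on failure. */
int bitlist_set(BitList *bl, uint32_t cp)
{
    size_t   ci  = cp / BITS_PER_CHUNK;
    uint32_t off = cp % BITS_PER_CHUNK;
    size_t   i;

    assert(bl != NULL);
    if (ci >= bl->n_chunks) {
        /* Grow the pointer table only; new entries start as holes. */
        uint32_t **grown = realloc(bl->chunks, (ci + 1) * sizeof *grown);
        if (!grown)
            return -1;
        for (i = bl->n_chunks; i <= ci; i++)
            grown[i] = NULL;
        bl->chunks   = grown;
        bl->n_chunks = ci + 1;
    }
    if (!bl->chunks[ci]) {
        bl->chunks[ci] = calloc(ITEMS_PER_CHUNK, sizeof(uint32_t));
        if (!bl->chunks[ci])
            return -1;
    }
    bl->chunks[ci][off / 32] |= (uint32_t)1 << (off % 32);
    return 0;
}

/* Return 1 if the bit for cp is set, 0 otherwise (holes read as 0). */
int bitlist_test(const BitList *bl, uint32_t cp)
{
    size_t   ci  = cp / BITS_PER_CHUNK;
    uint32_t off = cp % BITS_PER_CHUNK;

    assert(bl != NULL);
    if (ci >= bl->n_chunks || !bl->chunks[ci])
        return 0;
    return (int)((bl->chunks[ci][off / 32] >> (off % 32)) & 1u);
}

/* Release all chunks and the bitlist itself. */
void bitlist_free(BitList *bl)
{
    size_t i;
    if (!bl)
        return;
    for (i = 0; i < bl->n_chunks; i++)
        free(bl->chunks[i]);
    free(bl->chunks);
    free(bl);
}
```

Setting a single high code point costs one pointer per skipped chunk plus one 32-byte chunk, which is the extra space mentioned above; whether that beats a separate string for high chars would need measuring.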

leo

