Re: [CVS ci] hash compare

2003-11-13 Thread Jeff Clites
On Nov 13, 2003, at 2:21 PM, Nicholas Clark wrote:
> On Wed, Nov 12, 2003 at 02:07:52PM -0800, Mark A. Biggar wrote:
> > And even when the sequence of Unicode code-points is the same, some
> > encodings have multiple byte sequences for the same code-point. For
> > example, UTF-8 has two ways to encode a code-

Re: [CVS ci] hash compare

2003-11-13 Thread Nicholas Clark
On Wed, Nov 12, 2003 at 02:07:52PM -0800, Mark A. Biggar wrote:
> And even when the sequence of Unicode code-points is the same, some
> encodings have multiple byte sequences for the same code-point. For
> example, UTF-8 has two ways to encode a code-point that is larger than
> 0x (Unicode as

Re: [CVS ci] hash compare

2003-11-13 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> On Thu, 13 Nov 2003, Leopold Toetsch wrote:
>> Dan. It starts with ascii keys, unicode code-points 0x00..0x7f.
>> When the first non-ascii key is to be stored, *ascii* keys are changed to
>> utf8.
> Which doesn't do much good if we've got non-ascii, non-u

Re: [CVS ci] hash compare

2003-11-13 Thread Thies C. Arntzen
On Wed, Nov 12, 2003 at 09:18:24PM +, Nicholas Clark wrote:
> On Wed, Nov 12, 2003 at 01:57:14PM -0500, Dan Sugalski wrote:
> > You're going to run into problems no matter what you do, and as
> > transcoding could happen with each comparison arguably you need to make a
> > local copy of the

Re: [CVS ci] hash compare

2003-11-13 Thread Dan Sugalski
On Thu, 13 Nov 2003, Leopold Toetsch wrote:
> Dan Sugalski <[EMAIL PROTECTED]> wrote:
> > On Thu, 13 Nov 2003, Leopold Toetsch wrote:
> >> * as long as there are only ascii keys: noop
> >> * on first non ascii key, convert all hash to utf8 - doesn't change
> >>   hash values
> > Well... thi

Re: [CVS ci] hash compare

2003-11-13 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> On Thu, 13 Nov 2003, Leopold Toetsch wrote:
>> * as long as there are only ascii keys: noop
>> * on first non ascii key, convert all hash to utf8 - doesn't change
>>   hash values
> Well... this is the place where things fall down. It does change hash
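The disagreement above can be illustrated with a small sketch. It assumes a byte-wise hash (FNV-1a is used here purely for illustration; it is not Parrot's hash function) and a hypothetical `latin1_to_utf8` helper. For pure ASCII keys the UTF-8 bytes are identical to the original bytes, so a byte-wise hash is unchanged; for a Latin-1 key containing a non-ASCII character the byte sequence grows on transcoding, so a byte-wise hash will generally differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative byte-wise hash (32-bit FNV-1a); NOT Parrot's hash function. */
static uint32_t hash_bytes(const unsigned char *s, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= s[i];
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical helper: transcode Latin-1 to UTF-8.
 * `out` must have room for at least 2 * len bytes.
 * Returns the number of bytes written. */
static size_t latin1_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];                 /* ASCII: byte is unchanged */
        } else {
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return n;
}
```

A hash computed over decoded code points rather than raw bytes would not depend on the encoding, at the cost of decoding every key.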

Re: [CVS ci] hash compare

2003-11-13 Thread Dan Sugalski
On Thu, 13 Nov 2003, Leopold Toetsch wrote:
> Dan Sugalski <[EMAIL PROTECTED]> wrote:
> > You're going to run into problems no matter what you do, and as
> > transcoding could happen with each comparison arguably you need to make a
> > local copy of the string for each comparison, as otherwise y

Re: [CVS ci] hash compare

2003-11-13 Thread Leopold Toetsch
Peter Gibbs <[EMAIL PROTECTED]> wrote:
> I would prefer this to be done via an iterator, as it would also solve
> the skip_backward problems with DBCS encoding. Something like:
There was a discussion, that current string iterators are wrong. They should take a position argument (and start of stri

Re: [CVS ci] hash compare

2003-11-13 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> You're going to run into problems no matter what you do, and as
> transcoding could happen with each comparison arguably you need to make a
> local copy of the string for each comparison, as otherwise you run the
> risk of significant data loss as a string

Re: [CVS ci] hash compare

2003-11-12 Thread Mark A. Biggar
Mark A. Biggar wrote:
> 0x (Unicode as code-points up to 0x10FFF), as either two 16 bit
Oops, that should be 0x10FFFF

Re: [CVS ci] hash compare

2003-11-12 Thread Mark A. Biggar
Nicholas Clark wrote:
> On Wed, Nov 12, 2003 at 01:57:14PM -0500, Dan Sugalski wrote:
> > You're going to run into problems no matter what you do, and as
> > transcoding could happen with each comparison arguably you need to make a
> > local copy of the string for each comparison, as otherwise you run the ris

Re: [CVS ci] hash compare

2003-11-12 Thread Nicholas Clark
On Wed, Nov 12, 2003 at 11:35:41PM +0200, Peter Gibbs wrote:
> I would prefer this to be done via an iterator, as it would also solve
> the skip_backward problems with DBCS encoding. Something like:
> For the hash_utf8 benchmark with the current code I get numbers like:
> 3.758691
> 5.535916

Re: [CVS ci] hash compare

2003-11-12 Thread Peter Gibbs
Leopold Toetsch wrote:
> To improve this (and some other operations like this) further, it would
> be nice, if we could combine encoding->decode and encoding->skip_forward
> to another function:
>
>    INTVAL code = s->encoding->decode_skip_forward_1( &sptr );
I would prefer this to be done via a
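As a rough illustration of the combined decode-and-advance step proposed in the quoted text, here is a minimal sketch for UTF-8 only. The function name `utf8_decode_skip_forward_1` and its signature are assumptions made for this example, not Parrot's actual encoding vtable entry, and the code assumes well-formed input (no validation of continuation bytes or overlong forms).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the UTF-8 code point at *pp and advance
 * *pp past it in one call. Assumes well-formed UTF-8 input. */
static uint32_t utf8_decode_skip_forward_1(const unsigned char **pp)
{
    const unsigned char *p = *pp;
    uint32_t c = *p++;
    int extra = 0;                      /* number of continuation bytes */

    if (c >= 0xF0)      { c &= 0x07; extra = 3; }   /* 4-byte sequence */
    else if (c >= 0xE0) { c &= 0x0F; extra = 2; }   /* 3-byte sequence */
    else if (c >= 0xC0) { c &= 0x1F; extra = 1; }   /* 2-byte sequence */
    /* otherwise ASCII: c already holds the code point */

    while (extra-- > 0)
        c = (c << 6) | (*p++ & 0x3F);   /* fold in 6 payload bits each */

    *pp = p;                            /* caller's pointer now past the char */
    return c;
}
```

Doing both steps in one call avoids inspecting the lead byte twice, once to decode and once to find the length to skip.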

Re: [CVS ci] hash compare

2003-11-12 Thread Nicholas Clark
On Wed, Nov 12, 2003 at 01:57:14PM -0500, Dan Sugalski wrote:
> You're going to run into problems no matter what you do, and as
> transcoding could happen with each comparison arguably you need to make a
> local copy of the string for each comparison, as otherwise you run the
> risk of significant

Re: [CVS ci] hash compare

2003-11-12 Thread Dan Sugalski
On Wed, 12 Nov 2003, Leopold Toetsch wrote:
> Dan Sugalski wrote:
> > On Wed, 12 Nov 2003, Steve Fink wrote:
> >>... ($x eq $y) ... couldn't this be
> >>done as another string vtable entry instead of being specific to
> >>hash_compare?
> > Yeah, that'd be a better way to do it. Add a st

Re: [CVS ci] hash compare

2003-11-12 Thread Leopold Toetsch
Dan Sugalski wrote:
> On Wed, 12 Nov 2003, Steve Fink wrote:
>> ... ($x eq $y) ... couldn't this be done as another string vtable entry
>> instead of being specific to hash_compare?
> Yeah, that'd be a better way to do it. Add a string_equal function to
> string.c and do the magic there instead.
Good point. I
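To make the `string_equal` suggestion concrete, here is a hedged sketch of what such a function might do. Everything in it is hypothetical: `struct demo_string`, `demo_next`, and `demo_string_equal` are invented for this example and are not Parrot's STRING type or API. It handles only Latin-1 and UTF-8, assumes well-formed input, and does no Unicode normalization. The idea is a cheap byte comparison when both strings share an encoding, and a code-point-by-code-point comparison otherwise, without transcoding or copying either string.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum demo_encoding { DEMO_LATIN1, DEMO_UTF8 };

/* Hypothetical, simplified string record; NOT Parrot's STRING struct. */
struct demo_string {
    const unsigned char *buf;
    size_t               len;   /* length in bytes */
    enum demo_encoding   enc;
};

/* Decode the code point at s->buf[*pos] and advance *pos past it. */
static uint32_t demo_next(const struct demo_string *s, size_t *pos)
{
    uint32_t c = s->buf[(*pos)++];
    int extra;

    if (s->enc == DEMO_LATIN1 || c < 0x80)
        return c;                       /* single-byte character */
    if (c >= 0xF0)      { c &= 0x07; extra = 3; }
    else if (c >= 0xE0) { c &= 0x0F; extra = 2; }
    else                { c &= 0x1F; extra = 1; }
    while (extra-- > 0)
        c = (c << 6) | (s->buf[(*pos)++] & 0x3F);
    return c;
}

/* Returns 1 if both strings hold the same code point sequence, else 0. */
static int demo_string_equal(const struct demo_string *a,
                             const struct demo_string *b)
{
    size_t i = 0, j = 0;

    /* Fast path: same encoding, so compare raw bytes. */
    if (a->enc == b->enc)
        return a->len == b->len && memcmp(a->buf, b->buf, a->len) == 0;

    /* Slow path: walk both strings one code point at a time. */
    while (i < a->len && j < b->len)
        if (demo_next(a, &i) != demo_next(b, &j))
            return 0;
    return i == a->len && j == b->len;
}
```

Because the slow path bails out at the first mismatch, unequal strings in different encodings are usually rejected after decoding only a few characters.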

Re: [CVS ci] hash compare

2003-11-12 Thread Dan Sugalski
On Wed, 12 Nov 2003, Steve Fink wrote:
> On Nov-12, Leopold Toetsch wrote:
> > I've committed a change that speeds up hash_compare considerably[1],
> > when comparing hashes with mixed e.g. ascii and utf8 encodings.
> I read the patch, and thought that we'll also have a lot of ($x eq $y)
> and (

Re: [CVS ci] hash compare

2003-11-12 Thread Steve Fink
On Nov-12, Leopold Toetsch wrote:
> I've committed a change that speeds up hash_compare considerably[1],
> when comparing hashes with mixed e.g. ascii and utf8 encodings.
I read the patch, and thought that we'll also have a lot of ($x eq $y) and ($x ne $y) statements that this won't accelerate --