> Ok. I ran the modified test (now the iteration is reduced to 100000 in
> liketest()). As you can see, there's a huge difference. MB seems up to
> ~8 times slower:-< There seem to be some problems in the
> implementation. Considering REGEX is not so slow, maybe we should
> employ the same design as REGEX, i.e. using wide characters, not
> multibyte streams...
>
> MB+LIKE
> Total runtime: 1321.58 msec
> Total runtime: 1718.03 msec
> Total runtime: 2519.97 msec
> Total runtime: 4187.05 msec
> Total runtime: 7629.24 msec
> Total runtime: 14456.45 msec
> Total runtime: 17320.14 msec
> Total runtime: 17323.65 msec
> Total runtime: 17321.51 msec
>
> noMB+LIKE
> Total runtime: 964.90 msec
> Total runtime: 993.09 msec
> Total runtime: 1057.40 msec
> Total runtime: 1192.68 msec
> Total runtime: 1494.59 msec
> Total runtime: 2078.75 msec
> Total runtime: 2328.77 msec
> Total runtime: 2326.38 msec
> Total runtime: 2330.53 msec

I did some trials with a wide character implementation and saw
virtually no improvement. My guess is that the logic employed in LIKE
is too simple to hide the overhead of the multibyte to wide character
conversion. The reason why REGEX with MB is not so slow would be the
complexity of its logic, I think. As you can see in my previous
postings, the $1 ~ $2 operation (this is logically the same as
LIKE '%a%') is, for example, almost 80 times slower than LIKE
(remember that liketest() loops 10 times more than regextest()).

So I decided to use a completely different approach. Now LIKE has two
matching engines, one for single byte encodings (MatchText()), the
other for multibyte ones (MBMatchText()). MatchText() is identical to
the non-MB version of it, so there is virtually no performance penalty
for single byte encodings. MBMatchText() is for multibyte encodings
and is identical to the one used in 7.1.

Here is the MB case result with SQL_ASCII encoding.

Total runtime: 901.69 msec
Total runtime: 939.08 msec
Total runtime: 993.60 msec
Total runtime: 1148.18 msec
Total runtime: 1434.92 msec
Total runtime: 2024.59 msec
Total runtime: 2288.50 msec
Total runtime: 2290.53 msec
Total runtime: 2316.00 msec

To accomplish this, I moved MatchText etc. to a separate file, and now
like.c includes it *twice* (a similar technique is used in regexec()).
This makes like.o a little bit larger, but I believe this is worth it
for the optimization.
--
Tatsuo Ishii
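
For illustration only, here is a minimal sketch of the "one matcher
body, compiled twice" idea. It is not the actual like.c code. To keep
it in a single file, a macro stands in for the second #include of the
shared source file. All names here (DEFINE_LIKE_MATCHER, match_sb,
match_mb, utf8_char_len, like_single_byte, like_multibyte) are
hypothetical, the multibyte encoding is assumed to be UTF-8, and
escape characters are not handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length of a UTF-8 character, from its lead byte. */
static size_t utf8_char_len(const unsigned char *p)
{
    if (*p < 0x80) return 1;
    if ((*p & 0xE0) == 0xC0) return 2;
    if ((*p & 0xF0) == 0xE0) return 3;
    if ((*p & 0xF8) == 0xF0) return 4;
    return 1;                       /* invalid lead byte: step one byte */
}

/* How far to advance over one character in each engine. */
#define SB_CHARLEN(p) ((size_t) 1)
#define MB_CHARLEN(p) utf8_char_len(p)

/*
 * Shared matcher body. '%' matches any run of characters, '_' matches
 * exactly one character. CHARLEN is the only thing that differs
 * between the two instantiations.
 */
#define DEFINE_LIKE_MATCHER(NAME, CHARLEN)                              \
static int NAME(const unsigned char *t, size_t tlen,                    \
                const unsigned char *p, size_t plen)                    \
{                                                                       \
    while (plen > 0) {                                                  \
        if (*p == '%') {                                                \
            p++; plen--;                                                \
            if (plen == 0) return 1;   /* trailing % matches the rest */ \
            for (;;) {                 /* try every character start */  \
                if (NAME(t, tlen, p, plen)) return 1;                   \
                if (tlen == 0) return 0;                                \
                size_t n = CHARLEN(t);                                  \
                if (n > tlen) n = tlen;                                 \
                t += n; tlen -= n;                                      \
            }                                                           \
        }                                                               \
        if (tlen == 0) return 0;                                        \
        if (*p == '_') {               /* skip one whole character */   \
            size_t n = CHARLEN(t);                                      \
            if (n > tlen) n = tlen;                                     \
            t += n; tlen -= n; p++; plen--;                             \
        } else {                       /* literal byte must match */    \
            if (*p != *t) return 0;                                     \
            t++; tlen--; p++; plen--;                                   \
        }                                                               \
    }                                                                   \
    return tlen == 0;                                                   \
}

/* Instantiate the same body twice, once per engine. */
DEFINE_LIKE_MATCHER(match_sb, SB_CHARLEN)
DEFINE_LIKE_MATCHER(match_mb, MB_CHARLEN)

/* Convenience wrappers for NUL-terminated strings. */
int like_single_byte(const char *text, const char *pat)
{
    return match_sb((const unsigned char *) text, strlen(text),
                    (const unsigned char *) pat, strlen(pat));
}

int like_multibyte(const char *text, const char *pat)
{
    return match_mb((const unsigned char *) text, strlen(text),
                    (const unsigned char *) pat, strlen(pat));
}
```

With this sketch, like_multibyte("\xC3\xA9", "_") treats the two-byte
character as a single unit, while like_single_byte() steps one byte at
a time and never calls the character-length helper. That is why the
single byte path can run without the multibyte overhead.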