On Thu, Oct 15, 2020 at 01:59:38PM -0400, John Naylor wrote:
> I think I've seen a trie recommended somewhere, maybe the official website.
> That said, I was able to get the hash working for recomposition (split into
> a separate patch, and both of them now leave frontend alone), and I'm
> pleased to say it's 50-75x faster than linear search in simple tests. I'd
> be curious how it compares to ICU now. Perhaps Daniel Verite would be
> interested in testing again? (CC'd)

Yeah, that would be interesting to compare.  The gains proposed by this
patch are already a good step forward, and the numbers speak for
themselves, so I don't think that the comparison should be a blocker for
a solution we have at hand.  If something better gets proposed, we could
always change the decomposition and recomposition logic as needed.

> select count(normalize(t, NFC)) from (
> select md5(i::text) as t from
> generate_series(1,100000) as i
> ) s;
>
>  master    patch
> 18800ms    257ms

My environment was showing HEAD as being a bit faster, at 15s, while the
patch gets "only" down to 290~300ms (compiled with -O2, as I guess you
did).  Nice.

+    # Then the second
+    return -1 if $a2 < $b2;
+    return 1 if $a2 > $b2;

Should this say "second code point"?

+    hashkey = pg_hton64(((uint64) start << 32) | (uint64) code);
+    h = recompinfo.hash(&hashkey);

This choice should be documented, and most likely we should have
comments on both the perl and C sides to keep track of the relationship
between the two.

The binary sizes of libpgcommon_shlib.a and libpgcommon.a change because
Decomp_hash_func() gets included, impacting libpq.  Structurally,
wouldn't it be better to move this part into its own, backend-only,
header?  It would be possible to paper over the difference with some
ifdef FRONTEND of course, or we could just keep things as they are
because this can be useful for some out-of-core frontend tool.  But if we
keep that as a separate header, then any C file can decide to include it
or not, so frontend tools could also make this choice.  Note that in
unicode_norm.c we don't include unicode_normprops_table.h for frontends,
whereas unicode_norm_table.h is included for them.
--
Michael
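
To illustrate the kind of comment that could document the hash key choice, here is a minimal, standalone sketch of the packing done in the quoted hunk.  The helper names are hypothetical and not from the patch, and the byte-serialization loop is only a portable stand-in for pg_hton64().  It assumes the generated hash function consumes the key as a sequence of bytes, which is why a fixed byte order would matter for matching the keys computed on the perl side.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): pack a pair of code points into
 * one 64-bit key.  The first code point goes into the high 32 bits, the
 * second into the low 32 bits, as in the quoted hunk.
 */
static uint64_t
pack_codepoint_pair(uint32_t start, uint32_t code)
{
	return ((uint64_t) start << 32) | (uint64_t) code;
}

/*
 * Hypothetical stand-in for pg_hton64(): write the key out most
 * significant byte first, so the bytes fed to a byte-oriented hash are
 * the same on little-endian and big-endian hosts.
 */
static void
key_to_be_bytes(uint64_t key, unsigned char out[8])
{
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char) (key >> (56 - 8 * i));
}
```

For the pair U+0041 and U+0301, this gives the key 0x0000004100000301.  Whatever builds the table on the perl side would need to produce the same eight bytes in the same order for the lookup to land on the right entry.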