At Tue, 28 Feb 2017 15:20:06 +0900, Michael Paquier <michael.paqu...@gmail.com> wrote in <cab7npqr49krgp6qaakal2v3hcnn+dnzv8dq_ysgbdsr6b_y...@mail.gmail.com> > On Mon, Feb 27, 2017 at 5:37 PM, Kyotaro HORIGUCHI > <horiguchi.kyot...@lab.ntt.co.jp> wrote: > > At Wed, 22 Feb 2017 16:06:14 +0900, Michael Paquier > > <michael.paqu...@gmail.com> wrote in > > <cab7npqrtq+7zjxuptbsr18mxvw7mtd29mn+91n7ag8fe5ac...@mail.gmail.com> > >> In order to conduct sanity checks on the shape of the radix tree maps > >> compared to the existing maps, having map_checker surely makes sense. > >> Now in the final result I don't think we need it. The existing map > >> files ought to be replaced by their radix versions at the end, and > >> map_checker should be removed. This leads to a couple of > >> simplifications, like Makefile, and reduces the maintenance to one > >> mechanism. > > > > Hmm.. Though I don't remember clearly what the radix map of the > > first version looked like, the current radix map seems > > human-readable for me. It might be by practice or by additional > > comments in map files. Anyway I removed all of the stuff so as > > not to generate the plain maps. But I didn't change the names of > > _radix.map and just commented out the line to output the plain > > maps in UCS_to_*.pl. Combined maps are still in the plain format > > so print_tables was changed to take character tables separately > > for regular (non-combined) characters and combined characters. > > Do others have thoughts to offer on the matter? I would think that the > new radix maps should just replace by the old plain ones, and that the > only way to build the maps going forward is to use the new methods. > The radix trees is the only thing used in the backend code as well > (conv.c). We could keep the way to build the old maps, with the > map_checker in module out of the core code. FWIW, I am fine to add the > old APIs in my plugin repository on github and have the sanity checks > in that as well. And of course also publish on this thread a module to > do that.
I couldn't make out my mind to move to radix tree completely, but UtfToLocal/LocalToUtf no longer handle the "plain map"s for non-combined character so they have lost their planground. Okay, I think I removed all the trace of the plain map era. Every characters in a mapping has a comment that describes what the character is or where it is defined. This information is no longer useful (radix map doesn't have a plance to show it) but I left it for debug use. (This might just be justification..) > > - Split the property {direction} into two boolean properties > > {to_unicode} and {from_unicode}. > > > > - Make the {direction} property an integer and compared with > > defined constants $BOTH, $TO_UNICODE and $FROM_UNICODE using > > the '=' operator. > > > > I choosed the former in this patch. > > Fine for me. Thanks. > >> + $charmap{ ucs2utf($src) } = $dst; > >> + } > >> + > >> + } > >> Unnecessary newline here. > > > > removed in convutils.pm. > > > > Since Makefile ignores old .map files, the steps to generate a > > patch for map files was a bit chaged. > > > > $ rm *.map > > $ make distclean maintainer-clean all > > $ make distclean > > $ git add . > > $ git commit > > +# ignore generated files > +/map_checker > +/map_checker.h > [...] > +map_checker.h: make_mapchecker.pl $(MAPS) $(RADIXMAPS) > + $(PERL) $< > + > +map_checker.o: map_checker.c map_checker.h ../char_converter.c > + > +map_checker: map_checker.o > With map_checker out of the game, those things are not needed. Ouch! Thanks for pointing out it. Removed. > +++ b/src/backend/utils/mb/char_converter.c > @@ -0,0 +1,116 @@ > +/*------------------------------------------------------------------------- > + * > + * Character converter function using radix tree > In the simplified version of the patch, pg_mb_radix_conv() being only > needed in conv.c I think that this could just be a static local > routine. > > -#include "../../Unicode/utf8_to_koi8r.map" > -#include "../../Unicode/koi8r_to_utf8.map" > -#include "../../Unicode/utf8_to_koi8u.map" > -#include "../../Unicode/koi8u_to_utf8.map" > +#include "../../Unicode/utf8_to_koi8r_radix.map" > +#include "../../Unicode/koi8r_to_utf8_radix.map" > +#include "../../Unicode/utf8_to_koi8u_radix.map" > +#include "../../Unicode/koi8u_to_utf8_radix.map" > FWIW, I am fine to use those new names as include points. > > -distclean: clean > +distclean: > rm -f $(TEXTS) > -maintainer-clean: distclean > +# maintainer-clean intentionally leaves $(TEXTS) > +maintainer-clean: > Why is that? There is also a useless diff down that code block. It *was* for convenience but now it is automatically downloaded so such distinction donsn't offer anything good. Changed it to remove $(TEXTS). > +conv.o: conv.c char_converter.c > This also can go away. Touching char_converter.c will be ignored if it is removed. Did you mistake it for map_checker? > -print_tables("EUC_JIS_2004", \@all, 1); > +# print_tables("EUC_JIS_2004", \@regular, undef, 1); > +print_radix_trees("EUC_JIS_2004", \@regular); > +print_tables("EUC_JIS_2004", undef, \@combined, 1); > [...] > sub print_tables > { > - my ($charset, $table, $verbose) = @_; > + my ($charset, $regular, $combined, $verbose) = @_; > print_tables is only used for combined maps, you could remove $regular > from it and just keep $combined around, perhaps renaming print_tables > to print_combined_maps? Renamed to print_combied_maps. And the code-comment pointed in the comment by the previous mail is rewritten as Robert's suggestion. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
0001-Use-radix-tree-for-character-conversion.patch.gz
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers