Marek Lewczuk wrote:
> Please execute following example:
> select * from ts_debug('english', '<img width="182" height="120"
> align="right" style="margin: 0px 0px 5px 5px;" test_aa="26461"/>')
>
> As the result you will see, that <img/> is not identified as XML tag, but
> rather splitted as words, blank spaces etc. The reason for that is the fact,
> that last attribute "test_aa" contains underscore in its name - when the
> underscore is removed, then img tag is properly identified as XML tag.
>
> XML definition allows using underscore in tag and attribute names.
>
The problem is that we already allow the underscore in tag names but not
in attribute names. So the proper fix is to allow the underscore when the
state is TPS_InTag; according to the XML spec [1], the underscore is a
valid character in attribute names.

A possible downside is that HTML attribute names don't contain
underscores. Should parsing fail in that case? I don't think so, but...

The problem exists in 8.3, 8.4 and HEAD. It is a trivial fix, so I don't
see a problem with back-patching it.

[1] http://www.w3.org/TR/REC-xml/#sec-common-syn

-- 
  Euler Taveira de Oliveira
  http://www.timbira.com/

Index: wparser_def.c
===================================================================
RCS file: /a/pgsql/dev/anoncvs/pgsql/src/backend/tsearch/wparser_def.c,v
retrieving revision 1.24
diff -c -r1.24 wparser_def.c
*** wparser_def.c	16 Jul 2009 06:33:44 -0000	1.24
--- wparser_def.c	23 Sep 2009 23:19:28 -0000
***************
*** 1225,1230 ****
--- 1225,1231 ----
  	{p_isdigit, 0, A_NEXT, TPS_Null, 0, NULL},
  	{p_iseqC, '=', A_NEXT, TPS_Null, 0, NULL},
  	{p_iseqC, '-', A_NEXT, TPS_Null, 0, NULL},
+ 	{p_iseqC, '_', A_NEXT, TPS_Null, 0, NULL},
  	{p_iseqC, '#', A_NEXT, TPS_Null, 0, NULL},
  	{p_iseqC, '/', A_NEXT, TPS_Null, 0, NULL},
  	{p_iseqC, ':', A_NEXT, TPS_Null, 0, NULL},
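
For readers who don't know the parser, here is a rough, standalone sketch
of why one extra table entry changes the result. It is NOT the code in
wparser_def.c: the names is_tag_char() and looks_like_tag() and the
allow_underscore switch are hypothetical, and the real state machine has
more states and more accepted characters than the handful visible in the
diff context. The idea is only that inside a tag every character must
match some rule, and a character that matches no rule makes the whole
tag candidate fail.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy stand-in for the "inside a tag" rule table (hypothetical, simplified).
 * allow_underscore = 0 models the table before the patch, 1 after it.
 */
int
is_tag_char(unsigned char c, int allow_underscore)
{
	if (isalnum(c) || isspace(c))
		return 1;
	if (c == '_')
		return allow_underscore;
	/* only the punctuation that appears in the diff context */
	return c != '\0' && strchr("=-#/:", c) != NULL;
}

/*
 * Return 1 if s is a single "<...>" tag made only of accepted characters,
 * 0 otherwise. Quoted attribute values are skipped without inspection.
 */
int
looks_like_tag(const char *s, int allow_underscore)
{
	size_t		i;

	if (s[0] != '<')
		return 0;

	for (i = 1; s[i] != '\0'; i++)
	{
		unsigned char c = (unsigned char) s[i];

		if (c == '>')
			return s[i + 1] == '\0';	/* tag must end the input */

		if (c == '"' || c == '\'')
		{
			/* jump to the matching closing quote, if there is one */
			const char *close = strchr(s + i + 1, c);

			if (close == NULL)
				return 0;
			i = (size_t) (close - s);
			continue;
		}

		if (!is_tag_char(c, allow_underscore))
			return 0;			/* no rule matched: not a tag */
	}
	return 0;					/* ran out of input before '>' */
}
```

With allow_underscore set to 0, the sketch rejects the <img .../> example
from the report at the '_' in test_aa; with it set to 1, the same input is
accepted as a tag. This mirrors the behaviour described above but says
nothing about the other characters the real parser accepts.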