Matthew Barnett added the comment:

In function SRE_MATCH, the code for SRE_OP_GROUPREF (line 1290) contains this:

    while (p < e) {
        if (ctx->ptr >= end ||
            SRE_CHARGET(state, ctx->ptr, 0) != SRE_CHARGET(state, p, 0))
            RETURN_FAILURE;
        p += state->charsize;
        ctx->ptr += state->charsize;
    }

However, the code for SRE_OP_GROUPREF_IGNORE (line 1316) contains this:

    while (p < e) {
        if (ctx->ptr >= end ||
            state->lower(SRE_CHARGET(state, ctx->ptr, 0)) != state->lower(*p))
            RETURN_FAILURE;
        p++;
        ctx->ptr += state->charsize;
    }

(In both cases 'p' is of type 'char*'.)

The problem appears to be that the latter is still using '*p' and 'p++' and is thus always working with chars (it gets and advances 1 byte at a time instead of 1, 2 or 4 bytes for Unicode).

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16688>
_______________________________________
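
To make the difference between the two loops concrete, here is a minimal standalone sketch in C. It is not the actual _sre.c code or a proposed patch: char_get and lower_ascii are hypothetical stand-ins for SRE_CHARGET and state->lower, the function names are invented for illustration, and the cast to unsigned char in the "reported" variant is added only to keep the sketch well defined. The "fixed" variant shows one plausible correction, mirroring what the SRE_OP_GROUPREF loop already does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for SRE_CHARGET: read one character that is
   `charsize` bytes wide (1, 2 or 4) starting at byte pointer `p`. */
static uint32_t char_get(const char *p, int charsize)
{
    uint16_t v16;
    uint32_t v32;

    assert(charsize == 1 || charsize == 2 || charsize == 4);
    switch (charsize) {
    case 1:
        return (unsigned char)*p;
    case 2:
        memcpy(&v16, p, sizeof v16);
        return v16;
    default:
        memcpy(&v32, p, sizeof v32);
        return v32;
    }
}

/* Hypothetical stand-in for state->lower: ASCII-only lowercasing. */
static uint32_t lower_ascii(uint32_t ch)
{
    return (ch >= 'A' && ch <= 'Z') ? ch + ('a' - 'A') : ch;
}

/* Loop shaped like the reported SRE_OP_GROUPREF_IGNORE code: the group
   text [p, e) is read with *p and stepped with p++, i.e. one byte at a
   time, while the subject pointer advances by charsize.
   Returns 1 on match, 0 on failure. */
int groupref_ignore_reported(const char *p, const char *e,
                             const char *ptr, const char *end, int charsize)
{
    while (p < e) {
        if (ptr >= end ||
            lower_ascii(char_get(ptr, charsize)) !=
                lower_ascii((unsigned char)*p))
            return 0;
        p++;
        ptr += charsize;
    }
    return 1;
}

/* Width-aware variant: both pointers read and advance by charsize,
   as the SRE_OP_GROUPREF loop does. */
int groupref_ignore_fixed(const char *p, const char *e,
                          const char *ptr, const char *end, int charsize)
{
    while (p < e) {
        if (ptr >= end ||
            lower_ascii(char_get(ptr, charsize)) !=
                lower_ascii(char_get(p, charsize)))
            return 0;
        p += charsize;
        ptr += charsize;
    }
    return 1;
}
```

With a 2-byte-per-character buffer holding "abAB", where the group is "ab" and the subject pointer sits at "AB", the width-aware variant reports a match while the byte-at-a-time variant fails. With 1-byte characters the two behave the same, which may be why the discrepancy is easy to miss.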