Bugs item #676346, was opened at 2003-01-28 17:59
Message generated for change (Comment added) made by facundobatista
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=676346&group_id=5470
Category: Unicode
Group: Python 2.2.2
Status: Open
Resolution: None
Priority: 3
Submitted By: David M. Grimes (dmgrime)
Assigned to: M.-A. Lemburg (lemburg)
Summary: String formatting operation Unicode problem.

Initial Comment:
When performing a string formatting operation using %s and a unicode argument, the argument is evaluated more than once. In certain environments (see example) this leads to many redundant calls.

It seems Python-2.2.2:Objects/stringobject.c:3394 is where PyObject_GetItem is used (for dictionary-like formatting args). Later, at :3509, there is a "goto unicode" when a string argument is actually unicode. At this point, everything resets and we do it all over again in PyUnicode_Format. There is an underlying assumption that the cost of the call to PyObject_GetItem is very low (since we're going to do them all again for unicode).

We've got a Python-based templating system which uses a very simple Mix-In class to facilitate flexible page generation. At the core is a simple __getitem__ implementation which maps calls to getattr():

    class mixin:
        def __getitem__(self, name):
            print '%r::__getitem__(%s)' % (self, name)
            hook = getattr(self, name)
            if callable(hook):
                return hook()
            else:
                return hook

Obviously, the print is diagnostic. So, this basic mechanism allows one to write hierarchical templates, filling in content found in "%(xxxx)s" escapes with functions returning strings. It has worked extremely well for us. BUT, we recently did some XML-based work which uncovered this strange unicode behaviour.
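The restart described above can be illustrated with a small Python 3 model. This is a sketch under stated assumptions, not CPython's C code: bytes stand in for Python 2 str, str stands in for Python 2 unicode, only "%(key)s" escapes are handled, and the names legacy_format, unicode_format and CountingMapping are hypothetical. When the first pass meets a unicode value, it throws away its partial result and hands the whole template to the second pass, which looks every key up again.

```python
import re

# Model only: supports "%(key)s" escapes and nothing else.
_KEY_BYTES = re.compile(rb"%\((\w+)\)s")
_KEY_TEXT = re.compile(r"%\((\w+)\)s")


def unicode_format(template: str, mapping) -> str:
    """Second pass (stands in for PyUnicode_Format): fetches every key again."""
    def substitute(match):
        value = mapping[match.group(1)]
        # Byte values are decoded so they can be joined with unicode text.
        return value.decode("ascii") if isinstance(value, bytes) else value
    return _KEY_TEXT.sub(substitute, template)


def legacy_format(template: bytes, mapping):
    """First pass (stands in for byte-string formatting with "goto unicode")."""
    pieces = []
    pos = 0
    for match in _KEY_BYTES.finditer(template):
        value = mapping[match.group(1).decode("ascii")]
        if isinstance(value, str):
            # Unicode argument found: discard the partial result and redo
            # the whole template in the unicode formatter.
            return unicode_format(template.decode("ascii"), mapping)
        pieces.append(template[pos:match.start()])
        pieces.append(value)
        pos = match.end()
    pieces.append(template[pos:])
    return b"".join(pieces)


class CountingMapping(dict):
    """Hypothetical helper: a dict that records every mapping[key] lookup."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.lookups = []

    def __getitem__(self, key):
        self.lookups.append(key)
        return super().__getitem__(key)


plain = CountingMapping(a=b"x", b=b"y")
print(legacy_format(b"%(a)s %(b)s", plain), plain.lookups)
# prints: b'x y' ['a', 'b']

mixed = CountingMapping(a=b"x", b="y")
print(legacy_format(b"%(a)s %(b)s", mixed), mixed.lookups)
# prints: x y ['a', 'b', 'a', 'b']
```

With nested templates, each level's repeated lookup re-runs every level beneath it, so in this model three levels of nesting produce 14 lookups rather than 3.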
Given the following classes:

    class w1u(mixin):
        v1 = u'v1'

    class w2u(mixin):
        def v2(self):
            return '%(v1)s' % w1u()

    class w3u(mixin):
        def v3(self):
            return '%(v2)s' % w2u()

    class w1(mixin):
        v1 = 'v1'

    class w2(mixin):
        def v2(self):
            return '%(v1)s' % w1()

    class w3(mixin):
        def v3(self):
            return '%(v2)s' % w2()

And test case:

    print 'All string:'
    print '%(v3)s' % w3()
    print
    print 'Unicode injected at w1u:'
    print '%(v3)s' % w3u()
    print

As we can see, the only difference between the w{1,2,3} and w{1,2,3}u classes is that w1u defines v1 as unicode where w1 uses a "normal" string. What we see is that the string-based version shows 3 calls, as expected:

    All string:
    <__main__.w3 instance at 0x8150524>::__getitem__(v3)
    <__main__.w2 instance at 0x814effc>::__getitem__(v2)
    <__main__.w1 instance at 0x814f024>::__getitem__(v1)
    v1

But the unicode causes a tree-like recursion, with 14 calls instead of 3:

    Unicode injected at w1u:
    <__main__.w3u instance at 0x8150524>::__getitem__(v3)
    <__main__.w2u instance at 0x814effc>::__getitem__(v2)
    <__main__.w1u instance at 0x814f024>::__getitem__(v1)
    <__main__.w1u instance at 0x814f024>::__getitem__(v1)
    <__main__.w2u instance at 0x814effc>::__getitem__(v2)
    <__main__.w1u instance at 0x814f024>::__getitem__(v1)
    <__main__.w1u instance at 0x814f024>::__getitem__(v1)
    <__main__.w3u instance at 0x8150524>::__getitem__(v3)
    <__main__.w2u instance at 0x814effc>::__getitem__(v2)
    <__main__.w1u instance at 0x814f024>::__getitem__(v1)
    <__main__.w1u instance at 0x814f024>::__getitem__(v1)
    <__main__.w2u instance at 0x814effc>::__getitem__(v2)
    <__main__.w1u instance at 0x814f024>::__getitem__(v1)
    <__main__.w1u instance at 0x814f024>::__getitem__(v1)
    v1

I'm sure this isn't a "common" use of the string formatting mechanism, but it seems that evaluating the arguments multiple times could be a bad thing. It certainly is for us 8^)

We're running this on a RedHat 7.3/8.0 setup, not that it appears to matter (from looking in stringobject.c). It also appears to still be a problem in 2.3a1.

Any comments? Help?
Questions?

----------------------------------------------------------------------

Comment By: Facundo Batista (facundobatista)
Date: 2005-01-11 00:54

Message:
Logged In: YES
user_id=752496

Could you please verify whether this problem persists in Python 2.3.4 or 2.4? If yes, in which version? Can you provide a test case? If the problem is solved, from which version?

Note that if you do not answer within one month, I'll close this bug as "Won't fix".

Thank you!

Facundo

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2003-01-28 19:23

Message:
Logged In: YES
user_id=38388

I don't see how you can avoid fetching the Unicode argument a second time without restructuring the formatting code altogether.

If you know that your arguments can be Unicode, you should use a Unicode formatting string to begin with. That's faster and doesn't involve a fallback solution.

If you still want to see this fixed, I'd suggest submitting a patch.

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list
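Lemburg's suggestion amounts to making the format string Unicode from the outset, so the str-to-unicode fallback never runs. Python 3 does this everywhere, since every str is Unicode. The sketch below is a Python 3 port of the reporter's classes, not code from the thread; the diagnostic print is replaced by an assumed module-level calls list so the lookups can be counted. With no fallback, each level's key should be fetched exactly once.

```python
calls = []  # (class name, key) for every __getitem__ call; replaces the print


class mixin:
    """Python 3 port of the reporter's mix-in: item access maps to attributes."""

    def __getitem__(self, name):
        calls.append((type(self).__name__, name))
        hook = getattr(self, name)
        return hook() if callable(hook) else hook


class w1u(mixin):
    v1 = "v1"  # in Python 3 this literal is already Unicode


class w2u(mixin):
    def v2(self):
        return "%(v1)s" % w1u()


class w3u(mixin):
    def v3(self):
        return "%(v2)s" % w2u()


result = "%(v3)s" % w3u()
print(result)  # prints: v1
print(calls)   # prints: [('w3u', 'v3'), ('w2u', 'v2'), ('w1u', 'v1')]
```

In Python 2 terms, the same effect would come from writing the templates as u'%(v1)s' and so on, so that the formatting starts in the unicode path instead of falling back to it.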