Re: [Mesa-dev] [PATCH v3] python: Rework bytes/unicode string handling

Jose Fonseca Fri, 17 Aug 2018 05:30:01 -0700

This change caused one of our MSVC build machines to fail with


scons: Building targets ...
  Generating build\windows-x86-debug\util\xmlpool\options.h ...
Traceback (most recent call last):
  File "src\util\xmlpool\gen_xmlpool.py", line 221, in <module>
    print(line, end='')

UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 68: ordinal not in range(128)

scons: *** [build\windows-x86-debug\util\xmlpool\options.h] Error 1

I have no idea why that machine is affected, but AppVeyor and my local runs are not.

Setting PYTHONIOENCODING=utf-8 helps, but then bad things still happen when the output is loaded src/gallium/auxiliary/pipe-loader/
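For illustration only, here is a minimal sketch of the failure mode, assuming the affected machine's stdout ends up using the ASCII codec (for example because no encoding is configured for a redirected stream). The helper name `encode_for_stream` and the sample string are hypothetical and are not taken from gen_xmlpool.py:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def encode_for_stream(text, stream_encoding):
    """Encode text the way a stdout using stream_encoding would have to.

    Hypothetical helper for illustration; not part of gen_xmlpool.py.
    """
    return text.encode(stream_encoding)


# U+201C is the left double quotation mark named in the traceback.
sample = 'x \u201cy\u201d'

try:
    encode_for_stream(sample, 'ascii')
    ascii_ok = True
except UnicodeEncodeError:
    # Same class of error as in the build log: ASCII cannot represent U+201C.
    ascii_ok = False

# With UTF-8 (roughly what PYTHONIOENCODING=utf-8 selects) encoding succeeds.
utf8_bytes = encode_for_stream(sample, 'utf-8')
```

This only shows why the codec matters; it does not explain why that one machine picks ASCII.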



But the fact is that everything was working before.

Perhaps a solution is to just start using Python 3 for the generation scripts, as it might yield more consistent results.




Jose


On 10/08/18 22:17, Mathieu Bridon wrote:

In both Python 2 and 3, opening a file without specifying the mode will
open it for reading in text mode ('r').

On Python 2, the read() method of a file object opened in mode 'r' will
return byte strings, while on Python 3 it will return unicode strings.

Explicitly specifying the binary mode ('rb') then decoding the byte
string means we always handle unicode strings on both Python 2 and 3.

Which in turn means all re.match(line) will return unicode strings as
well.
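A minimal sketch of that idea, assuming a UTF-8 encoded input file. The helper `read_text_lines`, the pattern `RE_DESC` and the temporary demo file are hypothetical and much simpler than what gen_xmlpool.py actually uses:

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import os
import re
import tempfile


def read_text_lines(path, encoding='utf-8'):
    """Return the lines of path as unicode strings on Python 2 and 3."""
    with open(path, 'rb') as f:
        # Binary mode yields byte strings on both versions; decode each line.
        return [raw.decode(encoding) for raw in f]


# Hypothetical pattern, not one of the regexes from gen_xmlpool.py.
RE_DESC = re.compile(r'DESC\((\w+),"(.*)"\)')

# Write a small demo file containing non-ASCII quotation marks.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write('DESC(en,"\u201chello\u201d")\n'.encode('utf-8'))

lines = read_text_lines(demo_path)
os.remove(demo_path)

# Matching a unicode line gives unicode groups.
match = RE_DESC.match(lines[0])
description = match.group(2)
```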

If we also make expandCString return unicode strings, we don't need the
call to the unicode() constructor any more.

We were using the ugettext() method because it always returns unicode
strings in Python 2, unlike the gettext() one, which returns
byte strings. The ugettext() method doesn't exist on Python 3, so we
must use the right method on each version of Python.
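As a rough sketch of that dispatch, using `gettext.NullTranslations` as a stand-in catalog (an assumption for the example; the real script loads actual translation catalogs) and a hypothetical `translate` wrapper:

```python
from __future__ import unicode_literals
import gettext
import sys

# Pick the method that returns unicode strings on the running interpreter.
if sys.version_info < (3, 0):
    gettext_method = 'ugettext'
else:
    gettext_method = 'gettext'


def translate(trans, message):
    """Hypothetical wrapper: look up message and return a unicode string."""
    return getattr(trans, gettext_method)(message)


# NullTranslations has no catalog, so it returns the message unchanged.
trans = gettext.NullTranslations()
result = translate(trans, 'Enable feature')
```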

The last hurdles are that Python 3 doesn't let us concatenate unicode
and byte strings directly, and that Python 2's stdout wants encoded byte
strings while Python 3's wants unicode strings.
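Both hurdles can be sketched as follows. The helper names `join_text` and `for_stdout` are hypothetical, and UTF-8 is assumed as the output encoding:

```python
from __future__ import unicode_literals
import sys


def join_text(prefix, raw_bytes):
    """Concatenate a unicode prefix with UTF-8 encoded bytes."""
    # Decode first: Python 3 refuses to concatenate str and bytes.
    return prefix + raw_bytes.decode('utf-8')


def for_stdout(text):
    """Return text in the form print() should be given on this Python."""
    if sys.version_info.major == 2:
        # Python 2 would otherwise fall back to the ascii codec in some
        # environments, so hand it already-encoded bytes.
        return text.encode('utf-8')
    # Python 3's stdout takes unicode strings and encodes them itself.
    return text


line = join_text('desc: ', b'\xe2\x80\x9cquoted\xe2\x80\x9d')
out = for_stdout(line)
```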

With these changes, the script gives the same output on both Python 2
and 3.

Signed-off-by: Mathieu Bridon <boche...@daitauha.fr>
---
  src/util/xmlpool/gen_xmlpool.py | 41 +++++++++++++++++++++++++--------
  1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/src/util/xmlpool/gen_xmlpool.py b/src/util/xmlpool/gen_xmlpool.py
index b0db183854..327709c7f8 100644
--- a/src/util/xmlpool/gen_xmlpool.py
+++ b/src/util/xmlpool/gen_xmlpool.py
@@ -13,6 +13,12 @@ import sys
  import gettext
  import re

+

+if sys.version_info < (3, 0):
+    gettext_method = 'ugettext'
+else:
+    gettext_method = 'gettext'
+
  # Path to t_options.h
  template_header_path = sys.argv[1]

@@ -60,7 +66,7 @@ def expandCString (s):

      octa = False
      num = 0
      digits = 0
-    r = ''
+    r = u''
      while i < len(s):
          if not escape:
              if s[i] == '\\':
@@ -128,16 +134,29 @@ def expandMatches (matches, translations, end=None):
          if len(matches) == 1 and i < len(translations) and \
                 not matches[0].expand (r'\7').endswith('\\'):
              suffix = ' \\'
-        # Expand the description line. Need to use ugettext in order to allow
-        # non-ascii unicode chars in the original English descriptions.
-        text = escapeCString (trans.ugettext (unicode (expandCString (
-            matches[0].expand (r'\5')), "utf-8"))).encode("utf-8")
-        print(matches[0].expand (r'\1' + lang + r'\3"' + text + r'"\7') + suffix)
+        text = escapeCString (getattr(trans, gettext_method) (expandCString (
+            matches[0].expand (r'\5'))))
+        text = (matches[0].expand (r'\1' + lang + r'\3"' + text + r'"\7') + suffix)
+
+        # In Python 2, stdout expects encoded byte strings, or else it will
+        # encode them with the ascii 'codec'
+        if sys.version_info.major == 2:
+            text = text.encode('utf-8')
+
+        print(text)
+
          # Expand any subsequent enum lines
          for match in matches[1:]:
-            text = escapeCString (trans.ugettext (unicode (expandCString (
-                match.expand (r'\3')), "utf-8"))).encode("utf-8")
-            print(match.expand (r'\1"' + text + r'"\5'))
+            text = escapeCString (getattr(trans, gettext_method) (expandCString (
+                match.expand (r'\3'))))
+            text = match.expand (r'\1"' + text + r'"\5')
+
+            # In Python 2, stdout expects encoded byte strings, or else it will
+            # encode them with the ascii 'codec'
+            if sys.version_info.major == 2:
+                text = text.encode('utf-8')
+
+            print(text)

          # Expand description end
          if end:
@@ -168,9 +187,11 @@ print("/***********************************************************************\

  # Process the options template and generate options.h with all
  # translations.
-template = open (template_header_path, "r")
+template = open (template_header_path, "rb")
  descMatches = []
  for line in template:
+    line = line.decode('utf-8')
+
      if len(descMatches) > 0:
          matchENUM     = reENUM    .match (line)
          matchDESC_END = reDESC_END.match (line)


_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
