On 2023/12/07 19:33, Daniel Sahlberg wrote:
> Den tors 7 dec. 2023 kl 09:51 skrev Ruediger Pluem <rpl...@apache.org>:
> 
>> I stumbled accross a Python 3 compatibility issue in
>> tools/hook-scripts/mailer/mailer.py.
>>
>> The call of self.cfg.which_groups in line 565 passes an empty string as
>> first parameter.
>> In which_groups this empty string is passed to to_str in line 1489.
>> In line 88 to_str does x.decode('utf-8')
>> In Python 2 str objects have a decode method at least in later Python 3
>> versions they have not. Hence I propose the following patch which fixes the
>> issue for me:
<snip>

> There was some work done on mailer.py by Greg Stein, started here:
> https://lists.apache.org/thread/v5bh1j8qz8kk0jv1tlctxqt1k454tz0h
> However I don't think that touched these parts of the code.

Yes, I also think the issue has existed before it.

> I think there is more to this than that simple fix, but I'm no expert in
> Python or in the Python bindings. There are two more calls to
> which_groups():
> 
> Line 491, called with the return from
> svn.repos.ChangeCollector(...).get_changes().items() (first argument in a
> tuple). I don't know which encoding this uses.

We decided that for all C APIs which returns char * values, their Python 3
wrapper functions return them as bytes object, and also we don't convert
those values into str within helper functions in svn modules. Thus their
path elements should be bytes object, and as they are internal paths,
their encoding is UTF-8, regardless of locale.

> Line 663, called with the return from sys.stdin.buffer.readlines(), already
> processed by to_str() once.
> 
> In particular the call on 663 should be double decoded unless I'm missing
> something.

Yes, there is inconsistency here.

> So depending on what get_changes() return, maybe we should remove the
> to_str() from which_groups() instead?

As Config.which_group() call from Commit.__init__() has worked correctly,
the pathes as matching pattern in which_group() are expected as str.
So if remove to_str() from which_groups(), path argment for
Config.which_group() should be a str object.

> Note that we are still somewhat supporting Python 2 so we need to make sure
> any changes does work on Python 2 as well.

Then, how about this patch? It at least mailer-t1.sh passed both
on python=python2.7 and on python=python3.9.

[[[
Fix inconsistency in path argment on Config.which_group()

Previously, we call Config.which_group() with path as bytes in
Commit.__init__() and with path as str in Lock.__init__() and
in PropChange.__init__(), but Config.which_group handled path
argment as bytes. To fix it we only use str as path argment on
Config.wich_group().

* tools/hook-scripts/mailer/mailer.py
  (Config.which_groups): Treat path as str.
  (Commit.__init__): convert bytes path into str before calling above.

Found by: Ruediger Pluem (rpluem {_AT_} apache.org)

Index: tools/hook-scripts/mailer/mailer.py
===================================================================
--- tools/hook-scripts/mailer/mailer.py (revision 1913728)
+++ tools/hook-scripts/mailer/mailer.py (working copy)
@@ -488,7 +488,7 @@
     # collect the set of groups and the unique sets of params for the options
     self.groups = { }
     for path, change in self.changelist:
-      for (group, params) in self.cfg.which_groups(path, log):
+      for (group, params) in self.cfg.which_groups(to_str(path), log):
         # turn the params into a hashable object and stash it away
         param_list = sorted(params.items())
         # collect the set of paths belonging to this group
@@ -1486,9 +1486,9 @@
     "Return the path's associated groups."
     groups = []
     for group, pattern, exclude_pattern, repos_params, search_logmsg_re in 
self._group_re:
-      match = pattern.match(to_str(path))
+      match = pattern.match(path)
       if match:
-        if exclude_pattern and exclude_pattern.match(to_str(path)):
+        if exclude_pattern and exclude_pattern.match(path):
           continue
         params = repos_params.copy()
         params.update(match.groupdict())
]]]

Cheers,
-- 
Yasuhito FUTATSUKI <futat...@yf.bsdclub.org>

Reply via email to