On Sun, Jan 13, 2013 at 12:41:30AM +0000, John Keeping wrote:
> On Sat, Jan 12, 2013 at 06:43:04PM -0500, Pete Wyckoff wrote:
>> Can you give me some hints about the byte/unicode string issues
>> in git-p4.py? There's really only one place that does:
>>
>> p4 = subprocess.Popen("p4 -G ...")
>> marshal.load(p4.stdout)
>>
>> If that's the only issue, this might not be too painful.
>
> The problem is that what gets loaded there is a dictionary (encoded by
> p4) that maps byte strings to byte strings, so all of the accesses to
> that dictionary need to either:
>
> 1) explicitly call encode() on a string constant
> or 2) use a byte string constant with a "b" prefix
>
> Or we could re-write the dictionary once, which handles the keys... but
> some of the values are also used as strings and we can't handle that as
> a one-off conversion since in other places we really do want the byte
> string (think content of binary files).
>
> Basically a thorough audit of all access to variables that come from p4
> would be needed, with explicit decode()s for authors, dates, etc.
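
To make the two options above concrete, here is a minimal sketch of what
Python 3 sees. It is not taken from git-p4.py: the dictionary contents are
made up, and marshal.dumps() merely stands in for real "p4 -G" output.

```python
import marshal

# Stand-in for the output of "p4 -G ..."; keys and values are made up.
raw = marshal.dumps({b'code': b'stat', b'user': b'jdoe'})
d = marshal.loads(raw)

# Under Python 3 a plain str key does not match the bytes key.
found_with_str = 'user' in d

# Option 1: encode the string constant at the point of use.
via_encode = d['user'.encode('ascii')]

# Option 2: use a byte string constant with a "b" prefix.
via_literal = d[b'user']

# Values are bytes as well, so text fields need an explicit decode().
user = via_literal.decode('utf-8')
```

Either option has to be repeated at every access, which is why an audit
of all the call sites would be needed.
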

Having thought about this a bit more, another possibility would be to
apply the conversion once, when the dictionary is loaded, using something
like this (completely untested, and I haven't looked up the keys of
interest):

-- >8 --
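import marshal

def _noop(s):
    return s

def _decode(s):
    return s.decode('utf-8')

# Placeholder keys: the real set of text fields still needs looking up.
CONVERSION_MAP = {
    'user': _decode,
    'data': _decode
}

# The function name is only illustrative; p4 is the Popen object.
def read_p4_dict(p4):
    d = marshal.load(p4.stdout)
    retval = {}
    for k, v in d.items():
        key = k.decode('utf-8')
        retval[key] = CONVERSION_MAP.get(key, _noop)(v)
    return retval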
-- 8< --

Obviously this isn't ideal, but without p4 gaining a Python 3 output mode
I suspect this would be the best we could do.
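
For what it's worth, the one-off conversion can be tried without a p4
server. The sketch below is only an illustration under assumptions: the
function name convert(), the 'payload' key and the record contents are
all hypothetical, and io.BytesIO stands in for p4.stdout.

```python
import io
import marshal

def _decode(s):
    return s.decode('utf-8')

# Hypothetical map: only 'user' is treated as text in this sketch.
CONVERSION_MAP = {'user': _decode}

def convert(stream, conversion_map=CONVERSION_MAP):
    """Load one marshalled dict and decode its keys and selected values."""
    d = marshal.load(stream)
    out = {}
    for k, v in d.items():
        key = k.decode('utf-8')
        # Values without an entry in the map are passed through as bytes.
        out[key] = conversion_map.get(key, lambda s: s)(v)
    return out

# io.BytesIO stands in for p4.stdout; the record itself is made up.
fake_stdout = io.BytesIO(
    marshal.dumps({b'user': b'jdoe', b'payload': b'\xff\x00'}))
result = convert(fake_stdout)
```

Here result has str keys, result['user'] is a str, and result['payload']
is left as bytes, so binary content would never go through decode().
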
John