[issue17618] base85 encoding

2014-03-08 Thread Roundup Robot
Roundup Robot added the comment: New changeset 1853679c6f71 by R David Murray in branch 'default': whatsnew: base85 encodings. (#17618) http://hg.python.org/cpython/rev/1853679c6f71

[issue17618] base85 encoding

2013-11-17 Thread Antoine Pitrou
Antoine Pitrou added the comment: Now committed, thanks for the reviews and the code! -- resolution: -> fixed stage: needs patch -> committed/rejected status: open -> closed

[issue17618] base85 encoding

2013-11-17 Thread Roundup Robot
Roundup Robot added the comment: New changeset 42366e293b7b by Antoine Pitrou in branch 'default': Issue #17618: Add Base85 and Ascii85 encoding/decoding to the base64 module. http://hg.python.org/cpython/rev/42366e293b7b -- nosy: +python-dev
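
For reference, a minimal sketch of the functions this changeset adds to the base64 module, as they are available in Python 3.4 and later. The sample payload is an assumption chosen purely for illustration.

```python
import base64

payload = b"issue17618: base85 encoding"  # arbitrary sample bytes (assumption)

# Base85 in the Mercurial/git style.
b85 = base64.b85encode(payload)
b85_roundtrip = base64.b85decode(b85)

# Ascii85 in the Adobe style (no <~ ~> framing unless requested).
a85 = base64.a85encode(payload)
a85_roundtrip = base64.a85decode(a85)

print(b85)
print(a85)
print(b85_roundtrip == payload, a85_roundtrip == payload)
```

Both encoders take and return bytes, like the existing base64 functions.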

[issue17618] base85 encoding

2013-11-17 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: One more nitpick, and then the patch LGTM.

[issue17618] base85 encoding

2013-11-17 Thread Antoine Pitrou
Antoine Pitrou added the comment: Updated patch after Serhiy's comments. -- Added file: http://bugs.python.org/file32672/base85-3.patch

[issue17618] base85 encoding

2013-11-17 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Grr.

[issue17618] base85 encoding

2013-11-17 Thread Antoine Pitrou
Antoine Pitrou added the comment: > I added more comments on Rietveld. Did you forget to publish them?

[issue17618] base85 encoding

2013-11-17 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I added more comments on Rietveld.

[issue17618] base85 encoding

2013-11-16 Thread Antoine Pitrou
Antoine Pitrou added the comment: Updated patch incorporating Serhiy's self-review from 6 months ago (grr). -- Added file: http://bugs.python.org/file32661/base85-2.patch

[issue17618] base85 encoding

2013-11-16 Thread Antoine Pitrou
Antoine Pitrou added the comment: Updated patch with suggested API changes, + docs. -- Added file: http://bugs.python.org/file32659/base85.patch

[issue17618] base85 encoding

2013-10-06 Thread Antoine Pitrou
Antoine Pitrou added the comment: Well, I think the following comments (Serhiy's) should be implemented: """As for interface, I think 'adobe' flag should be false by default. It makes encoder simpler. ascii85 encoder in Go's standard library doesn't wrap nor add Adobe's brackets. btoa/atob fun

[issue17618] base85 encoding

2013-10-06 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: -- assignee: -> pitrou

[issue17618] base85 encoding

2013-10-06 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I'm not very interested in working on this (though analyzing and optimizing was fun for me). You, Antoine, as the originator, definitely are interested. So decide which interface you need and finish the work using the proposed patches as a basis. I would mad

[issue17618] base85 encoding

2013-10-05 Thread Jason Stokes
Jason Stokes added the comment: What issues are there with the implementation as it stands? I am happy to contribute (as I need to code a base36 implementation myself, and it's basically the same work) but it looks like the existing implementation is fine, except possibly some people don't lik

[issue17618] base85 encoding

2013-09-24 Thread Antoine Pitrou
Changes by Antoine Pitrou: -- nosy: +haypo

[issue17618] base85 encoding

2013-08-23 Thread Antoine Pitrou
Antoine Pitrou added the comment: Serhiy, Martin, is one of you still working on this?

[issue17618] base85 encoding

2013-04-23 Thread Antoine Pitrou
Antoine Pitrou added the comment: > The problem with autodetecting is that it makes it impossible for an > application to use this library to verify that something is encoded in > a specific way. Explicit is better than implicit. Agreed. Also, you generally know what format your data is in. O

[issue17618] base85 encoding

2013-04-21 Thread Martin Morrison
Martin Morrison added the comment: On 21 Apr 2013, at 17:38, Serhiy Storchaka wrote: > Serhiy Storchaka added the comment: > > As for interface, I think 'adobe' flag should be false by default. It makes > encoder simpler. ascii85 encoder in Go's standard library doesn't wrap nor > add Adobe's

[issue17618] base85 encoding

2013-04-21 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: As for the interface, I think the 'adobe' flag should be false by default. It makes the encoder simpler. The ascii85 encoder in Go's standard library neither wraps nor adds Adobe's brackets. The btoa/atob functions look redundant, as we can just use a85encode/a85decode with appr

[issue17618] base85 encoding

2013-04-20 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: There are some bugs in the ascii85 and base85 implementations (see Rietveld for details). Besides, the ascii85 implementation was too slow. I've prepared a patch that corrects the errors and speeds up encoding and decoding. Microbenchmarks: ./python -m timeit -r 1 -

[issue17618] base85 encoding

2013-04-19 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: Added file: http://bugs.python.org/file29942/issue17618-5.diff

[issue17618] base85 encoding

2013-04-18 Thread Martin Morrison
Martin Morrison added the comment: Attached a minor tweak over the last diff - I'd forgotten to fix the Struct handling inside the Mercurial implementation as well. All other comments still apply to this diff. -- Added file: http://bugs.python.org/file29930/issue17618-5.diff

[issue17618] base85 encoding

2013-04-18 Thread Martin Morrison
Martin Morrison added the comment: Raised http://bz.selenic.com/show_bug.cgi?id=3894 against Mercurial for them to work around issue14596.

[issue17618] base85 encoding

2013-04-17 Thread Martin Morrison
Martin Morrison added the comment: New diff. Changes from the last one: - change in struct handling to avoid issue14596 - Addition of btoa85 and atob85 functions that do legacy 'btoa' encoding/decoding. These are just wrappers around a85(en|de)code, which now have additional keyword args to c

[issue17618] base85 encoding

2013-04-17 Thread Antoine Pitrou
Antoine Pitrou added the comment: Serhiy, Martin, perhaps one of you could report the potential memory leak on the Mercurial bug tracker: http://bz.selenic.com/

[issue17618] base85 encoding

2013-04-17 Thread Martin Morrison
Martin Morrison added the comment: >> Can you elaborate on this? What leakage is there? I assume this is some > implementation quirk of the struct module that I'm not aware of. > > issue14596. Thanks for the pointer. I will rework the patch for the encoder/decoders to use an explicit Struct so

[issue17618] base85 encoding

2013-04-17 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > Can you elaborate on this? What leakage is there? I assume this is some implementation quirk of the struct module that I'm not aware of. issue14596.

[issue17618] base85 encoding

2013-04-17 Thread Martin Morrison
Martin Morrison added the comment: > Using a trick with struct.unpack() has a very unpleasant side effect. > It might speed up encoding a little, but it creates a Struct object > whose size is many times larger than the size of the processed > data. Worse, this object is cached and continues to c

[issue17618] base85 encoding

2013-04-17 Thread Antoine Pitrou
Antoine Pitrou added the comment: On Wednesday 17 April 2013 at 18:14, Serhiy Storchaka wrote: > I think we can provide a universal solution compatible (with some > pre/postprocessing) with both variants. Enclose encoded data in <~ and > ~> or not, and at which column wrap an encoded data.

[issue17618] base85 encoding

2013-04-17 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > btoa/atob seems extinct. At least half of the ascii85 encoders in the wild implement this variant. I think we can provide a universal solution compatible (with some pre/postprocessing) with both variants. Enclose encoded data in <~ and ~> or not, and at which col
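
The framing and wrapping choices discussed here correspond to keyword arguments of base64.a85encode as available in Python 3.4 and later. A small sketch, using an arbitrary sample payload (an assumption, not data from the thread):

```python
import base64

data = bytes(range(1, 61))  # arbitrary 60-byte sample payload (assumption)

bare = base64.a85encode(data)                 # no framing, no wrapping (the defaults)
framed = base64.a85encode(data, adobe=True)   # enclosed in <~ and ~>
wrapped = base64.a85encode(data, wrapcol=20)  # a newline at most every 20 characters

print(framed[:2], framed[-2:])
print(max(len(line) for line in wrapped.split(b"\n")))

# The decoder has to be told about the framing; whitespace is ignored by default.
decoded_framed = base64.a85decode(framed, adobe=True)
decoded_wrapped = base64.a85decode(wrapped)
```

Keeping framing and wrapping as opt-in flags leaves the default output as the bare encoded data.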

[issue17618] base85 encoding

2013-04-17 Thread Antoine Pitrou
Antoine Pitrou added the comment: > After searching a lot of other implementations of this encoding I > conclude that there are at least three different variants. Yes. The current proposal is to include both the Adobe version ("ascii85") and the Mercurial/Git version ("base85"). btoa/atob seems

[issue17618] base85 encoding

2013-04-17 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: After searching a lot of other implementations of this encoding I conclude that there are at least three different variants. 1. The original btoa/atob encoding. 4 zeros are packaged as 'z', last incomplete 4 bytes are padded by zeros, an output is wrapped in
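
As a rough illustration of how such variants differ, the sketch below uses the functions that ended up in Python 3.4's base64 module. It only shows the zero-group and space-group shortcuts; it does not reproduce the original btoa/atob tool.

```python
import base64

zeros = b"\x00" * 4
spaces = b" " * 4

# Ascii85 packs a group of four zero bytes into a single 'z'.
a85_zeros = base64.a85encode(zeros)

# The btoa-style 'y' shortcut for four spaces is opt-in via foldspaces.
a85_spaces = base64.a85encode(spaces, foldspaces=True)

# The Mercurial/git-style base85 has no such shortcut: always five characters.
b85_zeros = base64.b85encode(zeros)

print(a85_zeros)   # b'z'
print(a85_spaces)  # b'y'
print(b85_zeros)   # b'00000'
```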

[issue17618] base85 encoding

2013-04-17 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: After a more careful look at the b85encode code, I would say that its implementation is not optimal. For the sake of simplicity the entire volume of data is copied several times. This can affect the processing of a large volume of data. On the other hand, this dumb co

[issue17618] base85 encoding

2013-04-14 Thread Antoine Pitrou
Antoine Pitrou added the comment: Hi and thanks for the patch! > I named the Mercurial base85 implementation functions with the "b85" > prefix. For the Ascii85 ones, I used "a85". I considered overloading > the same functions with a keyword argument to select which encoding, > but rejected that.

[issue17618] base85 encoding

2013-04-14 Thread Martin Morrison
Martin Morrison added the comment: I've updated the Ascii85 algorithms to remove the quadratic complexity, and use a single struct.pack/unpack. They should now be much quicker for large input strings. It's difficult to factor out commonality with b85* because the encodings and rules differ. T
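
A minimal sketch of the approach described here: one struct.unpack call for all 32-bit words, with the output collected in a list and joined once so that accumulation stays linear. The helper name a85_sketch is hypothetical, it is not the code from the patch, and it deliberately omits the 'z' shortcut and any framing.

```python
import base64
import struct

def a85_sketch(data: bytes) -> bytes:
    """Hypothetical helper: plain Ascii85, no 'z' folding, no framing."""
    padding = (-len(data)) % 4
    padded = data + b"\0" * padding
    # A single unpack for every big-endian 32-bit word in the input.
    words = struct.unpack("!%dI" % (len(padded) // 4), padded)
    chunks = []
    for word in words:
        digits = bytearray(5)
        # Five base-85 digits, most significant first, offset by 33 ('!').
        for i in range(4, -1, -1):
            word, rem = divmod(word, 85)
            digits[i] = 33 + rem
        chunks.append(bytes(digits))
    encoded = b"".join(chunks)  # one join instead of repeated concatenation
    # Drop as many trailing characters as padding bytes were added.
    return encoded[:len(encoded) - padding] if padding else encoded

sample = b"hello world"  # contains no all-zero group, so 'z' folding is irrelevant
print(a85_sketch(sample) == base64.a85encode(sample))
```

Note that a format string sized to the input creates a correspondingly large Struct object, which is the concern raised elsewhere in this thread under issue14596.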

[issue17618] base85 encoding

2013-04-14 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I want the two algorithms to be as similar as possible. It might be worth extracting and reusing common code. Mercurial's code looks far more optimal (for example, a85encode has quadratic complexity in accumulating the result).

[issue17618] base85 encoding

2013-04-13 Thread Martin Morrison
Martin Morrison added the comment: Updated patch that includes both my original implementation of Ascii85, as well as the Mercurial implementation of base85. A few notes/questions: - I named the Mercurial base85 implementation functions with the "b85" prefix. For the Ascii85 ones, I used "a85"

[issue17618] base85 encoding

2013-04-13 Thread Martin Morrison
Martin Morrison added the comment: Ok, great. I'll update the patch to include both encoding schemes.

[issue17618] base85 encoding

2013-04-13 Thread Antoine Pitrou
Antoine Pitrou added the comment: For the record, Mads and Brendan have submitted a contributor's agreement, so we can now take what we want from Mercurial's base85.py (which you can find at http://selenic.com/hg/file/4e1ae55e63ef/mercurial/pure/base85.py).

[issue17618] base85 encoding

2013-04-07 Thread Antoine Pitrou
Antoine Pitrou added the comment: > So I'm not sure what you want to do. I would suggest a standard > Ascii85 encoder is definitely useful, and provides feature parity with > Ruby. If we want the standard library to be able to read/write > Mercurial/Git base85 encoded files, then I guess that can

[issue17618] base85 encoding

2013-04-07 Thread Martin Morrison
Martin Morrison added the comment: Ok, I'm not even sure that Mercurial follows RFC1924! That RFC is specifically for encoding IPv6 addresses, and mandates that the calculations be performed on a 128bit integer. The Mercurial implementation seems to follow the Ascii85 policy of taking each 4

[issue17618] base85 encoding

2013-04-07 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: -- nosy: +serhiy.storchaka

[issue17618] base85 encoding

2013-04-07 Thread Antoine Pitrou
Antoine Pitrou added the comment: > The Ascii85 version is what is used with PDF, and the default output > format for the equivalent Ruby library, so seems useful to have. So I > guess what might be desirable is to have both in the codebase? Yes, it could be useful to have both.

[issue17618] base85 encoding

2013-04-07 Thread Martin Morrison
Martin Morrison added the comment: Ok, having now looked at mercurial's implementation... it looks like they implemented the RFC1924 version, whereas my implementation is the Ascii85 version (and I verified it against, amongst others: http://www.tools4noobs.com/online_tools/ascii85_encode/ ).

[issue17618] base85 encoding

2013-04-07 Thread Antoine Pitrou
Antoine Pitrou added the comment: > Forgot to mention, I included an optional keyword argument to support > the 'btoa' shortcut for sequences of space characters as described in > the Wikipedia article. However, I'm unsure if any other implementation > supports this, so might not be worth includi

[issue17618] base85 encoding

2013-04-07 Thread Martin Morrison
Martin Morrison added the comment: (sorry for spam) Forgot to mention, I included an optional keyword argument to support the 'btoa' shortcut for sequences of space characters as described in the Wikipedia article. However, I'm unsure if any other implementation supports this, so might not be

[issue17618] base85 encoding

2013-04-07 Thread Martin Morrison
Martin Morrison added the comment: I wrote an implementation from scratch (based on the wikipedia article; I've not looked at any existing implementations) in pure Python in the attached diff. It includes tests. Feel free to use it as the pure Python implementation if desired, though I won't

[issue17618] base85 encoding

2013-04-07 Thread Antoine Pitrou
Antoine Pitrou added the comment: The Mercurial authors have given their informal agreement for a relicensing. OTOH, they must still sign a contributor's agreement. The relicensing would allow us to use their pure Python implementation (in mercurial/pure/base85.py). OTOH, the C implementation

[issue17618] base85 encoding

2013-04-07 Thread R. David Murray
R. David Murray added the comment: Antoine is talking to Mercurial about relicensing, and I believe at this point it is just a matter of working out the mechanical details (that is, he has an agreement-in-principle from them). -- nosy: +r.david.murray

[issue17618] base85 encoding

2013-04-07 Thread Sijin Joseph
Sijin Joseph added the comment: Is anyone working on this? I'd like to include this in a CPython sprint @MIT on 4/13. -- nosy: +sijinjoseph

[issue17618] base85 encoding

2013-04-02 Thread Jesús Cea Avión
Changes by Jesús Cea Avión: -- keywords: +easy

[issue17618] base85 encoding

2013-04-02 Thread Jesús Cea Avión
Changes by Jesús Cea Avión: -- nosy: +jcea

[issue17618] base85 encoding

2013-04-02 Thread Florent Xicluna
Changes by Florent Xicluna: -- nosy: +flox

[issue17618] base85 encoding

2013-04-02 Thread Antoine Pitrou
New submission from Antoine Pitrou: Base85 encoding (see e.g. http://en.wikipedia.org/wiki/Ascii85 ) allows a tighter encoding than Base64: it has a 5/4 expansion ratio, rather than 4/3. It is used in Mercurial, git, and there's another variant that's used by Adobe in the PDF format. It would
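
The expansion ratios quoted in the submission can be checked with the base64 module as it exists in Python 3.4 and later. The input length of 240 bytes is an assumption, chosen as a multiple of both 3 and 4 so that neither encoding needs padding.

```python
import base64

data = bytes(range(240))  # 240 bytes: a multiple of both 3 and 4 (assumption)

b64 = base64.b64encode(data)  # 4 output characters per 3 input bytes
b85 = base64.b85encode(data)  # 5 output characters per 4 input bytes

print(len(b64))  # 320 == 240 * 4 / 3
print(len(b85))  # 300 == 240 * 5 / 4
```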