On Wed, Oct 28, 2015 at 11:30 PM, Marc Aymerich <glicer...@gmail.com> wrote:
> Hi,
> I'm writing an application that saves historical state in a log file.
> I want to be really efficient in terms of used bytes.
>
> What I'm doing now is:
>
> 1) First use zlib.compress
> 2) And then remove all new lines using binascii.b2a_base64, so I have
> a log entry per line.
>
> but b2a_base64 is far from ideal: adds lots of bytes to the compressed
> log entry. So, I wonder if perhaps there is a better way to remove new
> lines from the zlib output? or maybe a different approach?
>
> Anyone?
[....]

Wow, lots of interesting replies here. Allow me to clarify my situation
and answer some of the questions.

I'm writing a toy project for my master's thesis, which is never going
into production. What I'm doing is a decentralized file system for
configuration management (without a centralized authority). This means:

1) Each node in the cluster needs to keep track of *all* the changes that
ever occurred. So far, each node stores each change as an individual line
in a file (the "historical state log" I was referring to; the concept is
very similar to the bitcoin blockchain).

2) The main communication channel is driven by a UDP gossip protocol.
From a performance perspective, it makes a huge difference if the whole
log entry fits into the UDP payload (512B); otherwise the log entry has
to be transferred by other means. Because config files are mostly text,
almost every single one of them can fit into a UDP packet, if properly
compressed.

After reading your replies I'm concluding that:

1) I should use the most space-efficient encoding *only* for transferring
the log entry: just lzma compress it.

2) I should use the most readable one for storing the block in the log file.
Leave metadata as text and compress+base64 the "actual file content" so it
fits in a space-free ASCII block, something like:

# $ cat log
# <parent_hash> <timestamp> <action> <path> <lzma+base64 content> <fingerprint> <signature>
a5438566b83b4383899500c6b70dcac1 1446054664 WRITE /.keys TUY4Q0FRRUVHQHNkl6MTNtZz09Cg== 2d:ce:6d:c5:95:54:cb:d2:fe:ba:68:ed:1d:8e:74:0f iPDxBYuUEjlZl99/xGCNzpbuDezJJfolr+eNLNrXEYAgG/0yme3bu9DCkPO9Gq7+
cb4f67a712964699a5c2d49a42e48946 1446054664 WRITE /.cluster MTcyLjE3LjLjEK 2d:ce:6d:c5:95:54:cb:d2:fe:ba:68:ed:1d:8e:74:0f /VKMeVG95MT9VdObRyhidzxIgiTef+7nl3flgQpqVAgRfhqrBGRB4XTgJFSelvCo
5041fba6b6534dfe92bf99ed5ead8fa6 1446055543 MKDIR /etc 2d:ce:6d:c5:95:54:cb:d2:fe:ba:68:ed:1d:8e:74:0f +CMeVp33FxXFSfczbmkoW4tnalu5ojuC1WprMkc7Kxp/WHlMsx9Os3Zal0Bi/uD8
80c47cd5a73e4881b7284eed465ab10a 1446055843 WRITE /etc/node.conf aG9sYQo= 2d:ce:6d:c5:95:54:cb:d2:fe:ba:68:ed:1d:8e:74:0f oQVF7UCAFRCC7cC0Ln8V16f8mnON465sdXoIEIGCKBUOWOBE5daFmJTu0thAkXVf

--
Marc
--
https://mail.python.org/mailman/listinfo/python-list
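
The two encodings concluded above (raw lzma bytes for the UDP transfer, lzma+base64 inside a text log line) could be sketched roughly as follows. This is only an illustration, assuming Python 3 and the standard lzma and base64 modules. All function names, the UDP_PAYLOAD_LIMIT constant, and the rule that entries without content (such as MKDIR) simply omit the content field are hypothetical. It also assumes paths contain no whitespace.

```python
import base64
import lzma

# Payload size mentioned in the message; the constant name is hypothetical.
UDP_PAYLOAD_LIMIT = 512


def encode_for_wire(content: bytes) -> bytes:
    """Compress content for transfer; the result is raw binary, not text."""
    return lzma.compress(content)


def fits_in_udp(payload: bytes, limit: int = UDP_PAYLOAD_LIMIT) -> bool:
    """Return True if the payload fits within the assumed UDP payload size."""
    return len(payload) <= limit


def encode_for_log(content: bytes) -> str:
    """lzma-compress, then base64-encode into one whitespace-free ASCII token."""
    return base64.b64encode(lzma.compress(content)).decode("ascii")


def decode_from_log(token: str) -> bytes:
    """Inverse of encode_for_log."""
    return lzma.decompress(base64.b64decode(token))


def format_log_line(parent_hash, timestamp, action, path,
                    content, fingerprint, signature):
    """Build one space-separated log line; content=None omits that field."""
    fields = [parent_hash, str(timestamp), action, path]
    if content is not None:
        fields.append(encode_for_log(content))
    fields += [fingerprint, signature]
    return " ".join(fields)


if __name__ == "__main__":
    data = b"hola\n"
    print(fits_in_udp(encode_for_wire(data)))
    # Placeholder values, not real hashes or signatures.
    print(format_log_line("parent", 1446055843, "WRITE", "/etc/node.conf",
                          data, "fp", "sig"))
```

Unlike binascii.b2a_base64, base64.b64encode does not append a newline, so the token can be dropped straight into a one-entry-per-line log. Base64 grows the data by about a third, which is why the sketch sends the raw compressed bytes over the wire and uses base64 only for the log file. The container that lzma.compress uses by default adds some fixed overhead, so for very small files it may be worth measuring it against zlib.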