On Sun, Mar 9, 2014 at 5:19 AM, liao liao <[email protected]> wrote:
> "new design/implementation in the next release" to significantly improve > python sdk's performance, when? > Probably in Q3. > the c++ binding has been experimental for years. > > > On Saturday, March 1, 2014 2:49:06 AM UTC+8, Feng Xiao wrote: > >> For patch #1, you can submit it to the issue tracker. We don't have time >> to look at it immediately but will get to it in the next release. >> >> For patch #2, we don't accept patches to the experimental C++ binding >> support any more because it will be replaced by a new design/implementation >> in the next release. Sorry for that but AFAIK the new design/implementation >> is significantly better so it's worthwhile to try if you are currently >> using the old one (which has hard-to-address memory ownership problems). >> >> On Wed, Feb 26, 2014 at 4:11 PM, <[email protected]> wrote: >> >>> Hey guys, >>> >>> >>> I wrote two different patches which optimize python proto performance. >>> Both patches are running in production at Dropbox. What is the best way >>> to upstream these changes? >>> >>> >>> Patrick >>> >>> ============================ >>> >>> Patch #1. Python message patch (https://www.dropbox.com/s/ >>> q0y44ypti0by779/protobuf-2.5.0.patch1): >>> >>> Changes: >>> - precompute various varint tables >>> - don't use proto's ByteSize function for serialization >>> - simplified some code (got rid of the listener) >>> - got rid of StringIO >>> >>> Internal benchmark: >>> - random repeated int32s - ~18% faster >>> - random repeated int64s - ~20% faster >>> - random repeated strings - 27% faster >>> - random repeated bytes - 27% faster >>> - repeated message with each with a single random string - ~20% faster >>> >>> NOTE: >>> - predefined_varints.py is generated by generate_predefined_varints.py >>> >>> ============================ >>> >>> Patch #2. C++ experimental binding patch (https://www.dropbox.com/s/ >>> 5nr0v76nfraaxif/protobuf-2.5.0.patch2): >>> >>> Changes: >>> - fixed memory ownership / dangling pointer (see NOTE #1 for known >>> issues) >>> 1. inc ref count parent message when accessing a field, >>> 2. a cleared field's is freed only when the parent is deleted >>> - fixed MakeDescriptor to correctly generating simple proto (see NOTE #2) >>> - fixed MergeFrom to not crash on check failure due to self merge >>> - fixed both repeated and non-repeated field clearing >>> - modified varint deserialization to always return PyLong (to match >>> existing python implementation) >>> - always mark message as mutate when extending a repeated field (even >>> when extending by an empty list) >>> - deleted/updated bad tests from the protobuf test suite >>> >>> Internal benchmark (relative to the first patch): >>> - 30x faster for repeated varints >>> - 8x faster for repeated string >>> - 6x faster for repeated bytes >>> - 26x speed up for repeated nested msg >>> >>> NOTE: >>> 1. In the current implementation, a new python object is created each >>> time a field is accessed. To make this 100% correct, we should return the >>> same python object whenever the same field is accessed; however, I don't >>> think the accounting overhead is worth it. Implications due to the >>> current implementation: >>> - repeatedly clearing / mutating the same message can cause memory >>> blow up >>> - There's a subtle bug with clearing / mutating default message >>> fields: >>> >>> This is correct. Holding a reference to a MUTATED field X, then >>> clearing the parent, then mutate X. e.g., >>> child = parent.optional_nested_msg >>> child.field = 123 # this mutates the field >>> parent.Clear() >>> child.field = 321 >>> assert not parent.HasField('child') # passes >>> >>> This is incorrect. Holding a reference to a UNMUTATED field X, then >>> clearing the parent, then mutate X. >>> child = parent.optional_nested_msg >>> parent.Clear() >>> child.field = 321 # this inadvertently causes parent to generate >>> a different empty msg for optional_nested_msg. >>> assert not parent.HasField('optional_nested_msg') # fail >>> >>> Luckily, these access patterns are extremely rare (at least at >>> dropbox). >>> >>> 2. I wrote a fully functional MakeDescriptor for c++ protos when I was >>> at google. Talk to the F1 team (specifically Bart Samwel / Chad Whipkey) >>> if you're interested in upstreaming that to the opensource community. >>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "Protocol Buffers" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to [email protected]. >>> To post to this group, send email to [email protected]. >>> >>> Visit this group at http://groups.google.com/group/protobuf. >>> For more options, visit https://groups.google.com/groups/opt_out. >>> >> >> -- > You received this message because you are subscribed to the Google Groups > "Protocol Buffers" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > Visit this group at http://groups.google.com/group/protobuf. > For more options, visit https://groups.google.com/d/optout. > -- You received this message because you are subscribed to the Google Groups "Protocol Buffers" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/protobuf. For more options, visit https://groups.google.com/d/optout.
