I've been investigating this a bit. I'm hoping Chris can chime in, since
he's identified wire compatibility issues. Replying inline to Chris' comment
<https://issues.apache.org/jira/browse/HDFS-11010?focusedCommentId=15576383&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15576383>
on HDFS-11010:

> There's no mention of the convenient "Embedded messages are compatible with
> bytes if the bytes contain an encoded version of the message" semantics in
> proto3.


I checked the proto3 guide, and I think this is supported:
https://developers.google.com/protocol-buffers/docs/proto3#updating
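
To make the rule concrete, here is a rough sketch of why the two are
interchangeable on the wire: both a bytes field and an embedded message use
the length-delimited wire type (2), so the encoding is the same either way.
The Inner/OuterA/OuterB messages and field numbers are made up for
illustration, and the encoder is hand-rolled rather than the real PB library.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """Field key = (field_number << 3) | wire_type, written as a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_len_delimited(field_number: int, payload: bytes) -> bytes:
    """Wire type 2: key, payload length, then the payload itself."""
    return encode_key(field_number, 2) + encode_varint(len(payload)) + payload


# Hypothetical schema: message Inner { int32 id = 1; }
inner = encode_key(1, 0) + encode_varint(150)

# Hypothetical: message OuterA { Inner token = 2; }  (embedded message)
as_embedded = encode_len_delimited(2, inner)

# Hypothetical: message OuterB { bytes token = 2; }  (opaque bytes holding
# an already-encoded Inner)
as_bytes = encode_len_delimited(2, inner)

print(as_embedded.hex())        # 1203089601
print(as_embedded == as_bytes)  # True
```

Nothing in the encoded bytes says whether field 2 was declared as a message
or as bytes; only the reader's schema decides how the payload is interpreted.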

> If unknown fields are dropped, then applications proxying tokens and other
> data between servers will effectively corrupt those messages, unless we
> make everything opaque bytes, which- absent the convenient, prenominate
> semantics managing the conversion- obviate the compatibility machinery that
> is the whole point of PB. Google is removing the features that justified
> choosing PB over its alternatives. Since we can't require that our
> applications compile (or link) against our updated schema, this creates a
> problem that PB was supposed to solve.


This is scary. It potentially affects services outside of the Hadoop
codebase, which makes the impact difficult to assess.
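
As a rough illustration of the proxy scenario Chris describes, the sketch
below simulates an intermediary built against an older schema that
re-serializes a message it parsed. It is a toy wire-format parser, not the
real PB library, and the token layout (field 1 known to the proxy, field 2
added later by a newer server) is an assumption made up for the example.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos
        shift += 7


def split_fields(buf: bytes):
    """Split a message into (field_number, raw_field_bytes) pairs.

    Only wire types 0 (varint) and 2 (length-delimited) are handled,
    which is all this sketch needs.
    """
    fields = []
    pos = 0
    while pos < len(buf):
        start = pos
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            _, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((field_number, buf[start:pos]))
    return fields


def proxy_reserialize(buf: bytes, known_fields: set, keep_unknown: bool) -> bytes:
    """Parse and re-emit a message the way an intermediary would."""
    out = bytearray()
    for number, raw in split_fields(buf):
        if number in known_fields or keep_unknown:
            out += raw
    return bytes(out)


# Token from a newer server: field 1 = 150 (varint), field 2 = b"hi" (new).
token = bytes.fromhex("089601") + bytes.fromhex("12026869")

preserved = proxy_reserialize(token, {1}, keep_unknown=True)
dropped = proxy_reserialize(token, {1}, keep_unknown=False)

print(preserved == token)  # True
print(dropped.hex())       # 089601
```

With unknown fields kept, the token round-trips byte for byte. With them
dropped, field 2 silently disappears before the token reaches its
destination, which is the corruption case in question.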

Paraphrasing, the issues with PB2.5 are:

   1. poor support for non-x86, non-Linux platforms
   2. not as readily available, so it's harder to set up a dev environment
   3. missing zero-copy support, which helped performance in HBase

#1 and #2 could be addressed by rehosting PB (with cross-OS compilation
patches) elsewhere.
#3 I don't think we'd benefit from, since we don't pass around big PB byte
arrays (at least in HDFS).

So the way I see it, upgrading to PB3 has risk from the behavior change wrt
unknown fields, while there are other ways of addressing the stated issues
with PB2.5.

Best,
Andrew
