Forwarding to common-dev, hdfs-dev, mapreduce-dev too.

Thanks
- Tsuyoshi

On Mon, Mar 27, 2017 at 21:16 Tsuyoshi Ozawa <oz...@apache.org> wrote:

> Dear Hadoop developers,
>
> Now that the shaded client introduced by HADOOP-11804 has been merged,
> we can update dependencies on trunk more easily while minimizing the
> impact on backward compatibility. (Thanks Sean and Sanjin for taking
> the issue!)
>
> Given that, is it time to update protobuf to its latest version on
> trunk? Could you share your opinions here?
>
> There have been several discussions going on in parallel so far. Hence,
> I would like to summarize the developers' current opinions, together
> with my understanding, here.
>
> Stack mentioned on HADOOP-13363:
> * Would this be a problem? Old clients can talk to the new servers
> because they are wire compatible. Is anyone consuming Hadoop protos
> directly other than Hadoop itself? Are Hadoop proto files considered
> InterfaceAudience.Private or InterfaceAudience.Public? If the former,
> I could work on a patch for 3.0.0 (it'd be big but boring). Does
> Hadoop have protobuf in its API anywhere? (I can take a look, but I'm
> being lazy and asking here first.)
>
> gohadoop[1] uses proto files directly, treating the proto files as a
> stable interface.
> [1] https://github.com/hortonworks/gohadoop/search?utf8=%E2%9C%93&q=*proto&type=
>
> Fortunately, in fact, almost no additional work is needed to compile
> the Hadoop code base. The only change I made was to make
> getOndiskTrunkSize's argument take protobuf v3's message class[2]; a
> sketch follows the link below. Please point it out if I'm missing
> something.
>
> [2] https://issues.apache.org/jira/secure/attachment/12860647/HADOOP-13363.004.patch
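>
> For illustration, here is a minimal sketch of the shape of that change
> (the method body is illustrative, not the actual Hadoop code): in
> protobuf-java 3.x, generated classes extend GeneratedMessageV3 rather
> than GeneratedMessage, so the helper's parameter type has to follow.
>
>   import com.google.protobuf.CodedOutputStream;
>   import com.google.protobuf.GeneratedMessageV3;
>
>   class OndiskSizeSketch {
>     // Before: the parameter was com.google.protobuf.GeneratedMessage
>     // (the protobuf 2.x base class). After: GeneratedMessageV3.
>     static int getOndiskTrunkSize(GeneratedMessageV3 s) {
>       // Length-delimited size: varint length prefix + message body.
>       return CodedOutputStream.computeUInt32SizeNoTag(s.getSerializedSize())
>           + s.getSerializedSize();
>     }
>   }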
>
> There are some concerns about updating protobuf, raised on HDFS-11010:
> * I'm really hesitant to bump PB considering the pain it brought last
> time. (by Andrew)
>
> This is because the pain last time was about *binary* compatibility,
> not wire compatibility. If I understand correctly, the problem was
> caused by protobuf v2.4.0 and v2.5.0 classes being mixed between
> Hadoop and HBase. (I learned this from Steve's comment on
> HADOOP-13363[3].)
> As I mentioned at the beginning, protobuf is now shaded on trunk, so
> we no longer need to care about binary (source-code-level)
> compatibility; a small illustration follows the link below.
>
> [3] https://issues.apache.org/jira/browse/HADOOP-13363?focusedCommentId=15372724&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15372724
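>
> To illustrate why shading removes that clash, a small sketch (the
> relocation prefix "org.apache.hadoop.shaded" is my assumption here;
> the real prefix is whatever Hadoop's shade plugin configuration
> specifies):
>
>   public class NoClassClash {
>     public static void main(String[] args) throws Exception {
>       // The application's own protobuf (e.g. 2.5.0 pulled in by HBase):
>       Class<?> appCopy = Class.forName("com.google.protobuf.Message");
>       // Hadoop's relocated copy lives under a different package, so the
>       // two can coexist on one classpath without binary conflicts:
>       Class<?> hadoopCopy = Class.forName(
>           "org.apache.hadoop.shaded.com.google.protobuf.Message");
>       System.out.println(appCopy != hadoopCopy); // true
>     }
>   }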
>
> * Have we checked if it's wire compatible with our current version of
> PB? (by Andrew)
>
> As far as I know, the wire format is compatible between protobuf v2
> and protobuf v3, and the Google team has been testing this. Of course,
> we can also validate it ourselves using the following compatibility
> test suite:
>
> https://chromium.googlesource.com/external/github.com/google/protobuf/+/master/java/compatibility_tests/README.md
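>
> As a smaller-scale illustration of the same idea, here is a sketch
> using protobuf-java's DynamicMessage: it serializes a field with a
> proto2 descriptor and parses the identical bytes with a proto3
> descriptor. (The message and field names are made up for the example.)
>
>   import com.google.protobuf.DescriptorProtos.DescriptorProto;
>   import com.google.protobuf.DescriptorProtos.FieldDescriptorProto;
>   import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
>   import com.google.protobuf.Descriptors.Descriptor;
>   import com.google.protobuf.Descriptors.FileDescriptor;
>   import com.google.protobuf.DynamicMessage;
>
>   public class WireCompatProbe {
>     // Build a one-field message type ("Probe") under the given syntax.
>     static Descriptor probeType(String syntax) throws Exception {
>       FileDescriptorProto file = FileDescriptorProto.newBuilder()
>           .setName(syntax + "_probe.proto")
>           .setSyntax(syntax)
>           .addMessageType(DescriptorProto.newBuilder()
>               .setName("Probe")
>               .addField(FieldDescriptorProto.newBuilder()
>                   .setName("id").setNumber(1)
>                   .setLabel(FieldDescriptorProto.Label.LABEL_OPTIONAL)
>                   .setType(FieldDescriptorProto.Type.TYPE_INT64)))
>           .build();
>       return FileDescriptor.buildFrom(file, new FileDescriptor[0])
>           .getMessageTypes().get(0);
>     }
>
>     public static void main(String[] args) throws Exception {
>       Descriptor p2 = probeType("proto2");
>       Descriptor p3 = probeType("proto3");
>       // Encode with the proto2 descriptor...
>       byte[] wire = DynamicMessage.newBuilder(p2)
>           .setField(p2.findFieldByName("id"), 42L)
>           .build().toByteArray();
>       // ...and decode the identical bytes with the proto3 descriptor.
>       DynamicMessage decoded = DynamicMessage.parseFrom(p3, wire);
>       System.out.println("id = " + decoded.getField(p3.findFieldByName("id")));
>     }
>   }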
>
> * Let me ask the question in a different way: what about PB 3 is
> concerning to you? (by Anu)
>
> * Some of its incompatibilities with 2.x, such as dropping unknown
> fields from records. Any component that proxies records must have an
> updated version of the schema, or it will silently drop data and
> convert unknown values to defaults. Unknown enum value handling has
> changed. There's no mention of the convenient "Embedded messages are
> compatible with bytes if the bytes contain an encoded version of the
> message" semantics in proto3. (by Chris)
>
> This is what we need to discuss.
> Quoting from Google's developer documentation,
> https://developers.google.com/protocol-buffers/docs/proto3#unknowns
>
> > For most Google protocol buffers implementations, unknown fields are not
> accessible in proto3 via the corresponding proto runtimes, and are dropped
> and forgotten at deserialization time. This is different behaviour to
> proto2, where unknown fields are always preserved and serialized along with
> the message.
>
> Is this incompatibility acceptable for us, or not? If we need to check
> some test cases before updating protobuf, it would be good to clarify
> here which test cases we need, and to run them now. The sketch below
> shows one way to observe the unknown-field behaviour locally.
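>
> This uses protobuf-java 3.x; com.google.protobuf.Empty is a proto3
> message with no declared fields, so every incoming field is "unknown"
> to it.
>
>   import com.google.protobuf.CodedOutputStream;
>   import com.google.protobuf.Empty;
>   import java.io.ByteArrayOutputStream;
>
>   public class UnknownFieldProbe {
>     public static void main(String[] args) throws Exception {
>       // Hand-encode one varint field, number 1000, that no schema declares.
>       ByteArrayOutputStream bos = new ByteArrayOutputStream();
>       CodedOutputStream out = CodedOutputStream.newInstance(bos);
>       out.writeInt64(1000, 12345L);
>       out.flush();
>       byte[] wire = bos.toByteArray();
>
>       // Parse and re-serialize through the proto3 runtime. With the
>       // proto3 semantics quoted above, the unknown field is dropped and
>       // the round trip comes back empty; a proto2 runtime preserves it.
>       byte[] roundTripped = Empty.parseFrom(wire).toByteArray();
>       System.out.println("in=" + wire.length + " bytes, out="
>           + roundTripped.length + " bytes");
>     }
>   }
>
> This is exactly the proxy scenario Chris describes: a hop through a
> process whose schema doesn't know the field silently loses the data.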
>
> Best regards,
> - Tsuyoshi
>
