Forwarding to common-dev, hdfs-dev, mapreduce-dev too. Thanks - Tsuyoshi
On Mon, Mar 27, 2017 at 21:16 Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Dear Hadoop developers,
>
> Now that the shaded client introduced by HADOOP-11804 has been
> merged, we can more easily update dependencies on trunk while
> minimizing the impact on backward compatibility. (Thanks Sean and
> Sangjin for taking the issue!)
>
> So, is it time to update protobuf to the latest version on trunk?
> Could you share your opinions here?
>
> There have been several discussions going on in parallel, so I would
> like to summarize the developers' current opinions here, along with
> my understanding.
>
> Stack mentioned on HADOOP-13363:
> * Would this be a problem? Old clients can talk to the new servers
> because the protocol is wire compatible. Is anyone other than Hadoop
> consuming the Hadoop protos directly? Are the Hadoop proto files
> considered InterfaceAudience.Private or InterfaceAudience.Public? If
> the former, I could work on a patch for 3.0.0 (it'd be big but
> boring). Does Hadoop have Protobuf in its API anywhere? (I can take a
> look, but I'm being lazy and asking here first.)
>
> gohadoop [1] uses the proto files directly, treating them as a
> stable interface.
>
> [1] https://github.com/hortonworks/gohadoop/search?utf8=%E2%9C%93&q=*proto&type=
>
> Fortunately, no additional work is needed to compile the Hadoop code
> base against protobuf v3. The only change I made was to the argument
> of getOndiskTrunkSize so that it takes a protobuf v3 object [2].
> Please point it out if I am missing something.
>
> [2] https://issues.apache.org/jira/secure/attachment/12860647/HADOOP-13363.004.patch
>
> There are also some concerns about updating protobuf, raised on
> HDFS-11010:
>
> * I'm really hesitant to bump PB considering the pain it brought
> last time. (by Andrew)
>
> That pain was about *binary* compatibility, not wire compatibility.
> If I understand correctly, the problem last time was that protobuf
> v2.4.0 and v2.5.0 classes were mixed between Hadoop and HBase. (I
> learned this from Steve's comment on HADOOP-13363 [3].) As I
> mentioned above, protobuf is now shaded on trunk, so we no longer
> need to worry about binary (class-level) compatibility.
>
> [3] https://issues.apache.org/jira/browse/HADOOP-13363?focusedCommentId=15372724&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15372724
>
> * Have we checked if it's wire compatible with our current version
> of PB? (by Andrew)
>
> As far as I know, protobuf v2 and protobuf v3 are wire compatible,
> and the Google team has been testing this. Of course, we can also
> validate it ourselves with the following test suite:
>
> https://chromium.googlesource.com/external/github.com/google/protobuf/+/master/java/compatibility_tests/README.md
>
> * Let me ask the question in a different way: what about PB 3 is
> concerning to you? (by Anu)
>
> * Some of its incompatibilities with 2.x, such as dropping unknown
> fields from records. Any component that proxies records must have an
> updated version of the schema, or it will silently drop data and
> convert unknown values to defaults. Unknown enum value handling has
> changed. There's no mention of the convenient "Embedded messages are
> compatible with bytes if the bytes contain an encoded version of the
> message" semantics in proto3. (by Chris)
>
> This is what we need to discuss.
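> To make the proxy concern concrete, here is a minimal sketch of the
> failure mode. RecordNew and RecordOld are hypothetical generated
> classes, used only for illustration: RecordNew comes from a schema
> with fields id = 1 and extra = 2, RecordOld from an older copy of
> the same schema that only has id = 1.
>
>     import com.google.protobuf.InvalidProtocolBufferException;
>
>     public class UnknownFieldDemo {
>       public static void main(String[] args)
>           throws InvalidProtocolBufferException {
>         // A writer using the new schema produces a record.
>         byte[] wire = RecordNew.newBuilder()
>             .setId(42)
>             .setExtra("important")
>             .build()
>             .toByteArray();
>
>         // A proxy that still carries the old schema parses the
>         // record and re-serializes it.
>         RecordOld proxied = RecordOld.parseFrom(wire);
>         byte[] reserialized = proxied.toByteArray();
>
>         // The receiver parses with the new schema again.
>         RecordNew received = RecordNew.parseFrom(reserialized);
>
>         // With a proto2 schema and runtime, getExtra() returns
>         // "important": field 2 survived the proxy inside
>         // RecordOld's UnknownFieldSet.
>         // With a proto3 schema and runtime (current behaviour),
>         // getExtra() returns "": the unknown field was silently
>         // dropped at deserialization time.
>         System.out.println(received.getExtra());
>       }
>     }
>
> Nothing fails loudly in the proto3 case, which is exactly why this
> needs a decision up front rather than a test failure to catch it.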
> Quoting the documentation from Google's developer manual,
> https://developers.google.com/protocol-buffers/docs/proto3#unknowns:
>
> > For most Google protocol buffers implementations, unknown fields
> > are not accessible in proto3 via the corresponding proto runtimes,
> > and are dropped and forgotten at deserialization time. This is
> > different behaviour to proto2, where unknown fields are always
> > preserved and serialized along with the message.
>
> Is this incompatibility acceptable for us or not? If we need to
> check some test cases before updating protobuf, it would be good to
> clarify here which test cases we need, and to test them now; one
> candidate is sketched below, after this message.
>
> Best regards,
> - Tsuyoshi
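As a starting point for such test cases, one candidate is a
cross-version round trip along the lines of the linked
compatibility_tests suite: serialize with a class generated by protoc
2.5.0, parse with a class generated by protoc 3.x from the same
.proto, and compare field by field. A rough sketch follows;
RpcRequestV2, RpcRequestV3, request.proto, and the field names are
all hypothetical, and the two halves would have to be compiled
against protobuf-java 2.5.0 and 3.x respectively (e.g. in two
separate test modules):

    // Half A: compiled against protobuf-java 2.5.0. RpcRequestV2 is
    // a hypothetical class generated by protoc 2.5.0 from
    // request.proto.
    import com.google.protobuf.ByteString;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class WriteWithProto2 {
      public static void main(String[] args) throws Exception {
        byte[] wire = RpcRequestV2.newBuilder()
            .setCallId(7)
            .setClientId(ByteString.copyFromUtf8("client-1"))
            .build()
            .toByteArray();
        Files.write(Paths.get("request.bin"), wire);
      }
    }

    // Half B: compiled against protobuf-java 3.x. RpcRequestV3 is
    // generated by protoc 3.x from the same request.proto.
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class ReadWithProto3 {
      public static void main(String[] args) throws Exception {
        RpcRequestV3 parsed = RpcRequestV3.parseFrom(
            Files.readAllBytes(Paths.get("request.bin")));
        // Wire compatibility means every field written by the 2.5.0
        // runtime comes back unchanged under the 3.x runtime.
        if (parsed.getCallId() != 7
            || !parsed.getClientId().toStringUtf8().equals("client-1")) {
          throw new AssertionError("wire incompatibility detected");
        }
      }
    }

The reverse direction (write with 3.x, read with 2.5.0) is worth
checking too, since old clients will keep talking to new servers.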