On Wed, Mar 29, 2017 at 1:13 PM, Stack <st...@duboce.net> wrote:
> Is the below evidence enough that pb3 in proto2 syntax mode does not drop
> 'unknown' fields? (Maybe you want evidence that java tooling behaves the
> same?)

I reproduced your example with the Java tooling, including changing
some of the fields in the intermediate representation. As long as the
syntax is "proto2", it seems to have compatible semantics.

> To be clear, when we say proxy above, are we expecting that a pb message
> deserialized by a process down-the-line that happens to have a crimped proto
> definition that is absent a couple of fields somehow can re-serialize and at
> the end of the line, all fields are present? Or are we talking pass-through
> of the message without rewrite?

The former; an intermediate handler decoding, [modifying,] and
encoding the record without losing unknown fields.

This looks fine. -C

> Thanks,
> St.Ack
>
>
> # Using the protoc v3.0.2 tool
> $ protoc --version
> libprotoc 3.0.2
>
> # I have a simple proto definition with two fields in it
> $ more pb.proto
> message Test {
>   optional string one = 1;
>   optional string two = 2;
> }
>
> # This is a text-encoded instance of a 'Test' proto message:
> $ more pb.txt
> one: "one"
> two: "two"
>
> # Now I encode the above as a pb binary
> $ protoc --encode=Test pb.proto < pb.txt > pb.bin
> [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> specified for the proto file: pb.proto. Please use 'syntax = "proto2";' or
> 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2
> syntax.)
>
> # Here is a dump of the binary
> $ od -xc pb.bin
> 0000000    030a    6e6f    1265    7403    6f77
>           \n 003   o   n   e 022 003   t   w   o
> 0000012
>
> # Here is a proto definition file that has a Test Message minus the 'two'
> field.
> $ more pb_drops_two.proto
> message Test {
>   optional string one = 1;
> }
>
> # Use it to decode the bin file:
> $ protoc --decode=Test pb_drops_two.proto < pb.bin
> [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted
> to proto2 syntax.)
> one: "one"
> 2: "two"
>
> Note how the second field is preserved (absent a field name). It is not
> dropped.
>
> If I change the syntax of pb_drops_two.proto to be proto3, the field IS
> dropped.
>
> # Here proto file with proto3 syntax specified (had to drop the 'optional'
> qualifier -- not allowed in proto3):
> $ more pb_drops_two.proto
> syntax = "proto3";
> message Test {
>   string one = 1;
> }
>
> $ protoc --decode=Test pb_drops_two.proto < pb.bin > pb_drops_two.txt
> $ more pb_drops_two.txt
> one: "one"
>
>
> I cannot reencode the text output using pb_drops_two.proto. It complains:
>
> $ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt >
> pb_drops_two.bin
> [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted
> to proto2 syntax.)
> input:2:1: Expected identifier, got: 2
>
> Proto 2.5 does same:
>
> $ ~/bin/protobuf-2.5.0/src/protoc --encode=Test pb_drops_two.proto <
> pb_drops_two.txt > pb_drops_two.bin
> input:2:1: Expected identifier.
> Failed to parse input.
>
> St.Ack
>
>
> On Wed, Mar 29, 2017 at 10:14 AM, Stack <st...@duboce.net> wrote:
>>
>> On Tue, Mar 28, 2017 at 4:18 PM, Andrew Wang <andrew.w...@cloudera.com>
>> wrote:
>>>
>>> >
>>> > > If unknown fields are dropped, then applications proxying tokens and
>>> > other
>>> > >> data between servers will effectively corrupt those messages, unless
>>> > >> we
>>> > >> make everything opaque bytes, which- absent the convenient,
>>> > >> prenominate
>>> > >> semantics managing the conversion- obviate the compatibility
>>> > >> machinery
>>> > that
>>> > >> is the whole point of PB. Google is removing the features that
>>> > >> justified
>>> > >> choosing PB over its alternatives.
>>> > >> Since we can't require that our
>>> > >> applications compile (or link) against our updated schema, this
>>> > >> creates
>>> > a
>>> > >> problem that PB was supposed to solve.
>>> > >
>>> > >
>>> > > This is scary, and it potentially affects services outside of the
>>> > > Hadoop
>>> > > codebase. This makes it difficult to assess the impact.
>>> >
>>> > Stack mentioned a compatibility mode that uses the proto2 semantics.
>>> > If that carries unknown fields through intermediate handlers, then
>>> > this objection goes away. -C
>>>
>>>
>>> Did some more googling, found this:
>>>
>>> https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ
>>>
>>> Feng Xiao appears to be a Google engineer, and suggests workarounds like
>>> packing the fields into a byte type. No mention of a PB2 compatibility
>>> mode. Also here:
>>>
>>> https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ
>>>
>>> Participants say that unknown fields were dropped for automatic JSON
>>> encoding, since you can't losslessly convert to JSON without knowing the
>>> type.
>>>
>>> Unfortunately, it sounds like these are intrinsic differences with PB3.
>>>
>>
>> As I read it Andrew, the field-dropping happens when pb3 is running in
>> proto3 'mode'. Let me try it...
>>
>> St.Ack
>>
>>
>>>
>>> Best,
>>> Andrew
>>
>>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
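
The difference between the two behaviours in the quoted session can be
sketched directly against the wire format. The snippet below is a rough
illustration only, not how protoc or the Java runtime implement it: the
helper names (read_varint, split_fields, proxy) are hypothetical, only wire
types 0 (varint) and 2 (length-delimited) are handled, and the "proxy" just
copies or discards raw field bytes. The input is the ten bytes from the od
dump of pb.bin.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def split_fields(buf):
    """Yield (field_number, raw_bytes) for each top-level field.

    raw_bytes includes the tag, so fields can be copied verbatim.
    Assumption: only wire types 0 and 2 appear in the input.
    """
    pos = 0
    while pos < len(buf):
        start = pos
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:
            _, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, buf[start:pos]


def proxy(buf, known_fields, keep_unknown):
    """Re-serialize buf as a handler that only knows known_fields.

    keep_unknown=True models what the session showed for proto2 syntax;
    keep_unknown=False models what it showed for proto3 syntax in 3.0.2.
    """
    out = bytearray()
    for number, raw in split_fields(buf):
        if number in known_fields or keep_unknown:
            out += raw
    return bytes(out)


# Bytes from the od dump: field 1 = "one", field 2 = "two".
PB_BIN = bytes.fromhex("0a036f6e65120374776f")

preserved = proxy(PB_BIN, known_fields={1}, keep_unknown=True)
dropped = proxy(PB_BIN, known_fields={1}, keep_unknown=False)
```

With keep_unknown=True the intermediary hands on the same bytes it
received, so a downstream reader with the full schema still sees field 2.
With keep_unknown=False only field 1 survives, which is the corruption
scenario raised earlier in the thread.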