Re: [protobuf] Spec v2 int-lit snafu?

Josh Humphries Tue, 13 Nov 2018 06:58:26 -0800

On Mon, Nov 12, 2018 at 10:30 PM Michael Powell <[email protected]>
wrote:


> On Mon, Nov 12, 2018 at 12:46 PM Michael Powell <[email protected]>
> wrote:
> >
> > On Mon, Nov 12, 2018 at 10:06 AM Michael Powell <[email protected]> wrote:
> > >
> > > Hello,
> > >
> > > Another question following up, how about the sign character for hex
> > > and oct integers? Is it necessary, should it be discarded?
> > >
> > > intLit     = decimalLit | octalLit | hexLit
> > > decimalLit = ( "1" … "9" ) { decimalDigit }
> > > octalLit   = "0" { octalDigit }
> > > hexLit = "0" ( "x" | "X" ) hexDigit { hexDigit }
> > >
> > > constant = fullIdent | ( [ "-" | "+" ] intLit ) | ( [ "-" | "+" ] floatLit ) | strLit | boolLit
> > >
> > >
> > > https://developers.google.com/protocol-buffers/docs/reference/proto2-spec#integer_literals
> > > https://developers.google.com/protocol-buffers/docs/reference/proto2-spec#constant
> > >
> > > For instance, I am fairly certain the sign character is encoded in a
> > > hex encoded integer. Not sure about octal, but I imagine that it is
> > > fairly consistent.
>
> Got it sorted out, I believe. The parser support Spirit provides is
> actually quite nice and aligns almost perfectly with the grammar
> specification. There's a bit of gymnastics involved in juggling whether
> the AST has a sign or not, but other than that it flows well enough.
>

If you haven't already, take a look at descriptor.proto
<https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/descriptor.proto>
-- FileDescriptorProto
<https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/descriptor.proto#L61>
therein is basically like an AST for the proto language (and is what protoc
produces as it parses). And for parsing options and the literal values in
particular, take a look at UninterpretedOption
<https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/descriptor.proto#L701>.
Options are first parsed into this structure, and then "interpreted" into
the attributes of *Options messages in a second pass. You'll see that the
approach there includes the negation in the literal integer value but also
distinguishes between the two (positive_int_value and negative_int_value)
<https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/descriptor.proto#L716>
in the AST.
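
To make that concrete, here is a minimal Python sketch of the idea, not protoc's actual code. Only the two field names positive_int_value and negative_int_value come from UninterpretedOption; the IntConstant class and the magnitude and interpret helpers are hypothetical names made up for illustration. It assumes the sign belongs to the constant rule rather than to intLit, and that "-0" may simply be stored as a negative zero, which is my choice here rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional

UINT64_MAX = 2**64 - 1
INT64_MIN = -(2**63)


@dataclass
class IntConstant:
    # Hypothetical holder; the field names mirror UninterpretedOption.
    positive_int_value: Optional[int] = None
    negative_int_value: Optional[int] = None


def magnitude(lit: str) -> int:
    """Value of an unsigned intLit (hex, octal, or decimal)."""
    if lit[:2] in ("0x", "0X"):
        return int(lit[2:], 16)
    if lit.startswith("0"):
        return int(lit, 8)  # a lone "0" is octal zero per the grammar
    return int(lit, 10)


def interpret(text: str) -> IntConstant:
    """Split off the optional sign, then store the value in one of two fields."""
    sign, lit = "", text
    if text[:1] in ("+", "-"):
        sign, lit = text[0], text[1:]
    value = magnitude(lit)
    if sign == "-":
        if -value < INT64_MIN:
            raise ValueError("negative literal does not fit in int64")
        return IntConstant(negative_int_value=-value)
    if value > UINT64_MAX:
        raise ValueError("positive literal does not fit in uint64")
    return IntConstant(positive_int_value=value)
```

With this, interpret("-0x1F") fills only negative_int_value (with -31), and interpret("+017") fills only positive_int_value (with 15), so the literal itself is always parsed unsigned and the sign only decides which field is used.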


>
> > > Case in point, the value 107026150751750362 gets encoded as
> > > 0X17C3BB7913C48DA (upper-case), whereas its negative counterpart,
> > > -107026150751750362, really does get encoded as 0xFE83C4486EC3B726,
> > > sign included, if memory serves.
> > >
> > > In these cases, I think the sign bit falls in the "optional" category?
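
The two hex strings quoted above can be sanity-checked with a short Python sketch. It assumes the negative value was printed as a 64-bit two's-complement bit pattern, which is a property of whatever formatted the number and not something the proto grammar describes; the helper name twos_complement_hex is made up for this example. In the grammar the sign sits in the constant rule, so in .proto source the negative value would be written with an explicit "-" in front of the hex digits.

```python
def twos_complement_hex(value: int, bits: int = 64) -> str:
    """Hex string of `value` reduced to a `bits`-wide two's-complement pattern."""
    mask = (1 << bits) - 1
    return hex(value & mask)


n = 107026150751750362

print(hex(n))                   # 0x17c3bb7913c48da
print(twos_complement_hex(-n))  # 0xfe83c4486ec3b726
```

So 0xFE83C4486EC3B726 is the same 64 bits as -107026150751750362, but read as an unsigned hex literal it would be a large positive number. A parser that follows the grammar treats the digits as a magnitude and the sign as a separate token.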
> >
> > So... As far as I can determine, there are a couple of ways to
> > interpret this, semantically speaking. But this potentially informs
> > whatever parsing stack you are using as well.
> >
> > I'm using Boost Spirit Qi, for instance, which supports radix-based
> > integer parsing well enough, but has its own set of issues when
> > dealing with signage. That being said...
> >
> > 1. Treat the value itself as positive one way or another, with an
> > optional sign attribute (i.e. '+' or '-'). This would potentially
> > work, especially when there is base 16 (hex) or base 8 (octal)
> > involved.
> >
> > 2. Otherwise, I'm open to suggestions, subject to Qi's constraints: as
> > far as I know, it fails to parse negative (signed) hexadecimal/octal
> > encoded values.
> >
> > Again, kind of a symptom of an imprecise grammar specification. I can
> > get a sense for how to handle it, but does it truly capture "intent"?
> >
> > Thanks in advance for any light that can be shed.
> >
> > > Cheers, thanks,
> > >
> > > Michael
> > > On Sun, Nov 11, 2018 at 10:56 AM Josh Humphries <[email protected]> wrote:
> > > >
> > > > For the case of zero by itself, per the spec, it will be parsed as
> > > > an octal literal with value zero -- so functionally equivalent to a
> > > > decimal literal with value zero. And for values with multiple digits,
> > > > a leading zero means it is an octal literal. Decimal values will not
> > > > have a leading zero.
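
The classification described above can be sketched in Python with one regular expression per grammar rule. This is only an illustration of the three intLit alternatives as quoted in this thread; the classify function and the _RULES table are hypothetical names, and signs are deliberately not handled because they belong to the constant rule.

```python
import re

# One pattern per alternative of intLit, each capturing the digits to convert.
_RULES = [
    ("hex", re.compile(r"0[xX]([0-9A-Fa-f]+)"), 16),
    ("octal", re.compile(r"0([0-7]*)"), 8),
    ("decimal", re.compile(r"([1-9][0-9]*)"), 10),
]


def classify(lit: str):
    """Return (kind, value) for an unsigned intLit, or None if it matches no rule."""
    for kind, pattern, base in _RULES:
        match = pattern.fullmatch(lit)
        if match:
            digits = match.group(1) or "0"  # a lone "0" has no further digits
            return kind, int(digits, base)
    return None


print(classify("0"))     # ('octal', 0)
print(classify("017"))   # ('octal', 15)
print(classify("10"))    # ('decimal', 10)
print(classify("0x1F"))  # ('hex', 31)
print(classify("08"))    # None
```

Because each pattern must match the whole literal, the order of the alternatives does not matter here. In a parser that tries alternatives left to right on a prefix, hex has to be tried before octal, since "0" on its own is already a valid octal literal.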
> > > >
> > > > ----
> > > > Josh Humphries
> > > > [email protected]
> > > >
> > > >
> > > > On Sat, Nov 10, 2018 at 10:16 PM Michael Powell <[email protected]> wrote:
> > > >>
> > > >> Hello,
> > > >>
> > > >> I think 0 can be a decimal-lit, don't you think? However, the spec
> > > >> reads as follows:
> > > >>
> > > >> intLit     = decimalLit | octalLit | hexLit
> > > >> decimalLit = ( "1" … "9" ) { decimalDigit }
> > > >> octalLit   = "0" { octalDigit }
> > > >> hexLit     = "0" ( "x" | "X" ) hexDigit { hexDigit }
> > > >>
> > > >> Is there a reason, semantically speaking, why decimal must be greater
> > > >> than 0? And that's not including a plus/minus sign when you factor in
> > > >> constants.
> > > >>
> > > >> Of course, when parsing, order matters, as with the escape
> > > >> character phrases in the string-literal:
> > > >>
> > > >> hex-lit | oct-lit | dec-lit
> > > >>
> > > >> And so on, since you have to rule out 0x\d+ for hex, followed by 0\d* ...
> > > >>
> > > >> Actually, now that I look at it "0" (really, "decimal" 0) is lurking
> > > >> in the oct-lit phrase.
> > > >>
> > > >> Kind of a grammatical nit-pick, I know, but I just wanted to be clear
> > > >> here. Seems like a possible source of confusion if you aren't paying
> > > >> careful attention.
> > > >>
> > > >> Thoughts?
> > > >>
> > > >> Best regards,
> > > >>
> > > >> Michael Powell
> > > >>
> > > >> --
> > > >> You received this message because you are subscribed to the Google
> Groups "Protocol Buffers" group.
> > > >> To unsubscribe from this group and stop receiving emails from it,
> send an email to [email protected].
> > > >> To post to this group, send email to [email protected].
> > > >> Visit this group at https://groups.google.com/group/protobuf.
> > > >> For more options, visit https://groups.google.com/d/optout.