Re: [protobuf] Spec v2 int-lit snafu?

Michael Powell Mon, 12 Nov 2018 19:30:02 -0800

On Mon, Nov 12, 2018 at 12:46 PM Michael Powell <[email protected]> wrote:
>
> On Mon, Nov 12, 2018 at 10:06 AM Michael Powell <[email protected]> wrote:
> >
> > Hello,
> >
> > A follow-up question: how should the sign character be handled for
> > hex and octal integers? Is it necessary, or should it be discarded?
> >
> > intLit     = decimalLit | octalLit | hexLit
> > decimalLit = ( "1" … "9" ) { decimalDigit }
> > octalLit   = "0" { octalDigit }
> > hexLit     = "0" ( "x" | "X" ) hexDigit { hexDigit }
> >
> > constant = fullIdent | ( [ "-" | "+" ] intLit ) | ( [ "-" | "+" ]
> > floatLit ) | strLit | boolLit
> >
> > https://developers.google.com/protocol-buffers/docs/reference/proto2-spec#integer_literals
> > https://developers.google.com/protocol-buffers/docs/reference/proto2-spec#constant
> >
> > For instance, I am fairly certain the sign character is encoded in a
> > hex encoded integer. Not sure about octal, but I imagine that it is
> > fairly consistent.


I believe I have it sorted out. The parser support that Spirit provides
is actually quite nice and aligns almost perfectly with the grammar
specification. There is a bit of gymnastics involved in juggling
whether the AST has a sign or not, but other than that, it flows
well enough.
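
For anyone following along, here is a minimal sketch of the "sign kept
apart from the magnitude" idea. It is plain Python with a regular
expression, not Spirit Qi, and the names IntConstant and
parse_int_constant are hypothetical, made up for illustration only. It
assumes the alternatives are tried hex first, then octal, then decimal.

```python
import re
from dataclasses import dataclass
from typing import Optional

# [ "-" | "+" ] intLit, with the intLit alternatives tried hex, octal, decimal.
_INT_CONSTANT = re.compile(
    r"(?P<sign>[-+])?"
    r"(?:0[xX](?P<hex>[0-9A-Fa-f]+)"
    r"|(?P<oct>0[0-7]*)"
    r"|(?P<dec>[1-9][0-9]*))"
)


@dataclass
class IntConstant:
    """Hypothetical AST node: the sign is stored next to a non-negative magnitude."""
    sign: Optional[str]  # "+", "-", or None when no sign was written
    magnitude: int       # value of the bare intLit, never negative
    radix: int           # 16, 8, or 10

    @property
    def value(self) -> int:
        # Apply the sign only when the final value is requested.
        return -self.magnitude if self.sign == "-" else self.magnitude


def parse_int_constant(text: str) -> IntConstant:
    m = _INT_CONSTANT.fullmatch(text)
    if m is None:
        raise ValueError(f"not an integer constant: {text!r}")
    if m["hex"] is not None:
        return IntConstant(m["sign"], int(m["hex"], 16), 16)
    if m["oct"] is not None:
        return IntConstant(m["sign"], int(m["oct"], 8), 8)
    return IntConstant(m["sign"], int(m["dec"], 10), 10)


print(parse_int_constant("-0x1F").value)  # -31
print(parse_int_constant("+017").value)   # 15
```

Keeping the magnitude unsigned means the radix-specific parsing never
has to see the sign character at all.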

> > Case in point, the value 107026150751750362 gets encoded as
> > 0X17C3BB7913C48DA (upper-case), whereas its negative counterpart,
> > -107026150751750362, really does get encoded as 0xFE83C4486EC3B726,
> > sign included, if memory serves.
> >
> > In these cases, I think the sign bit falls in the "optional" category?
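
The two hex strings quoted above are consistent with a 64-bit two's
complement bit pattern, which describes how the value is stored rather
than how a literal is written. A small sketch to check the arithmetic,
assuming a 64-bit width (the helper names are hypothetical):

```python
MASK64 = (1 << 64) - 1


def to_twos_complement_64(value: int) -> int:
    """Return the unsigned 64-bit pattern that stores `value` in two's complement."""
    if not -(1 << 63) <= value < (1 << 63):
        raise OverflowError(f"{value} does not fit in a signed 64-bit integer")
    return value & MASK64


def from_twos_complement_64(bits: int) -> int:
    """Interpret an unsigned 64-bit pattern as a signed two's complement value."""
    if not 0 <= bits <= MASK64:
        raise OverflowError(f"{bits} is not a 64-bit pattern")
    return bits - (1 << 64) if bits & (1 << 63) else bits


positive = 107026150751750362
print(f"0x{to_twos_complement_64(positive):X}")   # 0x17C3BB7913C48DA
print(f"0x{to_twos_complement_64(-positive):X}")  # 0xFE83C4486EC3B726
```

In the grammar itself, the sign only appears as the optional "-" or "+"
in front of intLit inside constant, so a hexLit on its own is unsigned.
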
>
> So... As far as I can determine, there are a couple of ways to
> interpret this, semantically speaking. But this potentially informs
> whatever parsing stack you are using as well.
>
> I'm using Boost Spirit Qi, for instance, which supports radix-based
> integer parsing well enough, but has its own set of issues when
> dealing with signs. That being said...
>
> 1. Treat the value itself as positive one way or another, with an
> optional sign attribute (i.e. '+' or '-'). This would potentially
> work, especially when there is base 16 (hex) or base 8 (octal)
> involved.
>
> 2. Otherwise, I am open to suggestions, subject to Qi's constraints:
> as far as I know, it fails to parse negative (signed)
> hexadecimal- or octal-encoded values.
>
> Again, this is kind of a symptom of an imprecise grammar specification.
> I can get a sense for how to handle it, but does that truly capture
> the "intent"?
>
> Thanks in advance for any light that can be shed.
>
> > Cheers, thanks,
> >
> > Michael
> > On Sun, Nov 11, 2018 at 10:56 AM Josh Humphries <[email protected]> 
> > wrote:
> > >
> > > For the case of zero by itself, per the spec, it will be parsed as an 
> > > octal literal with value zero -- so functionally equivalent to a decimal 
> > > literal with value zero. And for values with multiple digits, a leading 
> > > zero means it is an octal literal. Decimal values will not have a leading 
> > > zero.
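
The rule described above can be checked with a short sketch that
reports which production a whole literal falls under. This is plain
Python written for illustration; classify_int_lit is a hypothetical
name, not part of any protobuf tooling.

```python
import re
from typing import Tuple


def classify_int_lit(text: str) -> Tuple[str, int]:
    """Return (production name, value) for an unsigned intLit."""
    if re.fullmatch(r"0[xX][0-9A-Fa-f]+", text):
        return "hexLit", int(text[2:], 16)
    if re.fullmatch(r"0[0-7]*", text):
        # A lone "0" lands here: octal, with value zero.
        return "octalLit", int(text, 8)
    if re.fullmatch(r"[1-9][0-9]*", text):
        return "decimalLit", int(text, 10)
    raise ValueError(f"not an intLit: {text!r}")


print(classify_int_lit("0"))    # ('octalLit', 0)
print(classify_int_lit("010"))  # ('octalLit', 8)
print(classify_int_lit("10"))   # ('decimalLit', 10)
```

Because the three patterns never match the same whole string, the
production is unambiguous, and something like "08" is rejected outright.
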
> > >
> > > ----
> > > Josh Humphries
> > > [email protected]
> > >
> > >
> > > On Sat, Nov 10, 2018 at 10:16 PM Michael Powell <[email protected]> 
> > > wrote:
> > >>
> > >> Hello,
> > >>
> > >> I think 0 can be a decimal-lit, don't you think? However, the spec
> > >> reads as follows:
> > >>
> > >> intLit     = decimalLit | octalLit | hexLit
> > >> decimalLit = ( "1" … "9" ) { decimalDigit }
> > >> octalLit   = "0" { octalDigit }
> > >> hexLit     = "0" ( "x" | "X" ) hexDigit { hexDigit }
> > >>
> > >> Is there a reason, semantically speaking, why decimal must be greater
> > >> than 0? And that's not including a plus/minus sign when you factor in
> > >> constants.
> > >>
> > >> Of course, when parsing, order matters, much as it does with the
> > >> escape-character phrases in the string literal:
> > >>
> > >> hex-lit | oct-lit | dec-lit
> > >>
> > >> And so on, since you have to rule out 0[xX] plus hex digits for hex
> > >> first, followed by 0 plus octal digits ...
> > >>
> > >> Actually, now that I look at it "0" (really, "decimal" 0) is lurking
> > >> in the oct-lit phrase.
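
To make the ordering point concrete, here is a small sketch of a parser
that tries alternatives in order and commits to the first one that
matches a prefix of the input. It is plain Python, not Spirit Qi, and
first_match is a hypothetical helper written only for this illustration.

```python
import re
from typing import List, Optional, Pattern, Tuple

Rule = Tuple[str, Pattern[str]]

HEX: Rule = ("hexLit", re.compile(r"0[xX][0-9A-Fa-f]+"))
OCT: Rule = ("octalLit", re.compile(r"0[0-7]*"))
DEC: Rule = ("decimalLit", re.compile(r"[1-9][0-9]*"))


def first_match(text: str, alternatives: List[Rule]) -> Optional[Tuple[str, str]]:
    """Ordered choice: return (rule name, consumed text) for the first rule
    that matches at the start of `text`, or None if no rule matches."""
    for name, pattern in alternatives:
        m = pattern.match(text)
        if m:
            return name, m.group(0)
    return None


# Hex first: the whole literal is consumed.
print(first_match("0x1F", [HEX, OCT, DEC]))  # ('hexLit', '0x1F')

# The order written in the spec: octal wins on the leading "0" and
# leaves "x1F" unconsumed.
print(first_match("0x1F", [DEC, OCT, HEX]))  # ('octalLit', '0')
```

Read as a plain grammar the spec order is fine; it only bites when the
alternatives are tried in sequence without looking at what follows.
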
> > >>
> > >> Kind of a grammatical nit-pick, I know, but I just wanted to be clear
> > >> here. Seems like a possible source of confusion if you aren't paying
> > >> careful attention.
> > >>
> > >> Thoughts?
> > >>
> > >> Best regards,
> > >>
> > >> Michael Powell
> > >>
> > >> --
> > >> You received this message because you are subscribed to the Google 
> > >> Groups "Protocol Buffers" group.
> > >> To unsubscribe from this group and stop receiving emails from it, send 
> > >> an email to [email protected].
> > >> To post to this group, send email to [email protected].
> > >> Visit this group at https://groups.google.com/group/protobuf.
> > >> For more options, visit https://groups.google.com/d/optout.

