Thanks! KIP reads even better for me now.
Just voted. +1 non-binding

Konstantine

On Wed, Aug 14, 2019 at 7:00 PM Almog Gavra <al...@confluent.io> wrote:

> Thanks for the review Konstantine!
>
> I think the terminology suggestion definitely makes things clearer - I will
> update the documentation based on your suggestion (e.g. s/Consumer/Sink
> Converter/g and s/Producer/Source Converter/g).
>
> Cheers,
> Almog
>
> On Wed, Aug 14, 2019 at 8:13 AM Konstantine Karantasis <
> konstant...@confluent.io> wrote:
>
> > Thanks Almog for preparing this KIP!
> > I think it will improve usability and troubleshooting with JSON data a
> lot.
> >
> > The finalized plan seems quite concrete now. I also liked that some
> > implementation specific implications (such as setting the ObjectMapper to
> > deserialize floating point as BigDecimal) are highlighted in the KIP.
> >
> > Still, as I was reading the KIP, the main obstacle I encountered was
> around
> > terminology. I couldn't get used to reading "producer" and "consumer" and
> > not thinking in terms of Kafka producers and consumers - which are not
> > relevant to what this KIP proposes. Thus, I'd suggest replacing
> > "Producer(s)" with "Source Converter(s)" and "Consumer(s)" with "Sink
> > Converter(s)" (even if "Converter used by Source Connector" or "Converter
> > used by Sink Connector" would be even more accurate - maybe this could be
> > an explanation in a footnote). Terminology around converters has been
> > tricky in the past and adding producers/consumers in the mix might add to
> > the confusion.
> >
> > Another example where I'd apply this different terminology would be to a
> > phrase such as the following:
> > "Because of this, users must take care to first ensure that all consumers
> > have upgraded to the new code before upgrading producers to make use of
> the
> > NUMERIC serialization format."
> > which I'd write
> > "Because of this, users must take care to first ensure that all sink
> > connectors have upgraded to the new converter code before upgrading
> source
> > connectors to make use of the NUMERIC serialization format in
> > JsonConverter."
> >
> > Let me know if you think this suggestion makes the KIP easier to follow.
> > Otherwise I think it's a solid proposal.
> >
> > I'm concluding with a couple of nits:
> >
> > - "Upgraded Producer with BASE64 serialization, Legacy Consumer: this
> > scenario is okay as the upgraded ~producer~ consumer will be able to read
> > binary as today" (again according to my suggestion above, it could be as
> > the upgraded source converter ...)
> >
> > - "consumers cannot consumer NUMERIC data. " -> "consumers cannot read
> > NUMERIC data"
> >
> > Best,
> > Konstantine
> >
> > On Fri, Aug 9, 2019 at 6:37 PM Almog Gavra <al...@confluent.io> wrote:
> >
> > > Good catches! Fixed :)
> > >
> > > On Thu, Aug 8, 2019 at 10:36 PM Arjun Satish <arjun.sat...@gmail.com>
> > > wrote:
> > >
> > > > Cool!
> > > >
> > > > Couple of nits:
> > > >
> > > > - In public interfaces, typo: *json.decimal.serialization.fromat*
> > > > - In public interfaces, you use the term "HEX" instead of "BASE64".
> > > >
> > > >
> > > >
> > > > On Wed, Aug 7, 2019 at 9:51 AM Almog Gavra <al...@confluent.io>
> wrote:
> > > >
> > > > > EDIT: everywhere I've been using "HEX" I meant to be using
> "BASE64".
> > I
> > > > will
> > > > > update the KIP to reflect this.
> > > > >
> > > > > On Wed, Aug 7, 2019 at 9:44 AM Almog Gavra <al...@confluent.io>
> > wrote:
> > > > >
> > > > > > Thanks for the feedback Arjun! I'm happy changing the default
> > config
> > > to
> > > > > > HEX instead of BINARY, no strong feelings there.
> > > > > >
> > > > > > I'll also clarify the example in the KIP to be clearer:
> > > > > >
> > > > > > - serialize the decimal field "foo" with value "10.2345" with the
> > HEX
> > > > > > setting: {"foo": "D3J5"}
> > > > > > - serialize the decimal field "foo" with value "10.2345" with the
> > > > NUMERIC
> > > > > > setting: {"foo": 10.2345}
> > > > > >
> > > > > > With regards to the precision issue, that was my original concern
> > as
> > > > well
> > > > > > (and why I originally suggested a TEXT format). Many JSON
> > > deserializers
> > > > > > (e.g. Jackson with
> > > DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS),
> > > > > > however, have the ability to deserialize decimals correctly so I
> > will
> > > > > > configure that as the default for Connect's JsonDeserializer.
> It's
> > > > > probably
> > > > > > a good idea to call out that using other deserializers must be
> done
> > > > with
> > > > > > care - I will add that documentation to the serialization config.
> > > > > >
> > > > > > Note that there would not be an issue on the _serialization_ side
> > of
> > > > > > things as Jackson respects BigDecimal.
> > > > > >
> > > > > > Almog
> > > > > >
> > > > > > On Tue, Aug 6, 2019 at 11:23 PM Arjun Satish <
> > arjun.sat...@gmail.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > >> hey Almog, nice work! couple of thoughts (hope I'm not late
> since
> > > you
> > > > > >> started the voting thread already):
> > > > > >>
> > > > > >> can you please add some examples to show the changes that you
> are
> > > > > >> proposing. makes me think that for a given decimal number, we
> will
> > > > have
> > > > > >> two
> > > > > >> encodings: “asHex” and “asNumber”.
> > > > > >>
> > > > > >> should we call the default config value “HEX” instead of
> “BINARY”?
> > > > > >>
> > > > > >> Should we call out the fact that JS systems might be susceptible
> > to
> > > > > double
> > > > > >> precision round offs with the new numeric format? here are some
> > > people
> > > > > >> discussing a similar problem
> > > > > >> https://github.com/EventStore/EventStore/issues/1541
> > > > > >>
> > > > > >> On Tue, Aug 6, 2019 at 1:40 PM Almog Gavra <al...@confluent.io>
> > > > wrote:
> > > > > >>
> > > > > >> > Hello Everyone,
> > > > > >> >
> > > > > >> > Summarizing an in-person discussion with Randall (this is
> copied
> > > > from
> > > > > >> the
> > > > > >> > KIP):
> > > > > >> >
> > > > > >> > The original KIP suggested supporting an additional
> > > representation -
> > > > > >> base10
> > > > > >> > encoded text (e.g. `{"asText":"10.2345"}`). This causes issues
> > > > because
> > > > > >> it
> > > > > >> > is impossible to disambiguate between TEXT and BINARY without
> an
> > > > > >> additional
> > > > > >> > config - furthermore, this makes the migration from one to the
> > > other
> > > > > >> nearly
> > > > > >> > impossible because it would require that all consumers stop
> > > > consuming
> > > > > >> and
> > > > > >> > producers stop producing and atomically updating the config on
> > all
> > > > of
> > > > > >> them
> > > > > >> > after deploying the new code, or waiting for the full
> retention
> > > > period
> > > > > >> to
> > > > > >> > pass - neither option is viable. The suggestion in the KIP is
> > > > strictly
> > > > > >> an
> > > > > >> > improvement over the existing behavior, even if it doesn't
> > support
> > > > all
> > > > > >> > combinations.
> > > > > >> >
> > > > > >> > It seems that since most real-world use cases actually use the
> > > > numeric
> > > > > >> > representation (not string) we can consider this an
> improvement.
> > > > With
> > > > > >> the
> > > > > >> > new suggestion, we don't need a deserialization configuration
> > > (only
> > > > a
> > > > > >> > serialization option) and the consumers will be able to always
> > > > > >> > automatically determine the serialization format.
> > > > > >> >
> > > > > >> > Based on this, I'll be opening up the simplified version of
> the
> > > KIP
> > > > > to a
> > > > > >> > vote.
> > > > > >> >
> > > > > >> > Almog
> > > > > >> >
> > > > > >> > On Mon, Jul 29, 2019 at 9:29 AM Almog Gavra <
> al...@confluent.io
> > >
> > > > > wrote:
> > > > > >> >
> > > > > >> > > I'm mostly happy with your current suggestion (two configs,
> > one
> > > > for
> > > > > >> > > serialization and one for deserialization) and your
> > > implementation
> > > > > >> > > suggestion. One thing to note:
> > > > > >> > >
> > > > > >> > > > We should _always_ be able to deserialize a standard JSON
> > > > > >> > > > number as a decimal
> > > > > >> > >
> > > > > >> > > I was doing some research into decimals and JSON, and I can
> > > > imagine
> > > > > a
> > > > > >> > > compelling reason to require string representations to avoid
> > > > losing
> > > > > >> > > precision and to be certain that whomever is sending the
> data
> > > > isn't
> > > > > >> > losing
> > > > > >> > > precision (e.g.
> https://stackoverflow.com/a/38357877/2258040
> > ).
> > > > > >> > >
> > > > > >> > > I'm okay with always allowing numerics, but thought it's
> worth
> > > > > raising
> > > > > >> > the
> > > > > >> > > thought.
> > > > > >> > >
> > > > > >> > > On Mon, Jul 29, 2019 at 4:57 AM Andy Coates <
> > a...@confluent.io>
> > > > > >> wrote:
> > > > > >> > >
> > > > > >> > >> The way I see it, we need to control two seperate things:
> > > > > >> > >>
> > > > > >> > >> 1. How do we _deserialize_ a decimal type if we encounter a
> > > text
> > > > > >> node in
> > > > > >> > >> the JSON?    (We should _always_ be able to deserialize a
> > > > standard
> > > > > >> JSON
> > > > > >> > >> number as a decimal).
> > > > > >> > >> 2. How do we chose how we want decimals to be _serialized_.
> > > > > >> > >>
> > > > > >> > >> This looks to fits well with your second suggestion of
> > slightly
> > > > > >> > different
> > > > > >> > >> configs names for serialization vs deserialization.
> > > > > >> > >> a, For deserialization we only care about how to handle
> text
> > > > > nodes: `
> > > > > >> > >> deserialization.decimal.*text*.format`, which should only
> > have
> > > > two
> > > > > >> valid
> > > > > >> > >> values BINARY | TEXT.
> > > > > >> > >> b. For serialization we need all three:
> > > > > >> `serialization.decimal.format`,
> > > > > >> > >> which should support all three options: BINARY | TEXT |
> > > NUMERIC.
> > > > > >> > >>
> > > > > >> > >> Implementation wise, I think these should be two separate
> > > enums,
> > > > > >> rather
> > > > > >> > >> than one shared enum and throwing an error if the
> > deserializer
> > > is
> > > > > >> set to
> > > > > >> > >> NUMERIC.  Mainly as this means the enums reflect the
> options
> > > > > >> available,
> > > > > >> > >> rather than this being hidden in config checking code.  But
> > > > that's
> > > > > a
> > > > > >> > minor
> > > > > >> > >> implementation detail.
> > > > > >> > >>
> > > > > >> > >> Personally, I'd be tempted to have the BINARY value named
> > > > something
> > > > > >> like
> > > > > >> > >> `LEGACY` or `LEGACY_BINARY` as a way of encouraging users
> to
> > > move
> > > > > >> away
> > > > > >> > >> from
> > > > > >> > >> it.
> > > > > >> > >>
> > > > > >> > >> It's a real shame that both of these settings require a
> > default
> > > > of
> > > > > >> > BINARY
> > > > > >> > >> for backwards compatibility, but I agree that discussions /
> > > plans
> > > > > >> around
> > > > > >> > >> switching the defaults should not block this KIP.
> > > > > >> > >>
> > > > > >> > >> Andy
> > > > > >> > >>
> > > > > >> > >>
> > > > > >> > >> On Thu, 25 Jul 2019 at 18:26, Almog Gavra <
> > al...@confluent.io>
> > > > > >> wrote:
> > > > > >> > >>
> > > > > >> > >> > Thanks for the replies Andy and Andrew (2x Andy?)!
> > > > > >> > >> >
> > > > > >> > >> > > Is the text decimal a base16 encoded number, or is it
> > > base16
> > > > > >> encoded
> > > > > >> > >> > binary
> > > > > >> > >> > > form of the number?
> > > > > >> > >> >
> > > > > >> > >> > The conversion happens as
> > > decimal.unscaledValue().toByteArray()
> > > > > and
> > > > > >> > then
> > > > > >> > >> > the byte array is converted to a hex string, so it's
> > > definitely
> > > > > the
> > > > > >> > >> binary
> > > > > >> > >> > form of the number converted to base16. Whether or not
> > that's
> > > > the
> > > > > >> same
> > > > > >> > >> as
> > > > > >> > >> > the base16 encoded number is a good question (toByteArray
> > > > > returns a
> > > > > >> > byte
> > > > > >> > >> > array containing a signed, big-endian, two's complement
> > > > > >> representation
> > > > > >> > >> of
> > > > > >> > >> > the big integer).
> > > > > >> > >> >
> > > > > >> > >> > > One suggestion I have is to change the proposed new
> > config
> > > to
> > > > > >> only
> > > > > >> > >> affect
> > > > > >> > >> > > decimals stored as text, i.e. to switch between the
> > current
> > > > > >> base16
> > > > > >> > and
> > > > > >> > >> > the
> > > > > >> > >> > > more common base10.   Then add another config to the
> > > > serializer
> > > > > >> only
> > > > > >> > >> that
> > > > > >> > >> > > controls if decimals should be serialized as text or
> > > numeric.
> > > > > >> > >> >
> > > > > >> > >> > I think we need to be able to handle all mappings from
> > > > > >> serialization
> > > > > >> > >> format
> > > > > >> > >> > to deserialization format (e.g. read in BINARY and output
> > > > TEXT),
> > > > > >> > which I
> > > > > >> > >> > think would be impossible with the alternative
> suggestion.
> > I
> > > > > agree
> > > > > >> > that
> > > > > >> > >> > automatically deserializing numerics is valuable. I see
> two
> > > > other
> > > > > >> ways
> > > > > >> > >> to
> > > > > >> > >> > get this, both keeping the serialization.format config
> the
> > > > same:
> > > > > >> > >> >
> > > > > >> > >> > - have json.decimal.deserialization.format accept all
> three
> > > > > >> formats.
> > > > > >> > if
> > > > > >> > >> set
> > > > > >> > >> > to BINARY/TEXT, numerics would be automatically
> supported.
> > If
> > > > set
> > > > > >> to
> > > > > >> > >> > NUMERIC, then any string coming in would result in
> > > > > deserialization
> > > > > >> > error
> > > > > >> > >> > (defaults to BINARY for backwards compatibility)
> > > > > >> > >> > - change json.decimal.deserialization.format to
> > > > > >> > >> > json.decimal.deserialization.string.format which accepts
> > only
> > > > > >> > >> BINARY/TEXT
> > > > > >> > >> > (defaults to BINARY for backwards compatibility)
> > > > > >> > >> >
> > > > > >> > >> > > would be a breaking change in that things that
> previously
> > > > > failed
> > > > > >> > would
> > > > > >> > >> > > suddenly start deserializing.  This is a price I'm
> > willing
> > > to
> > > > > >> pay.
> > > > > >> > >> >
> > > > > >> > >> > I agree. I'm willing to pay this price too.
> > > > > >> > >> >
> > > > > >> > >> > > IMHO, we should then plan to switch the default of
> > decimal
> > > > > >> > >> serialization
> > > > > >> > >> > to
> > > > > >> > >> > > numeric, and text serialization to base 10 in the next
> > > major
> > > > > >> > release.
> > > > > >> > >> >
> > > > > >> > >> > I think that can be a separate discussion, I don't want
> to
> > > > block
> > > > > >> this
> > > > > >> > >> KIP
> > > > > >> > >> > on it.
> > > > > >> > >> >
> > > > > >> > >> > Thoughts?
> > > > > >> > >> >
> > > > > >> > >> > On Thu, Jul 25, 2019 at 6:35 AM Andrew Otto <
> > > > o...@wikimedia.org>
> > > > > >> > wrote:
> > > > > >> > >> >
> > > > > >> > >> > > This is a bit orthogonal, but in JsonSchemaConverter I
> > use
> > > > > >> > >> JSONSchemas to
> > > > > >> > >> > > indicate whether a JSON number should be deserialized
> as
> > an
> > > > > >> integer
> > > > > >> > >> or a
> > > > > >> > >> > > decimal
> > > > > >> > >> > > <
> > > > > >> > >> > >
> > > > > >> > >> >
> > > > > >> > >>
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://github.com/ottomata/kafka-connect-jsonschema/blob/master/src/main/java/org/wikimedia/kafka/connect/jsonschema/JsonSchemaConverter.java#L251-L261
> > > > > >> > >> > > >.
> > > > > >> > >> > > Not everyone is going to have JSONSchemas available
> when
> > > > > >> converting,
> > > > > >> > >> but
> > > > > >> > >> > if
> > > > > >> > >> > > you do, it is an easy way to support JSON numbers as
> > > > decimals.
> > > > > >> > >> > >
> > > > > >> > >> > > Carry on! :)
> > > > > >> > >> > >
> > > > > >> > >> > > On Thu, Jul 25, 2019 at 9:12 AM Andy Coates <
> > > > a...@confluent.io
> > > > > >
> > > > > >> > >> wrote:
> > > > > >> > >> > >
> > > > > >> > >> > > > Hi Almog,
> > > > > >> > >> > > >
> > > > > >> > >> > > > Like the KIP - I think being able to support decimals
> > in
> > > > JSON
> > > > > >> in
> > > > > >> > the
> > > > > >> > >> > same
> > > > > >> > >> > > > way most other systems do is a great improvement.
> > > > > >> > >> > > >
> > > > > >> > >> > > > It's not 100% clear to me from the KIP what the
> current
> > > > > format
> > > > > >> is.
> > > > > >> > >> Is
> > > > > >> > >> > > the
> > > > > >> > >> > > > text decimal a base16 encoded number, or is it base16
> > > > encoded
> > > > > >> > binary
> > > > > >> > >> > form
> > > > > >> > >> > > > of the number? (I've not tried to get my head around
> if
> > > > these
> > > > > >> two
> > > > > >> > >> are
> > > > > >> > >> > > even
> > > > > >> > >> > > > different!)
> > > > > >> > >> > > >
> > > > > >> > >> > > > One suggestion I have is to change the proposed new
> > > config
> > > > to
> > > > > >> only
> > > > > >> > >> > affect
> > > > > >> > >> > > > decimals stored as text, i.e. to switch between the
> > > current
> > > > > >> base16
> > > > > >> > >> and
> > > > > >> > >> > > the
> > > > > >> > >> > > > more common base10.   Then add another config to the
> > > > > serialzier
> > > > > >> > only
> > > > > >> > >> > that
> > > > > >> > >> > > > controls if decimals should be serialized as text or
> > > > numeric.
> > > > > >> The
> > > > > >> > >> > > benefit
> > > > > >> > >> > > > of this approach is it allows us to enhance the
> > > > deserializer
> > > > > to
> > > > > >> > >> > > > automatically handle numeric decimals even without
> any
> > > > config
> > > > > >> > >> having to
> > > > > >> > >> > > be
> > > > > >> > >> > > > set, i.e. default config in the deserializer would be
> > > able
> > > > to
> > > > > >> > handle
> > > > > >> > >> > > > numeric decimals.  Of course, this is a two edged
> > sword:
> > > > this
> > > > > >> > would
> > > > > >> > >> > make
> > > > > >> > >> > > > the deserializer work out of the box with numeric
> > > decimals,
> > > > > >> > (yay!),
> > > > > >> > >> but
> > > > > >> > >> > > > would be a breaking change in that things that
> > previously
> > > > > >> failed
> > > > > >> > >> would
> > > > > >> > >> > > > suddenly start deserializing.  This is a price I'm
> > > willing
> > > > to
> > > > > >> pay.
> > > > > >> > >> > > >
> > > > > >> > >> > > > IMHO, we should then plan to switch the default of
> > > decimal
> > > > > >> > >> > serialization
> > > > > >> > >> > > to
> > > > > >> > >> > > > numeric, and text serialization to base 10 in the
> next
> > > > major
> > > > > >> > >> release.
> > > > > >> > >> > > > (With upgrade notes to match). Though I know this is
> > more
> > > > > >> > >> contentious,
> > > > > >> > >> > I
> > > > > >> > >> > > > think it moves us forward in a much more standard way
> > > that
> > > > > the
> > > > > >> > >> current
> > > > > >> > >> > > > encoding of decimals.
> > > > > >> > >> > > >
> > > > > >> > >> > > > On Tue, 25 Jun 2019 at 01:03, Almog Gavra <
> > > > > al...@confluent.io>
> > > > > >> > >> wrote:
> > > > > >> > >> > > >
> > > > > >> > >> > > > > Hi Everyone!
> > > > > >> > >> > > > >
> > > > > >> > >> > > > > Kicking off discussion for a new KIP:
> > > > > >> > >> > > > >
> > > > > >> > >> > > > >
> > > > > >> > >> > > >
> > > > > >> > >> > >
> > > > > >> > >> >
> > > > > >> > >>
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-481%3A+SerDe+Improvements+for+Connect+Decimal+type+in+JSON
> > > > > >> > >> > > > >
> > > > > >> > >> > > > > For those who are interested, I have a prototype
> > > > > >> implementation
> > > > > >> > >> that
> > > > > >> > >> > > > helped
> > > > > >> > >> > > > > guide my design:
> > > https://github.com/agavra/kafka/pull/1
> > > > > >> > >> > > > >
> > > > > >> > >> > > > > Cheers,
> > > > > >> > >> > > > > Almog
> > > > > >> > >> > > > >
> > > > > >> > >> > > >
> > > > > >> > >> > >
> > > > > >> > >> >
> > > > > >> > >>
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to