Cool! Couple of nits:
- In public interfaces, typo: *json.decimal.serialization.fromat* - In public interfaces, you use the term "HEX" instead of "BASE64". On Wed, Aug 7, 2019 at 9:51 AM Almog Gavra <al...@confluent.io> wrote: > EDIT: everywhere I've been using "HEX" I meant to be using "BASE64". I will > update the KIP to reflect this. > > On Wed, Aug 7, 2019 at 9:44 AM Almog Gavra <al...@confluent.io> wrote: > > > Thanks for the feedback Arjun! I'm happy changing the default config to > > HEX instead of BINARY, no strong feelings there. > > > > I'll also clarify the example in the KIP to be clearer: > > > > - serialize the decimal field "foo" with value "10.2345" with the HEX > > setting: {"foo": "D3J5"} > > - serialize the decimal field "foo" with value "10.2345" with the NUMERIC > > setting: {"foo": 10.2345} > > > > With regards to the precision issue, that was my original concern as well > > (and why I originally suggested a TEXT format). Many JSON deserializers > > (e.g. Jackson with DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS), > > however, have the ability to deserialize decimals correctly so I will > > configure that as the default for Connect's JsonDeserializer. It's > probably > > a good idea to call out that using other deserializers must be done with > > care - I will add that documentation to the serialization config. > > > > Note that there would not be an issue on the _serialization_ side of > > things as Jackson respects BigDecimal. > > > > Almog > > > > On Tue, Aug 6, 2019 at 11:23 PM Arjun Satish <arjun.sat...@gmail.com> > > wrote: > > > >> hey Almog, nice work! couple of thoughts (hope I'm not late since you > >> started the voting thread already): > >> > >> can you please add some examples to show the changes that you are > >> proposing. makes me think that for a given decimal number, we will have > >> two > >> encodings: “asHex” and “asNumber”. > >> > >> should we call the default config value “HEX” instead of “BINARY”? > >> > >> Should we call out the fact that JS systems might be susceptible to > double > >> precision round offs with the new numeric format? here are some people > >> discussing a similar problem > >> https://github.com/EventStore/EventStore/issues/1541 > >> > >> On Tue, Aug 6, 2019 at 1:40 PM Almog Gavra <al...@confluent.io> wrote: > >> > >> > Hello Everyone, > >> > > >> > Summarizing an in-person discussion with Randall (this is copied from > >> the > >> > KIP): > >> > > >> > The original KIP suggested supporting an additional representation - > >> base10 > >> > encoded text (e.g. `{"asText":"10.2345"}`). This causes issues because > >> it > >> > is impossible to disambiguate between TEXT and BINARY without an > >> additional > >> > config - furthermore, this makes the migration from one to the other > >> nearly > >> > impossible because it would require that all consumers stop consuming > >> and > >> > producers stop producing and atomically updating the config on all of > >> them > >> > after deploying the new code, or waiting for the full retention period > >> to > >> > pass - neither option is viable. The suggestion in the KIP is strictly > >> an > >> > improvement over the existing behavior, even if it doesn't support all > >> > combinations. > >> > > >> > It seems that since most real-world use cases actually use the numeric > >> > representation (not string) we can consider this an improvement. With > >> the > >> > new suggestion, we don't need a deserialization configuration (only a > >> > serialization option) and the consumers will be able to always > >> > automatically determine the serialization format. > >> > > >> > Based on this, I'll be opening up the simplified version of the KIP > to a > >> > vote. > >> > > >> > Almog > >> > > >> > On Mon, Jul 29, 2019 at 9:29 AM Almog Gavra <al...@confluent.io> > wrote: > >> > > >> > > I'm mostly happy with your current suggestion (two configs, one for > >> > > serialization and one for deserialization) and your implementation > >> > > suggestion. One thing to note: > >> > > > >> > > > We should _always_ be able to deserialize a standard JSON > >> > > > number as a decimal > >> > > > >> > > I was doing some research into decimals and JSON, and I can imagine > a > >> > > compelling reason to require string representations to avoid losing > >> > > precision and to be certain that whomever is sending the data isn't > >> > losing > >> > > precision (e.g. https://stackoverflow.com/a/38357877/2258040). > >> > > > >> > > I'm okay with always allowing numerics, but thought it's worth > raising > >> > the > >> > > thought. > >> > > > >> > > On Mon, Jul 29, 2019 at 4:57 AM Andy Coates <a...@confluent.io> > >> wrote: > >> > > > >> > >> The way I see it, we need to control two seperate things: > >> > >> > >> > >> 1. How do we _deserialize_ a decimal type if we encounter a text > >> node in > >> > >> the JSON? (We should _always_ be able to deserialize a standard > >> JSON > >> > >> number as a decimal). > >> > >> 2. How do we chose how we want decimals to be _serialized_. > >> > >> > >> > >> This looks to fits well with your second suggestion of slightly > >> > different > >> > >> configs names for serialization vs deserialization. > >> > >> a, For deserialization we only care about how to handle text > nodes: ` > >> > >> deserialization.decimal.*text*.format`, which should only have two > >> valid > >> > >> values BINARY | TEXT. > >> > >> b. For serialization we need all three: > >> `serialization.decimal.format`, > >> > >> which should support all three options: BINARY | TEXT | NUMERIC. > >> > >> > >> > >> Implementation wise, I think these should be two separate enums, > >> rather > >> > >> than one shared enum and throwing an error if the deserializer is > >> set to > >> > >> NUMERIC. Mainly as this means the enums reflect the options > >> available, > >> > >> rather than this being hidden in config checking code. But that's > a > >> > minor > >> > >> implementation detail. > >> > >> > >> > >> Personally, I'd be tempted to have the BINARY value named something > >> like > >> > >> `LEGACY` or `LEGACY_BINARY` as a way of encouraging users to move > >> away > >> > >> from > >> > >> it. > >> > >> > >> > >> It's a real shame that both of these settings require a default of > >> > BINARY > >> > >> for backwards compatibility, but I agree that discussions / plans > >> around > >> > >> switching the defaults should not block this KIP. > >> > >> > >> > >> Andy > >> > >> > >> > >> > >> > >> On Thu, 25 Jul 2019 at 18:26, Almog Gavra <al...@confluent.io> > >> wrote: > >> > >> > >> > >> > Thanks for the replies Andy and Andrew (2x Andy?)! > >> > >> > > >> > >> > > Is the text decimal a base16 encoded number, or is it base16 > >> encoded > >> > >> > binary > >> > >> > > form of the number? > >> > >> > > >> > >> > The conversion happens as decimal.unscaledValue().toByteArray() > and > >> > then > >> > >> > the byte array is converted to a hex string, so it's definitely > the > >> > >> binary > >> > >> > form of the number converted to base16. Whether or not that's the > >> same > >> > >> as > >> > >> > the base16 encoded number is a good question (toByteArray > returns a > >> > byte > >> > >> > array containing a signed, big-endian, two's complement > >> representation > >> > >> of > >> > >> > the big integer). > >> > >> > > >> > >> > > One suggestion I have is to change the proposed new config to > >> only > >> > >> affect > >> > >> > > decimals stored as text, i.e. to switch between the current > >> base16 > >> > and > >> > >> > the > >> > >> > > more common base10. Then add another config to the serializer > >> only > >> > >> that > >> > >> > > controls if decimals should be serialized as text or numeric. > >> > >> > > >> > >> > I think we need to be able to handle all mappings from > >> serialization > >> > >> format > >> > >> > to deserialization format (e.g. read in BINARY and output TEXT), > >> > which I > >> > >> > think would be impossible with the alternative suggestion. I > agree > >> > that > >> > >> > automatically deserializing numerics is valuable. I see two other > >> ways > >> > >> to > >> > >> > get this, both keeping the serialization.format config the same: > >> > >> > > >> > >> > - have json.decimal.deserialization.format accept all three > >> formats. > >> > if > >> > >> set > >> > >> > to BINARY/TEXT, numerics would be automatically supported. If set > >> to > >> > >> > NUMERIC, then any string coming in would result in > deserialization > >> > error > >> > >> > (defaults to BINARY for backwards compatibility) > >> > >> > - change json.decimal.deserialization.format to > >> > >> > json.decimal.deserialization.string.format which accepts only > >> > >> BINARY/TEXT > >> > >> > (defaults to BINARY for backwards compatibility) > >> > >> > > >> > >> > > would be a breaking change in that things that previously > failed > >> > would > >> > >> > > suddenly start deserializing. This is a price I'm willing to > >> pay. > >> > >> > > >> > >> > I agree. I'm willing to pay this price too. > >> > >> > > >> > >> > > IMHO, we should then plan to switch the default of decimal > >> > >> serialization > >> > >> > to > >> > >> > > numeric, and text serialization to base 10 in the next major > >> > release. > >> > >> > > >> > >> > I think that can be a separate discussion, I don't want to block > >> this > >> > >> KIP > >> > >> > on it. > >> > >> > > >> > >> > Thoughts? > >> > >> > > >> > >> > On Thu, Jul 25, 2019 at 6:35 AM Andrew Otto <o...@wikimedia.org> > >> > wrote: > >> > >> > > >> > >> > > This is a bit orthogonal, but in JsonSchemaConverter I use > >> > >> JSONSchemas to > >> > >> > > indicate whether a JSON number should be deserialized as an > >> integer > >> > >> or a > >> > >> > > decimal > >> > >> > > < > >> > >> > > > >> > >> > > >> > >> > >> > > >> > https://github.com/ottomata/kafka-connect-jsonschema/blob/master/src/main/java/org/wikimedia/kafka/connect/jsonschema/JsonSchemaConverter.java#L251-L261 > >> > >> > > >. > >> > >> > > Not everyone is going to have JSONSchemas available when > >> converting, > >> > >> but > >> > >> > if > >> > >> > > you do, it is an easy way to support JSON numbers as decimals. > >> > >> > > > >> > >> > > Carry on! :) > >> > >> > > > >> > >> > > On Thu, Jul 25, 2019 at 9:12 AM Andy Coates <a...@confluent.io > > > >> > >> wrote: > >> > >> > > > >> > >> > > > Hi Almog, > >> > >> > > > > >> > >> > > > Like the KIP - I think being able to support decimals in JSON > >> in > >> > the > >> > >> > same > >> > >> > > > way most other systems do is a great improvement. > >> > >> > > > > >> > >> > > > It's not 100% clear to me from the KIP what the current > format > >> is. > >> > >> Is > >> > >> > > the > >> > >> > > > text decimal a base16 encoded number, or is it base16 encoded > >> > binary > >> > >> > form > >> > >> > > > of the number? (I've not tried to get my head around if these > >> two > >> > >> are > >> > >> > > even > >> > >> > > > different!) > >> > >> > > > > >> > >> > > > One suggestion I have is to change the proposed new config to > >> only > >> > >> > affect > >> > >> > > > decimals stored as text, i.e. to switch between the current > >> base16 > >> > >> and > >> > >> > > the > >> > >> > > > more common base10. Then add another config to the > serialzier > >> > only > >> > >> > that > >> > >> > > > controls if decimals should be serialized as text or numeric. > >> The > >> > >> > > benefit > >> > >> > > > of this approach is it allows us to enhance the deserializer > to > >> > >> > > > automatically handle numeric decimals even without any config > >> > >> having to > >> > >> > > be > >> > >> > > > set, i.e. default config in the deserializer would be able to > >> > handle > >> > >> > > > numeric decimals. Of course, this is a two edged sword: this > >> > would > >> > >> > make > >> > >> > > > the deserializer work out of the box with numeric decimals, > >> > (yay!), > >> > >> but > >> > >> > > > would be a breaking change in that things that previously > >> failed > >> > >> would > >> > >> > > > suddenly start deserializing. This is a price I'm willing to > >> pay. > >> > >> > > > > >> > >> > > > IMHO, we should then plan to switch the default of decimal > >> > >> > serialization > >> > >> > > to > >> > >> > > > numeric, and text serialization to base 10 in the next major > >> > >> release. > >> > >> > > > (With upgrade notes to match). Though I know this is more > >> > >> contentious, > >> > >> > I > >> > >> > > > think it moves us forward in a much more standard way that > the > >> > >> current > >> > >> > > > encoding of decimals. > >> > >> > > > > >> > >> > > > On Tue, 25 Jun 2019 at 01:03, Almog Gavra < > al...@confluent.io> > >> > >> wrote: > >> > >> > > > > >> > >> > > > > Hi Everyone! > >> > >> > > > > > >> > >> > > > > Kicking off discussion for a new KIP: > >> > >> > > > > > >> > >> > > > > > >> > >> > > > > >> > >> > > > >> > >> > > >> > >> > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-481%3A+SerDe+Improvements+for+Connect+Decimal+type+in+JSON > >> > >> > > > > > >> > >> > > > > For those who are interested, I have a prototype > >> implementation > >> > >> that > >> > >> > > > helped > >> > >> > > > > guide my design: https://github.com/agavra/kafka/pull/1 > >> > >> > > > > > >> > >> > > > > Cheers, > >> > >> > > > > Almog > >> > >> > > > > > >> > >> > > > > >> > >> > > > >> > >> > > >> > >> > >> > > > >> > > >> > > >