Good catches! Fixed :) On Thu, Aug 8, 2019 at 10:36 PM Arjun Satish <arjun.sat...@gmail.com> wrote:
> Cool! > > Couple of nits: > > - In public interfaces, typo: *json.decimal.serialization.fromat* > - In public interfaces, you use the term "HEX" instead of "BASE64". > > > > On Wed, Aug 7, 2019 at 9:51 AM Almog Gavra <al...@confluent.io> wrote: > > > EDIT: everywhere I've been using "HEX" I meant to be using "BASE64". I > will > > update the KIP to reflect this. > > > > On Wed, Aug 7, 2019 at 9:44 AM Almog Gavra <al...@confluent.io> wrote: > > > > > Thanks for the feedback Arjun! I'm happy changing the default config to > > > HEX instead of BINARY, no strong feelings there. > > > > > > I'll also clarify the example in the KIP to be clearer: > > > > > > - serialize the decimal field "foo" with value "10.2345" with the HEX > > > setting: {"foo": "D3J5"} > > > - serialize the decimal field "foo" with value "10.2345" with the > NUMERIC > > > setting: {"foo": 10.2345} > > > > > > With regards to the precision issue, that was my original concern as > well > > > (and why I originally suggested a TEXT format). Many JSON deserializers > > > (e.g. Jackson with DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS), > > > however, have the ability to deserialize decimals correctly so I will > > > configure that as the default for Connect's JsonDeserializer. It's > > probably > > > a good idea to call out that using other deserializers must be done > with > > > care - I will add that documentation to the serialization config. > > > > > > Note that there would not be an issue on the _serialization_ side of > > > things as Jackson respects BigDecimal. > > > > > > Almog > > > > > > On Tue, Aug 6, 2019 at 11:23 PM Arjun Satish <arjun.sat...@gmail.com> > > > wrote: > > > > > >> hey Almog, nice work! couple of thoughts (hope I'm not late since you > > >> started the voting thread already): > > >> > > >> can you please add some examples to show the changes that you are > > >> proposing. makes me think that for a given decimal number, we will > have > > >> two > > >> encodings: “asHex” and “asNumber”. > > >> > > >> should we call the default config value “HEX” instead of “BINARY”? > > >> > > >> Should we call out the fact that JS systems might be susceptible to > > double > > >> precision round offs with the new numeric format? here are some people > > >> discussing a similar problem > > >> https://github.com/EventStore/EventStore/issues/1541 > > >> > > >> On Tue, Aug 6, 2019 at 1:40 PM Almog Gavra <al...@confluent.io> > wrote: > > >> > > >> > Hello Everyone, > > >> > > > >> > Summarizing an in-person discussion with Randall (this is copied > from > > >> the > > >> > KIP): > > >> > > > >> > The original KIP suggested supporting an additional representation - > > >> base10 > > >> > encoded text (e.g. `{"asText":"10.2345"}`). This causes issues > because > > >> it > > >> > is impossible to disambiguate between TEXT and BINARY without an > > >> additional > > >> > config - furthermore, this makes the migration from one to the other > > >> nearly > > >> > impossible because it would require that all consumers stop > consuming > > >> and > > >> > producers stop producing and atomically updating the config on all > of > > >> them > > >> > after deploying the new code, or waiting for the full retention > period > > >> to > > >> > pass - neither option is viable. The suggestion in the KIP is > strictly > > >> an > > >> > improvement over the existing behavior, even if it doesn't support > all > > >> > combinations. > > >> > > > >> > It seems that since most real-world use cases actually use the > numeric > > >> > representation (not string) we can consider this an improvement. > With > > >> the > > >> > new suggestion, we don't need a deserialization configuration (only > a > > >> > serialization option) and the consumers will be able to always > > >> > automatically determine the serialization format. > > >> > > > >> > Based on this, I'll be opening up the simplified version of the KIP > > to a > > >> > vote. > > >> > > > >> > Almog > > >> > > > >> > On Mon, Jul 29, 2019 at 9:29 AM Almog Gavra <al...@confluent.io> > > wrote: > > >> > > > >> > > I'm mostly happy with your current suggestion (two configs, one > for > > >> > > serialization and one for deserialization) and your implementation > > >> > > suggestion. One thing to note: > > >> > > > > >> > > > We should _always_ be able to deserialize a standard JSON > > >> > > > number as a decimal > > >> > > > > >> > > I was doing some research into decimals and JSON, and I can > imagine > > a > > >> > > compelling reason to require string representations to avoid > losing > > >> > > precision and to be certain that whomever is sending the data > isn't > > >> > losing > > >> > > precision (e.g. https://stackoverflow.com/a/38357877/2258040). > > >> > > > > >> > > I'm okay with always allowing numerics, but thought it's worth > > raising > > >> > the > > >> > > thought. > > >> > > > > >> > > On Mon, Jul 29, 2019 at 4:57 AM Andy Coates <a...@confluent.io> > > >> wrote: > > >> > > > > >> > >> The way I see it, we need to control two seperate things: > > >> > >> > > >> > >> 1. How do we _deserialize_ a decimal type if we encounter a text > > >> node in > > >> > >> the JSON? (We should _always_ be able to deserialize a > standard > > >> JSON > > >> > >> number as a decimal). > > >> > >> 2. How do we chose how we want decimals to be _serialized_. > > >> > >> > > >> > >> This looks to fits well with your second suggestion of slightly > > >> > different > > >> > >> configs names for serialization vs deserialization. > > >> > >> a, For deserialization we only care about how to handle text > > nodes: ` > > >> > >> deserialization.decimal.*text*.format`, which should only have > two > > >> valid > > >> > >> values BINARY | TEXT. > > >> > >> b. For serialization we need all three: > > >> `serialization.decimal.format`, > > >> > >> which should support all three options: BINARY | TEXT | NUMERIC. > > >> > >> > > >> > >> Implementation wise, I think these should be two separate enums, > > >> rather > > >> > >> than one shared enum and throwing an error if the deserializer is > > >> set to > > >> > >> NUMERIC. Mainly as this means the enums reflect the options > > >> available, > > >> > >> rather than this being hidden in config checking code. But > that's > > a > > >> > minor > > >> > >> implementation detail. > > >> > >> > > >> > >> Personally, I'd be tempted to have the BINARY value named > something > > >> like > > >> > >> `LEGACY` or `LEGACY_BINARY` as a way of encouraging users to move > > >> away > > >> > >> from > > >> > >> it. > > >> > >> > > >> > >> It's a real shame that both of these settings require a default > of > > >> > BINARY > > >> > >> for backwards compatibility, but I agree that discussions / plans > > >> around > > >> > >> switching the defaults should not block this KIP. > > >> > >> > > >> > >> Andy > > >> > >> > > >> > >> > > >> > >> On Thu, 25 Jul 2019 at 18:26, Almog Gavra <al...@confluent.io> > > >> wrote: > > >> > >> > > >> > >> > Thanks for the replies Andy and Andrew (2x Andy?)! > > >> > >> > > > >> > >> > > Is the text decimal a base16 encoded number, or is it base16 > > >> encoded > > >> > >> > binary > > >> > >> > > form of the number? > > >> > >> > > > >> > >> > The conversion happens as decimal.unscaledValue().toByteArray() > > and > > >> > then > > >> > >> > the byte array is converted to a hex string, so it's definitely > > the > > >> > >> binary > > >> > >> > form of the number converted to base16. Whether or not that's > the > > >> same > > >> > >> as > > >> > >> > the base16 encoded number is a good question (toByteArray > > returns a > > >> > byte > > >> > >> > array containing a signed, big-endian, two's complement > > >> representation > > >> > >> of > > >> > >> > the big integer). > > >> > >> > > > >> > >> > > One suggestion I have is to change the proposed new config to > > >> only > > >> > >> affect > > >> > >> > > decimals stored as text, i.e. to switch between the current > > >> base16 > > >> > and > > >> > >> > the > > >> > >> > > more common base10. Then add another config to the > serializer > > >> only > > >> > >> that > > >> > >> > > controls if decimals should be serialized as text or numeric. > > >> > >> > > > >> > >> > I think we need to be able to handle all mappings from > > >> serialization > > >> > >> format > > >> > >> > to deserialization format (e.g. read in BINARY and output > TEXT), > > >> > which I > > >> > >> > think would be impossible with the alternative suggestion. I > > agree > > >> > that > > >> > >> > automatically deserializing numerics is valuable. I see two > other > > >> ways > > >> > >> to > > >> > >> > get this, both keeping the serialization.format config the > same: > > >> > >> > > > >> > >> > - have json.decimal.deserialization.format accept all three > > >> formats. > > >> > if > > >> > >> set > > >> > >> > to BINARY/TEXT, numerics would be automatically supported. If > set > > >> to > > >> > >> > NUMERIC, then any string coming in would result in > > deserialization > > >> > error > > >> > >> > (defaults to BINARY for backwards compatibility) > > >> > >> > - change json.decimal.deserialization.format to > > >> > >> > json.decimal.deserialization.string.format which accepts only > > >> > >> BINARY/TEXT > > >> > >> > (defaults to BINARY for backwards compatibility) > > >> > >> > > > >> > >> > > would be a breaking change in that things that previously > > failed > > >> > would > > >> > >> > > suddenly start deserializing. This is a price I'm willing to > > >> pay. > > >> > >> > > > >> > >> > I agree. I'm willing to pay this price too. > > >> > >> > > > >> > >> > > IMHO, we should then plan to switch the default of decimal > > >> > >> serialization > > >> > >> > to > > >> > >> > > numeric, and text serialization to base 10 in the next major > > >> > release. > > >> > >> > > > >> > >> > I think that can be a separate discussion, I don't want to > block > > >> this > > >> > >> KIP > > >> > >> > on it. > > >> > >> > > > >> > >> > Thoughts? > > >> > >> > > > >> > >> > On Thu, Jul 25, 2019 at 6:35 AM Andrew Otto < > o...@wikimedia.org> > > >> > wrote: > > >> > >> > > > >> > >> > > This is a bit orthogonal, but in JsonSchemaConverter I use > > >> > >> JSONSchemas to > > >> > >> > > indicate whether a JSON number should be deserialized as an > > >> integer > > >> > >> or a > > >> > >> > > decimal > > >> > >> > > < > > >> > >> > > > > >> > >> > > > >> > >> > > >> > > > >> > > > https://github.com/ottomata/kafka-connect-jsonschema/blob/master/src/main/java/org/wikimedia/kafka/connect/jsonschema/JsonSchemaConverter.java#L251-L261 > > >> > >> > > >. > > >> > >> > > Not everyone is going to have JSONSchemas available when > > >> converting, > > >> > >> but > > >> > >> > if > > >> > >> > > you do, it is an easy way to support JSON numbers as > decimals. > > >> > >> > > > > >> > >> > > Carry on! :) > > >> > >> > > > > >> > >> > > On Thu, Jul 25, 2019 at 9:12 AM Andy Coates < > a...@confluent.io > > > > > >> > >> wrote: > > >> > >> > > > > >> > >> > > > Hi Almog, > > >> > >> > > > > > >> > >> > > > Like the KIP - I think being able to support decimals in > JSON > > >> in > > >> > the > > >> > >> > same > > >> > >> > > > way most other systems do is a great improvement. > > >> > >> > > > > > >> > >> > > > It's not 100% clear to me from the KIP what the current > > format > > >> is. > > >> > >> Is > > >> > >> > > the > > >> > >> > > > text decimal a base16 encoded number, or is it base16 > encoded > > >> > binary > > >> > >> > form > > >> > >> > > > of the number? (I've not tried to get my head around if > these > > >> two > > >> > >> are > > >> > >> > > even > > >> > >> > > > different!) > > >> > >> > > > > > >> > >> > > > One suggestion I have is to change the proposed new config > to > > >> only > > >> > >> > affect > > >> > >> > > > decimals stored as text, i.e. to switch between the current > > >> base16 > > >> > >> and > > >> > >> > > the > > >> > >> > > > more common base10. Then add another config to the > > serialzier > > >> > only > > >> > >> > that > > >> > >> > > > controls if decimals should be serialized as text or > numeric. > > >> The > > >> > >> > > benefit > > >> > >> > > > of this approach is it allows us to enhance the > deserializer > > to > > >> > >> > > > automatically handle numeric decimals even without any > config > > >> > >> having to > > >> > >> > > be > > >> > >> > > > set, i.e. default config in the deserializer would be able > to > > >> > handle > > >> > >> > > > numeric decimals. Of course, this is a two edged sword: > this > > >> > would > > >> > >> > make > > >> > >> > > > the deserializer work out of the box with numeric decimals, > > >> > (yay!), > > >> > >> but > > >> > >> > > > would be a breaking change in that things that previously > > >> failed > > >> > >> would > > >> > >> > > > suddenly start deserializing. This is a price I'm willing > to > > >> pay. > > >> > >> > > > > > >> > >> > > > IMHO, we should then plan to switch the default of decimal > > >> > >> > serialization > > >> > >> > > to > > >> > >> > > > numeric, and text serialization to base 10 in the next > major > > >> > >> release. > > >> > >> > > > (With upgrade notes to match). Though I know this is more > > >> > >> contentious, > > >> > >> > I > > >> > >> > > > think it moves us forward in a much more standard way that > > the > > >> > >> current > > >> > >> > > > encoding of decimals. > > >> > >> > > > > > >> > >> > > > On Tue, 25 Jun 2019 at 01:03, Almog Gavra < > > al...@confluent.io> > > >> > >> wrote: > > >> > >> > > > > > >> > >> > > > > Hi Everyone! > > >> > >> > > > > > > >> > >> > > > > Kicking off discussion for a new KIP: > > >> > >> > > > > > > >> > >> > > > > > > >> > >> > > > > > >> > >> > > > > >> > >> > > > >> > >> > > >> > > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-481%3A+SerDe+Improvements+for+Connect+Decimal+type+in+JSON > > >> > >> > > > > > > >> > >> > > > > For those who are interested, I have a prototype > > >> implementation > > >> > >> that > > >> > >> > > > helped > > >> > >> > > > > guide my design: https://github.com/agavra/kafka/pull/1 > > >> > >> > > > > > > >> > >> > > > > Cheers, > > >> > >> > > > > Almog > > >> > >> > > > > > > >> > >> > > > > > >> > >> > > > > >> > >> > > > >> > >> > > >> > > > > >> > > > >> > > > > > >