EDIT: everywhere I've been using "HEX" I meant to be using "BASE64". I will update the KIP to reflect this.
On Wed, Aug 7, 2019 at 9:44 AM Almog Gavra <al...@confluent.io> wrote: > Thanks for the feedback Arjun! I'm happy changing the default config to > HEX instead of BINARY, no strong feelings there. > > I'll also clarify the example in the KIP to be clearer: > > - serialize the decimal field "foo" with value "10.2345" with the HEX > setting: {"foo": "D3J5"} > - serialize the decimal field "foo" with value "10.2345" with the NUMERIC > setting: {"foo": 10.2345} > > With regards to the precision issue, that was my original concern as well > (and why I originally suggested a TEXT format). Many JSON deserializers > (e.g. Jackson with DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS), > however, have the ability to deserialize decimals correctly so I will > configure that as the default for Connect's JsonDeserializer. It's probably > a good idea to call out that using other deserializers must be done with > care - I will add that documentation to the serialization config. > > Note that there would not be an issue on the _serialization_ side of > things as Jackson respects BigDecimal. > > Almog > > On Tue, Aug 6, 2019 at 11:23 PM Arjun Satish <arjun.sat...@gmail.com> > wrote: > >> hey Almog, nice work! couple of thoughts (hope I'm not late since you >> started the voting thread already): >> >> can you please add some examples to show the changes that you are >> proposing. makes me think that for a given decimal number, we will have >> two >> encodings: “asHex” and “asNumber”. >> >> should we call the default config value “HEX” instead of “BINARY”? >> >> Should we call out the fact that JS systems might be susceptible to double >> precision round offs with the new numeric format? here are some people >> discussing a similar problem >> https://github.com/EventStore/EventStore/issues/1541 >> >> On Tue, Aug 6, 2019 at 1:40 PM Almog Gavra <al...@confluent.io> wrote: >> >> > Hello Everyone, >> > >> > Summarizing an in-person discussion with Randall (this is copied from >> the >> > KIP): >> > >> > The original KIP suggested supporting an additional representation - >> base10 >> > encoded text (e.g. `{"asText":"10.2345"}`). This causes issues because >> it >> > is impossible to disambiguate between TEXT and BINARY without an >> additional >> > config - furthermore, this makes the migration from one to the other >> nearly >> > impossible because it would require that all consumers stop consuming >> and >> > producers stop producing and atomically updating the config on all of >> them >> > after deploying the new code, or waiting for the full retention period >> to >> > pass - neither option is viable. The suggestion in the KIP is strictly >> an >> > improvement over the existing behavior, even if it doesn't support all >> > combinations. >> > >> > It seems that since most real-world use cases actually use the numeric >> > representation (not string) we can consider this an improvement. With >> the >> > new suggestion, we don't need a deserialization configuration (only a >> > serialization option) and the consumers will be able to always >> > automatically determine the serialization format. >> > >> > Based on this, I'll be opening up the simplified version of the KIP to a >> > vote. >> > >> > Almog >> > >> > On Mon, Jul 29, 2019 at 9:29 AM Almog Gavra <al...@confluent.io> wrote: >> > >> > > I'm mostly happy with your current suggestion (two configs, one for >> > > serialization and one for deserialization) and your implementation >> > > suggestion. One thing to note: >> > > >> > > > We should _always_ be able to deserialize a standard JSON >> > > > number as a decimal >> > > >> > > I was doing some research into decimals and JSON, and I can imagine a >> > > compelling reason to require string representations to avoid losing >> > > precision and to be certain that whomever is sending the data isn't >> > losing >> > > precision (e.g. https://stackoverflow.com/a/38357877/2258040). >> > > >> > > I'm okay with always allowing numerics, but thought it's worth raising >> > the >> > > thought. >> > > >> > > On Mon, Jul 29, 2019 at 4:57 AM Andy Coates <a...@confluent.io> >> wrote: >> > > >> > >> The way I see it, we need to control two seperate things: >> > >> >> > >> 1. How do we _deserialize_ a decimal type if we encounter a text >> node in >> > >> the JSON? (We should _always_ be able to deserialize a standard >> JSON >> > >> number as a decimal). >> > >> 2. How do we chose how we want decimals to be _serialized_. >> > >> >> > >> This looks to fits well with your second suggestion of slightly >> > different >> > >> configs names for serialization vs deserialization. >> > >> a, For deserialization we only care about how to handle text nodes: ` >> > >> deserialization.decimal.*text*.format`, which should only have two >> valid >> > >> values BINARY | TEXT. >> > >> b. For serialization we need all three: >> `serialization.decimal.format`, >> > >> which should support all three options: BINARY | TEXT | NUMERIC. >> > >> >> > >> Implementation wise, I think these should be two separate enums, >> rather >> > >> than one shared enum and throwing an error if the deserializer is >> set to >> > >> NUMERIC. Mainly as this means the enums reflect the options >> available, >> > >> rather than this being hidden in config checking code. But that's a >> > minor >> > >> implementation detail. >> > >> >> > >> Personally, I'd be tempted to have the BINARY value named something >> like >> > >> `LEGACY` or `LEGACY_BINARY` as a way of encouraging users to move >> away >> > >> from >> > >> it. >> > >> >> > >> It's a real shame that both of these settings require a default of >> > BINARY >> > >> for backwards compatibility, but I agree that discussions / plans >> around >> > >> switching the defaults should not block this KIP. >> > >> >> > >> Andy >> > >> >> > >> >> > >> On Thu, 25 Jul 2019 at 18:26, Almog Gavra <al...@confluent.io> >> wrote: >> > >> >> > >> > Thanks for the replies Andy and Andrew (2x Andy?)! >> > >> > >> > >> > > Is the text decimal a base16 encoded number, or is it base16 >> encoded >> > >> > binary >> > >> > > form of the number? >> > >> > >> > >> > The conversion happens as decimal.unscaledValue().toByteArray() and >> > then >> > >> > the byte array is converted to a hex string, so it's definitely the >> > >> binary >> > >> > form of the number converted to base16. Whether or not that's the >> same >> > >> as >> > >> > the base16 encoded number is a good question (toByteArray returns a >> > byte >> > >> > array containing a signed, big-endian, two's complement >> representation >> > >> of >> > >> > the big integer). >> > >> > >> > >> > > One suggestion I have is to change the proposed new config to >> only >> > >> affect >> > >> > > decimals stored as text, i.e. to switch between the current >> base16 >> > and >> > >> > the >> > >> > > more common base10. Then add another config to the serializer >> only >> > >> that >> > >> > > controls if decimals should be serialized as text or numeric. >> > >> > >> > >> > I think we need to be able to handle all mappings from >> serialization >> > >> format >> > >> > to deserialization format (e.g. read in BINARY and output TEXT), >> > which I >> > >> > think would be impossible with the alternative suggestion. I agree >> > that >> > >> > automatically deserializing numerics is valuable. I see two other >> ways >> > >> to >> > >> > get this, both keeping the serialization.format config the same: >> > >> > >> > >> > - have json.decimal.deserialization.format accept all three >> formats. >> > if >> > >> set >> > >> > to BINARY/TEXT, numerics would be automatically supported. If set >> to >> > >> > NUMERIC, then any string coming in would result in deserialization >> > error >> > >> > (defaults to BINARY for backwards compatibility) >> > >> > - change json.decimal.deserialization.format to >> > >> > json.decimal.deserialization.string.format which accepts only >> > >> BINARY/TEXT >> > >> > (defaults to BINARY for backwards compatibility) >> > >> > >> > >> > > would be a breaking change in that things that previously failed >> > would >> > >> > > suddenly start deserializing. This is a price I'm willing to >> pay. >> > >> > >> > >> > I agree. I'm willing to pay this price too. >> > >> > >> > >> > > IMHO, we should then plan to switch the default of decimal >> > >> serialization >> > >> > to >> > >> > > numeric, and text serialization to base 10 in the next major >> > release. >> > >> > >> > >> > I think that can be a separate discussion, I don't want to block >> this >> > >> KIP >> > >> > on it. >> > >> > >> > >> > Thoughts? >> > >> > >> > >> > On Thu, Jul 25, 2019 at 6:35 AM Andrew Otto <o...@wikimedia.org> >> > wrote: >> > >> > >> > >> > > This is a bit orthogonal, but in JsonSchemaConverter I use >> > >> JSONSchemas to >> > >> > > indicate whether a JSON number should be deserialized as an >> integer >> > >> or a >> > >> > > decimal >> > >> > > < >> > >> > > >> > >> > >> > >> >> > >> https://github.com/ottomata/kafka-connect-jsonschema/blob/master/src/main/java/org/wikimedia/kafka/connect/jsonschema/JsonSchemaConverter.java#L251-L261 >> > >> > > >. >> > >> > > Not everyone is going to have JSONSchemas available when >> converting, >> > >> but >> > >> > if >> > >> > > you do, it is an easy way to support JSON numbers as decimals. >> > >> > > >> > >> > > Carry on! :) >> > >> > > >> > >> > > On Thu, Jul 25, 2019 at 9:12 AM Andy Coates <a...@confluent.io> >> > >> wrote: >> > >> > > >> > >> > > > Hi Almog, >> > >> > > > >> > >> > > > Like the KIP - I think being able to support decimals in JSON >> in >> > the >> > >> > same >> > >> > > > way most other systems do is a great improvement. >> > >> > > > >> > >> > > > It's not 100% clear to me from the KIP what the current format >> is. >> > >> Is >> > >> > > the >> > >> > > > text decimal a base16 encoded number, or is it base16 encoded >> > binary >> > >> > form >> > >> > > > of the number? (I've not tried to get my head around if these >> two >> > >> are >> > >> > > even >> > >> > > > different!) >> > >> > > > >> > >> > > > One suggestion I have is to change the proposed new config to >> only >> > >> > affect >> > >> > > > decimals stored as text, i.e. to switch between the current >> base16 >> > >> and >> > >> > > the >> > >> > > > more common base10. Then add another config to the serialzier >> > only >> > >> > that >> > >> > > > controls if decimals should be serialized as text or numeric. >> The >> > >> > > benefit >> > >> > > > of this approach is it allows us to enhance the deserializer to >> > >> > > > automatically handle numeric decimals even without any config >> > >> having to >> > >> > > be >> > >> > > > set, i.e. default config in the deserializer would be able to >> > handle >> > >> > > > numeric decimals. Of course, this is a two edged sword: this >> > would >> > >> > make >> > >> > > > the deserializer work out of the box with numeric decimals, >> > (yay!), >> > >> but >> > >> > > > would be a breaking change in that things that previously >> failed >> > >> would >> > >> > > > suddenly start deserializing. This is a price I'm willing to >> pay. >> > >> > > > >> > >> > > > IMHO, we should then plan to switch the default of decimal >> > >> > serialization >> > >> > > to >> > >> > > > numeric, and text serialization to base 10 in the next major >> > >> release. >> > >> > > > (With upgrade notes to match). Though I know this is more >> > >> contentious, >> > >> > I >> > >> > > > think it moves us forward in a much more standard way that the >> > >> current >> > >> > > > encoding of decimals. >> > >> > > > >> > >> > > > On Tue, 25 Jun 2019 at 01:03, Almog Gavra <al...@confluent.io> >> > >> wrote: >> > >> > > > >> > >> > > > > Hi Everyone! >> > >> > > > > >> > >> > > > > Kicking off discussion for a new KIP: >> > >> > > > > >> > >> > > > > >> > >> > > > >> > >> > > >> > >> > >> > >> >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-481%3A+SerDe+Improvements+for+Connect+Decimal+type+in+JSON >> > >> > > > > >> > >> > > > > For those who are interested, I have a prototype >> implementation >> > >> that >> > >> > > > helped >> > >> > > > > guide my design: https://github.com/agavra/kafka/pull/1 >> > >> > > > > >> > >> > > > > Cheers, >> > >> > > > > Almog >> > >> > > > > >> > >> > > > >> > >> > > >> > >> > >> > >> >> > > >> > >> >