Thanks for all the changes, Almog. The current KIP looks good to me. Randall
On Fri, Aug 30, 2019 at 11:28 AM Almog Gavra <al...@confluent.io> wrote: > Thanks again Randall! Here are the changes I made: > > - Defaults. The KIP had mentioned that the default would be BASE64 in the > "Public Interfaces" section. I have also added your suggestion in "Proposed > Changes". > - I've added a bullet point to change the internal converter to use > decimal.format=NUMERIC in Proposed Changes. If I see that this is not > possible or cumbersome during implementation I will amend the KIP to keep > this change minimally scoped. > - Adopted your third suggestion > - Adopted your migration plan proposal > - Added rejected alternative for creating a new converter > > > calling `NumericNode.decimalValue()` will always return a > java.math.BigDecimal even > > if the underlying Number was not a BigDecimal > > I'm not sure I understand your comment here. NumericNode#decimalValue will > always return a BigDecimal, regardless of what the underlying JSON data is > - that hasn't changed with this KIP. The decimalValue() will only be called > when converting to a DECIMAL logical type. > > What has changed, is that the NumericNode will store a BigDecimal instead > of a double whenever the JSON value is a number with a decimal point in it > (this should answer your fourth point, about deserializing a BigDecimal for > all floating points). > > > Shouldn't the `convertToConnect(Schema, JsonNode) instead use the type of > Number value as > > parsed and returned by the JsonDeserializer to determine the proper > schema > > type > > There is no way to infer the "proper schema type" in the JsonDeserializer. > If my JSON value is {"foo": 1.234} I have no idea whether "foo" is a > decimal or a double - that's the reason we need a configuration value in > first place. This means that in order to avoid precision loss, we must > deserialize any floating point number first as a BigDecimal. > > > and then get the proper value type using that schema type's converter? > > That's exactly the proposal. I think this will be clear in the code. > > Almog > > On Fri, Aug 30, 2019 at 8:00 AM Randall Hauch <rha...@gmail.com> wrote: > > > Thanks for the updates, Almog. This looks really good, but I have a few > > more comments (most wording, though one potentially thorny issue): > > > > First, the KIP should define the default value for the `decimal.format` > > property. IIUC, it will be BASE64, and to gain the new behavior users > will > > have to explicitly set this to NUMERIC. In fact, I'd recommend changing > > this bullet item from: > > > > - JsonConverter will accept the decimal.format configuration to > > determine the serialization format. If the value is BASE64, the > behavior > > remains unchanged (i.e. it serializes it as a base64 text). If the > > value is > > NUMERIC, the JSON node will be a number representing that decimal > (e.g. > > 10.2345 instead of "D3J5"). > > > > to something like: > > > > - Define a new `decimal.format` configuration property on > JsonConverter > > to specify the serialization format for Connect DECIMAL logical type > > values, with two allowed literal values for the configuration > property: > > - `BASE64` specifies the existing behavior of serializing DECIMAL > > logical types as base64 encoded binary data (e.g., "D3J5" in the > > example > > above), and will be the default; and > > - `NUMERIC` will serialize Connect DECIMAL logical type values in > > JSON as a number representing that decimal > > > > Second, since the default will be the existing BASE64 representation, > what > > do you think about changing the JsonConverter instances used by the > Connect > > worker for internal topics to enable `decimal.format=NUMERIC`? I don't > > think we actually use decimals in the internal messages, but if we do at > > some point then the converters will store them with the improved natural > > representation. > > > > Third, the following bullet item could be more clear: > > > > - JsonConverter will automatically handle deserialization of either > > serialization format given a Decimal logical type schema, i.e. it will > > accept both a deserialized BinaryNode and NumericNode. If the value > is a > > BinaryNode, it will construct a java BigDecimal from the binaryValue() > > (which is a btye[]). If the value is a NumericNode, it will simply > pass > > through the decimalValue() deserialized by the JsonDeserializer. > > > > such as maybe: > > > > - The JsonConverter deserialization method currently expects only a > > BinaryNode, but will be changed to also handle NumericNode by calling > > NumericNode.decimalValue(). > > > > This brings up an interesting potential issue: if `schemas.enable=false`, > > then there will be no schema in the record, and calling > > `NumericNode.decimalValue()` will always return a java.math.BigDecimal > even > > if the underlying Number was not a BigDecimal. Shouldn't the > > `convertToConnect(Schema, JsonNode) instead use the type of Number value > as > > parsed and returned by the JsonDeserializer to determine the proper > schema > > type, and then get the proper value type using that schema type's > > converter? > > > > Fourth, I'm not sure I understand the following bullet: > > > > - JsonDeserializer will now default floating point deserialization to > > BigDecimal to avoid losing precision. This may impact performance when > > deserializing doubles - a JMH microbenchmark on my local MBP, this > > estimated about 3x degradation for deserializing JSON floating points. > > If > > the connect schema is not the decimal logical type, the JsonConverter > > will > > convert this BigDecimal value into the corresponding floating point > java > > object. > > > > Fifth, I think the migration plan is not quite accurate. Step 3 mentions > > changing the Connect worker config's key and value converters to use the > > new setting, and a restart (step 4) is necessary after this step. Perhaps > > step 3 should be "If the Connect worker uses the JsonConverter for the > key > > and/or value converters, optionally set the `decimal.format=NUMERIC` for > > the key and/or value converter and restart the workers." followed by Step > > 4: "If desired, update any source connector configs that use the > > JsonConverter for key and/or value converters to use > > `decimal.format=NUMERIC`." > > > > Finally, should we add a short discussion in the Rejected Alternatives > > about the option of leaving JsonConverter untouched and creating a > > different converter implementation? > > > > Thanks! > > > > Randall > > > > > > > > On Mon, Aug 26, 2019 at 11:51 AM Almog Gavra <al...@confluent.io> wrote: > > > > > Thanks for the feedback Randall! I have updated the KIP with the > > following > > > edits: > > > > > > * Updated the reference from "producer" to "source" (I had missed that > > > one!) > > > * Changed the config from "json.decimal.serialization.format" to > > > "decimal.format" > > > * Clarified case sensitivity > > > * Clarified the proposed changes to note that deserialization is not > > > affected by the config > > > * Clarified the changes in JsonConverter to handle deserialization (see > > my > > > third bullet below) > > > * Added a clear migration plan and simplified compatibility > > > > > > Here are also some clarifications based on your comments. > > > > > > * I think "json" has limited value in the configuration name. If put > in a > > > top-level worker config, it clarifies that it only affects connectors > > using > > > the JsonConverter. I have opted for your suggestion and dropped it. > > > * I think "serialization" has limited value in the configuration name. > If > > > we ever want to introduce "deserialization" configurations, there will > be > > > asymmetry in the configuration names. I have opted for your suggestion > > and > > > dropped it. > > > * The JsonConverter will not "always look for numbers". The converter > > will > > > receive from the Jackson Object Mapper either a NumericNode containing > a > > > big decimal or a BinaryNode containing a btye[]. Based on the type of > > this > > > node, it will convert the value to a BigDecimal appropriately (or any > > other > > > Connect java type based on the schema). > > > * "the ... JsonDeserializer are not affected" is not exactly true, but > > > semantically correct. See the note in the KIP about defaulting floating > > > points to BigDecimal to avoid precision loss. > > > * "The resulting application, however, may need to handle a wider range > > of > > > numeric values." Unless I misunderstand what you're saying, I don't > think > > > this is correct. The resulting application will still receive exactly > the > > > same Connect data object from the JsonConverter as it was before - only > > the > > > SerDe layer is affected. > > > > > > Cheers, > > > Almog > > > > > > On Sun, Aug 25, 2019 at 4:28 PM Randall Hauch <rha...@gmail.com> > wrote: > > > > > > > Thanks for all the work, Almog. > > > > > > > > For the most part, I think this KIP will be a great improvement, and > > IMO > > > is > > > > almost ready to go. However, I do have a few suggestions that affect > > the > > > > wording more than the intent. > > > > > > > > First, the name of the `json.decimal.serialization.format` property > is > > > > pretty long, especially when it is prefixed in the Worker config or > in > > a > > > > connector config as `key.converer.json.decimal.serialization.format` > or > > > > `value.converter.json.decimal.serialization.format` . Have you > > > considered a > > > > shorter config property name, such as maybe `decimal.format`? Is > there > > > any > > > > benefit to include "json" and "serialization" in the property name? > > Also, > > > > we should be clear that the value will not be case sensitive (e.g., > > > > "numeric" and "NUMERIC" would be equivalent), to keep in alignment > with > > > > other enumeration literals in Connect configurations. The goal should > > be > > > > simple > > > > > > > > Second, the Motivation section, has the following sentence: > > > > > > > > "A new configuration for producers json.decimal.serialization.format > > will > > > > be introduced to the JsonConverter configuration to help control > > whether > > > > source converters will serialize decimals in numeric or binary > > formats." > > > > > > > > > > > > I agree with an earlier comment from Konstantine that "producers" > here > > is > > > > distracting and does not mirror the normal definition of "producers" > > > within > > > > the Kafka context. I suggest rephrasing this to something like > > > > > > > > "Introduce to the JsonConverter a new configuration property named > > > > json.decimal.serialization.format to control whether source > converters > > > will > > > > serialize decimals in numeric or binary formats." > > > > > > > > > > > > Third, the KIP should be more clear about whether the > > > > `json.decimal.serialization.format` setting does or does not affect > > > > deserialization? IIUC, the deserialization logic will always look for > > > JSON > > > > numbers, and will always use the Schema to define whether it should > > > convert > > > > the value to a different number type. Is that a fair statement? > > > > > > > > Fourth, the JsonSerializer and JsonDeserializer are not affected, yet > > are > > > > still compatible with the old and new behavior. Because the primary > > > purpose > > > > of this new setting is to define how Connect DECIMAL logical type > > values > > > > are serialized in JSON documents, the JsonDeserializer will still be > > able > > > > to deserialize the JSON document correctly. The resulting > application, > > > > however, may need to handle a wider range of numeric values. > > > > > > > > Fifth, the Compatibility section seems more complicated than perhaps > it > > > > needs to be, maybe because it seems to distinguish between upgrading > > and > > > > setting the decimal serialization format. Maybe it would be > sufficient > > to > > > > simply emphasize that all of the sink connectors (or consumer > > > applications) > > > > using the JsonConverter with > > > > the `json.decimal.serialization.format=NUMERIC` setting consuming > > records > > > > from a set of topics be upgraded and changed *before* any of the > source > > > > connectors (or other producer applications) using the JsonConverter > to > > > > serialize records are changed to use > > > > the `json.decimal.serialization.format=NUMERIC` setting? It may also > > > > warrant giving more concrete advice on upgrade procedures. For > example, > > > how > > > > does a user upgrade a set of Connect workers to use this new > property? > > Do > > > > they upgrade first and restart to ensure everything runs as-is, and > > then > > > > upgrade their source connectors to set > > > > `json.decimal.serialization.format=NUMERIC`via connector > configurations > > > or > > > > worker configs? > > > > > > > > Anyway, great job so far! > > > > > > > > Best regards, > > > > > > > > Randall > > > > > > > > On Thu, Aug 15, 2019 at 12:00 PM Konstantine Karantasis < > > > > konstant...@confluent.io> wrote: > > > > > > > > > Thanks! KIP reads even better for me now. > > > > > Just voted. +1 non-binding > > > > > > > > > > Konstantine > > > > > > > > > > On Wed, Aug 14, 2019 at 7:00 PM Almog Gavra <al...@confluent.io> > > > wrote: > > > > > > > > > > > Thanks for the review Konstantine! > > > > > > > > > > > > I think the terminology suggestion definitely makes things > clearer > > - > > > I > > > > > will > > > > > > update the documentation based on your suggestion (e.g. > > > s/Consumer/Sink > > > > > > Converter/g and s/Producer/Source Converter/g). > > > > > > > > > > > > Cheers, > > > > > > Almog > > > > > > > > > > > > On Wed, Aug 14, 2019 at 8:13 AM Konstantine Karantasis < > > > > > > konstant...@confluent.io> wrote: > > > > > > > > > > > > > Thanks Almog for preparing this KIP! > > > > > > > I think it will improve usability and troubleshooting with JSON > > > data > > > > a > > > > > > lot. > > > > > > > > > > > > > > The finalized plan seems quite concrete now. I also liked that > > some > > > > > > > implementation specific implications (such as setting the > > > > ObjectMapper > > > > > to > > > > > > > deserialize floating point as BigDecimal) are highlighted in > the > > > KIP. > > > > > > > > > > > > > > Still, as I was reading the KIP, the main obstacle I > encountered > > > was > > > > > > around > > > > > > > terminology. I couldn't get used to reading "producer" and > > > "consumer" > > > > > and > > > > > > > not thinking in terms of Kafka producers and consumers - which > > are > > > > not > > > > > > > relevant to what this KIP proposes. Thus, I'd suggest replacing > > > > > > > "Producer(s)" with "Source Converter(s)" and "Consumer(s)" with > > > "Sink > > > > > > > Converter(s)" (even if "Converter used by Source Connector" or > > > > > "Converter > > > > > > > used by Sink Connector" would be even more accurate - maybe > this > > > > could > > > > > be > > > > > > > an explanation in a footnote). Terminology around converters > has > > > been > > > > > > > tricky in the past and adding producers/consumers in the mix > > might > > > > add > > > > > to > > > > > > > the confusion. > > > > > > > > > > > > > > Another example where I'd apply this different terminology > would > > be > > > > to > > > > > a > > > > > > > phrase such as the following: > > > > > > > "Because of this, users must take care to first ensure that all > > > > > consumers > > > > > > > have upgraded to the new code before upgrading producers to > make > > > use > > > > of > > > > > > the > > > > > > > NUMERIC serialization format." > > > > > > > which I'd write > > > > > > > "Because of this, users must take care to first ensure that all > > > sink > > > > > > > connectors have upgraded to the new converter code before > > upgrading > > > > > > source > > > > > > > connectors to make use of the NUMERIC serialization format in > > > > > > > JsonConverter." > > > > > > > > > > > > > > Let me know if you think this suggestion makes the KIP easier > to > > > > > follow. > > > > > > > Otherwise I think it's a solid proposal. > > > > > > > > > > > > > > I'm concluding with a couple of nits: > > > > > > > > > > > > > > - "Upgraded Producer with BASE64 serialization, Legacy > Consumer: > > > this > > > > > > > scenario is okay as the upgraded ~producer~ consumer will be > able > > > to > > > > > read > > > > > > > binary as today" (again according to my suggestion above, it > > could > > > be > > > > > as > > > > > > > the upgraded source converter ...) > > > > > > > > > > > > > > - "consumers cannot consumer NUMERIC data. " -> "consumers > cannot > > > > read > > > > > > > NUMERIC data" > > > > > > > > > > > > > > Best, > > > > > > > Konstantine > > > > > > > > > > > > > > On Fri, Aug 9, 2019 at 6:37 PM Almog Gavra <al...@confluent.io > > > > > > wrote: > > > > > > > > > > > > > > > Good catches! Fixed :) > > > > > > > > > > > > > > > > On Thu, Aug 8, 2019 at 10:36 PM Arjun Satish < > > > > arjun.sat...@gmail.com > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Cool! > > > > > > > > > > > > > > > > > > Couple of nits: > > > > > > > > > > > > > > > > > > - In public interfaces, typo: > > > *json.decimal.serialization.fromat* > > > > > > > > > - In public interfaces, you use the term "HEX" instead of > > > > "BASE64". > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Aug 7, 2019 at 9:51 AM Almog Gavra < > > al...@confluent.io > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > EDIT: everywhere I've been using "HEX" I meant to be > using > > > > > > "BASE64". > > > > > > > I > > > > > > > > > will > > > > > > > > > > update the KIP to reflect this. > > > > > > > > > > > > > > > > > > > > On Wed, Aug 7, 2019 at 9:44 AM Almog Gavra < > > > al...@confluent.io > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Thanks for the feedback Arjun! I'm happy changing the > > > default > > > > > > > config > > > > > > > > to > > > > > > > > > > > HEX instead of BINARY, no strong feelings there. > > > > > > > > > > > > > > > > > > > > > > I'll also clarify the example in the KIP to be clearer: > > > > > > > > > > > > > > > > > > > > > > - serialize the decimal field "foo" with value > "10.2345" > > > with > > > > > the > > > > > > > HEX > > > > > > > > > > > setting: {"foo": "D3J5"} > > > > > > > > > > > - serialize the decimal field "foo" with value > "10.2345" > > > with > > > > > the > > > > > > > > > NUMERIC > > > > > > > > > > > setting: {"foo": 10.2345} > > > > > > > > > > > > > > > > > > > > > > With regards to the precision issue, that was my > original > > > > > concern > > > > > > > as > > > > > > > > > well > > > > > > > > > > > (and why I originally suggested a TEXT format). Many > JSON > > > > > > > > deserializers > > > > > > > > > > > (e.g. Jackson with > > > > > > > > DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS), > > > > > > > > > > > however, have the ability to deserialize decimals > > correctly > > > > so > > > > > I > > > > > > > will > > > > > > > > > > > configure that as the default for Connect's > > > JsonDeserializer. > > > > > > It's > > > > > > > > > > probably > > > > > > > > > > > a good idea to call out that using other deserializers > > must > > > > be > > > > > > done > > > > > > > > > with > > > > > > > > > > > care - I will add that documentation to the > serialization > > > > > config. > > > > > > > > > > > > > > > > > > > > > > Note that there would not be an issue on the > > > _serialization_ > > > > > side > > > > > > > of > > > > > > > > > > > things as Jackson respects BigDecimal. > > > > > > > > > > > > > > > > > > > > > > Almog > > > > > > > > > > > > > > > > > > > > > > On Tue, Aug 6, 2019 at 11:23 PM Arjun Satish < > > > > > > > arjun.sat...@gmail.com > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > >> hey Almog, nice work! couple of thoughts (hope I'm not > > > late > > > > > > since > > > > > > > > you > > > > > > > > > > >> started the voting thread already): > > > > > > > > > > >> > > > > > > > > > > >> can you please add some examples to show the changes > > that > > > > you > > > > > > are > > > > > > > > > > >> proposing. makes me think that for a given decimal > > number, > > > > we > > > > > > will > > > > > > > > > have > > > > > > > > > > >> two > > > > > > > > > > >> encodings: “asHex” and “asNumber”. > > > > > > > > > > >> > > > > > > > > > > >> should we call the default config value “HEX” instead > of > > > > > > “BINARY”? > > > > > > > > > > >> > > > > > > > > > > >> Should we call out the fact that JS systems might be > > > > > susceptible > > > > > > > to > > > > > > > > > > double > > > > > > > > > > >> precision round offs with the new numeric format? here > > are > > > > > some > > > > > > > > people > > > > > > > > > > >> discussing a similar problem > > > > > > > > > > >> https://github.com/EventStore/EventStore/issues/1541 > > > > > > > > > > >> > > > > > > > > > > >> On Tue, Aug 6, 2019 at 1:40 PM Almog Gavra < > > > > > al...@confluent.io> > > > > > > > > > wrote: > > > > > > > > > > >> > > > > > > > > > > >> > Hello Everyone, > > > > > > > > > > >> > > > > > > > > > > > >> > Summarizing an in-person discussion with Randall > (this > > > is > > > > > > copied > > > > > > > > > from > > > > > > > > > > >> the > > > > > > > > > > >> > KIP): > > > > > > > > > > >> > > > > > > > > > > > >> > The original KIP suggested supporting an additional > > > > > > > > representation - > > > > > > > > > > >> base10 > > > > > > > > > > >> > encoded text (e.g. `{"asText":"10.2345"}`). This > > causes > > > > > issues > > > > > > > > > because > > > > > > > > > > >> it > > > > > > > > > > >> > is impossible to disambiguate between TEXT and > BINARY > > > > > without > > > > > > an > > > > > > > > > > >> additional > > > > > > > > > > >> > config - furthermore, this makes the migration from > > one > > > to > > > > > the > > > > > > > > other > > > > > > > > > > >> nearly > > > > > > > > > > >> > impossible because it would require that all > consumers > > > > stop > > > > > > > > > consuming > > > > > > > > > > >> and > > > > > > > > > > >> > producers stop producing and atomically updating the > > > > config > > > > > on > > > > > > > all > > > > > > > > > of > > > > > > > > > > >> them > > > > > > > > > > >> > after deploying the new code, or waiting for the > full > > > > > > retention > > > > > > > > > period > > > > > > > > > > >> to > > > > > > > > > > >> > pass - neither option is viable. The suggestion in > the > > > KIP > > > > > is > > > > > > > > > strictly > > > > > > > > > > >> an > > > > > > > > > > >> > improvement over the existing behavior, even if it > > > doesn't > > > > > > > support > > > > > > > > > all > > > > > > > > > > >> > combinations. > > > > > > > > > > >> > > > > > > > > > > > >> > It seems that since most real-world use cases > actually > > > use > > > > > the > > > > > > > > > numeric > > > > > > > > > > >> > representation (not string) we can consider this an > > > > > > improvement. > > > > > > > > > With > > > > > > > > > > >> the > > > > > > > > > > >> > new suggestion, we don't need a deserialization > > > > > configuration > > > > > > > > (only > > > > > > > > > a > > > > > > > > > > >> > serialization option) and the consumers will be able > > to > > > > > always > > > > > > > > > > >> > automatically determine the serialization format. > > > > > > > > > > >> > > > > > > > > > > > >> > Based on this, I'll be opening up the simplified > > version > > > > of > > > > > > the > > > > > > > > KIP > > > > > > > > > > to a > > > > > > > > > > >> > vote. > > > > > > > > > > >> > > > > > > > > > > > >> > Almog > > > > > > > > > > >> > > > > > > > > > > > >> > On Mon, Jul 29, 2019 at 9:29 AM Almog Gavra < > > > > > > al...@confluent.io > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > >> > > > > > > > > > > > >> > > I'm mostly happy with your current suggestion (two > > > > > configs, > > > > > > > one > > > > > > > > > for > > > > > > > > > > >> > > serialization and one for deserialization) and > your > > > > > > > > implementation > > > > > > > > > > >> > > suggestion. One thing to note: > > > > > > > > > > >> > > > > > > > > > > > > >> > > > We should _always_ be able to deserialize a > > standard > > > > > JSON > > > > > > > > > > >> > > > number as a decimal > > > > > > > > > > >> > > > > > > > > > > > > >> > > I was doing some research into decimals and JSON, > > and > > > I > > > > > can > > > > > > > > > imagine > > > > > > > > > > a > > > > > > > > > > >> > > compelling reason to require string > representations > > to > > > > > avoid > > > > > > > > > losing > > > > > > > > > > >> > > precision and to be certain that whomever is > sending > > > the > > > > > > data > > > > > > > > > isn't > > > > > > > > > > >> > losing > > > > > > > > > > >> > > precision (e.g. > > > > > > https://stackoverflow.com/a/38357877/2258040 > > > > > > > ). > > > > > > > > > > >> > > > > > > > > > > > > >> > > I'm okay with always allowing numerics, but > thought > > > it's > > > > > > worth > > > > > > > > > > raising > > > > > > > > > > >> > the > > > > > > > > > > >> > > thought. > > > > > > > > > > >> > > > > > > > > > > > > >> > > On Mon, Jul 29, 2019 at 4:57 AM Andy Coates < > > > > > > > a...@confluent.io> > > > > > > > > > > >> wrote: > > > > > > > > > > >> > > > > > > > > > > > > >> > >> The way I see it, we need to control two seperate > > > > things: > > > > > > > > > > >> > >> > > > > > > > > > > >> > >> 1. How do we _deserialize_ a decimal type if we > > > > > encounter a > > > > > > > > text > > > > > > > > > > >> node in > > > > > > > > > > >> > >> the JSON? (We should _always_ be able to > > > > deserialize a > > > > > > > > > standard > > > > > > > > > > >> JSON > > > > > > > > > > >> > >> number as a decimal). > > > > > > > > > > >> > >> 2. How do we chose how we want decimals to be > > > > > _serialized_. > > > > > > > > > > >> > >> > > > > > > > > > > >> > >> This looks to fits well with your second > suggestion > > > of > > > > > > > slightly > > > > > > > > > > >> > different > > > > > > > > > > >> > >> configs names for serialization vs > deserialization. > > > > > > > > > > >> > >> a, For deserialization we only care about how to > > > handle > > > > > > text > > > > > > > > > > nodes: ` > > > > > > > > > > >> > >> deserialization.decimal.*text*.format`, which > > should > > > > only > > > > > > > have > > > > > > > > > two > > > > > > > > > > >> valid > > > > > > > > > > >> > >> values BINARY | TEXT. > > > > > > > > > > >> > >> b. For serialization we need all three: > > > > > > > > > > >> `serialization.decimal.format`, > > > > > > > > > > >> > >> which should support all three options: BINARY | > > > TEXT | > > > > > > > > NUMERIC. > > > > > > > > > > >> > >> > > > > > > > > > > >> > >> Implementation wise, I think these should be two > > > > separate > > > > > > > > enums, > > > > > > > > > > >> rather > > > > > > > > > > >> > >> than one shared enum and throwing an error if the > > > > > > > deserializer > > > > > > > > is > > > > > > > > > > >> set to > > > > > > > > > > >> > >> NUMERIC. Mainly as this means the enums reflect > > the > > > > > > options > > > > > > > > > > >> available, > > > > > > > > > > >> > >> rather than this being hidden in config checking > > > code. > > > > > But > > > > > > > > > that's > > > > > > > > > > a > > > > > > > > > > >> > minor > > > > > > > > > > >> > >> implementation detail. > > > > > > > > > > >> > >> > > > > > > > > > > >> > >> Personally, I'd be tempted to have the BINARY > value > > > > named > > > > > > > > > something > > > > > > > > > > >> like > > > > > > > > > > >> > >> `LEGACY` or `LEGACY_BINARY` as a way of > encouraging > > > > users > > > > > > to > > > > > > > > move > > > > > > > > > > >> away > > > > > > > > > > >> > >> from > > > > > > > > > > >> > >> it. > > > > > > > > > > >> > >> > > > > > > > > > > >> > >> It's a real shame that both of these settings > > > require a > > > > > > > default > > > > > > > > > of > > > > > > > > > > >> > BINARY > > > > > > > > > > >> > >> for backwards compatibility, but I agree that > > > > > discussions / > > > > > > > > plans > > > > > > > > > > >> around > > > > > > > > > > >> > >> switching the defaults should not block this KIP. > > > > > > > > > > >> > >> > > > > > > > > > > >> > >> Andy > > > > > > > > > > >> > >> > > > > > > > > > > >> > >> > > > > > > > > > > >> > >> On Thu, 25 Jul 2019 at 18:26, Almog Gavra < > > > > > > > al...@confluent.io> > > > > > > > > > > >> wrote: > > > > > > > > > > >> > >> > > > > > > > > > > >> > >> > Thanks for the replies Andy and Andrew (2x > > Andy?)! > > > > > > > > > > >> > >> > > > > > > > > > > > >> > >> > > Is the text decimal a base16 encoded number, > or > > > is > > > > it > > > > > > > > base16 > > > > > > > > > > >> encoded > > > > > > > > > > >> > >> > binary > > > > > > > > > > >> > >> > > form of the number? > > > > > > > > > > >> > >> > > > > > > > > > > > >> > >> > The conversion happens as > > > > > > > > decimal.unscaledValue().toByteArray() > > > > > > > > > > and > > > > > > > > > > >> > then > > > > > > > > > > >> > >> > the byte array is converted to a hex string, so > > > it's > > > > > > > > definitely > > > > > > > > > > the > > > > > > > > > > >> > >> binary > > > > > > > > > > >> > >> > form of the number converted to base16. Whether > > or > > > > not > > > > > > > that's > > > > > > > > > the > > > > > > > > > > >> same > > > > > > > > > > >> > >> as > > > > > > > > > > >> > >> > the base16 encoded number is a good question > > > > > (toByteArray > > > > > > > > > > returns a > > > > > > > > > > >> > byte > > > > > > > > > > >> > >> > array containing a signed, big-endian, two's > > > > complement > > > > > > > > > > >> representation > > > > > > > > > > >> > >> of > > > > > > > > > > >> > >> > the big integer). > > > > > > > > > > >> > >> > > > > > > > > > > > >> > >> > > One suggestion I have is to change the > proposed > > > new > > > > > > > config > > > > > > > > to > > > > > > > > > > >> only > > > > > > > > > > >> > >> affect > > > > > > > > > > >> > >> > > decimals stored as text, i.e. to switch > between > > > the > > > > > > > current > > > > > > > > > > >> base16 > > > > > > > > > > >> > and > > > > > > > > > > >> > >> > the > > > > > > > > > > >> > >> > > more common base10. Then add another config > > to > > > > the > > > > > > > > > serializer > > > > > > > > > > >> only > > > > > > > > > > >> > >> that > > > > > > > > > > >> > >> > > controls if decimals should be serialized as > > text > > > > or > > > > > > > > numeric. > > > > > > > > > > >> > >> > > > > > > > > > > > >> > >> > I think we need to be able to handle all > mappings > > > > from > > > > > > > > > > >> serialization > > > > > > > > > > >> > >> format > > > > > > > > > > >> > >> > to deserialization format (e.g. read in BINARY > > and > > > > > output > > > > > > > > > TEXT), > > > > > > > > > > >> > which I > > > > > > > > > > >> > >> > think would be impossible with the alternative > > > > > > suggestion. > > > > > > > I > > > > > > > > > > agree > > > > > > > > > > >> > that > > > > > > > > > > >> > >> > automatically deserializing numerics is > > valuable. I > > > > see > > > > > > two > > > > > > > > > other > > > > > > > > > > >> ways > > > > > > > > > > >> > >> to > > > > > > > > > > >> > >> > get this, both keeping the serialization.format > > > > config > > > > > > the > > > > > > > > > same: > > > > > > > > > > >> > >> > > > > > > > > > > > >> > >> > - have json.decimal.deserialization.format > accept > > > all > > > > > > three > > > > > > > > > > >> formats. > > > > > > > > > > >> > if > > > > > > > > > > >> > >> set > > > > > > > > > > >> > >> > to BINARY/TEXT, numerics would be automatically > > > > > > supported. > > > > > > > If > > > > > > > > > set > > > > > > > > > > >> to > > > > > > > > > > >> > >> > NUMERIC, then any string coming in would result > > in > > > > > > > > > > deserialization > > > > > > > > > > >> > error > > > > > > > > > > >> > >> > (defaults to BINARY for backwards > compatibility) > > > > > > > > > > >> > >> > - change json.decimal.deserialization.format to > > > > > > > > > > >> > >> > json.decimal.deserialization.string.format > which > > > > > accepts > > > > > > > only > > > > > > > > > > >> > >> BINARY/TEXT > > > > > > > > > > >> > >> > (defaults to BINARY for backwards > compatibility) > > > > > > > > > > >> > >> > > > > > > > > > > > >> > >> > > would be a breaking change in that things > that > > > > > > previously > > > > > > > > > > failed > > > > > > > > > > >> > would > > > > > > > > > > >> > >> > > suddenly start deserializing. This is a > price > > > I'm > > > > > > > willing > > > > > > > > to > > > > > > > > > > >> pay. > > > > > > > > > > >> > >> > > > > > > > > > > > >> > >> > I agree. I'm willing to pay this price too. > > > > > > > > > > >> > >> > > > > > > > > > > > >> > >> > > IMHO, we should then plan to switch the > default > > > of > > > > > > > decimal > > > > > > > > > > >> > >> serialization > > > > > > > > > > >> > >> > to > > > > > > > > > > >> > >> > > numeric, and text serialization to base 10 in > > the > > > > > next > > > > > > > > major > > > > > > > > > > >> > release. > > > > > > > > > > >> > >> > > > > > > > > > > > >> > >> > I think that can be a separate discussion, I > > don't > > > > want > > > > > > to > > > > > > > > > block > > > > > > > > > > >> this > > > > > > > > > > >> > >> KIP > > > > > > > > > > >> > >> > on it. > > > > > > > > > > >> > >> > > > > > > > > > > > >> > >> > Thoughts? > > > > > > > > > > >> > >> > > > > > > > > > > > >> > >> > On Thu, Jul 25, 2019 at 6:35 AM Andrew Otto < > > > > > > > > > o...@wikimedia.org> > > > > > > > > > > >> > wrote: > > > > > > > > > > >> > >> > > > > > > > > > > > >> > >> > > This is a bit orthogonal, but in > > > > JsonSchemaConverter > > > > > I > > > > > > > use > > > > > > > > > > >> > >> JSONSchemas to > > > > > > > > > > >> > >> > > indicate whether a JSON number should be > > > > deserialized > > > > > > as > > > > > > > an > > > > > > > > > > >> integer > > > > > > > > > > >> > >> or a > > > > > > > > > > >> > >> > > decimal > > > > > > > > > > >> > >> > > < > > > > > > > > > > >> > >> > > > > > > > > > > > > >> > >> > > > > > > > > > > > >> > >> > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/ottomata/kafka-connect-jsonschema/blob/master/src/main/java/org/wikimedia/kafka/connect/jsonschema/JsonSchemaConverter.java#L251-L261 > > > > > > > > > > >> > >> > > >. > > > > > > > > > > >> > >> > > Not everyone is going to have JSONSchemas > > > available > > > > > > when > > > > > > > > > > >> converting, > > > > > > > > > > >> > >> but > > > > > > > > > > >> > >> > if > > > > > > > > > > >> > >> > > you do, it is an easy way to support JSON > > numbers > > > > as > > > > > > > > > decimals. > > > > > > > > > > >> > >> > > > > > > > > > > > > >> > >> > > Carry on! :) > > > > > > > > > > >> > >> > > > > > > > > > > > > >> > >> > > On Thu, Jul 25, 2019 at 9:12 AM Andy Coates < > > > > > > > > > a...@confluent.io > > > > > > > > > > > > > > > > > > > > > >> > >> wrote: > > > > > > > > > > >> > >> > > > > > > > > > > > > >> > >> > > > Hi Almog, > > > > > > > > > > >> > >> > > > > > > > > > > > > > >> > >> > > > Like the KIP - I think being able to > support > > > > > decimals > > > > > > > in > > > > > > > > > JSON > > > > > > > > > > >> in > > > > > > > > > > >> > the > > > > > > > > > > >> > >> > same > > > > > > > > > > >> > >> > > > way most other systems do is a great > > > improvement. > > > > > > > > > > >> > >> > > > > > > > > > > > > > >> > >> > > > It's not 100% clear to me from the KIP what > > the > > > > > > current > > > > > > > > > > format > > > > > > > > > > >> is. > > > > > > > > > > >> > >> Is > > > > > > > > > > >> > >> > > the > > > > > > > > > > >> > >> > > > text decimal a base16 encoded number, or is > > it > > > > > base16 > > > > > > > > > encoded > > > > > > > > > > >> > binary > > > > > > > > > > >> > >> > form > > > > > > > > > > >> > >> > > > of the number? (I've not tried to get my > head > > > > > around > > > > > > if > > > > > > > > > these > > > > > > > > > > >> two > > > > > > > > > > >> > >> are > > > > > > > > > > >> > >> > > even > > > > > > > > > > >> > >> > > > different!) > > > > > > > > > > >> > >> > > > > > > > > > > > > > >> > >> > > > One suggestion I have is to change the > > proposed > > > > new > > > > > > > > config > > > > > > > > > to > > > > > > > > > > >> only > > > > > > > > > > >> > >> > affect > > > > > > > > > > >> > >> > > > decimals stored as text, i.e. to switch > > between > > > > the > > > > > > > > current > > > > > > > > > > >> base16 > > > > > > > > > > >> > >> and > > > > > > > > > > >> > >> > > the > > > > > > > > > > >> > >> > > > more common base10. Then add another > config > > > to > > > > > the > > > > > > > > > > serialzier > > > > > > > > > > >> > only > > > > > > > > > > >> > >> > that > > > > > > > > > > >> > >> > > > controls if decimals should be serialized > as > > > text > > > > > or > > > > > > > > > numeric. > > > > > > > > > > >> The > > > > > > > > > > >> > >> > > benefit > > > > > > > > > > >> > >> > > > of this approach is it allows us to enhance > > the > > > > > > > > > deserializer > > > > > > > > > > to > > > > > > > > > > >> > >> > > > automatically handle numeric decimals even > > > > without > > > > > > any > > > > > > > > > config > > > > > > > > > > >> > >> having to > > > > > > > > > > >> > >> > > be > > > > > > > > > > >> > >> > > > set, i.e. default config in the > deserializer > > > > would > > > > > be > > > > > > > > able > > > > > > > > > to > > > > > > > > > > >> > handle > > > > > > > > > > >> > >> > > > numeric decimals. Of course, this is a two > > > edged > > > > > > > sword: > > > > > > > > > this > > > > > > > > > > >> > would > > > > > > > > > > >> > >> > make > > > > > > > > > > >> > >> > > > the deserializer work out of the box with > > > numeric > > > > > > > > decimals, > > > > > > > > > > >> > (yay!), > > > > > > > > > > >> > >> but > > > > > > > > > > >> > >> > > > would be a breaking change in that things > > that > > > > > > > previously > > > > > > > > > > >> failed > > > > > > > > > > >> > >> would > > > > > > > > > > >> > >> > > > suddenly start deserializing. This is a > > price > > > > I'm > > > > > > > > willing > > > > > > > > > to > > > > > > > > > > >> pay. > > > > > > > > > > >> > >> > > > > > > > > > > > > > >> > >> > > > IMHO, we should then plan to switch the > > default > > > > of > > > > > > > > decimal > > > > > > > > > > >> > >> > serialization > > > > > > > > > > >> > >> > > to > > > > > > > > > > >> > >> > > > numeric, and text serialization to base 10 > in > > > the > > > > > > next > > > > > > > > > major > > > > > > > > > > >> > >> release. > > > > > > > > > > >> > >> > > > (With upgrade notes to match). Though I > know > > > this > > > > > is > > > > > > > more > > > > > > > > > > >> > >> contentious, > > > > > > > > > > >> > >> > I > > > > > > > > > > >> > >> > > > think it moves us forward in a much more > > > standard > > > > > way > > > > > > > > that > > > > > > > > > > the > > > > > > > > > > >> > >> current > > > > > > > > > > >> > >> > > > encoding of decimals. > > > > > > > > > > >> > >> > > > > > > > > > > > > > >> > >> > > > On Tue, 25 Jun 2019 at 01:03, Almog Gavra < > > > > > > > > > > al...@confluent.io> > > > > > > > > > > >> > >> wrote: > > > > > > > > > > >> > >> > > > > > > > > > > > > > >> > >> > > > > Hi Everyone! > > > > > > > > > > >> > >> > > > > > > > > > > > > > > >> > >> > > > > Kicking off discussion for a new KIP: > > > > > > > > > > >> > >> > > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > >> > >> > > > > > > > > > > > >> > >> > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-481%3A+SerDe+Improvements+for+Connect+Decimal+type+in+JSON > > > > > > > > > > >> > >> > > > > > > > > > > > > > > >> > >> > > > > For those who are interested, I have a > > > > prototype > > > > > > > > > > >> implementation > > > > > > > > > > >> > >> that > > > > > > > > > > >> > >> > > > helped > > > > > > > > > > >> > >> > > > > guide my design: > > > > > > > > https://github.com/agavra/kafka/pull/1 > > > > > > > > > > >> > >> > > > > > > > > > > > > > > >> > >> > > > > Cheers, > > > > > > > > > > >> > >> > > > > Almog > > > > > > > > > > >> > >> > > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > >> > >> > > > > > > > > > > > >> > >> > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >