> Also thanks for showing me your pattern with the SchemaConversions and stuff. Feels pretty clean and worked like a charm :)

Glad to hear it, that is very cool!
> converts number to double always. I wonder, did you make this up?

Yes, we chose the mapping. We chose to do number -> double and integer -> bigint because both of those are wider than their float/int counterparts, meaning that double and bigint will work in more cases. Of course, this is not an optimal usage of bits, but at least things won't break.

> all kinds of fields like double, float, big decimal… they all get mapped to number by my converter

It is possible to make some non-JSONSchema convention in the JSONSchema to map to more specific types. This is done for example with format: date-time in our code, to map from an ISO-8601 string to a timestamp. I just did a quick google to find an example of someone else already doing this and found this doc from IBM <https://www.ibm.com/docs/en/cics-ts/5.3?topic=mapping-json-schema-c-c> saying they use JSONSchema's format keyword to specify a float, like:

    type: number
    format: float

This seems like a pretty good idea to me, and we should probably do this at WMF too! However, it would be a custom convention, not part of the JSONSchema spec itself, so when you convert back to a JSONSchema you'd have to codify this convention yourself (and nothing outside of your code would really respect it).
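For illustration, a rough, untested sketch of how a JSONSchema -> Flink DataType mapping could honor such a format convention. The class and method names here are made up for the example; this is not the actual wikimedia-event-utilities code:

    import com.fasterxml.jackson.databind.node.ObjectNode;
    import org.apache.flink.table.api.DataTypes;
    import org.apache.flink.table.types.DataType;

    public class NumberFormatMapping {
        // Map a JSONSchema "number" node to a Flink DataType, using the
        // non-standard "format" keyword to narrow the type when present.
        public static DataType numberToDataType(ObjectNode numberSchema) {
            String format = numberSchema.hasNonNull("format")
                ? numberSchema.get("format").asText()
                : "";
            if ("float".equals(format)) {
                // Custom convention, as in the IBM doc above: format: float -> FLOAT.
                return DataTypes.FLOAT();
            }
            // No format hint: fall back to the widest type, i.e. number -> double.
            return DataTypes.DOUBLE();
        }
    }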
On Tue, Nov 15, 2022 at 4:23 AM Theodor Wübker <theo.wueb...@inside-m2m.de> wrote:

> Yes, you are right. Schemas are not so nice in JSON. When implementing and testing my converter from DataType to JsonSchema, I noticed that your converter from JsonSchema to DataType converts number to double always. I wonder, did you make this up? Because the table that specifies the mapping <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/json/> only does it for DataType -> JsonSchema.
>
> It's generally unfortunate that JSON Schema only offers so little possibility to specify type information… now when I have a Flink DataType with all kinds of fields like double, float, big decimal… they all get mapped to number by my converter - and in return, when I use yours, they are all mapped to a Flink DataType double again. So I lose a lot of precision.
>
> I guess for my application it would in general be better to use Avro or Protobuf, since they retain a lot more type information when you convert them back and forth…
>
> Also thanks for showing me your pattern with the SchemaConversions and stuff. Feels pretty clean and worked like a charm :)
>
> -Theo
>
> On 10. Nov 2022, at 15:02, Andrew Otto <o...@wikimedia.org> wrote:
>
> > I find it interesting that the Mapping from DataType to AvroSchema does exist in Flink (see AvroSchemaConverter), but for all the other formats there is no such Mapping,
>
> Yah, but I guess for JSON, there isn't a clear 'schema' to be had. There of course is JSONSchema, but it isn't a real java-y type system; it's just more JSON for which there exist validators.
>
> On Thu, Nov 10, 2022 at 2:12 AM Theodor Wübker <theo.wueb...@inside-m2m.de> wrote:
>
>> Great, I will have a closer look at what you sent. Your idea seems very good, it would be a very clean solution to be able to plug in different SchemaConversions that a (Row) DataType can be mapped to. I will probably try to implement it like this. I find it interesting that the Mapping from DataType to AvroSchema does exist in Flink (see AvroSchemaConverter), but for all the other formats there is no such Mapping. Maybe this would be something that would interest more people, so when I am finished perhaps I can suggest putting the solution into the flink-json and flink-protobuf packages.
>>
>> -Theo
>>
>> On 9. Nov 2022, at 21:24, Andrew Otto <o...@wikimedia.org> wrote:
>>
>> Interesting, yeah I think you'll have to implement code to recurse through the (Row) DataType and somehow auto generate the JSONSchema you want.
>>
>> We abstracted the conversions from JSONSchema to other type systems in this JsonSchemaConverter <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities/src/main/java/org/wikimedia/eventutilities/core/event/types/JsonSchemaConverter.java>. There's nothing special going on here, I've seen versions of this schema conversion code over and over again in different frameworks. This one just allows us to plug in a SchemaConversions <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities/src/main/java/org/wikimedia/eventutilities/core/event/types/SchemaConversions.java> implementation to provide the mappings to the output type system (like the Flink DataType mappings <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json/DataTypeSchemaConversions.java> I linked to before), rather than hardcoding the output types.
>>
>> If I were trying to do what you are doing (in our codebase)... I'd create a Flink DataTypeConverter<T> that iterated through a (Row) DataType, and a SchemaConversions<JsonNode> implementation that mapped to the JsonNode that represented the JSONSchema. (If not using Jackson... then you could use another Java JSON object than JsonNode.)
>>
>> You could also make a SchemaConversions<ProtobufSchema> (with whatever Protobuf class to use... I'm not familiar with Protobuf) and then use the same DataTypeConverter to convert to ProtobufSchema. AND THEN... I'd wonder if the input schema recursion code itself could be abstracted too, so that it would work for either JsonSchema OR DataType OR whatever, but anyway that is probably too crazy and too much for what you are doing... but it would be cool! :p
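>> For illustration, a rough, untested sketch of that DataTypeConverter idea, hardcoding the JSON output instead of plugging in a SchemaConversions implementation and covering only a few type roots. All class and method names here are made up for the sketch; only the imported Flink and Jackson types are real:
>>
>> import java.util.List;
>> import com.fasterxml.jackson.databind.ObjectMapper;
>> import com.fasterxml.jackson.databind.node.ObjectNode;
>> import org.apache.flink.table.types.DataType;
>> import org.apache.flink.table.types.logical.RowType;
>>
>> public class DataTypeToJsonSchema {
>>     private static final ObjectMapper MAPPER = new ObjectMapper();
>>
>>     // Recurse through a (Row) DataType and emit a JSONSchema as a Jackson ObjectNode.
>>     public static ObjectNode convert(DataType dataType) {
>>         ObjectNode schema = MAPPER.createObjectNode();
>>         switch (dataType.getLogicalType().getTypeRoot()) {
>>             case ROW: {
>>                 schema.put("type", "object");
>>                 ObjectNode properties = schema.putObject("properties");
>>                 RowType rowType = (RowType) dataType.getLogicalType();
>>                 List<DataType> fieldTypes = dataType.getChildren();
>>                 for (int i = 0; i < rowType.getFieldCount(); i++) {
>>                     properties.set(rowType.getFieldNames().get(i), convert(fieldTypes.get(i)));
>>                 }
>>                 break;
>>             }
>>             case DOUBLE:
>>             case FLOAT:
>>             case DECIMAL:
>>                 schema.put("type", "number");
>>                 break;
>>             case INTEGER:
>>             case BIGINT:
>>                 schema.put("type", "integer");
>>                 break;
>>             case BOOLEAN:
>>                 schema.put("type", "boolean");
>>                 break;
>>             default:
>>                 schema.put("type", "string"); // simplification for this sketch
>>         }
>>         return schema;
>>     }
>> }
>>
>> The pluggable SchemaConversions<JsonNode> part of the design would come in by replacing the hardcoded put("type", ...) calls with calls into that implementation.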
>> On Wed, Nov 9, 2022 at 9:52 AM Theodor Wübker <theo.wueb...@inside-m2m.de> wrote:
>>
>>> I want to register the result-schema in a schema registry, as I am pushing the result-data to a Kafka topic. The result-schema is not known at compile-time, so I need to find a way to compute it at runtime from the resulting Flink schema.
>>>
>>> -Theo
>>>
>>> (resent - again sorry, I forgot to add the others in the cc)
>>>
>>> On 9. Nov 2022, at 14:59, Andrew Otto <o...@wikimedia.org> wrote:
>>>
>>> > I want to convert the schema of a Flink table to both Protobuf *schema* and JSON *schema*
>>>
>>> Oh, you want to convert from Flink Schema TO JSONSchema? Interesting. That would indeed be something that is not usually done. Just curious, why do you want to do this?
>>>
>>> On Wed, Nov 9, 2022 at 8:46 AM Andrew Otto <o...@wikimedia.org> wrote:
>>>
>>>> Hello!
>>>>
>>>> I see you are talking about JSONSchema, not just JSON itself.
>>>>
>>>> We're trying to do a similar thing at Wikimedia and have developed some tooling around this.
>>>>
>>>> JsonSchemaFlinkConverter <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json/JsonSchemaFlinkConverter.java> has some logic to convert from JSONSchema Jackson ObjectNodes to Flink Table DataType or Table SchemaBuilder, or Flink DataStream TypeInformation[Row]. Some of the conversions from JSONSchema to Flink type are opinionated. You can see the mappings here <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json/DataTypeSchemaConversions.java>.
>>>>
>>>> On Wed, Nov 9, 2022 at 2:33 AM Theodor Wübker <theo.wueb...@inside-m2m.de> wrote:
>>>>
>>>>> Thanks for your reply Yaroslav! The way I do it with Avro seems similar to what you pointed out:
>>>>>
>>>>> ResolvedSchema resultSchema = resultTable.getResolvedSchema();
>>>>> DataType type = resultSchema.toSinkRowDataType();
>>>>> org.apache.avro.Schema converted = AvroSchemaConverter.convertToSchema(type.getLogicalType());
>>>>>
>>>>> I mentioned the ResolvedSchema because it is my starting point after the SQL operation. It seemed to me that I can not retrieve something that contains more schema information from the table, so I got myself this. About your other answers: it seems the classes you mentioned can be used to serialize actual data? However, this is not quite what I want to do. Essentially, I want to convert the schema of a Flink table to both a Protobuf *schema* and a JSON *schema* (for Avro, as you can see, I have it already). It seems odd that this is not easily possible, because converting from a JSON schema to a schema of Flink is possible using the JsonRowSchemaConverter. However, the other way is not implemented, it seems. This is how I got a table schema (that I can use in a table descriptor) from a JSON schema:
>>>>>
>>>>> TypeInformation<Row> type = JsonRowSchemaConverter.convert(json);
>>>>> DataType row = TableSchema.fromTypeInfo(type).toPhysicalRowDataType();
>>>>> Schema schema = Schema.newBuilder().fromRowDataType(row).build();
>>>>>
>>>>> Sidenote: I use deprecated methods here, so if there is a better approach please let me know! But it shows that in Flink it's easily possible to create a Schema for a TableDescriptor from a JSON Schema - the other way is just not so trivial, it seems. And for Protobuf, so far I don't have any solutions, not even creating a Flink Schema from a Protobuf Schema - not to mention the other way around.
>>>>>
>>>>> -Theo
>>>>>
>>>>> (resent because I accidentally only responded to you, not the mailing list - sorry)