Yes, setting the value type as RAW is one possible approach, and I would like to vote for schema inference as well.
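To make the RAW idea concrete, here is the kind of DDL I have in mind. This is purely hypothetical, since (as noted below) it is not supported yet; the table name is made up and the RAW serializer argument is elided:

CREATE TABLE json_events (
  top_level_key1 STRING,
  -- hypothetical: keep the map values as raw Jackson JsonNode objects
  nested_object MAP<STRING, RAW('com.fasterxml.jackson.databind.JsonNode', ...)>
) WITH (
  ...  -- connector and format properties elided
);

A query could then pick out nested_object['nested_key2'] and decide at runtime how to interpret the JsonNode. With map<varchar, varchar> this is lossy today, because JsonNode.asText() returns an empty string for object and array nodes.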
Correct me if I am wrong, but IMO schema inference means that the table source can provide a method to infer the data schema based on runtime computation, just as some Calcite adapters do. Right? For SQL table registration, I think requiring the table source to provide a static schema might be too strict; letting the planner infer the table schema would be more flexible.
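For comparison, the fully enumerated schema I maintain today (approach 2 below) looks roughly like this for the sample payload from my first mail; the table name is made up for illustration, and every upstream change means editing this DDL:

CREATE TABLE json_events (
  top_level_key1 STRING,
  nested_object ROW<
    nested_key1 STRING,
    nested_key2 INT,
    nested_key3 ARRAY<STRING>
  >
) WITH (
  'connector.type' = 'kafka',  -- placeholder; whatever source is actually used
  'format.type' = 'json'       -- plus the topic/server properties a real job needs
);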
Thank you for your suggestions.

Guodong

On Thu, May 28, 2020 at 11:11 PM Benchao Li <libenc...@gmail.com> wrote:

> Hi Guodong,
>
> Does the RAW type meet your requirements? For example, you could specify
> the type map<varchar, raw>, where the value for the map is the raw
> JsonNode parsed from Jackson.
> This is not supported yet; however, IMO it could be supported.
>
> On Thu, May 28, 2020 at 9:43 PM Guodong Wang <wangg...@gmail.com> wrote:
>
>> Benchao,
>>
>> Thank you for your quick reply.
>>
>> As you mentioned, approach 2 should work for my current scenario. But
>> it is a little bit annoying that I have to modify the schema to add new
>> field types whenever the upstream app changes the JSON format or adds
>> new fields. Otherwise, my users cannot refer to those fields in their
>> SQL.
>>
>> Per the description in the jira, I think that after implementing this,
>> all the JSON values will be converted to strings.
>> I am wondering whether Flink SQL can/will support a flexible schema in
>> the future, for example registering a table without defining a specific
>> schema for each field, letting the user define a generic map or array
>> for one field whose values can be any object. Then the type conversion
>> cost might be saved.
>>
>> Guodong
>>
>>
>> On Thu, May 28, 2020 at 7:43 PM Benchao Li <libenc...@gmail.com> wrote:
>>
>>> Hi Guodong,
>>>
>>> I think you almost have the answer:
>>> 1. Map type: it does not work with the current implementation. For
>>> example, with map<varchar, varchar>, if the value is a non-string JSON
>>> object, then `JsonNode.asText()` may not work as you wish.
>>> 2. List all the fields you care about. IMO this fits your scenario,
>>> and you can set format.fail-on-missing-field = false so that missing
>>> fields are set to null.
>>>
>>> For 1, I think maybe we can support it in the future, and I've created
>>> a jira[1] to track this.
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-18002
>>>
>>> On Thu, May 28, 2020 at 6:32 PM Guodong Wang <wangg...@gmail.com> wrote:
>>>
>>>> Hi!
>>>>
>>>> I want to use Flink SQL to process some JSON events. It is quite
>>>> challenging to define a schema for the Flink SQL table.
>>>>
>>>> My data source's format is JSON like this:
>>>>
>>>> {
>>>>   "top_level_key1": "some value",
>>>>   "nested_object": {
>>>>     "nested_key1": "abc",
>>>>     "nested_key2": 123,
>>>>     "nested_key3": ["element1", "element2", "element3"]
>>>>   }
>>>> }
>>>>
>>>> The big challenges for me in defining a schema for this data source
>>>> are:
>>>> 1. The keys in nested_object are flexible; there might be 3 unique
>>>> keys or more. If I enumerate all the keys in the schema, my code
>>>> becomes fragile. How do I handle an event that contains additional
>>>> nested_keys in nested_object?
>>>> 2. I know the Table API supports the Map type, but I am not sure
>>>> whether I can put a generic object as the value of the map, because
>>>> the values in nested_object are of different types: some are ints,
>>>> some are strings or arrays.
>>>>
>>>> So, how can I expose this kind of JSON data as a table in Flink SQL
>>>> without enumerating all the nested_keys?
>>>>
>>>> Thanks.
>>>>
>>>> Guodong
>>>
>>>
>>> --
>>>
>>> Best,
>>> Benchao Li
>>
>
> --
>
> Best,
> Benchao Li