Yes. Setting the value type as RAW is one possible approach. I would also
like to vote for schema inference.
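
For illustration, the kind of declaration I have in mind is sketched below.
This is purely hypothetical: a MAP with a RAW value type is not supported by
the JSON format today (see FLINK-18002 linked below), a real RAW column would
also need a serializer specification, and the connector properties are
placeholders.

CREATE TABLE json_events (
  top_level_key1 VARCHAR,
  -- hypothetical: a RAW-typed map value holding the Jackson JsonNode
  nested_object MAP<VARCHAR, RAW>
) WITH (
  'connector.type' = 'kafka',
  'format.type' = 'json'
);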

Correct me if I am wrong: IMO, schema inference means I can provide a method
in the table source to infer the data schema based on runtime computation,
just like some Calcite adapters do. Right?
For SQL table registration, I think requiring the table source to provide a
static schema might be too strict. Letting the planner infer the table
schema would be more flexible.

Thank you for your suggestions.

Guodong


On Thu, May 28, 2020 at 11:11 PM Benchao Li <libenc...@gmail.com> wrote:

> Hi Guodong,
>
> Does the RAW type meet your requirements? For example, you could specify a
> map<varchar, raw> type, where the map value is the raw JsonNode parsed by
> Jackson.
> This is not supported yet, but IMO it could be.
>
> On Thu, May 28, 2020 at 9:43 PM Guodong Wang <wangg...@gmail.com> wrote:
>
>> Benchao,
>>
>> Thank you for your quick reply.
>>
>> As you mentioned, for the current scenario, approach 2 should work for me.
>> But it is a bit annoying that I have to modify the schema to add new field
>> types whenever the upstream app changes the JSON format or adds new fields.
>> Otherwise, my users cannot reference the new fields in their SQL.
>>
>> Per the description in the JIRA ticket, I think that after it is
>> implemented, all the JSON values will be converted to strings.
>> I am wondering whether Flink SQL can/will support a flexible schema in the
>> future, for example, registering a table without defining a specific schema
>> for each field, and letting the user define a generic map or array for one
>> field whose values can be any object. Then the type-conversion cost might
>> be saved.
>>
>> Guodong
>>
>>
>> On Thu, May 28, 2020 at 7:43 PM Benchao Li <libenc...@gmail.com> wrote:
>>
>>> Hi Guodong,
>>>
>>> I think you almost have the answer:
>>> 1. Map type: this does not work with the current implementation. For
>>> example, with map<varchar, varchar>, if a value is a non-string JSON
>>> object, `JsonNode.asText()` may not work as you wish.
>>> 2. List all the fields you care about. IMO, this fits your scenario, and
>>> you can set format.fail-on-missing-field = false so that missing fields
>>> are set to null instead of failing the job (see the sketch below).
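>>>
>>> A sketch of approach 2, with the field names taken from your sample event
>>> (the WITH clause is incomplete; the connector properties depend on your
>>> setup):
>>>
>>> CREATE TABLE json_events (
>>>   top_level_key1 VARCHAR,
>>>   nested_object ROW<
>>>     nested_key1 VARCHAR,
>>>     nested_key2 INT,
>>>     nested_key3 ARRAY<VARCHAR>>
>>> ) WITH (
>>>   'format.type' = 'json',
>>>   'format.fail-on-missing-field' = 'false'
>>> );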
>>>
>>> For 1, I think maybe we can support it in the future, and I've created a
>>> JIRA ticket [1] to track it.
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-18002
>>>
>>> On Thu, May 28, 2020 at 6:32 PM Guodong Wang <wangg...@gmail.com> wrote:
>>>
>>>> Hi!
>>>>
>>>> I want to use Flink SQL to process some JSON events. It is quite
>>>> challenging to define a schema for the Flink SQL table.
>>>>
>>>> My data source's format is JSON like this:
>>>>
>>>> {
>>>>   "top_level_key1": "some value",
>>>>   "nested_object": {
>>>>     "nested_key1": "abc",
>>>>     "nested_key2": 123,
>>>>     "nested_key3": ["element1", "element2", "element3"]
>>>>   }
>>>> }
>>>>
>>>> The big challenges for me in defining a schema for this data source are:
>>>> 1. The keys in nested_object are flexible; there might be 3 unique keys
>>>> or more. If I enumerate all the keys in the schema, I think my code will
>>>> be fragile. How do I handle an event that contains more nested_keys in
>>>> nested_object?
>>>> 2. I know the Table API supports the Map type, but I am not sure whether
>>>> I can put a generic object as the value of the map, because the values in
>>>> nested_object are of different types: some are ints, some are strings or
>>>> arrays. (A sketch of what I mean follows this list.)
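>>>>
>>>> For concreteness, the declaration I have in mind is something like the
>>>> following (WITH clause elided), except that the map values would need
>>>> mixed types rather than all VARCHAR:
>>>>
>>>> CREATE TABLE json_events (
>>>>   top_level_key1 VARCHAR,
>>>>   nested_object MAP<VARCHAR, VARCHAR>
>>>> ) WITH (
>>>>   ...
>>>> );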
>>>>
>>>> So, how can I expose this kind of JSON data as a table in Flink SQL
>>>> without enumerating all the nested_keys?
>>>>
>>>> Thanks.
>>>>
>>>> Guodong
>>>>
>>>
>>>
>>> --
>>>
>>> Best,
>>> Benchao Li
>>>
>>
>
> --
>
> Best,
> Benchao Li
>
