Hi Guodong,

I think you've almost got the answer:
1. Map type: it doesn't work with the current implementation. For example,
with map<varchar, varchar>, if a value is a non-string JSON object, then
`JsonNode.asText()` may not behave as you expect.
2. List all the fields you care about. IMO, this fits your scenario, and
you can set format.fail-on-missing-field = false so that fields missing
from an event are set to null instead of failing the job (see the sketch
below).

For 1, I think we may be able to support it in the future, and I've
created a JIRA ticket [1] to track this.

[1] https://issues.apache.org/jira/browse/FLINK-18002
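
For reference, the map-typed declaration from 1 would look like the sketch
below. This is only to show the shape; as noted above, values that are not
JSON strings (the array in nested_key3, for example) may not deserialize
as you wish, since the current implementation relies on
`JsonNode.asText()`:

CREATE TABLE json_events_map (
  top_level_key1 STRING,
  -- caveat: only reliable when every value in nested_object is a JSON
  -- string; object and array values go through JsonNode.asText() and
  -- may not come back as you expect
  nested_object MAP<STRING, STRING>
) WITH (
  -- same placeholder connector/format properties as in the sketch above
  'connector.type' = 'kafka',
  'connector.version' = 'universal',
  'connector.topic' = 'events',
  'connector.properties.bootstrap.servers' = 'localhost:9092',
  'format.type' = 'json'
);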

Guodong Wang <wangg...@gmail.com> wrote on Thu, May 28, 2020 at 6:32 PM:

> Hi !
>
> I want to use Flink SQL to process some JSON events. It is quite
> challenging to define a schema for the Flink SQL table.
>
> My data source's format is JSON like this:
> {
>   "top_level_key1": "some value",
>   "nested_object": {
>     "nested_key1": "abc",
>     "nested_key2": 123,
>     "nested_key3": ["element1", "element2", "element3"]
>   }
> }
>
> The big challenges for me in defining a schema for the data source are:
> 1. The keys in nested_object are flexible; there might be 3 unique keys
> or more. If I enumerate all the keys in the schema, I think my code is
> fragile. How do I handle an event which contains more nested_keys in
> nested_object?
> 2. I know the Table API supports the Map type, but I am not sure if I can
> put a generic object as the value of the map, because the values in
> nested_object are of different types: some of them are ints, some of them
> are strings or arrays.
>
> So, how can I expose this kind of JSON data as a table in Flink SQL
> without enumerating all the nested_keys?
>
> Thanks.
>
> Guodong
>


-- 

Best,
Benchao Li
