Hi Guodong,

I think you almost have the answer.

1. Map type: it does not work with the current implementation. For example, with map<varchar, varchar>, if a value is a non-string JSON object, `JsonNode.asText()` may not behave as you expect.

2. List all the fields you care about. IMO this fits your scenario. You can also set format.fail-on-missing-field = false so that fields absent from the input are set to null instead of failing the job. A sketch of such a DDL follows below.
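Here is a minimal DDL sketch for option 2. It assumes a Flink 1.10-style Kafka source; the connector properties (topic, servers) are placeholders for illustration, so adjust them to your setup:

  CREATE TABLE events (
    top_level_key1 VARCHAR,
    -- model nested_object as a ROW listing only the keys you care about
    nested_object ROW<
      nested_key1 VARCHAR,
      nested_key2 INT,
      nested_key3 ARRAY<VARCHAR>
    >
  ) WITH (
    'connector.type' = 'kafka',          -- placeholder source
    'connector.version' = 'universal',
    'connector.topic' = 'events',        -- placeholder topic
    'connector.properties.bootstrap.servers' = 'localhost:9092',
    'format.type' = 'json',
    'format.fail-on-missing-field' = 'false'  -- absent keys become NULL
  );

Nested fields can then be queried directly with dot notation, e.g.

  SELECT nested_object.nested_key1, nested_object.nested_key2 FROM events;

The trade-off is the one you noticed: any key not declared in the ROW is simply not exposed to SQL.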
For 1, I think we may be able to support it in the future; I've created a JIRA ticket [1] to track this.

[1] https://issues.apache.org/jira/browse/FLINK-18002

Guodong Wang <wangg...@gmail.com> wrote on Thursday, May 28, 2020 at 6:32 PM:

> Hi!
>
> I want to use Flink SQL to process some JSON events. It is quite
> challenging to define a schema for the Flink SQL table.
>
> My data source's format is JSON like this:
>
> {
>   "top_level_key1": "some value",
>   "nested_object": {
>     "nested_key1": "abc",
>     "nested_key2": 123,
>     "nested_key3": ["element1", "element2", "element3"]
>   }
> }
>
> The big challenges for me in defining a schema for this data source are:
>
> 1. The keys in nested_object are flexible; there might be 3 unique keys
> or more. If I enumerate all the keys in the schema, I think my code will
> be fragile. How do I handle an event that contains more nested keys in
> nested_object?
>
> 2. I know the Table API supports the Map type, but I am not sure whether
> I can put a generic object as the value of the map, because the values
> in nested_object are of different types: some are ints, some are strings
> or arrays.
>
> So, how can I expose this kind of JSON data as a table in Flink SQL
> without enumerating all the nested keys?
>
> Thanks.
>
> Guodong

--
Best,
Benchao Li