Hi,

I think the more stable option would be the first one, as it also gives you
more flexibility. Reading the row as a string and then parsing it in a query
definitely costs more, and makes it less straightforward to use the other
schema features of the table, such as watermark definitions, primary keys, etc.

I guess you can implement it fairly easily by subclassing the existing
JSON format provided by Flink, in particular
JsonRowDataDeserializationSchema.
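
As a rough sketch of the splitting step only (plain Java, no Flink on the
classpath): Flink's DeserializationSchema has a
deserialize(byte[] message, Collector<T> out) variant that lets one input
message produce several records. The class name MultilineSplitter is
hypothetical, and a Consumer<String> stands in for Flink's Collector; in a
real format each line would be handed to the JSON deserializer rather than
emitted as a string.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in: in Flink this logic would sit inside
// DeserializationSchema#deserialize(byte[] message, Collector<RowData> out).
class MultilineSplitter {

    // Splits one message on '\n' and emits every non-blank line as its own record.
    static void deserialize(byte[] message, Consumer<String> out) {
        if (message == null) {
            return; // null message: nothing to emit
        }
        String payload = new String(message, StandardCharsets.UTF_8);
        for (String line : payload.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                // In a real format: parse `trimmed` once here and collect the row.
                out.accept(trimmed);
            }
        }
    }

    public static void main(String[] args) {
        String msg =
                "{\"timestamp\": \"2021-01-01T00:00:00\", \"person\": {\"name\": \"Vasya\"}}\n"
                        + "{\"timestamp\": \"2021-01-01T01:00:00\", \"person\": {\"name\": \"Max\"}}";
        List<String> records = new ArrayList<>();
        deserialize(msg.getBytes(StandardCharsets.UTF_8), records::add);
        records.forEach(System.out::println); // prints each JSON object on its own line
    }
}
```

Since each line is parsed exactly once inside the format, the query only
sees typed columns and never has to re-parse a string.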

A third solution would be to create a SplitFunction, like the one you
created, which performs the parsing directly and outputs rows rather than
strings. This removes the double-parsing issue, but still creates problems
when interacting with other schema features.

Hope it helps,
FG

On Thu, Feb 3, 2022 at 3:56 PM Илья Соин <ilya.soin...@gmail.com> wrote:

> Hi,
>
> I’m using the Table / SQL API.
>
> I have a stream of strings, where each message contains several JSON
> strings separated by "\n".
> For example:
> {"timestamp": "2021-01-01T00:00:00", "person": {"name": "Vasya"}}\n
> {"timestamp": "2021-01-01T01:00:00", "person": {"name": "Max"}}
>
> I would like to split each message by "\n", parse each string as a JSON
> object and get some of the fields.
>
> AFAIK there are 2 ways to do it:
>
> 1) Write a custom deserialiser and provide it in the source table DDL, i.e.
> CREATE TABLE source (
>     `timestamp` STRING,
>     person ROW<name STRING>
> )
> WITH ('format' = 'multiline-json', …);
>
> 2) Use 'format' = 'raw' and extract the needed fields using .jsonValue,
> i.e.
>
> CREATE TABLE source (
>     `row` STRING
> );
>
> env.from("source")
>         .joinLateral(
>             call(SplitFunction.class, $("row"), "\n").as("msg")
>         )
>         .select(
>             $("msg").jsonValue("$.timestamp", DataTypes.STRING()),
>             $("msg").jsonValue("$.person.name", DataTypes.STRING()).as("name")
>         );
>
> In 2), will each call of .jsonValue parse the string all over again or
> will it reuse the same JsonNode object internally? Which option better fits
> my problem?
>
> __
> Best, Ilya
