Thank you, Francesco
> On 3 Feb 2022, at 18:21, Francesco Guardiani <france...@ververica.com> wrote:
>
> Hi,
>
> I think the more stable option would be the first one, as it also gives you
> more flexibility. Reading the row as a string and then parsing it in a query
> definitely costs more, and makes it less straightforward to use the other
> schema features of the table, such as watermark definitions, primary keys, etc.
>
> I guess you can implement it fairly easily by subclassing the existing JSON
> format provided by Flink, in particular JsonRowDataDeserializationSchema.
>
> A third solution would be to create a SplitFunction, like the one you
> created, which performs the parsing directly and outputs rows rather than
> strings. This removes the double-parsing issue, but still creates problems
> when interacting with other schema features.
>
> Hope it helps,
> FG
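
The core of the first option (a format that turns one multi-line message into several records) can be sketched in plain Java, independent of Flink. This is only an illustration under assumptions: `MultilineJsonSplitter` and `splitInto` are hypothetical names, the separator is assumed to be a real newline character, and the `Consumer<String>` stands in for Flink's `Collector`. In an actual format, a loop like this would presumably live in `DeserializationSchema#deserialize(byte[] message, Collector<T> out)`, with each line handed to the existing JSON deserializer instead of being emitted as a string.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper, not part of Flink: splits one message into per-line JSON records. */
final class MultilineJsonSplitter {

    private MultilineJsonSplitter() {}

    /**
     * Decodes the message as UTF-8, splits it on '\n', and passes every
     * non-blank line to the sink (a stand-in for Flink's Collector).
     */
    static void splitInto(byte[] message, Consumer<String> out) {
        String text = new String(message, StandardCharsets.UTF_8);
        for (String line : text.split("\n")) {
            String record = line.trim(); // also drops a trailing '\r'
            if (!record.isEmpty()) {
                out.accept(record);
            }
        }
    }

    public static void main(String[] args) {
        String message =
                "{\"timestamp\": \"2021-01-01T00:00:00\", \"person\": {\"name\": \"Vasya\"}}\n"
                        + "{\"timestamp\": \"2021-01-01T01:00:00\", \"person\": {\"name\": \"Max\"}}";

        List<String> records = new ArrayList<>();
        splitInto(message.getBytes(StandardCharsets.UTF_8), records::add);

        // Each element is one JSON document, ready to be parsed exactly once.
        records.forEach(System.out::println);
    }
}
```

Because the split happens before any JSON parsing, each record is parsed once, and the resulting table keeps a regular schema that watermarks and primary keys can refer to.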
>
> On Thu, Feb 3, 2022 at 3:56 PM Илья Соин <ilya.soin...@gmail.com> wrote:
> Hi,
>
> I’m using the Table / SQL API.
>
> I have a stream of strings, where each message contains several JSON strings
> separated by "\n".
> For example:
> {"timestamp": "2021-01-01T00:00:00", "person": {"name": "Vasya"}}\n
> {"timestamp": "2021-01-01T01:00:00", "person": {"name": "Max"}}
>
> I would like to split each message by "\n", parse each string as a JSON
> object and get some of the fields.
>
> AFAIK there are 2 ways to do it:
>
> 1) Write a custom deserializer and reference it in the source table DDL, i.e.
> CREATE TABLE source (
>   `timestamp` STRING,
>   person ROW(name STRING)
> )
> WITH ('format' = 'multiline-json', …);
>
> 2) Use 'format' = 'raw' and extract the needed fields using .jsonValue, i.e.
>
> CREATE TABLE source (
>   `row` STRING
> );
>
> env.from("source")
>     .joinLateral(
>         call(SplitFunction.class, $("row"), "\n").as("msg")
>     )
>     .select(
>         $("msg").jsonValue("$.timestamp", DataTypes.STRING()),
>         $("msg").jsonValue("$.person.name", DataTypes.STRING()).as("name")
>     );
>
> In 2), will each call of .jsonValue parse the string all over again or will
> it reuse the same JsonNode object internally? Which option better fits my
> problem?
>
> __
> Best, Ilya