Hi, 

I’m using the Table / SQL API. 

I have a stream of strings, where each message contains several JSON objects 
separated by "\n". 
For example:
{"timestamp": "2021-01-01T00:00:00", "person": {"name": "Vasya"}}\n
{"timestamp": "2021-01-01T01:00:00", "person": {"name": "Max"}}

I would like to split each message by "\n", parse each string as a JSON object 
and get some of the fields. 

AFAIK there are two ways to do it:

1) Write a custom deserialiser and reference it in the source table DDL, e.g. 
CREATE TABLE source (
    `timestamp` STRING,
    person ROW<name STRING>
)
WITH ('format' = 'multiline-json', …);

2) Use 'format' = 'raw' and extract the needed fields using .jsonValue, e.g.

CREATE TABLE source (
    `row` STRING
)
WITH ('format' = 'raw', …);

env.from("source")
        .joinLateral(
            call(SplitFunction.class, $("row"), "\n").as("msg")
        )
        .select(
            $("msg").jsonValue("$.timestamp", DataTypes.STRING()),
            $("msg").jsonValue("$.person.name", DataTypes.STRING()).as("name")
        );
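
For clarity, this is roughly the per-message logic I have in mind for 
SplitFunction, written as a plain-Java sketch with no Flink dependencies. The 
names SplitSketch and splitMessage are made up for illustration; in the real job 
I assume the same loop would sit in the eval method of a TableFunction<String> 
and call collect(...) for each piece instead of returning a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the body of SplitFunction; not part of Flink.
public class SplitSketch {

    // Splits one raw message into its JSON strings, skipping blank pieces
    // (for example the empty remainder after a trailing separator).
    public static List<String> splitMessage(String message, String separator) {
        List<String> parts = new ArrayList<>();
        // Pattern.quote treats the separator as a literal, not as a regex.
        for (String part : message.split(Pattern.quote(separator))) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String message =
            "{\"timestamp\": \"2021-01-01T00:00:00\", \"person\": {\"name\": \"Vasya\"}}\n"
          + "{\"timestamp\": \"2021-01-01T01:00:00\", \"person\": {\"name\": \"Max\"}}";
        // Each element would become one "msg" row after the lateral join.
        for (String json : splitMessage(message, "\n")) {
            System.out.println(json);
        }
    }
}
```

Blank pieces are dropped here so that a trailing "\n" does not produce an empty 
"msg" row that .jsonValue would then have to handle.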

In 2), does each call to .jsonValue parse the string again from scratch, or is 
the parsed JsonNode object reused internally? Which option is a better fit for 
my problem?

__
Best, Ilya
