Hi Giacomo,

I think you can try a Flink SQL connector with the JSON format. For JSON
input such as {"a": 1, "b": {"c": 2, "d": {"e": 3}}}, you can declare the
nested objects as ROW types:

CREATE TABLE data (
  a INT,
  b ROW<c INT, d ROW<e INT>>
) WITH (...)
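
For deeply nested records, writing the ROW types by hand gets tedious, so
here is a rough sketch of deriving the column list from one sample record
in plain Python. The helper names (flink_type, create_table_ddl) are made
up for illustration and are not part of PyFlink. It assumes every JSON
integer maps to BIGINT (tweet ids do not fit in INT), that arrays are
non-empty and homogeneous, and that the top-level value is an object. The
WITH clause is left elided, as above.

```python
import json


def flink_type(value):
    """Map a parsed JSON value to a Flink SQL type string (illustrative helper)."""
    # bool must be checked before int, because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, dict):
        # Nested JSON objects become ROW types with one named field per key
        fields = ", ".join(f"`{k}` {flink_type(v)}" for k, v in value.items())
        return f"ROW<{fields}>"
    if isinstance(value, list):
        # Assumption: the first element is representative of the whole array
        return f"ARRAY<{flink_type(value[0])}>" if value else "ARRAY<STRING>"
    # JSON null carries no type information; STRING is an arbitrary fallback
    return "STRING"


def create_table_ddl(table_name, sample_json):
    """Build a CREATE TABLE statement from one sample JSON object."""
    record = json.loads(sample_json)
    columns = ",\n".join(f"  `{k}` {flink_type(v)}" for k, v in record.items())
    # The connector options are deliberately left elided
    return f"CREATE TABLE {table_name} (\n{columns}\n) WITH (...)"


ddl = create_table_ddl("data", '{"a": 1, "b": {"c": 2, "d": {"e": 3}}}')
print(ddl)
```

The generated types are only a starting point: a single sample cannot show
optional fields, and you may want narrower types (INT, TIMESTAMP) than the
ones inferred here.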

Let me know if that helps.

Best,
Yik San

On Wed, Apr 14, 2021 at 2:00 AM <g.g.m.5...@web.de> wrote:

> Hi,
> I'm new to Flink and I am trying to create a stream from locally
> downloaded tweets. The tweets are in json format, like in this example:
>
> {"data":{"text":"Polsek Kakas Cegah Covid-19 https://t.co/ADjEgpt7bC
> ","public_metrics":{"retweet_count":0,"reply_count":0,"like_count":0,"quote_count":0},
> "author_id":"1367839185764151302","id":"1378275866279469059","created_at":"2021-04-03T09:19:08.000Z","source":"Twitter
> for Android","lang":"in"},
> "includes":{"users":[{"protected":false,"id":"1367839185764151302","name":"Nathan
> Pareda","created_at":"2021-03-05T14:07:56.000Z",
>
> "public_metrics":{"followers_count":0,"following_count":0,"tweet_count":557,"listed_count":0},
>
> "username":"NathanPareda"}]},"matching_rules":[{"id":1378112825051246596,"tag":"coronavirus"}]}
>
> I would like to do it in Python using Pyflink, but could also use Java if
> there is no reasonable way to do it in Python. I've been looking at
> different options for loading these objects into a stream, but am not sure
> what to do. Here's my situation so far:
>
> 1. There doesn't seem to be a fitting connector. The filesystem-connector
> doesn't seem to support json format.
> 2. I've seen in the archive of this mailing list that some recommend
> using the Table API. But I am not sure if this is a viable option given how
> nested the json objects are.
> 3. I could of course try to implement a custom DataSource, but that seems
> to be quite difficult so I'd only consider this if there's no other way.
>
> I'll be very grateful for any kind of input.
> Cheers,
> Giacomo
>
>
