Re: Custom source with multiple, differently typed outputs

Arvid Heise Thu, 11 Feb 2021 03:50:18 -0800

Hi Roman,

In general, the use of inconsistent types is discouraged but there is
little that you can do on your end.

I think your approach with SourceFunction is good but I'd probably not use
Row already but rather some POJO or source format record. Note, that I have
never seen side-outputs in a source, so please check if it's working
properly. If not, you can probably do the same with a chained map with
almost no overhead. Then you'd probably need to use an intermediate data
type that is the union of the different schemas. So if you have Person and
Purchase records intermixed, you'd use a PersonOrPurchase intermediate type.

Then, to convert to a Table, please check the docs on how to map the data
types [1].

I'm assuming it is also possible to directly work with a Row although I
haven't done that. But note that in general you cannot provide the
TypeInformation dynamically, it has to be known when you convert to a
Table. In that case, it might be easier to just have a POJO for each
possible type.

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/common.html#mapping-of-data-types-to-table-schema

On Tue, Feb 9, 2021 at 5:32 PM Roman Karlstetter <
roman.karlstet...@gmail.com> wrote:

> Hi everyone,
>
> I want to connect to a proprietary data stream, which sends different
> types of messages (maybe interpreted as a table), intertwined in the
> stream. Every type of message (or table) can have a different schema, but
> for each type this schema is known when connecting (i.e., at runtime) and
> does not change.
>
> I'm new to flink, so I have a few (stupid?) questions about this use case.
> I have created a custom SourceFunction which produces Rows read from this
> data stream. Then I use side outputs to split up this stream into multiple
> DataStream[Row]. Is this the right way to do it?
> What's the best way to add custom TypeInformation[Row] to each of those
> streams, so that I can easily map this to a table which can be accessed via
> the Table API? Or would I rather directly implement a ScanTableSource (I
> played with this, the SourceFunction approach was easier though)? I believe
> that Row is the best way to create this kind of schema at runtime, or is
> there a better alternative?
>
> Kind regards
> Roman
>
>
>
>
>
>

Re: Custom source with multiple, differently typed outputs

Reply via email to