Hi all,

in my use case I have bursts of data to store into HDFS and, once a burst has
finished, compact them into a single directory (as Parquet). From what I know,
the usual approach is to use Flume, which automatically ingests data and
compacts it based on some configurable policy.
However, I'd like to avoid adding Flume to my architecture because these
bursts are not long-lived processes. I just want to write each batch of rows
as a single file in some directory and, once the process finishes, read all of
those files and compact them into a single output directory as Parquet.
It's something similar to a streaming process, but (for the moment) I'd like
to avoid having a long-lived Flink process listening for incoming data.
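
To make it concrete, something like the following one-shot batch job is
roughly what I have in mind (just a sketch: the hdfs:///bursts/... paths are
made up, the rows are assumed to be plain text, and for the real Parquet
output I would still have to plug in a Hadoop Parquet output format instead of
writeAsText):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.core.fs.FileSystem;

public class CompactBursts {
  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Read all the small files the bursts have written so far
    DataSet<String> rows = env.readTextFile("hdfs:///bursts/staging");

    // (parse / transform the rows here if needed)

    // Force parallelism 1 on the sink so everything ends up in a single
    // output file; here I'd swap in a Parquet output format instead of text
    rows.writeAsText("hdfs:///bursts/compacted", FileSystem.WriteMode.OVERWRITE)
        .setParallelism(1);

    env.execute("compact bursts");
  }
}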

Do you have any suggestions for such a process, or is there any example in the
Flink code?


Best,
Flavio