Hi all, in my use case I have bursts of data to store into HDFS and, once a burst has finished, I want to compact the results into a single directory (as Parquet). From what I know, the usual approach is to use Flume, which automatically ingests data and compacts it based on some configurable policy. However, I'd like to avoid adding Flume to my architecture because these bursts are not long-lived processes: I just want to write each batch of rows as a single file in a staging directory and, once the process finishes, read all of those files and compact them into a single output directory as Parquet. It's similar to a streaming process, but (for the moment) I'd like to avoid having a long-lived Flink job listening for incoming data.
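For the compaction step I was imagining a short-lived Flink batch job along these lines. This is just a rough sketch to illustrate the idea: it assumes the flink-hadoop-compatibility and parquet-avro dependencies are on the classpath and that AvroParquetOutputFormat is the generic variant from a recent parquet-avro; the HDFS paths and the two-field schema are made-up placeholders.

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.parquet.avro.AvroParquetOutputFormat;

public class CompactStagingToParquet {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical Avro schema for the staged rows (CSV lines "id,value").
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Row\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\"},"
            + "{\"name\":\"value\",\"type\":\"string\"}]}");
        // Pass the schema into the map function as a String, since it is serializable.
        final String schemaString = schema.toString();

        // Configure the Parquet output via the Hadoop mapreduce OutputFormat.
        Job job = Job.getInstance();
        AvroParquetOutputFormat.setSchema(job, schema);
        FileOutputFormat.setOutputPath(job, new Path("hdfs:///data/compacted"));

        HadoopOutputFormat<Void, GenericRecord> parquetFormat =
            new HadoopOutputFormat<>(new AvroParquetOutputFormat<GenericRecord>(), job);

        // Read every staged file written during the burst and convert each
        // line into a GenericRecord for the Parquet writer.
        DataSet<Tuple2<Void, GenericRecord>> records = env
            .readTextFile("hdfs:///data/staging")
            .map(line -> {
                String[] parts = line.split(",", 2);
                GenericRecord record = new GenericData.Record(
                    new Schema.Parser().parse(schemaString));
                record.put("id", parts[0]);
                record.put("value", parts[1]);
                return Tuple2.of((Void) null, record);
            })
            .returns(new TypeHint<Tuple2<Void, GenericRecord>>() {});

        // Parallelism 1 so the output directory ends up with a single Parquet file.
        records.output(parquetFormat).setParallelism(1);

        env.execute("compact-staging-to-parquet");
    }
}

Setting parallelism 1 on the sink is just my way of forcing a single output file; I'm not sure it's the right approach.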
Do you have any suggestions for such a process, or is there any example of this in the Flink codebase?

Best,
Flavio