I'm not sure I understood your question correctly. Do you want to know whether it is possible to implement a Flink program that reads several files and writes their data out in Parquet format? Or are you asking how such a job could be scheduled for execution based on some external event (such as a file appearing)?
Both should be possible. The job would be a simple pipeline, with or without transformations depending on the required logic, and a Parquet data sink. The job execution can be triggered from outside of Flink, for example by a monitoring process or a cron job that calls the CLI client with the right parameters. A rough sketch of such a job is included below the quoted message.

Best,
Fabian

2015-05-22 14:55 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
> Hi to all,
>
> in my use case I have bursts of data to store into HDFS and, once finished,
> compact them into a single directory (as Parquet). From what I know, the
> current approach is to use Flume, which automatically ingests data and
> compacts it based on some configurable policy.
> However, I'd like to avoid adding Flume to my architecture because these
> bursts are not long-lived processes, so I just want to write a batch of rows
> as a single file in some directory and, once the process finishes, read all
> of those files and compact them into a single output directory as Parquet.
> It's something similar to a streaming process, but (for the moment) I'd
> like to avoid having a long-lived Flink process listening for incoming
> data.
>
> Do you have any suggestion for such a process, or is there any example in
> the Flink code?
>
> Best,
> Flavio
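
As promised above, here is a rough, untested sketch of what such a job could look like with the DataSet API, writing Parquet through the Hadoop compatibility layer (this assumes flink-hadoop-compatibility and parquet-avro are on the classpath). The class name, the paths, and the single-field Avro schema are made up for illustration, and the Parquet package name (org.apache.parquet.avro vs. the older parquet.avro) depends on the Parquet version you use.

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.parquet.avro.AvroParquetOutputFormat;

    public class CompactToParquet {

      // Hypothetical one-field schema; replace with your real record layout.
      private static final Schema SCHEMA = new Schema.Parser().parse(
          "{\"type\":\"record\",\"name\":\"Row\",\"fields\":["
          + "{\"name\":\"line\",\"type\":\"string\"}]}");

      public static void main(String[] args) throws Exception {
        final String inputDir = args[0];   // directory containing the burst files
        final String outputDir = args[1];  // target directory for the Parquet output

        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Wire up Parquet as a Hadoop mapreduce OutputFormat.
        Job job = Job.getInstance();
        AvroParquetOutputFormat.setSchema(job, SCHEMA);
        FileOutputFormat.setOutputPath(job, new Path(outputDir));
        HadoopOutputFormat<Void, GenericRecord> parquetOut =
            new HadoopOutputFormat<>(new AvroParquetOutputFormat<GenericRecord>(), job);

        // Read all files in the input directory and wrap each line in an Avro record.
        DataSet<Tuple2<Void, GenericRecord>> records = env
            .readTextFile(inputDir)
            .map(new MapFunction<String, Tuple2<Void, GenericRecord>>() {
              @Override
              public Tuple2<Void, GenericRecord> map(String line) {
                GenericRecord r = new GenericData.Record(SCHEMA);
                r.put("line", line);
                return new Tuple2<>(null, r);
              }
            });

        records.output(parquetOut);
        env.execute("Compact burst files to Parquet");
      }
    }

Once a burst has finished, your monitoring process or cron job could submit it with the CLI client, e.g. bin/flink run -c CompactToParquet compact-job.jar <inputDir> <outputDir>.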