Re: Batch compressed file output

2020-11-27 Thread Matthias Pohl
Hi Flavio, others might have better ideas for solving this, but I'll give it a try. Have you considered extending FileOutputFormat to achieve what you need? That approach (discussed in [1]) sounds like something you could do. Another pointer I want to give is the DefaultRollingPolicy [2]. It l

Batch compressed file output

2020-11-27 Thread Flavio Pompermaier
Hello guys, I have to write my batch data (DataSet) to a file format. What I need to do is:
1. split the data if it exceeds some size threshold (by line count or max MB)
2. compress the output data (possibly without converting to the Hadoop format)
Are there any suggestions
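
The two requirements in the question (splitting once a threshold is exceeded, and compressing without Hadoop classes) can be illustrated with a minimal plain-Java sketch. It assumes a line-count threshold and gzip from the JDK's java.util.zip. RollingGzipLineWriter, its constructor parameters, and the "prefix-N.gz" naming are hypothetical and not part of Flink; in a real job, similar logic would probably sit inside a custom output format, such as the FileOutputFormat extension suggested in the reply, and that wiring is not shown here.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

/**
 * Hypothetical helper (not a Flink class): writes text lines to gzip part
 * files and starts a new part once maxLines lines were written to the
 * current one.
 */
class RollingGzipLineWriter implements AutoCloseable {
    private final Path dir;
    private final String prefix;
    private final long maxLines;

    private Writer current;    // writer for the part currently open, or null
    private long linesInPart;  // lines written to the current part
    private int partIndex;     // index of the next part to create

    RollingGzipLineWriter(Path dir, String prefix, long maxLines) {
        if (maxLines <= 0) {
            throw new IllegalArgumentException("maxLines must be positive");
        }
        this.dir = dir;
        this.prefix = prefix;
        this.maxLines = maxLines;
    }

    /** Writes one line, rolling to a new part file first if the threshold is reached. */
    void write(String line) throws IOException {
        if (current == null || linesInPart >= maxLines) {
            roll();
        }
        current.write(line);
        current.write('\n');
        linesInPart++;
    }

    /** Closes the current part (if any) and opens the next gzip part file. */
    private void roll() throws IOException {
        closeCurrent();
        Path part = dir.resolve(prefix + "-" + partIndex + ".gz");
        partIndex++;
        current = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(part)),
                StandardCharsets.UTF_8));
        linesInPart = 0;
    }

    private void closeCurrent() throws IOException {
        if (current != null) {
            current.close(); // also finishes the gzip stream
            current = null;
        }
    }

    /** Number of part files created so far. */
    int partsWritten() {
        return partIndex;
    }

    @Override
    public void close() throws IOException {
        closeCurrent();
    }
}
```

A byte-based threshold ("max MB") would need a counter on the bytes handed to the gzip stream. Note that this measures uncompressed size; the compressed size of a part is only known reliably after the stream is finished.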