Re: conditional dataset output

lars . bachmann Fri, 09 Dec 2016 02:52:23 -0800

Hi Chesnay,

I actually thought about the same thing, but as you said, it seems a bit hacky ;-). Anyway, thank you!


Regards,

Lars

On 08.12.2016 16:47, Chesnay Schepler wrote:

Hello Lars,
The only other way I can think of to do this is to wrap the output format you use in a custom format that calls open on the wrapped output format
only when it receives the first record.

This should work, but it is quite hacky, as it interferes with the
format life-cycle.
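
A rough sketch of that wrapper idea, not tested against Flink. To keep it self-contained, SimpleOutputFormat is a hypothetical stand-in that only mirrors the open/writeRecord/close shape of Flink's OutputFormat interface; LazyOpenOutputFormat and RecordingFormat are made-up names for illustration as well.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in (assumption): mirrors only the open/writeRecord/close
// part of Flink's OutputFormat so the sketch compiles without Flink.
interface SimpleOutputFormat<T> {
    void open(int taskNumber, int numTasks) throws IOException;
    void writeRecord(T record) throws IOException;
    void close() throws IOException;
}

// Wrapper that postpones open() on the wrapped format until the first record.
final class LazyOpenOutputFormat<T> implements SimpleOutputFormat<T> {
    private final SimpleOutputFormat<T> wrapped;
    private int taskNumber;
    private int numTasks;
    private boolean opened;

    LazyOpenOutputFormat(SimpleOutputFormat<T> wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public void open(int taskNumber, int numTasks) {
        // Only remember the arguments; the wrapped format is not opened yet.
        this.taskNumber = taskNumber;
        this.numTasks = numTasks;
        this.opened = false;
    }

    @Override
    public void writeRecord(T record) throws IOException {
        if (!opened) {
            // First record seen: now the wrapped format may create its output.
            wrapped.open(taskNumber, numTasks);
            opened = true;
        }
        wrapped.writeRecord(record);
    }

    @Override
    public void close() throws IOException {
        // Close the wrapped format only if it was ever opened.
        if (opened) {
            wrapped.close();
            opened = false;
        }
    }
}

// Toy format (assumption) that records the calls it receives instead of writing files.
final class RecordingFormat implements SimpleOutputFormat<String> {
    final List<String> events = new ArrayList<>();

    @Override
    public void open(int taskNumber, int numTasks) {
        events.add("open");
    }

    @Override
    public void writeRecord(String record) {
        events.add("write:" + record);
    }

    @Override
    public void close() {
        events.add("close");
    }
}
```

With an empty input, open and close on the wrapper never reach the wrapped format, so it gets no chance to create a file. A real version would implement Flink's OutputFormat itself and would presumably also have to forward configure(...) to the wrapped format; that part is left out here.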

Regards,
Chesnay

On 08.12.2016 16:39, lars.bachm...@posteo.de wrote:
Hi,
let's assume I have a dataset, and depending on the input data and different filter operations this dataset can be empty. Now I want to output the dataset to HD, but I want files to be created only if the dataset is not empty. If the dataset is empty, I don't want any files. The default way, dataset.write(...), will always create as many files as the parallelism configured for this operator; in case of an empty dataset all files would be empty as well. I thought about doing something like:
if (dataset.count() > 0) {
   dataset.write(...)
}
but I don't think that's the way to go, because dataset.count() triggers an execution of the (sub)program.
Is there a simple way to avoid creating empty files for empty datasets?
Regards,

Lars
