Hi Maciek,
The first point seems interesting and we should definitely look into it, also
for other filesystems, e.g. HDFS.
It would be nice if we could find a more “one-size-fits-all” solution together,
because the local fs rounds modification times up to a second while other
filesystems may have different strategies.
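To make the granularity problem concrete, here is a minimal plain-Java sketch (names and timestamps are hypothetical, not Flink code) of why a monitor that remembers only the last-seen modification time and filters with a strict ">" can silently skip a file whose mtime was rounded down to the same second:

```java
// Hypothetical sketch: second-granularity mtimes can drop files when the
// monitor keeps only "last seen modification time" and filters with ">".
public class MtimeGranularityDemo {
    public static void main(String[] args) {
        long granularityMs = 1000;        // local fs rounds to full seconds
        long fileAMtime = 5_000;          // scanned in the first pass
        long lastSeen = fileAMtime;       // the monitor remembers only this

        // fileB is written 400 ms later, but its mtime rounds to the same second:
        long fileBMtime = 5_400 / granularityMs * granularityMs; // -> 5_000

        // fileB's rounded mtime equals lastSeen, so the strict ">" skips it:
        boolean fileBPickedUp = fileBMtime > lastSeen;
        System.out.println("fileB picked up: " + fileBPickedUp); // prints false
    }
}
```

A filesystem with coarser rounding (or HDFS semantics) just widens the window in which this race can happen, which is why a per-filesystem strategy or a tolerance interval would help.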
Hi Maciek,
Thanks for bringing this up again and sorry for not opening the discussion yet.
I will check it out and get back to you during the week.
Kostas
> On Nov 26, 2016, at 9:40 PM, Maciek Próchniak wrote:
>
> Hi Kostas,
>
> I didn't see any discussion on the dev mailing list, so I'd like to share
Hi Kostas,
I didn't see any discussion on the dev mailing list, so I'd like to share
our problems/solutions (we had a busy month... ;)
1. we refactored ContinuousFileMonitoringFunction so that the state includes
not only lastModificationTime, but also the list of files that have exactly
this modification time
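The refactoring described in point 1 can be sketched in plain Java as follows; the class and method names are hypothetical, not the actual Flink state layout. The idea is that keeping the set of paths sharing the newest timestamp lets a rescan accept a new file within the same second without re-reading old ones:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the refactored monitor state: the max mtime seen so
// far plus every path carrying exactly that mtime.
public class MonitorState {
    private long lastModificationTime = Long.MIN_VALUE;
    private final Set<String> filesAtLastTime = new HashSet<>();

    /** Returns true if the file has not been seen and should be processed. */
    public boolean accept(String path, long mtime) {
        if (mtime < lastModificationTime) return false; // strictly older: seen
        if (mtime == lastModificationTime) {
            return filesAtLastTime.add(path);           // same second, maybe new
        }
        lastModificationTime = mtime;                   // strictly newer second
        filesAtLastTime.clear();
        filesAtLastTime.add(path);
        return true;
    }

    public static void main(String[] args) {
        MonitorState s = new MonitorState();
        System.out.println(s.accept("a.csv", 5_000)); // true  (first file)
        System.out.println(s.accept("a.csv", 5_000)); // false (already seen)
        System.out.println(s.accept("b.csv", 5_000)); // true  (same second, new)
        System.out.println(s.accept("c.csv", 6_000)); // true  (newer second)
    }
}
```

The extra set costs a little state, but only for files at the single newest timestamp, so it stays small between scans.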
Hi Maciek,
I agree with you that 1ms is often too long :P
This is why I will open a discussion, to collect
all the ideas / requirements / shortcomings in a single place.
This way the community can track and influence what
is coming next.
Hopefully I will do it in the afternoon and I will
Hi Kostas,
thanks for the quick answer.
I wouldn't dare to delete files in the InputFormat if they were split and
processed in parallel...
As for using notifyCheckpointComplete - thanks for the suggestion, it looks
pretty interesting, I'll try it out. Although I wonder a bit if
relying only on
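For readers following along, the notifyCheckpointComplete pattern mentioned above can be sketched in plain Java like this. The method names mirror Flink's CheckpointListener interface, but the class itself and its bookkeeping are a hypothetical illustration, not Flink code: fully-read files are buffered per checkpoint id and deleted only once that checkpoint is confirmed, so a crash beforehand can still replay them from the restored state:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of delete-on-checkpoint bookkeeping.
public class DeleteOnCheckpoint {
    private final List<String> readSinceLastCheckpoint = new ArrayList<>();
    private final NavigableMap<Long, List<String>> pendingByCheckpoint = new TreeMap<>();
    final List<String> deleted = new ArrayList<>();

    void fileFullyRead(String path) { readSinceLastCheckpoint.add(path); }

    // Called when a checkpoint barrier snapshots the operator state.
    void snapshotState(long checkpointId) {
        pendingByCheckpoint.put(checkpointId, new ArrayList<>(readSinceLastCheckpoint));
        readSinceLastCheckpoint.clear();
    }

    // Mirrors CheckpointListener#notifyCheckpointComplete(long):
    // all checkpoints <= checkpointId are covered by this confirmation.
    void notifyCheckpointComplete(long checkpointId) {
        for (List<String> paths : pendingByCheckpoint.headMap(checkpointId, true).values()) {
            deleted.addAll(paths); // real code would call FileSystem.delete here
        }
        pendingByCheckpoint.headMap(checkpointId, true).clear();
    }

    public static void main(String[] args) {
        DeleteOnCheckpoint d = new DeleteOnCheckpoint();
        d.fileFullyRead("a.csv");
        d.snapshotState(1);
        d.fileFullyRead("b.csv");       // read, but not yet checkpointed
        d.notifyCheckpointComplete(1);
        System.out.println(d.deleted);  // [a.csv] - b.csv survives for replay
    }
}
```

The caveat raised in the thread still applies: this only decides *when* it is safe to delete; it does not by itself tell you that every parallel reader has finished the file's splits.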
Hi Maciek,
Just a follow-up on the previous email: given that splits are read in parallel,
when the ContinuousFileMonitoringFunction forwards the last split, it does not
mean that the final split is going to be processed last. If the node it gets
assigned to is fast enough, then it may be processed earlier.
Hi Maciek,
Currently this functionality is not supported, but it seems like a good
addition.
Actually, given that the feature is rather new, we were thinking of opening a
discussion
on the dev mailing list in order to
i) discuss some current limitations of the Continuous File Processing source
Hi,
we want to monitor an HDFS (or local) directory, read CSV files that appear,
and after successful processing delete them (mainly so as not to run out of
disk space...)
I'm not quite sure how to achieve this with the current implementation.
Previously, when we read binary data (unsplittable files) we m
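For reference, the use case that opened this thread (poll a directory, read CSVs, delete after success) can be sketched naively in plain Java. This is not Flink and not fault-tolerant; as the replies above discuss, deleting is only safe when one reader is known to have processed the whole file:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Naive, non-fault-tolerant sketch: poll a directory, process each CSV as a
// whole, then delete it. Unsafe if files are split and read in parallel.
public class PollAndDelete {
    static void pollOnce(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            for (Path f : (Iterable<Path>)
                    files.filter(p -> p.toString().endsWith(".csv"))::iterator) {
                for (String line : Files.readAllLines(f)) {
                    System.out.println("processing: " + line);
                }
                Files.delete(f); // only safe after the WHOLE file succeeded
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("csv-in");
        Files.write(dir.resolve("a.csv"), java.util.List.of("1,foo"));
        pollOnce(dir);
        try (Stream<Path> rest = Files.list(dir)) {
            System.out.println("remaining: " + rest.count()); // 0
        }
    }
}
```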