Thanks for your answer Fabian.
In my opinion this is not just a possible new feature for an optimization,
but a bigger problem because the client program crashes with an exception
when concurrent counts or collects are triggered on the same data set, and
this also happens non deterministically dep
Hi Juan,
count() and collect() trigger the execution of a job.
Since Flink does not cache intermediate results (yet), all operations from
the sink (count()/collect()) to the sources are executed.
So in a sense a DataSet is immutable (given that the input of the sources
do not change) but completel
Hi Timo,
Thanks for your answer. I was surprised to have problems calling those
methods concurrently, because I though data sets were immutable. Now I
understand calling count or collect mutates the data set, not its contents
but some kind of execution plan included in the data set.
I suggest add
Hi Juan,
as far as I know we do not provide any concurrency guarantees for
count() or collect(). Those methods need to be used with caution anyways
as the result size must not exceed a certain threshold. I will loop in
Fabian who might know more about the internals of the execution?
Regards,
Any thoughts on this?
On Sun, Apr 7, 2019, 6:56 PM Juan Rodríguez Hortalá <
juan.rodriguez.hort...@gmail.com> wrote:
> Hi,
>
> I have a very simple program using the local execution environment, that
> throws NPE and other exceptions related to concurrent access when launching
> a count for a Dat
Hi,
I have a very simple program using the local execution environment, that
throws NPE and other exceptions related to concurrent access when launching
a count for a DataSet from different threads. The program is
https://gist.github.com/juanrh/685a89039e866c1067a6efbfc22c753e which is
basically t