Because I don't see any good reason for that...maybe also all keyo serialization errors that I have from time to time could be symptomatic of some other error in how Flink manage the ibternal buffers...but also this is just another personal guess I did.. On 4 Jul 2016 12:29 p.m., "Ufuk Celebi" <u...@apache.org> wrote:
> It's not possible to tell. You would have to look into the logs of the > job manager to check what happened. The not killed task manager could > have re-connected to the job manager, if it was restarted quickly > after the failure. Why do you think that the task manager would > influence the job result though? > > On Mon, Jul 4, 2016 at 12:23 PM, Flavio Pompermaier > <pomperma...@okkam.it> wrote: > > No, I haven't. > > I fear that unkilled taskmanger could have been the cause of this > problem. > > Last day I run the job and I discovered that on some node there was some > > zombie taskmanger yhat wasn't terminated during the stop-cluster. > > What do you think?What happens in this situations?old taskmanager are > still > > avle to interfer with the new jobmanager? > > in the webdashboard I didn't see them so I thought it wasn't > problematic > > at all so I just killed them.. > > > > On 4 Jul 2016 12:07 p.m., "Ufuk Celebi" <u...@apache.org> wrote: > > > > I guess Aljoscha was referring to whether you also have broadcasted > > input or something like it? > > > > On Fri, Jul 1, 2016 at 7:05 PM, Flavio Pompermaier <pomperma...@okkam.it > > > > wrote: > >> what do you mean exactly? > >> > >> On 1 Jul 2016 18:58, "Aljoscha Krettek" <aljos...@apache.org> wrote: > >>> > >>> Hi, > >>> do you have any data in the coGroup/groupBy operators that you use, > >>> besides the input data? > >>> > >>> Cheers, > >>> Aljoscha > >>> > >>> On Fri, 1 Jul 2016 at 14:17 Flavio Pompermaier <pomperma...@okkam.it> > >>> wrote: > >>>> > >>>> Hi to all, > >>>> I have a Flink job that computes data correctly when launched locally > >>>> from my IDE while it doesn't when launched on the cluster. > >>>> > >>>> Is there any suggestion/example to understand the problematic > operators > >>>> in this way? > >>>> I think the root cause is the fact that some operator (e.g. > >>>> coGroup/groupBy,etc), which I assume to have all the data for a key, > >>>> maybe > >>>> it is not (because the data is partitioned among nodes). > >>>> > >>>> Any help is appreciated, > >>>> Flavio >