Re: Exception while running Flink jobs (1.0.0)

2016-10-12 Thread Flavio Pompermaier
Ok, thanks for the update, Ufuk! Let me know if you need tests or anything! Best, Flavio
On Wed, Oct 12, 2016 at 11:26 AM, Ufuk Celebi wrote:
> No, sorry. I was waiting for Tarandeep's feedback before looking into
> it further. I will do it over the next few days in any case.
>
> On Wed, Oct 12, 2016

Re: Exception while running Flink jobs (1.0.0)

2016-10-12 Thread Ufuk Celebi
No, sorry. I was waiting for Tarandeep's feedback before looking into it further. I will do it over the next few days in any case.
On Wed, Oct 12, 2016 at 10:49 AM, Flavio Pompermaier wrote:
> Hi Ufuk,
> any news on this?
>
> On Thu, Oct 6, 2016 at 1:30 PM, Ufuk Celebi wrote:
>>
>> I guess that this

Re: Exception while running Flink jobs (1.0.0)

2016-10-12 Thread Flavio Pompermaier
Hi Ufuk, any news on this?
On Thu, Oct 6, 2016 at 1:30 PM, Ufuk Celebi wrote:
> I guess that this is caused by a bug in the checksum calculation. Let
> me check that.
>
> On Thu, Oct 6, 2016 at 1:24 PM, Flavio Pompermaier wrote:
> > I've run the job once more (always using the checksum branch

Re: Exception while running Flink jobs (1.0.0)

2016-10-06 Thread Ufuk Celebi
I guess that this is caused by a bug in the checksum calculation. Let me check that.
On Thu, Oct 6, 2016 at 1:24 PM, Flavio Pompermaier wrote:
> I've run the job once more (always using the checksum branch) and this time
> I got:
>
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1953786112

Re: Exception while running Flink jobs (1.0.0)

2016-10-06 Thread Flavio Pompermaier
I've run the job once more (always using the checksum branch) and this time I got:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1953786112
    at org.apache.flink.api.common.typeutils.base.EnumSerializer.deserialize(EnumSerializer.java:83)
    at org.apache.flink.api.common.typeutils.base.EnumSeri
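
For readers following the thread, here is a minimal sketch of how an enum lookup can fail with an index this large. It assumes the enum is stored as a 4-byte ordinal and looked up by array index; that is an assumption about the serializer, not something confirmed in the thread. The class `EnumOrdinalDemo`, the enum `Color`, and the byte values are hypothetical. The misaligned bytes were chosen so that they decode to the index reported above; whether the job's stream actually contained them is not known.

```java
import java.nio.ByteBuffer;

/** Illustration only; this is not Flink's EnumSerializer. */
public class EnumOrdinalDemo {

    /** Hypothetical enum standing in for a field of the job's record type. */
    public enum Color { RED, GREEN, BLUE }

    /** Reads four bytes as a big-endian int (ByteBuffer's default byte order). */
    public static int readOrdinal(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    /** Assumed scheme: the enum is stored as its ordinal and looked up by index. */
    public static Color readEnum(byte[] bytes) {
        return Color.values()[readOrdinal(bytes)];
    }

    public static void main(String[] args) {
        // Correctly positioned read: ordinal 2 maps to BLUE.
        System.out.println(readEnum(new byte[] {0, 0, 0, 2}));

        // Read starting at the wrong position: these bytes are the ASCII
        // characters 't', 't', 'i' followed by a zero byte.
        byte[] misaligned = {0x74, 0x74, 0x69, 0x00};
        System.out.println(readOrdinal(misaligned)); // 1953786112

        try {
            readEnum(misaligned);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }
    }
}
```

If the assumption holds, an index like this suggests the reader was positioned inside unrelated record data rather than at an enum field, which fits the corrupted or misaligned stream being discussed.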

Re: Exception while running Flink jobs (1.0.0)

2016-10-06 Thread Ufuk Celebi
Yes, if that's the case, I think you should go with option (2) and run with the checksums.
On Thu, Oct 6, 2016 at 10:32 AM, Flavio Pompermaier wrote:
> The problem is that the data is very large and the job usually cannot run on a single
> machine :(
>
> On Thu, Oct 6, 2016 at 10:11 AM, Ufuk Celebi wrote:
>>

Re: Exception while running Flink jobs (1.0.0)

2016-10-06 Thread Flavio Pompermaier
The problem is that the data is very large and the job usually cannot run on a single machine :(
On Thu, Oct 6, 2016 at 10:11 AM, Ufuk Celebi wrote:
> On Wed, Oct 5, 2016 at 7:08 PM, Tarandeep Singh wrote:
> > @Stephan: my Flink cluster setup is 5 nodes, each running 1 TaskManager. Slots
> > per task mana

Re: Exception while running Flink jobs (1.0.0)

2016-10-06 Thread Ufuk Celebi
On Wed, Oct 5, 2016 at 7:08 PM, Tarandeep Singh wrote:
> @Stephan: my Flink cluster setup is 5 nodes, each running 1 TaskManager. Slots
> per task manager: 2-4 (I tried varying this to see if this has any impact).
> Network buffers: 5k - 20k (tried different values for it).
Could you run the job fir

Re: Exception while running Flink jobs (1.0.0)

2016-10-05 Thread Tarandeep Singh
@Stephan: my Flink cluster setup is 5 nodes, each running 1 TaskManager. Slots per task manager: 2-4 (I tried varying this to see if it has any impact). Network buffers: 5k - 20k (I tried different values for it).
@Ufuk: Thank you for creating a branch with the checksum. I will use this build to test the

Re: Exception while running Flink jobs (1.0.0)

2016-10-05 Thread Ufuk Celebi
@Tarandeep and Flavio: +1 to Stephan's question. Furthermore, I've created a branch which adds a simple CRC32 checksum calculation over the network buffer content here:
https://github.com/uce/flink/tree/checksum
It would be great if you could run your job with a build from this branch. It's based
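
As a rough illustration of the idea described in this message, the sketch below computes a CRC32 over a byte range on the sending side and re-checks it on the receiving side. It uses only `java.util.zip.CRC32` from the JDK. The class `BufferChecksum` and its methods are hypothetical and do not show how the checksum branch is actually implemented.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical helper; not part of Flink or of the checksum branch. */
public class BufferChecksum {

    /** Computes a CRC32 over data[offset, offset + length). */
    public static long checksum(byte[] data, int offset, int length) {
        CRC32 crc = new CRC32();
        crc.update(data, offset, length);
        return crc.getValue();
    }

    /** Returns true if the bytes still match the checksum computed by the sender. */
    public static boolean verify(byte[] data, int offset, int length, long expected) {
        return checksum(data, offset, length) == expected;
    }

    public static void main(String[] args) {
        // Stand-in for the content of a network buffer.
        byte[] buffer = "some serialized records".getBytes(StandardCharsets.UTF_8);
        long sent = checksum(buffer, 0, buffer.length);

        // Simulate a single bit flipped in transit.
        buffer[3] ^= 0x01;
        System.out.println("intact after transfer: "
                + verify(buffer, 0, buffer.length, sent));
    }
}
```

A mismatch on the receiving side would point at corruption between sender and receiver, while matching checksums followed by a deserialization error would point at the serialization logic instead.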

Re: Exception while running Flink jobs (1.0.0)

2016-10-04 Thread Stephan Ewen
It would be great to know if this only occurs in setups where Netty is involved (more than one TaskManager, and at least one shuffle/rebalance) or also in one-TaskManager setups (which have local channels only). Stephan
On Tue, Oct 4, 2016 at 11:49 AM, Till Rohrmann wrote:
> Hi Tarandeep,
>

Re: Exception while running Flink jobs (1.0.0)

2016-10-04 Thread Till Rohrmann
Hi Tarandeep, it would be great if you could compile a small example data set with which you're able to reproduce your problem. We could then try to debug it. It would also be interesting to know whether the patch for Flavio's bug solves your problem or not. Cheers, Till
On Mon, Oct 3, 2016 at 10:26 PM, Flavi

Re: Exception while running Flink jobs (1.0.0)

2016-10-03 Thread Flavio Pompermaier
I think you're running into the same exception I face sometimes. I've opened a JIRA issue for it [1]. Could you please try to apply that patch and see if things get better?
[1] https://issues.apache.org/jira/plugins/servlet/mobile#issue/FLINK-4719
Best, Flavio
On 3 Oct 2016 22:09, "Tarandeep Singh" wrote

Re: Exception while running Flink jobs (1.0.0)

2016-10-03 Thread Tarandeep Singh
Now, when I ran it again (with fewer task slots per machine) I got a different error:
org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Job execution failed.
    at org.apache.flink.client.program.Client.runBlocking(Client.java:381)
    at org.apache.flink.c

Exception while running Flink jobs (1.0.0)

2016-10-03 Thread Tarandeep Singh
Hi, I am using flink-1.0.0 and have been running ETL (batch) jobs on it for a few months without any problem. Starting this morning, I have been getting errors like these:
"Received an event in channel 3 while still having data from a record. This indicates broken serialization logic. If yo