On 27 July 2018 at 19:21, Chesnay Schepler <ches...@apache.org> wrote:
> At first glance this looks like a bug. Is there nothing in the stack trace
> after the NullPointerException?

Hmm, there is actually, sorry about that:

Caused by: java.lang.NullPointerException
    at org.apache.flink.runtime.state.KeyGroupRangeAssignment.assignToKeyGroup(KeyGroupRangeAssignment.java:59)
    at org.apache.flink.runtime.state.KeyGroupRangeAssignment.assignKeyToParallelOperator(KeyGroupRangeAssignment.java:48)
    at org.apache.flink.streaming.runtime.partitioner.KeyGroupStreamPartitioner.selectChannels(KeyGroupStreamPartitioner.java:63)
    at org.apache.flink.streaming.runtime.partitioner.KeyGroupStreamPartitioner.selectChannels(KeyGroupStreamPartitioner.java:32)
    at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:104)
    at org.apache.flink.streaming.runtime.io.StreamRecordWriter.emit(StreamRecordWriter.java:81)
    at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
    ... 22 more
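For what it's worth, the top frame is KeyGroupRangeAssignment.assignToKeyGroup, which hashes the record's key to pick a key group; if the key selector ever produces a null key, the hashCode() call there throws an NPE just like this one. Here is a simplified sketch of that mechanism (an illustration only, not Flink's actual implementation, which applies murmur hashing on top of hashCode()):

```java
public class KeyGroupSketch {
    // Simplified stand-in for KeyGroupRangeAssignment.assignToKeyGroup:
    // hash the key, then map the hash into [0, maxParallelism).
    static int assignToKeyGroup(Object key, int maxParallelism) {
        // If key is null, this key.hashCode() call throws an NPE,
        // analogous to KeyGroupRangeAssignment.java:59 in the trace above.
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    public static void main(String[] args) {
        System.out.println(assignToKeyGroup("some-key", 128)); // fine
        try {
            assignToKeyGroup(null, 128);
        } catch (NullPointerException e) {
            System.out.println("NPE for a null key, as in the trace");
        }
    }
}
```

So one thing worth checking is whether the second aggregation's keyBy selector can return null for some records that only show up in production data.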

> How reliably can you reproduce this?

100%, though I've only run this job a handful of times. What's frustrating
is that I cannot reproduce this easily in a unit test (all my unit tests
work fine). On production data it happens every time and pretty much
instantly, but our data volumes are big enough that it's difficult to try
to dig into it further.

For now I think I'll split the job into two: have the first aggregation
write to Kafka, and run the second aggregation as a separate job that reads
its input from Kafka. When I run only the first aggregation, it works fine
with no errors, so the issue seems to be the combination of aggregations.
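Before splitting the job, it might also be worth guarding the key extraction for the second aggregation, since a record from the first stage with a missing key field would explain the NPE. A minimal sketch of a null-safe key extractor (the Aggregate type and field names here are hypothetical stand-ins for the first aggregation's output):

```java
import java.util.Objects;

public class NullSafeKey {
    // Hypothetical record type standing in for the first aggregation's output.
    static class Aggregate {
        final String userId; // may be null if the upstream field was missing
        Aggregate(String userId) { this.userId = userId; }
    }

    // Null-safe key extraction: map a missing key to a sentinel instead of
    // null, so hash partitioning downstream never sees a null key.
    static String extractKey(Aggregate a) {
        return Objects.requireNonNullElse(a.userId, "<missing>");
    }

    public static void main(String[] args) {
        System.out.println(extractKey(new Aggregate("u1")));  // u1
        System.out.println(extractKey(new Aggregate(null)));  // <missing>
    }
}
```

The same idea can be applied inside whatever KeySelector the job uses for its keyBy.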

Cheers,
