I just want to add another workaround, which does not need a
self-compiled version. You can use a TimeWindow with CountTrigger.of(1)
combined with a FoldFunction for pre-aggregation and a
RichWindowFunction to update the queryable state. Additionally, you need
a TimeWindow for the final results. So
Thanks!
This looks like a bigger example, involving MongoDB, etc.
Are you able to reproduce this issue with a smaller example?
It would also help to understand the problem better if we knew the topology
a bit better.
The stack traces look like "phase 1&2" want to send data (but are back
pressured
Hi,
what Ufuk said is valid. In addition, you can make your function a
RichFunction and load the static data in the open() method.
In the future, you might be able to handle this use case with a feature
called side inputs that we're currently working on:
https://docs.google.com/document/d/1hIgxi2Z
Hi,
100 million rows is a small load, especially over 1 week.
I suspect that your load will be quite evenly distributed during the day,
since the data comes from a plant rather than from humans.
If you are looking for reliability, set up at least 2 Kafka servers, with
6 partitions per topic, and a separate Hadoop cluster for Flink.
As fo
There was a private member variable that was not serializable and was not
marked transient. Thanks for the pointer.
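
For anyone hitting the same exception, here is a minimal sketch of the problem and the fix in plain Java serialization, without any Flink dependency. All class names (TransientFieldDemo, Connection, BrokenMapper, FixedMapper) are hypothetical and only stand in for a user function holding a non-serializable member; this is not the code from the original job.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class TransientFieldDemo {

    // Hypothetical stand-in for a client or connection that is not Serializable.
    static class Connection {
        final String host;

        Connection(String host) {
            this.host = host;
        }
    }

    // Fails to serialize: the private member is neither Serializable nor transient.
    static class BrokenMapper implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Connection connection = new Connection("localhost");
    }

    // Serializes: the transient field is skipped and re-created lazily on first use.
    static class FixedMapper implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient Connection connection;

        Connection connection() {
            if (connection == null) {
                connection = new Connection("localhost");
            }
            return connection;
        }
    }

    // Returns true if Java serialization accepts the object,
    // false if it throws NotSerializableException.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        FixedMapper fixed = new FixedMapper();
        fixed.connection(); // populate the transient field before serializing

        System.out.println("broken serializes: " + canSerialize(new BrokenMapper()));
        System.out.println("fixed serializes:  " + canSerialize(fixed));
    }
}
```

The lazy getter is just one way to restore the field after deserialization. In a Flink rich function, the same initialization would typically go into open() instead.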
On Thu, Feb 23, 2017 at 11:44 PM, Tzu-Li (Gordon) Tai
wrote:
> Thanks for clarifying.
>
> From the looks of your exception:
>
> Caused by: java.io.NotSerializableException:
> c