You should have a look at this project : https://github.com/addthis/stream-lib
You can use it within Flink, storing intermediate values in a local state.
> Le 9 juin 2016 à 15:29, Yukun Guo a écrit :
>
> Thank you very much for the detailed answer. Now I understand a DataStream
> can be re
Hi,
There are some implementations to do that with low memory footprint. Have a
look at the count min sketch for example. There are some Java
implementations.
Christophe
2016-06-09 15:29 GMT+02:00 Yukun Guo :
> Thank you very much for the detailed answer. Now I understand a DataStream
> can be
Thank you very much for the detailed answer. Now I understand a DataStream
can be repartitioned or “joined” (don’t know the exact terminology) with
keyBy.
But another question:
Despite the non-existence of incremental top-k algorithm, I’d like to
incrementally compute the local word count during o
Suggestions in-line below...
On Mon, Jun 6, 2016 at 7:26 PM, Yukun Guo wrote:
> Hi,
>
> I'm working on a project which uses Flink to compute hourly log statistics
> like top-K. The logs are fetched from Kafka by a FlinkKafkaProducer and
> packed
> into a DataStream.
>
> The problem is, I find th
My algorithm is roughly like this taking top-K words problem as an example
(the purpose of computing local “word count” is to deal with data
imbalance):
DataStream of words ->
timeWindow of 1h ->
converted to DataSet of words ->
random partitioning by rebalance ->
local “word count” using mapParti
Hi,
I'm working on a project which uses Flink to compute hourly log statistics
like top-K. The logs are fetched from Kafka by a FlinkKafkaProducer and
packed
into a DataStream.
The problem is, I find the computation quite challenging to express with
Flink's DataStream API:
1. If I use something