Hi Hilmi,

If you just want to count the number of elements, you can also use
accumulators, as described here [1].
They are much more lightweight than a groupBy(0).sum(1).

So you need to make your flatMap function a RichFlatMapFunction, then call
getRuntimeContext() to register the accumulator.
Use a long accumulator (LongCounter) to count the elements.
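
A minimal sketch of that setup, assuming the Java DataSet API as it looked
around this release (newer Flink versions changed the open() signature and
dropped the DataSet API). The class name, the accumulator name "row-count",
and the String input type are placeholders for illustration; the real job
would read its rows from HBase instead of fromElements().

```java
import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.api.common.accumulators.LongCounter;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.DiscardingOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class RowCountWithAccumulator {

    // Hypothetical accumulator name; any string unique within the job works.
    static final String ROW_COUNT = "row-count";

    public static class CountingFlatMap
            extends RichFlatMapFunction<String, Tuple2<String, Long>> {

        // Each parallel task instance keeps its own local counter.
        private final LongCounter rowCount = new LongCounter();

        @Override
        public void open(Configuration parameters) {
            // Register the counter so Flink merges the per-task values
            // into one result when the job finishes.
            getRuntimeContext().addAccumulator(ROW_COUNT, rowCount);
        }

        @Override
        public void flatMap(String row, Collector<Tuple2<String, Long>> out) {
            rowCount.add(1L);                    // count via the accumulator
            out.collect(new Tuple2<>("a", 1L));  // same tuple as in the original job
        }
    }

    // Runs a small job over the given rows and returns the accumulator value.
    public static long countRows(String... rows) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.fromElements(rows)  // stand-in for the HBase input
           .flatMap(new CountingFlatMap())
           .output(new DiscardingOutputFormat<Tuple2<String, Long>>());

        JobExecutionResult result = env.execute("row count");
        Long count = result.getAccumulatorResult(ROW_COUNT);
        return count;
    }
}
```

In the real job you can keep the groupBy(0).sum(1) sink next to the
accumulator and compare both numbers from the same run.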

If the results with the accumulator are consistent (the exact element
count on every run), then there is a severe bug in Flink, because the
groupBy(0).sum(1) would be miscounting. But I suspect that the
accumulator will give you the same result (off by +-5).

Best,
Robert


[1]: http://slideshare.net/robertmetzger1/apache-flink-hands-on

On Mon, Jun 8, 2015 at 3:04 PM, Hilmi Yildirim <hilmi.yildi...@neofonie.de>
wrote:

> Hi,
> I implemented a simple Flink batch job which reads from an HBase cluster
> of 13 machines with nearly 100 million rows. The HBase version is
> 1.0.0-cdh5.4.1, so I imported hbase-client 1.0.0-cdh5.4.1.
> I implemented a flatMap which creates a tuple ("a", 1L) for each row.
> Then, I use groupBy(0).sum(1).writeAsText. The result should be the number
> of rows, but it is not correct. I ran the job multiple times and
> the result fluctuates by +-5. I also ran the job on a smaller table with
> 100,000 rows, and there the result is correct.
>
> Does anyone know the reason for that?
>
> Best Regards,
> Hilmi
>
> --
> Hilmi Yildirim
> Software Developer R&D
