Hi Hilmi, if you just want to count the number of elements, you can also use accumulators, as described here [1]. They are much more lightweight.
So you need to make your flatMap function a RichFlatMapFunction, then call getExecutionContext(). Use a long accumulator to count the elements. If the results with the accumulator are consistent (the exact element count), then there is a severe bug in Flink. But I suspect that the accumulator will give you the same result (off by +-5) Best, Robert [1]: http://slideshare.net/robertmetzger1/apache-flink-hands-on On Mon, Jun 8, 2015 at 3:04 PM, Hilmi Yildirim <hilmi.yildi...@neofonie.de> wrote: > Hi, > I implemented a simple Flink Batch job which reads from an HBase Cluster > of 13 machines and with nearly 100 million rows. The hbase version is > 1.0.0-cdh5.4.1. So, I imported hbase-client 1.0.0-cdh5.4.1. > I implemented a flatmap which creates a tuple ("a", 1L) for each row . > Then, I use groupBy(0).sum(1).writeAsTest. The result should be the number > of rows. But, the result is not correct. I run the job multiple times and > the result flactuates by +-5. I also run the job for a smaller table with > 100.000 rows and the result is correct. > > Does anyone know the reason for that? > > Best Regards, > Hilmi > > -- > -- > Hilmi Yildirim > Software Developer R&D > > http://www.neofonie.de > > Besuchen Sie den Neo Tech Blog für Anwender: > http://blog.neofonie.de/ > > Folgen Sie uns: > https://plus.google.com/+neofonie > http://www.linkedin.com/company/neofonie-gmbh > https://www.xing.com/companies/neofoniegmbh > > Neofonie GmbH | Robert-Koch-Platz 4 | 10115 Berlin > Handelsregister Berlin-Charlottenburg: HRB 67460 > Geschäftsführung: Thomas Kitlitschko > >