Hilmi Yildirim created FLINK-2188:
-------------------------------------

             Summary: Reading from big HBase Tables
                 Key: FLINK-2188
                 URL: https://issues.apache.org/jira/browse/FLINK-2188
             Project: Flink
          Issue Type: Bug
            Reporter: Hilmi Yildirim


I found a bug when reading from a large HBase table.

I used a cluster of 13 machines with 13 processing slots per machine, which
gives a total of 169 processing slots. Our cluster runs CDH 5.4.1, and the
HBase version is 1.0.0-cdh5.4.1. There is an HBase table with nearly 100
million rows. I used Spark and Hive to count the rows, and both returned the
same result (nearly 100 million).
Then I used Flink to count the rows. For that, I added the hbase-client
1.0.0-cdh5.4.1 Java API as a Maven dependency and excluded the other
hbase-client dependencies. The job returned nearly 102 million rows, about
2 million more than Spark and Hive. Moreover, I ran the Flink job multiple
times, and the result sometimes fluctuates by ±5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
