Hi Ufuk,
I used the TableInputFormat from flink-addons.
Best Regards,
Hilmi
On 09.06.2015 at 13:17, Ufuk Celebi wrote:
Hey Hilmi,
thanks for reporting the issue. Sorry for the inconvenience this has caused.
I'm not familiar with HBase in combination with Flink. From what I've seen,
there are two options: either use Flink's TableInputFormat from flink-addons or
the Hadoop TableInputFormat, right? Which one are you using?
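For reference, a minimal sketch of the flink-addons variant, assuming the TableInputFormat base class of the flink-hbase addon with its getScanner()/getTableName()/mapResultToTuple() hooks; the table name "mytable" and the tuple layout are placeholders:

import org.apache.flink.addons.hbase.TableInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;

// Sketch: subclass of the flink-addons TableInputFormat.
public class RowCountInput extends TableInputFormat<Tuple2<String, Long>> {

    @Override
    protected Scan getScanner() {
        return new Scan(); // full table scan; narrow with start/stop rows if needed
    }

    @Override
    protected String getTableName() {
        return "mytable"; // placeholder table name
    }

    @Override
    protected Tuple2<String, Long> mapResultToTuple(Result r) {
        // Constant key so the rows can later be summed into a single count.
        return new Tuple2<String, Long>("a", 1L);
    }
}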
Thank you very much!
From: Hilmi Yildirim
Sent: Tuesday, 9 June 2015, 11:40
To: user@flink.apache.org
Done
https://issues.apache.org/jira/browse/FLINK-2188
On 09.06.2015 at 11:26, Fabian Hueske wrote:
Would you mind opening a JIRA for this issue?
-> https://issues.apache.org/jira/browse/FLINK
I can do it as well, but you know all the details.
Thanks, Fabian
2015-06-09 11:03 GMT+02:00 Hilmi Yildirim:
I want to add that I run the Flink job on a cluster with 13 machines and
each machine has 13 processing slots, which results in a total of 169
processing slots.
On 09.06.2015 at 10:59, Hilmi Yildirim wrote:
Correct.
I also counted the rows with Spark and Hive. Both returned the same
value, which is nearly 100 million rows. But Flink returns 102 million rows.
Best Regards,
Hilmi
On 09.06.2015 at 10:47, Fabian Hueske wrote:
OK, so the problem seems to be with the HBase InputFormat.
I guess this issue needs a bit of debugging.
We need to check if records are emitted twice (or more often) and, if that
is the case, which records.
Unfortunately, this issue only seems to occur with large tables :-(
Did I get that right, that the table holds nearly 100 million rows but
Flink counts 102 million?
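One way to see which records are duplicated would be to emit the HBase row key instead of the constant "a" and keep the keys that occur more than once. A sketch, assuming a DataSet named keysAndOnes that holds one (rowKey, 1L) tuple per record read from the input format:

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple2;

DataSet<Tuple2<String, Long>> duplicates = keysAndOnes
    .groupBy(0).sum(1)
    .filter(new FilterFunction<Tuple2<String, Long>>() {
        @Override
        public boolean filter(Tuple2<String, Long> t) {
            return t.f1 > 1; // this row key was read more than once
        }
    });
duplicates.first(100).print(); // inspect a sample of the duplicated keys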
Hi,
Now I tested the "count" method. It returns the same result as the
flatmap.groupBy(0).sum(1) method.
Furthermore, the HBase table contains nearly 100 million rows, but the result
is 102 million. This means that the HBase input format reads more rows than
the table contains.
Best Regards,
Hilmi
On 08.06.2015, ... wrote:
Hi Hilmi,
I see two possible reasons:
1) The data source / InputFormat is not properly working, so not all HBase
records are read/forwarded, or
2) The aggregation / count is buggy
Robert's suggestion will use an alternative mechanism to do the count. In
fact, you can count with groupBy(0).sum() as well.
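As a sketch of the cross-check (assuming "rows" is the DataSet<Tuple2<String, Long>> of ("a", 1L) tuples read from HBase, and that count() and collect() are available in the Flink version used):

// Two independent counting paths:
long viaCount = rows.count();                      // utility count
Tuple2<String, Long> total = rows.groupBy(0).sum(1)
        .collect().get(0);                         // single ("a", total) tuple
long viaSum = total.f1;
// viaCount and viaSum should agree; note both read through the same
// input format, so an InputFormat bug would affect both equally.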
Hi Hilmi,
if you just want to count the number of elements, you can also use
accumulators, as described here [1].
They are much more lightweight.
So you need to make your flatMap function a RichFlatMapFunction, then call
getRuntimeContext().
Use a long accumulator to count the elements.
If the job has finished, you can get the accumulator result from the
JobExecutionResult returned by env.execute().
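A minimal sketch of that approach (generic over the element type; the accumulator name "num-rows" is illustrative):

import org.apache.flink.api.common.accumulators.LongCounter;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Counts every element that passes through the flatMap via an accumulator.
public class CountingFlatMap<T> extends RichFlatMapFunction<T, T> {

    private final LongCounter numRows = new LongCounter();

    @Override
    public void open(Configuration parameters) {
        // Register under a name; the JobManager merges the per-task counters.
        getRuntimeContext().addAccumulator("num-rows", numRows);
    }

    @Override
    public void flatMap(T value, Collector<T> out) {
        numRows.add(1L);
        out.collect(value); // pass the element through unchanged
    }
}

// After the job finished:
// Long total = env.execute("count").getAccumulatorResult("num-rows");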
Hi,
I implemented a simple Flink batch job which reads from an HBase cluster
of 13 machines with nearly 100 million rows. The HBase version is
1.0.0-cdh5.4.1, so I imported hbase-client 1.0.0-cdh5.4.1.
I implemented a flatMap which creates a tuple ("a", 1L) for each row.
Then, I use groupBy(0).sum(1) to count the rows.
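Roughly, the described job would look like this (a sketch; RowCountInput stands for the hypothetical TableInputFormat subclass sketched earlier in the thread):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class HBaseRowCount {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // One ("a", 1L) tuple per HBase row, as described above.
        DataSet<Tuple2<String, Long>> rows = env.createInput(new RowCountInput());

        // All tuples share the key "a", so the sum of field 1 is the row count.
        rows.groupBy(0).sum(1).print();
    }
}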