LeonBein commented on a change in pull request #15109: URL: https://github.com/apache/flink/pull/15109#discussion_r592511610
########## File path: flink-connectors/flink-connector-hbase/README.md ########## @@ -0,0 +1,134 @@ +# Flink HBase Connector + +This connector provides classes that allow Flink to access [HBase](https://hbase.apache.org/). +It supports the new Source and Sink API specified in [FLIP-27](https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface) and [FLIP-143](https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API). + +## Building the connector + +Note that the streaming connectors are not part of the binary distribution of Flink. +You need to link them into your job jar for cluster execution. +See how to link with them for cluster execution [here](https://ci.apache.org/projects/flink/flink-docs-stable/dev/project-configuration.html#adding-connector-and-library-dependencies). + +The connector can be built by using maven: + +``` +cd flink-connectors/flink-connector-hbase +mvn clean install +``` + +## Installing HBase + +Follow the instructions from the [HBase Quick Start Guide](http://hbase.apache.org/book.html#quickstart) to install HBase. + +*Version Compatibility*: This module is compatible with Apache HBase *2.3.4*. + +## HBase Configuration + +Connecting to HBase always requires a `Configuration` instance. +If there is an HBase gateway on the same host as the Flink gateway where the application is started, this can be obtained by invoking `HBaseConfiguration.create()` as in the examples below. +If that's not the case a configuration should be provided where the proper core-site, hdfs-site, and hbase-site are added as resources. + +## DataStream API + +### Reading data from HBase + +To receive data from HBase, the connector makes use of the internal replication mechanism of HBase. +The connector registers at the HBase cluster as a *Replication Peer* and will receive all change events from HBase. + +For the replication to work, the HBase config needs to have replication enabled in the `hbase-site.xml` file: +```xml +<configuration> + <property> + <name>hbase.replication</name> + <value>true</value> + </property> + ... +</configuration> +``` +All incoming events to Flink will be processed as an `HBaseEvent`. +You will need to specify a Deserializer which will transform each event from an `HBaseEvent` to the desired DataStream type. + +```java +StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); + +Configuration hbaseConfig = HBaseConfiguration.create(); +hbaseConfig.setBoolean("hbase.replication", true); Review comment: Removed in 047d43e21488af3a8f405e8e6c01e502721fd80f ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org