Hi,

I have a Flink 1.3.2 application that I want to run on a Hadoop YARN cluster, where I need to connect to HBase.
What I have:

In my environment:

HADOOP_CONF_DIR=/etc/hadoop/conf/
HBASE_CONF_DIR=/etc/hbase/conf/
HIVE_CONF_DIR=/etc/hive/conf/
YARN_CONF_DIR=/etc/hadoop/conf/

In /etc/hbase/conf/hbase-site.xml I have correctly defined the ZooKeeper hosts for HBase.

My test code is this:

public class Main {
    private static final Logger LOG = LoggerFactory.getLogger(Main.class);

    public static void main(String[] args) throws Exception {
        printZookeeperConfig();
        final StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
        env.createInput(new HBaseSource()).print();
        env.execute("HBase config problem");
    }

    public static void printZookeeperConfig() {
        String zookeeper = HBaseConfiguration.create().get("hbase.zookeeper.quorum");
        LOG.info("----> Loading HBaseConfiguration: Zookeeper = {}", zookeeper);
    }

    public static class HBaseSource extends AbstractTableInputFormat<String> {
        @Override
        public void configure(org.apache.flink.configuration.Configuration parameters) {
            table = createTable();
            if (table != null) {
                scan = getScanner();
            }
        }

        private HTable createTable() {
            LOG.info("Initializing HBaseConfiguration");
            // Uses the hbase-default.xml and hbase-site.xml files found on the classpath
            org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
            printZookeeperConfig();

            try {
                return new HTable(hConf, getTableName());
            } catch (Exception e) {
                LOG.error("Error instantiating a new HTable instance", e);
            }
            return null;
        }

        @Override
        public String getTableName() {
            return "bugs:flink";
        }

        @Override
        protected String mapResultToOutType(Result result) {
            return new String(result.getFamilyMap("v".getBytes(UTF_8)).get("column".getBytes(UTF_8)));
        }

        @Override
        protected Scan getScanner() {
            return new Scan();
        }
    }
}
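As an extra check on where that quorum value actually comes from, printZookeeperConfig() could be extended to also log which configuration resource supplied it. This is only a sketch and assumes that Hadoop's Configuration#getPropertySources() is available in the Hadoop version on the cluster:

// Sketch: log not only the quorum value but also which resource
// (hbase-default.xml inside the HBase jar, or an hbase-site.xml on the
// classpath) supplied it. Needs: import java.util.Arrays;
public static void printZookeeperConfig() {
    org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
    String zookeeper = hConf.get("hbase.zookeeper.quorum");
    String[] sources = hConf.getPropertySources("hbase.zookeeper.quorum");
    LOG.info("----> Loading HBaseConfiguration: Zookeeper = {} (from {})",
        zookeeper, sources == null ? "<unknown>" : Arrays.toString(sources));
}

If only hbase-default.xml shows up as the source, then no hbase-site.xml was on the effective classpath of that JVM.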
I run this application with this command on my YARN cluster (note: first starting a YARN session and then submitting the job to it yields the same result):

flink \
  run \
  -m yarn-cluster \
  --yarncontainer 1 \
  --yarnname "Flink on Yarn HBase problem" \
  --yarnslots 1 \
  --yarnjobManagerMemory 4000 \
  --yarntaskManagerMemory 4000 \
  --yarnstreaming \
  target/flink-hbase-connect-1.0-SNAPSHOT.jar

Now in the client-side logfile /usr/local/flink-1.3.2/log/flink--client-80d2d21b10e0.log I see:

1) The classpath actually contains /etc/hbase/conf/, both near the start and at the end.

2) The ZooKeeper settings of my experimental environment have been picked up by the software:

2017-10-20 11:17:23,973 INFO com.bol.bugreports.Main - ----> Loading HBaseConfiguration: Zookeeper = node1.kluster.local.nl.bol.com:2181,node2.kluster.local.nl.bol.com:2181,node3.kluster.local.nl.bol.com:2181

When I open the logfiles on the Hadoop cluster I see this:

2017-10-20 13:17:33,250 INFO com.bol.bugreports.Main - ----> Loading HBaseConfiguration: Zookeeper = *localhost*

and, as a consequence:

2017-10-20 13:17:33,368 INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost.localdomain/127.0.0.1:2181
2017-10-20 13:17:33,369 WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-10-20 13:17:33,475 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid

The value 'localhost:2181' is defined inside the HBase jar, in hbase-default.xml, as the default value for the ZooKeeper quorum.

As a workaround I currently put this extra line in my code, which I know is nasty but "works on my cluster":

hConf.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));

What am I doing wrong? What is the right way to fix this?

--
Best regards / Met vriendelijke groeten,

Niels Basjes
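P.S. For completeness, this is roughly the shape the workaround takes inside createTable(). It is a sketch only; reading HBASE_CONF_DIR from the environment first is an untested variation, and the hardcoded path is of course specific to my cluster:

// Sketch: explicitly overlay the cluster's hbase-site.xml, because it is
// apparently not picked up from the classpath inside the YARN containers.
org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
String hbaseConfDir = System.getenv("HBASE_CONF_DIR");  // untested variation
if (hbaseConfDir == null) {
    hbaseConfDir = "/etc/hbase/conf";  // hardcoded, cluster specific
}
hConf.addResource(new org.apache.hadoop.fs.Path("file:" + hbaseConfDir + "/hbase-site.xml"));
LOG.info("----> After overlay: Zookeeper = {}", hConf.get("hbase.zookeeper.quorum"));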