Thanks a lot, Yang. What are your thoughts on catching the exception when a name node is down and retrying with the secondary name node?
Best,
Nick.

On Sun, Mar 1, 2020 at 9:05 PM Yang Wang <danrtsey...@gmail.com> wrote:

> Hi Nick,
>
> Certainly you can use "namenode:port" directly in your HDFS path. Then the
> hadoop configs (e.g. core-site.xml, hdfs-site.xml) will not be necessary.
> However, that also means you cannot benefit from HDFS high availability [1].
>
> If your HDFS cluster is HA configured, I strongly suggest you set
> "HADOOP_CONF_DIR" for your Flink application. Both the client and the
> cluster (JM/TM) side need to be set. Then your HDFS path can be specified
> like this: "hdfs://myhdfs/flink/test", given that "myhdfs" is the name
> service configured in hdfs-site.xml.
>
> Best,
> Yang
>
> [1]
> http://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
>
> Nick Bendtner <buggi...@gmail.com> wrote on Sat, Feb 29, 2020 at 6:00 AM:
>
>> To add to this question, do I need to set env.hadoop.conf.dir to point
>> to the hadoop config, for instance env.hadoop.conf.dir=/etc/hadoop/, for
>> the JVM? Or is it possible to write to HDFS without any external hadoop
>> config like core-site.xml, hdfs-site.xml?
>>
>> Best,
>> Nick.
>>
>> On Fri, Feb 28, 2020 at 12:56 PM Nick Bendtner <buggi...@gmail.com>
>> wrote:
>>
>>> Hi guys,
>>> I am trying to write to HDFS from a streaming file sink. Where should I
>>> provide the IP address of the name node? Can I provide it as part of the
>>> flink-config.yaml file, or should I provide it like this:
>>>
>>> final StreamingFileSink<GenericRecord> sink = StreamingFileSink
>>>         .forBulkFormat(new Path("hdfs://namenode:8020/flink/test"),
>>>                 ParquetAvroWriters.forGenericRecord(schema))
>>>         .build();
>>>
>>> Best,
>>> Nick
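
For context, a minimal sketch of what the sink could look like once HADOOP_CONF_DIR is set on both the client and JM/TM side, using the name service from Yang's example instead of a single namenode host:port. The class name, output path, and the way the Avro schema is passed in are placeholders, not something tested against a real cluster:

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

    public class HdfsHaSinkSketch {

        // Builds a Parquet bulk sink that addresses HDFS by its HA name service
        // ("myhdfs" is a placeholder) rather than a single namenode host:port.
        // Requires HADOOP_CONF_DIR to be exported on the client and on the JM/TMs
        // so the HDFS client can resolve the name service and fail over between
        // the active and standby namenodes.
        static StreamingFileSink<GenericRecord> buildSink(Schema schema) {
            return StreamingFileSink
                    .forBulkFormat(
                            new Path("hdfs://myhdfs/flink/test"),
                            ParquetAvroWriters.forGenericRecord(schema))
                    .build();
        }
    }

With this style of path there is no namenode address hard-coded in the job; which namenode is used at any moment is resolved from the HA settings in hdfs-site.xml found via HADOOP_CONF_DIR.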