Thanks Yang. I'm going with setting HADOOP_CONF_DIR for the Flink application; it integrates neatly with Flink.
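
For the archives, this is roughly what the job ends up looking like. It's a sketch, not the exact code: the HADOOP_CONF_DIR path, the "myhdfs" nameservice, the output path, and the toy Avro schema are all placeholders.

// Set on the client that runs `flink run` and on the JM/TM hosts, pointing at the
// directory holding core-site.xml and hdfs-site.xml (placeholder path):
//   export HADOOP_CONF_DIR=/etc/hadoop/conf

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class HdfsSinkSketch {

    // "myhdfs" is whatever dfs.nameservices is set to in hdfs-site.xml; with the
    // Hadoop config loaded, the HDFS client handles namenode failover itself, so
    // no namenode host:port appears in the path.
    static StreamingFileSink<GenericRecord> buildSink(Schema schema) {
        return StreamingFileSink
                .forBulkFormat(
                        new Path("hdfs://myhdfs/flink/test"),
                        ParquetAvroWriters.forGenericRecord(schema))
                .build();
    }
}

The sink is then attached to the GenericRecord stream exactly as in the snippet at the bottom of this thread, e.g. records.addSink(buildSink(schema)).
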
Best,
Nick.

On Mon, Mar 2, 2020 at 7:42 PM Yang Wang <danrtsey...@gmail.com> wrote:

> It may work. However, you would need to set up your own retry policy
> (similar to `ConfiguredFailoverProxyProvider` in Hadoop).
> Also, if you directly use the namenode address and do not load the HDFS
> configuration, some HDFS client configuration (e.g. dfs.client.*) will not
> take effect.
>
>
> Best,
> Yang
>
> Nick Bendtner <buggi...@gmail.com> wrote on Mon, Mar 2, 2020 at 11:58 PM:
>
>> Thanks a lot Yang. What are your thoughts on catching the exception when
>> a namenode is down and retrying with the secondary namenode?
>>
>> Best,
>> Nick.
>>
>> On Sun, Mar 1, 2020 at 9:05 PM Yang Wang <danrtsey...@gmail.com> wrote:
>>
>>> Hi Nick,
>>>
>>> Certainly you can use "namenode:port" directly in your HDFS path. Then
>>> the Hadoop configs (e.g. core-site.xml, hdfs-site.xml) are not
>>> necessary. However, that also means you cannot benefit from HDFS
>>> high availability [1].
>>>
>>> If your HDFS cluster is HA configured, I strongly suggest you set
>>> "HADOOP_CONF_DIR" for your Flink application. It needs to be set on
>>> both the client and the cluster (JM/TM) side. Your HDFS path can then
>>> be specified as "hdfs://myhdfs/flink/test", given that "myhdfs" is the
>>> name service configured in hdfs-site.xml.
>>>
>>>
>>> Best,
>>> Yang
>>>
>>>
>>>
>>> [1].
>>> http://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
>>>
>>> Nick Bendtner <buggi...@gmail.com> wrote on Sat, Feb 29, 2020 at 6:00 AM:
>>>
>>>> To add to this question, do I need to set env.hadoop.conf.dir to point
>>>> to the Hadoop config, for instance env.hadoop.conf.dir=/etc/hadoop/,
>>>> for the JVM? Or is it possible to write to HDFS without any external
>>>> Hadoop config like core-site.xml or hdfs-site.xml?
>>>>
>>>> Best,
>>>> Nick.
>>>>
>>>>
>>>>
>>>> On Fri, Feb 28, 2020 at 12:56 PM Nick Bendtner <buggi...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi guys,
>>>>> I am trying to write to HDFS from a StreamingFileSink. Where should I
>>>>> provide the IP address of the namenode? Can I provide it as part of
>>>>> the flink-conf.yaml file, or should I provide it like this:
>>>>>
>>>>> final StreamingFileSink<GenericRecord> sink = StreamingFileSink
>>>>>         .forBulkFormat(
>>>>>                 new Path("hdfs://namenode:8020/flink/test"),
>>>>>                 ParquetAvroWriters.forGenericRecord(schema))
>>>>>         .build();
>>>>>
>>>>>
>>>>> Best,
>>>>> Nick
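
Footnote for anyone who lands on this thread later: the failover behaviour Yang describes comes from the HA client settings in hdfs-site.xml, per the doc he links in [1]. A sketch with placeholder hostnames and ports, assuming the nameservice is called "myhdfs":

<property>
  <name>dfs.nameservices</name>
  <value>myhdfs</value>
</property>
<property>
  <name>dfs.ha.namenodes.myhdfs</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.myhdfs.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.myhdfs.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.myhdfs</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

With this loaded via HADOOP_CONF_DIR, the HDFS client retries against whichever namenode is active, so no hand-rolled retry against the secondary namenode is needed in the Flink job.
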