Hi Nick,

Certainly you could use "namenode:port" directly in your HDFS path. Then the Hadoop configs (e.g. core-site.xml, hdfs-site.xml) will not be necessary. However, that also means you could not benefit from HDFS high-availability[1].
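For illustration, a minimal sketch of that direct form with the StreamingFileSink; the name node host, port, path, and Avro schema below are placeholders, not values from this thread:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

// Hard-coding the name node address in the sink path: no core-site.xml /
// hdfs-site.xml is needed to resolve it, but a name node failover will not
// be followed because the address is fixed.
Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Rec\",\"fields\":[{\"name\":\"f\",\"type\":\"string\"}]}");

StreamingFileSink<GenericRecord> sink = StreamingFileSink
        .forBulkFormat(
                new Path("hdfs://namenode:8020/flink/test"),
                ParquetAvroWriters.forGenericRecord(schema))
        .build();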
If your HDFS cluster is HA configured, I strongly suggest you set "HADOOP_CONF_DIR" for your Flink application. It needs to be set on both the client side and the cluster (JM/TM) side. Then your HDFS path could be specified like "hdfs://myhdfs/flink/test", given that "myhdfs" is the name service configured in hdfs-site.xml (see the sketch after the quoted thread below).

Best,
Yang

[1]. http://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

Nick Bendtner <[email protected]> wrote on Sat, Feb 29, 2020 at 6:00 AM:

> To add to this question, do I need to set up env.hadoop.conf.dir to point
> to the Hadoop config, for instance env.hadoop.conf.dir=/etc/hadoop/, for
> the JVM? Or is it possible to write to HDFS without any external Hadoop
> config like core-site.xml, hdfs-site.xml?
>
> Best,
> Nick.
>
> On Fri, Feb 28, 2020 at 12:56 PM Nick Bendtner <[email protected]> wrote:
>
>> Hi guys,
>> I am trying to write to HDFS from the streaming file sink. Where should I
>> provide the IP address of the name node? Can I provide it as part of the
>> flink-config.yaml file, or should I provide it like this:
>>
>> final StreamingFileSink<GenericRecord> sink = StreamingFileSink
>>     .forBulkFormat(new Path("hdfs://namenode:8020/flink/test"),
>>         ParquetAvroWriters.forGenericRecord(schema))
>>     .build();
>>
>> Best,
>> Nick
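A minimal sketch of the HA-style setup described above, assuming HADOOP_CONF_DIR (or env.hadoop.conf.dir in flink-conf.yaml) points at the directory containing core-site.xml and hdfs-site.xml on both the client and the JM/TM, and that "myhdfs" is the name service defined in hdfs-site.xml; the schema and checkpoint interval are placeholders:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// StreamingFileSink only finalizes (commits) part files on checkpoints.
env.enableCheckpointing(60_000);

// Placeholder Avro schema for illustration.
Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Rec\",\"fields\":[{\"name\":\"f\",\"type\":\"string\"}]}");

// The path names only the HDFS name service ("myhdfs"), not a host:port,
// so the HDFS client resolves the active name node from hdfs-site.xml and
// follows failover. This requires the Hadoop config directory to be visible
// to the Flink client and to the JobManager/TaskManagers.
StreamingFileSink<GenericRecord> sink = StreamingFileSink
        .forBulkFormat(
                new Path("hdfs://myhdfs/flink/test"),
                ParquetAvroWriters.forGenericRecord(schema))
        .build();

// Attach with yourStream.addSink(sink) and run env.execute().

The only difference from the hard-coded form shown earlier is the sink path; the rest of the sink configuration is unchanged.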
