Hi Nick,

Certainly you could use "namenode:port" directly in your HDFS path. Then the Hadoop configs (e.g. core-site.xml, hdfs-site.xml) will not be necessary. However, that also means you could not benefit from HDFS high-availability[1].
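For illustration, a minimal sketch of that direct form with the StreamingFileSink; the name node host, port, path, and Avro schema below are placeholders, not values from this thread:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

// Hard-coding the name node address in the sink path: no core-site.xml /
// hdfs-site.xml is needed to resolve it, but a name node failover will not
// be followed because the address is fixed.
Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Rec\",\"fields\":[{\"name\":\"f\",\"type\":\"string\"}]}");

StreamingFileSink<GenericRecord> sink = StreamingFileSink
        .forBulkFormat(
                new Path("hdfs://namenode:8020/flink/test"),
                ParquetAvroWriters.forGenericRecord(schema))
        .build();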
If your HDFS cluster is HA configured, I strongly suggest you set "HADOOP_CONF_DIR" for your Flink application. It needs to be set on both the client side and the cluster (JM/TM) side. Then your HDFS path could be specified like "hdfs://myhdfs/flink/test", given that "myhdfs" is the name service configured in hdfs-site.xml (see the sketch after the quoted thread below).

Best,
Yang

[1]. http://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

Nick Bendtner <[email protected]> wrote on Sat, Feb 29, 2020 at 6:00 AM:

> To add to this question, do I need to set up env.hadoop.conf.dir to point
> to the Hadoop config, for instance env.hadoop.conf.dir=/etc/hadoop/, for
> the JVM? Or is it possible to write to HDFS without any external Hadoop
> config like core-site.xml, hdfs-site.xml?
>
> Best,
> Nick.
>
> On Fri, Feb 28, 2020 at 12:56 PM Nick Bendtner <[email protected]> wrote:
>
>> Hi guys,
>> I am trying to write to HDFS from the streaming file sink. Where should I
>> provide the IP address of the name node? Can I provide it as part of the
>> flink-config.yaml file, or should I provide it like this:
>>
>> final StreamingFileSink<GenericRecord> sink = StreamingFileSink
>>     .forBulkFormat(new Path("hdfs://namenode:8020/flink/test"),
>>         ParquetAvroWriters.forGenericRecord(schema))
>>     .build();
>>
>> Best,
>> Nick
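A minimal sketch of the HA-style setup described above, assuming HADOOP_CONF_DIR (or env.hadoop.conf.dir in flink-conf.yaml) points at the directory containing core-site.xml and hdfs-site.xml on both the client and the JM/TM, and that "myhdfs" is the name service defined in hdfs-site.xml; the schema and checkpoint interval are placeholders:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// StreamingFileSink only finalizes (commits) part files on checkpoints.
env.enableCheckpointing(60_000);

// Placeholder Avro schema for illustration.
Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Rec\",\"fields\":[{\"name\":\"f\",\"type\":\"string\"}]}");

// The path names only the HDFS name service ("myhdfs"), not a host:port,
// so the HDFS client resolves the active name node from hdfs-site.xml and
// follows failover. This requires the Hadoop config directory to be visible
// to the Flink client and to the JobManager/TaskManagers.
StreamingFileSink<GenericRecord> sink = StreamingFileSink
        .forBulkFormat(
                new Path("hdfs://myhdfs/flink/test"),
                ParquetAvroWriters.forGenericRecord(schema))
        .build();

// Attach with yourStream.addSink(sink) and run env.execute().

The only difference from the hard-coded form shown earlier is the sink path; the rest of the sink configuration is unchanged.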
