It may work. However, you would need to implement your own retry policy
(similar to `ConfiguredFailoverProxyProvider` in Hadoop).
Also, if you use the namenode address directly and do not load the HDFS
configuration, some HDFS client settings (e.g. dfs.client.*) will not take
effect.
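
For illustration only, a rough sketch of that kind of hand-rolled failover.
The namenode addresses and the probe below are placeholders, not anything
Flink or Hadoop ships:

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class NamenodeFailover {
    // Placeholder addresses for the active and standby namenodes.
    private static final String[] NAMENODES = {
        "hdfs://namenode1:8020", "hdfs://namenode2:8020"};

    public static FileSystem connect() throws IOException {
        IOException last = null;
        for (String nn : NAMENODES) {
            try {
                FileSystem fs = FileSystem.get(URI.create(nn), new Configuration());
                // Probe read: fails if this namenode is down or in standby.
                fs.getFileStatus(new Path("/"));
                return fs;
            } catch (IOException e) {
                last = e; // fall through and try the next namenode
            }
        }
        throw last;
    }
}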


Best,
Yang

Nick Bendtner <buggi...@gmail.com> wrote on Mon, Mar 2, 2020 at 11:58 PM:

> Thanks a lot, Yang. What are your thoughts on catching the exception when
> a namenode is down and retrying with the secondary namenode?
>
> Best,
> Nick.
>
> On Sun, Mar 1, 2020 at 9:05 PM Yang Wang <danrtsey...@gmail.com> wrote:
>
>> Hi Nick,
>>
>> Certainly you could use "namenode:port" directly in your HDFS path.
>> Then the Hadoop configs (e.g. core-site.xml, hdfs-site.xml) will not be
>> necessary.
>> However, that also means you could not benefit from HDFS
>> high availability[1].
>>
>> If your HDFS cluster is configured for HA, I strongly suggest setting
>> "HADOOP_CONF_DIR" for your Flink application. It needs to be set on both
>> the client and the cluster (JM/TM) side. Then your HDFS path can be
>> specified like "hdfs://myhdfs/flink/test", where "myhdfs" is the
>> nameservice configured in hdfs-site.xml.
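>>
>> For reference, a minimal sketch of the client-side settings such a
>> nameservice needs. Normally hdfs-site.xml under HADOOP_CONF_DIR supplies
>> these; the class name, nameservice, and hosts below are placeholders:
>>
>> import org.apache.hadoop.conf.Configuration;
>>
>> public final class HaClientConf {
>>     // "myhdfs" and the namenode hosts mirror the hdfs-site.xml entries.
>>     public static Configuration create() {
>>         Configuration conf = new Configuration();
>>         conf.set("fs.defaultFS", "hdfs://myhdfs");
>>         conf.set("dfs.nameservices", "myhdfs");
>>         conf.set("dfs.ha.namenodes.myhdfs", "nn1,nn2");
>>         conf.set("dfs.namenode.rpc-address.myhdfs.nn1", "namenode1:8020");
>>         conf.set("dfs.namenode.rpc-address.myhdfs.nn2", "namenode2:8020");
>>         conf.set("dfs.client.failover.proxy.provider.myhdfs",
>>             "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
>>         return conf;
>>     }
>> }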
>>
>>
>> Best,
>> Yang
>>
>>
>>
>> [1].
>> http://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
>>
>> Nick Bendtner <buggi...@gmail.com> wrote on Sat, Feb 29, 2020 at 6:00 AM:
>>
>>> To add to this question, do I need to set env.hadoop.conf.dir to point
>>> to the Hadoop config for the JVM, for instance
>>> env.hadoop.conf.dir=/etc/hadoop/? Or is it possible to write to HDFS
>>> without any external Hadoop config like core-site.xml and hdfs-site.xml?
>>>
>>> Best,
>>> Nick.
>>>
>>>
>>>
>>> On Fri, Feb 28, 2020 at 12:56 PM Nick Bendtner <buggi...@gmail.com>
>>> wrote:
>>>
>>>> Hi guys,
>>>> I am trying to write to HDFS from the StreamingFileSink. Where should I
>>>> provide the IP address of the namenode? Can I provide it as part of the
>>>> flink-conf.yaml file, or should I provide it like this:
>>>>
>>>> final StreamingFileSink<GenericRecord> sink = StreamingFileSink
>>>>     .forBulkFormat(new Path("hdfs://namenode:8020/flink/test"),
>>>>         ParquetAvroWriters.forGenericRecord(schema))
>>>>     .build();
>>>>
>>>>
>>>> Best,
>>>> Nick
>>>>
>>>>
>>>>
