Thanks Yang. I'm going with setting HADOOP_CONF_DIR for the Flink application; it integrates neatly with Flink.
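
For the archives, this is roughly what the job ends up looking like. It's a sketch, not the exact code: the HADOOP_CONF_DIR path, the "myhdfs" nameservice, the output path, and the toy Avro schema are all placeholders.

// Set on the client that runs `flink run` and on the JM/TM hosts, pointing at the
// directory holding core-site.xml and hdfs-site.xml (placeholder path):
//   export HADOOP_CONF_DIR=/etc/hadoop/conf

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class HdfsSinkSketch {

    // "myhdfs" is whatever dfs.nameservices is set to in hdfs-site.xml; with the
    // Hadoop config loaded, the HDFS client handles namenode failover itself, so
    // no namenode host:port appears in the path.
    static StreamingFileSink<GenericRecord> buildSink(Schema schema) {
        return StreamingFileSink
                .forBulkFormat(
                        new Path("hdfs://myhdfs/flink/test"),
                        ParquetAvroWriters.forGenericRecord(schema))
                .build();
    }
}

The sink is then attached to the GenericRecord stream exactly as in the snippet at the bottom of this thread, e.g. records.addSink(buildSink(schema)).
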
Best,
Nick.

On Mon, Mar 2, 2020 at 7:42 PM Yang Wang <danrtsey...@gmail.com> wrote:

> It may work. However, you would need to set up your own retry policy
> (similar to `ConfiguredFailoverProxyProvider` in Hadoop).
> Also, if you directly use the namenode address and do not load the HDFS
> configuration, some HDFS client configuration (e.g. dfs.client.*) will not
> take effect.
>
>
> Best,
> Yang
>
> Nick Bendtner <buggi...@gmail.com> wrote on Mon, Mar 2, 2020 at 11:58 PM:
>
>> Thanks a lot Yang. What are your thoughts on catching the exception when
>> a namenode is down and retrying with the secondary namenode?
>>
>> Best,
>> Nick.
>>
>> On Sun, Mar 1, 2020 at 9:05 PM Yang Wang <danrtsey...@gmail.com> wrote:
>>
>>> Hi Nick,
>>>
>>> Certainly you can use "namenode:port" directly in your HDFS path. Then
>>> the Hadoop configs (e.g. core-site.xml, hdfs-site.xml) are not
>>> necessary. However, that also means you cannot benefit from HDFS
>>> high availability [1].
>>>
>>> If your HDFS cluster is HA configured, I strongly suggest you set
>>> "HADOOP_CONF_DIR" for your Flink application. It needs to be set on
>>> both the client and the cluster (JM/TM) side. Your HDFS path can then
>>> be specified as "hdfs://myhdfs/flink/test", given that "myhdfs" is the
>>> name service configured in hdfs-site.xml.
>>>
>>>
>>> Best,
>>> Yang
>>>
>>>
>>>
>>> [1].
>>> http://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
>>>
>>> Nick Bendtner <buggi...@gmail.com> wrote on Sat, Feb 29, 2020 at 6:00 AM:
>>>
>>>> To add to this question, do I need to set env.hadoop.conf.dir to point
>>>> to the Hadoop config, for instance env.hadoop.conf.dir=/etc/hadoop/,
>>>> for the JVM? Or is it possible to write to HDFS without any external
>>>> Hadoop config like core-site.xml or hdfs-site.xml?
>>>>
>>>> Best,
>>>> Nick.
>>>>
>>>>
>>>>
>>>> On Fri, Feb 28, 2020 at 12:56 PM Nick Bendtner <buggi...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi guys,
>>>>> I am trying to write to HDFS from a StreamingFileSink. Where should I
>>>>> provide the IP address of the namenode? Can I provide it as part of
>>>>> the flink-conf.yaml file, or should I provide it like this:
>>>>>
>>>>> final StreamingFileSink<GenericRecord> sink = StreamingFileSink
>>>>>         .forBulkFormat(
>>>>>                 new Path("hdfs://namenode:8020/flink/test"),
>>>>>                 ParquetAvroWriters.forGenericRecord(schema))
>>>>>         .build();
>>>>>
>>>>>
>>>>> Best,
>>>>> Nick
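
Footnote for anyone who lands on this thread later: the failover behaviour Yang describes comes from the HA client settings in hdfs-site.xml, per the doc he links in [1]. A sketch with placeholder hostnames and ports, assuming the nameservice is called "myhdfs":

<property>
  <name>dfs.nameservices</name>
  <value>myhdfs</value>
</property>
<property>
  <name>dfs.ha.namenodes.myhdfs</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.myhdfs.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.myhdfs.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.myhdfs</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

With this loaded via HADOOP_CONF_DIR, the HDFS client retries against whichever namenode is active, so no hand-rolled retry against the secondary namenode is needed in the Flink job.
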