Dear Hawin,

We do not have out-of-the-box support for that; it is something you would
need to implement yourself in a custom SinkFunction.
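For the Flume-style hourly layout you described, such a custom SinkFunction
would first need to derive the partition path from the event time. Below is a
minimal, self-contained sketch of just that path-derivation step; the helper
name partitionPath and the base path are hypothetical, and the actual HDFS
write (e.g. via Hadoop's FileSystem API, which creates missing parent
directories on write) is omitted:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class PartitionPath {

    // Builds a Flume-style hourly partition path such as
    // /data/flink/year=2015/month=06/day=22/hour=21
    // from an event timestamp in epoch milliseconds (UTC).
    static String partitionPath(String basePath, long epochMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
        return String.format("%s/year=%04d/month=%02d/day=%02d/hour=%02d",
                basePath, t.getYear(), t.getMonthValue(), t.getDayOfMonth(), t.getHour());
    }

    public static void main(String[] args) {
        // Example: 2015-06-22 21:51 UTC falls into the hour=21 partition.
        long ts = ZonedDateTime.of(2015, 6, 22, 21, 51, 0, 0, ZoneOffset.UTC)
                .toInstant().toEpochMilli();
        System.out.println(partitionPath("/data/flink", ts));
    }
}
```

A SinkFunction's invoke() could call such a helper per record (or per buffer
flush) to decide which directory the current batch belongs in.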

Best,

Marton

On Mon, Jun 22, 2015 at 11:51 PM, Hawin Jiang <hawin.ji...@gmail.com> wrote:

> Hi  Marton
>
> If we receive a large amount of data from Kafka and write it to HDFS
> immediately, we should use the buffer timeout described in your URL.
> I am not sure whether you have Flume experience. Flume can be configured
> with a buffer size and partitions as well.
>
> What is the partition? For example:
> I want to write a 1-minute buffer file to HDFS under
> /data/flink/year=2015/month=06/day=22/hour=21.
> If the partition (/data/flink/year=2015/month=06/day=22/hour=21) already
> exists, there is no need to create it; otherwise, Flume will create it
> automatically. Flume ensures the incoming data goes to the right partition.
>
> I am not sure whether Flink also provides a similar partition API or
> configuration for this.
> Thanks.
>
>
>
> Best regards
> Hawin
>
> On Wed, Jun 10, 2015 at 10:31 AM, Hawin Jiang <hawin.ji...@gmail.com>
> wrote:
>
>> Thanks Marton
>> I will use this code to implement my testing.
>>
>>
>>
>> Best regards
>> Hawin
>>
>> On Wed, Jun 10, 2015 at 1:30 AM, Márton Balassi <balassi.mar...@gmail.com
>> > wrote:
>>
>>> Dear Hawin,
>>>
>>> You can pass a hdfs path to DataStream's and DataSet's writeAsText and
>>> writeAsCsv methods.
>>> I assume that you are running a Streaming topology, because your source
>>> is Kafka, so it would look like the following:
>>>
>>> StreamExecutionEnvironment env =
>>> StreamExecutionEnvironment.getExecutionEnvironment();
>>>
>>> env.addSource(new PersistentKafkaSource(..))
>>>       .map(/* do your operations */)
>>>
>>> .writeAsText("hdfs://<namenode_name>:<namenode_port>/path/to/your/file");
>>>
>>> Check out the relevant section of the streaming docs for more info. [1]
>>>
>>> [1]
>>> http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#connecting-to-the-outside-world
>>>
>>> Best,
>>>
>>> Marton
>>>
>>> On Wed, Jun 10, 2015 at 10:22 AM, Hawin Jiang <hawin.ji...@gmail.com>
>>> wrote:
>>>
>>>> Hi All
>>>>
>>>> Can someone tell me what is the best way to write data to HDFS when
>>>> Flink received data from Kafka?
>>>>
>>>> Big thanks for your example.
>>>>
>>>>
>>>> Best regards
>>>>
>>>> Hawin
>>>>
>>>>
>>>>
>>>
>>>
>>
>
