Dear Hawin,

As for your issues with running the Flink Kafka examples: are those resolved with Aljoscha's comment in the other thread? :)
Best,

Marton

On Fri, Jun 26, 2015 at 8:40 AM, Hawin Jiang <hawin.ji...@gmail.com> wrote:

> Hi Stephan
>
> Yes, that is a great idea. If it is possible, I will try my best to
> contribute some code to Flink.
> But I have to run some Flink examples first to understand Apache Flink.
> I just ran some Kafka-with-Flink examples, and none of them worked for me.
> I am so sad right now.
> I have not had any trouble running the Kafka examples from kafka.apache.org
> so far.
> Please advise.
> Thanks.
>
> Best regards
> Hawin
>
> On Wed, Jun 24, 2015 at 1:02 AM, Stephan Ewen <se...@apache.org> wrote:
>
>> Hi Hawin!
>>
>> If you are creating code for such an output into different
>> files/partitions, it would be amazing if you could contribute this code
>> to Flink.
>>
>> It seems like a very common use case, so this functionality will be
>> useful to other users as well!
>>
>> Greetings,
>> Stephan
>>
>> On Tue, Jun 23, 2015 at 3:36 PM, Márton Balassi <balassi.mar...@gmail.com> wrote:
>>
>>> Dear Hawin,
>>>
>>> We do not have out-of-the-box support for that; it is something you
>>> would need to implement yourself in a custom SinkFunction.
>>>
>>> Best,
>>>
>>> Marton
>>>
>>> On Mon, Jun 22, 2015 at 11:51 PM, Hawin Jiang <hawin.ji...@gmail.com> wrote:
>>>
>>>> Hi Marton
>>>>
>>>> If we receive a huge amount of data from Kafka and write it to HDFS
>>>> immediately, we should use a buffer timeout, based on the page you linked.
>>>> I am not sure whether you have Flume experience. Flume can be configured
>>>> with a buffer size and partitions as well.
>>>>
>>>> What is a partition?
>>>> For example:
>>>> I want to write a 1-minute buffer file to HDFS under
>>>> /data/flink/year=2015/month=06/day=22/hour=21.
>>>> If the partition (/data/flink/year=2015/month=06/day=22/hour=21) is
>>>> already there, there is no need to create it. Otherwise, Flume will
>>>> create it automatically, so the incoming data always lands in the
>>>> right partition.
>>>>
>>>> I am not sure whether Flink also provides a similar partitioning API or
>>>> configuration for this.
>>>> Thanks.
>>>>
>>>> Best regards
>>>> Hawin
>>>>
>>>> On Wed, Jun 10, 2015 at 10:31 AM, Hawin Jiang <hawin.ji...@gmail.com> wrote:
>>>>
>>>>> Thanks Marton
>>>>> I will use this code to implement my testing.
>>>>>
>>>>> Best regards
>>>>> Hawin
>>>>>
>>>>> On Wed, Jun 10, 2015 at 1:30 AM, Márton Balassi <balassi.mar...@gmail.com> wrote:
>>>>>
>>>>>> Dear Hawin,
>>>>>>
>>>>>> You can pass an HDFS path to DataStream's and DataSet's writeAsText
>>>>>> and writeAsCsv methods.
>>>>>> I assume that you are running a streaming topology, because your
>>>>>> source is Kafka, so it would look like the following:
>>>>>>
>>>>>> StreamExecutionEnvironment env =
>>>>>>     StreamExecutionEnvironment.getExecutionEnvironment();
>>>>>>
>>>>>> env.addSource(new PersistentKafkaSource(..))
>>>>>>    .map(/* do your operations */)
>>>>>>    .writeAsText("hdfs://<namenode_name>:<namenode_port>/path/to/your/file");
>>>>>>
>>>>>> Check out the relevant section of the streaming docs for more info. [1]
>>>>>>
>>>>>> [1] http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#connecting-to-the-outside-world
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Marton
>>>>>>
>>>>>> On Wed, Jun 10, 2015 at 10:22 AM, Hawin Jiang <hawin.ji...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi All
>>>>>>>
>>>>>>> Can someone tell me the best way to write data to HDFS when
>>>>>>> Flink receives data from Kafka?
>>>>>>>
>>>>>>> Big thanks for your example.
>>>>>>>
>>>>>>> Best regards
>>>>>>>
>>>>>>> Hawin
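[Editor's note] The custom-SinkFunction approach suggested in the thread ultimately needs to map each record's timestamp to a Flume-style partition directory such as /data/flink/year=2015/month=06/day=22/hour=21. A minimal, Flink-independent sketch of just that path-building step is below; the class name `PartitionPath`, the helper `partitionFor`, and the `/data/flink` base path follow the example in the thread and are illustrative, not part of any Flink API.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class PartitionPath {

    // Builds a Flume-style time-bucketed partition path, e.g.
    // /data/flink/year=2015/month=06/day=22/hour=21, from an event timestamp.
    // A real SinkFunction would call something like this per record (or per
    // buffer flush) and create the HDFS directory only if it does not exist.
    static String partitionFor(String basePath, long epochMillis) {
        SimpleDateFormat fmt =
                new SimpleDateFormat("'year='yyyy/'month='MM/'day='dd/'hour='HH");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return basePath + "/" + fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long ts = 1435008600000L; // 2015-06-22T21:30:00Z
        System.out.println(partitionFor("/data/flink", ts));
        // prints /data/flink/year=2015/month=06/day=22/hour=21
    }
}
```

Inside a sink, the returned path would be combined with a filename and handed to the HDFS client; because directory creation in HDFS is idempotent via `FileSystem.mkdirs`, the "create only if missing" behavior Hawin describes for Flume comes for free.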