Hi Stephan

Yes, that is a great idea. If it is possible, I will try my best to contribute some code to Flink. But I have to run some Flink examples first to understand Apache Flink. I just ran some Kafka-with-Flink examples, and none of them worked for me, which is very frustrating. So far I have not had any trouble running the Kafka examples from kafka.apache.org. Please advise.
Thanks.
Best regards
Hawin

On Wed, Jun 24, 2015 at 1:02 AM, Stephan Ewen <se...@apache.org> wrote:

> Hi Hawin!
>
> If you are creating code for such an output into different
> files/partitions, it would be amazing if you could contribute this code to
> Flink.
>
> It seems like a very common use case, so this functionality will be useful
> to other users as well!
>
> Greetings,
> Stephan
>
> On Tue, Jun 23, 2015 at 3:36 PM, Márton Balassi <balassi.mar...@gmail.com>
> wrote:
>
>> Dear Hawin,
>>
>> We do not have out-of-the-box support for that; it is something you would
>> need to implement yourself in a custom SinkFunction.
>>
>> Best,
>>
>> Marton
>>
>> On Mon, Jun 22, 2015 at 11:51 PM, Hawin Jiang <hawin.ji...@gmail.com>
>> wrote:
>>
>>> Hi Marton
>>>
>>> If we receive a huge amount of data from Kafka and write it to HDFS
>>> immediately, we should use the buffer timeout described at your URL.
>>> I am not sure whether you have Flume experience. Flume can be configured
>>> with a buffer size and with partitions as well.
>>>
>>> What is a partition? For example:
>>> I want to write a one-minute buffer file to HDFS under
>>> /data/flink/year=2015/month=06/day=22/hour=21.
>>> If the partition (/data/flink/year=2015/month=06/day=22/hour=21) is
>>> already there, there is no need to create it. Otherwise, Flume creates it
>>> automatically, and Flume makes sure the incoming data goes to the right
>>> partition.
>>>
>>> I am not sure whether Flink also provides a similar partitioning API or
>>> configuration for this.
>>> Thanks.
>>>
>>> Best regards
>>> Hawin
>>>
>>> On Wed, Jun 10, 2015 at 10:31 AM, Hawin Jiang <hawin.ji...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Marton
>>>> I will use this code to implement my test.
>>>>
>>>> Best regards
>>>> Hawin
>>>>
>>>> On Wed, Jun 10, 2015 at 1:30 AM, Márton Balassi <
>>>> balassi.mar...@gmail.com> wrote:
>>>>
>>>>> Dear Hawin,
>>>>>
>>>>> You can pass an HDFS path to DataStream's and DataSet's writeAsText
>>>>> and writeAsCsv methods.
>>>>> I assume that you are running a streaming topology, because your
>>>>> source is Kafka, so it would look like the following:
>>>>>
>>>>> StreamExecutionEnvironment env =
>>>>> StreamExecutionEnvironment.getExecutionEnvironment();
>>>>>
>>>>> env.addSource(new PersistentKafkaSource(..))
>>>>> .map(/* do your operations */)
>>>>> .writeAsText("hdfs://<namenode_name>:<namenode_port>/path/to/your/file");
>>>>>
>>>>> Check out the relevant section of the streaming docs for more info. [1]
>>>>>
>>>>> [1]
>>>>> http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#connecting-to-the-outside-world
>>>>>
>>>>> Best,
>>>>>
>>>>> Marton
>>>>>
>>>>> On Wed, Jun 10, 2015 at 10:22 AM, Hawin Jiang <hawin.ji...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All
>>>>>>
>>>>>> Can someone tell me the best way to write data to HDFS when Flink
>>>>>> receives data from Kafka?
>>>>>>
>>>>>> Big thanks for your example.
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>> Hawin
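
For reference, a minimal sketch of the kind of custom SinkFunction Marton describes, writing into Flume-style year=/month=/day=/hour= partition directories as in Hawin's example. This is only an illustration under assumptions: the class name, path layout, and roll-over-on-hour-change policy are made up for this sketch, the Hadoop client libraries are assumed to be on the classpath, and a socket source stands in for the Kafka source so the example stays self-contained.

import java.net.URI;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PartitionedHdfsSink extends RichSinkFunction<String> {

    private final String basePath;            // e.g. "hdfs://namenode:8020/data/flink"
    private transient FileSystem fs;
    private transient FSDataOutputStream out;
    private transient String currentPartition;

    public PartitionedHdfsSink(String basePath) {
        this.basePath = basePath;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        fs = FileSystem.get(new URI(basePath), new org.apache.hadoop.conf.Configuration());
    }

    @Override
    public void invoke(String value) throws Exception {
        // Derive the partition from wall-clock time, mirroring Flume's
        // year=/month=/day=/hour= directory layout.
        String partition = new SimpleDateFormat(
                "'year='yyyy/'month='MM/'day='dd/'hour='HH").format(new Date());

        if (!partition.equals(currentPartition)) {
            // Hour changed: close the old file, create the partition
            // directory if it is missing, and open a new file inside it.
            if (out != null) {
                out.close();
            }
            Path dir = new Path(basePath, partition);
            if (!fs.exists(dir)) {
                fs.mkdirs(dir);
            }
            out = fs.create(new Path(dir, "part-" + System.currentTimeMillis()), true);
            currentPartition = partition;
        }
        out.write((value + "\n").getBytes("UTF-8"));
    }

    @Override
    public void close() throws Exception {
        if (out != null) {
            out.close();
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Flush network buffers at least once a second (the buffer timeout
        // discussed above); tune this to your latency needs.
        env.setBufferTimeout(1000);
        // A socket source stands in for the Kafka source here; in the real
        // job you would use env.addSource(new PersistentKafkaSource(..)).
        env.socketTextStream("localhost", 9999)
           .addSink(new PartitionedHdfsSink("hdfs://namenode:8020/data/flink"));
        env.execute("kafka-to-partitioned-hdfs");
    }
}

A production version would additionally need periodic flushing, error handling, and subtask-indexed file names so parallel sink instances do not collide on the same file.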