Dear Hawin,

As for your issues with running the Flink Kafka examples: are those resolved with Aljoscha's comment in the other thread? :)
Best,

Marton

On Fri, Jun 26, 2015 at 8:40 AM, Hawin Jiang <hawin.ji...@gmail.com> wrote:

> Hi Stephan
>
> Yes, that is a great idea. If it is possible, I will try my best to
> contribute some code to Flink.
> But I have to run some Flink examples first to understand Apache Flink.
> I just ran some Kafka-with-Flink examples, and none of them worked for me.
> I am so sad right now.
> I have not had any trouble running the Kafka examples from kafka.apache.org
> so far.
> Please advise.
> Thanks.
>
> Best regards
> Hawin
>
> On Wed, Jun 24, 2015 at 1:02 AM, Stephan Ewen <se...@apache.org> wrote:
>
>> Hi Hawin!
>>
>> If you are creating code for such an output into different
>> files/partitions, it would be amazing if you could contribute this code
>> to Flink.
>>
>> It seems like a very common use case, so this functionality will be
>> useful to other users as well!
>>
>> Greetings,
>> Stephan
>>
>> On Tue, Jun 23, 2015 at 3:36 PM, Márton Balassi <balassi.mar...@gmail.com> wrote:
>>
>>> Dear Hawin,
>>>
>>> We do not have out-of-the-box support for that; it is something you
>>> would need to implement yourself in a custom SinkFunction.
>>>
>>> Best,
>>>
>>> Marton
>>>
>>> On Mon, Jun 22, 2015 at 11:51 PM, Hawin Jiang <hawin.ji...@gmail.com> wrote:
>>>
>>>> Hi Marton
>>>>
>>>> If we receive a huge amount of data from Kafka and write it to HDFS
>>>> immediately, we should use a buffer timeout, based on the page you linked.
>>>> I am not sure whether you have Flume experience. Flume can be configured
>>>> with a buffer size and partitions as well.
>>>>
>>>> What is a partition?
>>>> For example:
>>>> I want to write a 1-minute buffer file to HDFS under
>>>> /data/flink/year=2015/month=06/day=22/hour=21.
>>>> If the partition (/data/flink/year=2015/month=06/day=22/hour=21) is
>>>> already there, there is no need to create it. Otherwise, Flume will
>>>> create it automatically, so the incoming data always lands in the
>>>> right partition.
>>>>
>>>> I am not sure whether Flink also provides a similar partitioning API or
>>>> configuration for this.
>>>> Thanks.
>>>>
>>>> Best regards
>>>> Hawin
>>>>
>>>> On Wed, Jun 10, 2015 at 10:31 AM, Hawin Jiang <hawin.ji...@gmail.com> wrote:
>>>>
>>>>> Thanks Marton
>>>>> I will use this code to implement my testing.
>>>>>
>>>>> Best regards
>>>>> Hawin
>>>>>
>>>>> On Wed, Jun 10, 2015 at 1:30 AM, Márton Balassi <balassi.mar...@gmail.com> wrote:
>>>>>
>>>>>> Dear Hawin,
>>>>>>
>>>>>> You can pass an HDFS path to DataStream's and DataSet's writeAsText
>>>>>> and writeAsCsv methods.
>>>>>> I assume that you are running a streaming topology, because your
>>>>>> source is Kafka, so it would look like the following:
>>>>>>
>>>>>> StreamExecutionEnvironment env =
>>>>>>     StreamExecutionEnvironment.getExecutionEnvironment();
>>>>>>
>>>>>> env.addSource(new PersistentKafkaSource(..))
>>>>>>    .map(/* do your operations */)
>>>>>>    .writeAsText("hdfs://<namenode_name>:<namenode_port>/path/to/your/file");
>>>>>>
>>>>>> Check out the relevant section of the streaming docs for more info. [1]
>>>>>>
>>>>>> [1] http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#connecting-to-the-outside-world
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Marton
>>>>>>
>>>>>> On Wed, Jun 10, 2015 at 10:22 AM, Hawin Jiang <hawin.ji...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi All
>>>>>>>
>>>>>>> Can someone tell me the best way to write data to HDFS when
>>>>>>> Flink receives data from Kafka?
>>>>>>>
>>>>>>> Big thanks for your example.
>>>>>>>
>>>>>>> Best regards
>>>>>>>
>>>>>>> Hawin
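[Editor's note] The custom-SinkFunction approach suggested in the thread ultimately needs to map each record's timestamp to a Flume-style partition directory such as /data/flink/year=2015/month=06/day=22/hour=21. A minimal, Flink-independent sketch of just that path-building step is below; the class name `PartitionPath`, the helper `partitionFor`, and the `/data/flink` base path follow the example in the thread and are illustrative, not part of any Flink API.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class PartitionPath {

    // Builds a Flume-style time-bucketed partition path, e.g.
    // /data/flink/year=2015/month=06/day=22/hour=21, from an event timestamp.
    // A real SinkFunction would call something like this per record (or per
    // buffer flush) and create the HDFS directory only if it does not exist.
    static String partitionFor(String basePath, long epochMillis) {
        SimpleDateFormat fmt =
                new SimpleDateFormat("'year='yyyy/'month='MM/'day='dd/'hour='HH");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return basePath + "/" + fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long ts = 1435008600000L; // 2015-06-22T21:30:00Z
        System.out.println(partitionFor("/data/flink", ts));
        // prints /data/flink/year=2015/month=06/day=22/hour=21
    }
}
```

Inside a sink, the returned path would be combined with a filename and handed to the HDFS client; because directory creation in HDFS is idempotent via `FileSystem.mkdirs`, the "create only if missing" behavior Hawin describes for Flume comes for free.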