Hey, can you measure how fast JMeter is able to push data into Kafka? Maybe that is already the bottleneck. Flink should be able to read from Kafka with 100k+ elements/second on a single node.
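One way to check the Kafka side in isolation (the /dev/null approach Stephan suggests below) is a consumer that does nothing but drain the topic and count records per second. Here is a rough, untested sketch using the plain Java Kafka consumer; the broker address, group id, and topic name are placeholders, and it assumes a newer Kafka client than the 0.8.x releases that were current when this thread was written.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DrainTopic {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "throughput-check");          // placeholder group id
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("jmeter-topic"));  // placeholder topic
            long count = 0;
            long windowStart = System.currentTimeMillis();
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                count += records.count();   // records are discarded, /dev/null style
                long now = System.currentTimeMillis();
                if (now - windowStart >= 1000) {
                    System.out.printf("%.0f records/s%n", count * 1000.0 / (now - windowStart));
                    count = 0;
                    windowStart = now;
                }
            }
        }
    }
}

Comparing that rate with what JMeter reports on the producer side should show whether the roughly 2,000 events per second ceiling is already on the ingestion path rather than in Flink.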
On Mon, Jun 29, 2015 at 11:10 AM, Stephan Ewen <se...@apache.org> wrote:

> Hi Hawin!
>
> The performance tuning of Kafka is much trickier than that of Flink.
> Your performance bottleneck may be Kafka at this point, not Flink.
> To make Kafka fast, make sure you have the right setup for the data
> directories, and that you set up ZooKeeper properly (for good throughput).
>
> To test the Kafka throughput in isolation, push data into Kafka and just
> consume it with a command line client that pipes to /dev/null.
>
> Greetings,
> Stephan
>
> On Mon, Jun 29, 2015 at 9:56 AM, Hawin Jiang <hawin.ji...@gmail.com> wrote:
>
>> Dear Marton
>>
>> Thanks for asking. Yes, it is working now.
>> But the TPS is not very good. I have met four issues, listed below:
>>
>> 1. My TPS is around 2,000 events per second, but I saw some companies
>> achieve 132K per second on a single node at the 2015 Los Angeles Big Data
>> Day yesterday. For two nodes, the TPS was 282K per second. They used
>> Kafka + Spark. As you know, I used Kafka + Flink. Maybe I have to do
>> more investigation on my side.
>>
>> 2. Regarding my performance testing, I used JMeter to produce data to
>> Kafka and Flink to write the data to HDFS. The total message count on
>> the JMeter side does not match the HDFS side.
>>
>> 3. I found that Flink randomly created folders 1, 2, 3 and 4. Only
>> folders 1 and 4 have files; folders 2 and 3 don't have any files.
>>
>> 4. I am going to develop some code to write data to a
>> /data/flink/year/month/day/hour folder. I think that folder structure
>> will be good for the Flink Table API in the future.
>>
>> Please let me know if you have any comments or suggestions for me.
>> Thanks.
>>
>> Best regards
>> Hawin
>>
>> From: Márton Balassi [mailto:balassi.mar...@gmail.com]
>> Sent: Sunday, June 28, 2015 9:09 PM
>> To: user@flink.apache.org
>> Subject: Re: Best way to write data to HDFS by Flink
>>
>> Dear Hawin,
>>
>> As for your issues with running the Flink Kafka examples: are those
>> resolved with Aljoscha's comment in the other thread? :)
>>
>> Best,
>>
>> Marton
>>
>> On Fri, Jun 26, 2015 at 8:40 AM, Hawin Jiang <hawin.ji...@gmail.com> wrote:
>>
>> Hi Stephan
>>
>> Yes, that is a great idea. If possible, I will try my best to contribute
>> some code to Flink.
>> But I have to run some Flink examples first to understand Apache Flink.
>> I just ran some Kafka-with-Flink examples, and none of them are working
>> for me. I am so sad right now.
>> I didn't have any trouble running the Kafka examples from
>> kafka.apache.org so far.
>> Please advise.
>> Thanks.
>>
>> Best regards
>> Hawin
>>
>> On Wed, Jun 24, 2015 at 1:02 AM, Stephan Ewen <se...@apache.org> wrote:
>>
>> Hi Hawin!
>>
>> If you are creating code for such an output into different
>> files/partitions, it would be amazing if you could contribute this code
>> to Flink.
>>
>> It seems like a very common use case, so this functionality will be
>> useful to other users as well!
>>
>> Greetings,
>> Stephan
>>
>> On Tue, Jun 23, 2015 at 3:36 PM, Márton Balassi
>> <balassi.mar...@gmail.com> wrote:
>>
>> Dear Hawin,
>>
>> We do not have out-of-the-box support for that; it is something you
>> would need to implement yourself in a custom SinkFunction.
>>
>> Best,
>>
>> Marton
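On the /data/flink/year/month/day/hour layout that comes up above and below: a bare-bones version of such a custom sink could look roughly like the sketch that follows. It is untested and only meant as a starting point; the class name, the hdfs:// base path, and the part-file naming are placeholders, it writes through the Hadoop FileSystem client directly, and it has no batching, rolling policy, or failure handling.

import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HourlyHdfsSink extends RichSinkFunction<String> {

    // Placeholder namenode address and base directory.
    private static final String BASE = "hdfs://namenode:8020/data/flink";

    private transient FileSystem fs;
    private transient SimpleDateFormat partitionFormat;
    private transient FSDataOutputStream out;
    private transient String currentPartition;

    @Override
    public void open(Configuration parameters) throws Exception {
        fs = FileSystem.get(new URI(BASE), new org.apache.hadoop.conf.Configuration());
        // Produces e.g. year=2015/month=06/day=22/hour=21 from the wall clock.
        partitionFormat = new SimpleDateFormat("'year='yyyy/'month='MM/'day='dd/'hour='HH");
    }

    @Override
    public void invoke(String value) throws Exception {
        String partition = partitionFormat.format(new Date());
        if (!partition.equals(currentPartition)) {
            // Hour changed (or first record): roll over to a new file.
            if (out != null) {
                out.close();
            }
            // One file per parallel sink instance and hour; create() also makes the directories.
            Path file = new Path(BASE + "/" + partition,
                    "part-" + getRuntimeContext().getIndexOfThisSubtask());
            out = fs.create(file, true);
            currentPartition = partition;
        }
        out.write((value + "\n").getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public void close() throws Exception {
        if (out != null) {
            out.close();
        }
    }
}

It would be attached with stream.addSink(new HourlyHdfsSink()) in place of the writeAsText call shown further down.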
>> On Mon, Jun 22, 2015 at 11:51 PM, Hawin Jiang <hawin.ji...@gmail.com> wrote:
>>
>> Hi Marton
>>
>> If we receive a huge amount of data from Kafka and write it to HDFS
>> immediately, we should use the buffer timeout from your URL.
>> I am not sure whether you have Flume experience. Flume can be configured
>> with a buffer size and partitions as well.
>>
>> What is the partition? For example:
>> I want to write a 1-minute buffered file to HDFS under
>> /data/flink/year=2015/month=06/day=22/hour=21.
>> If the partition (/data/flink/year=2015/month=06/day=22/hour=21) is
>> already there, there is no need to create it; otherwise Flume creates it
>> automatically. Flume knows that the incoming data will go to the right
>> partition.
>>
>> I am not sure whether Flink also provides a similar partition API or
>> configuration for this.
>> Thanks.
>>
>> Best regards
>> Hawin
>>
>> On Wed, Jun 10, 2015 at 10:31 AM, Hawin Jiang <hawin.ji...@gmail.com> wrote:
>>
>> Thanks Marton
>> I will use this code to implement my testing.
>>
>> Best regards
>> Hawin
>>
>> On Wed, Jun 10, 2015 at 1:30 AM, Márton Balassi
>> <balassi.mar...@gmail.com> wrote:
>>
>> Dear Hawin,
>>
>> You can pass an HDFS path to DataStream's and DataSet's writeAsText and
>> writeAsCsv methods.
>> I assume that you are running a streaming topology, because your source
>> is Kafka, so it would look like the following:
>>
>> StreamExecutionEnvironment env =
>>     StreamExecutionEnvironment.getExecutionEnvironment();
>>
>> env.addSource(new PersistentKafkaSource(..))
>>    .map(/* do your operations */)
>>    .writeAsText("hdfs://<namenode_name>:<namenode_port>/path/to/your/file");
>>
>> Check out the relevant section of the streaming docs for more info. [1]
>>
>> [1] http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#connecting-to-the-outside-world
>>
>> Best,
>>
>> Marton
>>
>> On Wed, Jun 10, 2015 at 10:22 AM, Hawin Jiang <hawin.ji...@gmail.com> wrote:
>>
>> Hi All
>>
>> Can someone tell me the best way to write data to HDFS when Flink
>> receives data from Kafka?
>> Big thanks for your example.
>>
>> Best regards
>> Hawin