Hey,
can you measure how fast JMeter is able to push data into Kafka? Maybe that
is already the bottleneck.

Flink should be able to read from Kafka with 100k+ elements/second on a
single node.
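
For a rough baseline outside of JMeter, a standalone producer loop along
these lines can measure raw Kafka ingest speed (a minimal sketch against the
Kafka Java producer API; broker address, topic name, message size, and count
are placeholders to adjust). Kafka also ships a kafka-producer-perf-test.sh
script for the same purpose.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerThroughputTest {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        int numMessages = 1000000;
        String payload = new String(new char[100]).replace('\0', 'x'); // ~100 bytes

        long start = System.currentTimeMillis();
        for (int i = 0; i < numMessages; i++) {
            producer.send(new ProducerRecord<String, String>("test-topic", payload));
        }
        producer.close(); // blocks until outstanding sends complete
        long elapsed = Math.max(1, System.currentTimeMillis() - start);
        System.out.println(numMessages * 1000L / elapsed + " msgs/sec");
    }
}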

On Mon, Jun 29, 2015 at 11:10 AM, Stephan Ewen <se...@apache.org> wrote:

> Hi Hawin!
>
> The performance tuning of Kafka is much trickier than that of Flink. Your
> performance bottleneck may be Kafka at this point, not Flink.
> To make Kafka fast, make sure you have the right setup for the data
> directories, and that ZooKeeper is set up properly (for good throughput).
>
> To test the Kafka throughput in isolation, push data into Kafka and just
> consume it with a command-line client that pipes to /dev/null.
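>
> If you would rather measure the consuming side in code, a drain loop like
> the following gives the same effect as piping to /dev/null (just a sketch
> against the Kafka 0.8 high-level consumer API; the ZooKeeper address,
> group id, and topic name are placeholders):
>
> import java.util.Collections;
> import java.util.List;
> import java.util.Map;
> import java.util.Properties;
> import kafka.consumer.Consumer;
> import kafka.consumer.ConsumerConfig;
> import kafka.consumer.ConsumerIterator;
> import kafka.consumer.KafkaStream;
> import kafka.javaapi.consumer.ConsumerConnector;
>
> public class ConsumerDrainTest {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put("zookeeper.connect", "localhost:2181"); // placeholder
>         props.put("group.id", "drain-test");
>         props.put("auto.offset.reset", "smallest"); // start from the beginning
>
>         ConsumerConnector consumer =
>                 Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
>         Map<String, List<KafkaStream<byte[], byte[]>>> streams =
>                 consumer.createMessageStreams(
>                         Collections.singletonMap("test-topic", 1));
>         ConsumerIterator<byte[], byte[]> it =
>                 streams.get("test-topic").get(0).iterator();
>
>         long count = 0;
>         long start = System.currentTimeMillis();
>         while (it.hasNext()) { // blocks waiting for more messages by default
>             it.next(); // discard the message, like piping to /dev/null
>             if (++count % 100000 == 0) {
>                 long elapsed = Math.max(1, System.currentTimeMillis() - start);
>                 System.out.println(count * 1000L / elapsed + " msgs/sec");
>             }
>         }
>     }
> }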
>
> Greetings,
> Stephan
>
>
>
>
> On Mon, Jun 29, 2015 at 9:56 AM, Hawin Jiang <hawin.ji...@gmail.com>
> wrote:
>
>> Dear Marton
>>
>>
>>
>> Thanks for asking. Yes, it is working now.
>>
>> But the TPS is not very good. I ran into four issues, listed below:
>>
>>
>>
>> 1.       My TPS is around 2,000 events per second, but at the 2015 Los
>> Angeles Big Data Day yesterday I saw a company report 132K events per
>> second on a single node, and 282K per second on two nodes. They used
>> Kafka+Spark.
>>
>> As you know, I am using Kafka+Flink, so maybe I need to do more
>> investigation on my side.
>>
>>
>>
>> 2.       For my performance testing, I used JMeter to produce data to
>> Kafka while Flink wrote the data to HDFS. The total message count on the
>> JMeter side does not match the count on the HDFS side.
>>
>>
>>
>> 3.       I found that Flink randomly created folders 1, 2, 3 and 4. Only
>> folders 1 and 4 contain files; folders 2 and 3 are empty.
>>
>>
>>
>> 4.       I am going to write some code that writes data to a
>> /data/flink/year/month/day/hour folder structure, as sketched below. I
>> think that layout will work well with the Flink Table API in the future.
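>>
>> Something like this is what I have in mind for building the bucket path
>> (just a rough sketch; the base directory is only an example):
>>
>> import java.text.SimpleDateFormat;
>> import java.util.Date;
>>
>> // e.g. /data/flink/2015/06/29/11 for June 29, 2015, 11:00
>> SimpleDateFormat fmt = new SimpleDateFormat("yyyy/MM/dd/HH");
>> String bucketPath = "/data/flink/" + fmt.format(new Date());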
>>
>>
>>
>> Please let me know if you have any comments or suggestions for me.
>>
>> Thanks.
>>
>>
>>
>>
>>
>>
>>
>> Best regards
>>
>> Hawin
>>
>>
>>
>> From: Márton Balassi [mailto:balassi.mar...@gmail.com]
>> Sent: Sunday, June 28, 2015 9:09 PM
>> To: user@flink.apache.org
>> Subject: Re: Best way to write data to HDFS by Flink
>>
>>
>>
>> Dear Hawin,
>>
>>
>>
>> As for your issues with running the Flink Kafka examples: are those
>> resolved with Aljoscha's comment in the other thread? :)
>>
>>
>>
>> Best,
>>
>>
>>
>> Marton
>>
>>
>>
>> On Fri, Jun 26, 2015 at 8:40 AM, Hawin Jiang <hawin.ji...@gmail.com>
>> wrote:
>>
>> Hi Stephan
>>
>>
>>
>> Yes, that is a great idea. If possible, I will try my best to contribute
>> some code to Flink.
>>
>> But I have to run some Flink examples first to understand Apache Flink.
>>
>> I just ran some Kafka-with-Flink examples, but none of them worked for
>> me. I am so sad right now.
>>
>> I haven't had any trouble running the Kafka examples from
>> kafka.apache.org so far.
>>
>> Please advise.
>>
>> Thanks.
>>
>>
>>
>>
>>
>>
>>
>> Best regards
>>
>> Hawin
>>
>>
>>
>>
>>
>> On Wed, Jun 24, 2015 at 1:02 AM, Stephan Ewen <se...@apache.org> wrote:
>>
>> Hi Hawin!
>>
>>
>>
>> If you are writing code for such an output into different
>> files/partitions, it would be amazing if you could contribute that code to
>> Flink.
>>
>>
>>
>> It seems like a very common use case, so this functionality would be
>> useful to other users as well!
>>
>>
>>
>> Greetings,
>> Stephan
>>
>>
>>
>>
>>
>> On Tue, Jun 23, 2015 at 3:36 PM, Márton Balassi <balassi.mar...@gmail.com>
>> wrote:
>>
>> Dear Hawin,
>>
>>
>>
>> We do not have out-of-the-box support for that; it is something you
>> would need to implement yourself in a custom SinkFunction, along the lines
>> of the sketch below.
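>>
>> A rough sketch of what such a sink could look like (one file per parallel
>> subtask under an hourly bucket; the class name, base path, and bucket
>> format are only illustrative, and flushing/error handling is omitted):
>>
>> import java.text.SimpleDateFormat;
>> import java.util.Date;
>> import org.apache.flink.configuration.Configuration;
>> import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
>> import org.apache.hadoop.fs.FSDataOutputStream;
>> import org.apache.hadoop.fs.FileSystem;
>> import org.apache.hadoop.fs.Path;
>>
>> public class BucketingHdfsSink extends RichSinkFunction<String> {
>>
>>     private transient FileSystem fs;
>>     private transient FSDataOutputStream out;
>>     private transient String currentBucket;
>>
>>     @Override
>>     public void open(Configuration parameters) throws Exception {
>>         fs = FileSystem.get(new org.apache.hadoop.conf.Configuration());
>>     }
>>
>>     @Override
>>     public void invoke(String value) throws Exception {
>>         String bucket = new SimpleDateFormat("yyyy/MM/dd/HH").format(new Date());
>>         if (!bucket.equals(currentBucket)) {
>>             if (out != null) {
>>                 out.close(); // roll over to the new hourly bucket
>>             }
>>             Path file = new Path("/data/flink/" + bucket + "/part-"
>>                     + getRuntimeContext().getIndexOfThisSubtask());
>>             fs.mkdirs(file.getParent()); // create the partition if missing
>>             out = fs.create(file, true);
>>             currentBucket = bucket;
>>         }
>>         out.write((value + "\n").getBytes());
>>     }
>>
>>     @Override
>>     public void close() throws Exception {
>>         if (out != null) {
>>             out.close();
>>         }
>>     }
>> }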
>>
>>
>>
>> Best,
>>
>>
>>
>> Marton
>>
>>
>>
>> On Mon, Jun 22, 2015 at 11:51 PM, Hawin Jiang <hawin.ji...@gmail.com>
>> wrote:
>>
>> Hi Marton
>>
>>
>>
>> If we receive a huge amount of data from Kafka and write it to HDFS
>> immediately, we should use the buffer timeout described at your URL.
>>
>> I am not sure whether you have Flume experience; Flume can be
>> configured with a buffer size and partitions as well.
>>
>>
>>
>> What do I mean by a partition?
>>
>> For example:
>>
>> I want to write a 1-minute buffer file to HDFS under a path such as
>> /data/flink/year=2015/month=06/day=22/hour=21.
>>
>> If the partition (/data/flink/year=2015/month=06/day=22/hour=21)
>> already exists, there is no need to create it; otherwise, Flume creates it
>> automatically.
>>
>> Flume then routes the incoming data to the right partition.
>>
>>
>>
>> I am not sure whether Flink provides a similar partition API or
>> configuration for this.
>>
>> Thanks.
>>
>>
>>
>>
>>
>>
>>
>> Best regards
>>
>> Hawin
>>
>>
>>
>> On Wed, Jun 10, 2015 at 10:31 AM, Hawin Jiang <hawin.ji...@gmail.com>
>> wrote:
>>
>> Thanks Marton
>>
>> I will use this code for my testing.
>>
>>
>>
>>
>>
>>
>>
>> Best regards
>>
>> Hawin
>>
>>
>>
>> On Wed, Jun 10, 2015 at 1:30 AM, Márton Balassi <balassi.mar...@gmail.com>
>> wrote:
>>
>> Dear Hawin,
>>
>>
>>
>> You can pass an HDFS path to DataStream's and DataSet's writeAsText and
>> writeAsCsv methods.
>>
>> I assume that you are running a Streaming topology, because your source
>> is Kafka, so it would look like the following:
>>
>>
>>
>> StreamExecutionEnvironment env =
>>     StreamExecutionEnvironment.getExecutionEnvironment();
>>
>> env.addSource(new PersistentKafkaSource<>(..))
>>     .map(/* do your operations */)
>>     .writeAsText("hdfs://<namenode_name>:<namenode_port>/path/to/your/file");
>>
>>
>> Check out the relevant section of the streaming docs for more info. [1]
>>
>>
>>
>> [1]
>> http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#connecting-to-the-outside-world
>>
>>
>>
>> Best,
>>
>>
>>
>> Marton
>>
>>
>>
>> On Wed, Jun 10, 2015 at 10:22 AM, Hawin Jiang <hawin.ji...@gmail.com>
>> wrote:
>>
>> Hi All
>>
>>
>>
>> Can someone tell me the best way to write data to HDFS when Flink
>> receives data from Kafka?
>>
>> Big thanks for your example.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Best regards
>>
>> Hawin
>>