Hi Caleb, I usually do './gradlew installAll' first and that places all the jars in my local maven repo in ~/.m2/repository.
Eno > On 17 Sep 2016, at 00:30, Caleb Welton <[email protected]> wrote: > > Is there a specific way that I need to build kafka for that to work? > > bash$ export INCLUDE_TEST_JARS=true; ./bin/kafka-run-class.sh > org.apache.kafka.streams.perf.SimpleBenchmark > Error: Could not find or load main class > org.apache.kafka.streams.perf.SimpleBenchmark > > bash$ find . -name SimpleBenchmark.java > ./streams/src/test/java/org/apache/kafka/streams/perf/SimpleBenchmark.java > > bash$ find . -name SimpleBenchmark.class > > bash$ jar -tf streams/build/libs/kafka-streams-0.10.1.0-SNAPSHOT.jar | grep > SimpleBenchmark > > > > On Sat, Sep 10, 2016 at 6:18 AM, Eno Thereska <[email protected]> > wrote: > >> Hi Caleb, >> >> We have a benchmark that we run nightly to keep track of performance. The >> numbers we have do indicate that consuming through streams is indeed slower >> than just a pure consumer, however the performance difference is not as >> large as you are observing. Would it be possible for you to run this >> benchmark in your environment and report the numbers? The benchmark is >> included with Kafka Streams and you run it like this: >> >> export INCLUDE_TEST_JARS=true; ./bin/kafka-run-class.sh >> org.apache.kafka.streams.perf.SimpleBenchmark >> >> You'll need a Kafka broker to be running. The benchmark will report >> various numbers, but one of them is the performance of the consumer and the >> performance of streams reading. >> >> Thanks >> Eno >> >>> On 9 Sep 2016, at 18:19, Caleb Welton <[email protected]> >> wrote: >>> >>> Same in both cases: >>> client.id=Test-Prototype >>> application.id=test-prototype >>> >>> group.id=test-consumer-group >>> bootstrap.servers=broker1:9092,broker2:9092zookeeper.connect=zk1:2181 >>> replication.factor=2 >>> >>> auto.offset.reset=earliest >>> >>> >>> On Friday, September 9, 2016 8:48 AM, Eno Thereska < >> [email protected]> wrote: >>> >>> >>> >>> Hi Caleb, >>> >>> Could you share your Kafka Streams configuration (i.e., StreamsConfig >>> properties you might have set before the test)? >>> >>> Thanks >>> Eno >>> >>> >>> On Thu, Sep 8, 2016 at 12:46 AM, Caleb Welton <[email protected]> >> wrote: >>> >>>> I have a question with respect to the KafkaStreams API. >>>> >>>> I noticed during my prototyping work that my KafkaStreams application >> was >>>> not able to keep up with the input on the stream so I dug into it a bit >> and >>>> found that it was spending an inordinate amount of time in >>>> org.apache.kafka.common.network.Seloctor.select(). Not exactly a >> shooting >>>> gun itself, so I dropped the implementation down to a single processor >>>> reading off a source. >>>> >>>> public class TestProcessor extends AbstractProcessor<String, String> { >>>> static long start = -1; >>>> static long count = 0; >>>> >>>> @Override >>>> public void process(String key, String value) { >>>> if (start < 0) { >>>> start = System.currentTimeMillis(); >>>> } >>>> count += 1; >>>> if (count > 1000000) { >>>> long end = System.currentTimeMillis(); >>>> double time = (end-start)/1000.0; >>>> System.out.printf("Processed %d records in %f seconds (%f >>>> records/s)\n", count, time, count/time); >>>> start = -1; >>>> count = 0; >>>> } >>>> >>>> } >>>> >>>> } >>>> >>>> ... >>>> >>>> >>>> TopologyBuilder topologyBuilder = new TopologyBuilder(); >>>> topologyBuilder >>>> .addSource("SOURCE", stringDeserializer, StringDeserializer, >>>> "input") >>>> .addProcessor("PROCESS", TestProcessor::new, "SOURCE"); >>>> >>>> >>>> >>>> Which I then ran through the KafkaStreams API, and then repeated with >>>> the KafkaConsumer API. >>>> >>>> Using the KafkaConsumer API: >>>> Processed 1000001 records in 1.790000 seconds (558659.776536 records/s) >>>> Processed 1000001 records in 1.229000 seconds (813670.463792 records/s) >>>> Processed 1000001 records in 1.106000 seconds (904160.036166 records/s) >>>> Processed 1000001 records in 1.190000 seconds (840336.974790 records/s) >>>> >>>> Using the KafkaStreams API: >>>> Processed 1000001 records in 6.407000 seconds (156079.444358 records/s) >>>> Processed 1000001 records in 5.256000 seconds (190258.942161 records/s) >>>> Processed 1000001 records in 5.141000 seconds (194514.880373 records/s) >>>> Processed 1000001 records in 5.111000 seconds (195656.622970 records/s) >>>> >>>> >>>> The profile on the KafkaStreams consisted of: >>>> >>>> 89.2% org.apache.kafka.common.network.Selector.select() >>>> 7.6% org.apache.kafka.clients.producer.internals. >>>> ProduceRequestResult.await() >>>> 0.8% org.apach.kafka.common.network.PlaintextTransportLayer.read() >>>> >>>> >>>> Is a 5X performance difference between Kafka Consumer and the >>>> KafkaStreams api expected? >>>> >>>> Are there specific things I can do to diagnose/tune the system? >>>> >>>> Thanks, >>>> Caleb >>>> >> >>
