Rado,

Yes, you are correct. Many messages are created at almost the same time, so they collide even at millisecond resolution. I changed the key to use "UUID.randomUUID()", and now all messages are inserted into the Cassandra table without any time lag.
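A minimal sketch of why the millisecond key dropped rows, for illustration only. It does not use Spark or Cassandra: a mutable Map stands in for the table, on the assumption that "createdtime" was the primary key, so writing an existing key overwrites the row. The names KeyCollisionSketch and upsertAll are hypothetical and exist only in this example.

```scala
import java.util.UUID
import scala.collection.mutable

object KeyCollisionSketch {
  // Stand-in for a Cassandra table keyed by K:
  // writing a key that already exists overwrites the previous row.
  def upsertAll[K](rows: Seq[(K, String)]): Int = {
    val table = mutable.Map.empty[K, String]
    rows.foreach { case (key, log) => table(key) = log }
    table.size // number of rows that survive
  }

  def main(args: Array[String]): Unit = {
    val logs = (1 to 1000).map(i => s"message-$i")

    // Original approach: key each message by the wall-clock millisecond.
    val byMillis = logs.map(log => (System.currentTimeMillis(), log))

    // Revised approach: key each message by a random UUID.
    val byUuid = logs.map(log => (UUID.randomUUID(), log))

    println(s"rows kept with millisecond keys: ${upsertAll(byMillis)}")
    println(s"rows kept with UUID keys:        ${upsertAll(byUuid)}")
  }
}
```

With millisecond keys the surviving row count depends on how fast the loop runs and is usually far below 1000. With random UUID keys, collisions are not expected in practice. Note that switching the real table to UUID keys assumes the key column type is changed to match.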
Thank you very much!

Jerry Wong

On Wed, Feb 17, 2016 at 1:50 AM, radoburansky [via Apache Spark User List] <ml-node+s1001560n26246...@n3.nabble.com> wrote:

> Hi Jerry,
>
> How do you know that only 100 messages are inserted? What is the primary
> key of the "tableOfTopicA" Cassandra table? Isn't it possible that you
> map several messages to the same primary key and therefore they overwrite
> each other in Cassandra?
>
> Regards
>
> Rado
>
> On Tue, Feb 16, 2016 at 10:29 PM, Jerry [via Apache Spark User List] <[hidden email]> wrote:
>
>> Hello,
>>
>> I have questions about using Spark Streaming to consume data from Kafka
>> and insert it into a Cassandra database.
>>
>> 5 AWS instances (each with 8 cores, 30GB memory) for Spark,
>> Hadoop, Cassandra
>> Scala: 2.10.5
>> Spark: 1.2.2
>> Hadoop: 1.2.1
>> Cassandra 2.0.18
>>
>> 3 AWS instances for the Kafka cluster (each with 8 cores, 30GB
>> memory)
>> Kafka: 0.8.2.1
>> Zookeeper: 3.4.6
>>
>> Other configurations:
>> batchInterval = 6 Seconds
>> blockInterval = 1500 millis
>> spark.locality.wait = 500 millis
>> #Consumers = 10
>>
>> There are two columns in the Cassandra table
>> keySpaceOfTopicA.tableOfTopicA, "createdtime" and "log".
>>
>> Here is a piece of the code:
>>
>> @transient val kstreams = (1 to numConsumers.toInt).map { _ =>
>>   KafkaUtils.createStream(ssc, zkeeper, groupId, Map("topicA"->1),
>>     StorageLevel.MEMORY_AND_DISK_SER)
>>     .map(_._2.toString).map(Tuple1(_))
>>     .map{case(log) => (System.currentTimeMillis(), log)}
>> }
>> @transient val unifiedMessage = ssc.union(kstreams)
>>
>> unifiedMessage.saveToCassandra("keySpaceOfTopicA", "tableOfTopicA",
>>   SomeColumns("createdtime", "log"))
>>
>> I created a producer that sends messages to the brokers (1000 messages
>> at a time).
>>
>> But only about 100 messages are inserted into Cassandra in each round
>> of test.
>> Can anybody advise me why the other messages (about 900 messages)
>> can't be consumed?
>> How do I configure and tune the parameters in order to improve the
>> throughput of the consumers?
>>
>> Thank you very much in advance for reading and for your suggestions.
>>
>> Jerry Wong
>>
>> ------------------------------
>> If you reply to this email, your message will be added to the discussion
>> below:
>>
>> http://apache-spark-user-list.1001560.n3.nabble.com/Optimize-the-performance-of-inserting-data-to-Cassandra-with-Kafka-and-Spark-Streaming-tp26244.html
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Optimize-the-performance-of-inserting-data-to-Cassandra-with-Kafka-and-Spark-Streaming-tp26244p26252.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.