TD, can you please comment on this code?

I am really interested in including this code in Spark, but I have some concerns about persistence:

1. When we extend Receiver and call store, is it a blocking call? Does it return only once Spark has stored the block as requested (i.e. replicated or on disk), or is there some buffering that allows data loss? In the latter case, can we have a callback telling us when it is safe to proceed with storing offsets?

2. I saw you implemented some rate limiting. Can you clarify how it works? If the network receiver pulls data as fast as it can and the rate is limited in the BlockManager, what happens to the excess data? Is it discarded? And if not, what happens?

There are a lot of open questions about how to make streaming reliable, and I have plenty of questions off-list. But I do not know how to improve the code without Spark support.

> On 21 Aug 2014, at 16:17, Evgeniy Shishkin <itparan...@gmail.com> wrote:
>
> Hello,
>
> we are glad to announce yet another Kafka input stream.
>
> Available at https://github.com/wgnet/spark-kafka-streaming
>
> It has been used in production for about 3 months.
> We will be happy to hear your feedback.
>
> Custom Spark Kafka consumer based on the Kafka SimpleConsumer API.
>
> Features
>
> • discovers Kafka metadata from ZooKeeper (more reliable than from brokers, does not depend on broker list changes)
> • reading from multiple topics
> • reliably handles leader election and topic reassignment
> • saves offsets and stream metadata in HBase (more robust than ZooKeeper)
> • supports metrics via the Spark metrics mechanism (JMX, Graphite, etc.)
>
> Todo
>
> • abstract offset storage
> • time-controlled offset commits
> • refactor Kafka message to RDD elements transformation (flatmapper method)
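
To make question 1 concrete, here is a rough sketch of the pattern I have in mind. This is not the wgnet code and not a proposal for the Spark API; OffsetStore, commit() and fetchBatch() are hypothetical placeholders, and only Receiver, store() and isStopped() are real Spark Streaming calls.

    import scala.collection.mutable.ArrayBuffer
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    // Hypothetical offset sink (e.g. backed by HBase, as in the announced consumer).
    trait OffsetStore extends Serializable {
      def commit(topic: String, partition: Int, offset: Long): Unit
    }

    class BatchingKafkaReceiver(offsets: OffsetStore)
      extends Receiver[Array[Byte]](StorageLevel.MEMORY_AND_DISK_2) {

      def onStart(): Unit = {
        new Thread("kafka-fetcher") {
          override def run(): Unit = receiveLoop()
        }.start()
      }

      def onStop(): Unit = { /* shut down the fetcher thread */ }

      private def receiveLoop(): Unit = {
        while (!isStopped()) {
          // fetchBatch() stands in for a SimpleConsumer fetch; it is not real API.
          val (topic, partition, lastOffset, messages) = fetchBatch()
          val buffer = new ArrayBuffer[Array[Byte]]() ++= messages

          // Question 1: does this call return only after the block is stored
          // as requested (replicated / on disk), or can it buffer internally
          // and lose data on receiver failure?
          store(buffer)

          // Committing here is only correct if store() above was truly
          // blocking and durable; otherwise we would need a callback/ack
          // from Spark before advancing the offset.
          offsets.commit(topic, partition, lastOffset)
        }
      }

      // Placeholder for a Kafka SimpleConsumer fetch; returns (topic,
      // partition, last offset in the batch, fetched messages).
      private def fetchBatch(): (String, Int, Long, Seq[Array[Byte]]) = ???
    }

If store(buffer) can return before the block is actually durable, the commit on the next line can run ahead of the data, and a receiver crash loses messages that we already marked as consumed. That is why a blocking guarantee, or at least a completion callback, matters for us.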
I am really interested in including this code in Spark. But i am bothering about some point about persistence: 1. When we extend Receiver and call store, is it blocking call? Does it return only when spark stores rdd as requested (i.e. replicated or on disk)? Or is there some buffer allowing dataloss? In the latter case, can we have some callback telling to proceed with storing offsets. 2. I saw you implemented some rate limiting. Can you clarify how it works? In the face of network receiver getting data as fast as it can, and you liming this data in BM. what happens with exceeding data? Is it discarded? And if not, what happens? There is a lot of open questions how to make streaming reliable, and i have plenty of questions offlist. But i do not how to improve the code without spark support. > On 21 Aug 2014, at 16:17, Evgeniy Shishkin <itparan...@gmail.com> wrote: > > Hello, > > we are glad to announce yet another kafka input stream. > > Available at https://github.com/wgnet/spark-kafka-streaming > > It is used in production for about 3 months. > We will be happy to hear your feedback. > > Custom Spark Kafka consumer based on Kafka SimpleConsumer API. > > Features > > • discover kafka metadata from zookeeper (more reliable than from > brokers, does not depend on broker list changes) > • reding from multiple topics > • reliably handles leader election and topic reassignment > • saves offsets and stream metadata in hbase (more robust than > zookeeper) > • supports metrics via spark metrics mechanism (jmx, graphite, etc.) > Todo > > • abstract offset storage > • time controlled offsets commit > • refactor kafka message to rdd elements transformation (flatmapper > method) > --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org