Re: practical usage of the new "exactly-once" supporting DirectKafkaInputDStream

2015-05-14 Thread Cody Koeninger
>> … does `foreachPartition` guarantee that a single worker will process the passed function? Or might two workers split the work?
>>
>> Eg. foreachPartition(f)
>>
>> Worker 1: f( Iterator[partition 1 records 1 - 50] )
>> Worker 2: f( Iterator[partition 1 records 51 - 100] ) …
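For context, a minimal sketch of the scenario being asked about (the String element type and the counting body are placeholders, not from the thread). In Spark each RDD partition is executed by a single task, so the function passed to foreachPartition is invoked once per partition with that partition's complete iterator; a failed or speculative task can re-run it, but the iterator is never split between two workers.

import org.apache.spark.TaskContext
import org.apache.spark.rdd.RDD

// Placeholder element type; the point is only to observe one invocation
// of the function per partition, each running inside a single task.
def demoForeachPartition(rdd: RDD[String]): Unit =
  rdd.foreachPartition { iter =>
    val part = TaskContext.get.partitionId
    // `iter` covers every record of this partition; it is not handed out
    // in pieces to separate workers, though a retried task sees it again.
    println(s"partition $part: ${iter.size} records handled in one call")
  }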

Re: practical usage of the new "exactly-once" supporting DirectKafkaInputDStream

2015-05-14 Thread Cody Koeninger
> … Worker 1: f( Iterator[partition 1 records 1 - 50] )
> Worker 2: f( Iterator[partition 1 records 51 - 100] )
>
> It is unclear from the scaladocs
> ( https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD ).
> But you can imagine, if it is critical that this data be committed in a single transaction, that two workers will have issues. …

Re: practical usage of the new "exactly-once" supporting DirectKafkaInputDStream

2015-05-14 Thread will-ob
… But you can imagine, if it is critical that this data be committed in a single transaction, that two workers will have issues.

-- Will O
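The concern above is about writing a partition's results and its Kafka offsets in one database transaction. A hedged sketch of that pattern follows, assuming a JDBC store; the connection string, the results table, and the txn_offsets column names are invented (only the txn_offsets table name appears later in the thread), and the RDD is assumed to come straight from createDirectStream so the HasOffsetRanges cast holds.

import java.sql.DriverManager
import org.apache.spark.TaskContext
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

// `rdd` is assumed to be an RDD obtained directly from createDirectStream
// (no shuffle yet), so offset ranges line up with its partitions.
def saveWithOffsets(rdd: RDD[(String, String)]): Unit = {
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd.foreachPartition { iter =>
    val osr: OffsetRange = offsetRanges(TaskContext.get.partitionId)
    val conn = DriverManager.getConnection("jdbc:postgresql://localhost/demo") // placeholder DSN
    conn.setAutoCommit(false)
    val ins = conn.prepareStatement("insert into results (k, v) values (?, ?)") // placeholder table
    iter.foreach { case (k, v) =>
      ins.setString(1, k); ins.setString(2, v); ins.executeUpdate()
    }
    // record this partition's ending offset in the same transaction
    val off = conn.prepareStatement(
      "update txn_offsets set off = ? where topic = ? and part = ?") // column names assumed
    off.setLong(1, osr.untilOffset); off.setString(2, osr.topic); off.setInt(3, osr.partition)
    off.executeUpdate()
    conn.commit() // results and offsets become visible atomically, once per partition
    conn.close()
  }
}

This only stays atomic if the whole partition iterator is processed by one invocation of the function, which is exactly what the question above is probing.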

Re: practical usage of the new "exactly-once" supporting DirectKafkaInputDStream

2015-05-05 Thread Cody Koeninger
gt;>> >>>> I have tried moving the reduceByKey from the end of the .transform block >>>> into the partition level (at the end of the mapPartitionsWithIndex >>>> block.) >>>> This is what you're suggesting, yes? The results didn't cor

Re: practical usage of the new "exactly-once" supporting DirectKafkaInputDStream

2015-05-05 Thread Mark Stewart
>>> … This is what you're suggesting, yes? The results didn't correct the offset
>>> update behavior; they still get out of sync pretty quickly.
>>>
>>> Some details: I'm using the kafka-console-producer.sh tool to drive the
>>> process, calling it three or four times in succession …

Re: practical usage of the new "exactly-once" supporting DirectKafkaInputDStream

2015-04-30 Thread badgerpants
… I'll give this a try. Thanks, Cody.

Re: practical usage of the new "exactly-once" supporting DirectKafkaInputDStream

2015-04-30 Thread Cody Koeninger
>> … messages in each call. Once all the messages have been processed I wait for
>> the output of the printOffsets method to stop changing and compare it to the
>> txn_offsets table. (When no data is getting processed the printOffsets
>> method yields something like the …
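The printOffsets method itself is not shown in the preview. Purely as an illustration of something that could produce the OffsetRange lines quoted in the next message, here is one possible shape: an atomic reference on the driver, refreshed each batch and dumped on demand. The names and structure are guesses, not the poster's code.

import java.util.concurrent.atomic.AtomicReference
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

object OffsetTracker {
  private val latest = new AtomicReference[Array[OffsetRange]](Array.empty)

  // Capture the latest offset ranges on the driver, once per batch.
  def track(stream: InputDStream[(String, String)]): Unit =
    stream.foreachRDD { rdd =>
      latest.set(rdd.asInstanceOf[HasOffsetRanges].offsetRanges)
    }

  // Print them so they can be compared with the txn_offsets table.
  def printOffsets(): Unit =
    latest.get.foreach(println)
}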

Re: practical usage of the new "exactly-once" supporting DirectKafkaInputDStream

2015-04-30 Thread Cody Koeninger
> … OffsetRange(topic: 'testmulti', partition: 1, range: [23602 -> 23602])
> OffsetRange(topic: 'testmulti', partition: 2, range: [32503 -> 32503])
> OffsetRange(topic: 'testmulti', partition: 0, range: [26100 -> 26100]) …

Re: practical usage of the new "exactly-once" supporting DirectKafkaInputDStream

2015-04-30 Thread badgerpants
… OffsetRange(topic: 'testmulti', partition: 2, range: [32503 -> 32503])
OffsetRange(topic: 'testmulti', partition: 0, range: [26100 -> 26100])
OffsetRange(topic: 'testmulti', partition: 3, range: [20900 -> 20900])])

Thanks,
Mark
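The OffsetRange values above are what the RDDs of a direct stream report. For reference, a minimal sketch of creating such a stream against the 'testmulti' topic from the thread; the broker address, app name, and batch interval are placeholders.

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

object DirectStreamSetup {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("direct-kafka-demo"), Seconds(5))
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val topics = Set("testmulti")

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Every RDD of a direct stream implements HasOffsetRanges, which is where
    // lines like OffsetRange(topic: 'testmulti', partition: 1, range: [23602 -> 23602]) come from.
    stream.foreachRDD { rdd =>
      rdd.asInstanceOf[HasOffsetRanges].offsetRanges.foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}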

Re: practical usage of the new "exactly-once" supporting DirectKafkaInputDStream

2015-04-30 Thread Cody Koeninger
> … }
> writeOffset(currentOffset) // updates the offset positions
> }
> db.close()
> }
> }
>
> Thanks,
> Mark
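The preview shows only the closing lines of the original code. As a rough, non-authoritative reconstruction of the shape they imply: writeOffset and currentOffset are the names quoted above, while the stubs and the connection details are invented for illustration.

import java.sql.{Connection, DriverManager}
import org.apache.spark.TaskContext
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

object OffsetWriteSketch {
  // Invented stubs standing in for the code the preview cuts off.
  def openConnection(): Connection =
    DriverManager.getConnection("jdbc:postgresql://localhost/demo") // placeholder DSN
  def processRecord(db: Connection, record: (String, String)): Unit = () // placeholder
  def writeOffset(offset: OffsetRange): Unit = () // placeholder: would update txn_offsets

  def run(stream: InputDStream[(String, String)]): Unit =
    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd.foreachPartition { iter =>
        val db = openConnection()
        val currentOffset = offsetRanges(TaskContext.get.partitionId)
        iter.foreach(record => processRecord(db, record))
        writeOffset(currentOffset) // updates the offset positions
        db.close()
      }
    }
}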

practical usage of the new "exactly-once" supporting DirectKafkaInputDStream

2015-04-30 Thread badgerpants
… }
db.close()
}
}

Thanks,
Mark