artition` guarantee that a
>> single worker will process the passed function? Or might two workers split
>> the work?
>>
>> Eg. foreachPartition(f)
>>
>> Worker 1: f( Iterator[partition 1 records 1 - 50] )
>> Worker 2: f( Iterator[partition 1 record
ords 1 - 50] )
> Worker 2: f( Iterator[partition 1 records 51 - 100] )
>
> It is unclear from the scaladocs
> (
> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
> ).
> But you can imagine, if it is critical that this data be committed in
critical that this data be committed in a
single transaction, that two workers will have issues.
-- Will O
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/practical-usage-of-the-new-exactly-once-supporting-DirectKafkaInputDStream-tp11916p12257.html
gt;>>
>>>> I have tried moving the reduceByKey from the end of the .transform block
>>>> into the partition level (at the end of the mapPartitionsWithIndex
>>>> block.)
>>>> This is what you're suggesting, yes? The results didn't cor
ggesting, yes? The results didn't correct the
>>> offset
>>> update behavior; they still get out of sync pretty quickly.
>>>
>>> Some details: I'm using the kafka-console-producer.sh tool to drive the
>>> process, calling it three or four times in succession
ll give this a try. Thanks, Cody.
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/practical-usage-of-the-new-exactly-once-supporting-DirectKafkaInputDStream-tp11916p11928.html
Sent from the Apache Spark Developers List mailing list
; messages in each call. Once all the messages have been processed I wait
>> for
>> the output of the printOffsets method to stop changing and compare it to
>> the
>> txn_offsets table. (When no data is getting processed the printOffsets
>> method yields something like the
setRange(topic:
> 'testmulti', partition: 1, range: [23602 -> 23602] OffsetRange(topic:
> 'testmulti', partition: 2, range: [32503 -> 32503] OffsetRange(topic:
> 'testmulti', partition: 0, range: [261
nge: [32503 -> 32503] OffsetRange(topic:
'testmulti', partition: 0, range: [26100 -> 26100] OffsetRange(topic:
'testmulti', partition: 3, range: [20900 -> 20900]])
Thanks,
Mark
--
View this message in context:
http://apache-spark-developer
}
> writeOffset(currentOffset) // updates the offset positions
> }
> db.close()
> }
> }
>
> Thanks,
> Mark
>
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/
}
db.close()
}
}
Thanks,
Mark
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/practical-usage-of-the-new-exactly-once-supporting-DirectKafkaInputDStream-tp1
11 matches
Mail list logo