Spark 1.4: Python API for getting Kafka offsets in direct mode?

2015-06-11 Thread Amit Ramesh
Congratulations on the release of 1.4! I have been trying out the direct Kafka support in Python but haven't been able to figure out how to get the offsets from the RDD. It looks like the documentation has yet to be updated to include Python examples ( https://spark.apache.org/docs/latest/streaming-ka

Re: Spark 1.4: Python API for getting Kafka offsets in direct mode?

2015-06-11 Thread Amit Ramesh
> Hi,
>
> What is your meaning of getting the offsets from the RDD, from my
> understanding, the offsetRange is a parameter you offered to KafkaRDD, why
> do you still want to get the one previous you set into?
>
> Thanks
> Jerry
>
> 2015-06-12 12:36 GMT+08:00 Amit Ramesh :

Re: Spark 1.4: Python API for getting Kafka offsets in direct mode?

2015-06-11 Thread Amit Ramesh
> I think currently Python based Kafka direct API do not
> provide such equivalence like Scala, maybe we should figure out to add this
> into Python API also.
>
> 2015-06-12 13:48 GMT+08:00 Amit Ramesh :
>
>>
>> Hi Jerry,
>>
>> Take a look at this example:
>>

Re: Spark 1.4: Python API for getting Kafka offsets in direct mode?

2015-06-12 Thread Amit Ramesh
t 1:05 AM, Saisai Shao
>> wrote:
>>
>>> Scala KafkaRDD uses a trait to handle this problem, but it is not so
>>> easy and straightforward in Python, where we need to have a specific API to
>>> handle this, I'm not sure is there any simple workaround to fi

How does one use s3 for checkpointing?

2015-09-21 Thread Amit Ramesh
A lot of places in the documentation mention using S3 for checkpointing; however, I haven't found any examples or concrete evidence of anyone having done this.
1. Is this a safe/reliable option given the read-after-write consistency for PUTS in S3?
2. Is S3 access broken for Hadoop 2.6 (SP