Honestly, I would stay far away from saving offsets in Zookeeper if at
all possible. It's better to store them alongside your results.
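Roughly the shape I mean, as a sketch (ssc, kafkaParams, topicsSet and the two save helpers are placeholders for your own code):

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topicsSet)

stream.foreachRDD { rdd =>
  // Grab this batch's offset ranges before any shuffle/repartition.
  val offsetRanges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // Write the results and the offsets to the same place, in the same job,
  // so a replayed batch overwrites instead of duplicating.
  saveResults(rdd)          // your parquet write (placeholder)
  saveOffsets(offsetRanges) // e.g. one row per (topic, partition, untilOffset) (placeholder)
}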
On Wed, Oct 26, 2016 at 10:44 AM, Sunita Arvind wrote:
> This is enough to get it to work:
>
> df.save(conf.getString("ParquetOutputPath")+offsetSaved, "parquet",
This is enough to get it to work:
df.save(conf.getString("ParquetOutputPath")+offsetSaved, "parquet",
SaveMode.Overwrite)
And tests so far (in local env) seem good with the edits. Yet to test
on the cluster. Cody, appreciate your thoughts on the edits.
Just want to make sure I am not doing an ov
The error in the file I just shared is here:
val partitionOffsetPath: String = topicDirs.consumerOffsetDir + "/" + partition._2(0)
--> this was previously just "partition", and hence there was an error fetching the offset.
Still testing. Somehow, Cody, your code never led to a "file already exists" sort of error.
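Roughly the shape of the per-partition lookup, as a sketch (zkHosts, groupId and the partition id are placeholders; it assumes Kafka's ZKStringSerializer on the ZkClient):

import kafka.utils.{ZKGroupTopicDirs, ZKStringSerializer}
import org.I0Itec.zkclient.ZkClient

val zkClient  = new ZkClient(zkHosts, 30000, 30000, ZKStringSerializer) // zkHosts is a placeholder
val topicDirs = new ZKGroupTopicDirs(groupId, topic)                    // "/consumers/<group>/offsets/<topic>"

// One child znode per partition; its data is the last saved offset as a string.
val partitionId = 0
val offsetPath  = topicDirs.consumerOffsetDir + "/" + partitionId
val savedOffset = Option(zkClient.readData[String](offsetPath, true)).map(_.toLong) // null (None) if the node doesn't exist yet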
Attached is the edited code. Am I heading in the right direction? Also, I seem to be missing something: it works well as long as the application is running and the files are created correctly, but as soon as I restart the application, it goes back to fromOffset 0. Any thoughts?
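This is what I think the restart path needs to look like (rough sketch, not tested; readOffsetsFromStore, ssc, kafkaParams and topicsSet are placeholders for however the offsets were persisted and the context is set up):

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Load whatever was persisted last time; empty map on the very first run.
val fromOffsets: Map[TopicAndPartition, Long] = readOffsetsFromStore()   // placeholder

val stream =
  if (fromOffsets.isEmpty)
    // First run: fall back to auto.offset.reset in kafkaParams.
    KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topicsSet)
  else
    // Restart: resume exactly where the last successful batch left off.
    KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets,
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))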
regards
Sun
Thanks for confirming, Cody.
To use the library, I had to do:
val offsetsStore = new ZooKeeperOffsetsStore(conf.getString("zkHosts"), "/consumers/topics/" + topics + "/0")
It worked well. However, I had to specify the partitionId in the zkPath.
If I want the library to pick up all the partitions, is there a way to avoid hardcoding the partitionId in the path?
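Something like this is what I am considering (sketch, not tested; it assumes the store's constructor is (zkHosts, zkPath) as in my snippet above, and discovering the partition ids from the /brokers/topics/<topic>/partitions znodes is just one way to avoid hardcoding them):

import scala.collection.JavaConverters._
import kafka.utils.ZKStringSerializer
import org.I0Itec.zkclient.ZkClient

// Discover the partition ids for the topic from ZooKeeper rather than hardcoding "0".
val zkClient = new ZkClient(conf.getString("zkHosts"), 30000, 30000, ZKStringSerializer)
val partitionIds = zkClient
  .getChildren(s"/brokers/topics/$topics/partitions")   // znode layout used by Kafka 0.8
  .asScala
  .map(_.toInt)
  .sorted

// One store per partition, same path shape as in my snippet above.
val stores = partitionIds.map { p =>
  p -> new ZooKeeperOffsetsStore(conf.getString("zkHosts"), s"/consumers/topics/$topics/$p")
}.toMap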
You are correct that you shouldn't have to worry about broker id.
I'm honestly not sure specifically what else you are asking at this point.
On Tue, Oct 25, 2016 at 1:39 PM, Sunita Arvind wrote:
> Just re-read the kafka architecture. Something that slipped my mind is, it
> is leader based. So to
Just re-read the Kafka architecture. Something that slipped my mind is that it is leader-based, so a topic/partitionId pair will be the same on all the brokers. So we do not need to consider the broker id while storing offsets. Still exploring the rest of the items.
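For reference, the offset path Kafka's own high-level consumer uses only has group, topic and partition in it, no broker id (small sketch; "mygroup" and "mytopic" are placeholders):

import kafka.utils.ZKGroupTopicDirs

val dirs = new ZKGroupTopicDirs("mygroup", "mytopic")
// dirs.consumerOffsetDir == "/consumers/mygroup/offsets/mytopic"
// per-partition offset znode: "/consumers/mygroup/offsets/mytopic/<partitionId>" -- no broker id anywhere
val offsetPathForPartition0 = dirs.consumerOffsetDir + "/0"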
regards
Sunita
On Tue, Oct 25, 2016 at 11:09 AM, Sunita Arvind wrote:
Hello Experts,
I am trying to use the saving-to-ZK design. Just saw Sudhir's comments that it is an old approach. Any reasons for that? Any issues observed with saving to ZK? The way we are planning to use it is:
1. Following http://aseigneurin.github.io/2016/05/07/spark-kafka-
achieving-zero-data-lo
See
https://github.com/koeninger/kafka-exactly-once
On Aug 23, 2016 10:30 AM, "KhajaAsmath Mohammed"
wrote:
> Hi Experts,
>
> I am looking for some information on how to acheive zero data loss while
> working with kafka and Spark. I have searched online and blogs have
> different answer. Please l
Saving offsets to ZooKeeper is the old approach; checkpointing internally saves the offsets to HDFS (or wherever you point the checkpoint location).
more details here:
http://spark.apache.org/docs/latest/streaming-kafka-integration.html
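A minimal shape of the checkpoint-based recovery, as a sketch (the HDFS path, app name and batch interval are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///user/app/checkpoints/my-stream"   // placeholder path

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("kafka-zero-data-loss")
  val ssc  = new StreamingContext(conf, Seconds(60))
  ssc.checkpoint(checkpointDir)   // offsets + DAG metadata get written here
  // ... set up the Kafka direct stream and the output operations here ...
  ssc
}

// On a clean start this calls createContext(); after a crash/restart it rebuilds
// the context (including consumed-but-unprocessed offsets) from the checkpoint.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()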
On Tue, Aug 23, 2016 at 10:30 AM, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote:
Hi Experts,
I am looking for some information on how to achieve zero data loss while working with Kafka and Spark. I have searched online and the blogs have different answers. Please let me know if anyone has an idea on this.
Blog 1:
https://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance-
But compile scope is supposed to be added to the assembly.
https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Dependency_Scope
On Thu, Jun 4, 2015 at 1:24 PM, algermissen1971
wrote:
> Hi Iulian,
>
> On 26 May 2015, at 13:04, Iulian Dragoș
> wrote:
Hi Iulian,
On 26 May 2015, at 13:04, Iulian Dragoș wrote:
>
> On Tue, May 26, 2015 at 10:09 AM, algermissen1971
> wrote:
> Hi,
>
> I am setting up a project that requires Kafka support and I wonder what the
> roadmap is for Scala 2.11 Support (including Kafka).
>
> Can we expect to see 2.11 support anytime soon?
On Tue, May 26, 2015 at 10:09 AM, algermissen1971 <
algermissen1...@icloud.com> wrote:
> Hi,
>
> I am setting up a project that requires Kafka support and I wonder what
> the roadmap is for Scala 2.11 Support (including Kafka).
>
> Can we expect to see 2.11 support anytime soon?
>
The upcoming 1.
Hi,
I am setting up a project that requires Kafka support and I wonder what the
roadmap is for Scala 2.11 Support (including Kafka).
Can we expect to see 2.11 support anytime soon?
Jan
Take a look at
https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md if
you haven't already.
If you're fine with saving offsets yourself, I'd stick with KafkaRDD, as
Koert said.
I haven't tried 2-hour stream batch durations, so I can't vouch for using createDirectStream in that case.
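The batch pattern I'm referring to, as a rough sketch (the broker list, topic and offset numbers are placeholders; you'd read the real from/until offsets from wherever you saved them on the previous run):

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")  // placeholder brokers

// Explicit begin/end offsets per partition: you decide the batch boundaries,
// and you persist untilOffset yourself once the job's output is committed.
val offsetRanges = Array(
  OffsetRange("mytopic", partition = 0, fromOffset = 1000L, untilOffset = 2000L),
  OffsetRange("mytopic", partition = 1, fromOffset =  950L, untilOffset = 1900L)
)

val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
  sc, kafkaParams, offsetRanges)

Once that RDD's output has committed, save the untilOffsets and use them as the fromOffsets of the next run.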
I mean to say it is simpler in the case of failures, restarts, upgrades, etc., not just failures.
But they did do a lot of work on streaming from Kafka in Spark 1.3.x to make it simpler (streaming simply calls KafkaRDD for every batch if you use KafkaUtils.createDirectStream), so maybe I am wrong and s
Yeah, I think I would pick the second approach because it is simpler operationally in case of any failures. But of course, the smaller the window gets, the more attractive the streaming solution gets.
We do daily extracts, not every 2 hours.
On Sat, Apr 18, 2015 at 2:57 PM, Shushant Arora
wrote:
> Thanks Koert.
Thanks Koert.
So in short, for the high-level API I'll have to go with Spark Streaming only, and there the issue is handling cluster restarts. Is that why you opted for the second approach of a batch job, or was it due to the batch interval (2 hours being large for a streaming job), or some other reason?
On Sun, Apr 19, 2015 a
KafkaRDD uses the simple consumer API, and I think you need to handle offsets yourself, unless things have changed since I last looked.
I would go with the second approach.
On Sat, Apr 18, 2015 at 2:42 PM, Shushant Arora
wrote:
> Thanks !!
> I have few more doubts :
>
> Does kafka RDD uses simpleAPI for kafk
Thanks!!
I have a few more doubts:
Does KafkaRDD use the simple API for the Kafka consumer or the high-level API? I mean, do I need to handle the offsets of partitions myself, or will that be taken care of by KafkaRDD? Plus, which one is better for batch programming? I have a requirement to read Kafka messages by a Spark
That's a much better idea :)
On Sat, Apr 18, 2015 at 11:22 AM Koert Kuipers wrote:
> Use KafkaRDD directly. It is in spark-streaming-kafka package
>
> On Sat, Apr 18, 2015 at 6:43 AM, Shushant Arora wrote:
>
>> Hi
>>
>> I want to consume messages from kafka queue using spark batch program not
Use KafkaRDD directly. It is in the spark-streaming-kafka package.
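For the build, something like the following sbt line (the version should match your Spark version; 1.3.0 here is just an example):

libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % "1.3.0"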
On Sat, Apr 18, 2015 at 6:43 AM, Shushant Arora
wrote:
> Hi
>
> I want to consume messages from kafka queue using spark batch program not
> spark streaming, Is there any way to achieve this, other than using low
> level(simple api) of
To: user
Subject: spark with kafka
Hi
I want to consume messages from a Kafka queue using a Spark batch program, not Spark Streaming. Is there any way to achieve this, other than using the low-level (simple) API of the Kafka consumer?
Thanks