Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Susan Zhang
Have you tried partitionBy? Something like hiveWindowsEvents.foreachRDD( rdd => { val eventsDataFrame = rdd.toDF() eventsDataFrame.write.mode(SaveMode.Append).partitionBy(" windows_event_time_bin").saveAsTable("windows_event") }) On Wed, Oct 28, 2015 at 7:41 AM, Bryan Jeffrey

Streaming: BatchTime OffsetRange Mapping?

2015-08-21 Thread Susan Zhang
I've been reading documentation on accessing offsetRanges and updating ZK yourself when using DirectKafkaInputDStream (from createDirectStream), along with the code changes in this PR: https://github.com/apache/spark/pull/4805. I'm planning on adding a listener to update ZK (for monitoring purpose

Re: Spark Direct Streaming With ZK Updates

2015-08-24 Thread Susan Zhang
Thanks Cody (forgot to reply-all earlier, apologies)! One more question for the list: I'm now seeing a java.lang.ClassNotFoundException for kafka.OffsetRange upon relaunching the streaming job after a previous run (via spark-submit) 15/08/24 13:07:11 INFO CheckpointReader: Attempting to load ch

Re: Spark Streaming Checkpointing Restarts with 0 Event Batches

2015-08-25 Thread Susan Zhang
No; first batch only contains messages received after the second job starts (messages come in at a steady rate of about 400/second). On Tue, Aug 25, 2015 at 11:07 AM, Cody Koeninger wrote: > Does the first batch after restart contain all the messages received while > the job was down? > > On Tue

Re: Spark Streaming Checkpointing Restarts with 0 Event Batches

2015-08-25 Thread Susan Zhang
Yeah. All messages are lost while the streaming job was down. On Tue, Aug 25, 2015 at 11:37 AM, Cody Koeninger wrote: > Are you actually losing messages then? > > On Tue, Aug 25, 2015 at 1:15 PM, Susan Zhang wrote: > >> No; first batch only contains messages received af

Re: Spark Streaming Checkpointing Restarts with 0 Event Batches

2015-08-25 Thread Susan Zhang
ZkUtils.updatePersistentPath(zkClient, zkPath, osr.untilOffset.toString) } } ssc } On Tue, Aug 25, 2015 at 12:07 PM, Cody Koeninger wrote: > Sounds like something's not set up right... can you post a minimal code > example that reproduces the issue? > > O

Re: Spark Streaming Checkpointing Restarts with 0 Event Batches

2015-08-26 Thread Susan Zhang
. > > Also, is there a reason you're calling > > directKStream.checkpoint(checkpointDuration) > > Just calling checkpoint on the streaming context should be sufficient to > save the metadata > > > > On Tue, Aug 2

Re: Spark Streaming Checkpointing Restarts with 0 Event Batches

2015-08-26 Thread Susan Zhang
> > Just out of curiosity, does removing the call to checkpoint on the stream > affect anything? > > > > On Wed, Aug 26, 2015 at 1:04 PM, Susan Zhang wrote: > >> Thanks for the suggestions! I tried the following: >> >> I removed >> >> createOnErro

Re: Spark Streaming Checkpointing Restarts with 0 Event Batches

2015-08-26 Thread Susan Zhang
ual job output? > > The log lines you posted indicate that the checkpoint was restored and > those offsets were processed; what are the log lines for the following > KafkaRDD ? > > > On Wed, Aug 26, 2015 at 2:04 PM, Susan Zhang wrote: > >> Compared offsets, and