Have you tried partitionBy?
Something like
hiveWindowsEvents.foreachRDD( rdd => {
  val eventsDataFrame = rdd.toDF()
  eventsDataFrame.write.mode(SaveMode.Append)
    .partitionBy("windows_event_time_bin")
    .saveAsTable("windows_event")
})
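For reference, the snippet above assumes a HiveContext and its implicits are already in scope; a rough sketch of that setup (Spark 1.x, names illustrative) would be something like:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf().setAppName("windows-events")    // illustrative app name
val ssc  = new StreamingContext(conf, Seconds(10))         // illustrative batch interval
val sqlContext = new HiveContext(ssc.sparkContext)         // one HiveContext, reused across batches
import sqlContext.implicits._                              // brings rdd.toDF() into scope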
On Wed, Oct 28, 2015 at 7:41 AM, Bryan Jeffrey wrote:
I've been reading documentation on accessing offsetRanges and updating ZK
yourself when using DirectKafkaInputDStream (from createDirectStream),
along with the code changes in this PR:
https://github.com/apache/spark/pull/4805.
I'm planning on adding a listener to update ZK (for monitoring purposes).
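A rough sketch of what that could look like, using foreachRDD with HasOffsetRanges rather than a StreamingListener (directKafkaStream is the DStream returned by createDirectStream; the ZK quorum, consumer group, and path layout below are assumptions, not anything from the original job):

import kafka.utils.ZkUtils
import org.I0Itec.zkclient.ZkClient
import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

val zkClient = new ZkClient("localhost:2181")   // assumed ZK quorum
val group    = "my-consumer-group"              // assumed group id for monitoring

directKafkaStream.foreachRDD { rdd =>
  // The cast only works on the stream returned by createDirectStream itself,
  // before any transformations are applied.
  val offsetRanges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  offsetRanges.foreach { osr =>
    // Mirror the high-level consumer's ZK layout so existing monitoring tools can read it.
    val zkPath = s"/consumers/$group/offsets/${osr.topic}/${osr.partition}"
    ZkUtils.updatePersistentPath(zkClient, zkPath, osr.untilOffset.toString)
  }
}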
Thanks Cody (forgot to reply-all earlier, apologies)!
One more question for the list: I'm now seeing a
java.lang.ClassNotFoundException for kafka.OffsetRange upon relaunching the
streaming job after a previous run (via spark-submit):
15/08/24 13:07:11 INFO CheckpointReader: Attempting to load checkpoint ...
No; first batch only contains messages received after the second job starts
(messages come in at a steady rate of about 400/second).
On Tue, Aug 25, 2015 at 11:07 AM, Cody Koeninger wrote:
> Does the first batch after restart contain all the messages received while
> the job was down?
>
> On Tue
Yeah. All messages received while the streaming job was down are lost.
On Tue, Aug 25, 2015 at 11:37 AM, Cody Koeninger wrote:
> Are you actually losing messages then?
>
> On Tue, Aug 25, 2015 at 1:15 PM, Susan Zhang wrote:
>
>> No; first batch only contains messages received af
        ZkUtils.updatePersistentPath(zkClient, zkPath, osr.untilOffset.toString)
      }
    }
    ssc
  }
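For anyone following along, that looks like the tail of a create-context function used with StreamingContext.getOrCreate; a minimal sketch of that recovery pattern (checkpointDir, kafkaParams, and topics are placeholders) is roughly:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val checkpointDir = "hdfs:///tmp/streaming-checkpoint"               // placeholder
val kafkaParams   = Map("metadata.broker.list" -> "localhost:9092")  // placeholder
val topics        = Set("events")                                    // placeholder

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("direct-kafka-job")
  val ssc  = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)   // metadata checkpointing on the context

  val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, topics)
  // ... output operations (including the ZK offset update above) go here ...
  ssc
}

// On restart with the same checkpoint dir, the intent is that offsets saved in the
// checkpoint are used to resume, so messages that arrived while the job was down
// show up in the first batches after restart rather than being skipped.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()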
On Tue, Aug 25, 2015 at 12:07 PM, Cody Koeninger wrote:
> Sounds like something's not set up right... can you post a minimal code
> example that reproduces the issue?
>
> O
.
>
> Also, is there a reason you're calling
>
> directKStream.checkpoint(checkpointDuration)
>
> Just calling checkpoint on the streaming context should be sufficient to
> save the metadata
>
>
>
> On Tue, Aug 2
>
> Just out of curiosity, does removing the call to checkpoint on the stream
> affect anything?
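To spell out the distinction being asked about: setting a checkpoint directory on the StreamingContext enables metadata checkpointing (which is what saves the Kafka offsets for recovery), while calling checkpoint on a DStream controls periodic checkpointing of that stream's RDD data. A tiny sketch, reusing the names from the thread (ssc, directKStream, checkpointDuration) plus a placeholder checkpointDir:

// Metadata checkpointing: offsets and pending batches go to the checkpoint dir.
// This alone is what recovery of the direct Kafka stream relies on.
ssc.checkpoint(checkpointDir)

// Data checkpointing: also persists the stream's RDDs every checkpointDuration.
// Not required just to save and restore offsets.
directKStream.checkpoint(checkpointDuration)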
>
>
>
> On Wed, Aug 26, 2015 at 1:04 PM, Susan Zhang wrote:
>
>> Thanks for the suggestions! I tried the following:
>>
>> I removed
>>
>> createOnError
actual job output?
>
> The log lines you posted indicate that the checkpoint was restored and
> those offsets were processed; what are the log lines for the following
> KafkaRDD?
>
>
> On Wed, Aug 26, 2015 at 2:04 PM, Susan Zhang wrote:
>
>> Compared offsets, and