Hi,
I have a Spark app which internally splits into 2 jobs because we write to 2
different Cassandra tables. The input data comes from the same Cassandra
table, so after reading the data from Cassandra and applying a few
transformations I cache one of the RDDs and fork the program to compute both
metrics.
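For context, the shape of the app is roughly the following (a sketch against
the spark-cassandra-connector API; sc is the existing SparkContext, and the
keyspace, table, column names, and transformations are made up):

import com.datastax.spark.connector._

// read once, transform, cache; both branches below reuse the cached RDD
val base = sc.cassandraTable("ks", "events")                 // hypothetical keyspace/table
  .map(row => (row.getString("id"), row.getLong("value")))   // placeholder transformations
  .cache()

// each saveToCassandra is an action, so the app forks into two jobs here
base.map { case (id, v) => (id, v * 2) }
  .saveToCassandra("ks", "metric_a", SomeColumns("id", "value"))
base.map { case (id, v) => (id, v + 1) }
  .saveToCassandra("ks", "metric_b", SomeColumns("id", "value"))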
> [...wh]ich saves the actual intermediate RDD data)
>
> TD
>
> On Fri, Oct 2, 2015 at 2:56 PM, Sourabh Chandak wrote:
>
>> Tried using local checkpointing as well, and even that becomes slow after
>> some time. Any idea what can be wrong?
>>
>> Thanks,
>> Sourabh
Tried using local checkpointing as well, and even that becomes slow after
some time. Any idea what can be wrong?
Thanks,
Sourabh
On Fri, Oct 2, 2015 at 9:35 AM, Sourabh Chandak
wrote:
> I can see the entries processed in the table very fast but after that it
> takes a long time f[...]
> [Are you s]ure it's checkpointing speed?
>
> Have you compared it against checkpointing to hdfs, s3, or local disk?
>
> On Fri, Oct 2, 2015 at 1:17 AM, Sourabh Chandak
> wrote:
>
>> Hi,
>>
>> I have a receiverless Kafka streaming job which was started yesterday
>>
Hi,
I have a receiverless Kafka streaming job which was started yesterday
evening and was running fine till 4 PM today. Suddenly after that, checkpoint
writes slowed down and the job is now not able to catch up with the incoming
data. We are using the DSE stack with Spark 1.2 and Cassandra for
ch[eckpointing ...]
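The checkpoint setup itself is just the standard call, along these lines (a
sketch; sparkConf and the batch interval are assumptions, the cfs:// path is
a placeholder, and pointing it at hdfs:// or file:/// is how we'd run the
comparison suggested above):

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sparkConf, Seconds(10))  // batch interval is an assumption
// Cassandra File System today; swap in hdfs:// or file:/// to compare speed
ssc.checkpoint("cfs://spark/checkpoints")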
> *Sent:* Thursday, October 1, 2015 11:46 PM
> *To:* Sourabh Chandak
> *Cc:* user
> *Subject:* Re: spark.streaming.kafka.maxRatePerPartition for direct stream
>
> That depends on your job, your cluster resources, the number of seconds
> per batch...
>
> You'll need to do s[...]
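For anyone searching the archives, the knob in the subject line is set on the
SparkConf, something like this (a sketch; the app name is made up and the
10000 is a placeholder to be tuned empirically, per Cody's advice above):

import org.apache.spark.SparkConf

val sparkConf = new SparkConf()
  .setAppName("kafka-direct")  // hypothetical app name
  // max messages per second, per Kafka partition, applied to each batch
  .set("spark.streaming.kafka.maxRatePerPartition", "10000")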
Hi,
I am writing a Spark Streaming job using the direct stream method for Kafka
and wanted to handle the case of checkpoint failure, when we'll have to
reprocess the entire data from the start. By default, for every new
checkpoint it tries to load everything from each partition, and that takes a
lot o[f time ...]
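What I am experimenting with is the createDirectStream overload that takes
explicit starting offsets, so recovery can resume from offsets we saved
ourselves instead of reloading everything. A rough sketch (ssc is the
StreamingContext, the broker address is a placeholder, and loadSavedOffsets
is a hypothetical helper):

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
val fromOffsets: Map[TopicAndPartition, Long] = loadSavedOffsets()  // hypothetical

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder,
  StringDecoder, (String, String)](
  ssc, kafkaParams, fromOffsets,
  (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))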
I also have the same use case as Augustus, and have some basic questions
about recovery from checkpoint. I have a 10 node Kafka cluster and a 30
node Spark cluster running a streaming job; how is the (topic, partition)
data handled in checkpointing? The scenario I want to understand is, in
case of no[...]
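For what it's worth, with the direct stream the checkpoint records an offset
range per (topic, partition) for each batch, and those ranges can be
inspected along these lines (a sketch using the HasOffsetRanges interface of
the Kafka direct API; stream is the DStream returned by createDirectStream):

import org.apache.spark.streaming.kafka.HasOffsetRanges

stream.foreachRDD { rdd =>
  // each RDD partition corresponds 1:1 to a Kafka (topic, partition) range
  rdd.asInstanceOf[HasOffsetRanges].offsetRanges.foreach { r =>
    println(s"${r.topic}/${r.partition}: ${r.fromOffset} -> ${r.untilOffset}")
  }
}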
> [... print e]ach
> individual error, instead of only printing the message.
>
> On Thu, Sep 24, 2015 at 5:00 PM, Sourabh Chandak wrote:
>
>> I was able to get past this issue. I was pointing to the SSL port whereas
>> SimpleConsumer should point to the PLAINTEXT port.
ing("Throwing this errir\n")),
ok => ok
)
}
On Thu, Sep 24, 2015 at 3:00 PM, Sourabh Chandak
wrote:
> I was able to get past this issue. I was pointing to the SSL port whereas
> SimpleConsumer should point to the PLAINTEXT port. But after fixing that I
> am getting the follo[wing ...]
> [... put] a profiler on it to see what's taking
> up heap.
>
> On Thu, Sep 24, 2015 at 3:51 PM, Sourabh Chandak
> wrote:
>
>> Adding Cody and Sriharsha
>>
>> On Thu, Sep 24, 2015 at 1:25 PM, Sourabh Chandak
>> wrote:
>>
>>> Hi,
>>>
Adding Cody and Sriharsha
On Thu, Sep 24, 2015 at 1:25 PM, Sourabh Chandak
wrote:
> Hi,
>
> I have ported receiverless Spark Streaming for Kafka to Spark 1.2 and am
> trying to run a Spark Streaming job to consume data from my broker, but I
> am getting the following error:
>
Hi,
I have ported receiverless Spark Streaming for Kafka to Spark 1.2 and am
trying to run a Spark Streaming job to consume data from my broker, but I
am getting the following error:
15/09/24 20:17:45 ERROR BoundedByteBufferReceive: OOME with size 352518400
java.lang.OutOfMemoryError: Java heap space
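One thing we are checking on our side: the OOME is thrown while allocating a
buffer for the fetch response, and ~350 MB suggests a very large consumer
fetch setting, so we are lowering it via kafkaParams. A sketch, assuming that
diagnosis (the broker address and the 8 MB value are placeholders, not
recommendations):

val kafkaParams = Map(
  "metadata.broker.list" -> "broker1:9092",  // placeholder broker
  // buffer allocated per fetch response; keep it well under the heap size
  "fetch.message.max.bytes" -> (8 * 1024 * 1024).toString
)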
Can we use the existing Kafka Spark Streaming jar to connect to a Kafka
server running in SSL mode?
We are fine with a non-SSL consumer, as our Kafka cluster and Spark cluster
are in the same network.
Thanks,
Sourabh
On Fri, Aug 28, 2015 at 12:03 PM, Gwen Shapira wrote:
> I can't speak for the Sp[...]
> [... the dire]ct Kafka approach. That is quite flexible, can
> give exactly-once guarantee without WAL, and is more robust and performant.
> Consider using it.
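For reference, the direct approach TD mentions above is only a couple of
lines to wire up. A minimal sketch, assuming ssc is the StreamingContext and
with placeholder broker and topic names:

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, Map("metadata.broker.list" -> "broker1:9092"), Set("mytopic"))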
>
>
> On Wed, Aug 5, 2015 at 5:48 PM, Sourabh Chandak
> wrote:
>
>> Hi,
>>
>> I am trying to replicate the Kafka Stre
Hi,
I am trying to replicate the Kafka Streaming Receiver for a custom version
of Kafka and want to create a reliable receiver. The current implementation
uses BlockGenerator, which is a private class inside Spark Streaming, hence I
can't use it in my code. Can someone help me with some resources[...]
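The pattern I am looking at in the meantime is the public Receiver API: a
reliable receiver stores a whole batch with the blocking store(...) call and
only acks the source afterwards. A rough sketch against the public API
(fetchBatch and ackSource are made-up placeholders for the custom Kafka
consumer):

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class CustomKafkaReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    new Thread("custom-kafka-receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  def onStop(): Unit = {}  // the receive loop below checks isStopped()

  private def receive(): Unit = {
    while (!isStopped()) {
      val batch = fetchBatch()  // hypothetical: pull records from the custom Kafka
      store(batch)              // blocking store of the whole batch => reliable
      ackSource(batch)          // hypothetical: commit offsets only after store returns
    }
  }

  private def fetchBatch(): ArrayBuffer[String] = ArrayBuffer.empty  // placeholder
  private def ackSource(batch: ArrayBuffer[String]): Unit = ()       // placeholder
}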