Re: can spark take advantage of ordered data?

2017-03-10 Thread sourabh chaki
that project does not utilise the pre-existing partitions in the feed. Any pointer will be helpful. Thanks Sourabh On Thu, Mar 12, 2015 at 6:35 AM, Imran Rashid wrote: > Hi Jonathan, > > you might be interested in https://issues.apache.org/jira/browse/SPARK-3655 (not yet available
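
As a reference point, a minimal sketch of what that JIRA's secondary-sort direction looks like on the public API today, using repartitionAndSortWithinPartitions (Spark 1.2+). The input path, key layout, and partition count are assumptions, and sc is an existing SparkContext:

    import org.apache.spark.HashPartitioner

    // Sketch (names and paths are hypothetical): build a pair RDD from the feed.
    val pairs = sc.textFile("hdfs:///feed")
      .map { line => val fields = line.split('\t'); (fields(0), fields(1)) }

    // repartitionAndSortWithinPartitions sorts records by key while the
    // shuffle runs, instead of as a separate pass afterwards; this is the
    // machinery that SPARK-3655-style secondary sort builds on.
    val sorted = pairs.repartitionAndSortWithinPartitions(new HashPartitioner(64))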

Caching in Spark

2016-01-22 Thread Sourabh Chandak
sometimes lead to re-reading data from cassandra or shuffling a lot of data. Thanks, Sourabh
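
A minimal sketch of the usual mitigation, pinning the Cassandra-backed RDD after the expensive read so later actions reuse it; the DataStax connector call and the keyspace/table names are assumptions:

    import com.datastax.spark.connector._
    import org.apache.spark.storage.StorageLevel

    // Persist once after the expensive read; later actions hit the cache.
    val rows = sc.cassandraTable("ks", "events")
      .persist(StorageLevel.MEMORY_AND_DISK_SER)

    rows.count()   // first action materializes the cache
    rows.take(10)  // reuses cached blocks; no second Cassandra read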

Re: Checkpointing is super slow

2015-10-02 Thread Sourabh Chandak
which saves the actual intermediate RDD data) > > TD > > On Fri, Oct 2, 2015 at 2:56 PM, Sourabh Chandak > wrote: > >> Tried using local checkpointing as well, and even that becomes slow after >> some time. Any idea what can be wrong? >> >> Thanks, >> Sourabh
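
For context, a minimal sketch of the relevant knobs; the directory and intervals are placeholder assumptions, and sc is an existing SparkContext:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(10))
    ssc.checkpoint("hdfs:///checkpoints/app")  // metadata + data checkpoint dir

    // For stateful streams, checkpointing data less often than every batch
    // (5-10x the batch interval is the usual guidance) cuts the per-batch
    // write cost that makes checkpointing look slow:
    // stateStream.checkpoint(Seconds(60))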

Re: Checkpointing is super slow

2015-10-02 Thread Sourabh Chandak
Tried using local checkpointing as well, and even that becomes slow after some time. Any idea what can be wrong? Thanks, Sourabh On Fri, Oct 2, 2015 at 9:35 AM, Sourabh Chandak wrote: > I can see the entries processed in the table very fast but after that it > takes a long time for the checkpoint update

Re: Checkpointing is super slow

2015-10-02 Thread Sourabh Chandak
I can see the entries processed in the table very fast but after that it takes a long time for the checkpoint update. Haven't tried other methods of checkpointing yet; we are using DSE on Azure. Thanks, Sourabh On Fri, Oct 2, 2015 at 6:52 AM, Cody Koeninger wrote: > Why are you s

Checkpointing is super slow

2015-10-01 Thread Sourabh Chandak
checkpointing. Spark streaming is done using backported code. Running nodetool shows that the Read latency of the cfs keyspace is ~8.5 ms. Can someone please help me resolve this? Thanks, Sourabh

Re: spark.streaming.kafka.maxRatePerPartition for direct stream

2015-10-01 Thread Sourabh Chandak
Thanks Cody, will try to do some estimation. Thanks Nicolae, will try out this config. Thanks, Sourabh On Thu, Oct 1, 2015 at 11:01 PM, Nicolae Marasoiu < nicolae.maras...@adswizz.com> wrote: > Hi, > > > Set 10ms and spark.streaming.backpressure.enabled=true > > >
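
A minimal sketch of the two settings discussed in this thread; the rate value is a placeholder:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // hard cap, in messages per second per Kafka partition
      .set("spark.streaming.kafka.maxRatePerPartition", "10000")
      // Spark 1.5+: let the backpressure controller adapt the rate itself
      .set("spark.streaming.backpressure.enabled", "true")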

spark.streaming.kafka.maxRatePerPartition for direct stream

2015-10-01 Thread Sourabh Chandak
10 MB. Thanks, Sourabh

Re: Adding / Removing worker nodes for Spark Streaming

2015-09-28 Thread Sourabh Chandak
node failure how will a new node know the checkpoint of the failed node? The amount of data we have is huge and we can't run from the smallest offset. Thanks, Sourabh On Mon, Sep 28, 2015 at 11:43 AM, Augustus Hong wrote: > Got it, thank you! > > > On Mon, Sep 28, 2015 at 11:37 A
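
One answer in this direction (not from the thread itself): with the direct stream (Spark 1.3+) you can track offsets yourself and resume from them, so a replacement node never needs the failed node's checkpoint. A hedged sketch, where loadOffsets, ssc, and kafkaParams are assumed to exist:

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // loadOffsets() is a hypothetical lookup into your own offset store
    // (Cassandra, ZooKeeper, ...), so restarts resume exactly where the
    // previous run left off rather than from the smallest offset.
    val fromOffsets: Map[TopicAndPartition, Long] = loadOffsets()
    val stream = KafkaUtils.createDirectStream[
        String, String, StringDecoder, StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets,
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))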

Re: ERROR BoundedByteBufferReceive: OOME with size 352518400

2015-09-25 Thread Sourabh Chandak
each > individual error, instead of only printing the message. > > > > > On Thu, Sep 24, 2015 at 5:00 PM, Sourabh Chandak > wrote: > >> I was able to get past this issue. I was pointing the SSL port whereas >> SimpleConsumer should point to the PLAINTEXT
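
A minimal sketch of the fix described above; hosts and ports are placeholders, with 9092 standing in for the PLAINTEXT listener and 9093 for SSL:

    // The direct stream's metadata fetch speaks the plaintext protocol,
    // so metadata.broker.list must point at the PLAINTEXT listener.
    val kafkaParams = Map[String, String](
      "metadata.broker.list" -> "broker1:9092,broker2:9092")  // not 9093/SSL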

Re: ERROR BoundedByteBufferReceive: OOME with size 352518400

2015-09-24 Thread Sourabh Chandak
ing("Throwing this error\n")), ok => ok ) } On Thu, Sep 24, 2015 at 3:00 PM, Sourabh Chandak wrote: > I was able to get past this issue. I was pointing the SSL port whereas > SimpleConsumer should point to the PLAINTEXT port. But after fixing that I > am getting the following

Re: ERROR BoundedByteBufferReceive: OOME with size 352518400

2015-09-24 Thread Sourabh Chandak
a) Thanks, Sourabh On Thu, Sep 24, 2015 at 2:04 PM, Cody Koeninger wrote: > That looks like the OOM is in the driver, when getting partition metadata > to create the direct stream. In that case, executor memory allocation > doesn't matter. > > Allocate more driver memory, or put

Re: ERROR BoundedByteBufferReceive: OOME with size 352518400

2015-09-24 Thread Sourabh Chandak
Adding Cody and Sriharsha On Thu, Sep 24, 2015 at 1:25 PM, Sourabh Chandak wrote: > Hi, > > I have ported receiver-less spark streaming for kafka to Spark 1.2 and am > trying to run a spark streaming job to consume data from my broker, but I > am getting the following error: >

ERROR BoundedByteBufferReceive: OOME with size 352518400

2015-09-24 Thread Sourabh Chandak
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) I have tried allocating 100G of memory with 1 executor but it is still failing. Spark version: 1.2.2; Kafka version ported: 0.8.2; Kafka server version: trunk with SSL enabled. Can someone please help me debug this? Thanks, Sourabh

Re: SSL between Kafka and Spark Streaming API

2015-08-28 Thread Sourabh Chandak
Can we use the existing kafka spark streaming jar to connect to a kafka server running in SSL mode? We are fine with a non-SSL consumer as our kafka cluster and spark cluster are in the same network. Thanks, Sourabh On Fri, Aug 28, 2015 at 12:03 PM, Gwen Shapira wrote: > I can't speak

Re: Reliable Streaming Receiver

2015-08-05 Thread Sourabh Chandak
Thanks Tathagata. I tried that but BlockGenerator internally uses SystemClock which is again private. We are using DSE, so we are stuck with Spark 1.2 and hence can't use the receiver-less version. Is it possible to use the same code as a separate API with 1.2? Thanks, Sourabh On Wed, Aug 5, 2015 at
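
For what it's worth, a minimal sketch of a reliable receiver built only on the public Spark 1.2 Receiver API, without touching BlockGenerator or SystemClock; the source-reading and offset-commit steps are left as hypothetical stubs:

    import scala.collection.mutable.ArrayBuffer
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    // store(ArrayBuffer) blocks until Spark has reliably stored the block,
    // so source offsets can be committed safely only after it returns.
    class BatchReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
      def onStart(): Unit = {
        new Thread("batch-receiver") {
          override def run(): Unit = {
            while (!isStopped()) {
              val batch = new ArrayBuffer[String]()
              // ... fill batch from the source (hypothetical) ...
              store(batch)  // returns only after reliable storage
              // ... commit source offsets here ...
            }
          }
        }.start()
      }
      def onStop(): Unit = {}
    }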

Reliable Streaming Receiver

2015-08-05 Thread Sourabh Chandak
urces to tackle this issue? Thanks, Sourabh

Re: JAVA_HOME problem

2015-04-27 Thread sourabh chaki
in the same cluster working without any problem. Any pointer on why this could happen? Thanks Sourabh On Fri, Apr 24, 2015 at 3:52 PM, sourabh chaki wrote: > Yes Akhil. This is the same issue. I have updated my comment in that > ticket. > > Thanks > Sourabh > > On Fri,

Re: JAVA_HOME problem

2015-04-24 Thread sourabh chaki
Yes Akhil. This is the same issue. I have updated my comment in that ticket. Thanks Sourabh On Fri, Apr 24, 2015 at 12:02 PM, Akhil Das wrote: > Isn't this related to this > https://issues.apache.org/jira/browse/SPARK-6681 > > Thanks > Best Regards > > On Fri, Apr 24,

Re: JAVA_HOME problem

2015-04-23 Thread sourabh chaki
error-with-upgrade-to-spark-1-3-0 Any pointer will be helpful. Thanks Sourabh On Thu, Apr 2, 2015 at 1:23 PM, 董帅阳 <917361...@qq.com> wrote: > spark 1.3.0 > > > spark@pc-zjqdyyn1:~> tail /etc/profile > export JAVA_HOME=/usr/jdk64/jdk1.7.0_45 > export PATH=$PATH:$JAVA_H

Re: train many decision tress with a single spark job

2015-01-13 Thread sourabh chaki
{ (data) => DecisionTree.trainClassifier(toLabeledPoints(data)) } def toLabeledPoints(data: RDD[Double]): RDD[LabeledPoint] = { // convert data RDD to LabeledPoint RDD } For your case, I think you need custom logic to split the dataset. Thanks Sourabh On Tue, Jan 13, 2015 at 3:55 PM, Sean O
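
Filling out that fragment, a hedged end-to-end version against MLlib's trainClassifier signature; the grouping key, tree parameters, and the driver-side collect (each group must fit in driver memory) are assumptions:

    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.tree.DecisionTree
    import org.apache.spark.mllib.tree.model.DecisionTreeModel
    import org.apache.spark.rdd.RDD

    // One tree per group. Each group is pulled to the driver and
    // re-parallelized, because trainClassifier needs an RDD[LabeledPoint]
    // and RDDs cannot be nested inside distributed operations.
    def trainPerGroup(data: RDD[(String, LabeledPoint)]): Map[String, DecisionTreeModel] = {
      val sc = data.sparkContext
      data.groupByKey().collectAsMap().map { case (key, points) =>
        key -> DecisionTree.trainClassifier(
          sc.parallelize(points.toSeq),
          numClasses = 2,
          categoricalFeaturesInfo = Map[Int, Int](),
          impurity = "gini", maxDepth = 5, maxBins = 32)
      }.toMap
    }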

Re: Serialize mllib's MatrixFactorizationModel

2014-12-15 Thread sourabh chaki
the mllib trained model to a different system. Thanks Sourabh On Mon, Dec 15, 2014 at 10:39 PM, Albert Manyà wrote: > > In that case, what is the strategy to train a model in some background > batch process and make recommendations for some other service in real > time? Run both proc
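
A common workaround from this era, sketched under assumptions (paths and output format are placeholders): export the two factor RDDs as plain text so a separate serving process can score user-product pairs with a dot product, independent of MLlib's Java serialization:

    import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

    // userFeatures / productFeatures are RDD[(Int, Array[Double])].
    def exportFactors(model: MatrixFactorizationModel, dir: String): Unit = {
      model.userFeatures
        .map { case (id, f) => s"$id\t${f.mkString(",")}" }
        .saveAsTextFile(s"$dir/userFactors")
      model.productFeatures
        .map { case (id, f) => s"$id\t${f.mkString(",")}" }
        .saveAsTextFile(s"$dir/productFactors")
    }
    // serving side: score(u, p) = dot(userFactors(u), productFactors(p))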

Re: MLLIB model export: PMML vs MLLIB serialization

2014-12-15 Thread sourabh
Thanks Vincenzo. Are you trying out all the models implemented in mllib? Actually I don't see decision tree there. Sorry if I missed it. When are you planning to merge this into the Spark branch? Thanks Sourabh On Sun, Dec 14, 2014 at 5:54 PM, selvinsource [via Apache Spark User List] <

MLLIB model export: PMML vs MLLIB serialization

2014-12-03 Thread sourabh
ystem -> A model serialized using one version of an MLlib entity may not be deserializable using a different version of that entity(?). I think this is a quite common problem. I am really interested to hear from you people how you are solving this and what are the approaches and pros and con
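
For later readers: the selvinsource work discussed here became mllib.pmml.PMMLExportable (in Spark 1.4, as far as I recall), and decision trees were indeed not part of the initial export support. A minimal sketch, with data and paths as placeholder assumptions:

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    // Linear models and k-means gained toPMML; the PMML document is a
    // version-independent interchange format, unlike Java serialization.
    val data = sc.parallelize(Seq(Vectors.dense(1.0, 2.0), Vectors.dense(8.0, 9.0)))
    val model = KMeans.train(data, k = 2, maxIterations = 10)
    model.toPMML("/tmp/kmeans.pmml")  // write to a local path (hypothetical)
    println(model.toPMML())           // or render as an in-memory XML string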