Re: Sporadic ClassNotFoundException with Kryo

2017-01-12 Thread Nirmal Fernando
I faced a similar issue and had to do two things: 1. Submit the Kryo jar with spark-submit, and 2. Set spark.executor.userClassPathFirst to true in the Spark conf. On Fri, Nov 18, 2016 at 7:39 PM, chrism wrote: > Regardless of the different ways we have tried deploying a jar together with > Spark, when ru
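A minimal sketch of how those two settings could be applied programmatically (the app name and jar path are placeholders; on the command line the equivalent would be --jars plus --conf spark.executor.userClassPathFirst=true):

```
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class KryoClassPathExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("kryo-classpath-example") // placeholder name
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                // Make executors prefer user-supplied jars over Spark's own classpath.
                .set("spark.executor.userClassPathFirst", "true")
                // Ship the Kryo jar to executors (same effect as --jars on spark-submit).
                .set("spark.jars", "/path/to/kryo-shaded.jar"); // placeholder path

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... run the job that previously hit the ClassNotFoundException ...
        sc.stop();
    }
}
```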

Re: Apply ML to grouped dataframe

2016-08-22 Thread Nirmal Fernando
e.g.: xyzDF.filter(col("x").equalTo(x)) > > It is like splitting a dataframe into multiple dataframes. Currently, we can only > apply simple SQL functions to this GroupedData, like agg, max, etc. > > What we want is to apply one ML algorithm to each group. > > Regards.
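A hedged sketch of the filter-per-group idea from this reply: collect the distinct values of the grouping column, filter the dataframe once per value, and fit a spark.ml estimator on each slice. The dataframe xyzDF, the column "x", the existing "features" vector column, and the KMeans settings are illustrative assumptions, not details from the thread:

```
import static org.apache.spark.sql.functions.col;

import java.util.List;
import org.apache.spark.ml.clustering.KMeans;
import org.apache.spark.ml.clustering.KMeansModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class PerGroupTraining {
    // Train one model per distinct value of the grouping column "x".
    public static void trainPerGroup(Dataset<Row> xyzDF) {
        List<Row> groups = xyzDF.select("x").distinct().collectAsList();
        for (Row group : groups) {
            Dataset<Row> slice = xyzDF.filter(col("x").equalTo(group.get(0)));
            // Any spark.ml estimator could be fitted here; KMeans is just an example.
            KMeansModel model = new KMeans().setK(3).setFeaturesCol("features").fit(slice);
            // ... evaluate or persist the per-group model ...
        }
    }
}
```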

Re: Apply ML to grouped dataframe

2016-08-22 Thread Nirmal Fernando
MLlib to grouped dataframe? > > Regards. > Wenpei. > > Nirmal Fernando ---08/23/2016 10:26:36 AM---You can use Spark MLlib > http://s

Re: Apply ML to grouped dataframe

2016-08-22 Thread Nirmal Fernando
You can use Spark MLlib: http://spark.apache.org/docs/latest/ml-guide.html#announcement-dataframe-based-api-is-primary-api On Tue, Aug 23, 2016 at 7:34 AM, Wen Pei Yu wrote: > Hi > > We have a dataframe and want to group it and apply an ML algorithm or > statistics (say a t-test) to each group. Is there

Re: thought experiment: use spark ML to real time prediction

2015-11-12 Thread Nirmal Fernando
From: "Kothuvatiparambil, Viju" > Date: 11/12/2015 3:09 PM (GMT-05:00) > To: DB Tsai, Sean Owen > Cc: Felix Cheung, Nirmal Fernando <nir...@wso2.com>, Andy Davidson, Adrian Tanase, "user @spark", Xiangrui Meng, hol...@pigscanfly.ca

Re: thought experiment: use spark ML to real time prediction

2015-11-11 Thread Nirmal Fernando
As of now, we are basically serializing the ML model and then deserializing it for prediction in real time. On Wed, Nov 11, 2015 at 4:39 PM, Adrian Tanase wrote: > I don’t think this answers your question but here’s how you would evaluate > the model in real time in a streaming app > > https://data
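A hedged sketch of the serialize-then-deserialize approach described here, using plain Java object serialization; the model type, file path, and feature vector are illustrative assumptions (MLlib models also offer their own save/load methods):

```
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import org.apache.spark.mllib.classification.LogisticRegressionModel;
import org.apache.spark.mllib.linalg.Vectors;

public class ModelSerDe {

    // Persist a trained MLlib model with plain Java serialization.
    static void saveModel(LogisticRegressionModel model, String path) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(path))) {
            out.writeObject(model); // MLlib models are Serializable
        }
    }

    // Later (e.g. inside a serving process), load the model and predict without running a Spark job.
    static double predict(String path, double[] features) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(path))) {
            LogisticRegressionModel model = (LogisticRegressionModel) in.readObject();
            return model.predict(Vectors.dense(features));
        }
    }
}
```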

Re: Applying transformations on a JavaRDD using reflection

2015-09-08 Thread Nirmal Fernando
Any thoughts? On Tue, Sep 8, 2015 at 3:37 PM, Nirmal Fernando wrote: > Hi All, > > I'd like to apply a chain of Spark transformations (map/filter) on a given > JavaRDD<T>. I'll have the set of Spark transformations as Function<T, A>, and > even though I can determine the

Applying transformations on a JavaRDD using reflection

2015-09-08 Thread Nirmal Fernando
Hi All, I'd like to apply a chain of Spark transformations (map/filter) on a given JavaRDD<T>. I'll have the set of Spark transformations as Function<T, A>, and even though I can determine the classes of T and A at runtime, due to type erasure I cannot call JavaRDD's transformations as they expect
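One common workaround for this erasure problem, sketched below as an assumption rather than what was eventually done in the thread: drop to raw types for the chaining loop, keep the runtime Class objects for validation, and confine the unchecked casts to a single helper (names and structure are illustrative):

```
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

public class TransformationChain {

    // Apply an arbitrary chain of map-style Functions whose generic types are
    // only known at runtime. The compiler cannot verify T/A here, so the casts
    // are unchecked; correctness must be validated via the runtime Class objects.
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static JavaRDD<?> applyAll(JavaRDD<?> input, List<Function> transformations) {
        JavaRDD current = input;
        for (Function f : transformations) {
            current = current.map(f); // a filter step would use current.filter(f) analogously
        }
        return current;
    }
}
```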

Re: How to speed up Spark process

2015-07-13 Thread Nirmal Fernando
If you click on +details you can see the code that takes the time. Did you already check it? On Tue, Jul 14, 2015 at 9:56 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote: > Job view. Others are fast, but the first one (repartition) is taking 95% > of the job run time. > > On Mon, Jul 13, 2015 at 9:23 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) >

Re: [MLLib][Kmeans] KMeansModel.computeCost takes lot of time

2015-07-13 Thread Nirmal Fernando
Could limited memory be causing this slowness? On Tue, Jul 14, 2015 at 9:00 AM, Nirmal Fernando wrote: > Thanks Burak. > > Now it takes minutes to repartition; > > [Spark UI "Active Stages (1)" table: Stage Id | Description | Submitted | Duration | Tasks: Succeeded/Total | Input | Output | Shuffle Read | Shuff

Re: [MLLib][Kmeans] KMeansModel.computeCost takes lot of time

2015-07-13 Thread Nirmal Fernando
> something like this (I'm assuming you are using Java): > ``` > JavaRDD<Vector> input = data.repartition(8).cache(); > org.apache.spark.mllib.clustering.KMeans.train(input.rdd(), 3, 20); > ``` > > On Mon, Jul 13, 2015 at 11:10 AM, Nirmal Fernando wrote: > >> I'
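A more complete, hedged version of the quoted suggestion: repartition and cache the feature RDD before training, then reuse the cached RDD for computeCost. The input path, parsing logic, partition count, and k below are illustrative assumptions:

```
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

public class KMeansCostExample {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("kmeans-cost"));

        // Parse a CSV of doubles into feature vectors (path is a placeholder).
        JavaRDD<Vector> features = sc.textFile("hdfs:///path/to/features.csv")
                .map(line -> {
                    String[] parts = line.split(",");
                    double[] values = new double[parts.length];
                    for (int i = 0; i < parts.length; i++) {
                        values[i] = Double.parseDouble(parts[i]);
                    }
                    return Vectors.dense(values);
                });

        // Repartition and cache so train() and computeCost() reuse the same in-memory data.
        JavaRDD<Vector> input = features.repartition(8).cache();

        KMeansModel model = KMeans.train(input.rdd(), 3, 20);
        double cost = model.computeCost(input.rdd());
        System.out.println("WSSSE = " + cost);

        sc.stop();
    }
}
```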

Re: [MLLib][Kmeans] KMeansModel.computeCost takes lot of time

2015-07-13 Thread Nirmal Fernando
ing k=3? What about # of > runs? How many partitions do you have? How many cores does your machine > have? > > Thanks, > Burak > > On Mon, Jul 13, 2015 at 10:57 AM, Nirmal Fernando wrote: > >> Hi Burak, >> >> k = 3 >> dimension = 785 features >> S

Re: [MLLib][Kmeans] KMeansModel.computeCost takes lot of time

2015-07-13 Thread Nirmal Fernando
On Mon, Jul 13, 2015 at 2:53 AM, Nirmal Fernando wrote: > >> Hi, >> >> For a fairly large dataset, 30MB, KMeansModel.computeCost takes a lot of >> time (16+ minutes). >> >> It takes a lot of time at this task; >>

[MLLib][Kmeans] KMeansModel.computeCost takes lot of time

2015-07-13 Thread Nirmal Fernando
Hi, For a fairly large dataset, 30MB, KMeansModel.computeCost takes a lot of time (16+ minutes). It takes a lot of time at this task: org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:33) org.apache.spark.mllib.clustering.KMeansModel.computeCost(KMeansModel.scala:70) Can this be improved?

Spark MLLib 140 - logistic regression with SGD model accuracy is different in local mode and cluster mode

2015-07-02 Thread Nirmal Fernando
Hi All, I'm facing a quite strange case where, after migrating to Spark 1.4.0, I'm seeing Spark MLlib produce different results when run in local mode and in cluster mode. Is there any possibility of that happening? (I feel this is an issue in my environment, but just wanted to get it confirmed.) Thanks.

Re: Run multiple Spark jobs concurrently

2015-07-01 Thread Nirmal Fernando
Thanks Akhil! On Wed, Jul 1, 2015 at 1:08 PM, Akhil Das wrote: > Have a look at https://spark.apache.org/docs/latest/job-scheduling.html > > Thanks > Best Regards > > On Wed, Jul 1, 2015 at 12:01 PM, Nirmal Fernando wrote: > >> Hi All, >> >> Is there an

Run multiple Spark jobs concurrently

2015-06-30 Thread Nirmal Fernando
Hi All, Are there any additional configs that we have to set to perform $subject? -- Thanks & regards, Nirmal Associate Technical Lead - Data Technologies Team, WSO2 Inc. Mobile: +94715779733 Blog: http://nirmalfdo.blogspot.com/
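For jobs inside a single application, the job-scheduling page linked in the reply above comes down to submitting actions from separate threads and, optionally, enabling the FAIR scheduler. A minimal sketch under those assumptions (the RDD contents and thread handling are illustrative):

```
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ConcurrentJobs {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
                .setAppName("concurrent-jobs")
                // Optional: FAIR scheduling shares executors more evenly between concurrent jobs.
                .set("spark.scheduler.mode", "FAIR");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Two actions submitted from two threads run as two concurrent Spark jobs.
        Thread job1 = new Thread(() ->
                System.out.println("sum = " + sc.parallelize(Arrays.asList(1, 2, 3, 4))
                        .reduce(Integer::sum)));
        Thread job2 = new Thread(() ->
                System.out.println("count = " + sc.parallelize(Arrays.asList(5, 6, 7, 8)).count()));

        job1.start();
        job2.start();
        job1.join();
        job2.join();
        sc.stop();
    }
}
```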

Re: path to hdfs

2015-06-08 Thread Nirmal Fernando
The HDFS path should be something like: hdfs://127.0.0.1:8020/user/cloudera/inputs/ On Mon, Jun 8, 2015 at 4:15 PM, Pa Rö wrote: > Hello, > > I submit my spark job with the following parameters: > > ./spark-1.1.0-bin-hadoop2.4/bin/spark-submit \ > --class mgm.tp.bigdata.ma_spark.SparkMain \ > -
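A small sketch of reading from a fully qualified HDFS URI in a Spark job; the namenode host/port and directory simply mirror the example path above and may differ per cluster:

```
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HdfsPathExample {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("hdfs-path-example"));

        // Fully qualified HDFS URI: scheme, namenode host:port, then the absolute path.
        JavaRDD<String> lines = sc.textFile("hdfs://127.0.0.1:8020/user/cloudera/inputs/");

        System.out.println("line count = " + lines.count());
        sc.stop();
    }
}
```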

Re: Is there a way to disable the Spark UI?

2015-02-02 Thread Nirmal Fernando
Thanks Zhan! Was this introduced in Spark 1.2, or is it also available in Spark 1.1? On Tue, Feb 3, 2015 at 11:52 AM, Zhan Zhang wrote: > You can set spark.ui.enabled to false to disable the UI. > > Thanks. > > Zhan Zhang > > On Feb 2, 2015, at 8:06 PM, Nirmal Fernand
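A minimal sketch of applying that setting programmatically, which also keeps the embedded Jetty server for the UI from starting (the app name is a placeholder):

```
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class NoUiExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("no-ui-example")
                // Disables the Spark web UI, so no Jetty server is started for it.
                .set("spark.ui.enabled", "false");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... jobs run as usual, just without the UI ...
        sc.stop();
    }
}
```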

Is there a way to disable the Spark UI?

2015-02-02 Thread Nirmal Fernando
Hi All, Is there a way to disable the Spark UI? What I really need is to stop the startup of the Jetty server. -- Thanks & regards, Nirmal Senior Software Engineer- Platform Technologies Team, WSO2 Inc. Mobile: +94715779733 Blog: http://nirmalfdo.blogspot.com/