Hi all,
Before I go the route of rolling my own UDAF:
I'm calculating a rolling last-5 mean, so I have the following window
defined:
Window.partitionBy(person).orderBy(timestamp).rowsBetween(-4, Window.currentRow)
Then I calculate the mean over that window.
Within each partition, I'd like the f
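For reference, a minimal sketch of that rolling mean (assuming an existing DataFrame df; the column names person, timestamp, and value are placeholders):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.avg

// window of the current row plus the 4 preceding rows per person, ordered by time
val last5 = Window
  .partitionBy("person")
  .orderBy("timestamp")
  .rowsBetween(-4, Window.currentRow)

// mean of the last 5 values within each partition
val withMean = df.withColumn("last5_mean", avg("value").over(last5))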
Hi all,
My company just now approved for some of us to go to Spark Summit in SF
this year. Unfortunately, the day-long workshops on Monday are sold out
now. We are considering what we might do instead.
Have others done the 1/2 day certification course before? Is it worth
considering? Does it cover
Yong Zhang wrote:
> Can't you just catch that exception and return an empty dataframe?
>
>
> Yong
>
>
> ------
> *From:* Sumona Routh
> *Sent:* Wednesday, July 12, 2017 4:36 PM
> *To:* user
> *Subject:* DataFrameReader read from S3
>
Hi there,
I'm trying to read a list of paths from S3 into a dataframe for a window of
time using the following:
sparkSession.read.parquet(listOfPaths:_*)
In some cases, the path may not be there because there is no data, which is
an acceptable scenario.
However, Spark throws an AnalysisException:
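One workaround, sketched under the assumption that sparkSession is the active SparkSession and listOfPaths is the Seq[String] above, is to filter out the missing prefixes before calling the reader (the suggestion quoted above, catching the AnalysisException, works too):

import org.apache.hadoop.fs.Path

val hadoopConf = sparkSession.sparkContext.hadoopConfiguration
// keep only the prefixes that actually exist so the reader never sees a missing path
val existingPaths = listOfPaths.filter { p =>
  val path = new Path(p)
  path.getFileSystem(hadoopConf).exists(path)
}

val df =
  if (existingPaths.nonEmpty) Some(sparkSession.read.parquet(existingPaths: _*))
  else None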
Hi Morten,
Were you able to resolve your issue with RandomForest? I am having similar
issues with a newly trained model (which does have a larger number of trees
and a smaller minInstancesPerNode, by design, to produce the best-performing
model).
I wanted to get some feedback on how you solved you
Hi Sam,
I would absolutely be interested in reading a blog write-up of how you are
doing this. We have pieced together a relatively decent pipeline ourselves
in Jenkins, but have many kinks to work out. We also have some new
requirements to start running side by side comparisons of different
versi
last line which doesn't compile is what I would want to do (after
outer joining, of course; it's not necessary except in that particular case
where a null could be populated in that field).
Thanks,
Sumona
On Tue, Apr 11, 2017 at 9:50 AM Sumona Routh wrote:
> The sequence you are ref
d1","numeric_field2"))
> .na.fill("", Seq(
> "text_field1","text_field2","text_field3"))
>
>
> Notice that you have to differentiate those fields that are meant to be
> filled with an int from those that require a different value, an empty
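In full, the pattern described above looks roughly like this (the column names are placeholders for whatever numeric and text fields the DataFrame has):

val filled = df
  .na.fill(0, Seq("numeric_field1", "numeric_field2"))            // numeric columns get 0
  .na.fill("", Seq("text_field1", "text_field2", "text_field3"))  // string columns get ""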
Hi there,
I have two dataframes that each have some columns of list type
(arrays generated by the collect_list function, actually).
I need to outer join these two dfs; however, by the nature of an outer join, I am
sometimes left with null values. Normally I would use df.na.fill(...),
however it
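Since df.na.fill does not cover array columns, one workaround (a sketch; "key" and "list_col" are placeholder names and the element type is assumed to be string) is to coalesce each list column with an empty array after the join:

import org.apache.spark.sql.functions.{array, coalesce, col}

// replace nulls produced by the outer join with an empty array of the right type
val joined = left.join(right, Seq("key"), "outer")
  .withColumn("list_col", coalesce(col("list_col"), array().cast("array<string>")))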
> Ayan
>
> On Fri, 13 Jan 2017 at 5:39 am, Sumona Routh wrote:
>
> Hi all,
> I've been working with Spark mllib 2.0.2 RandomForestClassificationModel.
>
> I encountered two frustrating issues and would really appreciate some
> advice:
>
> 1) RandomForestClassif
Hi all,
I've been working with Spark mllib 2.0.2 RandomForestClassificationModel.
I encountered two frustrating issues and would really appreciate some
advice:
1) RandomForestClassificationModel is effectively not serializable (I
assume it's referencing something that can't be serialized, since
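As a possible workaround (a sketch, assuming model is a fitted RandomForestClassificationModel and the path is a placeholder), the ML persistence API can move the model between processes instead of Java serialization:

import org.apache.spark.ml.classification.RandomForestClassificationModel

model.write.overwrite().save("hdfs:///models/rf-example")                        // persist the fitted model
val reloaded = RandomForestClassificationModel.load("hdfs:///models/rf-example") // reload it elsewhere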
Can anyone provide some guidance on how to get files on the classpath for
our Spark job? This used to work in 1.2; however, after upgrading we are
getting nulls when attempting to load resources.
Thanks,
Sumona
On Thu, Jul 21, 2016 at 4:43 PM Sumona Routh wrote:
> Hi all,
> We are runnin
Hi all,
We are running into a classpath issue when we upgrade our application from
1.2 to 1.6.
In 1.2, we load properties from a flat file (from the working directory of the
spark-submit script) using the classloader resource approach. This was executed
up front (by the driver) before any processing happe
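One alternative to the classloader lookup (a sketch; app.properties is a placeholder file name) is to ship the file with spark-submit --files and resolve it through SparkFiles, or to put it back on the classpath explicitly via --driver-class-path and spark.executor.extraClassPath:

import java.io.FileInputStream
import java.util.Properties
import org.apache.spark.SparkFiles

// submitted with: spark-submit --files app.properties ...
val props = new Properties()
props.load(new FileInputStream(SparkFiles.get("app.properties")))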
Hi there,
Our Spark job had an error (specifically, the Cassandra table definition did
not match what was in Cassandra), which threw an exception that was logged
to our spark-submit log.
However, the UI never showed any failed stage or job. It appeared as if the
job finished without error, which is
> and so you still need to set a big java heap for master.
>
>
>
> ------ Original Message ------
> *From:* "Shixiong(Ryan) Zhu";
> *Sent:* Tuesday, March 1, 2016
Hi there,
I've been doing some performance tuning of our Spark application, which is
using Spark 1.2.1 standalone. I have been using the Spark metrics to graph
out details as I run the jobs, as well as the UI to review the tasks and
stages.
I notice that after my application completes, or is near
stener is
> used to monitor the job progress and collect job information, and you should
> not submit jobs there. Why not submit your jobs in the main thread?
>
> On Wed, Feb 17, 2016 at 7:11 AM, Sumona Routh wrote:
>
>> Can anyone provide some insight into the flow of Spar
Can anyone provide some insight into the flow of SparkListeners,
specifically onApplicationEnd? I'm having issues with the SparkContext
being stopped before my final processing can complete.
Thanks!
Sumona
On Mon, Feb 15, 2016 at 8:59 AM Sumona Routh wrote:
> Hi there,
> I a
Hi there,
I am trying to implement a listener that acts as a post-processor that
stores data about what was processed or erred. With this, I use an RDD that
may or may not change during the course of the application.
My thought was to use onApplicationEnd and then a saveToCassandra call to
pers
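Given the advice in the reply further up this page (the listener is for monitoring, not for submitting jobs), a sketch of the alternative ordering is to run the audit/persist step on the main thread and only then stop the context. Everything named below is a placeholder:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("example")) // placeholder conf
try {
  val auditRecords = runMainProcessing(sc)   // hypothetical: the job's main logic
  persistAuditRecords(auditRecords)          // hypothetical: e.g. the saveToCassandra step mentioned above
} finally {
  sc.stop()                                  // stop the context only after persistence completes
}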
Hi there,
I am trying to create a listener for my Spark job to send some additional
notifications on failures, using this Scala API:
https://spark.apache.org/docs/1.2.1/api/scala/#org.apache.spark.scheduler.JobResult
.
My idea was to write something like this:
override def onJobEnd(jobEnd: SparkLis
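A rough sketch of such a listener (the println is a placeholder for whatever notification mechanism is actually used):

import org.apache.spark.scheduler.{JobFailed, SparkListener, SparkListenerJobEnd}

class FailureNotifier extends SparkListener {
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = jobEnd.jobResult match {
    case JobFailed(exception) =>
      // placeholder: plug in the real notification here
      println(s"Job ${jobEnd.jobId} failed: ${exception.getMessage}")
    case _ => // job succeeded, nothing to report
  }
}

// registered once on the context: sc.addSparkListener(new FailureNotifier)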