Re: more uniform exception handling?

2016-04-18 Thread Evan Chan
+1000. Especially if the UI can help correlate exceptions, and we can reduce some exceptions. There are some exceptions which are in practice very common, such as the nasty ClassNotFoundException, that most folks end up spending tons of time debugging. On Mon, Apr 18, 2016 at 12:16 PM, Reynold

Re: Using local-cluster mode for testing Spark-related projects

2016-04-17 Thread Evan Chan
y database multiple times. On Sun, Apr 17, 2016 at 9:51 AM, Jon Maurer wrote: > Take a look at spark testing base. > https://github.com/holdenk/spark-testing-base/blob/master/README.md > > On Apr 17, 2016 10:28 AM, "Evan Chan" wrote: >> >> What I want to fi

Re: Using local-cluster mode for testing Spark-related projects

2016-04-17 Thread Evan Chan
er` mode by > yourself like > 'https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ShuffleSuite.scala#L55'? > > // maropu > > On Sun, Apr 17, 2016 at 9:47 AM, Evan Chan wrote: >> >> Hey folks, >> >> I'd like to

Using local-cluster mode for testing Spark-related projects

2016-04-16 Thread Evan Chan
Hey folks, I'd like to use local-cluster mode in my Spark-related projects to test Spark functionality in an automated way in a simulated local cluster. The idea is to test multi-process things in a much easier fashion than setting up a real cluster. However, getting this up and running in a

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-16 Thread Evan Chan
Hi folks, Sorry to join the discussion late. I had a look at the design doc earlier in this thread, and it was not mentioned what types of projects are the targets of this new "spark extras" ASF umbrella. Is the desire to have a maintained set of spark-related projects that keep pace with the

Spark Summit CFP - Tracks guidelines

2015-02-04 Thread Evan Chan
Hey guys, Is there any guidance on what the different tracks for Spark Summit West mean? There are some new ones, like "Third Party Apps", which seems like it would be similar to the "Use Cases". Any further guidance would be great. thanks, Evan ---

Re: SparkSubmit.scala and stderr

2015-02-03 Thread Evan Chan
Why not just use SLF4J? On Tue, Feb 3, 2015 at 2:22 PM, Reynold Xin wrote: > We can use ScalaTest's privateMethodTester also instead of exposing that. > > On Tue, Feb 3, 2015 at 2:18 PM, Marcelo Vanzin wrote: > >> Hi Jay, >> >> On Tue, Feb 3, 2015 at 6:28 AM, jayhutfles wrote: >> > // Expos

Re: Welcoming three new committers

2015-02-03 Thread Evan Chan
Congrats everyone!!! On Tue, Feb 3, 2015 at 3:17 PM, Timothy Chen wrote: > Congrats all! > > Tim > > >> On Feb 4, 2015, at 7:10 AM, Pritish Nawlakhe >> wrote: >> >> Congrats and welcome back!! >> >> >> >> Thank you!! >> >> Regards >> Pritish >> Nirvana International Inc. >> >> Big Data, Hadoop,

Re: renaming SchemaRDD -> DataFrame

2015-02-01 Thread Evan Chan
nar. does spark SQL already use something >> like >> that? Evan mentioned "Spark SQL columnar compression", which sounds like >> it. where can i find that? >> >> thanks >> >> On Thu, Jan 29, 2015 at 2:32 PM, Evan Chan >> wrote: >> >

Re: renaming SchemaRDD -> DataFrame

2015-01-29 Thread Evan Chan
"null". > > See, e.g. http://www.r-bloggers.com/r-na-vs-null/ > > > > On Wed, Jan 28, 2015 at 4:42 PM, Reynold Xin wrote: >> >> Isn't that just "null" in SQL? >> >> On Wed, Jan 28, 2015 at 4:41 PM, Evan Chan >> wrote: >>

Re: renaming SchemaRDD -> DataFrame

2015-01-28 Thread Evan Chan
wrote: > Isn't that just "null" in SQL? > > On Wed, Jan 28, 2015 at 4:41 PM, Evan Chan wrote: >> >> I believe that most DataFrame implementations out there, like Pandas, >> supports the idea of missing values / NA, and some support the idea of >> No

Re: renaming SchemaRDD -> DataFrame

2015-01-28 Thread Evan Chan
sql.types. After 1.3, sql.catalyst is hidden from users, and all public APIs > have first class classes/objects defined in sql directly. > > > > On Wed, Jan 28, 2015 at 4:20 PM, Evan Chan wrote: >> >> Hey guys, >> >> How does this impact the data sources API? I

Re: renaming SchemaRDD -> DataFrame

2015-01-28 Thread Evan Chan
Hey guys, How does this impact the data sources API? I was planning on using this for a project. +1 that many things from spark-sql / DataFrame are universally desirable and useful. By the way, one thing that prevents the columnar compression stuff in Spark SQL from being more useful is, at leas

Re: Multitenancy in Spark - within/across spark context

2014-10-23 Thread Evan Chan
Ashwin, I would say the strategies in general are: 1) Have each user submit separate Spark app (each its own Spark Context), with its own resource settings, and share data through HDFS or something like Tachyon for speed. 2) Share a single spark context amongst multiple users, using fair schedul

Re: will/when Spark/SparkSQL will support ORCFile format

2014-10-08 Thread Evan Chan
James, Michael at the meetup last night said there was some development activity around ORCFiles. I'm curious though, what are the pros and cons of ORCFiles vs Parquet? On Wed, Oct 8, 2014 at 10:03 AM, James Yu wrote: > Didn't see anyone asked the question before, but I was wondering if anyone

Re: [Spark SQL] off-heap columnar store

2014-09-02 Thread Evan Chan
d would be to read data from Cassandra/Vertica/etc. and write back into Parquet, but this would take a long time and incur huge I/O overhead. > > I'm sorry it just sounds like its worth clearly defining what your key > requirement/goal is. > > > On Thu, Aug 28, 2014 at

Re: [Spark SQL] off-heap columnar store

2014-08-28 Thread Evan Chan
> >> The reason I'm asking about the columnar compressed format is that >> there are some problems for which Parquet is not practical. > > > Can you elaborate? Sure. - Organization or co has no Hadoop, but significant investment in some other NoSQL store. - Need to efficiently add a new column to

Re: [Spark SQL] off-heap columnar store

2014-08-26 Thread Evan Chan
What would be the timeline for the parquet caching work? The reason I'm asking about the columnar compressed format is that there are some problems for which Parquet is not practical. On Mon, Aug 25, 2014 at 1:13 PM, Michael Armbrust wrote: >> What is the plan for getting Tachyon/off-heap suppor

[Spark SQL] off-heap columnar store

2014-08-22 Thread Evan Chan
Hey guys, What is the plan for getting Tachyon/off-heap support for the columnar compressed store? It's not in 1.1 is it? In particular: - being able to set TACHYON as the caching mode - loading of hot columns or all columns - write-through of columnar store data to HDFS or backing store - b

Too late to contribute for 1.1.0?

2014-08-21 Thread Evan Chan
I'm hoping to get in some doc enhancements and small bug fixes for Spark SQL. Also possibly a small new API to list the tables in sqlContext. Oh, and to get the doc page I had talked about before, a list of community Spark projects. thanks, Evan -

Spark-JobServer moving to a new location

2014-08-21 Thread Evan Chan
Dear community, Wow, I remember when we first open sourced the job server, at the first Spark Summit in December. Since then, more and more of you have started using it and contributing to it. It is awesome to see! If you are not familiar with the spark job server, it is a REST API for managin

Re: Apache Spark and Graphx for Real Time Analytics

2014-04-08 Thread Evan Chan
> > My typical use case is a large scale distributed graph traversal in real > > time, with billions of nodes. > > > > Thanks, > > Love. > > > > > > > > -- > > View this message in context: > > > http://apache-spark-developers-list.1001

Would anyone mind having a quick look at PR#288?

2014-04-02 Thread Evan Chan
https://github.com/apache/spark/pull/288 It's for fixing SPARK-1154, which would help Spark be a better citizen for most deploys, and should be really small and easy to review. thanks, Evan -- -- Evan Chan Staff Engineer e...@ooyala.com | <http://www.ooyala.com/> <http://ww

Re: sbt-package-bin

2014-04-02 Thread Evan Chan
> > a single lib/ folder, so in some ways it's even easier to manage than the > > assembly. > > > > You might also check out the > sbt-native-packager<https://github.com/sbt/sbt-native-packager>. > > > Cheers, > Lee > -- -- Evan Chan Staff Engineer e

Re: sbt-package-bin

2014-04-01 Thread Evan Chan
can already be created from the Maven build: mvn > >> -Pdeb ... > >> > >> > >> On Tue, Apr 1, 2014 at 11:24 AM, Evan Chan wrote: > >> > >> > Also, I understand this is the last week / merge window for 1.0, so if > >>

Re: sbt-package-bin

2014-04-01 Thread Evan Chan
Also, I understand this is the last week / merge window for 1.0, so if folks are interested I'd like to get in a PR quickly. thanks, Evan On Tue, Apr 1, 2014 at 11:24 AM, Evan Chan wrote: > Hey folks, > > We are in the middle of creating a Chef recipe for Spark. As part of

sbt-package-bin

2014-04-01 Thread Evan Chan
/ folder, so in some ways it's even easier to manage than the assembly. Also I'm not sure if there's an equivalent plugin for Maven. thanks, Evan -- -- Evan Chan Staff Engineer e...@ooyala.com | <http://www.ooyala.com/> <http://www.facebook.com/ooyala><http://www.lin

Re: [DISCUSS] Shepherding PRs

2014-03-27 Thread Evan Chan
done by a single voice, preventing contradicting comments > > etc... Knowing that other projects actually demand the patch-submitter to > ask > > for shepherding, I figured why not doing the same. > > > > For that ExternalContainerizer baby, I would kindly like to call ou

Re: new Catalyst/SQL component merged into master

2014-03-25 Thread Evan Chan
anning code is not > considered a public API and so is likely to change quite a bit as we improve > the optimizer. Its not currently something that we plan to expose for > external components to modify. > > Michael > > > On Sun, Mar 23, 2014 at 11:49 PM, Evan Chan wrote: >

Re: Spark 0.9.1 release

2014-03-25 Thread Evan Chan
it out. We have backported several bug fixes into the 0.9 and updated JIRA >>> >>> accordingly<https://spark-project.atlassian.net/browse/SPARK-1275?jql=project%20in%20(SPARK%2C%20BLINKDB%2C%20MLI%2C%20MLLIB%2C%20SHARK%2C%20STREAMING%2C%20GRAPH%2C%20TACHYON)%20AND%20fixVersion%20%3D%200.9.1%20AND%20status%20in%20(Resolved%2C%20Closed)>. >>> >>> Please let me know if there are fixes that were not backported but you >>> would like to see them in 0.9.1. >>> >>> Thanks! >>> >>> TD >>> >> -- -- Evan Chan Staff Engineer e...@ooyala.com |

Re: spark jobserver

2014-03-24 Thread Evan Chan
Suhas, You're welcome. We are planning to speak about the job server at the Spark Summit by the way. -Evan On Mon, Mar 24, 2014 at 9:38 AM, Suhas Satish wrote: > Thanks a lot for this update Evan , really appreciate the effort. > > On Monday, March 24, 2014, Evan Chan wro

Re: Spark 0.9.1 release

2014-03-24 Thread Evan Chan
Modifying* Spark's dependency graph... >> -- -- Evan Chan Staff Engineer e...@ooyala.com |

Re: spark jobserver

2014-03-24 Thread Evan Chan
spark-contrib. On Sat, Mar 22, 2014 at 6:15 PM, Suhas Satish wrote: > Any plans of integrating SPARK-818 into spark trunk ? The pull request is > open. > It offers spark as a service with spark jobserver running as a separate > process. > > > Thanks, > Suhas. -- -- Ev

Re: Spark 0.9.1 release

2014-03-24 Thread Evan Chan
elease! You can submit the PR and we can merge > it branch-0.9. If we have to cut another release, then we can include it. > > > > On Sun, Mar 23, 2014 at 11:42 PM, Evan Chan wrote: > >> I also have a really minor fix for SPARK-1057 (upgrading fastutil), >> could that a

Re: new Catalyst/SQL component merged into master

2014-03-23 Thread Evan Chan
ver (and years of testing). Once SparkSQL graduates from Alpha > status, it'll likely become the new backend for Shark. -- -- Evan Chan Staff Engineer e...@ooyala.com |

Re: Spark 0.9.1 release

2014-03-23 Thread Evan Chan
rtant >> > bug >> > > fixes and we would like to make a bug-fix release of Spark 0.9.1. We >> are >> > > going to cut a release candidate soon and we would love it if people >> test >> > > it out. We have backported several bug fixes into the 0.9 and updated >> > JIRA >> > > accordingly< >> > > >> > >> https://spark-project.atlassian.net/browse/SPARK-1275?jql=project%20in%20(SPARK%2C%20BLINKDB%2C%20MLI%2C%20MLLIB%2C%20SHARK%2C%20STREAMING%2C%20GRAPH%2C%20TACHYON)%20AND%20fixVersion%20%3D%200.9.1%20AND%20status%20in%20(Resolved%2C%20Closed) >> > > >. >> > > Please let me know if there are fixes that were not backported but you >> > > would like to see them in 0.9.1. >> > > >> > > Thanks! >> > > >> > > TD >> > > >> > >> -- -- Evan Chan Staff Engineer e...@ooyala.com |

Re: Announcing the official Spark Job Server repo

2014-03-23 Thread Evan Chan
> > For sure, we'll try to share it when we'll reach this point to deploy using > marathon (should be planned for April) > > greetz and again, Nice Work Evan! > > Ndi > > On Wed, Mar 19, 2014 at 7:27 AM, Evan Chan wrote: > >> Andy, >> >> Yeah, w

Re: repositories for spark jars

2014-03-19 Thread Evan Chan
o rebuild and deploy >> spark manually. >> >> -- >> Nathan Kronenfeld >> Senior Visualization Developer >> Oculus Info Inc >> 2 Berkeley Street, Suite 600, >> Toronto, Ontario M5A 4J5 >> Phone: +1-416-203-3003 x 238 >> Email: nkronenf...@oculusinfo.com -- -- Evan Chan Staff Engineer e...@ooyala.com |

Re: Announcing the official Spark Job Server repo

2014-03-19 Thread Evan Chan
> repo set-up for the 1.0 release. >> >> On Tue, Mar 18, 2014 at 11:28 PM, Evan Chan wrote: >> > Matei, >> > >> > Maybe it's time to explore the spark-contrib idea again? Should I >> > start a JIRA ticket? >> > >> > -Evan >

Re: Announcing the official Spark Job Server repo

2014-03-18 Thread Evan Chan
Powered+By+Spark. > > Matei > > On Mar 18, 2014, at 1:51 PM, Evan Chan wrote: > >> Dear Spark developers, >> >> Ooyala is happy to announce that we have pushed our official, Spark >> 0.9.0 / Scala 2.10-compatible, job server as a github repo: >> >> https

Re: Announcing the official Spark Job Server repo

2014-03-18 Thread Evan Chan
ews, Evan + Ooyala team: Great Job again. > > andy > > On Tue, Mar 18, 2014 at 11:39 PM, Henry Saputra > wrote: > >> W00t! >> >> Thanks for releasing this, Evan. >> >> - Henry >> >> On Tue, Mar 18, 2014 at 1:51 PM, Evan Chan wrote: >>

Announcing the official Spark Job Server repo

2014-03-18 Thread Evan Chan
now closed. Please have a look; pull requests are very welcome. -- -- Evan Chan Staff Engineer e...@ooyala.com |

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-03-14 Thread Evan Chan
> > Thanks! > > > > -- > View this message in context: > http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-tp2315p5682.html > Sent from the Apache Spark Developers List mailing list archive at Nabble.com. -- -- Evan Chan Staff Engineer e...@ooyala.com |

Re: spark config params conventions

2014-03-12 Thread Evan Chan
> more values inside then it cannot be also a value itself, i think. so this >>> would work fine: >>> spark.speculation.enabled=true >>> spark.speculation.interval=0.5 >>> >>> just a heads up. i would probably suggest we avoid this situation. >>> >> >> -- -- Evan Chan Staff Engineer e...@ooyala.com |
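
The naming concern quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name nest_properties is hypothetical, and the sketch does not describe how Spark itself reads its configuration. It shows what happens when dotted keys are folded into a tree-shaped config, where a key that holds a value cannot also hold nested keys. The key spark.speculation used as a plain value below is assumed for illustration; only the .enabled and .interval keys appear in the quoted message.

```python
def nest_properties(props):
    """Fold dotted keys such as 'a.b.c' into nested dicts.

    Raises ValueError when a key is used both as a plain value and as a
    prefix of another key, which a tree-shaped config cannot represent.
    (Hypothetical helper for illustration only.)
    """
    root = {}
    for key, value in props.items():
        node = root
        parts = key.split(".")
        # Walk (or create) the intermediate levels of the tree.
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"'{part}' in '{key}' is already a plain value")
            node = child
        leaf = parts[-1]
        # The final segment must not already be a parent of other keys.
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"'{key}' already has nested keys")
        node[leaf] = value
    return root


# The layout suggested in the message nests cleanly.
clean = nest_properties({
    "spark.speculation.enabled": "true",
    "spark.speculation.interval": "0.5",
})

# A key that is both a value and a parent does not (assumed example).
try:
    nest_properties({
        "spark.speculation": "true",
        "spark.speculation.interval": "0.5",
    })
    conflict_detected = False
except ValueError:
    conflict_detected = True
```

With the first layout every key is a leaf, so the properties map onto a tree without ambiguity; the second layout is the situation the message suggests avoiding.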

Spark 0.9.0 and log4j

2014-03-07 Thread Evan Chan
Evan -- -- Evan Chan Staff Engineer e...@ooyala.com |

Re: special case of custom partitioning

2014-03-06 Thread Evan Chan
ame allocation for > second RDD? (all 'a's from rdd2 going to the same machine where 'a's from > first RDD went to). > > Is there a way to achieve this? > > Manoj -- -- Evan Chan Staff Engineer e...@ooyala.com |

New blog post on Spark + Parquet + Scrooge

2014-02-28 Thread Evan Chan
back with a help email. -- -- Evan Chan Staff Engineer e...@ooyala.com |

New JIRA ticket: cleaning up app-* folders

2014-02-28 Thread Evan Chan
cron job to clean up old folders. thanks, -Evan -- -- Evan Chan Staff Engineer e...@ooyala.com |

Re: Spark JIRA

2014-02-28 Thread Evan Chan
.org/jira/browse/SPARK >> >> Best, >> >> -- >> Nan Zhu >> >> >> On Friday, February 28, 2014 at 2:29 PM, Evan Chan wrote: >> >> > Hey guys, >> > >> > There is no plan to move the Spark JIRA from the current >> >

Spark JIRA

2014-02-28 Thread Evan Chan
Hey guys, There is no plan to move the Spark JIRA from the current https://spark-project.atlassian.net/ right? -- -- Evan Chan Staff Engineer e...@ooyala.com |

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Evan Chan
tely > satisfactory SBT build from a Maven build would be quite challenging.) > > > On Wed, Feb 26, 2014 at 11:34 AM, Evan Chan wrote: > >> Mark, >> >> No, I haven't tried this myself yet :-p Also I would expect that >> sbt-pom-reader does not do assemblies at

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Evan Chan
etely. > > It's not completely obvious to me how to proceed with what sbt-pom-reader > produces in order build the assemblies, run the test suites, etc., so I'm > wondering if you have already worked out what that requires? > > > On Wed, Feb 26, 2014 at 9:31 AM, Evan

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Evan Chan
any objections to using sbt or maven ! > Too many exclude versions, pinned versions, etc would just make things > unmanageable in future. > > > Regards, > Mridul > > > > > On Wed, Feb 26, 2014 at 8:56 AM, Evan chan wrote: >> Actually you can control exactly h

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Evan chan
> >> >> I was wondering actually, do you know if it's possible to added shaded >> artifacts to the *spark jar* using this plug-in (e.g. not an uber >> jar)? That's something I could see being really handy in the future. >> >> - Patrick >>

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Evan Chan
ject that would allow this kind of thing? > > -Sandy > > > On Tue, Feb 25, 2014 at 4:23 PM, Evan Chan wrote: > >> Hi Patrick, >> >> If you include shaded dependencies inside of the main Spark jar, such >> that it would have combined classes from all depende

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Evan Chan
park-core_2.10/0.9.0-incubating/spark-core_2.10-0.9.0-incubating.jar > > On Tue, Feb 25, 2014 at 4:04 PM, Evan Chan wrote: >> Patrick -- not sure I understand your request, do you mean >> - somehow creating a shaded jar (eg with maven shader plugin) >> - then including

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Evan Chan
ifacts to the *spark jar* using this plug-in (e.g. not an uber > jar)? That's something I could see being really handy in the future. > > - Patrick > > On Tue, Feb 25, 2014 at 3:39 PM, Evan Chan wrote: >> The problem is that plugins are not equivalent. There is AFAIK no &

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Evan Chan
rk clients. But I do agree to only > keep one if there is a promising way to generate correct configuration from > the other. > > -Shengzhe > > > On Tue, Feb 25, 2014 at 3:20 PM, Evan Chan wrote: > >> The correct way to exclude dependencies in SBT is actually to d

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Evan Chan
other from different transitive dependencies. >> > >> > AFIAK we are only using the shade plug-in to deal with conflict >> > resolution in the assembly jar. These are dealt with in sbt via the >> > sbt assembly plug-in in an identical way. Is there a difference? >> >> I am bringing up the Sharder, because it is an awful hack, which is can't >> be >> used in real controlled deployment. >> >> Cos >> >> > [1] >> https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob;f=bigtop-packages/src/common/spark/do-component-build;h=428540e0f6aa56cd7e78eb1c831aa7fe9496a08f;hb=master >> -- -- Evan Chan Staff Engineer e...@ooyala.com |