Custom resolution rules that grow query plans

2022-11-30 Thread Ted Chester Jenks
Hello, I wish to write a custom logical plan rule that modifies the output schema and grows the logical plan. The purpose of the rule is roughly to apply a projection on top of DatasourceV2Relation depending on some condition: case class MyRule extends Rule[LogicalPlan] { override def apply(

Re: welcoming Burak and Holden as committers

2017-01-24 Thread Chester Chen
Congratulations to both. Holden, we need to catch up. Chester Chen ■ Senior Manager – Data Science & Engineering 3000 Clearview Way San Mateo, CA 94402 From: Felix Cheung Date: Tuesday, January 24, 2017 at 1:20 PM To: Reynold Xin , "dev@spark.a

Re: [discuss] DataFrame vs Dataset in Spark 2.0

2016-02-25 Thread Chester Chen
vote for Option 1. 1) Since 2.0 is major API, we are expecting some API changes, 2) It helps long term code base maintenance with short term pain on Java side 3) Not quite sure how large the code base is using Java DataFrame APIs. On Thu, Feb 25, 2016 at 3:23 PM, Reynold Xin wrote: >

Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour

2015-12-21 Thread Chester @work
Jerry I thought you should not create more than one SparkContext within one Jvm, ... Chester Sent from my iPhone > On Dec 20, 2015, at 2:59 PM, Jerry Lam wrote: > > Hi Spark developers, > > I found that SQLContext.getOrCreate(sc: SparkContext) does not behave >

Re: Incremental Analysis with Spark

2015-11-25 Thread chester
For the 2nd use case, can you save the result for the first 29 days, then just compute the last day's result and add it yourself? This can be done outside of Spark. Does that work for you? Sent from my iPad > On Nov 25, 2015, at 9:46 PM, Sachith Withana wrote: > > Hi folks! > > I'm wondering if Sparks

Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-20 Thread Chester Chen
-cluster spark 1.3.1, but could not get spark 1.5.1 started. We upgraded the client to CDH5.4, and then everything worked. There are API changes between Apache 2.4 and 2.6; I am not sure you can mix and match them. Chester On Fri, Nov 20, 2015 at 1:59 PM, Sandy Ryza wrote: > To answer your fourth quest

Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-20 Thread chester
. Company will have enough time to upgrade cluster. +1 for me as well Chester Sent from my iPad > On Nov 19, 2015, at 2:14 PM, Reynold Xin wrote: > > I proposed dropping support for Hadoop 1.x in the Spark 2.0 email, and I > think everybody is for that. > > https://issues.apac

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-06 Thread Chester Chen
+1 Tested against CDH5.4.2 with Hadoop 2.6.0 using yesterday's code, built locally. Regression tests ran in YARN cluster mode against a few internal ML algorithms (logistic regression, linear regression, random forest and statistical summary) as well as MLlib KMeans. All seems to work fine. Chester O

Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-22 Thread Chester Chen
Thanks for the ticket. Chester On Thu, Oct 22, 2015 at 1:15 PM, Steve Loughran wrote: > > On 22 Oct 2015, at 19:32, Chester Chen wrote: > > Steven > You summarized mostly correct. But there is a couple points I want > to emphasize. > > Not eve

Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-22 Thread Chester Chen
's hive-exec and orga.apache.hadoop.hive hive-exec behave differently for the same method. Chester On Thu, Oct 22, 2015 at 10:18 AM, Charmee Patel wrote: > A similar issue occurs when interacting with Hive secured by Sentry. > https://issues.apache.org/jira/browse/SPARK-904

Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-22 Thread chester
back and estimation by doing this. This is a bit off the original topic. I still think there is a bug related to the Spark YARN client in the case of Kerberos + Spark hive-exec dependency. Chester Sent from my iPad > On Oct 22, 2015, at 12:05 AM, Doug Balog wrote: > > >

Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-21 Thread Chester Chen
hadoop cluster). The job submission actually failed on the client side. Currently we get around this by replacing Spark's hive-exec with Apache hive-exec. Chester On Wed, Oct 21, 2015 at 5:27 PM, Doug Balog wrote: > See comments below. > > > On Oct 21, 2015, at 5:33 PM, Ches

Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-21 Thread Chester Chen
found " + e); return } case e: Exception => { logError("Unexpected Exception " + e) throw new RuntimeException("Unexpected exception", e) } } } thanks Chester

Re: Build spark 1.5.1 branch fails

2015-10-17 Thread Chester Chen
, 2015 at 2:44 PM, Ted Yu wrote: > Have you set MAVEN_OPTS with the following ? > -Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m > > Cheers > > On Sat, Oct 17, 2015 at 2:35 PM, Chester Chen > wrote: > >> I was using jdk 1.7 and maven version is the same

Re: Build spark 1.5.1 branch fails

2015-10-17 Thread Chester Chen
skip, with mvn build, it fails with [ERROR] PermGen space -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging I am giving up on this. Just using 1.5.2-SNAPSHOT for now. Che

Build spark 1.5.1 branch fails

2015-10-08 Thread Chester Chen
ncies path: [warn] org.apache.spark:spark-network-common_2.10:1.5.1 ((com.typesafe.sbt.pom.MavenHelper) MavenHelper.scala#L76) [warn] +- org.apache.spark:spark-network-shuffle_2.10:1.5.1 [info] Packaging /Users/chester/projects/alpine/apache/spark/launcher/target/scala-2.10/spark-launcher_2.10

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-09-01 Thread Chester Chen
release would build from > 1.5.0 before moving to 1.5.1. Are you saying the 1.5.0 rc3 could build from > 1.5.1 snapshot during release ? Or 1.5.0 rc3 would build from the last > commit of 1.5.0 (before changing to 1.5.1 snapshot) ? > >>> > >>> > >>> >

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-09-01 Thread chester
s correct for the 1.5 branch, right? this doesn't mean that the >>> next RC would have this value. You choose the release version during >>> the release process. >>> >>>> On Tue, Sep 1, 2015 at 2:40 AM, Chester Chen wrote: >>>> Seems that Githu

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-09-01 Thread chester
On Sep 1, 2015, at 1:52 AM, Sean Owen wrote: > > That's correct for the 1.5 branch, right? this doesn't mean that the > next RC would have this value. You choose the release version during > the release process. > >> On Tue, Sep 1, 2015 at 2:40 AM, Chester Chen

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-08-31 Thread Chester Chen
It seems that GitHub branch-1.5 has already changed the version to 1.5.1-SNAPSHOT. I am a bit confused: are we still on 1.5.0 RC3, or are we on 1.5.1? Chester On Mon, Aug 31, 2015 at 3:52 PM, Reynold Xin wrote: > I'm going to -1 the release myself since the issue @yhuai identified is

Re: High Availability of Spark Driver

2015-08-28 Thread Chester Chen
Ashish and Steve, I am also working on a long-running YARN Spark job and have just started to focus on failure recovery. This thread of discussion is really helpful. Chester On Fri, Aug 28, 2015 at 12:53 AM, Ashish Rawat wrote: > Thanks Steve. I had not spent many brain cycles on analysing

Re: Welcoming some new committers

2015-06-17 Thread Chester Chen
Congratulations to all. DB and Sandy, great work! On Wed, Jun 17, 2015 at 3:12 PM, Matei Zaharia wrote: > Hey all, > > Over the past 1.5 months we added a number of new committers to the > project, and I wanted to welcome them now that all of their respective > forms, accounts, etc are in. J

Re: Change for submitting to yarn in 1.3.1

2015-05-25 Thread Chester Chen
I put the design requirements and description in the commit comment, so I will close the PR. Please refer to the following commit: https://github.com/AlpineNow/spark/commit/5b336bbfe92eabca7f4c20e5d49e51bb3721da4d On Mon, May 25, 2015 at 3:21 PM, Chester Chen wrote: > All, > I have c

Re: Change for submitting to yarn in 1.3.1

2015-05-25 Thread Chester Chen
helps the discussion Chester On Fri, May 22, 2015 at 10:55 AM, Kevin Markey wrote: > Thanks. We'll look at it. > I've sent another reply addressing some of your other comments. > Kevin > > > On 05/22/2015 10:27 AM, Marcelo Vanzin wrote: > > Hi Kevin, > &g

Re: Change for submitting to yarn in 1.3.1

2015-05-14 Thread Chester At Work
uster or error messages directly in the application log. I will put a design doc and the actual code in my pull request later, as Andrew requested. This PR is unlikely to get merged, but it will show the idea I am talking about here. Thanks for listening and responding Che

Re: Change for submitting to yarn in 1.3.1

2015-05-13 Thread Chester @work
. Thanks Chester Sent from my iPhone > On May 13, 2015, at 7:22 PM, Patrick Wendell wrote: > > Hey Chester, > > Thanks for sending this. It's very helpful to have this list. > > The reason we made the Client API private was that it was never > intende

Re: Change for submitting to yarn in 1.3.1

2015-05-13 Thread Chester At Work
nning job with additional spark commands and interactions via this channel. Chester Sent from my iPad On May 12, 2015, at 20:54, Patrick Wendell wrote: > Hey Kevin and Ron, > > So is the main shortcoming of the launcher library the inability to > get an app

Re: Submit & Kill Spark Application program programmatically from another application

2015-05-03 Thread Chester Chen
Sounds like you are in YARN cluster mode. I created a JIRA, SPARK-3913 <https://issues.apache.org/jira/browse/SPARK-3913>, and a PR, https://github.com/apache/spark/pull/2786. Is this what you are looking for? Chester On Sat, May 2, 2015 at 10:32 PM, Yijie Shen wrote: > Hi, > > I

Question regarding some of the changes in [SPARK-3477]

2015-04-14 Thread Chester Chen
these YARN Client-related classes private? Is there any possibility of making these Client classes non-private? Thanks, Chester

Re: broadcast hang out

2015-03-15 Thread Chester Chen
Can you just replace "Duration.Inf" with a shorter duration? How about:
    import scala.concurrent.duration._
    val timeout = new Timeout(10 seconds)
    Await.result(result.future, timeout.duration)
or
    val timeout = new FiniteDuration(10, TimeUnit.SECONDS)
    Await.resu

Re: Using CUDA within Spark / boosting linear algebra

2015-03-13 Thread Chester At Work
Reynold, Prof. Canny gave me the slides yesterday. I will post the link to the slides to both the SF Big Analytics and SF Machine Learning meetups. Chester Sent from my iPad On Mar 12, 2015, at 22:53, Reynold Xin wrote: > Thanks for chiming in, John. I missed your meetup last night -

FYI: Prof John Canny is giving a talk on "Machine Learning at the limit" in SF Big Analytics Meetup

2015-02-10 Thread Chester Chen
Just in case you are in San Francisco, we are having a meetup by Prof John Canny http://www.meetup.com/SF-Big-Analytics/events/220427049/ Chester

Re: Using CUDA within Spark / boosting linear algebra

2015-02-09 Thread Chester @work
Maybe you can ask Prof. John Canny himself :-) as I invited him to give a talk at Alpine Data Labs at March's meetup (a joint SF Big Analytics & SF Machine Learning meetup) on 3/11. To be announced in the next day or so. Chester Sent from my iPhone > On Feb 9, 2015, at 4:48 PM, "

Re: Intro to using IntelliJ to debug SPARK-1.1 Apps with mvn/sbt (for beginners)

2014-11-19 Thread Chester At Work
gen-idea should work. I use it all the time. But use the approach that works for you Sent from my iPad On Nov 18, 2014, at 11:12 PM, "Yiming \(John\) Zhang" wrote: > Hi Chester, thank you for your reply. But I tried this approach and it > failed. It seems that there are

Re: Intro to using IntelliJ to debug SPARK-1.1 Apps with mvn/sbt (for beginners)

2014-11-18 Thread Chester @work
For sbt, you can simply run sbt/sbt gen-idea to generate the IntelliJ IDEA project module for you. You can then just open the generated project, which includes all the needed dependencies. Sent from my iPhone > On Nov 18, 2014, at 8:26 PM, Chen He wrote: > > Thank you Yiming. It is helpful.

Re: Unit testing Master-Worker Message Passing

2014-10-15 Thread Chester Chen
ActorSystem, if it returns with correct identified message; then you can act on it, otherwise, ... hope this helps Chester On Wed, Oct 15, 2014 at 1:38 PM, Matthew Cheah wrote: > What's happening when I do this is that the Worker tries to get the Master > actor by calling context.act

Re: RFC: Deprecating YARN-alpha API's

2014-09-09 Thread Chester Chen
We were using it until recently, we are talking to our customers and see if we can get off it. Chester Alpine Data Labs On Tue, Sep 9, 2014 at 10:59 AM, Sean Owen wrote: > FWIW consensus from Cloudera folk seems to be that there's no need or > demand on this end for YARN alpha.

Re: Running Spark On Yarn without Spark-Submit

2014-08-29 Thread Chester @work
t. So far it does what we want. Hope this helps Chester Sent from my iPhone > On Aug 29, 2014, at 2:36 AM, Archit Thakur wrote: > > including u...@spark.apache.org. > > >> On Fri, Aug 29, 2014 at 2:03 PM, Archit Thakur >> wrote: >> Hi, >> >&

Re: is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-21 Thread Chester Chen
Mridul, Thanks for the suggestion. I just updated the build today and changed the yarn/alpha/pom.xml to 1.1.1-SNAPSHOT then the command worked. I will create a JIRA and PR for it. Chester On Thu, Aug 21, 2014 at 8:03 AM, Chester @work wrote: > Do we have Jenkins te

Re: is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-21 Thread Chester @work
been updated properly in > the 1.1 branch. > > Just change version to '1.1.1-SNAPSHOT' for yarn/alpha/pom.xml (to > make it same as any other pom). > > > Regards, > Mridul > > >> On Thu, Aug 21, 2014 at 5:09 AM, Chester Chen wrote: >> I just upda

Re: is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-20 Thread Chester Chen
Just tried on master branch, and the master branch works fine for yarn-alpha On Wed, Aug 20, 2014 at 4:39 PM, Chester Chen wrote: > I just updated today's build and tried branch-1.1 for both yarn and > yarn-alpha. > > For yarn build, this command seem to work fine. &

is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-20 Thread Chester Chen
ideas Chester ᚛ |branch-1.1|$ *sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha projects* Using /Library/Java/JavaVirtualMachines/1.6.0_51-b11-457.jdk/Contents/Home as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. [info] Loading project definition from /Users/ch

Re: Master compilation with sbt

2014-07-19 Thread Chester Chen
Works for me as well: git branch branch-0.9 branch-1.0 * master Chesters-MacBook-Pro:spark chester$ git pull --rebase remote: Counting objects: 578, done. remote: Compressing objects: 100% (369/369), done. remote: Total 578 (delta 122), reused 418 (delta 71) Receiving objects: 100

Re: Possible bug in ClientBase.scala?

2014-07-17 Thread Chester Chen
citly. In fact > I think you can just call to ClientBase for this? PR it, I say. > > On Thu, Jul 17, 2014 at 3:24 PM, Chester Chen > wrote: > > val knownDefMRAppCP: Seq[String] = > > getFieldValue[String, Seq[String]](classOf[MRJobConfig], > > > &

Re: Possible bug in ClientBase.scala?

2014-07-17 Thread Chester Chen
the compile error, and are you setting yarn.version? the > > default is to use hadoop.version, but that defaults to 1.0.4 and there > > is no such YARN. > > > > Unless I missed it, I only see compile errors in yarn-stable, and you > > are trying to compile vs YARN

Re: Possible bug in ClientBase.scala?

2014-07-16 Thread Chester Chen
ers/chester/projects/spark/ [info]assembly [info]bagel [info]catalyst [info]core [info]examples [info]graphx [info]hive [info]mllib [info]oldDeps [info]repl [info]spark [info]sql [info]streaming [info]streaming-flume [i

Re: Possible bug in ClientBase.scala?

2014-07-16 Thread Chester Chen
Hmm looks like a Build script issue: I run the command with : sbt/sbt clean *yarn/*test:compile but errors came from [error] 40 errors found [error] (*yarn-stable*/compile:compile) Compilation failed Chester On Wed, Jul 16, 2014 at 5:18 PM, Chester Chen wrote: > Hi, Sandy > >

Re: Possible bug in ClientBase.scala?

2014-07-16 Thread Chester Chen
OME. Note, this will be overridden by -java-home if it is set. [info] Loading project definition from /Users/chester/projects/spark/project/project [info] Loading project definition from /Users/chester/.sbt/0.13/staging/ec3aa8f39111944cc5f2/sbt-pom-reader/project [warn] Multiple resolvers havin

Re: Random forest - is it under implementation?

2014-07-11 Thread Chester At Work
Sung Chung from Alpine Data Labs presented the random forest implementation at Spark Summit 2014. The work will be open sourced and contributed back to MLlib. Stay tuned. Sent from my iPad On Jul 11, 2014, at 6:02 AM, Egor Pahomov wrote: > Hi, I have intern, who wants to implement some ML

Re: Application level progress monitoring and communication

2014-06-30 Thread Chester Chen
ctive query jobs. This gives me something to start with. I will try with Akka first and will let the community know once we get somewhere. Thanks, Chester On Sun, Jun 29, 2014 at 11:07 PM, Reynold Xin wrote: > This isn't exactly about Spark itself, more about how an application on >

Application level progress monitoring and communication

2014-06-29 Thread Chester Chen
a approach ? Alternatives ? * Is there a way to get Spark's Akka host and port from Yarn Resource Manager to Yarn Client ? Any suggestions welcome Thanks Chester

Re: spark config params conventions

2014-03-14 Thread Chester Chen
Based on the Typesafe Config maintainer's response, with the latest version of Typesafe Config the double quotes are no longer needed for keys like spark.speculation, so you don't need code to strip the quotes. Chester Alpine Data Labs Sent from my iPhone On Mar 12, 2014, at 2:50 PM, Aaron David

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Chester Chen
with the same build.scala. We have been using this setup for the last 6 months. The build includes different versions of Hadoop as well as Spark. Hope this helps Chester Sent from my iPhone On Feb 25, 2014, at 4:36 PM, Sandy Ryza wrote: > To perhaps restate what some have said, Maven is by