Re: welcoming Burak and Holden as committers

2017-01-24 Thread Chester Chen
Congratulations to both. Holden, we need to catch up. Chester Chen ■ Senior Manager – Data Science & Engineering 3000 Clearview Way San Mateo, CA 94402 From: Felix Cheung Date: Tuesday, January 24, 2017 at 1:20 PM To: Reynold Xin , "dev@spark.a

Re: [discuss] DataFrame vs Dataset in Spark 2.0

2016-02-25 Thread Chester Chen
I vote for Option 1. 1) Since 2.0 is a major API release, we are expecting some API changes. 2) It helps long-term code base maintenance, with short-term pain on the Java side. 3) I am not quite sure how large the code base using the Java DataFrame APIs is. On Thu, Feb 25, 2016 at 3:23 PM, Reynold Xin wrote: >

Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-20 Thread Chester Chen
For #1-3, the answer is likely no. We recently upgraded to Spark 1.5.1, with CDH5.3, CDH5.4, HDP2.2 and others. We were using the CDH5.3 client to talk to CDH5.4, to see whether we can support many different Hadoop cluster versions without changing the build. This was ok for yarn-clu

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-06 Thread Chester Chen
+1 Tested against CDH5.4.2 with Hadoop 2.6.0 using yesterday's code, built locally. Regression tests ran in YARN cluster mode against a few internal ML algorithms (logistic regression, linear regression, random forest and statistical summary) as well as MLlib KMeans. All seems to work fine. Chester On Tue, N

Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-22 Thread Chester Chen
Thanks for the ticket. Chester On Thu, Oct 22, 2015 at 1:15 PM, Steve Loughran wrote: > > On 22 Oct 2015, at 19:32, Chester Chen wrote: > > Steven > You summarized mostly correct. But there is a couple points I want > to emphasize. > > Not eve

Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-22 Thread Chester Chen
2 > > By changing how Hive Context instance is created, this issue might also be > resolved. > > On Thu, Oct 22, 2015 at 11:33 AM Steve Loughran > wrote: > >> On 22 Oct 2015, at 08:25, Chester Chen wrote: >> >> Doug >> >>We are not trying to

Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-21 Thread Chester Chen
hadoop cluster). The job submission actually failed on the client side. Currently we get around this by replacing Spark's hive-exec with Apache hive-exec. Chester On Wed, Oct 21, 2015 at 5:27 PM, Doug Balog wrote: > See comments below. > > > On Oct 21, 2015, at 5:33 PM, Ches

Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-21 Thread Chester Chen
All, just checking whether this happens to others as well. This was tested against Spark 1.5.1 (branch 1.5 with label 1.5.2-SNAPSHOT, commit on Tue Oct 6, 84f510c4fa06e43bd35e2dc8e1008d0590cbe266). Spark deployment mode: Spark-Cluster. Notice that if we enable Kerberos mode, the

Re: Build spark 1.5.1 branch fails

2015-10-17 Thread Chester Chen
, 2015 at 2:44 PM, Ted Yu wrote: > Have you set MAVEN_OPTS with the following ? > -Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m > > Cheers > > On Sat, Oct 17, 2015 at 2:35 PM, Chester Chen > wrote: > >> I was using jdk 1.7 and maven version is the same

Re: Build spark 1.5.1 branch fails

2015-10-17 Thread Chester Chen
> Xiao Li > > 2015-10-08 10:35 GMT-07:00 Chester Chen : > >> Question regarding branch-1.5 build. >> >> Noticed that the spark project no longer publish the spark-assembly. We >> have to build ourselves ( until we find way to not depends on assembly >> jar). >

Build spark 1.5.1 branch fails

2015-10-08 Thread Chester Chen
Question regarding the branch-1.5 build. I noticed that the Spark project no longer publishes the spark-assembly, so we have to build it ourselves (until we find a way not to depend on the assembly jar). I checked out the tag v.1.5.1 release version and, using sbt to build it, I get the following error build/s

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-09-01 Thread Chester Chen
release would build from > 1.5.0 before moving to 1.5.1. Are you saying the 1.5.0 rc3 could build from > 1.5.1 snapshot during release ? Or 1.5.0 rc3 would build from the last > commit of 1.5.0 (before changing to 1.5.1 snapshot) ? > >>> > >>> > >>> >

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-08-31 Thread Chester Chen
It seems that GitHub branch-1.5 has already changed the version to 1.5.1-SNAPSHOT. I am a bit confused: are we still on 1.5.0 RC3, or are we on 1.5.1? Chester On Mon, Aug 31, 2015 at 3:52 PM, Reynold Xin wrote: > I'm going to -1 the release myself since the issue @yhuai identified is > pretty serious

Re: High Availability of Spark Driver

2015-08-28 Thread Chester Chen
Ashish and Steve, I am also working on a long-running YARN Spark job and have just started to focus on failure recovery. This thread of discussion is really helpful. Chester On Fri, Aug 28, 2015 at 12:53 AM, Ashish Rawat wrote: > Thanks Steve. I had not spent many brain cycles on analysing the Yarn

Re: Welcoming some new committers

2015-06-17 Thread Chester Chen
Congratulations to all. DB and Sandy, great work! On Wed, Jun 17, 2015 at 3:12 PM, Matei Zaharia wrote: > Hey all, > > Over the past 1.5 months we added a number of new committers to the > project, and I wanted to welcome them now that all of their respective > forms, accounts, etc are in. J

Re: Change for submitting to yarn in 1.3.1

2015-05-25 Thread Chester Chen
I put the design requirements and description in the commit comment, so I will close the PR. Please refer to the following commit: https://github.com/AlpineNow/spark/commit/5b336bbfe92eabca7f4c20e5d49e51bb3721da4d On Mon, May 25, 2015 at 3:21 PM, Chester Chen wrote: > All, > I have c

Re: Change for submitting to yarn in 1.3.1

2015-05-25 Thread Chester Chen
All, I have created a PR just for the purpose of helping document the use case, requirements and design. It is unlikely to get merged, so it is only used to illustrate the problems we are trying to solve and the approaches we took. https://github.com/apache/spark/pull/6398 Hope this helps

Re: Submit & Kill Spark Application program programmatically from another application

2015-05-03 Thread Chester Chen
Sounds like you are in Yarn-Cluster mode. I created JIRA SPARK-3913 and PR https://github.com/apache/spark/pull/2786 Is this what you are looking for? Chester On Sat, May 2, 2015 at 10:32 PM, Yijie Shen wrote: > Hi, > > I’ve posted this pro

Question regarding some of the changes in [SPARK-3477]

2015-04-14 Thread Chester Chen
While working on upgrading to Spark 1.3.x, I noticed that the Client and ClientArgument classes in the yarn module are now defined as private[spark]. I know that this code is mostly used by the spark-submit code, but we call the Yarn client directly (without going through spark-submit) in our spark integration

Re: broadcast hang out

2015-03-15 Thread Chester Chen
Can you just replace "Duration.Inf" with a shorter duration? How about import scala.concurrent.duration._ val timeout = new Timeout(10 seconds) Await.result(result.future, timeout.duration) or val timeout = new FiniteDuration(10, TimeUnit.SECONDS) Await.resu
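
The suggestion above (bounding the wait instead of using Duration.Inf) can be sketched with the Scala standard library alone. This is a minimal illustration, not the code from the thread: the object name BoundedAwait and the helper awaitWithin are hypothetical, and it uses a plain FiniteDuration rather than the Timeout wrapper mentioned in the message.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical helper, for illustration only.
object BoundedAwait {
  // Block for at most `limit`. If the future has not completed by then,
  // Await.result throws a TimeoutException, which Try captures as a Failure
  // instead of letting the caller hang indefinitely.
  def awaitWithin[T](future: Future[T], limit: FiniteDuration): Try[T] =
    Try(Await.result(future, limit))
}
```

With this helper, a call such as BoundedAwait.awaitWithin(result.future, 10.seconds) returns a Failure after ten seconds if the broadcast never completes, so the caller can log or retry rather than block forever.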

FYI: Prof John Canny is giving a talk on "Machine Learning at the limit" in SF Big Analytics Meetup

2015-02-10 Thread Chester Chen
Just in case you are in San Francisco, we are having a meetup by Prof John Canny http://www.meetup.com/SF-Big-Analytics/events/220427049/ Chester

Re: Unit testing Master-Worker Message Passing

2014-10-15 Thread Chester Chen
You can call ActorSelection.resolveOne() to check whether the actor is still there and the path is correct. The method returns a future, and you can wait for it with a timeout. This way, you know whether the actor is alive, already dead, or the path is incorrect. Another way is to send an Identify message to Acto

Re: RFC: Deprecating YARN-alpha API's

2014-09-09 Thread Chester Chen
We were using it until recently; we are talking to our customers to see if we can get off it. Chester Alpine Data Labs On Tue, Sep 9, 2014 at 10:59 AM, Sean Owen wrote: > FWIW consensus from Cloudera folk seems to be that there's no need or > demand on this end for YARN alpha. It wouldn't ha

Re: is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-21 Thread Chester Chen
l has not been updated properly in > > the 1.1 branch. > > > > Just change version to '1.1.1-SNAPSHOT' for yarn/alpha/pom.xml (to > > make it same as any other pom). > > > > > > Regards, > > Mridul > > > > > >> On Thu,

Re: is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-20 Thread Chester Chen
Just tried on master branch, and the master branch works fine for yarn-alpha On Wed, Aug 20, 2014 at 4:39 PM, Chester Chen wrote: > I just updated today's build and tried branch-1.1 for both yarn and > yarn-alpha. > > For yarn build, this command seem to work fine. &

is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-20 Thread Chester Chen
I just updated today's build and tried branch-1.1 for both yarn and yarn-alpha. For the yarn build, this command seems to work fine: sbt/sbt -Pyarn -Dhadoop.version=2.3.0-cdh5.0.1 projects For yarn-alpha: sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha projects I got the following Any ideas Che

Re: Master compilation with sbt

2014-07-19 Thread Chester Chen
Works for me as well: git branch branch-0.9 branch-1.0 * master Chesters-MacBook-Pro:spark chester$ git pull --rebase remote: Counting objects: 578, done. remote: Compressing objects: 100% (369/369), done. remote: Total 578 (delta 122), reused 418 (delta 71) Receiving objects: 100% (5

Re: Possible bug in ClientBase.scala?

2014-07-17 Thread Chester Chen
citly. In fact > I think you can just call to ClientBase for this? PR it, I say. > > On Thu, Jul 17, 2014 at 3:24 PM, Chester Chen > wrote: > > val knownDefMRAppCP: Seq[String] = > > getFieldValue[String, Seq[String]](classOf[MRJobConfig], > > > &

Re: Possible bug in ClientBase.scala?

2014-07-17 Thread Chester Chen
the compile error, and are you setting yarn.version? the > > default is to use hadoop.version, but that defaults to 1.0.4 and there > > is no such YARN. > > > > Unless I missed it, I only see compile errors in yarn-stable, and you > > are trying to compile vs YARN

Re: Possible bug in ClientBase.scala?

2014-07-16 Thread Chester Chen
nfo]streaming-kafka [info]streaming-mqtt [info]streaming-twitter [info]streaming-zeromq [info]tools [info]yarn [info] * yarn-stable On Wed, Jul 16, 2014 at 5:41 PM, Chester Chen wrote: > Hmm > looks like a Build script issue: > > I run the command with : > >

Re: Possible bug in ClientBase.scala?

2014-07-16 Thread Chester Chen
Hmm, looks like a build script issue: I ran the command sbt/sbt clean *yarn/*test:compile but the errors came from [error] 40 errors found [error] (*yarn-stable*/compile:compile) Compilation failed Chester On Wed, Jul 16, 2014 at 5:18 PM, Chester Chen wrote: > Hi, Sandy > >

Re: Possible bug in ClientBase.scala?

2014-07-16 Thread Chester Chen
19 PM, Sandy Ryza wrote: > Hi Ron, > > I just checked and this bug is fixed in recent releases of Spark. > > -Sandy > > > On Sun, Jul 13, 2014 at 8:15 PM, Chester Chen > wrote: > >> Ron, >> Which distribution and Version of Hadoop are you using ? &g

Re: Application level progress monitoring and communication

2014-06-30 Thread Chester Chen
progress. You can do it with a lot of different ways, > such as Akka, custom REST API, Thrift ... I think any of them will do. > > > > > On Sun, Jun 29, 2014 at 7:57 PM, Chester Chen > wrote: > > > Hi Spark dev community: > > > > I have several questions r

Application level progress monitoring and communication

2014-06-29 Thread Chester Chen
Hi Spark dev community: I have several questions regarding Application and Spark communication. 1) Application Level Progress Monitoring. Currently, our application runs Spark jobs in YARN_CLUSTER mode. This works well so far, but we would like to monitor the application level progres

Re: spark config params conventions

2014-03-14 Thread Chester Chen
Based on the Typesafe Config maintainer's response, with the latest version of Typesafe Config the double quotes are no longer needed for keys like spark.speculation, so you don't need code to strip the quotes. Chester Alpine data labs Sent from my iPhone On Mar 12, 2014, at 2:50 PM, Aaron Davidson wrote:

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Chester Chen
@Sandy Yes, in sbt with a multi-project setup, you can easily set a variable in the build.scala and reference the version number from all dependent projects. Regarding a mix of Java and Scala projects: in my workplace, we have both Java and Scala code. sbt can be used to build both with
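
As a rough sketch of the shared-version idea described above: the message refers to a build.scala file, while the example below assumes the build.sbt style of a more recent sbt. The project names, directories, organization, and version string are all made up for illustration.

```scala
// build.sbt (hypothetical multi-project layout; every name here is illustrative)

// The version number is defined once and referenced by all sub-projects.
val projectVersion = "0.1.0-SNAPSHOT"

// Settings shared by every sub-project, including the common version.
lazy val commonSettings = Seq(
  organization := "com.example",
  version := projectVersion
)

// Each sub-project may hold Scala sources (src/main/scala),
// Java sources (src/main/java), or both.
lazy val core = (project in file("core"))
  .settings(commonSettings: _*)

lazy val app = (project in file("app"))
  .dependsOn(core)
  .settings(commonSettings: _*)

// The root project aggregates the others so one command builds everything.
lazy val root = (project in file("."))
  .aggregate(core, app)
  .settings(commonSettings: _*)
```

Changing projectVersion in one place then updates the version of every sub-project, which is the property the message attributes to the sbt setup.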