Re: [SparkScore]Performance portal for Apache Spark - WW26

2015-06-26 Thread Nan Zhu
, what happened to k-means in HiBench? Best, -- Nan Zhu http://codingcat.me On Friday, June 26, 2015 at 7:24 AM, Huang, Jie wrote: > Intel® Xeon® CPU E5-2697

Re: [SparkScore]Performance portal for Apache Spark - WW26

2015-06-26 Thread Nan Zhu
Thank you, Jie! Very nice work! -- Nan Zhu http://codingcat.me On Friday, June 26, 2015 at 8:17 AM, Huang, Jie wrote: > Correct. Your calculation is right! > > We have been aware of that kmeans performance drop also. According to our > observation, it is caused by som

Re: Failing MiMa tests

2016-03-14 Thread Nan Zhu
I guess it’s Jenkins’ problem? My PR failed the MiMa check but still got a message from SparkQA (https://github.com/SparkQA) saying that "This patch passes all tests." I checked Jenkins’ history; there are other PRs with the same issue…. Best, -- Nan Zhu http://codingcat.me

Release Announcement: XGBoost4J - Portable Distributed XGBoost in Spark, Flink and Dataflow

2016-03-15 Thread Nan Zhu
! For more details of distributed XGBoost, you can refer to the recently published paper: http://arxiv.org/abs/1603.02754 Best, -- Nan Zhu http://codingcat.me

[Package Release] Widely accepted XGBoost now available in Spark

2016-03-16 Thread Nan Zhu
are more than welcome to join us and contribute to the project! For more details of distributed XGBoost, you can refer to the recently published paper: http://arxiv.org/abs/1603.02754 Best, -- Nan Zhu http://codingcat.me

Re: Paper on Spark SQL

2015-08-17 Thread Nan Zhu
an extra “,” is at the end -- Nan Zhu http://codingcat.me On Monday, August 17, 2015 at 9:28 AM, Ted Yu wrote: > I got 404 when trying to access the link. > > > > On Aug 17, 2015, at 5:31 AM, Todd mailto:bit1...@163.com)> > wrote: > > > Hi

Re: Azure Event Hub with Pyspark

2017-04-20 Thread Nan Zhu
DocDB does have a Java client, doesn’t it? Anything preventing you from using that? Get Outlook for iOS From: ayan guha Sent: Thursday, April 20, 2017 9:24:03 PM To: Ashish Singh Cc: user Subject: Re: Azure Event Hub with Pyspark Hi yes, its only scala. I am

--jars does not take remote jar?

2017-05-02 Thread Nan Zhu
Hi, all For some reason, I tried to pass in a HDFS path to the --jars option in spark-submit According to the document, http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management, --jars would accept remote path However, in the implementation, https://github.

Re: --jars does not take remote jar?

2017-05-02 Thread Nan Zhu
resolving some permission issues, etc.?) On Tue, May 2, 2017 at 9:00 AM, Marcelo Vanzin wrote: > Remote jars are added to executors' classpaths, but not the driver's. > In YARN cluster mode, they would also be added to the driver's class > path. > > On Tue, May 2, 2

Re: --jars does not take remote jar?

2017-05-02 Thread Nan Zhu
I see. Thanks! On Tue, May 2, 2017 at 9:12 AM, Marcelo Vanzin wrote: > On Tue, May 2, 2017 at 9:07 AM, Nan Zhu wrote: > > I have no easy way to pass jar path to those forked Spark > > applications? (except that I download jar from a remote path to a local > temp > >
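To summarize the thread for client mode: a remote `--jars` URI reaches the executors’ classpaths but not the driver’s, so the usual workaround is to localize the jar first. The paths and class name below are placeholders, not from the thread:

```shell
# Placeholder paths: copy the dependency out of HDFS so the driver's
# classpath can see it, then hand the local copy to spark-submit
hdfs dfs -get hdfs:///libs/dep.jar /tmp/dep.jar
spark-submit --jars /tmp/dep.jar --class example.Main app.jar
```

In YARN cluster mode, per the reply above, the remote jar would also land on the driver’s classpath, so this localization step should not be needed there.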

Palantir release under org.apache.spark?

2018-01-09 Thread Nan Zhu
Hi, all Out of curiosity, I just found a bunch of Palantir releases under org.apache.spark in Maven Central ( https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11)? Is it on purpose? Best, Nan

Re: Palantir release under org.apache.spark?

2018-01-09 Thread Nan Zhu
nvm On Tue, Jan 9, 2018 at 9:42 AM, Nan Zhu wrote: > Hi, all > > Out of curiosity, I just found a bunch of Palantir releases under > org.apache.spark in Maven Central (https://mvnrepository.com/ > artifact/org.apache.spark/spark-core_2.11)? > > Is it on purpose? > > Best, > > Nan > > >

broken UI in 2.3?

2018-03-05 Thread Nan Zhu
Hi, all I am experiencing some issues in UI when using 2.3 when I clicked executor/storage tab, I got the following exception java.lang.NullPointerException at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388) at org.glassfish.jersey.servlet.ServletContainer.servic

Re: How to use more executors

2015-01-21 Thread Nan Zhu
…not sure when will it be reviewed… but for now you can work around by allowing multiple worker instances on a single machine http://spark.apache.org/docs/latest/spark-standalone.html search SPARK_WORKER_INSTANCES Best, -- Nan Zhu http://codingcat.me On Wednesday, January 21, 2015 at
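For reference, the standalone-mode workaround described above goes in `conf/spark-env.sh`; the values here are illustrative, not recommendations:

```shell
# conf/spark-env.sh -- illustrative values, tune to your hardware
# Run two worker JVMs per machine instead of one
export SPARK_WORKER_INSTANCES=2
# Cap each worker so the instances together do not oversubscribe the box
export SPARK_WORKER_CORES=4
export SPARK_WORKER_MEMORY=8g
```

Each worker instance then launches its own executors, which is what gives the extra executors per machine.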

Re: multiple sparkcontexts and streamingcontexts

2015-03-02 Thread Nan Zhu
most of cases, that’s one of the existing actor receivers) The limitation might be that, all receivers are on the same machine... Here is a PR trying to expose the APIs to the user: https://github.com/apache/spark/pull/3984 Best, -- Nan Zhu http://codingcat.me On Monday, March 2, 2015

Re: No overwrite flag for saveAsXXFile

2015-03-06 Thread Nan Zhu
[Boolean] = new DynamicVariable[Boolean](false) I’m not sure if there is enough amount of benefits to make it worth exposing this variable to the user… Best, -- Nan Zhu http://codingcat.me On Friday, March 6, 2015 at 10:22 AM, Ted Yu wrote: > Found this thread: > http://search-hadoop

Re: How to use more executors

2015-03-11 Thread Nan Zhu
at least 1.4 I think now using YARN or allowing multiple worker instances are just fine Best, -- Nan Zhu http://codingcat.me On Wednesday, March 11, 2015 at 8:42 PM, Du Li wrote: > Is it being merged in the next release? It's indeed a critical patch! > > Du > &

Re: How to use more executors

2015-03-11 Thread Nan Zhu
I think this should go to another PR can you create a JIRA on that? Best, -- Nan Zhu http://codingcat.me On Wednesday, March 11, 2015 at 8:50 PM, Du Li wrote: > Is it possible to extend this PR further (or create another PR) to allow for > per-node configuration of w

Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

2015-03-31 Thread Nan Zhu
The example in https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala might help Best, -- Nan Zhu http://codingcat.me On Tuesday, March 31, 2015 at 3:56 PM, Sean Owen wrote: > Yep, it's not serializable: > https://hbase

Re: What happened to the Row class in 1.3.0?

2015-04-06 Thread Nan Zhu
Row class was not documented mistakenly in 1.3.0 you can check the 1.3.1 API doc http://people.apache.org/~pwendell/spark-1.3.1-rc1-docs/api/scala/index.html#org.apache.spark.sql.Row Best, -- Nan Zhu http://codingcat.me On Monday, April 6, 2015 at 10:23 AM, ARose wrote: > I am trying

Re: What happened to the Row class in 1.3.0?

2015-04-06 Thread Nan Zhu
Hi, Ted It’s here: https://github.com/apache/spark/blob/61b427d4b1c4934bd70ed4da844b64f0e9a377aa/sql/catalyst/src/main/java/org/apache/spark/sql/RowFactory.java Best, -- Nan Zhu http://codingcat.me On Monday, April 6, 2015 at 10:44 AM, Ted Yu wrote: > I searched code base but did

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Nan Zhu
I made the PR; the problem is… after many rounds of review, that configuration part was missed…. sorry about that, I will fix it Best, -- Nan Zhu On Monday, June 2, 2014 at 5:13 PM, Pierre Borckmans wrote: > I'm a bit confused because the PR mentioned by Patrick seems to ad

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Nan Zhu
I remember that in the earlier version of that PR, I deleted files by calling HDFS API we discussed and concluded that, it’s a bit scary to have something directly deleting user’s files in Spark Best, -- Nan Zhu On Monday, June 2, 2014 at 10:39 PM, Patrick Wendell wrote: >

Re: overwriting output directory

2014-06-12 Thread Nan Zhu
Hi, SK For 1.0.0 you have to delete it manually in 1.0.1 there will be a parameter to enable overwriting https://github.com/apache/spark/pull/947/files Best, -- Nan Zhu On Thursday, June 12, 2014 at 1:57 PM, SK wrote: > Hi, > > When we have multiple runs of a program writi
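If memory serves, the switch that PR introduced is `spark.hadoop.validateOutputSpecs` (worth verifying against the 1.0.1 configuration docs); a sketch of passing it at submit time, with placeholder class and jar names:

```shell
# Placeholder names; the point is the --conf flag. Setting
# spark.hadoop.validateOutputSpecs=false skips Hadoop's check that the
# output directory does not already exist, i.e. allows overwriting.
spark-submit \
  --conf spark.hadoop.validateOutputSpecs=false \
  --class example.MyJob \
  my-job.jar
```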

Re: Writing data to HBase using Spark

2014-06-12 Thread Nan Zhu
Are you using Spark Streaming? Is master = “local[n]” where n > 1? Best, -- Nan Zhu On Wednesday, June 11, 2014 at 4:23 AM, gaurav.dasgupta wrote: > Hi Kanwaldeep, > > I have tried your code but arrived into a problem. The code is working fine > in local mode. But if

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-12 Thread Nan Zhu
Actually this has been merged to the master branch https://github.com/apache/spark/pull/947 -- Nan Zhu On Thursday, June 12, 2014 at 2:39 PM, Daniel Siegmann wrote: > The old behavior (A) was dangerous, so it's good that (B) is now the default. > But in some cases I really

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-12 Thread Nan Zhu
ah, I see, I think it’s hard to do something like fs.delete() in spark code (it’s scary as we discussed in the previous PR ) so if you want (C), I guess you have to do some delete work manually Best, -- Nan Zhu On Thursday, June 12, 2014 at 3:31 PM, Daniel Siegmann wrote: > I

Re: long GC pause during file.cache()

2014-06-15 Thread Nan Zhu
SPARK_JAVA_OPTS is deprecated in 1.0, though it works fine if you don’t mind the WARNING in the logs you can set spark.executor.extraJavaOpts in your SparkConf obj Best, -- Nan Zhu On Sunday, June 15, 2014 at 12:13 PM, Hao Wang wrote: > Hi, Wei > > You may try to set JVM opts
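Note the key as documented is `spark.executor.extraJavaOptions` (the message above abbreviates it); a sketch of setting it via `conf/spark-defaults.conf`, with illustrative GC flags:

```
# conf/spark-defaults.conf -- illustrative JVM options for executors
spark.executor.extraJavaOptions  -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails
```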

Re: long GC pause during file.cache()

2014-06-15 Thread Nan Zhu
Yes, I think in the spark-env.sh.template, it is listed in the comments (didn’t check….) Best, -- Nan Zhu On Sunday, June 15, 2014 at 5:21 PM, Surendranauth Hiraman wrote: > Is SPARK_DAEMON_JAVA_OPTS valid in 1.0.0? > > > > On Sun, Jun 15, 2014 at 4

Re: master attempted to re-register the worker and then took all workers as unregistered

2014-07-07 Thread Nan Zhu
Hey, Cheney, The problem is still existing? Sorry for the delay, I’m starting to look at this issue, Best, -- Nan Zhu On Tuesday, May 6, 2014 at 10:06 PM, Cheney Sun wrote: > Hi Nan, > > In worker's log, I see the following exception thrown when try to launch on &g

Re: master attempted to re-register the worker and then took all workers as unregistered

2014-07-08 Thread Nan Zhu
Hi, Cheney, Thanks for the information which version are you using, 0.9.1? Best, -- Nan Zhu On Tuesday, July 8, 2014 at 10:09 AM, Cheney Sun wrote: > Hi Nan, > > The problem is still there, just as I described before. It's said that the > issue had already

try JDBC server

2014-07-11 Thread Nan Zhu
Hi, all I would like to give a try on JDBC server (which is supposed to be released in 1.1) where can I find the document about that? Best, -- Nan Zhu

Re: try JDBC server

2014-07-11 Thread Nan Zhu
nvm for others with the same question: https://github.com/apache/spark/commit/8032fe2fae3ac40a02c6018c52e76584a14b3438 -- Nan Zhu On Friday, July 11, 2014 at 7:02 PM, Nan Zhu wrote: > Hi, all > > I would like to give a try on JDBC server (which is supposed to be released

Re: How does Spark speculation prevent duplicated work?

2014-07-15 Thread Nan Zhu
ong…just went through the code roughly, welcome to correct me Best, -- Nan Zhu On Tuesday, July 15, 2014 at 1:55 PM, Mingyu Kim wrote: > Hi all, > > I was curious about the details of Spark speculation. So, my understanding is > that, when “speculated” tasks are newly sch

DROP IF EXISTS still throws exception about "table does not exist"?

2014-07-21 Thread Nan Zhu
dle well the IF EXISTS part of this query. Maybe you could fill a ticket on Spark JIRA. BUT, it's not a bug in HIVE IMHO.” My question is: the DDL is executed by Hive itself, isn’t it? Best, -- Nan Zhu

Re: DROP IF EXISTS still throws exception about "table does not exist"?

2014-07-21 Thread Nan Zhu
a related JIRA: https://issues.apache.org/jira/browse/SPARK-2605 -- Nan Zhu On Monday, July 21, 2014 at 10:10 AM, Nan Zhu wrote: > Hi, all > > When I try hiveContext.hql("drop table if exists abc") where abc is a > non-exist table > > I still received

broadcast variable get cleaned by ContextCleaner unexpectedly ?

2014-07-21 Thread Nan Zhu
the variable is cleaned, since there are enough memory space? Best, -- Nan Zhu

Re: broadcast variable get cleaned by ContextCleaner unexpectedly ?

2014-07-21 Thread Nan Zhu
, in usual, it will success in 1/10 times) I once suspected that it’s related to some concurrency issue, but even I disable the parallel test in built.sbt, the problem is still there --- Best, -- Nan Zhu On Monday, July 21, 2014 at 5:40 PM, Tathagata Das wrote: > The ContextCleaner cle

Re: broadcast variable get cleaned by ContextCleaner unexpectedly ?

2014-07-21 Thread Nan Zhu
well, but I do not think it would bring this problem even if spark.cores.max is too large? Best, -- Nan Zhu On Monday, July 21, 2014 at 6:11 PM, Nan Zhu wrote: > Hi, TD, > > Thanks for the reply > > I tried to reproduce this in a simpler program, but no luck >

Re: DROP IF EXISTS still throws exception about "table does not exist"?

2014-07-21 Thread Nan Zhu
Ah, I see, thanks, Yin -- Nan Zhu On Monday, July 21, 2014 at 5:00 PM, Yin Huai wrote: > Hi Nan, > > It is basically a log entry because your table does not exist. It is not a > real exception. > > Thanks, > > Yin > > > On Mon,

Re: broadcast variable get cleaned by ContextCleaner unexpectedly ?

2014-07-21 Thread Nan Zhu
Ah, sorry, sorry, my brain just damaged….. sent some wrong information not “spark.cores.max” but the minPartitions in sc.textFile() Best, -- Nan Zhu On Monday, July 21, 2014 at 7:17 PM, Tathagata Das wrote: > That is definitely weird. spark.core.max should not affect thing when t

SELECT DISTINCT generates random results?

2014-08-05 Thread Nan Zhu
Hi, all I use “SELECT DISTINCT” to query the data saved in hive it seems that this statement cannot understand the table structure and just outputs the data from other fields Anyone met a similar problem before? Best, -- Nan Zhu

Re: SELECT DISTINCT generates random results?

2014-08-05 Thread Nan Zhu
nvm, some problem brought by the ill-formatted raw data -- Nan Zhu On Tuesday, August 5, 2014 at 3:42 PM, Nan Zhu wrote: > Hi, all > > I use “SELECT DISTINCT” to query the data saved in hive > > it seems that this statement cannot understand the table structure and

Re: Upgrading 1.0.0 to 1.0.2

2014-08-26 Thread Nan Zhu
Hi, Victor, the issue with having different versions in the driver and the cluster is that the master will shut down your application due to an inconsistent serialVersionUID in ExecutorState Best, -- Nan Zhu On Tuesday, August 26, 2014 at 10:10 PM, Matei Zaharia wrote: > Things w

Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2014-09-11 Thread Nan Zhu
Hi, Can you attach more logs to see if there is some entry from ContextCleaner? I met a very similar issue before… but haven’t gotten it resolved Best, -- Nan Zhu On Thursday, September 11, 2014 at 10:13 AM, Dibyendu Bhattacharya wrote: > Dear All, > > Not sure if this is a fa

Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2014-09-11 Thread Nan Zhu
java.lang.Thread.run(Thread.java:744) -- Nan Zhu On Thursday, September 11, 2014 at 10:42 AM, Nan Zhu wrote: > Hi, > > Can you attach more logs to see if there is some entry from ContextCleaner? > > I met very similar issue before…but haven’t get resolved > > Bes

Re: Distributed dictionary building

2014-09-23 Thread Nan Zhu
great, thanks -- Nan Zhu On Tuesday, September 23, 2014 at 9:58 AM, Sean Owen wrote: > Yes, Matei made a JIRA last week and I just suggested a PR: > https://github.com/apache/spark/pull/2508 > On Sep 23, 2014 2:55 PM, "Nan Zhu" (mailto:zhunanmcg...@gmail.com)> wrote:

Re: Distributed dictionary building

2014-09-23 Thread Nan Zhu
shall we document this in the API doc? Best, -- Nan Zhu On Sunday, September 21, 2014 at 12:18 PM, Debasish Das wrote: > zipWithUniqueId is also affected... > > I had to persist the dictionaries to make use of the indices lower down in > the flow... > > On Sun, Sep 2

Re: executorAdded event to DAGScheduler

2014-09-26 Thread Nan Zhu
such a deployment mode Best, -- Nan Zhu On Friday, September 26, 2014 at 8:02 AM, praveen seluka wrote: > Can someone explain the motivation behind passing executorAdded event to > DAGScheduler ? DAGScheduler does submitWaitingStages when executorAdded > method is

Re: Reading from HBase is too slow

2014-09-29 Thread Nan Zhu
can you look at your HBase UI to check whether your job is just reading from a single region server? Best, -- Nan Zhu On Monday, September 29, 2014 at 10:21 PM, Tao Xiao wrote: > I submitted a job in Yarn-Client mode, which simply reads from a HBase table > containing tens of milli

MLUtil.kfold generates overlapped training and validation set?

2014-10-09 Thread Nan Zhu
we allow overlapped training and validation set ? (counter intuitive to me) 2. I had some misunderstanding on the code? 3. it’s a bug? Anyone can explain it to me? Best, -- Nan Zhu

Re: MLUtil.kfold generates overlapped training and validation set?

2014-10-10 Thread Nan Zhu
Thanks, Xiangrui, I found the reason of overlapped training set and test set …. Another counter-intuitive issue related to https://github.com/apache/spark/pull/2508 Best, -- Nan Zhu On Friday, October 10, 2014 at 2:19 AM, Xiangrui Meng wrote: > 1. No. > > 2. The
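For readers landing on this thread: the invariant under discussion — that each fold’s training and validation sets should be disjoint and together cover the input — can be illustrated with a minimal plain-Python sketch. This is not Spark’s MLUtils.kFold (which samples an RDD with per-fold Bernoulli sampling); the function name and slicing scheme here are made up for illustration:

```python
import random

def k_fold(data, k, seed=0):
    """Minimal sketch (not Spark's MLUtils.kFold): deterministic shuffle,
    then for each fold the validation slice and the complementary training
    set are disjoint by construction and together cover the data."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    folds = []
    for i in range(k):
        validation = shuffled[i::k]  # every k-th element, offset i
        training = [x for j, x in enumerate(shuffled) if j % k != i]
        folds.append((training, validation))
    return folds

# Each (training, validation) pair partitions the input with no overlap.
for train, val in k_fold(range(100), 5):
    assert not set(train) & set(val)
    assert sorted(train + val) == sorted(range(100))
```

Spark’s actual implementation derives the training set as the complement of the sampled validation set, which is what guarantees the same disjointness once the sampler is used consistently.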

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Nan Zhu
Great! Congratulations! -- Nan Zhu On Friday, October 10, 2014 at 11:19 AM, Mridul Muralidharan wrote: > Brilliant stuff ! Congrats all :-) > This is indeed really heartening news ! > > Regards, > Mridul > > > On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia (mailto

Re: Akka disassociation on Java SE Embedded

2014-10-10 Thread Nan Zhu
https://github.com/CodingCat/spark/commit/c5cee24689ac4ad1187244e6a16537452e99e771 -- Nan Zhu On Friday, October 10, 2014 at 4:31 PM, bhusted wrote: > How do you increase the spark block manager timeout? > > > > -- > View this message in context: > http://apache-sp

Re: Setting only master heap

2014-10-23 Thread Nan Zhu
h… my observation is that the master in Spark 1.1 has a higher frequency of GC…… Also, before 1.1 I never encountered a GC timeout in the Master; after upgrading to 1.1 I have met it twice (we upgraded soon after the 1.1 release)…. Best, -- Nan Zhu On Thursday, October 23, 2014 at 1:08 PM

Re: Exceptions not caught?

2014-10-23 Thread Nan Zhu
cannot catch anything in driver side Best, -- Nan Zhu On Thursday, October 23, 2014 at 6:40 PM, ankits wrote: > Hi, I'm running a spark job and encountering an exception related to thrift. > I wanted to know where this is being thrown, but the stack trace is > completely useless

Re: Workers not registering after master restart

2014-11-04 Thread Nan Zhu
Hi, Ashic, this is expected for the latest released version However, workers should be able to re-register since 1.2, since this patch https://github.com/apache/spark/pull/2828 was merged Best, -- Nan Zhu On Tuesday, November 4, 2014 at 6:00 PM, Ashic Mahtab wrote: > Hi, > I

enable debug-level log output of akka?

2015-01-14 Thread Nan Zhu
spark streaming, (like me), usually needs the detailed log for debugging…. Best, -- Nan Zhu http://codingcat.me

Re: enable debug-level log output of akka?

2015-01-14 Thread Nan Zhu
Hi, Ted, Thanks I know how to set it in Akka’s context; my question is just how to pass this akka.loglevel=DEBUG to Spark’s actor system Best, -- Nan Zhu http://codingcat.me On Wednesday, January 14, 2015 at 6:09 PM, Ted Yu wrote: > I assume you have looked at: > > http://doc.akk

Re: enable debug-level log output of akka?

2015-01-14 Thread Nan Zhu
not pass akka.* to executor? Hi, Josh, would you mind giving some hints, as you created and closed the JIRA? Best, -- Nan Zhu On Wednesday, January 14, 2015 at 6:19 PM, Nan Zhu wrote: > Hi, Ted, > > Thanks > > I know how to set in Akka’s context, my question is just how

Re: enable debug-level log output of akka?

2015-01-14 Thread Nan Zhu
for others who have the same question: you can simply set logging level in log4j.properties to DEBUG to achieve this Best, -- Nan Zhu http://codingcat.me On Wednesday, January 14, 2015 at 6:28 PM, Nan Zhu wrote: > I quickly went through the code, > > In ExecutorBackend, we
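Concretely, the change referred to is a one-line edit to `conf/log4j.properties`; the scoped variant below is an optional refinement, not from the thread:

```
# conf/log4j.properties
log4j.rootCategory=DEBUG, console
# Optionally scope the verbosity instead of raising it globally:
log4j.logger.org.apache.spark.scheduler=DEBUG
```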

Re: enable debug-level log output of akka?

2015-01-14 Thread Nan Zhu
sorry for the mistake, I found that those akka related messages are from Spark Akka-related component (ActorLogReceive) , instead of Akka itself, though it has been enough for the debugging purpose (in my case) the question in this thread is still in open status…. Best, -- Nan Zhu http

Re: spark-ec2 login expects at least 1 slave

2014-03-01 Thread Nan Zhu
Yes, I think open an issue in JIRA is good and I volunteer to help fixing this Best, -- Nan Zhu On Saturday, March 1, 2014 at 11:49 PM, Nicholas Chammas wrote: > Should I open an issue in JIRA to track this as a minor bug? > > > On Sat, Mar 1, 2014 at 8:07 PM

Re: Spark 0.9.0 - local mode - sc.addJar problem (bug?)

2014-03-02 Thread Nan Zhu
9 66.7MB/s in 1.3s 2014-03-02 09:59:56 (66.7 MB/s) - ‘spark-assembly-0.9.0-incubating-hadoop1.0.4.jar.1’ saved [87878749/87878749] Best, -- Nan Zhu On Sunday, March 2, 2014 at 9:48 AM, Pierre B wrote: > Hi all! > > In spark 0.9.0, local mode, whenever I try to

Re: quick start guide: building a standalone scala program

2014-03-24 Thread Nan Zhu
Hi, Diana, See my inlined answer -- Nan Zhu On Monday, March 24, 2014 at 3:44 PM, Diana Carroll wrote: > Has anyone successfully followed the instructions on the Quick Start page of > the Spark home page to run a "standalone" Scala application? I can't, and

Re: quick start guide: building a standalone scala program

2014-03-24 Thread Nan Zhu
Hi, Diana, You don’t need to use spark-distributed sbt just download sbt from its official website and set your PATH to the right place Best, -- Nan Zhu On Monday, March 24, 2014 at 4:30 PM, Diana Carroll wrote: > Yeah, that's exactly what I did. Unfortunately it does

Re: quick start guide: building a standalone scala program

2014-03-24 Thread Nan Zhu
I realize that I never read the document carefully, and I don’t find that the Spark documentation actually suggests using the Spark-distributed sbt…… Best, -- Nan Zhu On Monday, March 24, 2014 at 5:47 PM, Diana Carroll wrote: > Thanks for your help, everyone. Several folks have explained that I

Re: Splitting RDD and Grouping together to perform computation

2014-03-24 Thread Nan Zhu
partition your input into an even number of partitions use mapPartitions to operate on Iterator[Int] maybe there is some more efficient way…. Best, -- Nan Zhu On Monday, March 24, 2014 at 7:59 PM, yh18190 wrote: > Hi, I have large data set of numbers ie RDD and wanted to perfor

Re: quick start guide: building a standalone scala program

2014-03-24 Thread Nan Zhu
Yes, actually even for Spark I mostly use the sbt I installed….. so I always missed this issue…. If you can reproduce the problem with the Spark-distributed sbt… I suggest proposing a PR to fix the document before 0.9.1 is officially released Best, -- Nan Zhu On Monday, March 24, 2014

Re: Splitting RDD and Grouping together to perform computation

2014-03-24 Thread Nan Zhu
ions at :16 scala> res7.collect res10: Array[Int] = Array(3, 7) Best, -- Nan Zhu On Monday, March 24, 2014 at 8:40 PM, Nan Zhu wrote: > partition your input into even number partitions > > use mapPartition to operate on Iterator[Int] > > maybe there are some m

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
Spark UI -- Nan Zhu On Wednesday, March 26, 2014 at 8:54 AM, Sai Prasanna wrote: > Is it possible to run across cluster using Spark Interactive Shell ? > > To be more explicit, is the procedure similar to running standalone > master-slave spark. > > I want to execu

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
the executors, i.e. run in a distributed fashion Best, -- Nan Zhu On Wednesday, March 26, 2014 at 9:01 AM, Sai Prasanna wrote: > Nan Zhu, its the later, I want to distribute the tasks to the cluster > [machines available.] > > If i set the SPARK_MASTER_IP at the other machines

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
master does more work than that actually, I just explained why he should set MASTER_IP correctly a simplified list: 1. maintain the worker status 2. maintain in-cluster driver status 3. maintain executor status (the worker tells master what happened on the executor, -- Nan Zhu On

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
cluster remotely, it’s better to open an RPC to the driver and have it submit operations from nearby than to run a driver far away from the worker nodes. " -- Nan Zhu On Wednesday, March 26, 2014 at 9:59 AM, Nan Zhu wrote: > master does more work than that actually, I just explained why h

Re: Calling Spark enthusiasts in NYC

2014-03-31 Thread Nan Zhu
Montreal or Toronto? On Mon, Mar 31, 2014 at 1:36 PM, Martin Goodson wrote: > How about London? > > > -- > Martin Goodson | VP Data Science > (0)20 3397 1240 > [image: Inline image 1] > > > On Mon, Mar 31, 2014 at 6:28 PM, Andy Konwinski > wrote: > >> Hi folks, >> >> We have seen a lot of com

Re: Status of MLI?

2014-04-01 Thread Nan Zhu
/browse/SPARK/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel -- Nan Zhu On Tuesday, April 1, 2014 at 10:38 PM, Krakna H wrote: > What is the current development status of MLI/MLBase? I see that the github > repo is lying dormant (https://github.com/amplab/MLI) and JI

Re: Status of MLI?

2014-04-01 Thread Nan Zhu
Ah, I see, I’m sorry, I didn’t read your email carefully then I have no idea about the progress on MLBase Best, -- Nan Zhu On Tuesday, April 1, 2014 at 11:05 PM, Krakna H wrote: > Hi Nan, > > I was actually referring to MLI/MLBase (http://www.mlbase.org); is this being &

Re: Only TraversableOnce?

2014-04-08 Thread Nan Zhu
so, the data structure looks like: D consists of D1, D2, D3 (DX is partition) and DX consists of d1, d2, d3 (dx is the part in your context)? what you want to do is to transform DX to (d1 + d2, d1 + d3, d2 + d3)? Best, -- Nan Zhu On Tuesday, April 8, 2014 at 8:09 AM, wxhsdp wrote

Re: Only TraversableOnce?

2014-04-08 Thread Nan Zhu
If that’s the case, I think mapPartitions is what you need, but it seems that you have to load the partition into memory as a whole via toArray rdd.mapPartitions{D => {val p = D.toArray; ...}} -- Nan Zhu On Tuesday, April 8, 2014 at 8:40 AM, wxhsdp wrote: > yes, how can i d

Re: Why doesn't the driver node do any work?

2014-04-08 Thread Nan Zhu
may be unrelated to the question itself, just FYI you can run your driver program in worker node with Spark-0.9 http://spark.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster Best, -- Nan Zhu On Tuesday, April 8, 2014 at 5:11 PM, Nicholas Chammas

Re: Only TraversableOnce?

2014-04-09 Thread Nan Zhu
Yeah, should be right -- Nan Zhu On Wednesday, April 9, 2014 at 8:54 PM, wxhsdp wrote: > thank you, it works > after my operation over p, return p.toIterator, because mapPartitions has > iterator return type, is that right? > rdd.mapPartitions{D => {val p = D.toArray; ..
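The pattern agreed on in this thread — materialize the partition, compute pairwise results, hand back an iterator — can be sketched outside Spark with plain Python iterators (Spark’s mapPartitions applies exactly such an iterator-to-iterator function to each partition; the pairwise-sum example here is illustrative):

```python
from itertools import combinations

def pairwise_sums(partition):
    """Sketch of a mapPartitions-style function: consume the partition's
    iterator into a list (the toArray step discussed above), then return
    an iterator of pairwise sums."""
    p = list(partition)  # whole partition must fit in memory
    return iter([a + b for a, b in combinations(p, 2)])

# Simulating one partition holding (1, 2, 4):
print(list(pairwise_sums(iter([1, 2, 4]))))  # -> [3, 5, 6]
```

The memory caveat raised above applies here too: `list(partition)` loads the entire partition at once, so this only works when each partition fits on one executor.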

spark-0.9.1 compiled with Hadoop 2.3.0 doesn't work with S3?

2014-04-21 Thread Nan Zhu
Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 63 more Anyone else met the similar problem? Best,

Re: spark-0.9.1 compiled with Hadoop 2.3.0 doesn't work with S3?

2014-04-21 Thread Nan Zhu
Yes, I fixed it in the same way but didn’t get a chance to get back here I also made a PR: https://github.com/apache/spark/pull/468 Best, -- Nan Zhu On Monday, April 21, 2014 at 8:19 PM, Parviz Deyhim wrote: > I ran into the same issue. The problem seems to be with the jets3t libr

Spark-1.0.0-rc3 compiled against Hadoop 2.3.0 cannot read HDFS 2.3.0?

2014-05-03 Thread Nan Zhu
java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:256) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:54) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Anyone met the same issue before? Best, -- Nan Zhu

Re: master attempted to re-register the worker and then took all workers as unregistered

2014-05-05 Thread Nan Zhu
Ah, I think this should be fixed in 0.9.1? Did you see the exception thrown on the worker side? Best, -- Nan Zhu On Sunday, May 4, 2014 at 10:15 PM, Cheney Sun wrote: > Hi Nan, > > Have you found a way to fix the issue? Now I run into the same problem with > v

executor processes are still there even I killed the app and the workers

2014-05-10 Thread Nan Zhu
Hi, all With Spark 1.0 RC3, I found that the executor processes are still there even I killed the app and the workers? Any one found the same problem (maybe also exist in other versions)? Best, -- Nan Zhu

Re: master attempted to re-register the worker and then took all workers as unregistered

2014-05-15 Thread Nan Zhu
This is a bit different from what I met before, I’m suspecting that this is a new bug, I will look at this further -- Nan Zhu On Tuesday, May 6, 2014 at 10:06 PM, Cheney Sun wrote: > Hi Nan, > > In worker's log, I see the following exception thrown when try to launch

Re: sbt run with spark.ContextCleaner ERROR

2014-05-15 Thread Nan Zhu
same problem +1, though does not change the program result -- Nan Zhu On Tuesday, May 6, 2014 at 11:58 PM, Tathagata Das wrote: > Okay, this needs to be fixed. Thanks for reporting this! > > > > On Mon, May 5, 2014 at 11:00 PM, wxhsdp (mailto:wxh...@gmail.com)>

Re: Spark unit testing best practices

2014-05-16 Thread Nan Zhu
+1, at least with current code just watch the log printed by DAGScheduler… -- Nan Zhu On Wednesday, May 14, 2014 at 1:58 PM, Mark Hamstra wrote: > serDe

IllegalAccessError when writing to HBase?

2014-05-18 Thread Nan Zhu
(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Can anyone give some hint to the issue? Best, -- Nan Zhu

Re: IllegalAccessError when writing to HBase?

2014-05-18 Thread Nan Zhu
I tried hbase-0.96.2/0.98.1/0.98.2 HDFS version is 2.3 -- Nan Zhu On Sunday, May 18, 2014 at 4:18 PM, Nan Zhu wrote: > Hi, all > > I tried to write data to HBase in a Spark-1.0 rc8 application, > > the application is terminated due to the java.lang.IllegalAccessError,

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Nan Zhu
Hi, Patrick, I think https://issues.apache.org/jira/browse/SPARK-1677 is talking about the same thing? How about assigning it to me? I think I missed the configuration part in my previous commit, though I declared that in the PR description…. Best, -- Nan Zhu On Monday, June 2