Re: Spark runs into an Infinite loop even if the tasks are completed successfully

2015-08-13 Thread Akhil Das
Yep, and it works fine for operations which do not involve any shuffle (like foreach, count etc.), while those which involve shuffle operations end up in an infinite loop. Spark should somehow indicate this instead of going into an infinite loop. Thanks Best Regards On Thu, Aug 13, 2015 at 11:37 P
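To illustrate the split being described (a sketch, assuming an RDD[Int] named rdd): actions without a shuffle run as a single stage, while shuffle operations add a stage boundary, which is where the loop appeared:

    rdd.count()                                          // no shuffle: completed fine
    rdd.map(x => (x % 10, 1)).reduceByKey(_ + _).count() // shuffle: stage boundary, looped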

Re: please help with ClassNotFoundException

2015-08-13 Thread 周千昊
Hi, Sea Problem solved, it turns out that I had updated the spark cluster to 1.4.1 but the client had not been updated. Thank you so much. Sea <261810...@qq.com> wrote on Fri, Aug 14, 2015 at 1:01 PM: > I have no idea... We use scala. You upgrade to 1.4 so quickly..., are you > using spark in p

Re: please help with ClassNotFoundException

2015-08-13 Thread Sea
I have no idea... We use scala. You upgrade to 1.4 so quickly..., are you using spark in production? Spark 1.3 is better than spark 1.4. -- Original Message -- From: "周千昊" Sent: Friday, Aug 14, 2015, 11:14 To: "Sea"<261810...@qq.com>; "dev@spark.apach

RE: Automatically deleting pull request comments left by AmplabJenkins

2015-08-13 Thread Cheng, Hao
OK, thanks, probably just myself… From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Friday, August 14, 2015 11:04 AM To: Cheng, Hao Cc: Josh Rosen; dev Subject: Re: Automatically deleting pull request comments left by AmplabJenkins I tried accessing just now. It took several seconds before the page

Re: please help with ClassNotFoundException

2015-08-13 Thread 周千昊
Hi Sea I have updated spark to 1.4.1, however the problem still exists, any idea? Sea <261810...@qq.com> wrote on Fri, Aug 14, 2015 at 12:36 AM: > Yes, I guess so. I saw this bug before. > > > -- Original Message -- > *From:* "周千昊" > *Sent:* Thursday, Aug 13, 2015, 9:30 PM > *To:* "Sea"<261810.

Re: Automatically deleting pull request comments left by AmplabJenkins

2015-08-13 Thread Ted Yu
I tried accessing just now. It took several seconds before the page showed up. FYI On Thu, Aug 13, 2015 at 7:56 PM, Cheng, Hao wrote: > I found https://spark-prs.appspot.com/ is super slow when opening it in > a new window recently; not sure if it's just me or if everybody experiences the > same. Is

RE: Automatically deleting pull request comments left by AmplabJenkins

2015-08-13 Thread Cheng, Hao
I found https://spark-prs.appspot.com/ is super slow when opening it in a new window recently; not sure if it's just me or if everybody experiences the same. Is there any way to speed it up? From: Josh Rosen [mailto:rosenvi...@gmail.com] Sent: Friday, August 14, 2015 10:21 AM To: dev Subject: Re: Automat

Re: Automatically deleting pull request comments left by AmplabJenkins

2015-08-13 Thread Ted Yu
Thanks Josh for the initiative. I think reducing the redundancy in QA bot posts would make discussion on the GitHub UI more focused. Cheers On Thu, Aug 13, 2015 at 7:21 PM, Josh Rosen wrote: > Prototype is at https://github.com/databricks/spark-pr-dashboard/pull/59 > > On Wed, Aug 12, 2015 at 7:51

Re: Automatically deleting pull request comments left by AmplabJenkins

2015-08-13 Thread Josh Rosen
Prototype is at https://github.com/databricks/spark-pr-dashboard/pull/59 On Wed, Aug 12, 2015 at 7:51 PM, Josh Rosen wrote: > *TL;DR*: would anyone object if I wrote a script to auto-delete pull > request comments from AmplabJenkins? > > Currently there are two bots which post Jenkins test resul

Re: Developer API & plugins for Hive & Hadoop ?

2015-08-13 Thread Sandy Ryza
Hi Tom, Not sure how much this helps, but are you aware that you can build Spark with the -Phadoop-provided profile to avoid packaging Hadoop dependencies in the assembly jar? -Sandy On Fri, Aug 14, 2015 at 6:08 AM, Thomas Dudziak wrote: > Unfortunately it doesn't because our version of Hive h
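(A sketch of the build invocation, since exact profiles vary by Spark version: something like build/mvn -Phadoop-provided -DskipTests clean package.)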

Re: Developer API & plugins for Hive & Hadoop ?

2015-08-13 Thread Thomas Dudziak
Unfortunately it doesn't, because our version of Hive has different syntax elements and thus I need to patch them in (and a few other minor things). It would be great if there were a developer API at a somewhat higher level. On Thu, Aug 13, 2015 at 2:19 PM, Reynold Xin wrote: > I believe for

Re: possible bug: user SparkConf properties not copied to worker process

2015-08-13 Thread rfarrjr
That works.

Re: Developer API & plugins for Hive & Hadoop ?

2015-08-13 Thread Reynold Xin
I believe for Hive, there is already a client interface that can be used to build clients for different Hive metastores. That should also work for your heavily forked one. For Hadoop, it is definitely a bigger project to refactor. A good way to start evaluating this is to list what needs to be cha

Re: possible bug: user SparkConf properties not copied to worker process

2015-08-13 Thread Reynold Xin
Is this through Java properties? For Java properties, you can pass them using spark.executor.extraJavaOptions. On Thu, Aug 13, 2015 at 2:11 PM, rfarrjr wrote: > Thanks for the response. > > In this particular case we passed a url that would be leveraged when > configuring some serialization s
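A minimal sketch of that suggestion (the schema.registry.url property name is hypothetical):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // -D options set here become JVM system properties on each executor
      .set("spark.executor.extraJavaOptions", "-Dschema.registry.url=http://registry:8081")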

Re: possible bug: user SparkConf properties not copied to worker process

2015-08-13 Thread rfarrjr
Thanks for the response. In this particular case we passed a URL that would be leveraged when configuring some serialization support for Kryo. We are using a schema registry and leveraging it to efficiently serialize Avro objects without the need to register specific records or schemas up front.

Developer API & plugins for Hive & Hadoop ?

2015-08-13 Thread Thomas Dudziak
Hi, I have asked this before but didn't receive any comments, but with the impending release of 1.5 I wanted to bring this up again. Right now, Spark is very tightly coupled with OSS Hive & Hadoop, which causes me a lot of work every time there is a new version because I don't run OSS Hive/Hadoop v

Re: What does NativeMethodAccessorImpl.java do?

2015-08-13 Thread freedafeng
Thanks Marcelo! The reason I was asking that question is that I was expecting my spark job to be a "map only" job. In other words, it should finish after the mapPartitions run for all partitions. This is because the job is only mapPartitions() plus count(), where mapPartitions only yields one integ

Fwd: [ANNOUNCE] Spark 1.5.0-preview package

2015-08-13 Thread Reynold Xin
(I tried to send this last night but somehow the ASF mailing list rejected my mail) In order to facilitate community testing of the 1.5.0 release, I've built a preview package. This is not a release candidate, so there is no voting involved. However, it'd be great if community members could start testi

Re: - Spark 1.4.1 - run-example SparkPi - Failure ...

2015-08-13 Thread Dirceu Semighini Filho
Hi Naga, If you are trying to use classes from this jar, you will need to call the addJar method on the SparkContext, which will put this jar into the context of all workers, even when you execute it standalone. 2015-08-13 16:02 GMT-03:00 Naga Vij : > Hi Dirceu, > > Thanks for getting back to me
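A minimal sketch of that call (the path is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("example"))
    // ships the jar to every worker so its classes are on the executor classpath
    sc.addJar("/path/to/extra-classes.jar")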

Re: Package Release Announcement: Spark SQL on HBase "Astro"

2015-08-13 Thread Ted Malaska
Cool, seems like the designs are very close. Here is my latest blog on my work with HBase and Spark. Let me know if you have any questions. There should be two more blogs next month talking about bulk load through Spark (14150), which is committed, and SparkSQL (14181), which should be done next week.

Fwd: [ANNOUNCE] Spark 1.5.0-preview package

2015-08-13 Thread Reynold Xin
Retry sending this again ... -- Forwarded message -- From: Reynold Xin Date: Thu, Aug 13, 2015 at 12:15 AM Subject: [ANNOUNCE] Spark 1.5.0-preview package To: "dev@spark.apache.org" In order to facilitate community testing of the 1.5.0 release, I've built a preview package. Thi

Fwd: - Spark 1.4.1 - run-example SparkPi - Failure ...

2015-08-13 Thread Naga Vij
Hello, Any idea on why this is happening? Thanks Naga -- Forwarded message -- From: Naga Vij Date: Wed, Aug 12, 2015 at 5:47 PM Subject: - Spark 1.4.1 - run-example SparkPi - Failure ... To: u...@spark.apache.org Hi, I am evaluating Spark 1.4.1 Any idea on why run-example Sp

Re: - Spark 1.4.1 - run-example SparkPi - Failure ...

2015-08-13 Thread Dirceu Semighini Filho
Hi Naga, This happened here sometimes when the memory of the spark cluster wasn't enough, and the Java GC entered an infinite loop trying to free some memory. To fix this I just added more memory to the Workers of my cluster; alternatively, you can increase the number of partitions of your RDD using the repar
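A sketch of both suggestions (the values are placeholders to tune for your cluster, and rdd stands for the RDD from the failing stage):

    import org.apache.spark.SparkConf

    val conf = new SparkConf().set("spark.executor.memory", "4g") // more heap per worker
    // more, smaller partitions mean less memory pressure per task
    val morePartitions = rdd.repartition(200)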

Fwd: - Spark 1.4.1 - run-example SparkPi - Failure ...

2015-08-13 Thread Naga Vij
Has anyone run into this? -- Forwarded message -- From: Naga Vij Date: Wed, Aug 12, 2015 at 5:47 PM Subject: - Spark 1.4.1 - run-example SparkPi - Failure ... To: u...@spark.apache.org Hi, I am evaluating Spark 1.4.1 Any idea on why run-example SparkPi fails? Here's what I am

Re: subscribe

2015-08-13 Thread Ted Yu
See the first section on https://spark.apache.org/community On Thu, Aug 13, 2015 at 9:44 AM, Naga Vij wrote: > subscribe >

subscribe

2015-08-13 Thread Naga Vij
subscribe

Re: possible bug: user SparkConf properties not copied to worker process

2015-08-13 Thread Reynold Xin
That was intentional - what's your use case that requires configs not starting with "spark."? On Thu, Aug 13, 2015 at 8:16 AM, rfarrjr wrote: > Ran into an issue setting a property on the SparkConf that wasn't made > available on the worker. After some digging[1] I noticed that only > properties t

Re: Spark runs into an Infinite loop even if the tasks are completed successfully

2015-08-13 Thread Imran Rashid
Oh I see, you are defining your own RDD & Partition types, and you had a bug where partition.index did not line up with the partition's slot in rdd.getPartitions. Is that correct? On Thu, Aug 13, 2015 at 2:40 AM, Akhil Das wrote: > I figured that out, And these are my findings: > > -> It just en
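A minimal sketch of the invariant being described, for a custom RDD (the names are illustrative):

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD

    class MyPartition(override val index: Int) extends Partition

    class MyRDD(sc: SparkContext, numParts: Int) extends RDD[Int](sc, Nil) {
      // each Partition's index must equal its slot in this array, or the
      // scheduler can keep resubmitting the stage
      override def getPartitions: Array[Partition] =
        Array.tabulate[Partition](numParts)(i => new MyPartition(i))
      override def compute(split: Partition, context: TaskContext): Iterator[Int] =
        Iterator(split.index)
    }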

possible bug: user SparkConf properties not copied to worker process

2015-08-13 Thread rfarrjr
Ran into an issue setting a property on the SparkConf that wasn't made available on the worker. After some digging[1] I noticed that only properties that start with "spark." are sent by the scheduler. I'm not sure if this was intended behavior or not. Using Spark Streaming 1.4.1 running on Java
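A sketch of the behavior being reported (the property names are hypothetical):

    import org.apache.spark.SparkConf

    val url = "http://registry:8081"
    val conf = new SparkConf()
    conf.set("registry.url", url)        // no "spark." prefix: never reaches the executors
    conf.set("spark.registry.url", url)  // "spark."-prefixed keys are shipped to executors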

Re: What does NativeMethodAccessorImpl.java do?

2015-08-13 Thread Marcelo Vanzin
That's not a program, it's just a class in the Java library. Spark looks at the call stack and uses it to describe the job in the UI. If you look at the whole stack trace you'll see more things that might tell you what's really going on in that job. On Thu, Aug 13, 2015 at 9:13 AM, freedafeng wro

Re: please help with ClassNotFoundException

2015-08-13 Thread Sea
Yes, I guess so. I saw this bug before. -- Original Message -- From: "周千昊" Sent: Thursday, Aug 13, 2015, 9:30 PM To: "Sea"<261810...@qq.com>; "dev@spark.apache.org"; Subject: Re: please help with ClassNotFoundException Hi Sea, Is it the same issue as h

What does NativeMethodAccessorImpl.java do?

2015-08-13 Thread freedafeng
I am running a spark job with only two operations: mapPartitions() and then collect(). The output data size of mapPartitions is very small, one integer per partition. I saw there is a stage 2 for this job that runs this Java program. I am not a Java programmer. Could anyone please let me know what this

Graphx - how to add vertices to a HashSet of vertices ?

2015-08-13 Thread Ranjana Rajendran
Hi, sampledVertices is a HashSet of vertices: var sampledVertices: HashSet[VertexId] = HashSet() In each iteration, I am making a list of neighborVertexIds: val neighborVertexIds = burnEdges.map((e: Edge[Int]) => e.dstId) I want to add these neighborVertexIds to sampledVertices. Has
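One way to do it, sketched under the assumption that burnEdges is an RDD[Edge[Int]] and the set lives on the driver:

    import scala.collection.mutable.HashSet
    import org.apache.spark.graphx.{Edge, VertexId}

    val sampledVertices: HashSet[VertexId] = HashSet()
    val neighborVertexIds = burnEdges.map((e: Edge[Int]) => e.dstId)
    // bring the distributed ids back to the driver before mutating the local set
    sampledVertices ++= neighborVertexIds.collect()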

Re: Switch from Sort based to Hash based shuffle

2015-08-13 Thread Akhil Das
Have a look at spark.shuffle.manager; you can switch between sort and hash with this configuration. From the configuration docs: spark.shuffle.manager (default: sort) - Implementation to use for shuffling data. There are two implementations available: sort and hash. Sort-based shuffle is more memory-efficient and is the default option starti
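Setting it in code rather than at submit time would look something like this (a minimal sketch):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("shuffle-experiment")
      .set("spark.shuffle.manager", "hash") // or "sort", the default since Spark 1.2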

Re: please help with ClassNotFoundException

2015-08-13 Thread 周千昊
Hi Sea, Is it the same issue as https://issues.apache.org/jira/browse/SPARK-8368 Sea <261810...@qq.com> wrote on Thu, Aug 13, 2015 at 6:52 PM: > Are you using 1.4.0? If yes, use 1.4.1 > > > -- Original Message -- > *From:* "周千昊" > *Sent:* Thursday, Aug 13, 2015, 6:04 PM > *To:* "dev"; > *Subject:* pl

Re: Switch from Sort based to Hash based shuffle

2015-08-13 Thread Ranjana Rajendran
Hi Cheez, You can set the parameter spark.shuffle.manager when you submit the Spark job. --conf spark.shuffle.manager=hash Thank you, Ranjana On Thu, Aug 13, 2015 at 2:26 AM, cheez <11besemja...@seecs.edu.pk> wrote: > I understand that the current master branch of Spark uses Sort based > shuff

Switch from Sort based to Hash based shuffle

2015-08-13 Thread cheez
I understand that the current master branch of Spark uses Sort based shuffle. Is there a way to change that to Hash based shuffle, just for experimental purposes, by modifying the source code?

Re: please help with ClassNotFoundException

2015-08-13 Thread Sea
Are you using 1.4.0? If yes, use 1.4.1 -- Original Message -- From: "周千昊" Sent: Thursday, Aug 13, 2015, 6:04 PM To: "dev"; Subject: please help with ClassNotFoundException Hi, I am using spark 1.4 and ran into an issue. I am trying to use the

please help with ClassNotFoundException

2015-08-13 Thread 周千昊
Hi, I am using spark 1.4 and ran into an issue. I am trying to use the aggregate function: JavaRDD rdd = some rdd; HashMap zeroValue = new HashMap(); // add initial key-value pair for zeroValue rdd.aggregate(zeroValue, new Function2,
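For reference, a minimal sketch of the same aggregate pattern in Scala (assuming an RDD[String] named rdd and word-count-style semantics; the generic parameters of the original Java code appear to have been stripped by the archive):

    val zero = Map.empty[String, Int]
    val counts = rdd.aggregate(zero)(
      // fold one record into a partition-local map
      (acc, word) => acc + (word -> (acc.getOrElse(word, 0) + 1)),
      // merge two partition-local maps
      (m1, m2) => m2.foldLeft(m1) { case (acc, (k, n)) =>
        acc + (k -> (acc.getOrElse(k, 0) + n))
      }
    )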