Re: Handling stale PRs

2014-08-25 Thread Patrick Wendell
Hey Nicholas, Thanks for bringing this up. There are a few dimensions to this... one is that it's actually procedurally difficult for us to close pull requests. I've proposed several different solutions to ASF infra to streamline the process, but thus far they haven't been open to any of my ideas:

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Patrick Wendell
Hey Amnon, So just to make sure I understand - you also saw the same issue with 1.0.2? Just asking because whether or not this regresses the 1.0.2 behavior is important for our own bug tracking. - Patrick On Mon, Aug 25, 2014 at 10:22 PM, Amnon Khen wrote: > There were no failures nor excepti

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Amnon Khen
There were no failures nor exceptions. On Tue, Aug 26, 2014 at 1:31 AM, Matei Zaharia wrote: > Got it. Another thing that would help is if you spot any exceptions or > failed tasks in the web UI (http://:4040). > > Matei > > On August 25, 2014 at 3:07:41 PM, amnonkhen (amnon...@gmail.com) wrote

Re: Handling stale PRs

2014-08-25 Thread Matei Zaharia
Hey Nicholas, In general we've been looking at these periodically (at least I have) and asking people to close out of date ones, but it's true that the list has gotten fairly large. We should probably have an expiry time of a few months and close them automatically. I agree that it's daunting t

RDD replication in Spark

2014-08-25 Thread rapelly kartheek
Hi, I've exercised multiple options available for persist(), including RDD replication. I have gone through the classes involved in caching/storing RDDs at different levels. The StorageLevel class plays a pivotal role by recording whether to use memory or disk or to replicate the RDD on multiple

Handling stale PRs

2014-08-25 Thread Nicholas Chammas
Check this out: https://github.com/apache/spark/pulls?q=is%3Aopen+is%3Apr+sort%3Aupdated-asc We're hitting close to 300 open PRs. Those are the least recently updated ones. I think having a low number of stale (i.e. not recently updated) PRs is a good thing to shoot for. It doesn't leave contribu

Re: Graphx seems to be broken while Creating a large graph(6B nodes in my case)

2014-08-25 Thread Ankur Dave
I posted the fix on the JIRA ticket (https://issues.apache.org/jira/browse/SPARK-3190). To update the user list, this is indeed an integer overflow problem when summing up the partition sizes. The fix is to use Longs for the sum: https://github.com/apache/spark/pull/2106. Ankur --
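
To illustrate the kind of overflow described above, here is a minimal Python sketch. It is not the code from the linked pull request: the helper names and the partition sizes are hypothetical, and the 32-bit wraparound is simulated by hand because Python integers do not overflow.

```python
def to_int32(value):
    """Wrap an integer the way a signed 32-bit Int would (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def sum_as_int32(sizes):
    """Sum sizes while wrapping after every addition, like an Int accumulator."""
    total = 0
    for size in sizes:
        total = to_int32(total + size)
    return total


# Hypothetical partition sizes: 4 partitions of 1.5 billion elements each,
# i.e. 6 billion in total, which does not fit in a signed 32-bit integer.
partition_sizes = [1_500_000_000] * 4

wrapped_total = sum_as_int32(partition_sizes)  # what a 32-bit sum would report
exact_total = sum(partition_sizes)             # what a 64-bit (Long) sum reports
```

With these made-up sizes the 32-bit accumulator silently reports a wrong total, while the wide sum stays exact, which is why widening the accumulator to a Long addresses this class of bug.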

too many CancelledKeyException throwed from ConnectionManager

2014-08-25 Thread yao
Hi Folks, We are testing our home-made KMeans algorithm using Spark on Yarn. Recently, we've found that the application failed frequently when doing clustering over 300,000,000 users (each user is represented by a feature vector and the whole data set is around 600,000,000). After digging into the

Re: Mesos/Spark Deadlock

2014-08-25 Thread Timothy Chen
I don't think it solves Cody's problem, which still needs more investigation, but I believe it does solve the problem you described earlier. I just confirmed with the Mesos folks that we no longer need the minimum memory requirement, so we'll be dropping that soon and the workaround might not be needed f

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Matei Zaharia
Got it. Another thing that would help is if you spot any exceptions or failed tasks in the web UI (http://:4040). Matei On August 25, 2014 at 3:07:41 PM, amnonkhen (amnon...@gmail.com) wrote: Hi Matei, The original issue happened on a spark-1.0.2-bin-hadoop2 installation. I will try the synth

Re: [Spark SQL] off-heap columnar store

2014-08-25 Thread Henry Saputra
Hi Michael, This is great news. Any initial proposal or design about the caching to Tachyon that you can share so far? I don't think there is a JIRA ticket open to track this feature yet. - Henry On Mon, Aug 25, 2014 at 1:13 PM, Michael Armbrust wrote: >> >> What is the plan for getting Tachyo

Re: Mesos/Spark Deadlock

2014-08-25 Thread Matei Zaharia
My problem is that I'm not sure this workaround would solve things, given the issue described here (where there was a lot of memory free but it didn't get re-offered). If you think it does, it would be good to explain why it behaves like that. Matei On August 25, 2014 at 2:28:18 PM, Timothy Ch

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread amnonkhen
Hi Matei, The original issue happened on a spark-1.0.2-bin-hadoop2 installation. I will try the synthetic operation and see if I get the same results or not. Amnon On Mon, Aug 25, 2014 at 11:26 PM, Matei Zaharia [via Apache Spark Developers List] wrote: > Was the original issue with Spark 1.1 (

Re: [SPARK-2878] Kryo serialisation with custom Kryo registrator failing

2014-08-25 Thread Graham Dennis
Hi, Unless you manually patched Spark, if you have Reynold’s patch for SPARK-2878, you also have the patch for SPARK-2893 which makes the underlying cause much more obvious and explicit. So the below is unlikely to be related to SPARK-2878. Graham On 26 Aug 2014, at 4:13 am, npanj wrote: >

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread jerryye
Hi Patrick, Here's the process: java -cp /root/ephemeral-hdfs/conf/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/assembly/target/scala-2.10/spark-assembly-1.1.1-SNAPSHOT-hadoop1.0.4.jar -XX:MaxPermSize=128m -Djava.library.path=/root/ephemeral-hdfs/lib/native/ -Xms5g -Xmx10g -XX:MaxPermS

RE: Working Formula for Hive 0.13?

2014-08-25 Thread Andrew Lee
From my perspective, there are a few benefits to Hive 0.13.1+. The following are the 4 major reasons I can see why people have been asking to upgrade to Hive 0.13.1 recently. 1. Performance and bug fix, patches. (Usual case) 2. Native support for Parquet format, no need to provide custom JARs

Re: Mesos/Spark Deadlock

2014-08-25 Thread Timothy Chen
Hi Matei, I'm going to investigate from both the Mesos and Spark sides and will hopefully have a good long-term solution. In the meantime, having a workaround to start with is going to unblock folks. Tim On Mon, Aug 25, 2014 at 1:08 PM, Matei Zaharia wrote: > Anyway it would be good if someone from the

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread jerryye
Hi Matei, At least in my case, the s3 bucket is in the same region. Running count() works and so does generating synthetic data. What I saw was that the job would hang for over an hour with no progress but tasks would immediately start finishing if I cached the data. - jerry On Mon, Aug 25, 2014

Re: Pull requests will be automatically linked to JIRA when submitted

2014-08-25 Thread Patrick Wendell
Hey Nicholas, That seems promising - I prefer having a proper link to having that fairly verbose comment though, because in some cases there will be dozens of comments and it could get lost. I wonder if they could do something where it posts a link instead... - Patrick On Mon, Aug 25, 2014 at 1

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Patrick Wendell
One other idea - when things freeze up, try to run jstack on the spark shell process and on the executors and attach the results. It could be that somehow you are encountering a deadlock somewhere. On Mon, Aug 25, 2014 at 1:26 PM, Matei Zaharia wrote: > Was the original issue with Spark 1.1 (i.
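
As a rough sketch of the jstack suggestion above, the Python helper below captures a thread dump and checks whether it mentions a deadlock. The function names are hypothetical, and the marker text is an assumption about how HotSpot's jstack words its deadlock report; other JVMs may phrase it differently.

```python
import re
import subprocess

# Assumption: HotSpot's jstack reports deadlocks with a line such as
# "Found one Java-level deadlock:"; adjust the pattern for other JVMs.
DEADLOCK_MARKER = re.compile(r"Found .*deadlock", re.IGNORECASE)


def capture_thread_dump(pid):
    """Run `jstack <pid>` and return its output as text (requires a JDK on PATH)."""
    result = subprocess.run(
        ["jstack", str(pid)], capture_output=True, text=True, check=True
    )
    return result.stdout


def mentions_deadlock(dump_text):
    """Return True if the thread dump contains a deadlock report line."""
    return DEADLOCK_MARKER.search(dump_text) is not None
```

A call such as `mentions_deadlock(capture_thread_dump(12345))`, with 12345 standing in for the real process id of the shell or an executor, gives a quick yes/no answer. The full dump is still what should be attached to the thread.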

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Matei Zaharia
Was the original issue with Spark 1.1 (i.e. master branch) or an earlier release? One possibility is that your S3 bucket is in a remote Amazon region, which would make it very slow. In my experience though saveAsTextFile has worked even for pretty large datasets in that situation, so maybe ther

Re: Storage Handlers in Spark SQL

2014-08-25 Thread Michael Armbrust
- dev list + user list You should be able to query Spark SQL using JDBC, starting with the 1.1 release. There is some documentation in the repo, and we'll update the official docs once the r

Re: [Spark SQL] off-heap columnar store

2014-08-25 Thread Michael Armbrust
> > What is the plan for getting Tachyon/off-heap support for the columnar > compressed store? It's not in 1.1 is it? It is not in 1.1, and there are no concrete plans for adding it at this point. Currently, there is more engineering investment going into caching Parquet data in Tachyon instead

Re: Working Formula for Hive 0.13?

2014-08-25 Thread Michael Armbrust
Thanks for working on this! It's unclear at the moment exactly how we are going to handle this, since the end goal is to be compatible with as many versions of Hive as possible. That said, I think it would be great to open a PR in this case. Even if we don't merge it, that's a good way to get it o

Re: Mesos/Spark Deadlock

2014-08-25 Thread Matei Zaharia
Anyway it would be good if someone from the Mesos side investigates this and proposes a solution. The 32 MB per task hack isn't completely foolproof either (e.g. people might allocate all the RAM to their executor and thus stop being able to launch tasks), so maybe we wait on a Mesos fix for thi

Re: Mesos/Spark Deadlock

2014-08-25 Thread Matei Zaharia
This is kind of weird then, seems perhaps unrelated to this issue (or at least to the way I understood it). Is the problem maybe that Mesos saw 0 MB being freed and didn't re-offer the machine *even though there was more than 32 MB free overall*? Matei On August 25, 2014 at 12:59:59 PM, Cody K

Re: Mesos/Spark Deadlock

2014-08-25 Thread Cody Koeninger
I definitely saw a case where a. the only job running was a 256m shell b. I started a 2g job c. a little while later the same user as in a started another 256m shell My job immediately stopped making progress. Once user a killed his shells, it started again. This is on nodes with ~15G of memory

Re: I want to contribute MLlib two quality measures(ARHR and HR) for top N recommendation system. Is this meaningful?

2014-08-25 Thread Xiangrui Meng
The evaluation metrics are definitely useful. How do they differ from traditional IR metrics like prec@k and ndcg@k? -Xiangrui On Mon, Aug 25, 2014 at 2:14 AM, Lizhengbing (bing, BIPA) < zhengbing...@huawei.com> wrote: > Hi: > > In paper “Item-Based Top-N Recommendation Algorithms”( > https://s

Re: take() reads every partition if the first one is empty

2014-08-25 Thread Andrew Ash
Filed as https://issues.apache.org/jira/browse/SPARK-3211 On Fri, Aug 22, 2014 at 1:06 PM, Andrew Ash wrote: > Yep, anyone can create a bug at > https://issues.apache.org/jira/browse/SPARK > > Then if you make a pull request on GitHub and have the bug number in the > header like "[SPARK-1234] M

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread amnonkhen
Hi jerryye, Maybe if you voted up my question on Stack Overflow it would get some traction and we would get nearer to a solution. Thanks, Amnon -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/saveAsTextFile-to-s3-on-spark-does-not-work-just-hangs-tp7

Re: Mesos/Spark Deadlock

2014-08-25 Thread Matei Zaharia
BTW it seems to me that even without that patch, you should be getting tasks launched as long as you leave at least 32 MB of memory free on each machine (that is, the sum of the executor memory sizes is not exactly the same as the total size of the machine). Then Mesos will be able to re-offer t
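
The headroom condition in this message can be written as a small arithmetic check. This is only a sketch in Python: the function name is hypothetical, the 32 MB threshold comes from the message, and the sample numbers (a roughly 15 GB node running 256 MB, 2 GB, and 256 MB executors) are taken from Cody's report elsewhere in this thread.

```python
MIN_FREE_MB = 32  # minimum free memory per machine mentioned in the thread


def leaves_headroom(machine_mb, executor_mbs, min_free_mb=MIN_FREE_MB):
    """Return True if the executors leave at least min_free_mb free on the machine."""
    free_mb = machine_mb - sum(executor_mbs)
    return free_mb >= min_free_mb


# Roughly the scenario reported in this thread: ~15 GB node, three executors.
cody_case = leaves_headroom(15 * 1024, [256, 2048, 256])

# Hypothetical worst case: executors sized to consume the whole machine.
fully_packed = leaves_headroom(15 * 1024, [15 * 1024])
```

Under this simple check the reported scenario has plenty of headroom, which matches the observation in the thread that the stall there may be unrelated to the 32 MB requirement.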

Re: [SPARK-2878] Kryo serialisation with custom Kryo registrator failing

2014-08-25 Thread npanj
I am running the code with @rxin's patch in standalone mode. In my case I am registering "org.apache.spark.graphx.GraphKryoRegistrator" . Recently I started to see "com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to uncompress the chunk: PARSING_ERROR" . Has anyone seen this

Re: Pull requests will be automatically linked to JIRA when submitted

2014-08-25 Thread Nicholas Chammas
FYI: Looks like the Mesos folk also have a bot to do automatic linking, but it appears to have been provided to them somehow by ASF. See this comment as an example: https://issues.apache.org/jira/browse/MESOS-1688?focusedCommentId=14109078&page=com.atlassian.jira.plugin.system.issuetabpanels:comme

Re: Mesos/Spark Deadlock

2014-08-25 Thread Gary Malouf
We have not tried the work-around because there are other bugs in there that affected our set-up, though it seems it would help. On Mon, Aug 25, 2014 at 12:54 AM, Timothy Chen wrote: > +1 to have the work around in. > > I'll be investigating from the Mesos side too. > > Tim > > On Sun, Aug 24,

I want to contribute MLlib two quality measures(ARHR and HR) for top N recommendation system. Is this meaningful?

2014-08-25 Thread Lizhengbing (bing, BIPA)
Hi: In the paper "Item-Based Top-N Recommendation Algorithms" (https://stuyresearch.googlecode.com/hg/blake/resources/10.1.1.102.4451.pdf), there are two parameters measuring the quality of recommendation: HR and ARHR. If I use ALS (implicit) for a top-N recommendation system, I want to check its quali
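
For readers unfamiliar with the two measures, here is a minimal Python sketch, assuming the usual leave-one-out setup: one item is held out per user, HR is the fraction of users whose held-out item appears in their top-N list, and ARHR averages the reciprocal of the position at which it appears (counting 0 for a miss). The function name and the sample data are hypothetical, and this is not MLlib code.

```python
def hit_rate_metrics(recommendations, held_out):
    """Compute (HR, ARHR) for top-N lists against one held-out item per user.

    recommendations: dict mapping user -> ranked list of recommended items
    held_out:        dict mapping user -> the single withheld item
    """
    n_users = len(held_out)
    hits = 0
    reciprocal_rank_sum = 0.0
    for user, item in held_out.items():
        top_n = recommendations.get(user, [])
        if item in top_n:
            hits += 1
            # Positions are 1-based: a hit in first place contributes 1.0.
            reciprocal_rank_sum += 1.0 / (top_n.index(item) + 1)
    return hits / n_users, reciprocal_rank_sum / n_users


# Hypothetical top-3 lists and held-out items for four users.
recs = {
    "u1": ["a", "b", "c"],
    "u2": ["d", "e", "f"],
    "u3": ["g", "h", "i"],
    "u4": ["j", "k", "l"],
}
truth = {"u1": "a", "u2": "e", "u3": "z", "u4": "k"}

hr, arhr = hit_rate_metrics(recs, truth)
```

HR treats every hit equally, while ARHR rewards hits that appear nearer the top of the list. That makes ARHR closer in spirit to rank-aware metrics such as ndcg@k, and HR closer to a recall-style measure.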