Re: Contribute code to MLlib

2015-05-18 Thread Joseph Bradley
Hi Tarek, Thanks for your interest & for checking the guidelines first! On 2 points: Algorithm: PCA is of course a critical algorithm. The main question is how your algorithm/implementation differs from the current PCA. If it's different and potentially better, I'd recommend opening up a JIRA

Re: Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

2015-05-18 Thread Imran Rashid
On Fri, May 8, 2015 at 4:16 AM, Steve Loughran wrote: > Would there be a place in the code tree for some tests to run against > things like this? They're cloud integration tests rather than unit tests > and nobody would want them to be on by default, but it could be good for > regression testing

[SparkSQL] HiveContext multithreading bug?

2015-05-18 Thread Yana Kadiyska
Hi folks, wanted to get a sanity check before opening a JIRA. I am trying to do the following: create a HiveContext, then from different threads: 1. Create a DataFrame 2. Name said df via registerTempTable 3. do a simple query via sql and dropTempTable My understanding is that since HiveContext

Fwd: Problem building master on 2.11

2015-05-18 Thread Fernando O.
I just noticed I sent this to users instead of dev: -- Forwarded message -- From: Fernando O. Date: Sat, May 16, 2015 at 4:09 PM Subject: Problem building master on 2.11 To: "u...@spark.apache.org" Is anyone else having issues when building Spark from git? I created a jira ticke

Re: What is the location in the source code of the computation of the elements in a map transformation?

2015-05-18 Thread Tom Hubregtsen
Hi Patrick, Thank you very much for your response. I am almost there, but am not sure about my conclusion. Let me try to approach it from a different angle. I would like to time the impact of a particular lambda function, or if possible, more broadly measure the impact of any map function. I

Contribute code to MLlib

2015-05-18 Thread Tarek Elgamal
Hi, I would like to contribute an algorithm to the MLlib project. I have implemented a scalable PCA algorithm on Spark. It is scalable for both tall and fat matrices, and the paper describing it has been accepted for publication at the SIGMOD 2015 conference. I looked at the guidelines in the following link: ht