Re: Write access to wiki

2016-01-12 Thread Nick Pentreath
I'd also like to get Wiki write access - at the least it allows a few of us to amend the "Powered By" and similar pages when those requests come through (Sean has been doing a lot of that recently :) On Mon, Jan 11, 2016 at 11:01 PM, Sean Owen wrote: > ... I forget who can give access -- is it I

ROSE: Spark + R on the JVM.

2016-01-12 Thread David
Hi all, I'd like to share news of the recent release of a new Spark package, [ROSE](http://spark-packages.org/package/onetapbeyond/opencpu-spark-executor). ROSE is a Scala library offering access to the full scientific computing power of the R programming language to Apache Spark batch and stre

Tungsten in a mixed endian environment

2016-01-12 Thread Adam Roberts
Hi all, I've been experimenting with DataFrame operations in a mixed endian environment - a big endian master with little endian workers. With tungsten enabled I'm encountering data corruption issues. For example, with this simple test code: import org.apache.spark.SparkContext import org.apach
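
For context, here is a minimal stand-alone sketch (not Adam's exact test code, and assuming a Spark 1.x SQLContext) of the kind of DataFrame aggregation that can surface this sort of corruption on a mixed endian cluster:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions.sum

    val sc = new SparkContext(new SparkConf().setAppName("endian-check"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // With Tungsten enabled, rows are laid out in raw binary memory, which is
    // where byte-order assumptions between master and workers can bite.
    val df = sc.parallelize(1 to 1000).map(i => (i, i.toLong)).toDF("i", "l")

    // A homogeneous cluster prints 500500; a corrupted value here points at
    // byte-order handling rather than at the query itself.
    df.agg(sum($"l")).show()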

Re: Tungsten in a mixed endian environment

2016-01-12 Thread Ted Yu
I logged SPARK-12778, where endian awareness in Platform.java should help in a mixed endian setup. There could be other parts of the code base which are related. Cheers On Tue, Jan 12, 2016 at 7:01 AM, Adam Roberts wrote: > Hi all, I've been experimenting with DataFrame operations in a mixed > e
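
The sort of endian awareness being discussed can be expressed with the standard java.nio APIs; a rough sketch (illustrative only, not the actual SPARK-12778 change) of detecting the native byte order and normalizing a long written by a JVM with the opposite order:

    import java.nio.ByteOrder

    // True on x86 workers, false on a big-endian master.
    val nativeIsLittleEndian = ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN

    // Swap bytes exactly once when the writer's order differs from the reader's.
    def normalizeLong(raw: Long, writtenLittleEndian: Boolean): Long =
      if (writtenLittleEndian == nativeIsLittleEndian) raw
      else java.lang.Long.reverseBytes(raw)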

Re: ROSE: Spark + R on the JVM.

2016-01-12 Thread Corey Nolet
David, Thank you very much for announcing this! It looks like it could be very useful. Would you mind providing a link to the github? On Tue, Jan 12, 2016 at 10:03 AM, David wrote: > Hi all, > > I'd like to share news of the recent release of a new Spark package, ROSE. > > > ROSE is a Scala lib

Re: ROSE: Spark + R on the JVM.

2016-01-12 Thread David Russell
Hi Corey, > Would you mind providing a link to the github? Sure, here is the github link you're looking for: https://github.com/onetapbeyond/opencpu-spark-executor David "All that is gold does not glitter, Not all those who wander are lost." Original Message Subject: Re: R

Re: Write access to wiki

2016-01-12 Thread shane knapp
> Ok, sounds good. I think it would be great if you could add installing the > 'docker-engine' package and starting the 'docker' service in there too. I > was planning to update the playbook if there were one in the apache/spark > repo but I didn't see one, hence my question. > we currently have d

Eigenvalue solver

2016-01-12 Thread Lydia Ickler
Hi, I wanted to know if there are any implementations yet, within the Machine Learning Library or generally, that can efficiently solve eigenvalue problems? Or, if not, do you have suggestions on how to approach a parallel execution, maybe with BLAS or Breeze? Thanks in advance! Lydia Von meinem i

Re: Tungsten in a mixed endian environment

2016-01-12 Thread Reynold Xin
How big of a deal is this use case in a heterogeneous endianness environment? If we do want to fix it, we should do it right before Spark shuffles data to minimize the performance penalty, i.e. turn big-endian encoded data into little-endian encoded data before it goes on the wire. This is a prett
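
A small sketch of that byteswap-at-the-boundary idea (illustrative only, not an actual Spark patch): serialize in one canonical order with java.nio.ByteBuffer so the data on the wire never depends on the writer's native order:

    import java.nio.{ByteBuffer, ByteOrder}

    // Encode longs in a fixed order (little-endian here), whatever the writer's
    // native order, so readers never have to guess.
    def encodeLongs(values: Array[Long]): Array[Byte] = {
      val buf = ByteBuffer.allocate(values.length * 8).order(ByteOrder.LITTLE_ENDIAN)
      values.foreach(v => buf.putLong(v))
      buf.array()
    }

    def decodeLongs(bytes: Array[Byte]): Array[Long] = {
      val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
      Array.fill(bytes.length / 8)(buf.getLong())
    }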

Re: Enabling mapreduce.input.fileinputformat.list-status.num-threads in Spark?

2016-01-12 Thread Alex Nastetsky
Ran into this need myself. Does Spark have an equivalent of "mapreduce.input.fileinputformat.list-status.num-threads"? Thanks. On Thu, Jul 23, 2015 at 8:50 PM, Cheolsoo Park wrote: > Hi, > > I am wondering if anyone has successfully enabled > "mapreduce.input.fileinputformat.list-status.num-t
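
For anyone trying this, the property can be handed to the underlying Hadoop input format in the usual ways; a minimal sketch (whether the Spark code path actually honours it is exactly what this thread is asking):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("parallel-listing"))

    // Option 1: set it on the Hadoop configuration carried by the SparkContext.
    sc.hadoopConfiguration.set(
      "mapreduce.input.fileinputformat.list-status.num-threads", "20")

    // Option 2: prefix the key with spark.hadoop. so it flows into every Hadoop
    // Configuration Spark creates, e.g. on spark-submit:
    //   --conf spark.hadoop.mapreduce.input.fileinputformat.list-status.num-threads=20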

Dependency on TestingUtils in a Spark package

2016-01-12 Thread Robert Dodier
Hi, I'm putting together a Spark package (in the spark-packages.org sense) and I'd like to make use of the class org.apache.spark.mllib.util.TestingUtils which appears in mllib/src/test. Can I declare a dependency in my build.sbt to pull in a suitable jar? I have searched around but I have not bee

Re: Dependency on TestingUtils in a Spark package

2016-01-12 Thread Ted Yu
There is no annotation in TestingUtils class indicating whether it is suitable for consumption by external projects. You should assume the class is not public since its methods may change in future Spark releases. Cheers On Tue, Jan 12, 2016 at 12:36 PM, Robert Dodier wrote: > Hi, > > I'm putt

Re: Dependency on TestingUtils in a Spark package

2016-01-12 Thread Reynold Xin
If you need it, just copy it over to your own package. That's probably the safest option. On Tue, Jan 12, 2016 at 12:50 PM, Ted Yu wrote: > There is no annotation in TestingUtils class indicating whether it is > suitable for consumption by external projects. > > You should assume the class is
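
Following Reynold's suggestion, the relative-tolerance comparison TestingUtils offers is small enough to keep in your own test sources; a rough sketch of such a helper, written from scratch rather than copied from Spark:

    object ApproxEquality {
      // True if a and b agree within relative tolerance eps (absolute near zero).
      def approxEqual(a: Double, b: Double, eps: Double = 1e-6): Boolean = {
        if (a == b) true
        else {
          val diff = math.abs(a - b)
          val scale = math.max(math.abs(a), math.abs(b))
          if (scale < eps) diff < eps else diff / scale < eps
        }
      }
    }

    // Usage in a test:
    //   assert(ApproxEquality.approxEqual(result, 3.141592, eps = 1e-4))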

Re: Tungsten in a mixed endian environment

2016-01-12 Thread Steve Loughran
On 12 Jan 2016, at 10:49, Reynold Xin <r...@databricks.com> wrote: How big of a deal is this use case in a heterogeneous endianness environment? If we do want to fix it, we should do it right before Spark shuffles data to minimize the performance penalty, i.e. turn big-endian encoded d

Re: Eigenvalue solver

2016-01-12 Thread David Hall
(I don't know anything Spark-specific, so I'm going to treat it like a Breeze question...) As I understand it, Spark uses ARPACK via Breeze for SVD, and presumably the same approach can be used for EVD. Basically, you make a function that multiplies your "matrix" (which might be represented implic
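
On the Spark side, the ARPACK-via-Breeze path described here is what MLlib's RowMatrix already uses for SVD; since the singular values of a symmetric positive semi-definite matrix are its eigenvalues, a sketch along these lines may be enough (assuming an RDD[Vector] named rows holding the matrix row by row):

    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.mllib.linalg.distributed.RowMatrix
    import org.apache.spark.rdd.RDD

    def topEigenvalues(rows: RDD[Vector], k: Int): Array[Double] = {
      val mat = new RowMatrix(rows)
      // Internally this boils down to an iterative eigendecomposition of the
      // Gram matrix, driven through Breeze/ARPACK for larger problems.
      val svd = mat.computeSVD(k, computeU = false)
      svd.s.toArray
    }

For problems that do not fit this shape, Breeze's local eigSym on a DenseMatrix is the simpler fallback for matrices that fit on one machine.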

Re: Dependency on TestingUtils in a Spark package

2016-01-12 Thread Robert Dodier
On Tue, Jan 12, 2016 at 12:55 PM, Reynold Xin wrote: > If you need it, just copy it over to your own package. That's probably the > safest option. OK, not a big deal, I was just hoping to avoid that, in part because the stuff I'm working on is also proposed as a pull request, and it seems like i

Re: Tungsten in a mixed endian environment

2016-01-12 Thread Sean Owen
(x86 is little-endian and SPARC / POWER / ARM are big-endian; I'm sure that was just a typo) On Tue, Jan 12, 2016 at 9:13 PM, Steve Loughran wrote: > It's notable that Hadoop doesn't like mixed-endianness; there is work > (primarily from Oracle) to have consistent byteswapping —that is: work > re

Spark and Export Classification

2016-01-12 Thread Luciano Resende
I was looking into the Apache export control page, and didn't see Spark listed there, which from my initial investigation seemed OK because I couldn't find any handling of cryptography in Spark code. Could someone more familiar with the Spark dependency hierarchy confirm that there is no specific

Re: Tungsten in a mixed endian environment

2016-01-12 Thread Randy Swanberg
FWIW, POWER is bi-endian. AIX still runs big-endian on POWER, but the latest Linux distros for POWER run little-endian (in fact Ubuntu for POWER only runs LE). > (x86 is little-endian and SPARC / POWER / ARM are big-endian; I'm sure > that was just a typo) > > On Tue, Jan 12, 2016 at 9:13

Re: Enabling mapreduce.input.fileinputformat.list-status.num-threads in Spark?

2016-01-12 Thread Cheolsoo Park
Alex, see this JIRA: https://issues.apache.org/jira/browse/SPARK-9926 On Tue, Jan 12, 2016 at 10:55 AM, Alex Nastetsky < alex.nastet...@vervemobile.com> wrote: > Ran into this need myself. Does Spark have an equivalent of "mapreduce. > input.fileinputformat.list-status.num-threads"? > > Thanks.

Re: Enabling mapreduce.input.fileinputformat.list-status.num-threads in Spark?

2016-01-12 Thread Alex Nastetsky
Thanks. I was actually able to get mapreduce.input.fileinputformat.list-status.num-threads working in Spark against a regular fileset in S3, in Spark 1.5.2 ... looks like the issue is isolated to Hive. On Tue, Jan 12, 2016 at 6:48 PM, Cheolsoo Park wrote: > Alex, see this JIRA: > https://issues

Re: ROSE: Spark + R on the JVM.

2016-01-12 Thread David Russell
Hi Richard, > Would it be possible to access the session API from within ROSE, > to get, for example, the images that are generated by R / OpenCPU? Technically it would be possible although there would be some potentially significant runtime costs per task in doing so, primarily those related to e