date:20141031

Re: Parquet Migrations

2014-10-31 Thread Michael Armbrust

You can't change parquet schema without reencoding the data as you need to recalculate the footer index data. You can manually do what SPARK-3851 is going to do today however. Consider two schemas: Old Schema: (a: Int, b: String) New Schema, whe

Re: Spark consulting

2014-10-31 Thread Stephen Boesch

Hi Alessandro, It is important to me and probably others as well to be able to focus on the technical issues and not be distracted that way. thanks stephenb 2014-10-31 13:48 GMT-07:00 Alessandro Baretta : > Stephen, > > Sorry for being OT. On the other hand, there is no j...@spark.apach

Re: Spark consulting

2014-10-31 Thread Alessandro Baretta

Stephen, Sorry for being OT. On the other hand, there is no j...@spark.apache.org, and the LinkedIn Spark group is a desert. Alex On Fri, Oct 31, 2014 at 1:44 PM, Stephen Boesch wrote: > May we please refrain from using spark mailing list for job inquiries. > Thanks. > > 2014-10-31 13:35 GMT-0

Parquet Migrations

2014-10-31 Thread Gary Malouf

Outside of what is discussed here as a future solution, is there any path for being able to modify a Parquet schema once some data has been written? This seems like the kind of thing that should make people pause when considering whether or not to

Re: Spark consulting

2014-10-31 Thread Stephen Boesch

May we please refrain from using spark mailing list for job inquiries. Thanks. 2014-10-31 13:35 GMT-07:00 Alessandro Baretta : > Hello, > > Is anyone open to do some consulting work on Spark in San Mateo? > > Thanks. > > Alex >

Spark consulting

2014-10-31 Thread Alessandro Baretta

Hello, Is anyone open to do some consulting work on Spark in San Mateo? Thanks. Alex

Re: Surprising Spark SQL benchmark

2014-10-31 Thread Kay Ousterhout

There's been an effort in the AMPLab at Berkeley to set up a shared codebase that makes it easy to run TPC-DS on SparkSQL, since it's something we do frequently in the lab to evaluate new research. Based on this thread, it sounds like making this more widely-available is something that would be us

Re: Surprising Spark SQL benchmark

2014-10-31 Thread Nicholas Chammas

I believe that benchmark has a pending certification on it. See http://sortbenchmark.org under "Process". It's true they did not share enough details on the blog for readers to reproduce the benchmark, but they will have to share enough with the committee behind the benchmark in order to be certif

Re: Surprising Spark SQL benchmark

2014-10-31 Thread Steve Nunez

To be fair, we (Spark community) haven’t been any better, for example this benchmark: https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html For which no details or code have been released to allow others to reproduce it. I would encourage anyone doing a Spark benchmark in futur

Re: Surprising Spark SQL benchmark

2014-10-31 Thread Nicholas Chammas

Thanks for the response, Patrick. I guess the key takeaways are 1) the tuning/config details are everything (they're not laid out here), 2) the benchmark should be reproducible (it's not), and 3) reach out to the relevant devs before publishing (didn't happen). Probably key takeaways for any kind

Re: Surprising Spark SQL benchmark

2014-10-31 Thread Patrick Wendell

Hey Nick, Unfortunately Citus Data didn't contact any of the Spark or Spark SQL developers when running this. It is really easy to make one system look better than others when you are running a benchmark yourself because tuning and sizing can lead to a 10X performance improvement. This benchmark d

Surprising Spark SQL benchmark

2014-10-31 Thread Nicholas Chammas

I know we don't want to be jumping at every benchmark someone posts out there, but this one surprised me: http://www.citusdata.com/blog/86-making-postgresql-scale-hadoop-style This benchmark has Spark SQL failing to complete several queries in the TPC-H benchmark. I don't understand much about th

Re: matrix factorization cross validation

2014-10-31 Thread Sean Owen

No, excepting approximate methods like LSH to figure out the relatively small set of candidates for the users in the partition, and broadcast or join those. On Fri, Oct 31, 2014 at 5:45 AM, Nick Pentreath wrote: > Sean, re my point earlier do you know a more efficient way to compute top k > for e


13 matches
