Re: Spark on Mesos 0.20

2014-10-05 Thread Andrew Ash
Hi Gurvinder,

Is there a SPARK ticket tracking the issue you describe?

On Mon, Oct 6, 2014 at 2:44 AM, Gurvinder Singh wrote:
> On 10/06/2014 08:19 AM, Fairiz Azizi wrote:
> > The Spark online docs indicate that Spark is compatible with Mesos 0.18.1
> >
> > I've gotten it to work just fine on 0.18.1 and 0.18.2 ...

Re: Spark on Mesos 0.20

2014-10-05 Thread Gurvinder Singh
On 10/06/2014 08:19 AM, Fairiz Azizi wrote:
> The Spark online docs indicate that Spark is compatible with Mesos 0.18.1
>
> I've gotten it to work just fine on 0.18.1 and 0.18.2
>
> Has anyone tried Spark on a newer version of Mesos, i.e. Mesos v0.20.0?
>
> -Fi

Yeah, we are using Spark 1.1.0 w...

Spark on Mesos 0.20

2014-10-05 Thread Fairiz Azizi
The Spark online docs indicate that Spark is compatible with Mesos 0.18.1.

I've gotten it to work just fine on 0.18.1 and 0.18.2.

Has anyone tried Spark on a newer version of Mesos, i.e. Mesos v0.20.0?

-Fi
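For anyone trying this, here is a minimal sketch of pointing a Spark application at a Mesos master, following the standard Spark-on-Mesos configuration; the host name, executor URI path, and app name below are placeholders, not from this thread:

    // Minimal sketch: assumes a reachable Mesos master, a Spark
    // distribution uploaded where executors can fetch it, and
    // MESOS_NATIVE_LIBRARY pointing at libmesos on the driver.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("mesos://mesos-master.example.com:5050")  // placeholder host
      .setAppName("MesosSmokeTest")
      .set("spark.executor.uri", "hdfs://namenode/dist/spark-1.1.0.tgz") // placeholder path

    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 1000).reduce(_ + _))  // trivial job to verify the cluster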

Spark SQL on a big Hive table on version 1.0.2 has some strange output

2014-10-05 Thread Trident
Dear Developers,

I'm currently limited to Spark 1.0.2. I use Spark SQL on a Hive table to load the AMPLab benchmark, which is approximately 25.6 GiB. I run:

CREATE EXTERNAL TABLE uservisits (sourceIP STRING, destURL STRING, visitDate STRING, adRevenue DOUBLE, userAgent STRING, countryCode STRING, ...
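For context, a query against such a table would look roughly like the following under the Spark 1.0.x API (HiveContext.hql); the aggregation itself is an illustration, not the reporter's actual workload:

    // Hedged sketch: assumes the uservisits table above is registered in
    // the Hive metastore and that Spark was built with Hive support.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("AmplabQuery"))
    val hive = new HiveContext(sc)

    // Sum ad revenue per source IP over the benchmark table.
    val top = hive.hql(
      "SELECT sourceIP, SUM(adRevenue) AS revenue " +
      "FROM uservisits GROUP BY sourceIP ORDER BY revenue DESC LIMIT 10")

    top.collect().foreach(println)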

Re: SPARK-3660 : Initial RDD for updateStateByKey transformation

2014-10-05 Thread Soumitra Kumar
Hello,

I have submitted a pull request (Adding support of initial value for state update, #2665); please review and let me know. Excited to submit my first pull request.

-Soumitra

- Original Message -
From: "Soumitra Kumar"
To: dev@spark.apache.org
Sent: Tuesday, September 23, 2014 1...
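For readers of the thread, a sketch of what the proposed API enables: seeding updateStateByKey with state carried over from a previous run. The three-argument signature is the PR's proposal and may differ from what is finally merged; the stream source, seed values, and checkpoint path are illustrative:

    import org.apache.spark.{HashPartitioner, SparkConf}
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._  // pair DStream ops

    val conf = new SparkConf().setAppName("InitialStateSketch")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("/tmp/state-sketch")  // required by updateStateByKey

    // Hypothetical state carried over from a previous run.
    val initialCounts = ssc.sparkContext.parallelize(Seq(("spark", 5L), ("mesos", 2L)))

    val words = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map(w => (w, 1L))

    val updateFunc = (values: Seq[Long], state: Option[Long]) =>
      Some(values.sum + state.getOrElse(0L))

    // The third argument is the initial-state RDD the PR adds.
    val counts = words.updateStateByKey[Long](
      updateFunc, new HashPartitioner(2), initialCounts)

    counts.print()
    ssc.start()
    ssc.awaitTermination()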

Hyper Parameter Tuning Algorithms

2014-10-05 Thread Lochana Menikarachchi
Found this thread from April:
http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%3ccabjxkq6b7sfaxie4+aqtcmd8jsqbznsxsfw6v5o0wwwouob...@mail.gmail.com%3E

Wondering what the status of this is. We are thinking about implementing these algorithms; it would be a waste if they are already ...
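For what it's worth, the simplest of these (plain grid search) needs no framework support; a hedged sketch over a single regularization parameter, with the candidate values and 100-iteration budget chosen arbitrarily and data preparation assumed:

    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    // Returns (best regParam, its validation AUC).
    def bestRegParam(training: RDD[LabeledPoint],
                     validation: RDD[LabeledPoint]): (Double, Double) = {
      val candidates = Seq(0.001, 0.01, 0.1, 1.0)
      candidates.map { reg =>
        val algo = new LogisticRegressionWithSGD()
        algo.optimizer.setNumIterations(100).setRegParam(reg)
        val model = algo.run(training).clearThreshold()  // raw scores for AUC
        val scoreAndLabels = validation.map(p => (model.predict(p.features), p.label))
        (reg, new BinaryClassificationMetrics(scoreAndLabels).areaUnderROC())
      }.maxBy(_._2)
    }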

Re: Parquet schema migrations

2014-10-05 Thread Michael Armbrust
Hi Cody,

Assuming you are talking about 'safe' changes to the schema (i.e. existing column names are never reused with incompatible types), this is something I'd love to support. Perhaps you can describe in more detail what sorts of changes you are making, and whether simple merging of the schemas would be sufficient.
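To make the distinction concrete, a small illustration in StructType terms (shown with the public types API of later releases; the column names are made up):

    import org.apache.spark.sql.types._

    val v1 = StructType(Seq(
      StructField("id", LongType, nullable = false),
      StructField("name", StringType, nullable = true)))

    // Safe: a new nullable column is appended; old files read it as null.
    val v2 = StructType(v1.fields :+ StructField("country", StringType, nullable = true))

    // Unsafe: an existing name reused with an incompatible type.
    val bad = StructType(Seq(
      StructField("id", StringType, nullable = false),
      StructField("name", StringType, nullable = true)))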

Re: Jython importing pyspark?

2014-10-05 Thread Matei Zaharia
PySpark doesn't attempt to support Jython at present. IMO, while it might be a bit faster, it would lose a lot of the benefits of Python, namely its very strong data processing libraries (NumPy, SciPy, Pandas, etc.). So I'm not sure it's worth supporting unless someone demonstrates a really major ...

Re: Parquet schema migrations

2014-10-05 Thread Andrew Ash
Hi Cody,

I wasn't aware there were different versions of the Parquet format. What's the difference between "raw parquet" and the Hive-written Parquet files?

As for your migration question, the approaches I've often seen are convert-on-read and convert-all-at-once. Apache Cassandra, for example, d...
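To illustrate the second approach, a hedged sketch of convert-all-at-once: one batch job rewrites the old layout under the new schema. It uses the later DataFrame API purely for brevity; the paths and the added column are hypothetical:

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions.lit

    val sqlContext = new SQLContext(sc)  // assumes an existing SparkContext sc
    val oldData = sqlContext.read.parquet("/warehouse/events_v1")

    oldData
      .withColumn("countryCode", lit(null).cast("string"))  // default for the new field
      .write.parquet("/warehouse/events_v2")

    // Convert-on-read defers this instead: readers tolerate both schemas
    // and normalize rows as they scan them.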

Re: Impact of input format on timing

2014-10-05 Thread Matei Zaharia
Hi Tom,

HDFS and Spark don't actually have a minimum block size -- so in that first dataset, the files won't each be costing you 64 MB. However, the main reason for the difference in performance here is probably the number of RDD partitions. In the first case, Spark will create an RDD with 1 partition ...
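To see this directly, a minimal sketch (paths hypothetical, an existing SparkContext sc assumed) that prints the partition counts for the two layouts and consolidates the small-file case:

    val manySmall = sc.textFile("/data/ids-10000-files/*")
    println(manySmall.partitions.size)  // roughly one partition per file

    val single = sc.textFile("/data/ids-single-file.txt")
    println(single.partitions.size)     // driven by input splits, typically few

    // Consolidate many tiny partitions without a shuffle:
    val consolidated = manySmall.coalesce(32)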

Impact of input format on timing

2014-10-05 Thread Tom Hubregtsen
Hi,

I ran the same version of a program with two different types of input containing equivalent information.

Program 1: 10,000 files with on average 50 IDs, one per line.
Program 2: 1 file containing 10,000 lines, with on average 50 IDs per line.

My program takes the input and creates key/value pairs of ...
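One hedged workaround for the 10,000-file layout, with hypothetical paths and a toy parse step: wholeTextFiles reads each small file as a single (path, content) record and packs many files into fewer partitions:

    val files = sc.wholeTextFiles("/data/ids-10000-files")  // assumes SparkContext sc
    val pairs = files
      .flatMap { case (_, content) => content.split("\n") }
      .map(id => (id.trim, 1))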

Jython importing pyspark?

2014-10-05 Thread Robert C Senkbeil
Hi there,

I wanted to ask whether anyone has successfully used Jython with the pyspark library. I wasn't sure if the C extension support was needed for pyspark itself or was just a bonus of using CPython. There was a claim (
http://apache-spark-developers-list.1001551.n3.nabble.com/PySpar...