Re: Should spark-ec2 get its own repo?

2015-07-17 Thread Sean Owen
On Fri, Jul 17, 2015 at 6:58 PM, Shivaram Venkataraman wrote:
> I am not sure why the ASF JIRA can be only used to track one set of
> artifacts that are packaged and released together. I agree that marking a
> fix version as 1.5 for a change in another repo doesn't make a lot of sense,
> but we co

RE: Model parallelism with RDD

2015-07-17 Thread Ulanov, Alexander
Hi Shivaram,

Thank you for the explanation. Is there a direct way to check the length of the lineage, i.e., whether the computation is being repeated?

Best regards,
Alexander

From: Shivaram Venkataraman [mailto:shiva...@eecs.berkeley.edu]
Sent: Friday, July 17, 2015 10:10 AM
To: Ulanov, Alexander
Cc: shiv
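A quick way to see whether lineage keeps growing is RDD.toDebugString. A minimal sketch for the Spark shell (the RDD below is only a stand-in for the iteratively updated one):

    // Stand-in RDD; in the real job this would be the RDD updated each iteration.
    val rdd = sc.parallelize(1 to 100).map(_ * 2).filter(_ % 3 == 0)

    // Prints the dependency graph; if this printout grows with every iteration,
    // the whole chain is subject to recomputation.
    println(rdd.toDebugString)

    // Direct parents only, not the full lineage depth.
    println(rdd.dependencies.size)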

Re: Should spark-ec2 get its own repo?

2015-07-17 Thread Shivaram Venkataraman
Some replies inline

On Wed, Jul 15, 2015 at 1:08 AM, Sean Owen wrote:
> The code can continue to be a good reference implementation, no matter
> where it lives. In fact, it can be a better more complete one, and
> easier to update.
>
> I agree that ec2/ needs to retain some kind of pointer to th

Re: BlockMatrix multiplication

2015-07-17 Thread Burak Yavuz
Hi Alexander,

Feel free to submit an "improvement" JIRA.

Best,
Burak

On Thu, Jul 16, 2015 at 4:20 PM, Ulanov, Alexander wrote:
> Hi Burak,
>
> If I change the code as you suggested then it fails with (given that
> blockSize is 1):
>
> “org.apache.spark.SparkException: The MatrixBlock
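For background, a minimal BlockMatrix multiplication sketch (the values, 2x2 dimensions, and 1x1 block size are only illustrative; the classes are from org.apache.spark.mllib.linalg.distributed):

    import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

    // Two tiny 2x2 diagonal matrices, split into 1x1 blocks as in the thread.
    val entriesA = sc.parallelize(Seq(MatrixEntry(0, 0, 1.0), MatrixEntry(1, 1, 2.0)))
    val entriesB = sc.parallelize(Seq(MatrixEntry(0, 0, 3.0), MatrixEntry(1, 1, 4.0)))
    val a = new CoordinateMatrix(entriesA, 2, 2).toBlockMatrix(1, 1).cache()
    val b = new CoordinateMatrix(entriesB, 2, 2).toBlockMatrix(1, 1).cache()

    // Dimensions and block sizes of the operands must be compatible
    // (A's colsPerBlock matching B's rowsPerBlock) for multiply to succeed.
    val c = a.multiply(b)
    println(c.toLocalMatrix())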

Re: Model parallelism with RDD

2015-07-17 Thread Shivaram Venkataraman
You can also use checkpoint to truncate the lineage, and the data can be persisted to HDFS. Fundamentally, the state of the RDD needs to be saved to memory or disk if you don't want to repeat the computation.

Thanks
Shivaram

On Thu, Jul 16, 2015 at 4:59 PM, Ulanov, Alexander wrote:
> Dear Spark
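A minimal sketch of that pattern, assuming a running SparkContext `sc` and an HDFS checkpoint directory (the path and iteration counts are placeholders):

    // Checkpointing writes the RDD's data to reliable storage and truncates
    // its lineage, so recovery replays from the checkpoint, not from scratch.
    sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")

    var rdd = sc.parallelize(1 to 1000000)
    for (i <- 1 to 50) {
      rdd = rdd.map(_ + 1)          // lineage grows by one step per iteration
      if (i % 10 == 0) {
        rdd.persist()               // keep the data around for reuse
        rdd.checkpoint()            // marked now, materialized by the next action
        rdd.count()                 // force the checkpoint to be written
      }
    }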

Re: Apache gives exception when running groupby on df temp table

2015-07-17 Thread nipun
You are right. There are some odd things about this connector. Earlier I got an OOM exception with this connector just because of a bug in the connector that transferred only 64 bytes before closing the connection, and now this one. Strangely, I copied the data into another data frame and it worked.
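A sketch of that workaround, assuming an existing SQLContext `sqlContext` and a connector-backed DataFrame `df` (the table and column names are placeholders):

    // Rebuild the DataFrame from its rows and schema, detaching it from the
    // original connector scan, then run the GROUP BY on the copy.
    val copied = sqlContext.createDataFrame(df.rdd, df.schema)
    copied.registerTempTable("copied")
    sqlContext.sql("SELECT some_col, count(*) FROM copied GROUP BY some_col").show()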

Re: Apache gives exception when running groupby on df temp table

2015-07-17 Thread Yana Kadiyska
I think that might be a connector issue. You say you are using Spark 1.4; are you also using the 1.4 version of the Spark-cassandra-connector? They do have some bugs around this, e.g. https://datastax-oss.atlassian.net/browse/SPARKC-195. Also, I see that you import org.apache.spark.sql.cassandra.Cassand
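In build terms, keeping the connector's release line in step with Spark would look roughly like this sbt fragment (the exact version strings are illustrative, not a recommendation):

    libraryDependencies ++= Seq(
      "org.apache.spark"   %% "spark-sql"                  % "1.4.1" % "provided",
      "com.datastax.spark" %% "spark-cassandra-connector"  % "1.4.0"
    )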

Re: why doesn't jenkins like me?

2015-07-17 Thread Josh Rosen
The "It is not a test" failed test message means that something went wrong in a suite-wide setup or teardown method. This could be some sort of race or flakiness. If this problem persists, we should file a JIRA and label it with "flaky-test" so that we can find it later. On Thu, Jul 16, 2015 at

Re: KryoSerializer gives class cast exception

2015-07-17 Thread Josh Rosen
We've run into other problems caused by our old Kryo versions. I agree that the Chill dependency is one of the main blockers to upgrading Kryo, but I don't think that it's insurmountable: if necessary, we could just publish our own forked version of Chill under our own namespace, similar to what we
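For readers following the thread, this is roughly how KryoSerializer is enabled and classes are registered in an application (the record class is a placeholder; this is background for the thread's subject, not part of the Chill discussion above):

    import org.apache.spark.SparkConf

    case class MyRecord(id: Long, name: String)   // placeholder type

    val conf = new SparkConf()
      .setAppName("kryo-example")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Fail fast on unregistered classes instead of silently falling back.
      .set("spark.kryo.registrationRequired", "true")
      .registerKryoClasses(Array(classOf[MyRecord], classOf[Array[MyRecord]]))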

Re: Apache gives exception when running groupby on df temp table

2015-07-17 Thread nipun
spark version 1.4

import com.datastax.spark.connector._
import org.apache.spark._
import org.apache.spark.sql.cassandra.CassandraSQLContext
import org.apache.spark.SparkConf
//import com.microsoft.sqlserver.jdbc.SQLServerDriver
import java.sql.Connection
import java.sql.DriverManager
import java.
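The query side would typically continue along these lines (keyspace, table, and column names are placeholders; only the shape of a GROUP BY on a registered temp table is the point):

    val conf = new SparkConf().setAppName("cassandra-groupby")
    val sc = new SparkContext(conf)
    val cc = new CassandraSQLContext(sc)

    // Pull the Cassandra table, register it, and aggregate on the temp table.
    val df = cc.sql("SELECT * FROM mykeyspace.events")
    df.registerTempTable("events_tmp")
    cc.sql("SELECT user_id, count(*) AS cnt FROM events_tmp GROUP BY user_id").show()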

Re: Hive Table with large number of partitions

2015-07-17 Thread Michael Armbrust
https://github.com/apache/spark/pull/7421

On Fri, Jul 17, 2015 at 3:26 AM, Xiaoyu Ma wrote:
> Hi guys,
> I saw when Hive Table object created it tries to load all existing
> partitions.
>
> @transient val hiveQlPartitions: Seq[Partition] = table.getAllPartitions.map
> { p =>
>   val tPartitio

Hive Table with large number of partitions

2015-07-17 Thread Xiaoyu Ma
Hi guys,

I saw that when the Hive Table object is created, it tries to load all existing partitions:

@transient val hiveQlPartitions: Seq[Partition] = table.getAllPartitions.map { p =>
  val tPartition = new org.apache.hadoop.hive.metastore.api.Partition
  tPartition.setDbName(databaseName)
  tPartition.setT
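To see why that matters, a hypothetical query (the table name and partition column are made up): with an eager getAllPartitions, even a query that reads a single partition first pays the cost of loading metadata for every partition in the table.

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)
    // Touches one partition's data, but the catalog above has already
    // fetched metadata for all partitions of the table.
    hiveContext.sql("SELECT count(*) FROM events WHERE dt = '2015-07-17'").show()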