Replacement for SparkSqlSerializer.deserialize[T]

2016-09-06 Thread Ted Yu
Hi, in the hbase-spark module of HBase we previously had this code: def hbaseFieldToScalaType(f: Field, src: Array[Byte], offset: Int, length: Int): Any = { ... case BinaryType => val newArray = new Array[Byte](length); System.arraycopy(src, offset, …
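The snippet is cut off mid-call; a minimal, self-contained sketch of what that branch appears to do, assuming a plain DataType match in place of hbase-spark's Field wrapper (the StringType case is illustrative, not from the original mail):

    import org.apache.spark.sql.types._

    // Sketch: extract a typed value from a raw HBase cell buffer.
    // Only the BinaryType branch mirrors the quoted code; the rest is assumed.
    def hbaseFieldToScalaType(dt: DataType, src: Array[Byte], offset: Int, length: Int): Any =
      dt match {
        case BinaryType =>
          // Copy `length` bytes starting at `offset` into a fresh array.
          val newArray = new Array[Byte](length)
          System.arraycopy(src, offset, newArray, 0, length)
          newArray
        case StringType =>
          new String(src, offset, length, "UTF-8")
        case other =>
          throw new UnsupportedOperationException(s"unhandled type: $other")
      }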

df.groupBy('m).agg(sum('n)).show dies with 10^3 elements?

2016-09-06 Thread Jacek Laskowski
Hi, I'm concerned with the OOME in local mode with the version built today: scala> val intsMM = 1 to math.pow(10, 3).toInt intsMM: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, …
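The mail is truncated before the query itself; going by the subject line, a sketch of the sort of reproduction being described, where the derivation of m from n is an assumption (the original construction isn't shown):

    import org.apache.spark.sql.functions.sum

    // spark-shell sketch: 10^3 rows, grouped and summed as in the subject line.
    val intsMM = 1 to math.pow(10, 3).toInt
    val df = intsMM.toDF("n").withColumn("m", 'n % 2)  // hypothetical grouping column
    df.groupBy('m).agg(sum('n)).show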

Re: df.groupBy('m).agg(sum('n)).show dies with 10^3 elements?

2016-09-06 Thread Josh Rosen
I think this is a simpler case of https://issues.apache.org/jira/browse/SPARK-17405. I'm going to comment on that ticket with your simpler reproduction. On Tue, Sep 6, 2016 at 1:32 PM Jacek Laskowski wrote: > Hi, I'm concerned with the OOME in local mode with the version built today: …

BlockMatrix Multiplication fails with Out of Memory

2016-09-06 Thread vinodep
Hi, I am trying to multiply a matrix of size 67584x67584 in a loop. In the first iteration the multiplication goes through, but in the second iteration it fails with a Java heap out-of-memory error. I'm using pyspark; the configuration is below. Setup: 70 nodes (1 driver + 69 workers) with SPARK_DRIVER_…
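The poster is on pyspark, but the same pattern is easy to show in Scala; a sketch of an iterated BlockMatrix multiply, where the block size, test data, and persist/unpersist bookkeeping are all assumptions. Without that bookkeeping, each iteration piles shuffle state and cached blocks on top of the last, which is one common way this loop exhausts the heap:

    import org.apache.spark.mllib.linalg.{DenseMatrix, Matrix}
    import org.apache.spark.mllib.linalg.distributed.BlockMatrix
    import org.apache.spark.storage.StorageLevel

    val blockSize = 1024                        // hypothetical; 67584 = 66 * 1024
    val nBlocks   = 67584 / blockSize

    // Toy block-diagonal matrix standing in for the real data.
    val blocks = sc.parallelize(
      for (i <- 0 until nBlocks)
        yield ((i, i), DenseMatrix.ones(blockSize, blockSize): Matrix))
    val a = new BlockMatrix(blocks, blockSize, blockSize).cache()

    var product = a
    for (_ <- 1 to 3) {
      val next = product.multiply(a)
      next.blocks.persist(StorageLevel.MEMORY_AND_DISK)
      next.blocks.count()                       // materialize before releasing the old result
      if (product ne a) product.blocks.unpersist()
      product = next
    }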

Unable to run docker jdbc integration tests?

2016-09-06 Thread Suresh Thalamati
Hi, I am getting the following error when I try to run the JDBC docker integration tests on my laptop. Any ideas what I might be doing wrong?

build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive-thriftserver -Phive -DskipTests clean install
build/mvn -Pdocker-integration-t…
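The second command is cut off at the profile name; a plausible full pair of invocations, where -Pdocker-integration-tests is the real Spark build profile but the module selection via -pl is an assumption about how the run was scoped:

    # Build Spark first, then run only the docker integration test module.
    build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 \
      -Phive-thriftserver -Phive -DskipTests clean install
    build/mvn -Pdocker-integration-tests \
      -pl external/docker-integration-tests test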

Re: df.groupBy('m).agg(sum('n)).show dies with 10^3 elements?

2016-09-06 Thread Jacek Laskowski
Hi Josh, yes, that seems to be the issue. As I commented on the JIRA, just yesterday (after I had sent the email) even queries as simple as the following killed spark-shell: Seq(1).toDF.groupBy('value).count.show. Hoping to see it resolved soon. If there's anything I could help you with t…
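For reference, that one-liner as it runs in spark-shell, where the implicits enabling toDF and the 'value symbol column are already in scope:

    // The minimal query reported to kill spark-shell; toDF with no
    // arguments names the single column "value".
    Seq(1).toDF.groupBy('value).count.show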

Discuss: virtualenv support for SparkR executors/workers

2016-09-06 Thread Yanbo Liang
Hi All, Many users need to use third-party R packages on executors/workers, but SparkR cannot satisfy this requirement elegantly. For example, you have to coordinate with the IT/administrators of the cluster to deploy these R packages on each executor/worker node, which is very inflexible…