Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-12 Thread Patrick Wendell
I actually do agree with this - let's see if we can find a solution that doesn't regress this behavior. Maybe we can simply move the one Kafka example into its own project instead of having it in the examples project. On Wed, Nov 12, 2014 at 11:07 PM, Sandy Ryza wrote: > Currently there are no ma

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-12 Thread Sandy Ryza
Currently there are no mandatory profiles required to build Spark. I.e. "mvn package" just works. It seems sad that we would need to break this. On Wed, Nov 12, 2014 at 10:59 PM, Patrick Wendell wrote: > I think printing an error that says "-Pscala-2.10 must be enabled" is > probably okay. It'

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-12 Thread Patrick Wendell
I think printing an error that says "-Pscala-2.10 must be enabled" is probably okay. It's a slight regression but it's super obvious to users. That could be a more elegant solution than the somewhat complicated monstrosity I proposed on the JIRA. On Wed, Nov 12, 2014 at 10:37 PM, Prashant Sharma

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-12 Thread Prashant Sharma
One thing we can do is print a helpful error and break. I don't know exactly how this can be done, but since we can now write Groovy inside the Maven build, we have more control. (Yay!!) Prashant Sharma On Thu, Nov 13, 2014 at 12:05 PM, Patrick Wendell wrote: > Yeah Sandy and I were chatting ab

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-12 Thread Patrick Wendell
Yeah Sandy and I were chatting about this today and didn't realize -Pscala-2.10 was mandatory. This is a fairly invasive change, so I was thinking maybe we could try to remove that. Also, if someone doesn't give -Pscala-2.10, it fails in a way that is initially silent, which is bad because most people

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-12 Thread Prashant Sharma
For Scala 2.11.4, there are minor changes needed in the REPL code. I can do that if that is a high priority. Prashant Sharma On Thu, Nov 13, 2014 at 11:59 AM, Prashant Sharma wrote: > Thanks Patrick, I have one suggestion that we should make passing > -Pscala-2.10 mandatory for maven users. I am

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-12 Thread Prashant Sharma
Thanks Patrick, I have one suggestion: we should make passing -Pscala-2.10 mandatory for Maven users. I am sorry for not mentioning this before. There is no way around passing that option for Maven users (only). However, this is unnecessary for sbt users because it is added automatically if
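
Concretely, the proposal means a Maven build would always carry the profile flag, along the lines of (an invocation sketch; the skipTests flag is just the usual way to build without running tests, not something specified in this thread):

    mvn -Pscala-2.10 -DskipTests clean package

sbt users would build as before, since the equivalent setting is injected automatically.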

Re: [VOTE] Release Apache Spark 1.1.1 (RC1)

2014-11-12 Thread Andrew Or
I will start the vote with a +1 2014-11-12 20:34 GMT-08:00 Andrew Or : > Please vote on releasing the following candidate as Apache Spark version 1.1.1. > > This release fixes a number of bugs in Spark 1.1.0. Some of the notable > ones are > - [SPARK-3426] Sort-based shuffle compression settin

Re: Cache sparkSql data without uncompressing it in memory

2014-11-12 Thread Cheng Lian
Currently there’s no way to cache the compressed sequence file directly. Spark SQL uses in-memory columnar format while caching table rows, so we must read all the raw data and convert them into columnar format. However, you can enable in-memory columnar compression by setting spark.sql.inMemo
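
A minimal sketch of the suggested setting (the property name is truncated above; this assumes it is spark.sql.inMemoryColumnarStorage.compressed, with a hypothetical table name, and sc being the shell's SparkContext):

    import org.apache.spark.sql.hive.HiveContext

    val sqlContext = new HiveContext(sc)
    // Compress the columnar buffers that are built when a table is cached
    sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")
    // The compressed sequence files are still decompressed on read; the rows
    // are then re-encoded (and compressed) in the in-memory columnar format
    sqlContext.cacheTable("my_table")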

Cache sparkSql data without uncompressing it in memory

2014-11-12 Thread Sadhan Sood
We noticed that when caching data from our Hive tables, which store data in compressed sequence file format, the data gets uncompressed in memory as it is cached. Is there a way to turn this off and cache the compressed data as is?

Re: Too many failed collects when trying to cache a table in SparkSQL

2014-11-12 Thread Sadhan Sood
> This is the log output: > > 2014-11-12 19:07:16,561 INFO thriftserver.SparkExecuteStatementOperation > (Logging.scala:logInfo(59)) - Running query 'CACHE TABLE xyz_cached AS > SELECT * FROM xyz where date_prefix = 20141112' > > 2014-11-12 19:07:17,455

Re: Spark-Submit issues

2014-11-12 Thread Ted Malaska
Otherwise, include them at the time of execution. Here is an example. spark-submit --jars /opt/cloudera/parcels/CDH/lib/zookeeper/zookeeper-3.4.5-cdh5.1.0.jar,/opt/cloudera/parcels/CDH/lib/hbase/lib/guava-12.0.1.jar,/opt/cloudera/parcels/CDH/lib/hbase/lib/protobuf-java-2.5.0.jar,/opt/cloudera/par

Re: Spark-Submit issues

2014-11-12 Thread Hari Shreedharan
Yep, you’d need to shade jars to ensure all your dependencies are in the classpath. Thanks, Hari On Wed, Nov 12, 2014 at 3:23 AM, Ted Malaska wrote: > Hey this is Ted > Are you using Shade when you build your jar and are you using the bigger > jar? Looks like classes are not included in you
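
For sbt builds, an assembly ("fat") jar is one way to get every dependency onto the classpath. A minimal build.sbt sketch assuming the sbt-assembly plugin (names and versions are illustrative, not from this thread):

    // project/plugins.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")

    // build.sbt
    name := "flume-event-count"
    scalaVersion := "2.10.4"
    libraryDependencies ++= Seq(
      // Spark itself is already on the cluster, so mark it "provided"
      "org.apache.spark" %% "spark-streaming" % "1.1.0" % "provided",
      // The Flume connector is not, so it must ship inside the assembly jar
      "org.apache.spark" %% "spark-streaming-flume" % "1.1.0"
    )

Running "sbt assembly" then produces a single jar to hand to spark-submit.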

Re: Wrong temp directory when compressing before sending text file to S3

2014-11-12 Thread Josh Rosen
Hi Gary, Could you create a Spark JIRA ticket for this so that it doesn't fall through the cracks? Thanks! On Thu, Nov 6, 2014 at 2:10 PM, Gary Malouf wrote: > We have some data that we are exporting from our HDFS cluster to S3 with > some help from Spark. The final RDD command we run is: > >

Re: Too many failed collects when trying to cache a table in SparkSQL

2014-11-12 Thread Sadhan Sood
This is the log output: 2014-11-12 19:07:16,561 INFO thriftserver.SparkExecuteStatementOperation (Logging.scala:logInfo(59)) - Running query 'CACHE TABLE xyz_cached AS SELECT * FROM xyz where date_prefix = 20141112' 2014-11-12 19:07:17,455 INFO Configuration.d
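
The statement from the log can also be issued straight from a SQLContext; a sketch using the same table and predicate shown above:

    // CACHE TABLE ... AS SELECT materializes the query result in the
    // in-memory columnar cache under the given name
    sqlContext.sql(
      "CACHE TABLE xyz_cached AS SELECT * FROM xyz WHERE date_prefix = 20141112")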

Too many failed collects when trying to cache a table in SparkSQL

2014-11-12 Thread Sadhan Sood
We are running Spark on YARN with combined memory > 1TB, and when trying to cache a table partition (which is < 100G), we are seeing a lot of failed collect stages in the UI and the caching never succeeds. Because of the failed collect, it seems like the mapPartitions keep getting resubmitted. We have more than en

Re: Spark-Submit issues

2014-11-12 Thread Ted Malaska
Hey this is Ted Are you using Shade when you build your jar and are you using the bigger jar? Looks like classes are not included in your jar. On Wed, Nov 12, 2014 at 2:09 AM, Jeniba Johnson < jeniba.john...@lntinfotech.com> wrote: > Hi Hari, > > Now I am trying out the same FlumeEventCount examp

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-12 Thread Sean Owen
- Tip: when you rebase, IntelliJ will temporarily think things like the Kafka module are being removed. Say 'no' when it asks if you want to remove them. - Can we go straight to Scala 2.11.4? On Wed, Nov 12, 2014 at 5:47 AM, Patrick Wendell wrote: > Hey All, > > I've just merged a patch that add