I actually do agree with this - let's see if we can find a solution
that doesn't regress this behavior. Maybe we can simply move the one
kafka example into its own project instead of having it in the
examples project.
On Wed, Nov 12, 2014 at 11:07 PM, Sandy Ryza wrote:
Currently there are no mandatory profiles required to build Spark. I.e.
"mvn package" just works. It seems sad that we would need to break this.
On Wed, Nov 12, 2014 at 10:59 PM, Patrick Wendell wrote:
I think printing an error that says "-Pscala-2.10 must be enabled" is
probably okay. It's a slight regression but it's super obvious to
users. That could be a more elegant solution than the somewhat
complicated monstrosity I proposed on the JIRA.
On Wed, Nov 12, 2014 at 10:37 PM, Prashant Sharma wrote:
One thing we can do is print a helpful error and break. I don't know exactly
how this can be done, but since we can now write Groovy inside the Maven
build, we have more control. (Yay!!)
Prashant Sharma
On Thu, Nov 13, 2014 at 12:05 PM, Patrick Wendell wrote:
Yeah Sandy and I were chatting about this today and didn't realize
-Pscala-2.10 was mandatory. This is a fairly invasive change, so I was
thinking maybe we could try to remove that. Also if someone doesn't
give -Pscala-2.10 it fails in a way that is initially silent, which is
bad because most people
For Scala 2.11.4, there are minor changes needed in the repl code. I can do
that if it is a high priority.
Prashant Sharma
On Thu, Nov 13, 2014 at 11:59 AM, Prashant Sharma wrote:
Thanks Patrick, I have one suggestion: we should make passing -Pscala-2.10
mandatory for Maven users. I am sorry for not mentioning this before. There is
no way around passing that option for Maven users (only). However, this is
unnecessary for sbt users because it is added automatically if
I will start the vote with a +1
2014-11-12 20:34 GMT-08:00 Andrew Or :
> Please vote on releasing the following candidate as Apache Spark version 1.1.1.
>
> This release fixes a number of bugs in Spark 1.1.0. Some of the notable
> ones are
> - [SPARK-3426] Sort-based shuffle compression settin
Currently there’s no way to cache the compressed sequence file directly.
Spark SQL uses in-memory columnar format while caching table rows, so we
must read all the raw data and convert them into columnar format.
However, you can enable in-memory columnar compression by setting
spark.sql.inMemo
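A minimal Scala sketch of what this reply describes, assuming the setting
truncated above is spark.sql.inMemoryColumnarStorage.compressed and reusing the
table names from the CACHE TABLE log further down this thread:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object CacheHiveTableColumnar {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cache-columnar"))
    val hiveCtx = new HiveContext(sc)

    // Assumed full name of the setting truncated above: compress the
    // in-memory columnar buffers Spark SQL builds when caching a table.
    hiveCtx.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")

    // The compressed sequence file is still decompressed on read; rows are
    // then re-encoded (and re-compressed) in columnar form in memory.
    hiveCtx.sql(
      "CACHE TABLE xyz_cached AS SELECT * FROM xyz WHERE date_prefix = 20141112")
  }
}

How close the in-memory size gets to the on-disk size will depend on the data
and on the codec used in the original sequence file.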
We noticed that while caching data from our Hive tables, which contain data in
compressed sequence file format, the data gets uncompressed in memory when
cached. Is there a way to turn this off and cache the compressed data as is?
Otherwise, include them at the time of execution. Here is an example.
spark-submit --jars
/opt/cloudera/parcels/CDH/lib/zookeeper/zookeeper-3.4.5-cdh5.1.0.jar,/opt/cloudera/parcels/CDH/lib/hbase/lib/guava-12.0.1.jar,/opt/cloudera/parcels/CDH/lib/hbase/lib/protobuf-java-2.5.0.jar,/opt/cloudera/par
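An alternative sketch (not what the example above does): the same jars can be
attached from inside the application via SparkContext.addJar, which ships them
to the executors much like --jars does. The paths below are placeholders, not
the real CDH locations:

import org.apache.spark.{SparkConf, SparkContext}

object AddJarsAtRuntime {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-client-example"))

    // Placeholder paths: distribute extra jars to the executors, roughly
    // equivalent to listing them with --jars on spark-submit.
    Seq(
      "/path/to/zookeeper-3.4.5-cdh5.1.0.jar",
      "/path/to/guava-12.0.1.jar",
      "/path/to/protobuf-java-2.5.0.jar"
    ).foreach(sc.addJar)

    // ... job logic goes here ...

    sc.stop()
  }
}

One caveat, as I understand it: addJar only affects tasks running on the
executors, so anything the driver itself needs at startup still has to be on
the launch classpath (--jars or --driver-class-path).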
Yep, you’d need to shade jars to ensure all your dependencies are in the
classpath.
Thanks,
Hari
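For jobs built with sbt instead of Maven, the usual counterpart to shading is a
fat jar from the sbt-assembly plugin. A rough build.sbt sketch under that
assumption (the plugin itself goes in project/plugins.sbt; versions and
artifact names below are illustrative, matching the Spark 1.1 / Flume example
discussed elsewhere in this thread):

// build.sbt -- assumes the sbt-assembly plugin is enabled in project/plugins.sbt
name := "flume-event-count"

scalaVersion := "2.10.4"

// Spark itself is provided by the cluster at runtime, so it is excluded from
// the fat jar; the Flume connector and its transitive dependencies are
// bundled in so their classes are on the classpath when the job runs.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"            % "1.1.0" % "provided",
  "org.apache.spark" %% "spark-streaming"       % "1.1.0" % "provided",
  "org.apache.spark" %% "spark-streaming-flume" % "1.1.0"
)

Running sbt assembly then produces a single jar to hand to spark-submit.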
On Wed, Nov 12, 2014 at 3:23 AM, Ted Malaska wrote:
Hi Gary,
Could you create a Spark JIRA ticket for this so that it doesn't fall
through the cracks? Thanks!
On Thu, Nov 6, 2014 at 2:10 PM, Gary Malouf wrote:
> We have some data that we are exporting from our HDFS cluster to S3 with
> some help from Spark. The final RDD command we run is:
This is the log output:
2014-11-12 19:07:16,561 INFO thriftserver.SparkExecuteStatementOperation
(Logging.scala:logInfo(59)) - Running query 'CACHE TABLE xyz_cached AS
SELECT * FROM xyz where date_prefix = 20141112'
2014-11-12 19:07:17,455 INFO Configuration.d
We are running Spark on YARN with combined memory > 1 TB, and when trying to
cache a table partition (which is < 100 GB) we are seeing a lot of failed
collect stages in the UI, and the caching never succeeds. Because of the failed
collects, it seems like the mapPartitions keep getting resubmitted. We have more than
en
Hey this is Ted
Are you using Shade when you build your jar and are you using the bigger
jar? Looks like classes are not included in your jar.
On Wed, Nov 12, 2014 at 2:09 AM, Jeniba Johnson <jeniba.john...@lntinfotech.com> wrote:
> Hi Hari,
>
> Now I am trying out the same FlumeEventCount examp
- Tip: when you rebase, IntelliJ will temporarily think things like the
Kafka module are being removed. Say 'no' when it asks if you want to remove
them.
- Can we go straight to Scala 2.11.4?
On Wed, Nov 12, 2014 at 5:47 AM, Patrick Wendell wrote:
> Hey All,
>
> I've just merged a patch that add