I'm not 100% sure I understand your question, but yes, Spark (both the RDD
API and SQL/DataFrame) does partial aggregation.
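
To make "partial aggregation" concrete, here is a minimal pure-Python sketch of the idea (it is not Spark's implementation). Each partition is pre-aggregated locally before anything is merged, which is roughly what reduceByKey does on the map side in the RDD API. The function names and the sample data are made up for illustration.

```python
from collections import defaultdict


def partial_aggregate(partition):
    """Map side: sum the values per key within a single partition."""
    acc = defaultdict(int)
    for key, value in partition:
        acc[key] += value
    return dict(acc)


def final_aggregate(partials):
    """Reduce side: merge the per-partition partial sums into one result."""
    total = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


# Two hypothetical partitions of (key, value) records.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("a", 5)],
]

# Only one record per key per partition would need to cross the shuffle.
partials = [partial_aggregate(p) for p in partitions]
result = final_aggregate(partials)
```

The point of the two-phase split is that the shuffle carries at most one record per key per partition instead of every input record.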
On Tue, Feb 9, 2016 at 8:37 PM, Rishitesh Mishra
wrote:
> Can anybody confirm whether ANY operator in Spark SQL uses
> map-side combine? If not, is it safe to assume Sor
reminder: this is happening tomorrow morning.
On Mon, Feb 8, 2016 at 9:27 AM, shane knapp wrote:
> happy monday!
>
> i will be bringing down jenkins and the workers thursday morning to
> upgrade docker on all of the workers from 1.5.0-1 to 1.7.1-2.
>
> as of december last year, docker 1.5 and ol
Hello community,
Joseph and I would like to introduce a new Spark package that should
be useful for Python users who depend on scikit-learn.
Among other tools:
- train and evaluate multiple scikit-learn models in parallel.
- convert Spark's DataFrames seamlessly into NumPy arrays
- (experiment
Yes Ted, spark.executor.extraClassPath will work if the HBase client jars are
present on all Spark Worker / NodeManager machines.
spark.yarn.dist.files is the easier way, as the HBase client jars can be copied
from the driver machine or HDFS into the container / spark-executor classpath
automatically. No need to m
+ Spark-Dev
For a Spark job on YARN that accesses an HBase table, I added all the HBase
client jars to spark.yarn.dist.files. When the NodeManager launches the
container (i.e. the executor), it localizes the files and brings all the
hbase-client jars into the executor's CWD, but the executor tasks still fail
with ClassNotFoundException
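
For reference, the two settings discussed in this thread can be combined. The sketch below is only an illustration, assuming that jars localized into the container's working directory can be referenced by relative path in spark.executor.extraClassPath; whether that resolves the ClassNotFoundException in this particular setup is not confirmed here. The helper name and the jar URIs are hypothetical.

```python
import posixpath


def yarn_classpath_conf(jar_uris):
    """Build both settings from one list of jar URIs (hypothetical helper)."""
    # spark.yarn.dist.files takes a comma-separated list of files to localize.
    dist_files = ",".join(jar_uris)
    # Assumption: localized files land in the container CWD under their base
    # name, so a relative "./name.jar" entry points at them.
    names = [posixpath.basename(uri) for uri in jar_uris]
    extra_classpath = ":".join("./" + name for name in names)
    return {
        "spark.yarn.dist.files": dist_files,
        "spark.executor.extraClassPath": extra_classpath,
    }


# Hypothetical jar locations, for illustration only.
conf = yarn_classpath_conf([
    "hdfs:///libs/hbase-client.jar",
    "hdfs:///libs/hbase-common.jar",
])

# Render as spark-submit style arguments.
submit_args = ["--conf {}={}".format(k, v) for k, v in conf.items()]
```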