You can't use existing aggregation functions with that. Besides, the
execution plan of `mapPartitions` doesn't support whole-stage codegen.
Without that and some of the optimizations around aggregation, there
might be a performance degradation. Also, when you have more than one key
in a partition, yo
It seems that this aggregation is for Dataset operations only. I would have
hoped to be able to do DataFrame aggregation, something along the lines of:
sort_df(df).agg(my_agg_func)
In any case, note that this kind of sorting is less efficient than the sorting
done in window functions, for example.
I would love this feature
On Thu, 22 Dec 2016, 18:45 assaf.mendelson wrote:
> It seems that this aggregation is for Dataset operations only. I would
> have hoped to be able to do DataFrame aggregation, something along the
> lines of: sort_df(df).agg(my_agg_func)
Yes, it's less optimal, because an abstraction is missing and with
mapPartitions it is done without optimizations. But Aggregator is not the
right abstraction to begin with: it assumes a monoid, which means no
ordering guarantees. You need a fold operation.
On Dec 22, 2016 02:20, "Liang-Chi Hsieh" wrote:
Hi,
I quoted the description of `sampleByKeyExact`:
"This method differs from [[sampleByKey]] in that we make additional passes
over the RDD to
create a sample size that's exactly equal to the sum of math.ceil(numItems *
samplingRate)
over all key values with a 99.99% confidence. When sampling w
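To illustrate the difference being quoted (a toy sketch, not Spark's implementation): per-element Bernoulli sampling, as in `sampleByKey`, only hits the target size in expectation, while the exact variant returns precisely math.ceil(numItems * samplingRate) elements, at the cost of extra work.

```scala
import scala.util.Random

object SamplingSketch {
  // sampleByKey-style: each element is kept independently with
  // probability `rate`, so the sample size only equals n * rate
  // in expectation, not exactly.
  def bernoulliSample[T](xs: Seq[T], rate: Double, rng: Random): Seq[T] =
    xs.filter(_ => rng.nextDouble() < rate)

  // sampleByKeyExact-style guarantee: exactly ceil(n * rate) elements.
  // (Spark achieves this with additional passes over the RDD, not by
  // shuffling the whole dataset as this toy version does.)
  def exactSample[T](xs: Seq[T], rate: Double, rng: Random): Seq[T] =
    rng.shuffle(xs).take(math.ceil(xs.size * rate).toInt)
}
```

For a per-key guarantee, Spark applies this exact-size logic independently to each key's stratum.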
Hi,
I think there is an issue in `ExternalAppendOnlyMap.forceSpill`, which is
called to release memory when another memory consumer tries to ask for
more memory than is currently available.
I created a JIRA and submitted a PR for it. Please check out
https://issues.apache.org/jira/browse/SPARK-18986
Hello Spark Community,
For Spark job creation I use sbt-assembly to build an uber ("super") jar and
then submit it to spark-submit.
Example:
bin/spark-submit --class hbase.spark.chetan.com.SparkHbaseJob
/home/chetan/hbase-spark/SparkMSAPoc-assembly-1.0.jar
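For reference, a typical sbt-assembly setup for this kind of job looks roughly like the sketch below (plugin and library versions are illustrative for the Spark 2.0 era; check the sbt-assembly docs for current ones). Marking the Spark dependencies as "provided" keeps the uber jar small, since spark-submit supplies those classes at runtime:

```scala
// project/plugins.sbt:
//   addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")

// build.sbt (sketch):
name := "SparkMSAPoc"
version := "1.0"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // "provided": spark-submit puts Spark on the classpath at runtime,
  // so there is no need to bundle it into the assembly jar.
  "org.apache.spark" %% "spark-core" % "2.0.2" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.0.2" % "provided"
)

// Discard duplicate META-INF entries that commonly break assembly;
// keep the first copy of anything else that collides.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}
```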
But other folks have debated with me about going uber-less