Re: Unmarking most things as experimental, evolving for 3.0?

2019-08-21 Thread Dongjoon Hyun
+1 for unmarking old ones (made in `2.3.x` and before). Thank you, Sean. Bests, Dongjoon. On Wed, Aug 21, 2019 at 6:46 PM Sean Owen wrote: > There are currently about 130 things marked as 'experimental' in > Spark, and some have been around since Spark 1.x. A few may be > legitimately still exp

Unmarking most things as experimental, evolving for 3.0?

2019-08-21 Thread Sean Owen
There are currently about 130 things marked as 'experimental' in Spark, and some have been around since Spark 1.x. A few may be legitimately still experimental (e.g. barrier mode), but, would it be safe to say most of these annotations should be removed for 3.0? What's the theory for evolving vs e

Fwd: Custom aggregations: modular and lightweight solutions?

2019-08-21 Thread Andrew Leverentz
Hi All, Apologies for cross-posting this, but I'm wondering if the dev list might be a better place for my questions below. For now, I'm developing a set of utilities for my own use, but if I can get these utilities working, I'd like to see if it might be worth contributing them to the Spark projec

Re: Data Property Accumulators

2019-08-21 Thread Erik Erlandson
I'm wondering whether keeping track of accumulation in "consistent mode" is like a case for mapping straight to the Try value, so parsedData has type RDD[Try[...]], and counting failures is parsedData.filter(_.isFailure).count, etc Put another way: Consistent mode accumulation seems (to me) like i
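The pattern Erik describes (mapping each record straight to a Try value and counting failures with a filter) can be sketched outside Spark. Below is a plain-Python analogue, assuming a hypothetical `try_parse` function of my own invention; it mirrors `parsedData: RDD[Try[...]]` and `parsedData.filter(_.isFailure).count` with an explicit success/failure wrapper:

```python
# Sketch (not Spark code): each record maps to an explicit
# success/failure value, so "consistent mode" failure counting
# becomes an ordinary filter-and-count over the parsed results.

def try_parse(record):
    """Return ('success', value) or ('failure', error) instead of raising."""
    try:
        return ("success", int(record))
    except ValueError as e:
        return ("failure", e)

records = ["1", "2", "oops", "4", "bad"]

# Analogue of parsedData: RDD[Try[Int]]
parsed = [try_parse(r) for r in records]

# Analogue of parsedData.filter(_.isFailure).count
failures = sum(1 for tag, _ in parsed if tag == "failure")
successes = [v for tag, v in parsed if tag == "success"]

print(failures)    # 2
print(successes)   # [1, 2, 4]
```

Because failures are carried in the data itself rather than in a side-channel accumulator, the count is deterministic under task retries, which is the appeal of this approach.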

Please help with a question about repartition for a dataset from a partitioned Hive table

2019-08-21 Thread zhangliyun
Hi All: I have a question about the repartition API and Spark SQL partitioning. I have a table whose partition key is day ``` ./bin/spark-sql -e "CREATE TABLE t_original_partitioned_spark (cust_id int, loss double) PARTITIONED BY (day STRING) location 'hdfs://localhost:9000/t_original_partitione
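A common source of confusion behind questions like this: `Dataset.repartition` controls the in-memory shuffle partitioning of a job, while Hive's `PARTITIONED BY (day)` controls the on-disk directory layout of the table; the two are independent. A minimal Python sketch of the distinction (the hash scheme and helper names are illustrative, not Spark's actual internals):

```python
# Illustrative sketch (not Spark internals): repartition assigns rows to
# N in-memory shuffle partitions by hashing a key, while a Hive
# "PARTITIONED BY (day)" table groups rows into one directory per
# distinct day value on disk. Same word "partition", different concepts.

from collections import defaultdict

rows = [
    {"cust_id": 1, "loss": 0.5, "day": "2019-08-20"},
    {"cust_id": 2, "loss": 1.5, "day": "2019-08-20"},
    {"cust_id": 3, "loss": 2.0, "day": "2019-08-21"},
]

def repartition(rows, num_partitions, key):
    """Hash-style repartition: rows with equal keys land in one bucket."""
    parts = defaultdict(list)
    for r in rows:
        parts[hash(r[key]) % num_partitions].append(r)
    return dict(parts)

def hive_layout(rows, key):
    """Hive-style layout: one 'directory' per distinct partition value."""
    dirs = defaultdict(list)
    for r in rows:
        dirs[f"{key}={r[key]}"].append(r)
    return dict(dirs)

shuffle_parts = repartition(rows, num_partitions=4, key="day")
disk_dirs = hive_layout(rows, key="day")

print(sorted(disk_dirs))
print(sum(len(v) for v in shuffle_parts.values()))  # 3 rows total, regardless of bucketing
```

So repartitioning a dataset read from this table changes how many tasks process it, not how many `day=...` directories exist in HDFS.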