Re: Why were changes of SPARK-9241 removed?

2020-03-12 Thread Xiao Li
I do not think we intentionally dropped it. Could you open a ticket in Spark JIRA with your query? Cheers, Xiao

On Thu, Mar 12, 2020 at 8:24 PM 马阳阳 wrote:
> Hi, I wonder why the changes made in "[SPARK-9241][SQL] Supporting
> multiple DISTINCT columns (2) - Rewriting Rule" are not present…

Why were changes of SPARK-9241 removed?

2020-03-12 Thread 马阳阳
Hi, I wonder why the changes made in "[SPARK-9241][SQL] Supporting multiple DISTINCT columns (2) - Rewriting Rule" are not present in Spark (version 2.4) now. This makes execution of count distinct in Spark much slower than in Spark 1.6 and Hive (Spark 2.4.4: more than 18 minutes; Hive: about 80 s; Spark…
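For context, the SPARK-9241 rewrite turns a query with several DISTINCT aggregates into a single pass over an "expanded" input: each row is duplicated once per DISTINCT aggregate, tagged with a group id, then de-duplicated and counted. A minimal pure-Python sketch of that idea (no Spark required; the sample rows and column names are invented for illustration):

```python
# Sketch of the Expand-style rewrite for multiple DISTINCT aggregates,
# e.g. SELECT COUNT(DISTINCT a), COUNT(DISTINCT b) FROM t.
rows = [
    {"a": 1, "b": "x"},
    {"a": 1, "b": "y"},
    {"a": 2, "b": "y"},
]

# Step 1: expand each input row into one row per DISTINCT aggregate,
# tagged with a group id (gid) saying which aggregate it feeds.
expanded = []
for r in rows:
    expanded.append((1, r["a"]))   # gid 1 -> COUNT(DISTINCT a)
    expanded.append((2, r["b"]))   # gid 2 -> COUNT(DISTINCT b)

# Step 2: the first aggregation de-duplicates per (gid, value).
distinct_pairs = set(expanded)

# Step 3: the second aggregation counts surviving rows per gid.
counts = {}
for gid, _ in distinct_pairs:
    counts[gid] = counts.get(gid, 0) + 1

print(counts[1], counts[2])  # COUNT(DISTINCT a) = 2, COUNT(DISTINCT b) = 2
```

The point of the rewrite is that both distinct counts are computed in one shuffled aggregation instead of one per DISTINCT column, which is why its absence can make multi-column count-distinct queries much slower.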

Re: Hostname :BUG

2020-03-12 Thread Zahid Rahman
hey Dodgy Bob, Linux & C programmers, conscientious non-objector, I have a great idea I want to share with you. In Linux I am familiar with wc (wc = word count; Linux users don't like long-winded typing). wc flags are: -c, --bytes (print the byte counts); -m, --chars (print the…
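The wc flags listed above (-c bytes, -m chars, and so on) only differ once multibyte characters are involved. A quick Python equivalent, with an invented sample string, makes the distinction concrete:

```python
# Python equivalents of wc's counts; the sample string is made up.
s = "héllo world\n"   # 'é' encodes to 2 bytes in UTF-8

byte_count = len(s.encode("utf-8"))  # like `wc -c`
char_count = len(s)                  # like `wc -m`
word_count = len(s.split())          # like `wc -w`
line_count = s.count("\n")           # like `wc -l`

print(byte_count, char_count, word_count, line_count)  # 13 12 2 1
```

Note that -c and -m diverge (13 bytes vs. 12 characters) precisely because of the multibyte 'é'.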

Scala vs PySpark Inconsistency: SQLContext/SparkSession access from DataFrame/DataSet

2020-03-12 Thread Ben Roling
I've noticed that DataSet.sqlContext is public in Scala but the equivalent (DataFrame._sc) in PySpark is named as if it should be treated as private. Is this intentional? If so, what's the rationale? If not, then it feels like a bug and DataFrame should have some form of public access back to the…

[Spark MicroBatchExecution] Error fetching kafka/checkpoint/state/0/0/1.delta does not exist

2020-03-12 Thread Miguel Silvestre
Hi community, I'm having this error in some kafka streams: Caused by: java.io.FileNotFoundException: File file:/efs/.../kafka/checkpoint/state/0/0/1.delta does not exist Because of this I have some streams down. How can I fix this? Thank you. -- Miguel Silvestre
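The missing 1.delta file lives under the stream's state-store checkpoint, so this error usually means the checkpoint directory was pruned or is on non-durable storage. A common remedy is restarting the query with a fresh checkpoint location on durable storage, accepting that the stream reprocesses data. A hedged configuration sketch (broker address, topic, and paths are all invented, and this only runs against a live cluster):

```python
# Hedged sketch: restart the stream with a durable checkpointLocation so
# state/<op>/<partition>/*.delta files survive restarts.
stream = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "my_topic")
    .load())

query = (stream.writeStream
    .format("parquet")
    .option("path", "s3a://bucket/output/")
    .option("checkpointLocation", "s3a://bucket/checkpoints/my_stream/")
    .start())
```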

Exception during writing a spark Dataframe to Redshift

2020-03-12 Thread Sandeep Patra
This is where the exception occurs:

myAppDes.coalesce(1)
  .write
  .format("com.databricks.spark.redshift")
  .option("url", redshiftURL)
  .option("dbtable", redshiftTableName)
  .option("forward_spark_s3_credentials", "true")
  .option("tempdir", "s3a://zest-…