You could
df.filter(col("c") === "c1").write.partitionBy("c").save
It could hit some data skew problems, but it might work for you.
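Something like this, as a fuller sketch (paths and values below are made up, not from the thread):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("static-partition-write").getOrCreate()
import spark.implicits._

// Hypothetical input with a partition column "c".
val df = Seq(("a", "c1"), ("b", "c2")).toDF("value", "c")

// With dynamic partition overwrite (Spark 2.3+), overwrite mode replaces
// only the partitions actually present in the written data.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

// Keep only the rows for the static partition value c1, then let
// partitionBy() lay out the c=c1 directory.
df.filter(col("c") === "c1")
  .write
  .mode("overwrite")
  .partitionBy("c")
  .parquet("/tmp/static_partition_demo")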
From: Burak Yavuz
Sent: Tuesday, May 7, 2019 9:35:10 AM
To: Shubham Chaurasia
Cc: dev; user@spark.apache.org
Subject: Re: Static partitions in partitionBy()
I have a “directory” in S3 containing Parquet files created from Avro using the
AvroParquetWriter in the parquet-mr project.
I can load the contents of these files as a DataFrame using
val it = spark.read.parquet("s3a://coolbeth/file=testy")
but I have not found a way to define a permanent table over them.
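For what it's worth, a minimal sketch of one standard way to get a permanent table over existing Parquet files, assuming a Hive metastore is configured (the table name is made up, and this assumes the file=... directories are partition directories):

// External table pointing at the parent directory; Spark infers the
// schema from the Parquet footers.
spark.sql(
  """CREATE TABLE IF NOT EXISTS coolbeth
    |USING parquet
    |LOCATION 's3a://coolbeth/'""".stripMargin)

// Register any file=... partition directories found under the location.
spark.sql("MSCK REPAIR TABLE coolbeth")

val df = spark.table("coolbeth")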
It depends on the data source. Delta Lake (https://delta.io) allows you to
do it with the .option("replaceWhere", "c = c1"). With other file formats,
you can write directly into the partition directory (tablePath/c=c1), but
you lose atomicity.
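To make both routes concrete (paths here are made up, and the predicate value is quoted as a SQL string literal):

import org.apache.spark.sql.functions.col

// Delta Lake: atomically overwrite only the rows matching the predicate.
df.filter(col("c") === "c1")
  .write
  .format("delta")
  .mode("overwrite")
  .option("replaceWhere", "c = 'c1'")
  .save("s3a://bucket/tablePath")

// Plain file formats: write straight into the partition directory,
// dropping the partition column from the data. Not atomic.
df.filter(col("c") === "c1")
  .drop("c")
  .write
  .mode("overwrite")
  .parquet("s3a://bucket/tablePath/c=c1")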
On Tue, May 7, 2019, 6:36 AM Shubham Chaurasia
wrote:
Hey Spark Experts,
After listening to some of you, and to the presentations at Spark Summit in
SF, I am transitioning from DStreams to Structured Streaming; however, I am
seeing some weird results.
My use case is as follows: I am reading a stream from a Kafka topic,
transforming each message, and writing the result out.
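For concreteness, a bare-bones version of that pipeline (broker address, topic, sink format, and paths are all made up; this assumes the spark-sql-kafka-0-10 package is on the classpath):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-sss").getOrCreate()

// Read the topic as a stream.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()

// Transform each message (here: just cast the value to a string).
val messages = stream.selectExpr("CAST(value AS STRING) AS message")

// Write the transformed stream out.
val query = messages.writeStream
  .format("parquet")
  .option("path", "/tmp/out")
  .option("checkpointLocation", "/tmp/chk")
  .start()

query.awaitTermination()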
Hi All,
Is there a way I can provide static partitions in partitionBy()?
Like:
df.write.mode("overwrite").format("MyDataSource").partitionBy("c=c1").save
The above code gives the following error, as it tries to find a column named `c=c1` in df:
org.apache.spark.sql.AnalysisException: Partition column `c=c1` not found in schema ...
Hi all:
My Spark version is 2.3.2. I start the Thrift server with the default Spark
config. On the client side, I use a Java application to query results via JDBC.
The query application has plenty of statements to execute. The earlier
statements execute very quickly, but the later statements execute more and
more slowly.
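For reference, this is roughly the client pattern being described (host, port, table, and query are made up, and this assumes the Hive JDBC driver is on the classpath):

import java.sql.DriverManager

// One connection to the Thrift server, many statements over it.
val conn = DriverManager.getConnection(
  "jdbc:hive2://localhost:10000/default", "user", "")
val stmt = conn.createStatement()

for (i <- 1 to 100) {
  val rs = stmt.executeQuery(s"SELECT count(*) FROM some_table WHERE id = $i")
  while (rs.next()) println(rs.getLong(1))
  rs.close()
}

stmt.close()
conn.close()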
It would be a dream to have an easy-to-use dynamic metric system AND a
reliable counting system (accumulator-like) in Spark...
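For context, the closest thing today is the accumulator API, e.g. (names are illustrative):

// A driver-side counter updated from tasks. Updates are exactly-once only
// when made inside actions; retried transformations can double-count.
val counter = spark.sparkContext.longAccumulator("processedRows")
spark.range(0, 1000).foreach(_ => counter.add(1))
println(counter.value)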
Thanks
Roberto
On Tue, May 7, 2019 at 3:54 AM Saisai Shao wrote:
> I think the main reason why that was not merged is that Spark itself
> doesn't have such a requirement.
Hi,
I'm curious about "I found the bug code". Can you point me at it? Thanks.
Regards,
Jacek Laskowski
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https:
OK... I am sure it is a bug in Spark. I found the buggy code, but that code
was removed in 2.2.3, so I just upgraded Spark to fix the problem.