I encountered a similar problem when trying to:
ds.write().save("s3a://some-bucket/some/path/table");
which writes the content as a bunch of parquet files in the “folder” named
“table”.
I am using a Flintrock cluster with the Spark 3.0 preview FWIW.
Anyway, I just used the AWS SDK to remove it.
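For anyone hitting the same thing, here is a minimal sketch of what "used the AWS SDK to remove it" can look like (AWS SDK for Java v1; the bucket and prefix are just the placeholder values from the path above, and a real listing of more than ~1000 objects would also need pagination):

import com.amazonaws.services.s3.AmazonS3ClientBuilder
import scala.collection.JavaConverters._

val s3 = AmazonS3ClientBuilder.defaultClient()
val bucket = "some-bucket"      // placeholder bucket from the path above
val prefix = "some/path/table/" // placeholder prefix from the path above

// Delete every object under the "folder" before re-writing it.
// listObjects returns at most ~1000 keys per call, so a larger
// prefix would need to loop with listNextBatchOfObjects.
s3.listObjects(bucket, prefix)
  .getObjectSummaries.asScala
  .foreach(summary => s3.deleteObject(bucket, summary.getKey))

Depending on the use case, ds.write().mode("overwrite").save(...) may do the same job without touching the SDK at all.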
Thanks Georg,
The batch import job runs on a different schedule than the read job: the import job runs every 15 minutes to 1 hour, while the read/transform job runs once a day.
In this case I think writing with sortWithinPartitions doesn't make any difference, as the combined data stored in HDFS will not be sorted.
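If the ordering only matters to the daily job, one option is to do the sort there instead of at import time. A rough sketch, with made-up paths and a hypothetical `key` column:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("daily-transform").getOrCreate()

// Read everything the 15-minute imports have accumulated (hypothetical path).
val combined = spark.read.parquet("hdfs:///data/imports/")

combined
  .repartition(col("key"))          // co-locate rows with the same key
  .sortWithinPartitions(col("key")) // then sort inside each partition
  .write
  .mode("overwrite")
  .parquet("hdfs:///data/daily_output/") // hypothetical path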
I solved the problem with the options below:
spark.sql("SET spark.hadoop.metastore.catalog.default = hive")
spark.sql("SET spark.sql.hive.convertMetastoreOrc = false")
---
Maybe set spark.hadoop.validateOutputSpecs=false?
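A minimal sketch of setting that (it disables Hadoop's "output directory already exists" check for saveAsHadoopFile / saveAsNewAPIHadoopFile, so the job itself has to make sure file names from different runs cannot collide):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hfile-append") // hypothetical app name
  .config("spark.hadoop.validateOutputSpecs", "false")
  .getOrCreate()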
From: Gautham Acharya
Sent: March 15, 2020 3:23
To: user@spark.apache.org
Subject: [PySpark] How to write HFiles as an 'append' to the same directory?
I have a process in Apache Spark that attempts to write HFiles to S3
Ur, are you comparing the number of SELECT statements with TRIM against CREATE
statements with `CHAR`?
> I looked up our usage logs (sorry I can't share this publicly) and trim
has at least four orders of magnitude higher usage than char.
We need to discuss more about what to do. This thread is what I
BTW I'm not opposing us sticking to the SQL standard (I'm in general for it). I was
merely pointing out that if we deviate from the SQL standard in any way we are
considered "wrong" or "incorrect". That argument itself is flawed when plenty
of other popular database systems also deviate from t
I looked up our usage logs (sorry I can't share this publicly) and trim has at
least four orders of magnitude higher usage than char.
On Mon, Mar 16, 2020 at 5:27 PM, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> Thank you, Stephen and Reynold.
>
> To Reynold.
>
> The way I see
Thank you, Stephen and Reynold.
To Reynold.
The way I see the following is a little different.
> CHAR is an undocumented data type without clearly defined semantics.
Let me describe it from an Apache Spark user's point of view.
Apache Spark started to claim `HiveContext` (and `hql/hiveql` function)
Hi there,
I'm kind of new around here, but I have had experience with all of the so-called
"big iron" databases such as Oracle, IBM DB2, and Microsoft SQL Server, as well
as PostgreSQL.
They all support the notion of “ANSI padding” for CHAR columns - which means
that such columns are always
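For readers following the thread, the padding question boils down to something like this (a hedged illustration; the table name is made up, a Hive-enabled SparkSession is assumed to be in scope as `spark`, and which result you get is exactly what differs between systems and, historically, between Spark code paths):

spark.sql("CREATE TABLE char_demo (c CHAR(5)) USING hive") // hypothetical table
spark.sql("INSERT INTO char_demo VALUES ('abc')")

// In an ANSI-padding database this returns ('abc  ', 5);
// without padding it returns ('abc', 3).
spark.sql("SELECT c, length(c) FROM char_demo").show()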
I haven't spent enough time thinking about it to give a strong opinion, but
this is of course very different from TRIM.
TRIM is a publicly documented function with two arguments, and we silently
swapped the two arguments. And trim has also been in common use for a long
time.
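As an aside, the ambiguity is easy to see when the two forms are written side by side (a hedged illustration, assuming a SparkSession in scope as `spark`; the SQL-standard syntax names its arguments, while the two-argument function form depends on argument order, which is what got swapped):

spark.sql("SELECT trim(BOTH 'x' FROM 'xxhelloxx')").show() // SQL-standard form, unambiguous
spark.sql("SELECT trim('xxhelloxx', 'x')").show()          // argument order differs across releases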
CHAR is an un
Hi, Reynold.
(And +Michael Armbrust)
If you think so, do you think it's okay that we change the return value
silently? Then I'm wondering why we reverted the `TRIM` functions.
> Are we sure "not padding" is "incorrect"?
Bests,
Dongjoon.
On Sun, Mar 15, 2020 at 11:15 PM Gourav Sengupta
wrote
I use the related Spark config values, but it does not work, as below (it
succeeded in Spark 2.1.1):
spark.hive.mapred.supports.subdirectories=true
spark.hive.supports.subdirectories=true
spark.mapred.input.dir.recursive=true
spark.hive.mapred.supports.subdirectories=true
And when I query, I also use the related Hive
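For what it's worth, a sketch of how these options are usually forwarded from Spark to Hadoop/Hive via the spark.hadoop. prefix (whether that restores the Spark 2.1.1 recursive-listing behavior on a newer release is exactly the open question here):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .enableHiveSupport()
  .config("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", "true")
  .config("spark.hadoop.hive.mapred.supports.subdirectories", "true")
  .getOrCreate()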