Re: [PySpark] How to write HFiles as an 'append' to the same directory?

2020-03-16 Thread Stephen Coy
I encountered a similar problem when running: ds.write().save(“s3a://some-bucket/some/path/table”); which writes the content as a bunch of Parquet files in the “folder” named “table”. I am using a Flintrock cluster with the Spark 3.0 preview, FWIW. Anyway, I just used the AWS SDK to remove it
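A minimal PySpark/boto3 sketch of that workaround, assuming a hypothetical bucket and prefix (the message itself describes calling the AWS SDK directly):

    from pyspark.sql import SparkSession
    import boto3

    spark = SparkSession.builder.getOrCreate()
    ds = spark.range(10)  # stand-in for the real Dataset

    # Hypothetical bucket and prefix matching the path in the message.
    bucket_name = "some-bucket"
    prefix = "some/path/table/"

    # Remove every object under the "folder" before re-writing it,
    # mirroring the "just used the AWS SDK to remove it" approach.
    s3 = boto3.resource("s3")
    s3.Bucket(bucket_name).objects.filter(Prefix=prefix).delete()

    # Write the Dataset as Parquet into the now-empty prefix.
    ds.write.parquet("s3a://{}/{}".format(bucket_name, prefix))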

Re: Optimising multiple hive table join and query in spark

2020-03-16 Thread Manjunath Shetty H
Thanks Georg. The batch import job frequency is different from the read job: the import job will run every 15 minutes to 1 hour, and the read/transform job will run once a day. In this case I think writing with sortWithinPartitions doesn't make any difference, as the combined data stored in HDFS will not be sorted
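A small PySpark sketch of that point, with hypothetical paths and a made-up sort column: each incremental import can sort within its own output files, but the combined directory is not globally sorted, so the daily job still needs its own sort.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical incremental batch produced by the 15-minute import job.
    batch_df = spark.read.parquet("hdfs:///staging/latest_batch")

    # Each import sorts only within its own partitions/files before appending.
    (batch_df
        .sortWithinPartitions("key")
        .write
        .mode("append")
        .parquet("hdfs:///warehouse/combined"))

    # The daily read sees many independently sorted files, so a global
    # ordering still requires an explicit sort at read time.
    daily_df = spark.read.parquet("hdfs:///warehouse/combined").sort("key")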

Re: pyspark(sparksql-v 2.4) cannot read hive table which is created

2020-03-16 Thread dominic kim
I solved the problem with the options below: spark.sql("SET spark.hadoop.metastore.catalog.default = hive") spark.sql("SET spark.sql.hive.convertMetastoreOrc = false")
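As a sketch, the same two settings can also be applied when building the session (the table name below is hypothetical):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .enableHiveSupport()
             # Same settings as the SET statements above.
             .config("spark.hadoop.metastore.catalog.default", "hive")
             .config("spark.sql.hive.convertMetastoreOrc", "false")
             .getOrCreate())

    df = spark.sql("SELECT * FROM some_db.some_orc_table")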

Re: [PySpark] How to write HFiles as an 'append' to the same directory?

2020-03-16 Thread Zhang Victor
Maybe set spark.hadoop.validateOutputSpecs=false? From: Gautham Acharya Sent: March 15, 2020, 3:23 To: user@spark.apache.org Subject: [PySpark] How to write HFiles as an 'append' to the same directory? I have a process in Apache Spark that attempts to write HFiles to S3
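A rough PySpark sketch of that suggestion; the HFile write itself is left as a commented-out placeholder because the key/value conversion depends on the poster's pipeline:

    from pyspark import SparkConf, SparkContext

    # Disable the output-spec check so saveAs*HadoopFile does not fail
    # when the target directory already exists.
    conf = (SparkConf()
            .setAppName("hfile-append-sketch")
            .set("spark.hadoop.validateOutputSpecs", "false"))
    sc = SparkContext(conf=conf)

    # Hypothetical RDD of (row key, KeyValue) pairs, already sorted as HFiles require:
    # kv_rdd.saveAsNewAPIHadoopFile(
    #     "s3a://some-bucket/hfiles/table",
    #     "org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2",
    #     keyClass="org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    #     valueClass="org.apache.hadoop.hbase.KeyValue")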

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Dongjoon Hyun
Ur, are you comparing the number of SELECT statements using TRIM with the number of CREATE statements using `CHAR`? > I looked up our usage logs (sorry I can't share this publicly) and trim has at least four orders of magnitude higher usage than char. We need to discuss more about what to do. This thread is what I

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Reynold Xin
BTW I'm not opposing us sticking to the SQL standard (I'm in general for it). I was merely pointing out that the argument that if we deviate from the SQL standard in any way we are "wrong" or "incorrect" is itself flawed, when plenty of other popular database systems also deviate away from t

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Reynold Xin
I looked up our usage logs (sorry I can't share this publicly) and trim has at least four orders of magnitude higher usage than char. On Mon, Mar 16, 2020 at 5:27 PM, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote: > Thank you, Stephen and Reynold. > To Reynold. > The way I see

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Dongjoon Hyun
Thank you, Stephen and Reynold. To Reynold: the way I see the following is a little different. > CHAR is an undocumented data type without clearly defined semantics. Let me describe it from an Apache Spark user's point of view. Apache Spark started to claim `HiveContext` (and the `hql/hiveql` function)

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Stephen Coy
Hi there, I’m kind of new around here, but I have had experience with all of the so-called “big iron” databases such as Oracle, IBM DB2 and Microsoft SQL Server, as well as PostgreSQL. They all support the notion of “ANSI padding” for CHAR columns - which means that such columns are always
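A quick Spark SQL illustration of the padding behavior, with hypothetical table and column names; whether the observed length is 5 or 3 depends on the Spark version, which is exactly what this thread is about.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    spark.sql("CREATE TABLE chars_demo (c CHAR(5)) USING parquet")
    spark.sql("INSERT INTO chars_demo VALUES ('abc')")

    # With ANSI padding the stored value is 'abc  ' and length(c) is 5;
    # without padding it stays 'abc' and length(c) is 3.
    spark.sql("SELECT c, length(c) FROM chars_demo").show()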

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Reynold Xin
I haven't spent enough time thinking about it to give a strong opinion, but this is of course very different from TRIM. TRIM is a publicly documented function with two arguments, and we silently swapped the two arguments. And trim has also been quite commonly used for a long time. CHAR is an un

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Dongjoon Hyun
Hi, Reynold. (And +Michael Armbrust) If you think so, do you think it's okay that we change the return value silently? Then I'm wondering why we reverted the `TRIM` functions? > Are we sure "not padding" is "incorrect"? Bests, Dongjoon. On Sun, Mar 15, 2020 at 11:15 PM Gourav Sengupta wrote

pyspark(sparksql-v 2.4) cannot read hive table which is created

2020-03-16 Thread dominic kim
I use the related Spark config values as below, but it does not work (it succeeded in Spark 2.1.1): spark.hive.mapred.supports.subdirectories=true spark.hive.supports.subdirectories=true spark.mapred.input.dir.recursive=true And when I query, I also use the related Hive
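For reference, a sketch of applying the same properties (names copied from the message) at session build time; whether they take effect here in Spark 2.4 is the open question of this thread.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .enableHiveSupport()
             .config("spark.hive.mapred.supports.subdirectories", "true")
             .config("spark.hive.supports.subdirectories", "true")
             .config("spark.mapred.input.dir.recursive", "true")
             .getOrCreate())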