Re: A question about radd bytes size

2019-12-01 Thread Wenchen Fan
When we talk about size in bytes, we need to specify how the data is stored. For example, if we cache the DataFrame, the size is the number of bytes in the binary format of the table cache. If we write to Hive tables, the size is the total size of the table's data files. On

Subscribe

2019-12-01 Thread CharSyam

Spark 2.4.5 release?

2019-12-01 Thread jm
Hi all, Is there any desire to prepare a 2.4.5 release? It’s been 3 months since 2.4.4 was released and there have been quite a few bug fixes since then (the k8s client upgrade is the one I'm interested in hence the question). Cheers! Jason. -- Sent from: http://apache-spark-developers-list.1

A question about radd bytes size

2019-12-01 Thread zhangliyun
Hi: I want to get the total bytes of a DataFrame using the following function, but when I insert the DataFrame into Hive, I find that the function's value differs from spark.sql.statistics.totalSize. The spark.sql.statistics.totalSize is less than the result of the following function getRDDB

[DISCUSS] Consistent relation resolution behavior in SparkSQL

2019-12-01 Thread Terry Kim
Hi all, As discussed in SPARK-29900, Spark currently has two different relation resolution behaviors: (1) look up temp view first, then table/persistent view; (2) look up table/persistent view. The first behavior is used in SELECT, INSERT, and a few commands that support temp views such as DESC

Status of Scala 2.13 support

2019-12-01 Thread Sean Owen
As you can see, I've been working on Scala 2.13 support. The umbrella is https://issues.apache.org/jira/browse/SPARK-25075, and I wanted to lay out status and strategy. This will not be done for 3.0. At the least, there are a few key dependencies (Chill, Kafka) that aren't published for 2.13, and at le

Re: [DISCUSS] PostgreSQL dialect

2019-12-01 Thread Driesprong, Fokko
+1 (non-binding) Cheers, Fokko On Thu, Nov 28, 2019 at 03:47, Dongjoon Hyun wrote: > +1 > > Bests, > Dongjoon. > > On Tue, Nov 26, 2019 at 3:52 PM Takeshi Yamamuro > wrote: > >> Yea, +1, that looks pretty reasonable to me. >> > Here I'm proposing to hold off the PostgreSQL dialect. Let's remove

Re: override collect_list

2019-12-01 Thread Driesprong, Fokko
Hi Abhnav, this sounds to me like a bad design, since it isn't scalable. Would it be possible to store all the data in a database like HBase/Bigtable/Cassandra? This would allow you to write the data from all the workers in parallel to the database. Cheers, Fokko Op wo 27 nov. 2019 om 06:58 schr