[DISCUSS] PostgreSQL dialect

2019-11-26 Thread Wenchen Fan
Hi all, Recently we started an effort to achieve feature parity between Spark and PostgreSQL: https://issues.apache.org/jira/browse/SPARK-27764 It is going very well. We've added many missing features (parser rules, built-in functions, etc.) to Spark, and also corrected several inappropriate behaviors

Re: [DISCUSS] PostgreSQL dialect

2019-11-26 Thread Sean Owen
Without knowing much about it, I have had the same question. How important is this, really, to justify the effort? One particular negative effect has been that the new postgresql tests add well over an hour to the test runs, IIRC. So, I tend to agree about drawing any reasonable line on compatibility and m

Re: [DISCUSS] PostgreSQL dialect

2019-11-26 Thread Maciej Szymkiewicz
I think it is important to distinguish between two different concepts: * Adherence to standards and their well-established implementations. * Enabling migrations from some product X to Spark. While these two problems are related, they are independent, and one can be achieved without the other

Re: [DISCUSS] PostgreSQL dialect

2019-11-26 Thread Xiao Li
+1 > One particular negative effect has been that new postgresql tests add well over an hour to tests. Adding postgresql tests is for improving the test coverage of Spark SQL. We should continue to do this by importing more test cases. The quality of Spark highly depends on the test coverage.

Re: [DISCUSS] PostgreSQL dialect

2019-11-26 Thread Gengliang Wang
+1 to the practical proposal. To me, the major concern is that the code base becomes complicated, while the PostgreSQL dialect has very limited features. I tried introducing one big flag `spark.sql.dialect` and isolating the related code in #25697, but it

Re: [DISCUSS] PostgreSQL dialect

2019-11-26 Thread Takeshi Yamamuro
Yea, +1, that looks pretty reasonable to me. > Here I'm proposing to hold off the PostgreSQL dialect. Let's remove it from the codebase before it's too late. Currently we only have 3 features under the PostgreSQL dialect: I personally think we could at least stop work on the dialect until the 3.0 release

override collect_list

2019-11-26 Thread Ranjan, Abhinav
Hi all, I want to collect some rows in a list by using Spark's collect_list function. However, the number of rows going into the list is overflowing memory. Is there any way to force the collection of rows onto the disk rather than in memory, or else instead of collecting it as a list,