saveAsNewAPIHadoopDataset must not enable speculation for parquet file?

2018-04-03 Thread cane
Now, if we use saveAsNewAPIHadoopDataset with speculation enabled, it may cause data loss. I checked the comment of this API: We should make sure our tasks are idempotent when speculation is enabled, i.e. do not use output committer that writes data directly. There is an example in https://

Re: [build system] experiencing network issues, git fetch timeouts likely

2018-04-03 Thread shane knapp
this apparently caused jenkins to get wedged overnight. i'm restarting it now. On Mon, Apr 2, 2018 at 9:12 PM, shane knapp wrote: > the problem was identified and fixed, and we should be good as of about an hour ago. > sorry for any inconvenience! > On Mon, Apr 2, 2018 at 4:15 PM, shane

Re: [build system] experiencing network issues, git fetch timeouts likely

2018-04-03 Thread shane knapp
...and we're back! On Tue, Apr 3, 2018 at 8:10 AM, shane knapp wrote: > this apparently caused jenkins to get wedged overnight. i'm restarting it now. > On Mon, Apr 2, 2018 at 9:12 PM, shane knapp wrote: >> the problem was identified and fixed, and we should be good as of about an ho

Clarify window behavior in Spark SQL

2018-04-03 Thread Li Jin
Hi Devs, I am seeing some behavior with window functions that is a bit unintuitive and would like to get some clarification. When using an aggregation function with a window, the frame boundary seems to change depending on the order of the window. Example: (1) df = spark.createDataFrame([[0, 1], [0,

Re: Hadoop 3 support

2018-04-03 Thread Steve Loughran
On 3 Apr 2018, at 01:30, Saisai Shao <sai.sai.s...@gmail.com> wrote: Yes, the main blocking issue is that the Hive version used in Spark (1.2.1.spark) doesn't support running on Hadoop 3. Hive will check the Hadoop version at runtime [1]. Besides this I think some pom changes should be enou

Re: saveAsNewAPIHadoopDataset must not enable speculation for parquet file?

2018-04-03 Thread Steve Loughran
> On 3 Apr 2018, at 11:19, cane wrote: > Now, if we use saveAsNewAPIHadoopDataset with speculation enabled, it may cause data loss. > I checked the comment of this API: > We should make sure our tasks are idempotent when speculation is enabled, i.e. do not use output committer that w

Re: Clarify window behavior in Spark SQL

2018-04-03 Thread Reynold Xin
Seems like a bug. On Tue, Apr 3, 2018 at 1:26 PM, Li Jin wrote: > Hi Devs, > I am seeing some behavior with window functions that is a bit unintuitive and would like to get some clarification. > When using an aggregation function with a window, the frame boundary seems to change depending o

Re: Clarify window behavior in Spark SQL

2018-04-03 Thread Li Jin
Here is the original code and comments: https://github.com/apache/spark/commit/b6b50efc854f298d5b3e11c05dca995a85bec962#diff-4a8f00ca33a80744965463dcc6662c75L277 Seems this is intentional, although I am not really sure why. Maybe to match other SQL systems' behavior? On Tue, Apr 3, 2018 at 5:09 P

Re: Clarify window behavior in Spark SQL

2018-04-03 Thread Reynold Xin
Do other (non-Hive) SQL systems do the same thing? On Tue, Apr 3, 2018 at 3:16 PM, Herman van Hövell tot Westerflier <her...@databricks.com> wrote: > This is something we inherited from Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics > When ORDER

Re: Welcome Zhenhua Wang as a Spark committer

2018-04-03 Thread Dilip Biswal
Congrats, Zhenhua! Very well deserved!! Regards, Dilip Biswal - Original message - From: Nick Pentreath To: "wangzhenhua (G)" Cc: Spark dev list Subject: Re: Welcome Zhenhua Wang as a Spark committer Date: Mon, Apr 2, 2018 11:13 PM Congratulations! On Tue, 3 Apr 2018 at 05:34 wan

Re: [Kubernetes] Resource requests and limits for Driver and Executor Pods

2018-04-03 Thread Kimoon Kim
> I'm also wondering if we should support running in other QoS classes - https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#qos-classes, like maybe best-effort as well, i.e. launching in a configuration that has neither the limit nor the request specified. I haven't seen
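
For readers following the QoS discussion, here is a minimal Python sketch of how Kubernetes assigns a QoS class from the requests and limits on a pod's containers. It assumes plain dicts rather than the Kubernetes client API, compares quantity strings literally (so "1" and "1000m" would not match), and the function name `qos_class` is hypothetical.

```python
def qos_class(containers):
    """Classify a pod, given a list of container specs.

    Each container is a dict with optional "requests" and "limits" dicts
    mapping a resource name ("cpu", "memory") to a quantity string.
    Simplification: quantities are compared as literal strings.
    """
    def spec(container, field):
        return container.get(field) or {}

    # BestEffort: no container sets any request or limit.
    if not any(spec(c, "requests") or spec(c, "limits") for c in containers):
        return "BestEffort"

    def is_guaranteed(container):
        requests, limits = spec(container, "requests"), spec(container, "limits")
        for resource in ("cpu", "memory"):
            if resource not in limits:
                return False
            # A missing request defaults to the limit.
            if requests.get(resource, limits[resource]) != limits[resource]:
                return False
        return True

    # Guaranteed: every container has cpu and memory limits equal to its requests.
    if all(is_guaranteed(c) for c in containers):
        return "Guaranteed"
    return "Burstable"


print(qos_class([{}]))                            # BestEffort
print(qos_class([{"requests": {"cpu": "1"}}]))    # Burstable
```

The best-effort case raised in the message corresponds to the first branch: a pod launched with neither the limit nor the request specified.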

Re: Welcome Zhenhua Wang as a Spark committer

2018-04-03 Thread Josh Goldsborough
Congrats Zhenhua! On Tue, Apr 3, 2018 at 5:38 PM, Dilip Biswal wrote: > Congrats, Zhenhua! Very well deserved!! > Regards, > Dilip Biswal > - Original message - > From: Nick Pentreath > To: "wangzhenhua (G)" > Cc: Spark dev list > Subject: Re: Welcome Zhenhua Wang as a S

Re: Welcome Zhenhua Wang as a Spark committer

2018-04-03 Thread Bhupendra Mishra
Welcome and congratulations, Zhenhua. Cheers. On Mon, Apr 2, 2018 at 10:58 AM, Wenchen Fan wrote: > Hi all, > The Spark PMC recently added Zhenhua Wang as a committer on the project. Zhenhua is the major contributor of the CBO project, and has been contributing across several areas of Spark f

Re: Clarify window behavior in Spark SQL

2018-04-03 Thread Xingbo Jiang
This is actually by design: without an `ORDER BY` clause, all rows are considered peers of the current row, which means that the frame is effectively the entire partition. This behavior follows the window syntax of PostgreSQL. You can refer to the comment by yhuai: https://github.com/apache/spa
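
The rule described in this reply can be sketched in plain Python. This is an emulation of the default-frame semantics, not Spark's implementation: it assumes the standard SQL default of `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` when `ORDER BY` is present, and the function name `window_sum` and the sample rows are hypothetical.

```python
from collections import defaultdict


def window_sum(rows, order_by=False):
    """Emulate SUM(value) OVER (PARTITION BY key [ORDER BY value]).

    rows: list of (key, value) tuples.
    Returns a list of (key, value, windowed_sum) tuples.
    """
    partitions = defaultdict(list)
    for key, value in rows:
        partitions[key].append(value)

    out = []
    for key, values in partitions.items():
        if not order_by:
            # No ORDER BY: every row is a peer of the current row,
            # so the frame is the entire partition.
            total = sum(values)
            out.extend((key, v, total) for v in values)
        else:
            # ORDER BY: the default frame runs from the start of the partition
            # through the current row and its peers (rows with an equal value).
            for v in sorted(values):
                out.append((key, v, sum(u for u in values if u <= v)))
    return out


rows = [(0, 1), (0, 2), (0, 2), (0, 3)]
print(window_sum(rows))                  # [(0, 1, 8), (0, 2, 8), (0, 2, 8), (0, 3, 8)]
print(window_sum(rows, order_by=True))   # [(0, 1, 1), (0, 2, 5), (0, 2, 5), (0, 3, 8)]
```

The two tied rows with value 2 both get 5 in the ordered case, because a `RANGE` frame ends at the last peer of the current row rather than at the row itself.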

Re: Clarify window behavior in Spark SQL

2018-04-03 Thread Reynold Xin
Ah ok. Thanks for commenting. Every day I learn something new about SQL. For others to follow, SQL Server has a good explanation of the behavior: https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql Can somebody (Li?) update the API documentation to specify the gotc

Re: Clarify window behavior in Spark SQL

2018-04-03 Thread Li Jin
Thanks all for the explanation. I am happy to update the API doc. https://issues.apache.org/jira/browse/SPARK-23861 On Tue, Apr 3, 2018 at 8:54 PM, Reynold Xin wrote: > Ah ok. Thanks for commenting. Every day I learn something new about SQL. > For others to follow, SQL Server has a good explan

Re: Clarify window behavior in Spark SQL

2018-04-03 Thread Reynold Xin
Thanks Li! On Tue, Apr 3, 2018 at 7:23 PM Li Jin wrote: > Thanks all for the explanation. I am happy to update the API doc. > https://issues.apache.org/jira/browse/SPARK-23861 > On Tue, Apr 3, 2018 at 8:54 PM, Reynold Xin wrote: >> Ah ok. Thanks for commenting. Every day I learn something