RE: [SPARK-17845] [SQL][PYTHON] More self-evident window function frame boundary API

2016-11-30 Thread assaf.mendelson
I may be mistaken, but if I remember correctly Spark behaves differently when the frame is bounded in the past and when it is not. Specifically, I seem to recall a fix which made sure that when there is no lower bound, the aggregation is done one row at a time instead of recomputing the whole range for each window…
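A minimal Python sketch (not Spark's actual implementation) of the two evaluation strategies being described, assuming rows are already sorted within their partition: a running aggregate when the lower bound is unbounded, versus recomputing each frame when it is bounded:

    def sum_unbounded_preceding(values):
        # No lower bound: carry a running total, O(n) for the partition.
        out, running = [], 0
        for v in values:
            running += v
            out.append(running)
        return out

    def sum_bounded(values, lower, upper):
        # Bounded frame: recompute each window, O(n * frame size).
        out = []
        for i in range(len(values)):
            lo = max(0, i + lower)
            hi = min(len(values), i + upper + 1)
            out.append(sum(values[lo:hi]))
        return out

    print(sum_unbounded_preceding([1, 2, 3, 4]))  # [1, 3, 6, 10]
    print(sum_bounded([1, 2, 3, 4], -1, 0))       # [1, 3, 5, 7]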

Re: [VOTE] Apache Spark 2.1.0 (RC1)

2016-11-30 Thread Koert Kuipers
After seeing Hyukjin Kwon's comment in SPARK-17583 I think it's safe to say that what I am seeing with CSV is not a bug or a regression. It was unintended and/or unreliable behavior in Spark 2.0.x. On Wed, Nov 30, 2016 at 5:56 PM, Koert Kuipers wrote: > running our in-house unit tests (that work with Spark 2.0.2)…

Re: [VOTE] Apache Spark 2.1.0 (RC1)

2016-11-30 Thread Michael Armbrust
Unfortunately the FileFormat APIs are not stable yet, so if you are using spark-avro, we are going to need to update it for this release. On Wed, Nov 30, 2016 at 2:56 PM, Koert Kuipers wrote: > running our in-house unit tests (that work with Spark 2.0.2) against Spark > 2.1.0-rc1 I see the following…

Re: [VOTE] Apache Spark 2.1.0 (RC1)

2016-11-30 Thread Koert Kuipers
Running our in-house unit tests (that work with Spark 2.0.2) against Spark 2.1.0-rc1 I see the following issues. Any test that uses Avro (spark-avro 3.1.0) has this error: java.lang.AbstractMethodError at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.(File…

Re: [SPARK-17845] [SQL][PYTHON] More self-evident window function frame boundary API

2016-11-30 Thread Reynold Xin
Yes, I'd define unboundedPreceding as -sys.maxsize, but any value less than min(-sys.maxsize, _JAVA_MIN_LONG) is considered unboundedPreceding too. We need to be careful with long overflow when transferring data over to Java. On Wed, Nov 30, 2016 at 10:04 AM, Maciej Szymkiewicz wrote: > It…
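A hedged sketch of the clamping being described; the helper name to_java_boundary is illustrative, not PySpark's API, though _JAVA_MIN_LONG mirrors Java's Long.MIN_VALUE:

    import sys

    _JAVA_MIN_LONG = -(1 << 63)   # Java Long.MIN_VALUE

    def to_java_boundary(start):
        # Fold -sys.maxsize and anything below it into a single sentinel,
        # so a Python int that would overflow a Java long is clamped
        # before crossing to the JVM instead of being sent over as-is.
        if start <= -sys.maxsize:
            return _JAVA_MIN_LONG
        return start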

Re: [SPARK-17845] [SQL][PYTHON] More self-evident window function frame boundary API

2016-11-30 Thread Maciej Szymkiewicz
It is platform specific, so theoretically it can be larger, but 2**63 - 1 is standard on 64-bit platforms and 2**31 - 1 on 32-bit platforms. I can submit a patch but I am not sure how to proceed. Personally I would set unboundedPreceding = -sys.maxsize and unboundedFollowing = sys.maxsize to keep backward compatibility…
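A quick check of the platform values mentioned above, plus the backward-compatible constants being proposed here (a sketch of the proposal, not the merged fix):

    import sys

    print(sys.maxsize == 2**63 - 1)   # True on a typical 64-bit CPython

    unboundedPreceding = -sys.maxsize
    unboundedFollowing = sys.maxsize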

Re: [SPARK-17845] [SQL][PYTHON] More self-evident window function frame boundary API

2016-11-30 Thread Reynold Xin
Ah, OK. For some reason when I did the pull request sys.maxsize was much larger than 2^63. Do you want to submit a patch to fix this? On Wed, Nov 30, 2016 at 9:48 AM, Maciej Szymkiewicz wrote: > The problem is that -(1 << 63) is -(sys.maxsize + 1) so the code which > used to work before is off by one…

Re: [SPARK-17845] [SQL][PYTHON] More self-evident window function frame boundary API

2016-11-30 Thread Maciej Szymkiewicz
The problem is that -(1 << 63) is -(sys.maxsize + 1), so the code which used to work before is off by one. On 11/30/2016 06:43 PM, Reynold Xin wrote: > Can you give a repro? Anything less than -(1 << 63) is considered > negative infinity (i.e. unbounded preceding). > > On Wed, Nov 30, 2016 at 8:27…
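The off-by-one can be shown directly on a 64-bit CPython:

    import sys

    assert -(1 << 63) == -(sys.maxsize + 1)  # exactly one below -sys.maxsize
    assert -sys.maxsize > -(1 << 63)

    # So a 2.0-era call passing -sys.maxsize is not "less than -(1 << 63)"
    # and stops being treated as unbounded preceding.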

Re: [SPARK-17845] [SQL][PYTHON] More self-evident window function frame boundary API

2016-11-30 Thread Reynold Xin
Can you give a repro? Anything less than -(1 << 63) is considered negative infinity (i.e. unbounded preceding). On Wed, Nov 30, 2016 at 8:27 AM, Maciej Szymkiewicz wrote: > Hi, > > I've been looking at SPARK-17845 and I am curious if there is any > reason to make it a breaking change. In Spark 2.0…

[SPARK-17845] [SQL][PYTHON] More self-evident window function frame boundary API

2016-11-30 Thread Maciej Szymkiewicz
Hi, I've been looking at SPARK-17845 and I am curious if there is any reason to make it a breaking change. In Spark 2.0 and below we could use: Window().partitionBy("foo").orderBy("bar").rowsBetween(-sys.maxsize, sys.maxsize). In 2.1.0 this code will silently produce incorrect results (R…
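For reference, a sketch of the 2.0-style call next to its 2.1 replacement; it assumes a SparkSession is already running and uses the Window.unboundedPreceding / Window.unboundedFollowing constants that SPARK-17845 introduces:

    import sys
    from pyspark.sql import Window

    # Spark 2.0 and below:
    w_old = (Window.partitionBy("foo").orderBy("bar")
                   .rowsBetween(-sys.maxsize, sys.maxsize))

    # Spark 2.1.0:
    w_new = (Window.partitionBy("foo").orderBy("bar")
                   .rowsBetween(Window.unboundedPreceding,
                                Window.unboundedFollowing))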

Re: [VOTE] Apache Spark 2.1.0 (RC1)

2016-11-30 Thread Maciej Szymkiewicz
Sorry :) BTW, there is another related issue here: https://issues.apache.org/jira/browse/SPARK-17756 On 11/30/2016 05:12 PM, Nicholas Chammas wrote: > > -1 (non-binding) https://issues.apache.org/jira/browse/SPARK-16589 > No matter how useless in practice, this shouldn't go to another major > release…

Re: [VOTE] Apache Spark 2.1.0 (RC1)

2016-11-30 Thread Nicholas Chammas
> -1 (non-binding) https://issues.apache.org/jira/browse/SPARK-16589 No matter how useless in practice, this shouldn't go to another major release. I agree that the issue is a major one since it relates to correctness, but since it's not a regression it technically does not merit a -1 vote on the…

Re: [VOTE] Apache Spark 2.1.0 (RC1)

2016-11-30 Thread Maciej Szymkiewicz
-1 (non-binding) https://issues.apache.org/jira/browse/SPARK-16589 No matter how useless in practice, this shouldn't go to another major release. On 11/30/2016 10:34 AM, Sean Owen wrote: > FWIW I am seeing several test failures, each more than once, but none > are necessarily repeatable. These are…

Re: Why don't we implement some adaptive learning rate methods, such as adadelta, adam?

2016-11-30 Thread WangJianfei
Yes, thank you. I know this implementation is very simple, but I want to know why Spark MLlib doesn't implement this.

Re: [VOTE] Apache Spark 2.1.0 (RC1)

2016-11-30 Thread Sean Owen
FWIW I am seeing several test failures, each more than once, but none are necessarily repeatable. These are likely just flaky tests, but I thought I'd flag them unless anyone else sees similar failures: - SELECT a.i, b.i FROM oneToTen a JOIN oneToTen b ON a.i = b.i + 1 *** FAILED *** org.apache…

Re: Why don't we implement some adaptive learning rate methods, such as adadelta, adam?

2016-11-30 Thread Nick Pentreath
Check out https://github.com/VinceShieh/Spark-AdaOptimizer On Wed, 30 Nov 2016 at 10:52 WangJianfei wrote: > Hi devs: > Normally, adaptive learning rate methods can converge faster > than standard SGD, so why don't we implement them? > See the link for more details: > http://sebastianruder.com/optimizing-gradient-descent/index.html#adadelta

Why don't we implement some adaptive learning rate methods, such as adadelta, adam?

2016-11-30 Thread WangJianfei
Hi devs: Normally, adaptive learning rate methods can converge faster than standard SGD, so why don't we implement them? See the link for more details: http://sebastianruder.com/optimizing-gradient-descent/index.html#adadelta
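For context, a minimal NumPy sketch of the Adam update being referenced (Kingma & Ba, 2014), with the commonly used default hyperparameters; this is illustrative, not MLlib code:

    import numpy as np

    def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        m = b1 * m + (1 - b1) * grad           # first-moment estimate
        v = b2 * v + (1 - b2) * grad**2        # second-moment estimate
        m_hat = m / (1 - b1**t)                # bias correction
        v_hat = v / (1 - b2**t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-coordinate step
        return w, m, v

    # Toy quadratic loss: w converges toward [1.0, -2.0, 0.5].
    target = np.array([1.0, -2.0, 0.5])
    w = np.zeros(3); m = np.zeros(3); v = np.zeros(3)
    for t in range(1, 1001):
        grad = 2 * (w - target)
        w, m, v = adam_step(w, grad, m, v, t)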