Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-21 Thread Marco Gaido
Thanks for this SPIP. I cannot comment on the docs, but just wanted to highlight one thing. On page 5 of the SPIP, where we talk about DRA, I see: "For instance, if each executor consists 4 CPUs and 2 GPUs, and each task requires 1 CPU and 1GPU, then we shall throw an error on application start bec
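
For context, the resource requests discussed in the SPIP look roughly like the sketch below. It assumes an active SparkSession and uses the configuration names eventually adopted for Spark 3.x (`spark.executor.resource.gpu.amount`, `spark.task.resource.gpu.amount`); the document under vote may use different names.

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the executor/task resource shape from the example in the SPIP.
val spark = SparkSession.builder()
  .appName("gpu-scheduling-example")
  .config("spark.executor.cores", "4")                 // 4 CPUs per executor
  .config("spark.executor.resource.gpu.amount", "2")   // 2 GPUs per executor
  .config("spark.task.cpus", "1")                      // 1 CPU per task
  .config("spark.task.resource.gpu.amount", "1")       // 1 GPU per task -> only 2 of the 4 CPU slots are usable
  .getOrCreate()
```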

Re: [Spark SQL]: looking for place operators apply on the dataset / dataframe

2019-03-28 Thread Marco Gaido
Hi, you can check your execution plan and find there which *Exec classes are used. Please note that in the case of whole-stage codegen, its children operators are executed inside the WholeStageCodegenExec. Bests, Marco On Thu, 28 Mar 2019 at 15:21, ehsan shams < ehsan.shams.r
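
A minimal sketch of the inspection being suggested, assuming an active SparkSession named `spark` (the exact operators printed depend on the query and the Spark version):

```scala
// Inspect which *Exec operators a query maps to.
val df = spark.range(100).filter("id % 2 = 0").groupBy().count()
df.explain()                        // prints the physical plan (e.g. HashAggregate, WholeStageCodegen blocks)
val physical = df.queryExecution.executedPlan
println(physical.treeString)        // operators fused by whole-stage codegen appear as children of WholeStageCodegenExec
```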

Re: Exposing JIRA issue types at GitHub PRs

2019-06-12 Thread Marco Gaido
Hi Dongjoon, Thanks for the proposal! I like the idea. Maybe we can extend it to the component too, and to some JIRA labels such as correctness, which may be worth highlighting in PRs as well. My only concern is that in many cases JIRAs are created not very carefully, so they may be incorrect at the moment of

Re: Opinions wanted: how much to match PostgreSQL semantics?

2019-07-08 Thread Marco Gaido
Hi Sean, Thanks for bringing this up. Honestly, my opinion is that Spark should be fully ANSI SQL compliant. Where ANSI SQL compliance is not an issue, I am fine with following any other DB. IMHO, we won't get 100% compliance with any DB anyway - postgres in this case (e.g. for decimal operations, we a

Re: Unmarking most things as experimental, evolving for 3.0?

2019-08-22 Thread Marco Gaido
Thanks for bringing this up Sean. +1 from me as well! Thanks, Marco On Thu, 22 Aug 2019 at 08:21, Dongjoon Hyun < dongjoon.h...@gmail.com> wrote: > +1 for unmarking old ones (made in `2.3.x` and before). > Thank you, Sean. > > Bests, > Dongjoon. > > On Wed, Aug 21, 2019 at 6:46

Re: [VOTE] Release Apache Spark 2.3.4 (RC1)

2019-08-28 Thread Marco Gaido
+1 On Wed, 28 Aug 2019 at 06:31, Wenchen Fan wrote: > +1 > > On Wed, Aug 28, 2019 at 2:43 AM DB Tsai wrote: > >> +1 >> >> Sincerely, >> >> DB Tsai >> -- >> Web: https://www.dbtsai.com >> PGP Key ID: 42E5B25A8F7A82C1 >> >> O

Re: Documentation on org.apache.spark.sql.functions backend.

2019-09-16 Thread Marco Gaido
Hi Vipul, a function is never turned into a logical plan. A function is turned into an Expression. And an Expression can be part of many Logical or Physical Plans. Hope this helps. Thanks, Marco On Mon, 16 Sep 2019 at 08:27, Vipul Rajan wrote: > I am trying to create a function t

Re: Documentation on org.apache.spark.sql.functions backend.

2019-09-16 Thread Marco Gaido
nullSafeEval, > doGenCode. Aren't there any architectural docs that could help with what is > exactly happening? Reverse engineering seems a bit daunting. > > Regards > > On Mon, Sep 16, 2019 at 1:36 PM Marco Gaido > wrote: > >> Hi Vipul, >> >> a function i
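
To make the eval/codegen split concrete, here is a hedged sketch of a custom Catalyst expression (a hypothetical `IncrementOne`, not part of Spark; exact APIs vary across Spark versions, e.g. newer releases also require `withNewChildInternal`):

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, UnaryExpression}
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
import org.apache.spark.sql.types.{DataType, IntegerType}

// Adds one to its (non-null) integer input.
// nullSafeEval handles interpreted evaluation; CodegenFallback supplies a doGenCode
// that simply falls back to the interpreted path instead of emitting Java code.
case class IncrementOne(child: Expression) extends UnaryExpression with CodegenFallback {
  override def dataType: DataType = IntegerType
  override protected def nullSafeEval(input: Any): Any = input.asInstanceOf[Int] + 1
}
```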

Re: [DISCUSS] Spark 2.5 release

2019-09-21 Thread Marco Gaido
I agree with Matei too. Thanks, Marco On Sun, 22 Sep 2019 at 03:44, Dongjoon Hyun < dongjoon.h...@gmail.com> wrote: > +1 for Matei's suggestion! > > Bests, > Dongjoon. > > On Sat, Sep 21, 2019 at 5:44 PM Matei Zaharia > wrote: > >> If the goal is to get people to try the DSv2 AP

[ML] Migrating transformers from mllib to ml

2017-11-06 Thread Marco Gaido
Hello, I saw that there are several TODOs to migrate some transformers (like HashingTF and IDF) to use only ml.Vector in order to avoid the overhead of converting them to the mllib ones and back. Is there any reason why this has not been done so far? Is it to avoid code duplication? If so, is it
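
The conversion overhead mentioned above comes from copying between the two vector classes, roughly as in this sketch (assuming Spark 2.x+ APIs):

```scala
import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
import org.apache.spark.ml.linalg.{Vector => NewVector}

// Every row pays for a copy like this when a spark.ml transformer delegates to spark.mllib.
val oldVec = OldVectors.dense(1.0, 2.0, 3.0)   // org.apache.spark.mllib.linalg.Vector
val newVec: NewVector = oldVec.asML            // copies into org.apache.spark.ml.linalg.DenseVector
```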

Re: Timeline for Spark 2.3

2017-11-10 Thread Marco Gaido
I, too, would love to have SPARK-18016. I think it would help a lot of users. 2017-11-10 5:58 GMT+01:00 Nick Pentreath : > +1 I think that’s practical > > On Fri, 10 Nov 2017 at 03:13, Erik Erlandson wrote: > >> +1 on extending the deadline. It will significantly improve the logistics >> for upstr

Re: Some Spark MLLIB tests failing due to some classes not being registered with Kryo

2017-11-11 Thread Marco Gaido
from the mllib folder. > > Thank you. > > 2017-11-11 12:36 GMT+00:00 Marco Gaido : > >> Hi Jorge, >> >> how are you running those tests? >> >> Thanks, >> Marco >> >> 2017-11-11 13:21 GMT+01:00 Jorge Sánchez : >> >&g

Decimals

2017-12-12 Thread Marco Gaido
Hi all, in recent weeks I have seen that there are a lot of problems related to decimal values (SPARK-22036, SPARK-22755, for instance). Some are related to historical choices which I am not aware of, so please excuse me if I am saying dumb things: - why are we interpreting literal constants in queries a
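
As a small illustration of the first point (how literal constants are typed today), assuming an active SparkSession named `spark`:

```scala
// A decimal literal in a query is typed as an exact DecimalType, not a double.
val schema = spark.sql("SELECT 1.5 AS x").schema
println(schema("x").dataType)   // DecimalType(2,1)
```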

Re: Decimals

2017-12-19 Thread Marco Gaido
il, Best regards. Marco 2017-12-13 9:08 GMT+01:00 Reynold Xin : > Responses inline > > On Tue, Dec 12, 2017 at 2:54 AM, Marco Gaido > wrote: > >> Hi all, >> >> I saw in these weeks that there are a lot of problems related to decimal >> values (SPARK-22036, SPA

R: Decimals

2017-12-21 Thread Marco Gaido
w (as Hermann was suggesting in the PR). Do we agree on this way? If so, is there any way to read a configuration property in the catalyst project? Thank you, Marco - Original message - From: "Xiao Li" Sent: 21/12/2017 22:46 To: "Marco Gaido" Cc: "Reynol

Re: Decimals

2017-12-22 Thread Marco Gaido
ks, Marco 2017-12-22 3:58 GMT+01:00 Marco Gaido : > Thanks for your answer Xiao. The point is that behaving like this is > against the SQL standard and also differs from Hive's behavior. Then I > would propose to add a configuration flag to switch between the two > behaviors, eit

Join Strategies

2018-01-13 Thread Marco Gaido
Hi dev, I have a question about how join strategies are defined. I see that CartesianProductExec is used only for InnerJoin, while for other kinds of joins BroadcastNestedLoopJoinExec is used. For reference: https://github.com/apache/spark/blob/cd9f49a2aed3799964976ead06080a0f7044a0c3/sql/core/src
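
A rough sketch for observing the strategy selection described above (assuming an active SparkSession named `spark`; the chosen plan can vary with table sizes and Spark version):

```scala
val left  = spark.range(0, 1000).toDF("a")
val right = spark.range(0, 1000).toDF("b")
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")      // take broadcasting out of the picture

left.crossJoin(right).explain()                                   // inner join without keys -> CartesianProduct
left.join(right, left("a") < right("b"), "left_outer").explain()  // non-equi outer join -> BroadcastNestedLoopJoin
```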

Re: Failing Spark Unit Tests

2018-01-23 Thread Marco Gaido
I tried making a change for it, but I was unable to reproduce. Anyway, I am seeing some unrelated errors in other PRs too, so there might be (or might have been) something wrong at some point. But I'd expect the test to pass locally anyway. 2018-01-23 15:23 GMT+01:00 Sean Owen : > That's odd. The

BroadcastHashJoinExec cleanup

2018-01-29 Thread Marco Gaido
Hello, looking at BroadcastHashJoinExec, it seems to me that it never destroys the broadcast variables. And I think this can cause problems like SPARK-22575. Anyway, when I tried to add a "cleanup" to destroy the variable, I saw some test failures because it was trying to access the destroyed
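
For reference, the broadcast lifecycle in question, shown with the user-level API (a sketch; BroadcastHashJoinExec manages its own broadcast internally):

```scala
val bc = spark.sparkContext.broadcast(Map(1 -> "a", 2 -> "b"))
val looked = spark.sparkContext.parallelize(1 to 10).map(i => bc.value.getOrElse(i, "?")).collect()
bc.unpersist()   // drops executor-side copies; the value can still be re-broadcast lazily
bc.destroy()     // releases all state; any further access fails with a SparkException
```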

Re: There is no space for new record

2018-02-13 Thread Marco Gaido
You can check all the versions where the fix is available on the JIRA SPARK-23376. Anyway it will be available in the upcoming 2.3.0 release. Thanks. On 13 Feb 2018 9:09 a.m., "SNEHASISH DUTTA" wrote: > Hi, > > In which version of Spark will this fix be available ? > The deployment is on EMR >

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Marco Gaido
+1 2018-02-20 12:30 GMT+01:00 Hyukjin Kwon : > +1 too > > 2018-02-20 14:41 GMT+09:00 Takuya UESHIN : > >> +1 >> >> >> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang >> wrote: >> >>> +1 >>> >>> >>> Wenchen Fan wrote on Tue, 20 Feb 2018 at 1:09 PM: >>> +1 On Tue, Feb 20, 2018 at 12:53 PM, Reynold

Re: Welcoming some new committers

2018-03-03 Thread Marco Gaido
Congratulations to you all! On 3 Mar 2018 8:30 a.m., "Liang-Chi Hsieh" wrote: > > Congrats to everyone! > > > Kazuaki Ishizaki wrote > > Congratulations to everyone! > > > > Kazuaki Ishizaki > > > > > > > > From: Takeshi Yamamuro < > > > linguin.m.s@ > > > > > > To: Spark dev list < > > >

Re: Contributing to Spark

2018-03-12 Thread Marco Gaido
Hi Roman, welcome to the community. Actually, this is not how it works. If you want to contribute to Spark you can just look for open JIRAs and submit a PR for them. JIRAs are assigned by committers once the PR gets merged. If you want, you can also comment on the JIRA that you are working o

Re: 回复: Welcome Zhenhua Wang as a Spark committer

2018-04-02 Thread Marco Gaido
Congrats Zhenhua! 2018-04-02 11:00 GMT+02:00 Saisai Shao : > Congrats, Zhenhua! > > 2018-04-02 16:57 GMT+08:00 Takeshi Yamamuro : > >> Congrats, Zhenhua! >> >> On Mon, Apr 2, 2018 at 4:13 PM, Ted Yu wrote: >> >>> Congratulations, Zhenhua >>> >>> Original message >>> From: 雨中漫步

Re: time for Apache Spark 3.0?

2018-04-05 Thread Marco Gaido
Hi all, I also agree with Mark that we should add Java 9/10 support to an eventual Spark 3.0 release, because supporting Java 9 is not a trivial task since we are using some internal APIs for the memory management which changed: either we find a solution which works on both (but I am not sure it i

Re: Accessing Hive Tables in Spark

2018-04-09 Thread Marco Gaido
Hi Tushar, It seems Spark is not able to access the metastore. It may be because you are using a Derby metastore, which is maintained locally. Please check all your configurations and that Spark has access to the hive-site.xml file with the metastore URI. Thanks, Marco On Tue, 10 Apr 2018, 08:20 T
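
A minimal sketch of the setup being suggested (the hostname is hypothetical; normally the URI comes from a hive-site.xml on Spark's classpath):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-metastore-check")
  .config("hive.metastore.uris", "thrift://metastore-host:9083")  // hypothetical host
  .enableHiveSupport()
  .getOrCreate()

spark.sql("SHOW TABLES").show()   // fails fast if the metastore is not reachable
```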

Re: Block Missing Exception while connecting Spark with HDP

2018-04-24 Thread Marco Gaido
Hi Jasbir, As a first note, if you are using a vendor distribution, please contact the vendor for any issue you are facing. This mailing list is for the community, so we focus on the community edition of Spark. Anyway, the error seems to be quite clear: your file on HDFS has a missing block

Transform plan with scope

2018-04-24 Thread Marco Gaido
Hi all, working on SPARK-24051 I realized that currently in the Optimizer, and in all the places where we transform a query plan, we lack the context information of what is in scope and what is not. Coming back to the ticket, the bug reported there is caused mainly by two rea

Re: Transform plan with scope

2018-04-24 Thread Marco Gaido
ause we have no concept of scope? It's already possible for a plan rule >> to traverse each node's subtree if it wants. >> >> On Tue, Apr 24, 2018 at 10:18 AM, Marco Gaido >> wrote: >> >>> Hi all, >>> >>> working on SPARK-24051 I re

Re: eager execution and debuggability

2018-05-08 Thread Marco Gaido
I am not sure how this is useful. For students, it is important to understand how Spark works. This can be critical in many decisions they have to make (whether and what to cache, for instance) in order to write performant Spark applications. Adding eager execution can probably help them having som
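
The laziness students are expected to understand is essentially this (a tiny sketch, assuming an active SparkSession named `spark`):

```scala
val evens = spark.range(1000000).filter("id % 2 = 0")  // transformation only: builds a plan, runs nothing
val n = evens.count()                                   // action: triggers the actual job
println(n)
```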

Re: parser error?

2018-05-14 Thread Marco Gaido
Yes Takeshi, I agree. I think we can easily fix the warning by replacing the * with +, since the two options are not required. I will test this fix and create a PR when it is ready. Thanks, Marco 2018-05-14 15:08 GMT+02:00 Takeshi Yamamuro : > IIUC, since the `lateral View*` matches an empty string

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-16 Thread Marco Gaido
I'd be against having a new feature in a minor maintenance release. I think such a release should contain only bugfixes. 2018-05-16 12:11 GMT+02:00 kant kodali : > Can this https://issues.apache.org/jira/browse/SPARK-23406 be part of > 2.3.1? > > On Tue, May 15, 2018 at 2:07 PM, Marcelo Vanzin >

Re: Time for 2.1.3

2018-06-13 Thread Marco Gaido
Hi Marcelo, thanks for bringing this up. Maybe we should consider including SPARK-24495, as it is causing some queries to return an incorrect result. What do you think? Thanks, Marco 2018-06-13 1:27 GMT+02:00 Marcelo Vanzin : > Hey all, > > There are some fixes that went into 2.1.3 recently th

Re: Time for 2.1.3

2018-06-13 Thread Marco Gaido
Yes, you're right Herman. Sorry, my bad. Thanks. Marco 2018-06-13 14:01 GMT+02:00 Herman van Hövell tot Westerflier < her...@databricks.com>: > Isn't this only a problem with Spark 2.3.x? > > On Wed, Jun 13, 2018 at 1:57 PM Marco Gaido > wrote: > >> Hi Ma

Re: Spark issue 20236 - overwrite a partitioned data srouce

2018-06-14 Thread Marco Gaido
Hi Alessandro, I'd recommend checking the UTs added in the commit which solved the issue (i.e. https://github.com/apache/spark/commit/a66fe36cee9363b01ee70e469f1c968f633c5713). You can use them to try and reproduce the issue. Thanks, Marco 2018-06-14 15:57 GMT+02:00 Alessandro Liparoti : >

Re: Time for 2.3.2?

2018-06-28 Thread Marco Gaido
+1 too, I'd also consider including SPARK-24208 if we can solve it in time... 2018-06-28 8:28 GMT+02:00 Takeshi Yamamuro : > +1, I heard some Spark users have skipped v2.3.1 because of these bugs. > > On Thu, Jun 28, 2018 at 3:09 PM Xingbo Jiang > wrote: > >> +1 >> >> Wenchen Fan wrote on Thu, 28 Jun 2018

Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread Marco Gaido
The easy answer to this is that SortMergeJoin ensures an outputOrdering, while BroadcastHashJoin doesn't, i.e. after running a BroadcastHashJoin you don't know what the order of the output is going to be, since nothing enforces it. Hope this helps. Thanks. Marco 2018-06-28 15:46 GMT+02:00 吴晓菊 : >
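
A sketch for comparing the two plans (assuming an active SparkSession named `spark`; exact output varies by version):

```scala
val left  = spark.range(0, 1000).toDF("id")
val right = spark.range(0, 1000).toDF("id")

spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")        // force a sort-merge join
left.join(right, "id").explain()    // SortMergeJoinExec: both sides sorted by the join key, so output is ordered

spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "10485760")  // allow broadcasting again
left.join(right, "id").explain()    // BroadcastHashJoinExec: streams the big side as-is, no ordering guarantee
```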

Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread Marco Gaido
t;> >>> Why we cannot use the output order of big table? >>> >>> >>> Chrysan Wu >>> Phone:+86 17717640807 >>> >>> >>> 2018-06-28 21:48 GMT+08:00 Marco Gaido : >>> >>>> The easy answer to this is that S

Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-29 Thread Marco Gaido
n Fan : >> >>> SortMergeJoin sorts its children by join key, but broadcast join does >>> not. I think the output ordering of broadcast join has nothing to do with >>> join key. >>> >>> On Thu, Jun 28, 2018 at 11:28 PM Marco Gaido >>> wr

Re: [SPARK][SQL][CORE] Running sql-tests

2018-07-03 Thread Marco Gaido
Hi Daniel, please check sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala . You should find all your answers in the comments there. Thanks, Marco 2018-07-03 19:08 GMT+02:00 dmateusp : > Hey

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

2018-07-15 Thread Marco Gaido
+1, this was indeed a problem in the past. On Sun, 15 Jul 2018, 22:56 Reynold Xin, wrote: > Makes sense. Thanks for looking into this. > > On Sun, Jul 15, 2018 at 1:51 PM Sean Owen wrote: > >> Yesterday I cleaned out old Spark releases from the mirror system -- >> we're supposed to only keep th

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

2018-07-16 Thread Marco Gaido
+1 too On Tue, 17 Jul 2018, 05:38 Hyukjin Kwon, wrote: > +1 > > On Tue, 17 Jul 2018 at 7:34 AM, Sean Owen wrote: > >> Fix is committed to branches back through 2.2.x, where this test was >> added. >> >> There is still some issue; I'm seeing that archive.apache.org is >> rate-limiting downloads and fre

Re: [VOTE] SPIP: Standardize SQL logical plans

2018-07-17 Thread Marco Gaido
+1 (non-binding) On Wed, 18 Jul 2018, 07:43 Takeshi Yamamuro, wrote: > +1 (non-binding) > > On Wed, Jul 18, 2018 at 2:41 PM John Zhuge wrote: > >> +1 (non-binding) >> >> On Tue, Jul 17, 2018 at 8:06 PM Wenchen Fan wrote: >> >>> +1 (binding). I think this is more clear to both users and develop

Re: [DISCUSS] Adaptive execution in Spark SQL

2018-07-31 Thread Marco Gaido
Hi all, I also like this idea very much and I think it may also bring other performance improvements in the future. Thanks to everybody who worked on this. I agree with targeting this feature for 3.0. Thanks everybody, Bests. Marco On Tue, 31 Jul 2018, 08:39 Wenchen Fan, wrote: > Hi Carson and Yu

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-31 Thread Marco Gaido
Hi Wenchen, I think it would be great to also consider - SPARK-24598: datatype overflow conditions give an incorrect result - as it is a correctness bug. What do you think? Thanks, Marco 2018-07-31 4:01 GMT+02:00 Wenchen Fan : > I went through t

Re: [Performance] Spark DataFrame is slow with wide data. Polynomial complexity on the number of columns is observed. Why?

2018-08-10 Thread Marco Gaido
Hi Makatun, I think your problem has been solved by https://issues.apache.org/jira/browse/SPARK-16406, which is going to be in Spark 2.4. Please try the current master, where you should see that the problem has disappeared. Thanks, Marco 2018-08-09 12:56 GMT+02:00 makatun : > Here are the images missi

Re: [VOTE] SPARK 2.3.2 (RC5)

2018-08-14 Thread Marco Gaido
-1, due to SPARK-25051. It is a regression and it is a correctness bug. In 2.3.0/2.3.1 an AnalysisException was thrown; 2.2.* works fine. I cannot reproduce the issue on current master, but I could reproduce it using the prepared 2.3.2 release. On Tue, 14 Aug 2018 at 10:04, Saisai Shao wrote

Re: sql compile failing with Zinc?

2018-08-14 Thread Marco Gaido
I am not sure; I managed to build successfully using the mvn in the distribution today. On Tue, 14 Aug 2018 at 22:02, Sean Owen wrote: > If you're running zinc directly, you can give it more memory with -J-Xmx2g > or whatever. If you're running ./build/mvn and letting it run zinc

Re: Persisting driver logs in yarn client mode (SPARK-25118)

2018-08-22 Thread Marco Gaido
I agree with Saisai. You can also configure log4j to append anywhere other than the console. Many companies have their own systems for collecting and monitoring logs, and they just customize the log4j configuration. I am not sure how needed this change would be. Thanks, Marco On Wed, 22 Aug

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-18 Thread Marco Gaido
Sorry, but I am -1 because of what was reported here: https://issues.apache.org/jira/browse/SPARK-22036?focusedCommentId=16618104&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16618104 . It is a regression, unfortunately. Although the impact is not huge and there are

***UNCHECKED*** Re: [VOTE] SPARK 2.3.2 (RC6)

2018-09-18 Thread Marco Gaido
Sorry, I am -1 because of SPARK-25454, which is a regression from 2.2. On Wed, 19 Sep 2018 at 03:45, Dongjoon Hyun < dongjoon.h...@gmail.com> wrote: > +1. > > I tested with `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive > -Phive-thriftserve` on OpenJDK(1.8.0_181)/CentOS 7.5. > > I hit t

***UNCHECKED*** Re: Re: Re: Re: [VOTE] SPARK 2.3.2 (RC6)

2018-09-19 Thread Marco Gaido
Marco, >> >> From my understanding of SPARK-25454, I don't think it is a block issue, >> it might be an corner case, so personally I don't want to block the release >> of 2.3.2 because of this issue. The release has been delayed for a long >> time. >> &g

SPIP: support decimals with negative scale in decimal operation

2018-09-21 Thread Marco Gaido
Hi all, I am writing this e-mail in order to discuss the issue reported in SPARK-25454; according to Wenchen's suggestion, I prepared a design doc for it. The problem we are facing here is that our rules for decimal operations are taken from Hive and MS SQL Server, and they explicitly
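
For readers unfamiliar with negative scales, a quick illustration using java.math.BigDecimal (which backs Spark's Decimal for non-compact values):

```scala
val d = java.math.BigDecimal.valueOf(12, -2)  // unscaled value 12, scale -2 => 12 * 10^2
println(d.toPlainString)                      // 1200
println(s"${d.precision}, ${d.scale}")        // 2, -2 -> a decimal(2, -2) in Spark terms
```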

Re: SPIP: support decimals with negative scale in decimal operation

2018-09-21 Thread Marco Gaido
ending it! The problem is clearly explained in this email, but > I would not treat it as a SPIP. It proposes a fix for a very tricky bug, > and SPIP is usually for new features. Others please correct me if I was > wrong. > > Thanks, > Wenchen > > On Fri, Sep 21, 2018 at 5:47 PM

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-10-01 Thread Marco Gaido
-1, I was able to reproduce SPARK-25538 with the provided data. On Mon, 1 Oct 2018 at 09:11, Ted Yu wrote: > +1 > > Original message > From: Denny Lee > Date: 9/30/18 10:30 PM (GMT-08:00) > To: Stavros Kontopoulos > Cc: Sean Owen , Wenchen Fan , dev < > dev@sp

Re: welcome a new batch of committers

2018-10-03 Thread Marco Gaido
Congrats to you all! On Wed, 3 Oct 2018 at 11:29, Liang-Chi Hsieh wrote: > > Congratulations to all new committers! > > > rxin wrote > > Hi all, > > > > The Apache Spark PMC has recently voted to add several new committers to > > the project, for their contributions: > > > > - Shane

Re: Random sampling in tests

2018-10-08 Thread Marco Gaido
Hi all, thanks for bringing up the topic Sean. I too agree with Reynold's idea, but in this specific case, if there is an error the timezone is part of the error message, so we know exactly which timezone caused the failure. Hence I thought that logging the seed is not necessary, as we can directly

Re: Random sampling in tests

2018-10-08 Thread Marco Gaido
t is much easier to just follow some > documentation saying "please run TEST_SEED=5 build/sbt ~ ". > > > On Mon, Oct 8, 2018 at 4:33 PM Marco Gaido wrote: > >> Hi all, >> >> thanks for bringing up the topic Sean. I agree too with Reynold's id

Re: Starting to make changes for Spark 3 -- what can we delete?

2018-10-17 Thread Marco Gaido
Hi all, I think a very big topic here would be: what do we want to do with the old mllib API? For a long time I have been told that it was going to be removed in 3.0. Is this still the plan? Thanks, Marco On Wed, 17 Oct 2018 at 03:11, Marcelo Vanzin wrote: > Might be good to take a

[DISCUSS] Support decimals with negative scale in decimal operation

2018-10-25 Thread Marco Gaido
Hi all, a bit more than one month ago, I sent a proposal for properly handling decimals with negative scales in our operations. This is a long-standing problem in our codebase, as we derived our rules from Hive and SQLServer, where negative scales are forbidden, while in Spark they are not. The dis

Re: Drop support for old Hive in Spark 3.0?

2018-10-26 Thread Marco Gaido
Hi all, one big problem with getting rid of the Hive fork is the thriftserver, which relies on the HiveServer from the Hive fork. We might migrate to an apache/hive dependency, but I am not sure this would help that much. I think a broader topic would be whether it is actually worth having a thriftserver

Re: Is spark.sql.codegen.factoryMode property really for tests only?

2018-11-16 Thread Marco Gaido
Hi Jacek, I do believe it is correct. Please check the method you mentioned (CodeGeneratorWithInterpretedFallback.createObject): the value is relevant only if Utils.isTesting. Thanks, Marco On Fri, 16 Nov 2018 at 13:28, Jacek Laskowski wrote: > Hi, > > While reviewing the chang

Jenkins down?

2018-11-19 Thread Marco Gaido
Hi all, I see that Jenkins is not starting builds for the PRs today. Is it in maintenance? Thanks, Marco

Re: Jenkins down?

2018-11-19 Thread Marco Gaido
sterday: >>> https://github.com/apache/spark/commits/master >>> That might also be a factor in whatever you're observing. >>> On Mon, Nov 19, 2018 at 10:53 AM Marco Gaido >>> wrote: >>> > >>

Self join

2018-12-11 Thread Marco Gaido
Hi all, I'd like to bring to the attention of more people a problem which has been around for a long time, i.e. self joins. Currently, we have many problems with them. This has been reported several times to the community and seems to affect many people, but as of now no solution has been accepted for it
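
The ambiguity in its simplest form looks like the sketch below (assuming an active SparkSession named `spark`; depending on the version, the condition either degenerates into a trivially true predicate or fails analysis):

```scala
val df  = spark.range(5).toDF("id")
val df2 = df.filter(df("id") > 1)
// Both column references trace back to the very same attribute, so Spark cannot tell
// which side of the join each one is meant to refer to.
val joined = df.join(df2, df("id") === df2("id"))
joined.explain()
```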

Re: Self join

2018-12-12 Thread Marco Gaido
our exact underlying business problem, but maybe a graph > solution, such as Spark Graphx meets better your requirements. Usually > self-joins are done to address some kind of graph problem (even if you > would not describe it as such) and is for these kind of problems much more > efficient.

Re: Self join

2018-12-13 Thread Marco Gaido
ely to generate discussion than referring to > PRs or a quick paragraph on the dev list, because the only people that are > looking at it now are the ones already familiar with the problem. > > rb > > On Wed, Dec 12, 2018 at 2:05 AM Marco Gaido > wrote: > >> Thank you a

Decimals with negative scale

2018-12-18 Thread Marco Gaido
Hi all, as you may remember, there was a design doc to support operations involving decimals with negative scales. After the discussion in the design doc, the related PR is now blocked because for 3.0 we have another option we can explore, i.e. forbidding negative scales. This is probably a c

Re: Decimals with negative scale

2018-12-18 Thread Marco Gaido
This is at analysis time. On Tue, 18 Dec 2018, 17:32 Reynold Xin wrote: > Is this an analysis time thing or a runtime thing? > > On Tue, Dec 18, 2018 at 7:45 AM Marco Gaido > wrote: > >> Hi all, >> >> as you may remember, there was a design doc to support operations >&

Re: Decimals with negative scale

2018-12-19 Thread Marco Gaido
port > negative scale, if it is not supported? This way, we don't need to break > backward compatibility in anyway and it becomes a strict improvement. > > > On Tue, Dec 18, 2018 at 8:43 AM, Marco Gaido > wrote: > >> This is at analysis time. >> >> On Tue

Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-07 Thread Marco Gaido
, and the result type of decimal operations, and the > behavior when writing out decimals(e.g. we can cast decimal(1, -20) to > decimal(20, 0) before writing). > > Another question is, shall we set a min scale? e.g. shall we allow > decimal(1, -1000)? > > On Thu, Oct 25, 2018 at 9

Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-07 Thread Marco Gaido
or when writing negative-scale decimals to parquet and other data > sources. The most straightforward way is to fail for this case, but maybe > we can do something better, like casting decimal(1, -20) to decimal(20, 0) > before writing. > > On Mon, Jan 7, 2019 at 9:32 PM Marco Gaido w

Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-09 Thread Marco Gaido
gt; I'm OK with it, i.e. fail the write if there are negative-scale decimals >> (we need to document it though). We can improve it later in data source v2. >> >> On Mon, Jan 7, 2019 at 10:09 PM Marco Gaido >> wrote: >> >>> In general we can say that some

Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-09 Thread Marco Gaido
d testing efforts for organizations > running Spark application becomes too large. Of course the current decimal > will be kept as it is. > > On 07.01.2019 at 15:08, Marco Gaido wrote: > > In general we can say that some datasources allow them, others fail. At > the moment,

Re: Welcome Jose Torres as a Spark committer

2019-01-29 Thread Marco Gaido
Congrats, Jose! Bests, Marco On Wed, 30 Jan 2019 at 03:17, JackyLee wrote: > Congrats, Joe! > > Best, > Jacky > > > > -- > Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ > > - > To unsubsc

Re: Self join

2019-01-30 Thread Marco Gaido
Marco. I thought you were trying to propose > a solution. > > On Thu, Dec 13, 2018 at 2:45 AM Marco Gaido > wrote: > >> Hi Ryan, >> >> My goal with this email thread is to discuss with the community if there >> are better ideas (as I was told many other people tr

Re: [VOTE] [SPARK-25994] SPIP: DataFrame-based Property Graphs, Cypher Queries, and Algorithms

2019-02-06 Thread Marco Gaido
+1 from me as well. On Wed, 6 Feb 2019 at 16:58, Yanbo Liang wrote: > +1 for the proposal > > > > On Thu, Jan 31, 2019 at 12:46 PM Mingjie Tang wrote: > >> +1, this is a very very important feature. >> >> Mingjie >> >> On Thu, Jan 31, 2019 at 12:42 AM Xiao Li wrote: >> >>> Chan

Re: I want to contribute to Apache Spark.

2019-02-13 Thread Marco Gaido
Hi, you don't need any permissions to start contributing to Spark. Just start working on the JIRAs you want and submit a PR for them. Once your PR gets merged, you will be added to the contributors in JIRA and assigned the related JIRA. For more information, please refer to the contributing page o

Re: SparkThriftServer Authorization design

2019-02-16 Thread Marco Gaido
Is this a feature request or a proposal? If it is the latter, could you please provide a design doc, so the community can look at it? Otherwise, I think one of the main issues with authorization in STS is that all the queries are actually run inside the same Spark job and hence with the same user. Th

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-02 Thread Marco Gaido
+1, a critical feature for AI/DL! On Sat, 2 Mar 2019 at 05:14, Weichen Xu < weichen...@databricks.com> wrote: > +1, nice feature! > > On Sat, Mar 2, 2019 at 6:11 AM Yinan Li wrote: > >> +1 >> >> On Fri, Mar 1, 2019 at 12:37 PM Tom Graves >> wrote: > >>> +1 for the SPIP. >>> >>>