Re: [DISCUSS] Spark 2.5 release

2019-09-21 Thread Marco Gaido
I agree with Matei too. Thanks, Marco On Sun, 22 Sep 2019 at 03:44, Dongjoon Hyun < dongjoon.h...@gmail.com> wrote: > +1 for Matei's suggestion! > > Bests, > Dongjoon. > > On Sat, Sep 21, 2019 at 5:44 PM Matei Zaharia > wrote: > >> If

Re: Documentation on org.apache.spark.sql.functions backend.

2019-09-16 Thread Marco Gaido
Hi Vipul, I am afraid I cannot help you on that. Thanks, Marco On Mon, 16 Sep 2019 at 10:44, Vipul Rajan wrote: > Hi Marco, > > That does help. Thanks for taking the time. I am confused as to how that > Expression is created. There are methods like eval,

Re: Documentation on org.apache.spark.sql.functions backend.

2019-09-16 Thread Marco Gaido
Hi Vipul, a function is never turned into a logical plan. A function is turned into an Expression. And an Expression can be part of many Logical or Physical Plans. Hope this helps. Thanks, Marco On Mon, 16 Sep 2019 at 08:27, Vipul Rajan wrote: > I am trying to create a funct
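The relationship described here can be sketched with a toy model (plain Python; `Col` and `Add` are hypothetical stand-ins for illustration, not Catalyst's actual classes):

```python
# Toy model (plain Python, not Spark's Catalyst classes): a function call
# builds an Expression tree; the same Expression can then be embedded in
# many logical or physical plans.
class Col:
    def __init__(self, name):
        self.name = name

    def eval(self, row):
        return row[self.name]

class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def eval(self, row):
        return self.left.eval(row) + self.right.eval(row)

expr = Add(Col("a"), Col("b"))      # roughly what col("a") + col("b") produces
project_plan = ("Project", [expr])  # plans contain Expressions,
filter_plan = ("Filter", expr)      # not the other way around
assert expr.eval({"a": 1, "b": 2}) == 3
```

The point of the sketch: the plan nodes hold references to the Expression, so one Expression instance can appear in any number of plans.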

Re: [VOTE] Release Apache Spark 2.3.4 (RC1)

2019-08-28 Thread Marco Gaido
+1 On Wed, 28 Aug 2019 at 06:31, Wenchen Fan wrote: > +1 > > On Wed, Aug 28, 2019 at 2:43 AM DB Tsai wrote: > >> +1 >> >> Sincerely, >> >> DB Tsai >> -- >> Web: https://www.dbtsai.com >> PGP Key ID: 42E5B25A8F7A82C1 >> >> O

Re: Unmarking most things as experimental, evolving for 3.0?

2019-08-22 Thread Marco Gaido
Thanks for bringing this up, Sean. +1 from me as well! Thanks, Marco On Thu, 22 Aug 2019 at 08:21, Dongjoon Hyun < dongjoon.h...@gmail.com> wrote: > +1 for unmarking old ones (made in `2.3.x` and before). > Thank you, Sean. > > Bests, > Dongjoon. > > O

Re: Opinions wanted: how much to match PostgreSQL semantics?

2019-07-08 Thread Marco Gaido
s, we are following SQLServer, and postgres behaviour would be very hard to meet) - so I think it is fine that PMC members decide for each feature whether it is worth supporting or not. Thanks, Marco On Mon, 8 Jul 2019, 20:09 Sean Owen, wrote: > See the particular issue / question at

Re: Exposing JIRA issue types at GitHub PRs

2019-06-12 Thread Marco Gaido
Hi Dongjoon, Thanks for the proposal! I like the idea. Maybe we can extend it to the component too, and to some JIRA labels such as correctness, which may be worth highlighting in PRs as well. My only concern is that in many cases JIRAs are created not very carefully, so they may be incorrect at the moment of

Re: [Spark SQL]: looking for place operators apply on the dataset / dataframe

2019-03-28 Thread Marco Gaido
Hi, you can check your execution plan, and from there you can find which *Exec classes are used. Please note that in the case of wholeStageCodegen, its child operators are executed inside the WholeStageCodegenExec. Bests, Marco On Thu, 28 Mar 2019 at 15:21, ehsan shams

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-21 Thread Marco Gaido
eduling on task level out of scope for the moment, right? Thanks, Marco On Thu, 21 Mar 2019 at 01:26, Xiangrui Meng wrote: > Steve, the initial work would focus on GPUs, but we will keep the > interfaces general to support other accelerators in the future. This was > m

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-02 Thread Marco Gaido
+1, a critical feature for AI/DL! On Sat, 2 Mar 2019 at 05:14, Weichen Xu < weichen...@databricks.com> wrote: > +1, nice feature! > > On Sat, Mar 2, 2019 at 6:11 AM Yinan Li wrote: > >> +1 >> >> On Fri, Mar 1, 2019 at 12:37 PM Tom Graves >> wrote: >> >>> +1 for the SPIP. >>> >>>

Re: SparkThriftServer Authorization design

2019-02-16 Thread Marco Gaido
. There are other projects trying to address those limitations. One of them, for instance, is Livy, where a Thrift server has been recently introduced in order to overcome some of STS's limitations. So you might want to look at it. Thanks, Marco On Sat, 16 Feb 2019 at 01:

Re: I want to contribute to Apache Spark.

2019-02-13 Thread Marco Gaido
on the website. Thanks, looking forward to seeing your PRs. Marco On Thu, 14 Feb 2019, 06:32 wangfei wrote: > Hi Guys, > > I want to contribute to Apache Spark. > Would you please give me the permission as a contributor? > My JIRA ID is feiwang. > hzfeiwang > hzfeiw...@163.com >

Re: [VOTE] [SPARK-25994] SPIP: DataFrame-based Property Graphs, Cypher Queries, and Algorithms

2019-02-06 Thread Marco Gaido
+1 from me as well. On Wed, 6 Feb 2019 at 16:58, Yanbo Liang wrote: > +1 for the proposal > > > > On Thu, Jan 31, 2019 at 12:46 PM Mingjie Tang wrote: > >> +1, this is a very very important feature. >> >> Mingjie >> >> On Thu, Jan 31, 2019 at 12:42 AM Xiao Li wrote: >> >>> Chan

Re: Self join

2019-01-30 Thread Marco Gaido
Hi all, this thread got a bit stuck. Hence, if there are no objections, I'd go ahead with a design doc describing the solution/workaround I mentioned before. Any concerns? Thanks, Marco On Thu, 13 Dec 2018 at 18:15, Ryan Blue wrote: > Thanks for the extra context,

Re: Welcome Jose Torres as a Spark committer

2019-01-29 Thread Marco Gaido
Congrats, Jose! Bests, Marco On Wed, 30 Jan 2019 at 03:17, JackyLee wrote: > Congrats, Joe! > > Best, > Jacky > > > > -- > Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ > > -

Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-09 Thread Marco Gaido
d testing efforts for organizations > running Spark application becomes too large. Of course the current decimal > will be kept as it is. > > On 07.01.2019 at 15:08, Marco Gaido wrote: > > In general we can say that some datasources allow them, others fail. At > the moment,

Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-09 Thread Marco Gaido
> I'm OK with it, i.e. fail the write if there are negative-scale decimals >> (we need to document it though). We can improve it later in data source v2. >> >> On Mon, Jan 7, 2019 at 10:09 PM Marco Gaido >> wrote: >> >>> In general we can say that some

Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-07 Thread Marco Gaido
or when writing negative-scale decimals to parquet and other data > sources. The most straightforward way is to fail for this case, but maybe > we can do something better, like casting decimal(1, -20) to decimal(20, 0) > before writing. > > On Mon, Jan 7, 2019 at 9:32 PM Marco Gaido w
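The cast discussed above can be sketched with Python's stdlib `decimal` module (an illustration of the arithmetic only, not Spark code; note that a 1-digit coefficient at scale -20 materializes up to 21 integral digits, so the exact target precision is a detail the thread left open):

```python
from decimal import Decimal

# decimal(precision=1, scale=-20): one significant digit times 10^20.
# In Python's model, a negative SQL scale shows up as a positive exponent.
d = Decimal("2E+20")
assert d.as_tuple().exponent == 20

# Widening to scale 0 materializes the trailing zeros: the value now needs
# precision - scale = 1 + 20 = 21 integral digits.
widened = d.quantize(Decimal(1))
assert widened == Decimal("200000000000000000000")
assert widened.as_tuple().exponent == 0
assert len(widened.as_tuple().digits) == 21
```

This is why the cast is lossless but not free: the widened type must be large enough to hold every digit the negative scale was implicitly encoding.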

Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-07 Thread Marco Gaido
set a min for it. That would break backward compatibility (for very weird use cases), so I wouldn't do that. Thanks, Marco On Mon, 7 Jan 2019 at 05:53, Wenchen Fan wrote: > I think we need to do this for backward compatibility, and according to > the discussion in t

[no subject]

2019-01-03 Thread marco rocchi
Unsubscribe me, please. Thank you so much

Re: Decimals with negative scale

2018-12-19 Thread Marco Gaido
r of cases when we deal with negative scales is anyway small (and we do not have issues with data sources which don't support them). Thanks, Marco On Tue, 18 Dec 2018 at 19:08, Reynold Xin wrote: > So why can't we just do validation to fail sources that don't sup

Re: Decimals with negative scale

2018-12-18 Thread Marco Gaido
This is at analysis time. On Tue, 18 Dec 2018, 17:32 Reynold Xin wrote: > Is this an analysis time thing or a runtime thing? > > On Tue, Dec 18, 2018 at 7:45 AM Marco Gaido > wrote: > >> Hi all, >> >> as you may remember, there was a design doc to support operations >&

Decimals with negative scale

2018-12-18 Thread Marco Gaido
ive scales can cause issues in other situations, e.g. when saving to a data source which doesn't support them. Looking forward to hearing your thoughts, Thanks. Marco

Re: Self join

2018-12-13 Thread Marco Gaido
Hi Ryan, My goal with this email thread is to discuss with the community whether there are better ideas (as I was told many other people have tried to address this). I'd consider this a brainstorming email thread. Once we have a good proposal, then we can go ahead with a SPIP. Thanks, Marco On

Re: Self join

2018-12-12 Thread Marco Gaido
irst attribute is taken from `df1` and so it has to be resolved using it, and the same for the other. But I am open to any approach to this problem, if other people have better ideas/suggestions. Thanks, Marco On Tue, 11 Dec 2018 at 18:31, Jörn Franke wrote: > I don’t know y

Self join

2018-12-11 Thread Marco Gaido
mment-393554552). So I'd like to propose discussing here the best approach for tackling this issue, which I think would be great to fix for 3.0.0, so that if we decide to introduce breaking changes in the design, we can do that. Thoughts on this? Thanks, Marco

Re: Jenkins down?

2018-11-19 Thread Marco Gaido
sterday: >>> https://github.com/apache/spark/commits/master >>> That might also be a factor in whatever you're observing. >>> On Mon, Nov 19, 2018 at 10:53 AM Marco Gaido >>> wrote: >>> > >>

Jenkins down?

2018-11-19 Thread Marco Gaido
Hi all, I see that Jenkins is not starting builds for the PRs today. Is it in maintenance? Thanks, Marco

Re: Is spark.sql.codegen.factoryMode property really for tests only?

2018-11-16 Thread Marco Gaido
Hi Jacek, I do believe it is correct. Please check the method you mentioned (CodeGeneratorWithInterpretedFallback.createObject): the value is relevant only if Utils.isTesting. Thanks, Marco On Fri, 16 Nov 2018 at 13:28, Jacek Laskowski wrote: > Hi, > > While revi
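The gist of the property being discussed — codegen is attempted first, interpretation is the fallback, and a test-only knob can force either path — can be sketched in plain Python (a toy, not the actual `CodeGeneratorWithInterpretedFallback` API; the function names here are invented for illustration):

```python
# Hypothetical stand-ins for the compiled and interpreted evaluation paths.
def compile_path():
    raise RuntimeError("codegen failed")

def interpret_path():
    return "interpreted"

def create_object(factory_mode="FALLBACK", is_testing=False):
    # The mode knob is honored only under testing, mirroring the
    # Utils.isTesting check mentioned above.
    if is_testing and factory_mode == "CODEGEN_ONLY":
        return compile_path()       # fail loudly instead of falling back
    if is_testing and factory_mode == "NO_CODEGEN":
        return interpret_path()
    try:
        return compile_path()
    except Exception:
        return interpret_path()     # production behavior: silent fallback

assert create_object() == "interpreted"
assert create_object("NO_CODEGEN", is_testing=True) == "interpreted"
```

Outside of tests the knob is a no-op by design, which is why the property is documented as tests-only.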

Re: Drop support for old Hive in Spark 3.0?

2018-10-26 Thread Marco Gaido
we could evaluate in the long term. Thanks, Marco On Fri, 26 Oct 2018 at 19:07, Sean Owen wrote: > OK let's keep this about Hive. > > Right, good point, this is really about supporting metastore versions, and > there is a good argument for retaining backwards-c

[DISCUSS] Support decimals with negative scale in decimal operation

2018-10-25 Thread Marco Gaido
PR. Thanks, Marco

Re: Starting to make changes for Spark 3 -- what can we delete?

2018-10-17 Thread Marco Gaido
Hi all, I think a very big topic here would be: what do we want to do with the old mllib API? For a long time I have been told that it was going to be removed in 3.0. Is this still the plan? Thanks, Marco On Wed, 17 Oct 2018 at 03:11, Marcelo Vanzin wrote: > Might be good to t

Re: Random sampling in tests

2018-10-08 Thread Marco Gaido
Yes, I see. It makes sense. Thanks. On Mon, 8 Oct 2018 at 16:35, Reynold Xin wrote: > Marco - the issue is to reproduce. It is much more annoying for somebody > else who might not have touched this test case to be able to reproduce the > error, just given a timezone. I

Re: Random sampling in tests

2018-10-08 Thread Marco Gaido
e can directly use the failing timezone. Thanks, Marco On Mon, 8 Oct 2018 at 16:24, Xiao Li wrote: > For this specific case, I do not think we should test all the timezones. If > this is fast, I am fine to leave it unchanged. However, this is very slow. > Thus, I even prefer t

Re: welcome a new batch of committers

2018-10-03 Thread Marco Gaido
Congrats to you all! On Wed, 3 Oct 2018 at 11:29, Liang-Chi Hsieh wrote: > > Congratulations to all new committers! > > > rxin wrote > > Hi all, > > > > The Apache Spark PMC has recently voted to add several new committers to > > the project, for their contributions: > > > > - Shane

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-10-01 Thread Marco Gaido
-1, I was able to reproduce SPARK-25538 with the provided data. On Mon, 1 Oct 2018 at 09:11, Ted Yu wrote: > +1 > > Original message > From: Denny Lee > Date: 9/30/18 10:30 PM (GMT-08:00) > To: Stavros Kontopoulos > Cc: Sean Owen , Wenchen Fan , dev < > dev@sp

Re: SPIP: support decimals with negative scale in decimal operation

2018-09-21 Thread Marco Gaido
Hi Wenchen, Thank you for the clarification. I agree that this is more a bug fix than an improvement. I apologize for the error. Please consider this as a design doc. Thanks, Marco On Fri, 21 Sep 2018 at 12:04, Wenchen Fan wrote: > Hi Marco, > > Thanks for s

SPIP: support decimals with negative scale in decimal operation

2018-09-21 Thread Marco Gaido
2450. Looking forward to hearing your feedback, Thanks. Marco

Re: [VOTE] SPARK 2.3.2 (RC6)

2018-09-19 Thread Marco Gaido
It is not new; it has been there since 2.3.0, so in that case this is not a blocker. Thanks. On Wed, 19 Sep 2018 at 09:21, Reynold Xin wrote: > We also only block if it is a new regression. > > On Wed, Sep 19, 2018 at 12:18 AM Saisai Shao > wrote: > >> Hi

Re: [VOTE] SPARK 2.3.2 (RC6)

2018-09-18 Thread Marco Gaido
Sorry, I am -1 because of SPARK-25454, which is a regression from 2.2. On Wed, 19 Sep 2018 at 03:45, Dongjoon Hyun < dongjoon.h...@gmail.com> wrote: > +1. > > I tested with `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive > -Phive-thriftserve` on OpenJDK(1.8.0_181)/CentOS 7.5. > > I hit t

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-18 Thread Marco Gaido
Sorry but I am -1 because of what was reported here: https://issues.apache.org/jira/browse/SPARK-22036?focusedCommentId=16618104&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16618104 . It is a regression unfortunately. Although the impact is not huge and there are

Re: Persisting driver logs in yarn client mode (SPARK-25118)

2018-08-22 Thread Marco Gaido
I agree with Saisai. You can also configure log4j to append anywhere else other than the console. Many companies have their own systems for collecting and monitoring logs, and they just customize the log4j configuration. I am not sure how necessary this change would be. Thanks, Marco On Wed, 22 Aug
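For reference, routing driver logs away from the console is a small log4j change — a minimal sketch assuming the classic log4j 1.x properties format Spark 2.x ships with (the file path, sizes, and appender name are placeholders):

```properties
# Send the root logger to a rolling file instead of the console.
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/spark/driver.log
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

A custom file like this can be passed to the driver JVM via `-Dlog4j.configuration=`, which is the customization route the message alludes to.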

Re: sql compile failing with Zinc?

2018-08-14 Thread Marco Gaido
I am not sure; I managed to build successfully using the mvn in the distribution today. On Tue, 14 Aug 2018 at 22:02, Sean Owen wrote: > If you're running zinc directly, you can give it more memory with -J-Xmx2g > or whatever. If you're running ./build/mvn and letting it run zinc

Re: [VOTE] SPARK 2.3.2 (RC5)

2018-08-14 Thread Marco Gaido
-1, due to SPARK-25051. It is a regression and it is a correctness bug. In 2.3.0/2.3.1 an AnalysisException was thrown; 2.2.* works fine. I cannot reproduce the issue on current master, but I was able to using the prepared 2.3.2 release. On Tue, 14 Aug 2018 at 10:04, Saisai Shao wrote

Re: [Performance] Spark DataFrame is slow with wide data. Polynomial complexity on the number of columns is observed. Why?

2018-08-10 Thread Marco Gaido
Hi Makatun, I think your problem has been solved in https://issues.apache.org/jira/browse/SPARK-16406 which is going to be in Spark 2.4. Please try on the current master, where you should see the problem disappear. Thanks, Marco 2018-08-09 12:56 GMT+02:00 makatun : > Here are the ima

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-31 Thread Marco Gaido
Hi Wenchen, I think it would be great to also consider - SPARK-24598 <https://issues.apache.org/jira/browse/SPARK-24598>: Datatype overflow conditions gives incorrect result as it is a correctness bug. What do you think? Thanks, Marco 2018-07-31 4:01 GMT+02:00 Wenchen Fan : > I wen
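The class of bug referenced here is silent two's-complement wrap-around on integral types. A small illustration in plain Python (which has arbitrary-precision ints, so the 64-bit Long semantics are emulated explicitly):

```python
def as_int64(x):
    """Wrap an arbitrary integer to signed 64-bit, like a Java/Scala Long."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >= (1 << 63) else x

LONG_MAX = 2**63 - 1

# The exact sum is LONG_MAX + 1, but the wrapped value is silently negative:
# an incorrect answer rather than an error, hence a correctness bug.
assert as_int64(LONG_MAX + 1) == -(2**63)
assert as_int64(LONG_MAX) == LONG_MAX
```

Whether such an overflow should wrap, fail, or return null is exactly the kind of behavior decision the linked JIRA is about.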

Re: [DISCUSS] Adaptive execution in Spark SQL

2018-07-31 Thread Marco Gaido
Hi all, I also like this idea very much, and I think it may also bring other performance improvements in the future. Thanks to everybody who worked on this. I agree with targeting this feature for 3.0. Thanks everybody, Bests. Marco On Tue, 31 Jul 2018, 08:39 Wenchen Fan, wrote: > Hi Carson

Re: [VOTE] SPIP: Standardize SQL logical plans

2018-07-17 Thread Marco Gaido
+1 (non-binding) On Wed, 18 Jul 2018, 07:43 Takeshi Yamamuro, wrote: > +1 (non-binding) > > On Wed, Jul 18, 2018 at 2:41 PM John Zhuge wrote: > >> +1 (non-binding) >> >> On Tue, Jul 17, 2018 at 8:06 PM Wenchen Fan wrote: >> >>> +1 (binding). I think this is more clear to both users and develop

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

2018-07-16 Thread Marco Gaido
+1 too On Tue, 17 Jul 2018, 05:38 Hyukjin Kwon, wrote: > +1 > > On Tue, 17 Jul 2018 at 7:34 AM, Sean Owen wrote: > >> Fix is committed to branches back through 2.2.x, where this test was >> added. >> >> There is still some issue; I'm seeing that archive.apache.org is >> rate-limiting downloads and fre

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

2018-07-15 Thread Marco Gaido
+1, this was indeed a problem in the past. On Sun, 15 Jul 2018, 22:56 Reynold Xin, wrote: > Makes sense. Thanks for looking into this. > > On Sun, Jul 15, 2018 at 1:51 PM Sean Owen wrote: > >> Yesterday I cleaned out old Spark releases from the mirror system -- >> we're supposed to only keep th

Re: [SPARK][SQL][CORE] Running sql-tests

2018-07-03 Thread Marco Gaido
Hi Daniel, please check sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala <https://github.com/apache/spark/pull/21568/files#diff-432455394ca50800d5de508861984ca5>. You should find all your answers in the comments there. Thanks, Marco 2018-07-03 19:08 GMT+02:00 dm

Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-29 Thread Marco Gaido
n Fan : >> >>> SortMergeJoin sorts its children by join key, but broadcast join does >>> not. I think the output ordering of broadcast join has nothing to do with >>> join key. >>> >>> On Thu, Jun 28, 2018 at 11:28 PM Marco Gaido >>> wr

Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread Marco Gaido
>> >>> Why can't we use the output order of the big table? >>> >>> >>> Chrysan Wu >>> Phone:+86 17717640807 >>> >>> >>> 2018-06-28 21:48 GMT+08:00 Marco Gaido : >>> >>>> The easy answer to this is that S

Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread Marco Gaido
The easy answer to this is that SortMergeJoin ensures an outputOrdering, while BroadcastHashJoin doesn't, i.e. after running a BroadcastHashJoin you don't know what the order of the output is going to be, since nothing enforces it. Hope this helps. Thanks. Marco 2018-06-28 15:46 GMT
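A toy sketch of the difference (plain Python, not Spark internals): a merge join sorts both inputs by the join key, so its output comes out ordered by that key; a hash join just streams the probe side, so no output ordering is enforced.

```python
def merge_join(left, right):
    # Both sides are sorted first, so matches are emitted in key order.
    left, right = sorted(left), sorted(right)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1
        elif left[i] > right[j]:
            j += 1
        else:
            out.append(left[i])
            i += 1
            j += 1
    return out

def hash_join(probe, build):
    # The probe side is streamed as-is: output order follows the probe input.
    built = set(build)
    return [k for k in probe if k in built]

probe_side, build_side = [3, 1, 2], [2, 3]
assert merge_join(probe_side, build_side) == [2, 3]   # ordered by join key
assert hash_join(probe_side, build_side) == [3, 2]    # probe order, no guarantee
```

The sort is a cost merge join pays anyway, which is why it can advertise an output ordering for free while a hash-based join cannot.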

Re: Time for 2.3.2?

2018-06-28 Thread Marco Gaido
+1 too, I'd also consider including SPARK-24208 if we can solve it in time... 2018-06-28 8:28 GMT+02:00 Takeshi Yamamuro : > +1, I heard some Spark users have skipped v2.3.1 because of these bugs. > > On Thu, Jun 28, 2018 at 3:09 PM Xingbo Jiang > wrote: > >> +1 >> >> Wenchen Fan wrote on Thu, 28 Jun 2018

Re: Spark issue 20236 - overwrite a partitioned data srouce

2018-06-14 Thread Marco Gaido
Hi Alessandro, I'd recommend you check the UTs added in the commit which solved the issue (i.e. https://github.com/apache/spark/commit/a66fe36cee9363b01ee70e469f1c968f633c5713). You can use them to try and reproduce the issue. Thanks, Marco 2018-06-14 15:57 GMT+02:00 Alessandro Lip

Re: Time for 2.1.3

2018-06-13 Thread Marco Gaido
Yes, you're right Herman. Sorry, my bad. Thanks. Marco 2018-06-13 14:01 GMT+02:00 Herman van Hövell tot Westerflier < her...@databricks.com>: > Isn't this only a problem with Spark 2.3.x? > > On Wed, Jun 13, 2018 at 1:57 PM Marco Gaido > wrote: > >> Hi Ma

Re: Time for 2.1.3

2018-06-13 Thread Marco Gaido
Hi Marcelo, thanks for bringing this up. Maybe we should consider including SPARK-24495, as it is causing some queries to return an incorrect result. What do you think? Thanks, Marco 2018-06-13 1:27 GMT+02:00 Marcelo Vanzin : > Hey all, > > There are some fixes that went into 2.1.3

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-16 Thread Marco Gaido
I'd be against having a new feature in a minor maintenance release. I think such a release should contain only bugfixes. 2018-05-16 12:11 GMT+02:00 kant kodali : > Can this https://issues.apache.org/jira/browse/SPARK-23406 be part of > 2.3.1? > > On Tue, May 15, 2018 at 2:07 PM, Marcelo Vanzin >

Re: parser error?

2018-05-14 Thread Marco Gaido
Yes Takeshi, I agree. I think we can easily fix the warning by replacing the * with +, since the two options are not required. I will test this fix and create a PR when it is ready. Thanks, Marco 2018-05-14 15:08 GMT+02:00 Takeshi Yamamuro : > IIUC, since the `lateral View*` matches an em
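The `*` vs `+` point can be illustrated with ordinary regular expressions (a stand-in for the ANTLR grammar, which isn't reproduced here): a `*` repetition can succeed by matching nothing at all, which is exactly what grammar tools warn about.

```python
import re

# '(lateral view)*' can match the empty string; '(lateral view)+' cannot.
star = re.compile(r"(?:lateral\s+view\s*)*", re.IGNORECASE)
plus = re.compile(r"(?:lateral\s+view\s*)+", re.IGNORECASE)

assert star.match("") is not None              # empty match is allowed
assert plus.match("") is None                  # at least one occurrence required
assert plus.match("LATERAL VIEW ") is not None
```

Switching the grammar rule from `*` to `+` removes the possibility of an empty match, which silences the warning without changing which non-empty inputs are accepted.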

Re: eager execution and debuggability

2018-05-08 Thread Marco Gaido
, as Ryan also mentioned, there are tools/ways to force the execution, helping in the debugging phase. So they can achieve the same result without a big effort, but with a big difference: they are aware of what is really happening, which may help them later. Thanks, Marco 2018-05-08 21:37 GMT+02

Re: Transform plan with scope

2018-04-24 Thread Marco Gaido
/decide that such an operation as I proposed might be useful, probably we can spend more time investigating the best solution (any suggestion would be very welcome). Any more thoughts on this? Thanks for your answers and your time, Marco 2018-04-24 19:47 GMT+02:00 Herman van Hövell tot

Transform plan with scope

2018-04-24 Thread Marco Gaido
be used by their parents? Or do you think it is useful in general to introduce the concept of scope (whether an attribute can be accessed by a node of a plan)? Thanks, Marco

Re: Block Missing Exception while connecting Spark with HDP

2018-04-24 Thread Marco Gaido
, Marco On Tue, 24 Apr 2018, 09:21 Sing, Jasbir, wrote: > i am using HDP2.6.3 and 2.6.4 and using the below code – > > > > 1. Creating sparkContext object > 2. Reading a text file using – rdd =sc.textFile(“hdfs:// > 192.168.142.129:8020/abc/test1.txt”); > 3. println(rdd.count);

Re: Accessing Hive Tables in Spark

2018-04-09 Thread Marco Gaido
Hi Tushar, It seems Spark is not able to access the metastore. It may be because you are using a Derby metastore, which is maintained locally. Please check all your configurations and that Spark has access to the hive-site.xml file with the metastore URI. Thanks, Marco On Tue, 10 Apr 2018, 08:20

Re: time for Apache Spark 3.0?

2018-04-05 Thread Marco Gaido
is feasible) or we have to switch between 2 implementations according to the Java version. So I'd rather avoid doing this in a non-major release. Thanks, Marco 2018-04-05 17:35 GMT+02:00 Mark Hamstra : > As with Sean, I'm not sure that this will require a new major version, but >

Re: 回复: Welcome Zhenhua Wang as a Spark committer

2018-04-02 Thread Marco Gaido
Congrats Zhenhua! 2018-04-02 11:00 GMT+02:00 Saisai Shao : > Congrats, Zhenhua! > > 2018-04-02 16:57 GMT+08:00 Takeshi Yamamuro : > >> Congrats, Zhenhua! >> >> On Mon, Apr 2, 2018 at 4:13 PM, Ted Yu wrote: >> >>> Congratulations, Zhenhua >>> >>> Original message >>> From: 雨中漫步

Re: Contributing to Spark

2018-03-12 Thread Marco Gaido
on it. Looking forward to your contributions. Best regards, Marco 2018-03-12 10:48 GMT+01:00 Roman Maier : > Hello everyone, > > I would like to contribute to Spark. > > Can somebody give me the possibility to assign issues in jira? > > > > > > Sincerely, > > Roman Maier >

Re: Welcoming some new committers

2018-03-03 Thread Marco Gaido
Congratulations to you all! On 3 Mar 2018 8:30 a.m., "Liang-Chi Hsieh" wrote: > > Congrats to everyone! > > > Kazuaki Ishizaki wrote > > Congratulations to everyone! > > > > Kazuaki Ishizaki > > > > > > > > From: Takeshi Yamamuro < > > > linguin.m.s@ > > > > > > To: Spark dev list < > > >

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Marco Gaido
+1 2018-02-20 12:30 GMT+01:00 Hyukjin Kwon : > +1 too > > 2018-02-20 14:41 GMT+09:00 Takuya UESHIN : > >> +1 >> >> >> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang >> wrote: >> >>> +1 >>> >>> >>> Wenchen Fan wrote on Tue, 20 Feb 2018 at 1:09 PM: >>> +1 On Tue, Feb 20, 2018 at 12:53 PM, Reynold

Re: There is no space for new record

2018-02-13 Thread Marco Gaido
You can check all the versions where the fix is available on the JIRA SPARK-23376. Anyway it will be available in the upcoming 2.3.0 release. Thanks. On 13 Feb 2018 9:09 a.m., "SNEHASISH DUTTA" wrote: > Hi, > > In which version of Spark will this fix be available ? > The deployment is on EMR >

BroadcastHashJoinExec cleanup

2018-01-29 Thread Marco Gaido
ch cases a BroadcastExchangeExec can be used more than once (I can't think of any)? Thanks, Marco

Re: Failing Spark Unit Tests

2018-01-23 Thread Marco Gaido
I tried doing a change for it, but I was unable to reproduce it. Anyway, I am seeing some unrelated errors in other PRs too, so there might be (or might have been) something wrong at some point. But I'd expect the test to pass locally anyway. 2018-01-23 15:23 GMT+01:00 Sean Owen : > That's odd. The

Join Strategies

2018-01-13 Thread Marco Gaido
/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L260 Could you kindly explain why this is done? It doesn't seem a great choice to me, since BroadcastNestedLoopJoinExec can fail with OOM. Thanks, Marco

Re: Decimals

2017-12-22 Thread Marco Gaido
or proposed in the PR for points 1 and 2 is the right one. For 3, I am not sure, because Hive behaves differently and now we are compliant with Hive. I would propose adhering to the SQL standard, but I am open to discussing it (indeed I'd really love some feedback from the community on it). Than

R: Decimals

2017-12-21 Thread Marco Gaido
w (as Hermann was suggesting in the PR). Do we agree on this way? If so, is there any way to read a configuration property in the catalyst project? Thank you, Marco - Original message - From: "Xiao Li" Sent: 21/12/2017 22:46 To: "Marco Gaido" Cc: "Reynol

Re: Decimals

2017-12-19 Thread Marco Gaido
ll/20023. For 3, I'd love to get your feedback in order to agree on what to do, and then I will eventually open a PR which reflects what is decided here by the community. I would really love to get your feedback either here or on the PR. Thanks for your patience and your time reading this long ema

Decimals

2017-12-12 Thread Marco Gaido
eries as Decimal and not as Double? I think it is very unlikely that a user can enter a number which is beyond Double precision. - why are we returning null in case of precision loss? Is this approach better than just giving a result which might lose some accuracy? Thanks, Marco
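The trade-off in the last question can be sketched with Python's stdlib `decimal` module (an illustration of the two policies, not Spark's implementation): trapping the `Inexact` signal models return-null-on-precision-loss, while letting the context round models the lose-some-accuracy alternative.

```python
from decimal import Decimal, Inexact, localcontext

def multiply(a, b, precision, on_loss="null"):
    # on_loss="null": return None when the exact product doesn't fit.
    # on_loss="round": round the product to the target precision instead.
    with localcontext() as ctx:
        ctx.prec = precision
        ctx.traps[Inexact] = (on_loss == "null")
        try:
            return a * b
        except Inexact:
            return None

# Exact result fits in 2 significant digits: both policies agree.
assert multiply(Decimal("1.5"), Decimal("2"), precision=2) == Decimal("3.0")

# Exact product 12.1931812224 needs more than 4 digits: null vs rounded.
assert multiply(Decimal("1.23456"), Decimal("9.87654"), precision=4) is None
assert multiply(Decimal("1.23456"), Decimal("9.87654"), precision=4,
                on_loss="round") == Decimal("12.19")
```

Both behaviors are well defined; the question raised above is which one surprises users less.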

Re: Some Spark MLLIB tests failing due to some classes not being registered with Kryo

2017-11-11 Thread Marco Gaido
Hi Jorge, then try running the tests not from the mllib folder, but from the Spark base directory. If you want to run only the tests in mllib, you can specify the project using the -pl argument of mvn. Thanks, Marco 2017-11-11 13:37 GMT+01:00 Jorge Sánchez : > Hi Marco, > > Just mvn test

unsubscribe

2017-11-10 Thread marco rocchi
unsubscribe

Re: Timeline for Spark 2.3

2017-11-10 Thread Marco Gaido
I would also love to have SPARK-18016. I think it would help a lot of users. 2017-11-10 5:58 GMT+01:00 Nick Pentreath : > +1 I think that’s practical > > On Fri, 10 Nov 2017 at 03:13, Erik Erlandson wrote: > >> +1 on extending the deadline. It will significantly improve the logistics >> for upstr

[ML] Migrating transformers from mllib to ml

2017-11-06 Thread Marco Gaido
still an issue since we are going to deprecate mllib from 2.3 (at least this is what I read in the Spark docs)? If not, I can work on this. Thanks, Marco

Inclusion of Spark on SDKMAN

2017-09-27 Thread Marco Vermeulen
Hi all, My name is Marco and I am the project lead of SDKMAN. For those of you who are not familiar with the project, it is a FLOSS SDK management tool which allows you to install and switch seamlessly between multiple versions of the same SDK when using UNIX shells. You can read more about it

GC limit exceed

2017-01-18 Thread marco rocchi
attention Marco

Re: SparkUI via proxy

2016-11-25 Thread marco rocchi
Thanks to all, I solved the problem. I'm sorry if the question was off topic; next time I'll post to Stack Overflow. Thanks a lot 2016-11-25 17:19 GMT+01:00 marco rocchi : > Thanks for the help. > I've created my ssh tunnel at port 4040, and set browser Firefox SOCK

Re: SparkUI via proxy

2016-11-25 Thread marco rocchi
.168.1.204:4040, the webUI doesn't appear. Where am I wrong? The question may be stupid, but I have never worked with Spark on a cluster :) Thanks Marco 2016-11-25 10:19 GMT+01:00 Ewan Leith : > This is more of a question for the spark user’s list, but if you look at > FoxyProxy and

SparkUI via proxy

2016-11-24 Thread marco rocchi
Hi, I'm working with Apache Spark in order to develop my master thesis. I'm new to Spark and to working with clusters. I searched the internet but didn't find a way to solve this. My problem is the following: from my pc I can access a master node of a cluster only via proxy. To connect to proxy

Dynamic Graph Handling

2016-10-24 Thread Marco
nnot provide a solution to manage this kind of graph, and searching through the internet I didn't find anything relevant. Is there a framework or a way to handle dynamic graphs? Thanks in advance Marco Rocchi

Spark MOOC - early access

2015-05-21 Thread Marco Shaw
please feel free to contact me (marco.s...@gmail.com ) with any issues, comments, or questions. Sincerely, Marco Shaw, Spark MOOC TA (This is being sent as an HTML formatted email. Some of the links have been duplicated just in case.) 1. Install VirtualBox here <https://www.virt

Re: Surprising Spark SQL benchmark

2014-11-05 Thread Marco Slot
o have a particular impact on query times. We are now looking forward to working together with Spark SQL developers, and re-running the numbers with proposed optimizations. Regards, Marco