Re: [DISCUSS] Add RocksDB StateStore

2021-02-08 Thread DB Tsai
For second >>> one, we propose (SPARK-34198) to add it as an external module to relieve the >>> dependency concern. >>> >>> Because it was pushed back previously, I'm going to raise this discussion to >>> know what people think about it now, in advance

Re: [VOTE] Release Spark 2.4.8 (RC2)

2021-04-14 Thread DB Tsai
+1 (binding) DB Tsai | ACS Spark Core |  Apple, Inc. > On Apr 14, 2021, at 10:42 AM, Wenchen Fan wrote: > > +1 (binding) > > On Thu, Apr 15, 2021 at 12:22 AM Maxim Gekk <mailto:maxim.g...@databricks.com>> wrote: > +1 (non-binding) > > On Wed, Apr

Re: [VOTE] Release Spark 2.4.8 (RC3)

2021-04-28 Thread DB Tsai
+1 (binding) > On Apr 28, 2021, at 9:26 AM, Liang-Chi Hsieh wrote: > > > Please vote on releasing the following candidate as Apache Spark version > 2.4.8. > > The vote is open until May 4th at 9AM PST and passes if a majority +1 PMC > votes are cast, with a minimum of 3 +1 votes. > > [ ] +1 Relea

Re: [DISCUSS] Rename hadoop-3.2/hadoop-2.7 profile to hadoop-3/hadoop-2?

2021-06-24 Thread DB Tsai
+1 on renaming. DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > On Jun 24, 2021, at 11:41 AM, Chao Sun wrote: > > Hi, > > As Spark master has upgraded to Hadoop-3.3.1, the current Maven profile name > hadoop-3.2 is no longer accurate, and it may confuse Spa

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-31 Thread DB Tsai
Hello Xiao, there are multiple patches in Spark 3.2 depending on parquet 1.12, so it might be easier to wait for the fix in parquet community instead of reverting all the related changes. The fix in parquet community is very trivial, and we hope that it will not take too long. Thanks. DB Tsai

Re: [VOTE] Release Spark 3.2.0 (RC7)

2021-10-11 Thread DB Tsai
+1 DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 On Mon, Oct 11, 2021 at 6:01 AM Almeida, (Ricardo) wrote: > > +1 (non-binding) > > > > Ricardo Almeida > > > > From: Xiao Li > Sent: Monday, October 11, 2021 9:09 AM > To: Yi Wu > Cc: Ho

Re: [DISCUSS] SPIP: Storage Partitioned Join for Data Source V2

2021-10-23 Thread DB Tsai
forward to it as a new feature in Spark 3.3 DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 On Fri, Oct 22, 2021 at 12:18 PM Chao Sun wrote: > > Hi, > > Ryan and I drafted a design doc to support a new type of join: storage > partitioned join which covers bucket j

Re: [VOTE] SPIP: Storage Partitioned Join for Data Source V2

2021-10-29 Thread DB Tsai
+1 DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 On Fri, Oct 29, 2021 at 11:42 AM Ryan Blue wrote: > +1 > > On Fri, Oct 29, 2021 at 11:06 AM huaxin gao > wrote: > >> +1 >> >> On Fri, Oct 29, 2021 at 10:59 AM Dongjoon Hyun >> wrote: >

Re: [VOTE] SPIP: Row-level operations in Data Source V2

2021-11-12 Thread DB Tsai
----- > >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > >> > >> > > - > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > > > > > - > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > > -- DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1

Re: Apache Spark Jenkins Infra 2022

2022-01-09 Thread DB Tsai
Thank you, Dongjoon for driving the build infra. DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > On Jan 9, 2022, at 6:38 PM, shane knapp ☠ wrote: > > > apache spark jenkins lives on! > > @dongjoon, let me know if there's anything you need

Re: [ANNOUNCE] Apache Spark 3.2.1 released

2022-01-28 Thread DB Tsai
Thank you, Huaxin for the 3.2.1 release! Sent from my iPhone > On Jan 28, 2022, at 5:45 PM, Chao Sun wrote: > >  > Thanks Huaxin for driving the release! > >> On Fri, Jan 28, 2022 at 5:37 PM Ruifeng Zheng wrote: >> It's Great! >> Congrats and thanks, huaxin! >> >> >> -- 原始邮

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-01-31 Thread DB Tsai
+1 Sent from my iPhone On Jan 31, 2023, at 4:16 PM, Yuming Wang wrote: +1. On Wed, Feb 1, 2023 at 7:42 AM kazuyuki tanimura wrote: Great! Much appreciated, Mitch! Kazu On Jan 31, 2023, at 3:07 PM, Mich Talebzadeh wrote: Thanks, Kazu. I followed that template link and indeed a

Re: [VOTE][SPIP] Lazy Materialization for Parquet Read Performance Improvement

2023-02-14 Thread DB Tsai
+1 DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > On Feb 14, 2023, at 8:29 AM, Guo Weijie wrote: > > +1 > > Yuming Wang <mailto:wgy...@gmail.com> wrote on Tue, Feb 14, 2023 at 15:58: >> +1 >> >> On Tue, Feb 14, 2023 at 11:27 AM Prem Sahoo >

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread DB Tsai
Kubernetes operator is essential for our Spark community as well. DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > On Nov 9, 2023, at 12:05 PM, Zhou Jiang wrote: > > Hi Spark community, > I'm reaching out to initiate a conversation about the possibility of

Re: [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-14 Thread DB Tsai
+1 DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > On Nov 14, 2023, at 10:14 AM, Vakaris Baškirov > wrote: > > +1 (non-binding) > > On Tue, Nov 14, 2023 at 8:03 PM Chao Sun <mailto:sunc...@apache.org>> wrote: >> +1 >> >>

[VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-19 Thread DB Tsai
es, we will typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is something which is a regression that has not been correctly targeted please ping me or a committer to help target the issue. DB Tsai | Siri Open S

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-20 Thread DB Tsai
differences between RC8 and 2.4.0 are big? If an issue is found that justifies failing RC8, we can include SPARK-27112 and SPARK-27160 in the next cut. Thus, even if we decide to cut another RC, it will be easier to test. Thanks. Sincerely, DB Tsai -- Web

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-22 Thread DB Tsai
branch-2.4, can you make another PR against branch-2.4 so we can include the ORC fix in 2.4.1? Thanks. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Wed, Mar 20, 2019 at 9:11 PM Felix Cheung wrote

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-23 Thread DB Tsai
-1 I will fail RC8, and cut another RC9 on Monday to include SPARK-27160, SPARK-27178, SPARK-27112. Please let me know if there is any critical PR that has to be back-ported into branch-2.4. Thanks. Sincerely, DB Tsai -- Web: https

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-24 Thread DB Tsai
Hello Sean, Looking at the SPARK-26961 PR, it seems ready to go. Do you think we can merge it into the 2.4 branch soon? Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Sat, Mar 23, 2019 at 12:04 PM Sean

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-25 Thread DB Tsai
I am going to cut a 2.4.1 rc9 tonight. Besides SPARK-26961 https://github.com/apache/spark/pull/24126 , is there anything critical that we have to wait for before the 2.4.1 release? Thanks! Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-25 Thread DB Tsai
RC9 was just cut. Will send out another thread once the build is finished. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Mon, Mar 25, 2019 at 5:10 PM Sean Owen wrote: > > That's all merged n

[VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-27 Thread DB Tsai
es, we will typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is something which is a regression that has not been correctly targeted please ping me or a committer to help target the issue. DB Tsai | Siri Open S

Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-28 Thread DB Tsai
+1 from myself On Thu, Mar 28, 2019 at 3:14 AM Mihaly Toth wrote: > +1 (non-binding) > > Thanks, Misi > > Sean Owen wrote (on Thu, Mar 28, 2019, 0:19): > >> +1 from me - same as last time. >> >> On Wed, Mar 27, 2019 at 1:31 PM DB Tsai wrote:

Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-31 Thread DB Tsai
This vote passes! +1: Wenchen Fan (binding) Sean Owen (binding) Mihaly Toth DB Tsai (binding) Jonatan Jäderberg Xiao Li (binding) Denny Lee Felix Cheung (binding) +0: None -1: None It's the largest RC ever; I will follow up with an official release announcement soon. Thank you all for

[ANNOUNCE] Announcing Apache Spark 2.4.1

2019-04-04 Thread DB Tsai
ithout you. DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

[ANNOUNCE] Announcing Apache Spark 2.4.1

2019-04-04 Thread DB Tsai
+user list We are happy to announce the availability of Spark 2.4.1! Apache Spark 2.4.1 is a maintenance release, based on the branch-2.4 maintenance branch of Spark. We strongly recommend all 2.4.0 users to upgrade to this stable release. In Apache Spark 2.4.1, Scala 2.12 support is GA, and it'

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-24 Thread DB Tsai
umnar processing support, I can imagine that the heavy lifting parts of ML applications (such as computing the objective functions) can be written as columnar expressions that leverage on SIMD architectures to get a good speedup. Sincerely, DB

Re: Release Apache Spark 2.4.4

2019-08-13 Thread DB Tsai
+1 On Tue, Aug 13, 2019 at 4:16 PM Dongjoon Hyun wrote: > > Hi, All. > > Spark 2.4.3 was released three months ago (8th May). > As of today (13th August), there are 112 commits (75 JIRAs) in `branch-24` > since 2.4.3. > > It would be great if we can have Spark 2.4.4. > Shall we start `2.4.4 RC1`

Re: JDK11 Support in Apache Spark

2019-08-24 Thread DB Tsai
Congratulations on the great work! Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Sat, Aug 24, 2019 at 8:11 AM Dongjoon Hyun wrote: > > Hi, All. > > Thanks to your many many contributions,

Re: [VOTE] Release Apache Spark 2.3.4 (RC1)

2019-08-27 Thread DB Tsai
+1 Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Tue, Aug 27, 2019 at 11:31 AM Dongjoon Hyun wrote: > > +1. > > I also verified SHA/GPG and tested UTs on AdoptOpenJDKu8_222/CentOS6.9 wit

[DISCUSSION]JDK11 for Apache 2.x?

2019-08-27 Thread DB Tsai
is not desired in minor release? Thanks. DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: [VOTE] Release Apache Spark 2.4.4 (RC3)

2019-08-28 Thread DB Tsai
+1 Thanks! On Wed, Aug 28, 2019 at 7:14 AM Wenchen Fan wrote: > +1, no more blocking issues that I'm aware of. > > On Wed, Aug 28, 2019 at 8:33 PM Sean Owen wrote: > >> +1 from me again. >> >> On Tue, Aug 27, 2019 at 6:06 PM Dongjoon Hyun >> wrote: >> > >> > Please vote on releasing the follo

Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-14 Thread DB Tsai
+1 Thanks. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Tue, Jan 14, 2020 at 11:08 AM Sean Owen wrote: > > Yeah it's something about the env I spun up, but I don't know what. It >

Re: [VOTE] Release Spark 2.4.6 (RC3)

2020-05-17 Thread DB Tsai
+1 as well. Thanks. On Sun, May 17, 2020 at 7:39 AM Sean Owen wrote: > +1 , same response as to the last RC. > This looks like it includes the fix discussed last time, as well as a > few more small good fixes. > > On Sat, May 16, 2020 at 12:08 AM Holden Karau > wrote: > > > > Please vote on rel

Re: [VOTE] Release Spark 2.4.6 (RC3)

2020-05-18 Thread DB Tsai
' code when upgrading from Scala 2.11 to Scala 2.12. Thanks, Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 Sincerely, DB Tsai -- Web:

Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-05-31 Thread DB Tsai
+1 (binding), thanks! On Sun, May 31, 2020 at 9:23 PM Wenchen Fan wrote: > +1 (binding), although I don't know why we jump from RC 3 to RC 8... > > On Mon, Jun 1, 2020 at 7:47 AM Holden Karau wrote: > >> Please vote on releasing the following candidate as Apache Spark >> version 2.4.6. >> >> Th

Re: [vote] Apache Spark 3.0 RC3

2020-06-08 Thread DB Tsai
+1 (binding) Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Mon, Jun 8, 2020 at 1:03 PM Dongjoon Hyun wrote: > > +1 > > Thanks, > Dongjoon. > > On Mon, Jun 8, 2020 at 6:37 AM Russ

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread DB Tsai
can still move forward using new features. After all, the reason why we are working on OSS is that we would like people to use our code, isn't it? Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Fri, Jun 12, 2020

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-13 Thread DB Tsai
. >> At the job level sure, but upgrading large jobs, possibly written in Scala >> 2.11, whole-hog as it currently stands is not a small matter. >> >> On Fri, Jun 12, 2020 at 9:40 PM DB Tsai wrote: >> +1 for a 2.x release with DSv2, JDK11, and Scala 2.11 supp

Re: [VOTE] Decommissioning SPIP

2020-07-02 Thread DB Tsai
>>>> +1 for having this feature in Spark >>> >>> >>> >>> -- >>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ >>> >>> -----

Re: [VOTE][SPARK-30602] SPIP: Support push-based shuffle to improve shuffle efficiency

2020-09-14 Thread DB Tsai
/edit >> >> Active discussions on the jira and SPIP document have settled. >> >> I will leave the vote open until Friday (the 18th September 2020), 5pm >> CST. >> >> [ ] +1: Accept the proposal as an official SPIP >> [ ] +0 >> [ ] -1: I don't think this is a good idea because ... >> >> >> Thanks, >> Mridul >> > -- Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1

Re: [VOTE] Spark 2.1.2 (RC4)

2017-10-06 Thread DB Tsai
+1 Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0x5CED8B896A6BDFA0 On Fri, Oct 6, 2017 at 7:46 AM, Felix Cheung wrote: > Thanks Nick, Hyukjin. Yes this seems to be a longer standing issue on RHEL > with resp

Re: Welcoming Tejas Patil as a Spark committer

2017-10-06 Thread DB Tsai
Congratulations! On Wed, Oct 4, 2017 at 6:55 PM, Liwei Lin wrote: > Congratulations! > > Cheers, > Liwei > > On Wed, Oct 4, 2017 at 2:27 PM, Yuval Itzchakov wrote: >> >> Congratulations and Good luck! :) >> >> >> >> -- >> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ >> >

Re: Will higher order functions in spark SQL be pushed upstream?

2017-10-10 Thread DB Tsai
, this effort is primarily tracked via SPARK-4502 (see >> https://github.com/apache/spark/pull/16578) and is currently targeted for >> 2.3. -- Sincerely, DB Tsai -- PGP Key ID: 0x5CED8B896A6BDFA0 - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: [MLLib] Logistic Regression and standadization

2018-04-24 Thread DB Tsai
the result should match R. DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc > On Apr 20, 2018, at 5:56 PM, Weichen Xu wrote: > > Right. If regularization item isn't zero, then enable/disable standardization > will get different result. > But,

Re: Scala 2.12 support

2018-06-07 Thread DB Tsai
blocker for us to move to newer version of Scala 2.12.x since the newer version of Scala 2.12.x has the same issue. In my opinion, Scala should fix the root cause and provide a stable hook for 3rd party developers to initialize their custom code. DB Tsai | Siri Open Source Technologies [not a

Re: Scala 2.12 support

2018-06-07 Thread DB Tsai
ark context Web UI available at http://192.168.1.169:4040 Spark context available as 'sc' (master = local[*], app id = local-1528180279528). Spark session available as 'spark'. scala> DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc > On Jun 7,

Re: Scala 2.12 support

2018-06-07 Thread DB Tsai
018 at 5:54 PM, Holden Karau >> wrote: >> > I agree that's a little odd, could we not add the backspace terminal >> > character? Regardless even if not, I don't think that should be a >> blocker >> > for 2.12 support especially since it doesn'

Re: Starting to make changes for Spark 3 -- what can we delete?

2018-10-17 Thread DB Tsai
I'll +1 on removing that legacy mllib code. Many users are confused about the APIs, and some of them have weird behaviors (for example, in gradient descent, the intercept is regularized, which it is not supposed to be). DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple
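
To illustrate the point about the intercept, here is a minimal Python sketch. It is not the mllib code: `fit_line` is a hypothetical helper that runs plain gradient descent on a one-feature least-squares problem with an L2 penalty, and the `regularize_intercept` flag is an assumption added only to contrast the two behaviors.

```python
def fit_line(xs, ys, reg, regularize_intercept, lr=0.1, steps=1000):
    """Fit y ~ w * x + b by gradient descent with an L2 penalty of strength reg.

    Hypothetical helper for illustration only. The penalty always applies to
    the weight w; it applies to the intercept b only when
    regularize_intercept is True.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error term plus the L2 term on w.
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n + reg * w
        grad_b = sum(residuals) / n
        if regularize_intercept:
            # Penalizing b shrinks it toward zero, which biases the fit.
            grad_b += reg * b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Constant target: the best intercept is simply the mean of ys.
xs = [-1.0, 0.0, 1.0]
ys = [5.0, 5.0, 5.0]
print(fit_line(xs, ys, reg=1.0, regularize_intercept=False))
print(fit_line(xs, ys, reg=1.0, regularize_intercept=True))
```

With this constant target, the unpenalized intercept converges to the mean of the labels, 5, whereas penalizing it with reg = 1 pulls it toward 5 / (1 + reg) = 2.5, even though nothing in the data supports that shift.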

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-29 Thread DB Tsai
selected simultaneously. https://issues.apache.org/jira/browse/SPARK-25879 If we decide to not fix it in 2.4, we should at least document it in the release note to let users know. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID

Re: Java 11 support

2018-11-06 Thread DB Tsai
Given Oracle's new 6-month release model, I think the only realistic option is to only support and test LTS JDK. I'll send out two separate emails to dev to facilitate the discussion. DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc > On Nov 6, 2018, at 9

Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-06 Thread DB Tsai
have ample time to work on bugs and issues that we may run into. What do you think? Thanks, DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Test and support only LTS JDK release?

2018-11-06 Thread DB Tsai
Given Oracle's new 6-month release model, I feel the only realistic option is to test and support only LTS JDK releases, such as JDK 11 and future LTS releases. I would like to have a discussion on this in the Spark community. Thanks, DB Tsai | Siri Open Source Technologies [not a contrib

Re: Test and support only LTS JDK release?

2018-11-06 Thread DB Tsai
OpenJDK will follow Oracle's release cycle, https://openjdk.java.net/projects/jdk/, a strict six-month model. I'm not familiar with other non-Oracle VMs and Red Hat support. DB Tsai | Siri Open Source Technologies [not a contribution]

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-06 Thread DB Tsai
agree with Sean that this can make the dependencies really complicated; hence I support dropping Scala 2.11 in Spark 3.0 directly. DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc > On Nov 6, 2018, at 11:38 AM, Sean Owen wrote: > > I think we should make Scala

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-06 Thread DB Tsai
Ideally, Spark 3 will support only Scala 2.12. DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc > On Nov 6, 2018, at 2:55 PM, Felix Cheung wrote: > > So to clarify, only scala 2.12 is supported in Spark 3? > > > From: Ryan Blu

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-08 Thread DB Tsai
later if we want to change the alternative Scala version to 2.13 and drop 2.11 if we just want to support two Scala versions at one time. Thanks. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0x5CED8B896A6BDFA0 On Wed, Nov 7,

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-16 Thread DB Tsai
Most of the time in the PR build is on running tests. How about we also add Scala 2.11 compilation for both main and test without running the tests in the PR build? Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-21 Thread DB Tsai
+1 on removing Scala 2.11 support for 3.0 given Scala 2.11 is already EOL. On Tue, Nov 20, 2018 at 2:53 PM Sean Owen wrote: > PS: pull request at https://github.com/apache/spark/pull/23098 > Not going to merge it until there's clear agreement. > > On Tue, Nov 20, 2018 at 10:16 AM Ryan Blue wro

Re: Automated formatting

2018-11-21 Thread DB Tsai
I like the idea of checking only the diff. Even I am sometimes confused about the right style in Spark since I am working on multiple projects with slightly different coding styles. On Wed, Nov 21, 2018 at 1:36 PM Sean Owen wrote: > I know the PR builder runs SBT, but I presume this would just b

Re: [VOTE] SPARK 2.2.3 (RC1)

2019-01-08 Thread DB Tsai
+1 Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0x5CED8B896A6BDFA0 On Tue, Jan 8, 2019 at 11:14 AM Dongjoon Hyun wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.2.3. >

Re: [VOTE] Release Apache Spark 2.3.3 (RC1)

2019-01-23 Thread DB Tsai
-1 Agreed with Anton that this bug will potentially corrupt the data silently. As he is ready to submit a PR, I suggest waiting to include the fix. Thanks! Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0x5CED8B896A6

Time to cut an Apache 2.4.1 release?

2019-02-11 Thread DB Tsai
Hello all, I am preparing to cut a new Apache 2.4.1 release as there are many bugs and correctness issues fixed in branch-2.4. The list of addressed issues is https://issues.apache.org/jira/browse/SPARK-26583?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.4.1%20order%20by%20updated%20D

Re: Time to cut an Apache 2.4.1 release?

2019-02-12 Thread DB Tsai
Great. I'll prepare the release for voting. Thanks! DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc > On Feb 12, 2019, at 4:11 AM, Wenchen Fan wrote: > > +1 for 2.4.1 > > On Tue, Feb 12, 2019 at 7:55 PM Hyukjin Kwon wrote: > +1 for 2.4

[VOTE] Release Apache Spark 2.4.1 (RC2)

2019-02-20 Thread DB Tsai
e will typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is something which is a regression that has not been correctly targeted please ping me or a committer to help target the issue. DB Tsai | Siri Open Source Technol

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-02-20 Thread DB Tsai
Okay. Let's fail rc2, and I'll prepare rc3 with SPARK-26859. DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc > On Feb 20, 2019, at 12:11 PM, Marcelo Vanzin > wrote: > > Just wanted to point out that > https://issues.apache.org/jira/bro

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-02-21 Thread DB Tsai
I am cutting a new rc4 with fix from Felix. Thanks. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0359BC9965359766 On Thu, Feb 21, 2019 at 8:57 AM Felix Cheung wrote: > > I merged the fix

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-07 Thread DB Tsai
spark-streaming-flume-assembly_2.11-2.4.1-tests.jar', check the logs.* I am sure my key is in the key server, and the weird thing is that it fails on different jars each time I ran the publish script. Sincerely, DB Tsai -- Web: https://www.

[VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-07 Thread DB Tsai
es, we will typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is something which is a regression that has not been correctly targeted please ping me or a committer to help target the issue. DB Tsai | Siri Open Source Technol

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-08 Thread DB Tsai
of using the same commit causing this issue. Should we create a new rc7? DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc > On Mar 8, 2019, at 10:54 AM, Marcelo Vanzin > wrote: > > I personally find it a little weird to not have the commit i

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-08 Thread DB Tsai
Okay, I see the problem. rc6 tag is not in the 2.4 branch. It's very weird. It must be overwritten by a force push. DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc > On Mar 8, 2019, at 11:39 AM, DB Tsai wrote: > > I was using `./do-release-docker

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-08 Thread DB Tsai
Since I can not find the commit of `Preparing development version 2.4.2-SNAPSHOT` after rc6 cut, it's very risky to fix the branch and do a force-push. I'll follow Marcelo's suggestion to have another rc7 cut. Thus, this vote fails. DB Tsai | Siri Open Source Technologies [not

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-10 Thread DB Tsai
We have many important fixes in the 2.4 branch that we want to release asap, and this is not a regression from Spark 2.4; as a result, 2.4.1 will not be blocked by this. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID

Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-14 Thread DB Tsai
Since rc8 was already cut without the k8s client upgrade, the build is ready for a vote, and including the k8s client upgrade in 2.4.1 implies that we will drop the old-but-not-that-old K8S versions as Sean mentioned, should we include this upgrade in 2.4.2 instead? Thanks. Sincerely, DB Tsai

Re: welcoming Xiao Li as a committer

2016-10-05 Thread DB Tsai
Congrats, Xiao! Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0x9DCC1DBD7FC7BBB2 On Wed, Oct 5, 2016 at 2:36 PM, Fred Reiss wrote: > Congratulations, Xiao! > > Fred > > > On Tuesday, October 4, 2016, Jos

Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-04-10 Thread DB Tsai
-1 I think that back-porting SPARK-20270 <https://github.com/apache/spark/pull/17577> and SPARK-18555 <https://github.com/apache/spark/pull/15994> is very important since it's a critical bug that na.fill will mess up the data in Long even when the data isn't null. Thanks.

Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-04-10 Thread DB Tsai
I backported the fix into both branch-2.1 and branch-2.0. Thanks. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0x5CED8B896A6BDFA0 On Mon, Apr 10, 2017 at 4:20 PM, Ryan Blue wrote: > DB, > > This vote already f

Re: [mllib] useFeatureScaling likes hardcode in LogisticRegressionWithLBFGS and is not comprehensive for users.

2014-11-26 Thread DB Tsai
ckage, there are different strategies to do feature scaling for linear regression and logistic regression; as a result, we don't want to make it a public API naively without addressing the different use cases. Sincerely, DB Tsai --- My B

Re: Protobuf version in mvn vs sbt

2014-12-05 Thread DB Tsai
As Marcelo said, CDH5.3 is based on hadoop 2.3, so please try ./make-distribution.sh -Pyarn -Phive -Phadoop-2.3 -Dhadoop.version=2.3.0-cdh5.1.3 -DskipTests See the detail of how to change the profile at https://spark.apache.org/docs/latest/building-with-maven.html Sincerely, DB Tsai

Re: Protobuf version in mvn vs sbt

2014-12-05 Thread DB Tsai
oh, I meant to say cdh5.1.3 used by Jakub's company is based on 2.3. You can see it from the first part of the Cloudera's version number - "2.3.0-cdh 5.1.3". Sincerely, DB Tsai --- My Blog: https://www.dbtsai
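
The version-string convention described above can be sketched in Python. The `split_cdh_version` helper below is hypothetical and only assumes the `<hadoop version>-cdh<cdh version>` layout mentioned in the message; it is not part of any Cloudera or Spark tooling.

```python
def split_cdh_version(version):
    """Split a string such as '2.3.0-cdh5.1.3' into (hadoop, cdh) parts.

    Hypothetical helper: it assumes the layout
    '<hadoop version>-cdh<cdh version>' and nothing more.
    """
    hadoop, sep, cdh = version.partition("-cdh")
    if not sep:
        raise ValueError("not a CDH-style version string: %r" % version)
    # strip() tolerates a stray space after 'cdh', as in the quoted message.
    return hadoop.strip(), cdh.strip()


hadoop_version, cdh_version = split_cdh_version("2.3.0-cdh5.1.3")
print(hadoop_version, cdh_version)
```

The first element is the upstream Hadoop version, which is what determines the `-Phadoop-...` profile to pick when building.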

CrossValidator API in new spark.ml package

2014-12-12 Thread DB Tsai
Hi Xiangrui, It seems that it's stateless, so it will be hard to implement a regularization path. Any suggestion to extend it? Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/d

Re: CrossValidator API in new spark.ml package

2014-12-12 Thread DB Tsai
Okay, I got it. In Estimator, fit(dataset: SchemaRDD, paramMaps: Array[ParamMap]): Seq[M] can be overridden to implement a regularization path. Correct me if I'm wrong. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn:

Re: LinearRegressionWithSGD accuracy

2015-01-17 Thread DB Tsai
I'm working on LinearRegressionWithElasticNet using OWLQN now. This will do the data standardization internally so it's transparent to users. With OWLQN, you don't have to manually choose stepSize. Will send out PR soon next week. Sinc

Re: Maximum size of vector that reduce can handle

2015-01-23 Thread DB Tsai
ad, loss), (label, features)) => val l = localGradient.compute( features, label, bcW.value, grad) (grad, loss + l) }, combOp = (c1, c2) => (c1, c2) match { case ((grad1, loss1), (grad2, loss2)) => axpy(1.0, grad2, grad1) (grad1, loss1 + loss2) }) Si

Re: Maximum size of vector that reduce can handle

2015-01-23 Thread DB Tsai
a are small. By default, depth 2 is used, so if you have many partitions of large vectors, this may still cause issues. You can increase the depth so that the number of partitions in the final reduce on the driver is very small. Sincerely, DB
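
The effect of the depth setting can be sketched with a small pure-Python model. This is a simplified illustration and not Spark's implementation: the `tree_aggregate` function, its fan-in rule (the depth-th root of the partition count, at least 2), and the `add_pair` combiner are all assumptions chosen to show why a larger depth leaves fewer partial results for the final reduce.

```python
import math


def add_pair(a, b):
    """Combine two (gradient, loss) pairs without mutating either one."""
    return ([x + y for x, y in zip(a[0], b[0])], a[1] + b[1])


def tree_aggregate(partitions, zero, seq_op, comb_op, depth=2):
    """Toy model of a multi-level aggregation (assumed, not Spark's code).

    Returns (result, final_fan_in), where final_fan_in is the number of
    partial results merged in the last step (the "driver" in this model).
    """
    # Step 1: fold each partition locally into one partial result.
    partials = []
    for part in partitions:
        acc = zero()
        for item in part:
            acc = seq_op(acc, item)
        partials.append(acc)

    # Assumed fan-in rule: depth-th root of the partition count, at least 2.
    scale = max(math.ceil(len(partials) ** (1.0 / depth)), 2)

    # Step 2: merge partials in groups of `scale` until few enough remain.
    while len(partials) > scale:
        merged = []
        for i in range(0, len(partials), scale):
            group = partials[i:i + scale]
            acc = group[0]
            for p in group[1:]:
                acc = comb_op(acc, p)
            merged.append(acc)
        partials = merged

    # Step 3: the final reduce sees only the remaining partials.
    final_fan_in = len(partials)
    result = zero()
    for p in partials:
        result = comb_op(result, p)
    return result, final_fan_in


# 64 partitions, each holding three (gradient, loss) records.
parts = [[([1, 2], 1)] * 3 for _ in range(64)]
for d in (1, 2):
    total, fan_in = tree_aggregate(parts, lambda: ([0, 0], 0), add_pair, add_pair, depth=d)
    print(d, fan_in, total)
```

In this model, depth 1 sends all 64 partial vectors to the final step, while depth 2 sends only 8; the aggregated result is identical either way.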

Re: LinearRegressionWithSGD accuracy

2015-01-28 Thread DB Tsai
Hi Robin, You can try this PR out. It has built-in feature scaling and ElasticNet regularization (L1/L2 mix). This implementation converges stably to the same model as R's glmnet package. https://github.com/apache/spark/pull/4259 Sincerely, DB

Re: [mllib] Is there any bugs to divide a Breeze sparse vectors at Spark v1.3.0-rc3?

2015-03-15 Thread DB Tsai
It's a bug on Breeze's side. Once David fixes it and publishes it to Maven, we can upgrade to Breeze 0.11.2. Please file a JIRA ticket for this issue. Thanks. Sincerely, DB Tsai --- Blog: https://www.dbtsai.com On Sun, Mar 15, 201

Re: LogisticGradient Design

2015-03-25 Thread DB Tsai
ataset to avoid the second cache. In this case, the code will be more complicated, so I will split the code into two paths. Will be done in another PR. Sincerely, DB Tsai --- Blog: https://www.dbtsai.com On Wed, Mar 25, 2015 at 11:57 AM, Josep

Re: Regularization in MLlib

2015-04-07 Thread DB Tsai
ckage. Sincerely, DB Tsai --- Blog: https://www.dbtsai.com On Tue, Apr 7, 2015 at 3:03 PM, Ulanov, Alexander wrote: > Hi, > > Could anyone elaborate on the regularization in Spark? I've found that L1 and > L2 are implemented wi

Re: Regularization in MLlib

2015-04-14 Thread DB Tsai
Hi Theodore, I'm currently working on elastic-net regression in the ML framework, and I decided not to have any extra layer of abstraction for now but to focus on accuracy and performance. We may come up with a proper solution later. Any idea is welcome. Sincerely, DB

Re: spark packages

2015-05-23 Thread DB Tsai
I thought LGPL is okay but GPL is not okay for Apache project. On Saturday, May 23, 2015, Patrick Wendell wrote: > Yes - spark packages can include non ASF licenses. > > On Sat, May 23, 2015 at 6:16 PM, Debasish Das > wrote: > > Hi, > > > > Is it possible to add GPL/LGPL code on spark packages

Re: MLlib: Anybody working on hierarchical topic models like HLDA?

2015-06-03 Thread DB Tsai
Is your HDP implementation based on distributed gibbs sampling? Thanks. Sincerely, DB Tsai --- Blog: https://www.dbtsai.com On Wed, Jun 3, 2015 at 8:13 PM, Yang, Yuhao wrote: > Hi Lorenz, > > > > I’m trying to build a proto

Re: What is the difference between ml.classification.LogisticRegression and mllib.classification.LogisticRegressionWithLBFGS

2015-10-12 Thread DB Tsai
try to refactor those code to share more.) Sincerely, DB Tsai -- Blog: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D <https://pgp.mit.edu/pks/lookup?search=0x59DF55B8AF08DF8D> On Mon, Oct 12, 2015 at 1:24 AM, YiZhi Liu wrote: >

Re: Ability to offer initial coefficients in ml.LogisticRegression

2015-10-22 Thread DB Tsai
There is a JIRA for this. I know Holden is interested in this. On Thursday, October 22, 2015, YiZhi Liu wrote: > Would someone mind giving some hint? > > 2015-10-20 15:34 GMT+08:00 YiZhi Liu >: > > Hi all, > > > > I noticed that in ml.classification.LogisticRegression, users are not > > allowed

Re: Spark Implementation of XGBoost

2015-10-26 Thread DB Tsai
Interesting. For feature sub-sampling, is it per-node or per-tree? Do you think you can implement generic GBM and have it merged as part of Spark codebase? Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Mon

Re: Spark Implementation of XGBoost

2015-10-26 Thread DB Tsai
Also, does it support categorical features? Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Mon, Oct 26, 2015 at 4:06 PM, DB Tsai wrote: > Interesting. For feature sub-sampling, is it per-node or per-tree?

Re: Spark Implementation of XGBoost

2015-10-27 Thread DB Tsai
tting more than shrinkage). Thanks. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Mon, Oct 26, 2015 at 8:37 PM, Meihua Wu wrote: > Hi DB Tsai, > > Thank you very much for your interest and comment.

Re: [Spark MLlib] about linear regression issue

2015-11-01 Thread DB Tsai
n to our current linear regression, but currently, there is no open source implementation in Spark. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Sun, Nov 1, 2015 at 9:22 AM, Zhiliang Zhu wrote: > Dear All, >

Re: Ability to offer initial coefficients in ml.LogisticRegression

2015-11-02 Thread DB Tsai
Hi YiZhi, Sure. I think Holden already created a JIRA for this. Please coordinate with Holden, and keep me in the loop. Thanks. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Mon, Nov 2, 2015 at 7:32 AM
