Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-09 Thread Jungtaek Lim
+1 (non-binding) * verified signatures * built custom distribution with enabling kubernetes & hadoop-cloud profile * built custom docker image from dist * ran applications "rate to kafka" & "kafka to kafka" on k8s cluster (local k3s) Thanks for driving the release

[DISCUSS] assignee practice on committers+ (possible issue on preemption)

2021-02-14 Thread Jungtaek Lim
doc proving that they really spent non-trivial effort already. My point is preempting JIRA issues with only sketched ideas or even just rationalizations.) Would like to hear everyone's voices. Thanks, Jungtaek Lim (HeartSaVioR) ps. better yet, probably it's better then to restrict something explicitly if we sincerely respect the underlying culture on the statement "In case several people contributed, prefer to assign to the more ‘junior’, non-committer contributor".

Re: [DISCUSS] assignee practice on committers+ (possible issue on preemption)

2021-02-15 Thread Jungtaek Lim
> etc. > It makes me think that the actual issue by setting an assignee happens > rarely, and it is an issue to several specific cases that would need a look > case-by-case. > Were there specific cases that made you concerned? > > > On Mon, Feb 15, 2021 at 8:58 AM, Jungtaek Lim

Re: [DISCUSS] assignee practice on committers+ (possible issue on preemption)

2021-02-18 Thread Jungtaek Lim
and you have a proposal, nothing wrong with just going ahead with > a proposal. There may be no disagreement. It might result in the > other person joining your PR. As I say, not sure if there's a deeper issue > than that if even this hasn't been tried? > > On Mon, Feb 15

Re: Please use Jekyll via "bundle exec" from now on

2021-02-18 Thread Jungtaek Lim
Nice fix. Thanks! On Thu, Feb 18, 2021 at 7:13 PM Hyukjin Kwon wrote: > Thanks Attila for fixing and sharing this. > > On Thu, Feb 18, 2021 at 6:17 PM, Attila Zsolt Piros > wrote: > >> Hello everybody, >> >> To pin the exact same version of Jekyll across all the contributors, Ruby >> Bundler is introd

Re: [VOTE] Release Spark 3.1.1 (RC3)

2021-02-22 Thread Jungtaek Lim
+1 (non-binding) Verified signatures. Only a few commits added after RC2 which don't seem to change the SS behavior, so I'd carry over my +1 from RC2. On Mon, Feb 22, 2021 at 3:57 PM Hyukjin Kwon wrote: > Starting with my +1 (binding). > > On Mon, Feb 22, 2021 at 3:56 PM, Hyukjin Kwon wrote: > >> Plea

Re: Please take a look at the draft of the Spark 3.1.1 release notes

2021-02-27 Thread Jungtaek Lim
Thanks Hyukjin! I've only looked into the SS part, and added a comment. Otherwise it looks great! On Sat, Feb 27, 2021 at 7:12 PM Dongjoon Hyun wrote: > Thank you for sharing, Hyukjin! > > Dongjoon. > > On Sat, Feb 27, 2021 at 12:36 AM Hyukjin Kwon wrote: > >> Hi all, >> >> I am preparing to pu

Re: [ANNOUNCE] Announcing Apache Spark 3.1.1

2021-03-03 Thread Jungtaek Lim
Thanks Hyukjin for driving the huge release, and thanks everyone for contributing to the release! On Wed, Mar 3, 2021 at 6:54 PM angers zhu wrote: > Great work, Hyukjin ! > > Bests, > Angers > > On Wed, Mar 3, 2021 at 5:02 PM, Wenchen Fan wrote: > >> Great work and congrats! >> >> On Wed, Mar 3, 2021 at 3:51 PM K

Re: using accumulators in (MicroBatch) InputPartitionReader

2021-03-07 Thread Jungtaek Lim
I'm not sure about the accumulator approach; one possible approach which might work (DISCLAIMER: a random thought) would be employing an RPC endpoint on the driver side which receives such information from executors and acts as a coordinator. Beware that Spark's RPC implementation is package priv

Re: Property spark.sql.streaming.minBatchesToRetain

2021-03-09 Thread Jungtaek Lim
That property decides how many log files (log file is created per batch per type - types are like offsets, commits, etc.) to retain on the checkpoint. Unless you're struggling with a small files problem on checkpoint, you wouldn't need to tune the value. I guess that's why the configuration is mar
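To make the retention rule above concrete, here is a small toy model in Python. It is only a sketch of the idea "keep the most recent N batch log files per log type"; the function name, the file naming, and the exact cut-off arithmetic are illustrative assumptions, not Spark's actual cleanup code. The value 100 is the documented default of spark.sql.streaming.minBatchesToRetain.

```python
def retained_log_files(latest_batch_id, min_batches_to_retain=100,
                       log_types=("offsets", "commits")):
    """Toy model (hypothetical helper): which per-batch log files stay in the checkpoint.

    One log file is assumed per batch per log type, and only the most recent
    `min_batches_to_retain` batches are kept for each type.
    """
    # Oldest batch id that is still retained; never goes below batch 0.
    oldest_kept = max(0, latest_batch_id - min_batches_to_retain + 1)
    return {
        log_type: [f"{log_type}/{batch_id}"
                   for batch_id in range(oldest_kept, latest_batch_id + 1)]
        for log_type in log_types
    }


files = retained_log_files(latest_batch_id=149, min_batches_to_retain=100)
print(len(files["offsets"]), files["offsets"][0], files["offsets"][-1])
```

Under this model the number of small files in the checkpoint grows with the retained batch count times the number of log types, which is why the value only matters when small files become a problem.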

Re: [VOTE] SPIP: Add FunctionCatalog

2021-03-11 Thread Jungtaek Lim
+1 (non-binding) Excellent description on SPIP doc! Thanks for the amazing effort! On Wed, Mar 10, 2021 at 3:19 AM Liang-Chi Hsieh wrote: > > +1 (non-binding). > > Thanks for the work! > > > Erik Krogen wrote > > +1 from me (non-binding) > > > > On Tue, Mar 9, 2021 at 9:27 AM huaxin gao < > > >

Re: Observable Metrics on Spark Datasets

2021-03-15 Thread Jungtaek Lim
If I remember correctly, the major audience of the "observe" API is Structured Streaming, micro-batch mode. From the example, the abstraction in 2 isn't something working with Structured Streaming. It could still be done with a callback, but the question remains how much complexity is hidden from

Re: Observable Metrics on Spark Datasets

2021-03-16 Thread Jungtaek Lim
n and un-registration happens. I think this qualifies > as: "all the logic happens in the JVM". All that is transferred to Python > is a row's data. No listeners needed. > > Enrico > > > > Am 16.03.21 um 00:13 schrieb Jungtaek Lim: > > If I remember c

Re: Determine global watermark via StreamingQueryProgress eventTime watermark String

2021-03-16 Thread Jungtaek Lim
There was a similar question (but another approach) and I've explained the current status a bit. https://lists.apache.org/thread.html/r89a61a10df71ccac132ce5d50b8fe405635753db7fa2aeb79f82fb77%40%3Cuser.spark.apache.org%3E I guess this would also answer your question as well. At least for now, Spa

Re: Checkpointing in Spark Structured Streaming

2021-03-22 Thread Jungtaek Lim
I see some points that make async checkpointing tricky to add in micro-batch; one example is "end to end exactly-once", as the commit phase in sink for the batch N can be run "after" the batch N + 1 has been started and write for batch N + 1 can happen before committing batch N. state store checkpoint

Re: Checkpointing in Spark Structured Streaming

2021-03-22 Thread Jungtaek Lim
e provider ? > > Rohit > > On Mon, Mar 22, 2021 at 4:09 PM Jungtaek Lim > wrote: > >> I see some points making async checkpoint be tricky to add in >> micro-batch; one example is "end to end exactly-once", as the commit phase >> in sink for the batch N can b

Re: Welcoming six new Apache Spark committers

2021-03-26 Thread Jungtaek Lim
Congrats all! On Sat, Mar 27, 2021 at 6:56 AM, Liang-Chi Hsieh wrote: > Congrats! Welcome! > > > Matei Zaharia wrote > > Hi all, > > > > The Spark PMC recently voted to add several new committers. Please join > me > > in welcoming them to their new role! Our new committers are: > > > > - Maciej Szymkiew

Re: [VOTE] Release Spark 2.4.8 (RC2)

2021-04-13 Thread Jungtaek Lim
+1 (non-binding) signature OK, extracting tgz files OK, build source without running tests OK. On Tue, Apr 13, 2021 at 5:02 PM Herman van Hovell wrote: > +1 > > On Tue, Apr 13, 2021 at 2:40 AM sarutak wrote: > >> +1 (non-binding) >> >> > +1 >> > >> > On Tue, 13 Apr 2021, 02:58 Sean Owen, wrot

Re: [DISCUSS] Add RocksDB StateStore

2021-04-27 Thread Jungtaek Lim
I think adding RocksDB state store to sql/core directly would be OK. Personally I also voted "either way is fine with me" against RocksDB state store implementation in Spark ecosystem. The overall stance hasn't changed, but I'd like to point out that the risk becomes quite lower than before, given

Re: [ANNOUNCE] Apache Spark 2.4.8 released

2021-05-18 Thread Jungtaek Lim
Thanks for the huge efforts on driving the release! On Tue, May 18, 2021 at 4:53 PM Wenchen Fan wrote: > Thank you, Liang-Chi! > > On Tue, May 18, 2021 at 1:32 PM Dongjoon Hyun > wrote: > >> Finally! Thank you, Liang-Chi. >> >> Bests, >> Dongjoon. >> >> >> On Mon, May 17, 2021 at 10:14 PM Takes

Re: Apache Spark 3.1.2 Release?

2021-05-18 Thread Jungtaek Lim
Late +1 here as well, thanks for volunteering! On Wed, May 19, 2021 at 11:24 AM, 郑瑞峰 wrote: > late +1. thanks Dongjoon! > > > -- Original message -- > *From:* "Dongjoon Hyun" ; > *Sent:* Wednesday, May 19, 2021, 1:29 AM > *To:* "Wenchen Fan"; > *Cc:* "Xiao Li";"Kent Yao";"John > Zhuge";"Hyukji

Re: [ANNOUNCE] Apache Spark 3.1.2 released

2021-06-01 Thread Jungtaek Lim
Nice! Thanks Dongjoon for your amazing efforts! On Wed, Jun 2, 2021 at 2:59 PM Liang-Chi Hsieh wrote: > Thank you, Dongjoon! > > > > Takeshi Yamamuro wrote > > Thank you, Dongjoon! > > > > On Wed, Jun 2, 2021 at 2:29 PM Xiao Li < > > > lixiao@ > > > > wrote: > > > >> Thank you! > >> > >> Xiao >

Re: Apache Spark 3.0.3 Release?

2021-06-09 Thread Jungtaek Lim
Late +1 Thanks! On Thu, Jun 10, 2021 at 12:06 PM Yi Wu wrote: > Thanks all, I'll start the RC soon. > > On Wed, Jun 9, 2021 at 7:07 PM Gengliang Wang wrote: > >> +1, thanks Yi >> >> Gengliang Wang >> >> >> >> >> On Jun 9, 2021, at 6:03 PM, 郑瑞峰 wrote: >> >> +1, thanks Yi >> >> >>

Re: [VOTE] Release Spark 3.0.3 (RC1)

2021-06-20 Thread Jungtaek Lim
+1 (non-binding) Thanks for your efforts! On Mon, Jun 21, 2021 at 2:40 PM Kent Yao wrote: > +1 (non-binding) > > *Kent Yao * > @ Data Science Center, Hangzhou Research Institute, NetEase Corp. > *a spark enthusiast* > *kyuubi is a unified multi-tenant JDBC > i

Re: [DISCUSS] SPIP: Row-level operations in Data Source V2

2021-06-24 Thread Jungtaek Lim
Meta question: this doesn't target Spark 3.2, right? Many folks have been working on branch cut for Spark 3.2, so might be less active to jump in new feature proposals right now. On Fri, Jun 25, 2021 at 9:00 AM Holden Karau wrote: > I took an initial look at the PRs this morning and I’ll go thro

Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Jungtaek Lim
Thanks to Gengliang for driving this huge release! On Wed, Oct 20, 2021 at 1:50 AM Dongjoon Hyun wrote: > Thank you so much, Gengliang and all! > > Dongjoon. > > On Tue, Oct 19, 2021 at 8:48 AM Xiao Li wrote: > >> Thank you, Gengliang! >> >> Congrats to our community and all the contributors! >

Re: Update Spark 3.3 release window?

2021-10-28 Thread Jungtaek Lim
+1 for mid-March 2022. +1 for EOL 2.x as well. I guess we did it already according to Dongjoon's quote from the Spark website. On Fri, Oct 29, 2021 at 3:49 AM Dongjoon Hyun wrote: > +1 for mid March for Spark 3.3. > > For 2.4, our document already mentioned its EOL like > > " For example, 2.4.0

Re: [Apache Spark Jenkins] build system shutting down Dec 23th, 2021

2021-12-06 Thread Jungtaek Lim
Thanks for all the hard work you have been doing, Shane! On Tue, Dec 7, 2021 at 2:17 PM Nick Pentreath wrote: > Wow! end of an era > > Thanks so much to you Shane for all you work over 10 (!!) years. And to > Amplab also! > > Farewell Spark Jenkins! > > N > > On Tue, Dec 7, 2021 at 6:49 AM Nicho

Re: Time for Spark 3.2.1?

2021-12-07 Thread Jungtaek Lim
+1 for both releases and the time! On Wed, Dec 8, 2021 at 3:46 PM Mridul Muralidharan wrote: > > +1 for maintenance release, and also +1 for doing this in Jan ! > > Thanks, > Mridul > > On Tue, Dec 7, 2021 at 11:41 PM Gengliang Wang wrote: > >> +1 for new maintenance releases for all 3.x branch

[Proposal] Deprecate Trigger.Once and replace with Trigger.AvailableNow

2021-12-07 Thread Jungtaek Lim
Trigger.AvailableNow in migration guide - Replace all usages of Trigger.Once with Trigger.AvailableNow, except the test cases of Trigger.Once itself Please review the proposal and share your voice on this. Thanks! Jungtaek Lim 1. https://issues.apache.org/jira/browse/SPARK-36533

Re: [Proposal] Deprecate Trigger.Once and replace with Trigger.AvailableNow

2021-12-12 Thread Jungtaek Lim
Friendly reminder. I'll submit the proposed change if there is no objection observed this week. On Wed, Dec 8, 2021 at 4:16 PM Jungtaek Lim wrote: > Hi dev, > > I would like to hear voices about deprecating Trigger.Once, and replacing > it with Trigger.AvailableNow [1] in Str

Re: [MISC] Should we add .github/FUNDING.yml

2021-12-15 Thread Jungtaek Lim
If ASF wants to do it, INFRA could probably deal with it for entire projects, like ASF code of conduct being exposed to the right side of the all ASF github repos recently. On Wed, Dec 15, 2021 at 11:49 PM Sean Owen wrote: > It might imply that this is a way to fund Spark alone, and it isn't. >

Re: Apache Spark 3.3 Release

2022-03-03 Thread Jungtaek Lim
Thanks Maxim for volunteering to drive the release! I support the plan (March 15th) to perform a release branch cut. Btw, would we be open for modification of critical/blocker issues after the release branch cut? I have a blocker JIRA ticket and the PR is open for reviewing, but need some time to

Re: bazel and external/

2022-03-17 Thread Jungtaek Lim
We seem to just focus on how to avoid the conflict with the name "external" used in bazel. Since we consider the possibility of renaming, why not revisit the modules "external" contains? It looks like the kinds of modules the external directory contains are 1) Docker 2) Connectors 3) Sink on Dropwizard m

Re: bazel and external/

2022-03-17 Thread Jungtaek Lim
re top level dirs. > > On Thu, Mar 17, 2022 at 7:33 PM Jungtaek Lim > wrote: > >> We seem to just focus on how to avoid the conflict with the name >> "external" used in bazel. Since we consider the possibility of renaming, >> why not revisit the modules "exter

[DISCUSS] Migration guide on upgrading Kafka to 3.1 in Spark 3.3

2022-03-18 Thread Jungtaek Lim
back. 4. Do not care. End users can indicate the upgrade in the release note, and we expect end users to actively check the notable changes (& KIPs) from Kafka doc. 5. Options not described above... Please take a look and provide your voice on this. Thanks, Jungtaek Lim (HeartSaVioR) p

Re: [DISCUSS] Migration guide on upgrading Kafka to 3.1 in Spark 3.3

2022-03-18 Thread Jungtaek Lim
CORRECTION: in option 2, we enumerate KIPs which may bring incompatibility with older brokers (not all KIPs). On Fri, Mar 18, 2022 at 7:12 PM Jungtaek Lim wrote: > Hi dev, > > I would like to initiate the discussion about how to deal with the > migration guide on upgrading Kafka

Re: [DISCUSS] Migration guide on upgrading Kafka to 3.1 in Spark 3.3

2022-03-18 Thread Jungtaek Lim
tween releases because >> they've told what's important to check :) >> >> Seems like my Kafka Spark compatibility gist is out-of-date so maybe I >> need to invest some time to resurrect it: >> https://gist.github.com/gaborgsomogyi/3476c32d69ff2087ed5d7d031653c7a9 >>

Re: [DISCUSS] Migration guide on upgrading Kafka to 3.1 in Spark 3.3

2022-03-18 Thread Jungtaek Lim
that would affect > Kafka usage itself; focus on the connector-related issues. > > On Fri, Mar 18, 2022 at 5:15 AM Jungtaek Lim > wrote: > >> CORRECTION: in option 2, we enumerate KIPs which may bring >> incompatibility with older brokers (not all KIPs). >>

Re: bazel and external/

2022-03-22 Thread Jungtaek Lim
ate steps. If there is consensus that connectors will move out, should >>>> the directory be named misc for everything else until there is some >>>> direction for the remaining modules? >>>> >>>> On Fri, 18 Mar 2022 at 03:03 Jungtaek Lim >>>> wrote: >

Re: [DISCUSS] Migration guide on upgrading Kafka to 3.1 in Spark 3.3

2022-03-23 Thread Jungtaek Lim
Bump to try gathering more voices before taking action. For now, I see two voices as option 2 & 5 (similar to option 2 but not in the migration note but in the release note). On Fri, Mar 18, 2022 at 7:15 PM Jungtaek Lim wrote: > CORRECTION: in option 2, we enumerate KIPs which ma

Re: [DISCUSS] Migration guide on upgrading Kafka to 3.1 in Spark 3.3

2022-03-23 Thread Jungtaek Lim
ons (does it require a server-side > update or not?), and document the change itself for sure along with any > Spark-side migration notes. > > On Fri, Mar 18, 2022 at 8:47 PM Jungtaek Lim > wrote: > >> The thing is, it is “us” who upgrades Kafka client and makes possible >>

Re: SIGMOD System Award for Apache Spark

2022-05-12 Thread Jungtaek Lim
Congrats Spark community! On Fri, May 13, 2022 at 10:40 AM Qian Sun wrote: > Congratulations !!! > > On May 13, 2022, at 3:44 AM, Matei Zaharia wrote: > > Hi all, > > We recently found out that Apache Spark received > the SIGMOD System Award this > year, given by SIGM

Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-08 Thread Jungtaek Lim
Apologies for the late participation. I'm sorry, but -1 (non-binding) from me. Unfortunately I found a major user-facing issue which hurts UX seriously on Kafka data source usage. In some cases, Kafka data source can throw IllegalStateException for the case of failOnDataLoss=true which condition is

Re: 回复: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-13 Thread Jungtaek Lim
+1 (non-binding) Checked signature and checksum. Confirmed SPARK-39412 is resolved. Built source tgz with JDK 11. Thanks Max for driving the efforts of this huge release! On Tue, Jun 14, 2022 at 2:51 PM huaxin gao wrote: > +1 (non-binding) >

Observed consistent test failure in master (ParquetIOSuite)

2022-06-27 Thread Jungtaek Lim
ng context looks into this sooner. Thanks! Jungtaek Lim (HeartSaVioR)

[DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-06 Thread Jungtaek Lim
n the next day. Thanks to the behavior of Trigger.AvailableNow, it handles no-data batch as well before termination of the query. Please review and let us know if you have any feedback or concerns on the proposal. Thanks! Jungtaek Lim 1. https://issues.apache.org/jira/browse/SPARK-36533

Re: [DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-08 Thread Jungtaek Lim
ove forward if there are no outstanding objections. On Wed, Jul 6, 2022 at 8:46 PM Jungtaek Lim wrote: > Hi dev, > > I would like to hear voices about deprecating Trigger.Once, and promoting > Trigger.AvailableNow as a replacement [1] in Structured Streaming. > (It doesn't

Re: [DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-08 Thread Jungtaek Lim
orkaround. Backfill may warrant its own design to deal with.) > > Adam > > On Fri, Jul 8, 2022 at 3:24 AM Jungtaek Lim > wrote: > >> Bump to get a chance to expose the proposal to wider audiences. >> >> Given that there are not many active contributors/maint

Re: [DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-09 Thread Jungtaek Lim
e our batches are processed > in the correct event time order when starting from scratch. > > I'm not against deprecating Trigger.Once, just wanted to chime in that > someone was using it! I'm itching to upgrade and try out the new stuff. > > Adam > > On Fri, Jul 8

Re: [DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-11 Thread Jungtaek Lim
Final reminder. I'll leave this thread for a couple of days to see further voices, and go forward if there is no outstanding comment. On Sat, Jul 9, 2022 at 9:54 PM Jungtaek Lim wrote: > It sounds like none of the approaches perfectly solve the issue of > backfill. > > 1. Tr

Re: [DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-18 Thread Jungtaek Lim
g all available data in a single microbatch. While this can behave the same with Trigger.Once on processing new available data (watermark advancement happens after processing all the data), this can also handle previous uncommitted batch(es) as well as no-data batch. On Tue, Jul 12, 2022 at 9:43 AM Jungtae
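For readers who want to see what the switch discussed in this thread looks like in user code, here is a minimal PySpark-style sketch. It assumes a Spark version whose DataStreamWriter.trigger accepts availableNow=True (added to PySpark in 3.3); the helper name run_available_now, the parquet sink, and the path arguments are illustrative assumptions, not part of the proposal itself.

```python
def run_available_now(stream_df, output_path, checkpoint_path):
    """Hypothetical helper: process everything currently available, then stop.

    `stream_df` is expected to be a streaming DataFrame; only its standard
    writeStream builder methods are used here.
    """
    writer = (
        stream_df.writeStream
        .format("parquet")
        .option("checkpointLocation", checkpoint_path)
        # Before the change this would have been: .trigger(once=True)
        .trigger(availableNow=True)
    )
    query = writer.start(output_path)
    # With this trigger the query terminates on its own once the data that
    # was available at start time has been processed.
    query.awaitTermination()
    return query
```

Unlike Trigger.Once, Trigger.AvailableNow may split the available data into several micro-batches, so source rate-limit options are respected rather than forcing everything into one batch.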

Re: Welcome Xinrong Meng as a Spark committer

2022-08-09 Thread Jungtaek Lim
Congrats Xinrong! Well deserved. On Tue, Aug 9, 2022 at 5:13 PM, Hyukjin Kwon wrote: > Hi all, > > The Spark PMC recently added Xinrong Meng as a committer on the project. > Xinrong is the major contributor of PySpark especially Pandas API on Spark. > She has guided a lot of new contributors enthusiasti

Re: Welcoming three new PMC members

2022-08-09 Thread Jungtaek Lim
Congrats everyone! On Wed, Aug 10, 2022 at 8:57 AM Hyukjin Kwon wrote: > Congrats everybody! > > On Wed, 10 Aug 2022 at 05:50, Mridul Muralidharan > wrote: > >> >> Congratulations ! >> Great to have you join the PMC !! >> >> Regards, >> Mridul >> >> On Tue, Aug 9, 2022 at 11:57 AM vaquar khan

Re: [Structured Streaming + Kafka] Reduced support for alternative offset management

2022-09-01 Thread Jungtaek Lim
: https://github.com/HeartSaVioR/spark-sql-kafka-offset-committer Hope this helps. Thanks, Jungtaek Lim (HeartSaVioR) On Tue, Aug 30, 2022 at 5:05 PM Martin Andersson wrote: > I was looking around for some documentation regarding how checkpointing > (or rather, delivery semantics) is don

Re: Dropping Apache Spark Hadoop2 Binary Distribution?

2022-10-05 Thread Jungtaek Lim
+1 On Thu, Oct 6, 2022 at 5:59 AM Chao Sun wrote: > +1 > > > and specifically may allow us to finally move off of the ancient version > of Guava (?) > > I think the Guava issue comes from Hive 2.3 dependency, not Hadoop. > > On Wed, Oct 5, 2022 at 1:55 PM Xinrong Meng > wrote: > >> +1. >> >> On

Re: Welcome Yikun Jiang as a Spark committer

2022-10-07 Thread Jungtaek Lim
Congrats! On Sat, Oct 8, 2022 at 3:24 PM, huaxin gao wrote: > Congratulations! > > On Fri, Oct 7, 2022 at 11:22 PM Yang,Jie(INF) wrote: > >> Congratulations Yikun! >> >> Regards, >> Yang Jie >> -- >> *From:* Mridul Muralidharan >> *Sent:* October 8, 2022 14:16:02 >> *To:* Yumin

[DISCUSS] Flip the default value of Kafka offset fetching config (spark.sql.streaming.kafka.useDeprecatedOffsetFetching)

2022-10-12 Thread Jungtaek Lim
e would be introduced inevitably (they can set topic based ACL rule), but most people will get benefited. IMHO this is something we can deal with release/migration note. Would like to hear the voices on this. Thanks, Jungtaek Lim (HeartSaVioR)
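Since the thread is about a default value changing, one practical consequence is that jobs depending on a particular behavior can pin the setting explicitly instead of relying on the default. The sketch below only builds the spark-submit arguments for that; the configuration key is the one named in the subject line, while the helper name and the idea of generating arguments in Python are illustrative assumptions.

```python
KAFKA_OFFSET_FETCHING_KEY = "spark.sql.streaming.kafka.useDeprecatedOffsetFetching"


def offset_fetching_conf_args(use_deprecated):
    """Hypothetical helper: build `--conf` arguments that pin the offset fetching mode.

    Setting the value explicitly keeps a job's behavior stable no matter which
    default the running Spark version ships with.
    """
    value = "true" if use_deprecated else "false"
    return ["--conf", f"{KAFKA_OFFSET_FETCHING_KEY}={value}"]


# Example: arguments that select the newer (non-deprecated) offset fetching.
print(" ".join(offset_fetching_conf_args(use_deprecated=False)))
```

Which value is appropriate depends on the ACLs configured on the Kafka cluster, as the message above notes.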

Re: [DISCUSS] Flip the default value of Kafka offset fetching config (spark.sql.streaming.kafka.useDeprecatedOffsetFetching)

2022-10-16 Thread Jungtaek Lim
>> >> BR, >> G >> >> >> On Thu, Oct 13, 2022 at 4:12 AM Jungtaek Lim < >> kabhwan.opensou...@gmail.com> wrote: >> >>> Hi all, >>> >>> I would like to propose flipping the default value of Kafka offset >>> fetching c

Re: [DISCUSS] Flip the default value of Kafka offset fetching config (spark.sql.streaming.kafka.useDeprecatedOffsetFetching)

2022-10-18 Thread Jungtaek Lim
No further voice so far. I'm going to submit a PR. Thanks again for the feedback! On Mon, Oct 17, 2022 at 9:30 AM Jungtaek Lim wrote: > Thanks Gabor and Dongjoon for supporting this! > > Bump to reach more eyes. If there is no further voice on this in a couple > of days, I'

Re: [VOTE][SPIP] Better Spark UI scalability and Driver stability for large applications

2022-11-16 Thread Jungtaek Lim
+1 Nice to see the chance for driver to reduce resource usage and increase stability, especially the fact that the driver is SPOF. It's even promising to have a future plan to pre-bake the kvstore for SHS from the driver. Thanks for driving the effort, Gengliang! On Thu, Nov 17, 2022 at 5:32 AM

Re: [DISCUSSION] SPIP: Asynchronous Offset Management in Structured Streaming

2022-11-22 Thread Jungtaek Lim
Just FYI, I'm shepherding this SPIP project. I think the major meta question would be, "why don't we spend effort on continuous mode rather than initiating another feature aiming for the same workload?". Jerry already updated the doc to answer the question, but I can also share my thoughts about i

Re: [DISCUSSION] SPIP: Asynchronous Offset Management in Structured Streaming

2022-11-23 Thread Jungtaek Lim
te: > >> Jungtaek, >> >> Thanks for taking up the role to shepherd this SPIP! Thank you for also >> chiming in on your thoughts concerning the continuous mode! >> >> Best, >> >> Jerry >> >> On Tue, Nov 22, 2022 at 5:57 PM Jungtaek Lim <

Re: [ANNOUNCE] Apache Spark 3.2.3 released

2022-11-30 Thread Jungtaek Lim
Thanks Chao for driving the release! On Wed, Nov 30, 2022 at 6:03 PM Wenchen Fan wrote: > Thanks, Chao! > > On Wed, Nov 30, 2022 at 1:33 AM Chao Sun wrote: > >> We are happy to announce the availability of Apache Spark 3.2.3! >> >> Spark 3.2.3 is a maintenance release containing stability fixes

Re: [DISCUSSION] SPIP: Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Jungtaek Lim
>>>> may serve as the "future" engine powering Spark Streaming. Improving the >>>>> "current" engine does not mean we cannot work on a "future" engine. These >>>>> two are not mutually exclusive. I would like to focus the discussion o

[VOTE][SPIP] Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Jungtaek Lim
ts.apache.org/thread/yv8ffr56prjr16qh12lwjyjl1q8dl7lp> Please vote on the SPIP for the next 72 hours: [ ] +1: Accept the proposal as an official SPIP [ ] +0 [ ] -1: I don’t think this is a good idea because … Thanks! Jungtaek Lim (HeartSaVioR)

Re: [VOTE][SPIP] Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Jungtaek Lim
Starting with +1 from me. On Thu, Dec 1, 2022 at 10:54 AM Jungtaek Lim wrote: > Hi all, > > I'd like to start the vote for SPIP: Asynchronous Offset Management in > Structured Streaming. > > The high level summary of the SPIP is that we propose a couple of > improvemen

[VOTE][RESULT][SPIP] Asynchronous Offset Management in Structured Streaming

2022-12-04 Thread Jungtaek Lim
The vote passes with 7 +1s (5 binding +1s). Thanks to all who reviewed the SPIP doc and voted! (* = binding) +1: - Jungtaek Lim - Xingbo Jiang - Mridul Muralidharan (*) - Hyukjin Kwon (*) - Shixiong Zhu (*) - Wenchen Fan (*) - Dongjoon Hyun (*) +0: None -1: None Thanks, Jungtaek Lim

[DISCUSS] Deprecate DStream in 3.4

2023-01-10 Thread Jungtaek Lim
ic API. I don't intend to propose the target version for removal. The goal is to guide users to refrain from constructing a new workload with DStream. We might want to go with this in future, but it would require a new discussion thread at that time. What do you think? Thanks, Jungtaek Lim (HeartSaVioR)

Re: [DISCUSS] Deprecate DStream in 3.4

2023-01-12 Thread Jungtaek Lim
bump for more visibility. On Wed, Jan 11, 2023 at 12:20 PM Jungtaek Lim wrote: > Hi dev, > > I'd like to propose the deprecation of DStream in Spark 3.4, in favor of > promoting Structured Streaming. > (Sorry for the late proposal, if we don't make the change in 3.4, w

Re: [DISCUSS] Deprecate DStream in 3.4

2023-01-12 Thread Jungtaek Lim
, Jan 12, 2023 at 5:08 PM Tathagata Das < >> tathagata.das1...@gmail.com> wrote: >> >>> +1 >>> >>> On Thu, Jan 12, 2023 at 7:46 PM Hyukjin Kwon >>> wrote: >>> >>>> +1 >>>> >>>> On Fri, 13 Jan 2

Re: [DISCUSS] Deprecate DStream in 3.4

2023-01-12 Thread Jungtaek Lim
again that they are encouraged to move to SS.) This is not an action item from the proposal: - Add (tentative) target version to remove the API on the deprecation message. Hope this makes the proposal crystal clear. On Fri, Jan 13, 2023 at 3:05 PM Jungtaek Lim wrote: > Maybe I need to

Re: [DISCUSS] Deprecate DStream in 3.4

2023-01-12 Thread Jungtaek Lim
I? > > > On Thu, Jan 12, 2023 at 10:05 PM Jungtaek Lim < > kabhwan.opensou...@gmail.com> wrote: > >> Maybe I need to clarify - my proposal is "explicitly" deprecating it, >> which incurs code change for sure. Guidance on the Spark website is done >>

Re: [DISCUSS] Deprecate DStream in 3.4

2023-01-15 Thread Jungtaek Lim
precated > > Jungtaek, can you please provide / elaborate on the concrete actions you > intend on taking for the deprecation process? > > Best, > > Jerry > > On Thu, Jan 12, 2023 at 11:16 PM L. C. Hsieh wrote: > >> +1 >> >> On Thu, Jan 12, 2023 at 10:3

Re: [DISCUSS] Deprecate DStream in 3.4

2023-01-15 Thread Jungtaek Lim
n Thu, Jan 12, 2023 at 11:16 PM L. C. Hsieh wrote: >> >>> +1 >>> >>> On Thu, Jan 12, 2023 at 10:39 PM Jungtaek Lim >>> wrote: >>> > >>> > Yes, exactly. I'm sorry to bring confusion - should have clarified >>> act

Re: Time for Spark 3.4.0 release?

2023-01-17 Thread Jungtaek Lim
+1 on delaying. I see there’s a JIRA ticket about DStream deprecation, we are working on this - thanks for taking this into account! On Wed, Jan 18, 2023 at 12:43 PM, Hyukjin Kwon wrote: > +1. Thanks for driving this, Xinrong. > > On Wed, 18 Jan 2023 at 12:31, Xinrong Meng > wrote: > >> Hi All, >> >>

Re: [DISCUSS] Deprecate DStream in 3.4

2023-01-20 Thread Jungtaek Lim
26 AM Jungtaek Lim wrote: > Given that I got more than 3 PMC members' positive votes as well as > several active contributors' positive votes as well, I will proceed with > the actual work. > (It may take a couple of more days as folk in US will help me and there's > a hol

Re: Slack for PySpark users

2023-03-30 Thread Jungtaek Lim
I'm reading through the page "Briefing: The Apache Way", and in the section of "Open Communications", restriction of communication inside ASF INFRA (mailing list) is more about code and decision-making. https://www.apache.org/theapacheway/#what-makes-the-apache-way-so-hard-to-define It's unavoidab

Re: Slack for PySpark users

2023-04-03 Thread Jungtaek Lim
fficial Slack channel has 602 subscribers. > > May I ask if the users prefer to use the ASF Official Slack channel > than the user mailing list? > > Dongjoon. > > > > On Thu, Mar 30, 2023 at 9:10 PM Jungtaek Lim > wrote: > >> I'm reading through the pa

Re: Slack for PySpark users

2023-04-03 Thread Jungtaek Lim
there. On Tue, Apr 4, 2023 at 7:04 AM Jungtaek Lim wrote: > The number of subscribers doesn't give any meaningful value. Please look > into the number of mails being sent to the list. > > https://lists.apache.org/list.html?u...@spark.apache.org > The latest month there were m

Re: [VOTE] Release Apache Spark 3.4.0 (RC7)

2023-04-11 Thread Jungtaek Lim
+1 (non-binding) Thanks for driving the release! On Wed, Apr 12, 2023 at 10:42 AM Ye Zhou wrote: > +1 non-binding > > On Tue, Apr 11, 2023 at 18:40 Ye Zhou wrote: > >> Yes, it is not a regression issue. We can fix it after the release. >> >> Thanks >> Ye >> >> On Tue, Apr 11, 2023 at 17:42 Xia

Re: [VOTE] Release Apache Spark 3.2.4 (RC1)

2023-04-11 Thread Jungtaek Lim
+1 (non-binding) Thanks for driving the release! On Wed, Apr 12, 2023 at 3:41 AM Xinrong Meng wrote: > +1 non-binding > > Thank you Dongjoon! > > On Mon, Apr 10, 2023 at 11:32 PM, Wenchen Fan wrote: > >> +1 >> >> On Tue, Apr 11, 2023 at 10:09 AM Hyukjin Kwon >> wrote: >> >>> +1 >>> >>> On Tue, 11 Apr 2023 at

Re: Parametrisable output metadata path

2023-04-15 Thread Jungtaek Lim
metadata directory, so someone might find it useful. For end-to-end exactly once, people can either use a limited current FileStream sink or use Data Lake products. I don't see the value in making improvements to the current FileStream sink. Thanks, Jungtaek Lim (HeartSaVioR) On Sun, Apr

Re: Parametrisable output metadata path

2023-04-17 Thread Jungtaek Lim
:37 PM Wojciech Indyk wrote: > Hi Jungtaek, > integration with Delta Lake is not an option to me, I raised a PR for > improvement of FileStreamSink with the new parameter: > https://github.com/apache/spark/pull/40821. Can you please take a look? > > -- > Kind regards/ Pozdraw

Re: Parametrisable output metadata path

2023-04-17 Thread Jungtaek Lim
small correction: "I intentionally didn't enumerate." The meaning could be quite different so making a small correction. On Tue, Apr 18, 2023 at 5:38 AM Jungtaek Lim wrote: > There seems to be miscommunication - I didn't mean "Delta Lake". I meant > "an

Re: Apache Spark 3.5.0 Expectations (?)

2023-05-29 Thread Jungtaek Lim
Shall we initiate a new discussion thread for Scala 2.13 by default? While I'm not an expert on this area, it sounds like the change is major and (probably) breaking. It seems to be worth having a separate discussion thread rather than just treat it like one of 25 items. On Tue, May 30, 2023 at 9:

Re: JDK version support policy?

2023-06-07 Thread Jungtaek Lim
+1 to drop Java 8 but +1 to set the lowest supported version to Java 11. Considering the phase for only security updates, 11 LTS would not be EOLed for a very long time. Unless that’s coupled with other deps which require bumping JDK version (hope someone can bring up lists), it doesn’t seem to buy muc

Re: ASF policy violation and Scala version issues

2023-06-11 Thread Jungtaek Lim
Are we concerned that a library has not released a new version which bumps the Scala version, when that Scala version was announced less than a week ago? Shall we respect the efforts of all maintainers of open source projects we use as dependencies, regardless whether they are ASF projects or individ

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Jungtaek Lim
I concur with Holden and Mridul. Let's build a plan before we call the tentative deadline. I understand setting the tentative deadline would definitely help in pushing back features which "never ever ends", but at least we may want to list up features and discuss for priority. It is still possible

Re: Time for Spark v3.5.0 release

2023-07-04 Thread Jungtaek Lim
+1 On Wed, Jul 5, 2023 at 2:23 AM L. C. Hsieh wrote: > +1 > > Thanks Yuanjian. > > On Tue, Jul 4, 2023 at 7:45 AM yangjie01 wrote: > > > > +1 > > > > > > > > From: Maxim Gekk > > Date: Tuesday, July 4, 2023 17:24 > > To: Kent Yao > > Cc: "dev@spark.apache.org" > > Subject: Re: Time for Spark v3.5.0 release

Re: [VOTE][SPIP] Python Data Source API

2023-07-09 Thread Jungtaek Lim
+1 On Sat, Jul 8, 2023 at 4:13 AM Reynold Xin wrote: > +1! > > > On Fri, Jul 7 2023 at 11:58 AM, Holden Karau > wrote: > >> +1 >> >> On Fri, Jul 7, 2023 at 9:55 AM huaxin gao wrote: >> >>> +1 >>> >>> On Fri, Jul 7, 2023 at 8:59 AM Mich Talebzadeh < >>> mich.talebza...@gmail.com> wrote: >>> >>>

Re: [VOTE][SPIP] Python Data Source API

2023-07-10 Thread Jungtaek Lim
's not covering streaming. On Tue, Jul 11, 2023 at 8:35 AM Matei Zaharia wrote: > +1 > > On Jul 10, 2023, at 10:19 AM, Takuya UESHIN > wrote: > > +1 > > On Sun, Jul 9, 2023 at 10:05 PM Ruifeng Zheng wrote: > >> +1 >> >> On Mon, Jul 10, 2023

Re: Welcome two new Apache Spark committers

2023-08-06 Thread Jungtaek Lim
Congrats Peter and Xiduo! On Mon, Aug 7, 2023 at 11:33 AM yangjie01 wrote: > Congratulations, Peter and Xiduo ~ > > > > *From**: *Hyukjin Kwon > *Date**: *Monday, August 7, 2023 10:30 > *To**: *Ruifeng Zheng > *Cc**: *Xiao Li , Debasish Das < > debasish.da...@gmail.com>, Wenchen Fan , Spark dev > list >

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-09-01 Thread Jungtaek Lim
ssle, I'd like to see this be fixed in 3.5.0. There is no fix yet but I'm working on it. I'll give an update here. Maybe we could lower down priority and let the release go with describing this as a "known issue", if I couldn't make progress in a couple of days. I'

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-08 Thread Jungtaek Lim
+1 (non-binding) Thanks for driving this release! On Fri, Sep 8, 2023 at 11:29 AM Holden Karau wrote: > +1 pip installing seems to function :) > > On Thu, Sep 7, 2023 at 7:22 PM Yuming Wang wrote: > >> +1. >> >> On Thu, Sep 7, 2023 at 10:33 PM yangjie01 >> wrote: >> >>> +1 >>> >>> >>> >>> *发件

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Jungtaek Lim
+1 (non-binding) Thanks for driving this release and the patience on multiple RCs! On Tue, Sep 12, 2023 at 10:00 AM Yuanjian Li wrote: > +1 (non-binding) > > On Mon, Sep 11, 2023 at 09:36, Yuanjian Li wrote: > >> @Peter Toth I've looked into the details of this >> issue, and it appears that it's neither a r

[DISCUSS] Porting back SPARK-45178 to 3.5/3.4 version lines

2023-09-20 Thread Jungtaek Lim
correctness, or leave these version lines as they are. Looking for voices on this. Thanks in advance! Jungtaek Lim (HeartSaVioR)

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-04 Thread Jungtaek Lim
Congrats! On Wed, Oct 4, 2023 at 5:04 PM, yangjie01 wrote: > Congratulations! > > > > Jie Yang > > > > *From**: *Dongjoon Hyun > *Date**: *Wednesday, October 4, 2023 13:04 > *To**: *Hyukjin Kwon > *Cc**: *Hussein Awala , Rui Wang , > Gengliang Wang , Xiao Li , " > dev@spark.apache.org" > *Subject**: *Re: Welcome to Ou

Re: Watermark on late data only

2023-10-09 Thread Jungtaek Lim
Technically speaking, "late data" represents the data which cannot be processed due to the fact the engine threw out the state associated with the data already. That said, the only reason watermark does exist for streaming is to handle stateful operators. From the engine's point of view, there is

Re: Watermark on late data only

2023-10-10 Thread Jungtaek Lim
datafarame).writeTo("on_time_data") > }) > > A little bit as you can do with Apache Flink in fact: > > https://github.com/immerok/recipes/blob/main/late-data-to-sink/src/main/java/com/immerok/cookbook/LateDataToSeparateSink.java#L81 > > WDYT? > > Best, > B
