+1. I also left a comment in the doc about the Hadoop client improvement, which should benefit running Spark on a laptop as well.

Thanks,
Cheng Pan

> On May 6, 2026, at 15:01, John Zhuge <[email protected]> wrote:
>
> +1 worthwhile to lower Spark small-data overhead
>
> On Mon, May 4, 2026 at 11:47 PM Ángel Álvarez Pascua <[email protected]> wrote:
>> Love it. Please, count on me if any help is needed.
>>
>> On Tue, May 5, 2026, 7:31, DB Tsai <[email protected]> wrote:
>>> Thanks Daniel and Liang-Chi for driving this. This is an exciting proposal
>>> that can significantly speed up local experimentation and development on
>>> laptops. It also helps make Spark a great fit for both big-data workloads
>>> and small-data exploratory workflows.
>>>
>>> DB Tsai | https://www.dbtsai.com/ | PGP 0x9FB9FAA3
>>>
>>> On Monday, May 4th, 2026 at 3:39 PM, Daniel Tenedorio <[email protected]> wrote:
>>>> Hi Spark community,
>>>>
>>>> We’d like to propose a new SPIP to improve the experience of running
>>>> Apache Spark on laptops.
>>>>
>>>> SPIP doc:
>>>>
>>>> https://docs.google.com/document/d/1Nphejrf_vh4YRECn0JPgKClqxDS_lB6wufZFJQxyY98/edit?tab=t.0#heading=h.hj76akdx5ul
>>>>
>>>> Summary:
>>>>
>>>> Spark’s execution model is optimized for distributed workloads, but this
>>>> introduces noticeable overhead for small datasets (e.g., <100MB), where
>>>> even simple queries can take multiple seconds. This makes Spark less
>>>> suitable for interactive and exploratory use cases on laptops, and often
>>>> pushes users toward alternative single-node tools.
>>>>
>>>> This proposal aims to reduce that overhead in local mode, improving
>>>> latency for small queries and making Spark more usable as an entry point
>>>> for new users and iterative workflows.
>>>>
>>>> We’d appreciate your review and feedback.
>>>>
>>>> Thanks,
>>>> Daniel Tenedorio and Liang-Chi Hsieh
>
> --
> John Zhuge
