Re: Code formatting tech debt

2025-03-17 Thread Rob Reeves
en at that time realized that target was run only for some specific > modules (I think connect..). > > Regards > > Asif > > On Fri, Mar 14, 2025 at 9:46 PM Rob Reeves > wrote: > >> Hi Spark devs, >> >> There seems to be a lot of code formatting tech de

Code formatting tech debt

2025-03-15 Thread Rob Reeves
Hi Spark devs, There seems to be a lot of code formatting tech debt. When I run "./dev/scalafmt" on the master branch, it makes formatting changes to thousands of files. Is that expected, or am I doing something wrong? If these files should be formatted, we could add a formatting check to the PR to p

Re: Contribution to Spark SQL: new data type TIME

2025-03-06 Thread Rob Reeves
Hi Max, I'll work on "Add the make_time() function". Thanks, Rob On Thu, Mar 6, 2025 at 3:16 AM Max Gekk wrote: > Hi Spark devs, > > I would like to invite you to develop the new data type TIME in Spark > SQL. At the moment, there are > 10 sub-

Re: Proposal to improve data skew debugging

2025-01-27 Thread Rob Reeves
g based? Likely Count-Min Sketch > algorithm! > > HTH > > Mich Talebzadeh, > Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
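The reply above only guesses that the skew prototype is sketch-based; the proposal snippet does not say which technique it uses. To illustrate the guessed idea, here is a minimal Python sketch of a Count-Min Sketch used to spot keys that account for a large share of rows. The names `CountMinSketch` and `find_skewed_keys`, the width/depth defaults and the 10% threshold are hypothetical assumptions for illustration, not Spark APIs and not the prototype's actual method.

```python
import hashlib
from typing import Iterable, List


class CountMinSketch:
    """Approximate per-key counts in fixed memory (width x depth counters).

    Estimates can overcount because of hash collisions but never undercount.
    """

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key: str, row: int) -> int:
        # Salt the hash with the row number to get a different hash function per row.
        digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key: str) -> int:
        # The smallest counter is the one least inflated by collisions.
        return min(self.table[row][self._index(key, row)] for row in range(self.depth))


def find_skewed_keys(keys: Iterable[str], fraction: float = 0.1) -> List[str]:
    """Hypothetical helper: return keys whose estimated share of rows is >= `fraction`.

    Only keys that look heavy when they are inserted are kept as candidates, so the
    candidate set is typically much smaller than the set of distinct keys.
    """
    sketch = CountMinSketch()
    candidates = set()
    total = 0
    for key in keys:
        sketch.add(key)
        total += 1
        if sketch.estimate(key) >= fraction * total:
            candidates.add(key)
    # Re-check candidates against the final total to drop keys that were only heavy early on.
    return sorted(k for k in candidates if sketch.estimate(k) >= fraction * total)
```

For example, `find_skewed_keys(["hot"] * 900 + [f"k{i}" for i in range(100)])` should flag only `"hot"`. Because estimates never undercount, a key that truly holds at least `fraction` of the rows is not missed; the trade-off is that a small sketch may occasionally report a light key as heavy.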

Proposal to improve data skew debugging

2025-01-24 Thread Rob Reeves
Hi Spark devs, I recently worked on a prototype to make it easier to identify the root cause of data skew in Spark. I wanted to see if the community was interested in it before contributing the changes (SPIP and PRs). *Problem* When a query has data skew today, you see outlier tasks ta