Re: Handling user-facing metadata issues on file stream source & sink

2020-06-25 Thread Jungtaek Lim
Bump, and adding one more issue I fixed (by chance, there was a relevant report on the user mailing list recently): * [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue [1] The patch stabilizes the driver's memory usage when utilizing a huge metadata log, which was t

Re: Handling user-facing metadata issues on file stream source & sink

2020-06-14 Thread Jungtaek Lim
Bump again - hoping to get some traction, because these PRs either fix long-standing problems or bring noticeable improvements (each PR has numbers or a UI graph to show the improvement). Fixed long-standing problems: * [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input file

Re: Handling user-facing metadata issues on file stream source & sink

2020-05-21 Thread Jungtaek Lim
Worth noting that I got similar questions from the local community as well. These reporters didn't encounter an edge case; they encountered the critical issue in the normal running of a streaming query. On Fri, May 8, 2020 at 4:49 PM Jungtaek Lim wrote: > (bump to expose the discussion to more re

Re: Handling user-facing metadata issues on file stream source & sink

2020-05-08 Thread Jungtaek Lim
(bump to expose the discussion to more readers) On Mon, May 4, 2020 at 5:45 PM Jungtaek Lim wrote: > Hi devs, > > I'm seeing more and more structured streaming end users encountering > metadata issues on the file stream source and sink. These are known issues, > and there are even long-standin

Handling user-facing metadata issues on file stream source & sink

2020-05-04 Thread Jungtaek Lim
Hi devs, I'm seeing more and more structured streaming end users encountering metadata issues on the file stream source and sink. These are known issues, and there are even long-standing JIRA issues reported before, yet end users reported them again on the user@ mailing list in April. * Spark Structure S