It's worth noting that the push-based shuffle SPIP currently in progress addresses a substantial blocker in this area. If you remember when we removed the half-finished stateful query support: the lack of that shuffle functionality, and the challenge of implementing it, is basically why it was left half-finished. I can't make a hard commitment, but I do plan to look at how easy it would be to build continuous shuffle support on top of the SPIP once it's in. Continuous mode is going to be a lot more useful if most (all?) queries can run in it.
On Tue, Sep 15, 2020 at 6:37 AM Sean Owen <sro...@gmail.com> wrote:

> I think we certainly can't remove it without deprecation and a few releases. If there were big problems with it that weren't getting fixed, sure maybe, but lack of interest in reviewing minor changes isn't necessarily a bad sign. By the same logic you'd delete graphx long ago.
>
> Anecdotally, yes there are people using it that I know of at least, but I wouldn't know a lot of them. I think the question is, is it causing a problem, like a lot of maintenance? doesn't sound like it.
>
> On Tue, Sep 15, 2020 at 8:19 AM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
> >
> > Probably it would depend on the meaning of "experimental". My understanding of "experimental" is more likely "incubation", which may be graduated finally, or may be retired.
> >
> > To be clear, I'm evaluating the continuous mode as "candidate to retire", unless there are actual use cases in production and at least a couple of community members volunteer to maintain it. As far as I see the activity in a year, there's no interest for the continuous mode in community members. I can refer to at least three PRs which suffered to find reviewers (around 1 year) and closed on inactivity. No improvements/bug fixes except trivials. It doesn't seem to get some traction - few questions in SO, a few posts in google search results which were all posted around the date when continuous mode was introduced. Though I would be convinced if someone could provide meaningful numbers of actual use cases.
> >
> > If the answer really has to be taken between un-experimental or not (which says retirement is not an option), I'd rather vote to leave as experimental, so I just keep forgetting about it. Actually it bothers sometimes even if the change is done in micro-batch side (so that's not a zero cost to maintain), but still better than officially supporting it.
> >
> > On Tue, Sep 15, 2020 at 9:08 PM Sean Owen <sro...@gmail.com> wrote:
> >>
> >> If you're suggesting making it un-Experimental, probably yes, as it is de facto not going to change much I expect. If you're saying remove it, probably not? I don't see that it's anywhere near deprecated, and not sure it's unmaintained - obviously tests etc still have to keep passing.
> >>
> >> On Mon, Sep 14, 2020 at 11:34 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
> >> >
> >> > Hi devs,
> >> >
> >> > It was Spark 2.3 in Feb 2018 which introduced continuous mode in Structured Streaming as "experimental".
> >> >
> >> > Now we are here at 2.5 years after its release - I feel it would be a good time to evaluate the mode, whether the mode has been widely used or not, and the mode has been making progress, as the mode is "experimental".
> >> >
> >> > At least from the surface I don't see any active effort for continuous mode around the community - the last major effort was stateful operation which was incomplete and I removed that. There were some couples of bug reports as well as fixes more than a year ago and almost nothing has been handled. (A trivial bugfix PR has been merged recently but that's all.) The new features introduced to the Structured Streaming (at least observable metrics, SS UI) don't apply to continuous mode, and no one made "support continuous mode" as a hard requirement on passing review in these PRs.
> >> >
> >> > I have no idea how many companies are using the mode in production (please add the voice if someone has statistics about this) but I don't see any bug reports recently, and see only a few questions in SO, which makes me think about cost on maintenance.
> >> >
> >> > I know there's a mood to avoid discontinue support as possible, but it sounds weird to keep something as "unmaintained", especially it's still "experimental" and main authors are no more active enough to promise maintenance/improvement on the module. Thoughts?
> >> >
> >> > Thanks,
> >> > Jungtaek Lim (HeartSaVioR)
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org