Thanks for bringing up the custom metrics API in the list; it's something that needs to be addressed.
A couple more items worth considering:

1. Possibility of unifying the batch, micro-batch and continuous sources (similar to SPARK-25000). Right now there is significant code duplication, even between the micro-batch and continuous sources. We could attempt a redesign such that a single implementation could potentially work across modes (by implementing the relevant APIs).

2. Better framework support for end-to-end exactly-once in streaming (maybe framework-level support for 2PC).

Thanks,
Arun

On Tue, 30 Oct 2018 at 19:24, Wenchen Fan <cloud0...@gmail.com> wrote:

> Hi all,
>
> I spent some time thinking about the roadmap, and came up with an initial
> list:
> SPARK-25390: data source V2 API refactoring
> SPARK-24252: add catalog support
> SPARK-25531: new write APIs for data source v2
> SPARK-25190: better operator pushdown API
> Streaming rate control API
> Custom metrics API
> Migrate existing data sources
> Move data source v2 and built-in implementations to individual modules.
>
>
> Let's have more discussion over the hangout.
>
> Thanks,
> Wenchen
>
> On Tue, Oct 30, 2018 at 4:32 AM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
>
>> Everyone,
>>
>> There are now 25 guests invited, which is a lot of people to actively
>> participate in a sync like this.
>>
>> For those of you who probably won't actively participate, I've added a
>> live stream. If you don't plan to talk, please use the live stream instead
>> of the meet/hangout so that we don't end up with so many people that we
>> can't actually get the discussion going. Here's a link to the stream:
>>
>> https://stream.meet.google.com/stream/6be59d80-04c7-44dc-9042-4f3b597fc8ba
>>
>> Thanks!
>>
>> rb
>>
>> On Thu, Oct 25, 2018 at 1:09 PM Ryan Blue <rb...@netflix.com> wrote:
>>
>>> Hi everyone,
>>>
>>> There's been some great discussion for DataSourceV2 in the last few
>>> months, but it has been difficult to resolve some of the discussions and I
>>> don't think that we have a very clear roadmap for getting the work done.
>>>
>>> To coordinate better as a community, I'd like to start a regular sync-up
>>> over google hangouts. We use this in the Parquet community to have more
>>> effective community discussions about thorny technical issues and to get
>>> aligned on an overall roadmap. It is really helpful in that community and I
>>> think it would help us get DSv2 done more quickly.
>>>
>>> Here's how it works: people join the hangout, we go around the list to
>>> gather topics, have about an hour-long discussion, and then send a summary
>>> of the discussion to the dev list for anyone that couldn't participate.
>>> That way we can move topics along, but we keep the broader community in the
>>> loop as well for further discussion on the mailing list.
>>>
>>> I'll volunteer to set up the sync and send invites to anyone that wants
>>> to attend. If you're interested, please reply with the email address you'd
>>> like to put on the invite list (if there's a way to do this without
>>> specific invites, let me know). Also for the first sync, please note what
>>> times would work for you so we can try to account for people in different
>>> time zones.
>>>
>>> For the first one, I was thinking some day next week (time TBD by those
>>> interested) and starting off with a general roadmap discussion before
>>> diving into specific technical topics.
>>>
>>> Thanks,
>>>
>>> rb
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>