I'm currently working with MemVerge on the Splash project (one implementation of remote shuffle storage) and have followed this ticket for a while. I would like to be the shepherd if no one else volunteers.
Best regards,
Saisai

Matt Cheah <mch...@palantir.com> wrote on Thu, Jun 6, 2019 at 8:33 AM:

> Hi everyone,
>
> I wanted to pick this back up again. The discussion has quieted down both on this thread and on the document.
>
> We made a few revisions to the document to hopefully make it easier to read and to clarify our criteria for success in the project. Some of the APIs have also been adjusted based on further discussion and things we've learned.
>
> I was hoping to discuss what our next steps could be here. Specifically:
>
> 1. Would any PMC member be willing to become the shepherd for this SPIP?
> 2. Is there any more feedback regarding this proposal?
> 3. What would we need to do to take this to a voting phase and to begin proposing our work against upstream Spark?
>
> Thanks,
>
> -Matt Cheah
>
> *From:* "Yifei Huang (PD)" <yif...@palantir.com>
> *Date:* Monday, May 13, 2019 at 1:04 PM
> *To:* Mridul Muralidharan <mri...@gmail.com>
> *Cc:* Bo Yang <b...@uber.com>, Ilan Filonenko <i...@cornell.edu>, Imran Rashid <iras...@cloudera.com>, Justin Uang <ju...@palantir.com>, Liang Tang <lat...@linkedin.com>, Marcelo Vanzin <van...@cloudera.com>, Matei Zaharia <matei.zaha...@gmail.com>, Matt Cheah <mch...@palantir.com>, Min Shen <ms...@linkedin.com>, Reynold Xin <r...@databricks.com>, Ryan Blue <rb...@netflix.com>, Vinoo Ganesh <vgan...@palantir.com>, Will Manning <wmann...@palantir.com>, "b...@fb.com" <b...@fb.com>, "dev@spark.apache.org" <dev@spark.apache.org>, "fel...@uber.com" <fel...@uber.com>, "f...@linkedin.com" <f...@linkedin.com>, "tgraves...@gmail.com" <tgraves...@gmail.com>, "yez...@linkedin.com" <yez...@linkedin.com>, "yue...@memverge.com" <yue...@memverge.com>
> *Subject:* Re: [DISCUSS][SPARK-25299] SPIP: Shuffle storage API
>
> Hi Mridul - thanks for taking the time to give us feedback! Thoughts on the points that you mentioned:
>
> The API is meant to work with the existing SortShuffleManager algorithm.
> There aren't strict requirements on how other ShuffleManager implementations must behave, so it seems impractical to design an API that could also satisfy those unknown requirements. However, we do believe that the API is rather generic, using OutputStreams for writes and InputStreams for reads, and indexing the data by a shuffleId-mapId-reduceId combination, so if other shuffle algorithms treat the data in the same chunks and want an interface for storage, they can also use this API from within their implementations.
>
> About speculative execution: we originally assumed that each shuffle task is deterministic, meaning that even if a later mapper overwrote a previously committed mapper's output, the contents would be the same. Having searched some tickets and read https://github.com/apache/spark/pull/22112/files more carefully, I think there are problems with our original assumption if the writer writes all attempts of a task to the same location. For example, suppose a writer implementation writes each partition to the remote host as a sequence of chunks. A reducer might then read data half written by the original task and half written by the running speculative task, which is not the correct contents if the mapper output is unordered. Writes by a single mapper might therefore have to be transactional, which is not clear from the API and seems rather complex to reason about, so we shouldn't expect this from implementers.
>
> However, this doesn't affect the fundamentals of the API, since we only need to add an attemptId to the storage data index (which can be stored within the MapStatus) to solve the problem of concurrent writes. This would also make it clearer that the writer should use the attempt ID as an index to ensure that writes from speculative tasks don't interfere with one another (we can add that to the API docs as well).
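To make the attemptId indexing described above concrete, here is a minimal sketch; the class name `ShuffleBlockKey` and the exact field layout are hypothetical illustrations, not the SPIP's actual API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key for a map task's shuffle output. Adding attemptId to the
// shuffleId-mapId-reduceId index means speculative attempts never clobber
// each other; the MapStatus can record which attempt reducers should fetch.
final class ShuffleBlockKey {
    final int shuffleId;
    final int mapId;
    final int reduceId;
    final long attemptId;

    ShuffleBlockKey(int shuffleId, int mapId, int reduceId, long attemptId) {
        this.shuffleId = shuffleId;
        this.mapId = mapId;
        this.reduceId = reduceId;
        this.attemptId = attemptId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ShuffleBlockKey)) return false;
        ShuffleBlockKey k = (ShuffleBlockKey) o;
        return shuffleId == k.shuffleId && mapId == k.mapId
                && reduceId == k.reduceId && attemptId == k.attemptId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(shuffleId, mapId, reduceId, attemptId);
    }
}

public class AttemptIndexDemo {
    // Writes from an original attempt (0) and a speculative attempt (1) of
    // the same map task land under distinct keys instead of overwriting
    // each other mid-write.
    static Map<ShuffleBlockKey, String> buildStore() {
        Map<ShuffleBlockKey, String> store = new HashMap<>();
        store.put(new ShuffleBlockKey(0, 7, 3, 0L), "bytes-from-attempt-0");
        store.put(new ShuffleBlockKey(0, 7, 3, 1L), "bytes-from-attempt-1");
        return store;
    }

    public static void main(String[] args) {
        Map<ShuffleBlockKey, String> store = buildStore();
        System.out.println(store.size()); // 2: both attempts' outputs coexist
    }
}
```

Without the attemptId field, the second `put` above would have replaced the first entry, which is exactly the interleaved-read hazard described in the thread.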
> *From:* Mridul Muralidharan <mri...@gmail.com>
> *Date:* Wednesday, May 8, 2019 at 8:18 PM
> *To:* "Yifei Huang (PD)" <yif...@palantir.com>
> *Cc:* Bo Yang <b...@uber.com>, Ilan Filonenko <i...@cornell.edu>, Imran Rashid <iras...@cloudera.com>, Justin Uang <ju...@palantir.com>, Liang Tang <lat...@linkedin.com>, Marcelo Vanzin <van...@cloudera.com>, Matei Zaharia <matei.zaha...@gmail.com>, Matt Cheah <mch...@palantir.com>, Min Shen <ms...@linkedin.com>, Reynold Xin <r...@databricks.com>, Ryan Blue <rb...@netflix.com>, Vinoo Ganesh <vgan...@palantir.com>, Will Manning <wmann...@palantir.com>, "b...@fb.com" <b...@fb.com>, "dev@spark.apache.org" <dev@spark.apache.org>, "fel...@uber.com" <fel...@uber.com>, "f...@linkedin.com" <f...@linkedin.com>, "tgraves...@gmail.com" <tgraves...@gmail.com>, "yez...@linkedin.com" <yez...@linkedin.com>, "yue...@memverge.com" <yue...@memverge.com>
> *Subject:* Re: [DISCUSS][SPARK-25299] SPIP: Shuffle storage API
>
> Unfortunately I do not have the bandwidth to do a detailed review, but a few things come to mind after a quick read:
>
> - While it might be tactically beneficial to align with the existing implementation, a clean design which does not tie into the existing shuffle implementation would be preferable (if it can be done without over-engineering). The shuffle implementation can change, and there are custom implementations and experiments which differ quite a bit from what comes with Apache Spark.
>
> - Please keep speculative execution in mind while designing the interfaces: in Spark, implicitly due to task scheduler logic, you won't have conflicts at an executor for the (shuffleId, mapId) and (shuffleId, mapId, reducerId) tuples. When you externalize storage, there can be conflicts: passing a way to distinguish different tasks for the same partition would be necessary for nontrivial implementations.
> This would be a welcome and much needed enhancement to Spark - looking forward to its progress!
>
> Regards,
> Mridul
>
> On Wed, May 8, 2019 at 11:24 AM Yifei Huang (PD) <yif...@palantir.com> wrote:
>
> Hi everyone,
>
> For the past several months, we have been working on an API for pluggable storage of shuffle data. In this SPIP, we describe the proposed API, its implications, and how it fits into other work being done in the Spark shuffle space. If you're interested in Spark shuffle, and especially if you have done some work in this area already, please take a look at the SPIP and give us your thoughts and feedback.
>
> Jira Ticket: https://issues.apache.org/jira/browse/SPARK-25299
> SPIP: https://docs.google.com/document/d/1d6egnL6WHOwWZe8MWv3m8n4PToNacdx7n_0iMSWwhCQ/edit
>
> Thank you!
>
> Yifei Huang and Matt Cheah
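As a rough illustration of the streams-based, tuple-indexed storage discussed in this thread: a pluggable backend might expose little more than an `OutputStream` per write and an `InputStream` per read. The interface and class names below are hypothetical sketches under that assumption, not the SPIP's actual API:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical plugin surface: writes and reads are plain streams, indexed
// by (shuffleId, mapId, reduceId) plus the attemptId discussed in the
// thread so speculative attempts cannot interfere with each other.
interface ShuffleStorage {
    OutputStream openForWrite(int shuffleId, int mapId, int reduceId, long attemptId);
    InputStream openForRead(int shuffleId, int mapId, int reduceId, long attemptId) throws IOException;
}

// Toy in-memory backend standing in for a remote store (DFS, object store...).
class InMemoryShuffleStorage implements ShuffleStorage {
    private final Map<String, ByteArrayOutputStream> blocks = new ConcurrentHashMap<>();

    private static String key(int shuffleId, int mapId, int reduceId, long attemptId) {
        return shuffleId + "_" + mapId + "_" + reduceId + "_" + attemptId;
    }

    @Override
    public OutputStream openForWrite(int shuffleId, int mapId, int reduceId, long attemptId) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        blocks.put(key(shuffleId, mapId, reduceId, attemptId), out);
        return out;
    }

    @Override
    public InputStream openForRead(int shuffleId, int mapId, int reduceId, long attemptId) throws IOException {
        ByteArrayOutputStream out = blocks.get(key(shuffleId, mapId, reduceId, attemptId));
        if (out == null) throw new IOException("missing shuffle block");
        return new ByteArrayInputStream(out.toByteArray());
    }
}

public class ShuffleStorageDemo {
    public static void main(String[] args) throws IOException {
        ShuffleStorage storage = new InMemoryShuffleStorage();
        // A mapper (attempt 0) writes one reduce partition's bytes...
        try (OutputStream out = storage.openForWrite(0, 1, 2, 0L)) {
            out.write("partition-bytes".getBytes());
        }
        // ...and a reducer fetches exactly that attempt's output.
        byte[] buf = storage.openForRead(0, 1, 2, 0L).readAllBytes();
        System.out.println(new String(buf)); // prints partition-bytes
    }
}
```

Because each block is addressed by the full four-part tuple, an implementation like this stays agnostic to how the ShuffleManager produced the bytes, which is the genericity argument made earlier in the thread.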