Yeah, we run into this all the time with new hires. They will send emails explaining that there is an error in the .write operation, and they go off debugging the write-to-disk code, focusing on that piece, when the real bug is in an earlier, lazily evaluated transformation :)
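A minimal PySpark sketch of the pattern (the UDF, names, and output path are all made up for illustration): the bug lives in a transformation, but the stack trace only shows up at .write, so it reads like a disk problem:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.appName("lazy-error-demo").getOrCreate()

@F.udf(IntegerType())
def hundred_over(x):
    return 100 // x  # raises ZeroDivisionError when x == 0

df = spark.createDataFrame([(1,), (0,), (4,)], ["x"])

# The bug is introduced here, but transformations are lazy -- no error yet.
broken = df.withColumn("y", hundred_over(F.col("x")))

# The ZeroDivisionError only surfaces when an action runs, so the stack
# trace points at .write and looks like a problem writing to disk.
broken.write.mode("overwrite").parquet("/tmp/lazy_error_demo")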
Unrelated, but another frequent cause of confusion is cascading errors, like FetchFailedException: they will be debugging the reducer task, not realizing the failure happened earlier and that the FetchFailedException is a symptom rather than the root cause.

On Tue, May 8, 2018 at 2:52 PM, Reynold Xin <r...@databricks.com> wrote:

> Similar to the thread yesterday about improving ML/DL integration, I'm
> sending another email on what I've learned recently from Spark users. I
> recently talked to some educators who have been teaching Spark in their
> (top-tier) university classes. They are some of the most important users
> for adoption because of the multiplicative effect they have on the
> future generation.
>
> To my surprise, the single biggest ask they have is to enable eager
> execution mode on all operations, for teaching and debuggability:
>
> (1) Most of the students are relatively new to programming, and they
> need multiple iterations to even get the most basic operation right. In
> these cases, in order to trigger an error, they would need to explicitly
> add actions, which is non-intuitive.
>
> (2) If they don't add explicit actions to every operation and there is a
> mistake, the error pops up somewhere later, where an action is
> triggered. This is in a different position from the code that causes the
> problem, making it difficult for students to correlate the two.
>
> I suspect a lot of Spark users in the real world also struggle in
> similar ways to these students. While eager execution is really not
> practical in big data, in learning environments or in development
> against small, sampled datasets it can be pretty helpful.
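For teaching setups in the meantime, one cheap workaround is to force an action after every step so the error surfaces on the line that caused it. A minimal sketch below; the `eager` helper is hypothetical, not a Spark API, and is only sensible for small classroom datasets:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("eager-teaching-demo").getOrCreate()

def eager(df):
    """Evaluate df's plan immediately, then return df unchanged."""
    # .take(1) is a cheap action, but it may only touch one partition, so
    # data-dependent errors elsewhere can still slip through; .count()
    # evaluates everything at full cost.
    df.take(1)
    return df

df = spark.range(10)
step1 = eager(df.selectExpr("id", "id * 2 AS doubled"))
step2 = eager(step1.filter("doubled > 5"))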