Similar to the thread yesterday about improving ML/DL integration, I'm
sending another email on what I've learned recently from Spark users. I
recently talked to some educators that have been teaching Spark in their
(top-tier) university classes. They are some of the most important users
for adoption because of the multiplicative effect they have on the future
generation.

To my surprise, their single biggest ask is an eager execution mode for
all operations, for teaching and debuggability:

(1) Most of the students are relatively new to programming, and they need
multiple iterations to get even the most basic operation right. In these
cases, to surface an error they would need to explicitly add actions
(e.g. count or collect), which is non-intuitive.

(2) If they don't add explicit actions after every operation and there is
a mistake, the error pops up later, wherever an action is eventually
triggered. That is far from the code that caused the problem, and it is
difficult for students to correlate the two.
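The failure mode in (2) can be mimicked in plain Python with a generator; this is a hedged sketch of the lazy-evaluation behavior, not Spark code. The bug lives in the transformation, but the exception is raised only when the result is demanded:

```python
def transform(rows):
    # "Transformation": lazy -- building the generator runs nothing,
    # so the bug on this line stays hidden for now.
    return (100 // r for r in rows)  # ZeroDivisionError when r == 0

data = [5, 2, 0, 4]
pipeline = transform(data)  # buggy input, yet no error is raised here

# ... many lines later, an "action" finally executes the pipeline ...
try:
    result = list(pipeline)  # the error pops up here, far from its cause
except ZeroDivisionError as e:
    print("action failed:", e)
```

With eager execution, `transform(data)` itself would fail, pointing the student directly at the offending line instead of at a distant action.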

I suspect a lot of Spark users in the real world struggle in similar ways
to these students. While eager execution is impractical at big-data
scale, it can be quite helpful in learning environments or in development
against small, sampled datasets.
