Hi,

My limited understanding of Spark tells me that a task is the smallest unit of work, so Spark itself won't give you much below that. I wouldn't expect it to, since an "account" is a business entity, not a Spark one.

What about using mapPartitions* to know the details of each partition and do whatever you want there (log progress to stdout or whatever)? Just a thought; a rough sketch is below.
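For illustration only, here is a minimal sketch of that idea using mapPartitionsWithIndex (one member of the mapPartitions* family). The account ids, the 8-partition split, the Thread.sleep standing in for your ~1 s per-account computation, and the every-100-accounts reporting interval are all assumptions made up for the example, not anything from your job:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionProgress {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("partition-progress"))

    // 20000 hypothetical account ids spread across 8 partitions, as in the question.
    val accounts = sc.parallelize((1 to 20000).map(i => s"account-$i"), 8)

    val results = accounts.mapPartitionsWithIndex { (partitionId, iter) =>
      var processed = 0
      iter.map { id =>
        // Hypothetical stand-in for the ~1 s per-account computation.
        Thread.sleep(1000L)
        processed += 1
        // Goes to the executor's stdout, so it shows up in that task's
        // stdout log in the web UI (or wherever your logger writes).
        if (processed % 100 == 0)
          println(s"partition $partitionId: $processed accounts processed")
        (id, processed)
      }
    }

    results.count()  // any action triggers the computation
    sc.stop()
  }
}
```

The reporting interval and the println are just placeholders; the point is that inside the partition iterator you control what gets recorded, how often, and where it goes.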
Pozdrawiam,
Jacek

--
Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl
Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
Follow me at https://twitter.com/jaceklaskowski
Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski

On Sun, Nov 29, 2015 at 3:12 PM, Yuhao Yang <hhb...@gmail.com> wrote:
> Hi all,
>
> I have a simple processing job for 20000 accounts on 8 partitions. That's
> roughly 2500 accounts on each partition. Each account takes about 1 s to
> complete its computation, so each partition will take about 2500 seconds
> to finish the batch.
>
> My question is how I can get detailed progress on how many accounts have
> been processed in each partition during the computation. An ideal solution
> would let me know periodically (say, every minute) how many accounts have
> been processed, so I can monitor the job and take action to save some time.
> Right now the UI only tells me that a task is running.
>
> I know one solution is to split the data horizontally on the driver and
> submit it to Spark in mini batches, yet I think that would waste some
> cluster resources and create extra complexity for result handling.
>
> Any experience or best practice is welcome. Thanks a lot.
>
> Regards,
> Yuhao

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org