Mostly we complain that we don't have enough reporting just yet. But seriously, the long-running programs in Mahout are almost all map-reduce jobs, and there is a fairly good framework for progress reporting in Hadoop. This includes normal logging as well as a counter framework that lets code drive status counters, in parallel, out to a standard web interface showing the status of a job.
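Using a counter from inside a mapper looks roughly like this (a minimal sketch of the idiom only; the mapper class and the counter group/name here are made up, not actual Mahout code):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper showing the Hadoop counter idiom.
    public class ProgressMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // Increment a named counter; Hadoop aggregates these across tasks
        // and shows the totals in the job's web UI.
        context.getCounter("Progress", "RecordsSeen").increment(1);
        // ... actual per-record work would go here ...
      }
    }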
The only non-Hadoop long-running job is the stochastic gradient descent modeling stuff. There, we use simple logging with a list of tab-separated values on a specially marked log line. These can be extracted using tail and grep to provide progress plots. The separation between status reports backs off exponentially, so you only get a logarithmic total number of progress reports. The exponential constants are chosen so that no matter how long your job runs, you get fairly fine-grained feedback relative to the total time so far, and the actual back-off is adjusted so that each reporting epoch comes out on an even multiple of {1,2,5} x 10^p, which makes for a pretty picture (a rough sketch of that rounding is below, after the quoted message).

On Thu, Nov 18, 2010 at 4:22 PM, Phil Steitz <phil.ste...@gmail.com> wrote:

> On 11/18/10 7:17 PM, Ted Dunning wrote:
>
>> I really don't think that a general progress listener framework implies
>> this clutter and for long running algorithms is a very nice thing.
>
> I agree. Do you have specific ideas on how best to look at this? What does
> Mahout do?
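For the {1,2,5} x 10^p rounding mentioned above, the shape of it is roughly this (a sketch only, not the actual Mahout code; the 1.5 growth factor is just an illustration of growing the interval geometrically and then snapping to a "pretty" value):

    // Grow the next reporting point roughly exponentially, then round it up
    // to the nearest 1, 2 or 5 times a power of ten.
    public class PrettyBackoff {
      private static final double[] STEPS = {1, 2, 5, 10};

      /** Round x up to the nearest value of the form {1,2,5} * 10^p. */
      static double roundUp125(double x) {
        double p = Math.floor(Math.log10(x));
        double base = Math.pow(10, p);
        for (double s : STEPS) {
          if (s * base >= x) {
            return s * base;
          }
        }
        return 10 * base;   // not reachable, but keeps the compiler happy
      }

      public static void main(String[] args) {
        double next = 1;
        // Prints the first few reporting epochs: 1, 2, 5, 10, 20, 50, 100, ...
        for (int i = 0; i < 10; i++) {
          System.out.println(next);
          next = roundUp125(next * 1.5);   // grow, then snap to 1-2-5
        }
      }
    }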