On Monday 04 May 2009 04:01:23 am Hendrik van Rooyen wrote:

> This will form a virtual (or real if you have different machines)
> systolic array with producers feeding consumers that feed
> the summary process, all running concurrently.
Nah, I can't do that. The summary process is expensive, but not nearly as expensive as the consuming (10 minutes vs. a few hours), and it can't be started anyway before the consumers are done.

> You only need to keep the output of the consumers in files if
> you need access to it later for some reason. In your case it sounds
> as if you are only interested in the output of the summary.

Or if the summarizing process requires that it be stored in files. Or if the consumers naturally store the data in files. The consumers "produce" several gigabytes of data: not enough to make the job intractable, but enough that I don't want to load it all into RAM or transmit it back.

In case you are wondering what the job is: I'm indexing a lot of documents with Xapian. The producer reads the [compressed] documents from the hard disk, the consumers process them and index them into their own Xapian databases. When they are finished, I merge the databases (the summarizing) and delete the partial DBs. I don't need the DBs to be in memory at any time, and Xapian works with files anyway. Even if I were to use different machines (not worth it for a process that won't run very frequently, except at development time), it would still be cheaper to scp the files.

Now, if I only had a third core available to consume a bit faster...

Regards,

-- 
Luis Zarrabeitia (aka Kyrie)
Fac. de Matemática y Computación, UH.
http://profesores.matcom.uh.cu/~kyrie
-- 
http://mail.python.org/mailman/listinfo/python-list
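[Editor's note: the fan-out/merge pipeline described in the post (producer splits the work, consumers each write their own partial store, a final summary step merges the partials and deletes them) could be sketched roughly as below. This is a minimal illustration, not the poster's code: the `index_batch` and `merge_partials` functions are hypothetical stand-ins for the real Xapian indexing and database-merging calls, and partial stores are plain files here.]

```python
# Sketch of the split / index-in-parallel / merge pipeline, assuming a
# picklable batch of documents. The actual Xapian calls are NOT shown;
# index_batch/merge_partials are placeholders for them.
import multiprocessing as mp
import os
import tempfile

def index_batch(args):
    """Consumer: 'index' one batch of documents into its own partial store.

    Stand-in for per-process Xapian indexing; each partial DB is just a
    text file of processed lines in this sketch.
    """
    batch, out_path = args
    with open(out_path, "w") as f:
        for doc in batch:
            f.write(doc.upper() + "\n")  # stand-in for real indexing work
    return out_path

def merge_partials(paths, merged_path):
    """Summary step: merge the partial stores into one, then delete them."""
    with open(merged_path, "w") as out:
        for p in paths:
            with open(p) as f:
                out.write(f.read())
            os.remove(p)  # drop the partial DB once it has been merged
    return merged_path

def run_pipeline(documents, nworkers=2):
    tmpdir = tempfile.mkdtemp()
    # Producer side: split the document stream into one batch per worker.
    batches = [documents[i::nworkers] for i in range(nworkers)]
    jobs = [(b, os.path.join(tmpdir, "part%d" % i))
            for i, b in enumerate(batches)]
    pool = mp.Pool(nworkers)
    try:
        # Consumers run concurrently; pool.map returns only when every
        # consumer is done, which is exactly the barrier the merge needs.
        partials = pool.map(index_batch, jobs)
    finally:
        pool.close()
        pool.join()
    return merge_partials(partials, os.path.join(tmpdir, "merged"))

if __name__ == "__main__":
    merged = run_pipeline(["doc one", "doc two", "doc three", "doc four"])
    print(open(merged).read())
```

The merge step stays single-process and sequential, matching the post: it cannot start before the consumers finish, and it is cheap (minutes) relative to the consuming (hours), so parallelizing it buys little.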