Adrian Crum wrote:
> Adam Heath wrote:
>> Adrian Crum wrote:
>>> The java.util.concurrent package rocks! I used it a few weeks ago to
>>> multi-thread the demo data loading code. I got it down from 3 minutes to
>>> 1.5 minutes.
>>
>> What? You made the ofbiz demo data loading code multi-threaded?
>> Seriously? If so, that rocks!
>
> I used a thread pool to create tables and non-fk indexes. By fine tuning
> the thread count, I was able to take the single-threaded CPU usage from
> 12-20% up to 50-90%. I used a FIFO queue for loading data - the main
> thread parses the XML files and places DOM Elements in the queue, and
> another thread takes the elements from the queue and stores them in the
> database.
>
> Some day I'll clean up the code and provide a patch. It only benefits
> multi-CPU computers.
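For reference, the queue hand-off described above could look roughly like the sketch below. This is only my reading of the description, not the actual code: the class name, the DONE sentinel, and the use of plain strings in place of DOM Elements and a list in place of the database are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: one producer (the calling thread) and one consumer thread
// connected by a bounded FIFO queue.
final class QueueHandOffSketch {

    // Unique sentinel object that tells the consumer no more items are coming.
    // It is compared by identity, so it cannot collide with real data.
    private static final String DONE = new String("DONE");

    static List<String> run(List<String> parsedElements) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        List<String> stored = new ArrayList<>(); // stand-in for the database

        // Consumer: takes items in FIFO order and "stores" them.
        Thread writer = new Thread(() -> {
            try {
                for (String item = queue.take(); item != DONE; item = queue.take()) {
                    stored.add(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        // Producer: the calling thread stands in for the XML parser.
        try {
            for (String element : parsedElements) {
                queue.put(element); // blocks when the queue is full
            }
            queue.put(DONE);
            writer.join(); // join() makes the consumer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        }
        return stored;
    }
}
```

The bounded queue keeps the parser from running arbitrarily far ahead of the writer. With a single consumer, the items are stored in the order they were parsed.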
I would do this in multiple stages. The first stage would be a generic XML parsing service: each XML file is handed off to an ExecutorService, and the Callable.call() method parses the file and returns a Document. The second stage would use the same ExecutorService to convert each Document to a List<GenericValue>. As an optimization, the first stage could automatically submit its Document back to the same executor for conversion. The third stage would then import the files in parallel, but not the individual values within a file.

You'd still have to handle dependency issues, similar to the looping that is currently done. However, the correct fix for those kinds of problems would be to reorder the data in the files.
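A minimal sketch of the staging, under several assumptions. It uses CompletableFuture rather than raw Callables so that each stage hands its result straight back to the same executor. XML is passed in as strings instead of files, Map<String, String> stands in for GenericValue, and a thread-safe collection stands in for the database. The class and method names are hypothetical, and dependency ordering between files is not handled at all.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch of a three-stage load sharing one executor.
final class StagedLoaderSketch {

    // Stage 1: parse one XML source into a Document.
    // DocumentBuilder is not thread-safe, so every task creates its own.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Stage 2: turn each child element of the root into one row
    // (a Map here, standing in for GenericValue).
    static List<Map<String, String>> toRows(Document doc) {
        List<Map<String, String>> rows = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments
            }
            Map<String, String> row = new HashMap<>();
            row.put("entity", node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                row.put(attrs.item(j).getNodeName(), attrs.item(j).getNodeValue());
            }
            rows.add(row);
        }
        return rows;
    }

    // Runs all stages for every file and returns the number of rows stored.
    // 'store' must be safe for concurrent add() calls.
    static int load(List<String> xmlFiles, Collection<Map<String, String>> store, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Integer>> perFile = new ArrayList<>();
            for (String xml : xmlFiles) {
                perFile.add(CompletableFuture
                        .supplyAsync(() -> parse(xml), pool)              // stage 1
                        .thenApplyAsync(StagedLoaderSketch::toRows, pool) // stage 2
                        .thenApplyAsync(rows -> {                         // stage 3
                            // Files run in parallel; rows within a file are stored in order.
                            for (Map<String, String> row : rows) {
                                store.add(row);
                            }
                            return rows.size();
                        }, pool));
            }
            int total = 0;
            for (CompletableFuture<Integer> f : perFile) {
                total += f.join(); // rethrows any stage failure
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each file flows through the stages on its own, there is no barrier between stages: one file can already be importing while another is still being parsed. If the import stage needs to wait for all files to be converted first (for example to sort out dependencies), the third step would have to be submitted separately after joining on the second.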
