Hi

Matt Raible (Spring Live, AppFuse, etc.) mentioned on his blog that while testing AppFuse with Ant he experienced an OutOfMemoryError.

After we bounced a couple of emails back and forth, he thinks it may occur in the Copy task, as his target uses Copy extensively and he has a lot of files to copy. Looking at Copy, there are two parts: 1) build up a collection of files to copy, and 2) copy them. The source mentions that this is done for performance reasons, since a file-by-file copy would take too long, so the files are batched and copied later (that's my reading of the comments, anyway).

I have a few questions/suggestions after looking at the code.

1) When we construct a String with +, e.g. ("Copying " + fileCopyMap.size() + " file" + (fileCopyMap.size() == 1 ? "" : "s") + " to " + destDir.getAbsolutePath()), we create a lot of temporary objects (sized according to the length of the path, plus an overhead). When will these be released for gc? I've always thought that these temporary strings will be released after the method has exited, when there are no more references to them.
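
To make the temporaries concrete, here is a minimal sketch of what a + chain like this amounts to. It assumes the behaviour of older javac versions, which turn the whole chain into appends on a single buffer (StringBuffer before Java 5, StringBuilder afterwards; newer compilers use a different strategy). The class and method names are hypothetical and not taken from the Copy source.

```java
import java.io.File;

final class ConcatSketch {
    // The message as written in the mail, built with '+'.
    static String viaPlus(int count, File destDir) {
        return "Copying " + count + " file" + (count == 1 ? "" : "s")
                + " to " + destDir.getAbsolutePath();
    }

    // Hand-written expansion: one builder plus one result String per message.
    static String viaBuilder(int count, File destDir) {
        StringBuilder sb = new StringBuilder();
        sb.append("Copying ").append(count).append(" file");
        sb.append(count == 1 ? "" : "s");
        sb.append(" to ").append(destDir.getAbsolutePath());
        return sb.toString();
    }

    public static void main(String[] args) {
        File dest = new File("build"); // hypothetical destination directory
        System.out.println(viaPlus(1, dest));
        System.out.println(viaBuilder(3, dest));
    }
}
```

Under that assumption, each message costs roughly one buffer, its backing char array, and one result String, rather than one object per + operator.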

Now consider this code (from doFileOperations):

           Enumeration e = fileCopyMap.keys();
           while (e.hasMoreElements()) {
               String fromFile = (String) e.nextElement();
               String[] toFiles = (String[]) fileCopyMap.get(fromFile);

               for (int i = 0; i < toFiles.length; i++) {
                   String toFile = toFiles[i];

                   if (fromFile.equals(toFile)) {
                       log("Skipping self-copy of " + fromFile, verbosity);
                       continue;
                   }
                   try {
                       log("Copying " + fromFile + " to " + toFile, verbosity); // << this is creating a lot of temporary objects for each file copied

We create a lot of cruft for logging purposes, but if the verbosity is set too low, we create this cruft in memory without it ever being used: it's never written to the log. (I'm not sure what the correct terminology is here. When we set the verbosity to DEBUG we get a load of output, and when we set it to QUIET we get none, so I mean a QUIET-ish level of logging.) For a large number of files, this cruft will gradually eat up memory without going out of scope (as the method won't exit until all the files are processed), and it won't be eligible for gc.
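
A minimal sketch of the point about wasted work, under stated assumptions: the logger, the level constants and the counters below are all hypothetical stand-ins, not Ant's own classes. It only shows that the message argument is built before log is called, whatever the threshold is; it says nothing about when the gc reclaims those strings.

```java
final class EagerLogSketch {
    // Hypothetical levels: a lower number means a more important message.
    static final int MSG_INFO = 2;
    static final int MSG_VERBOSE = 3;

    static int built;   // messages constructed
    static int written; // messages actually emitted

    // Builds the per-file message and counts each construction.
    static String describe(String fromFile, String toFile) {
        built++;
        return "Copying " + fromFile + " to " + toFile;
    }

    // Stand-in for a log method: drops anything above the threshold.
    static void log(String msg, int level, int threshold) {
        if (level <= threshold) {
            written++;
            System.out.println(msg);
        }
    }

    // Logs one VERBOSE message per file and returns {built, written}.
    static int[] run(int files, int threshold) {
        built = 0;
        written = 0;
        for (int i = 0; i < files; i++) {
            log(describe("src/f" + i, "dest/f" + i), MSG_VERBOSE, threshold);
        }
        return new int[] {built, written};
    }

    public static void main(String[] args) {
        int[] r = run(1000, MSG_INFO);
        System.out.println(r[0] + " messages built, " + r[1] + " written");
    }
}
```

With the threshold at the hypothetical MSG_INFO, all 1000 messages are built and none are written.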

Forgive me if my understanding of the way this is going to interact with the gc system is incorrect (and therefore this entire post is incorrect), but I think that this will use up memory unnecessarily, and may cause problems related to the OutOfMemoryError mentioned previously.

So what are the possible solutions (if indeed this is a problem)?

1) StringBuffer - the old favourite for Java programmers. Use StringBuffer to get a mutable string, therefore using less memory, as fewer temp objects get allocated on the heap. (The majority of the temp objects will be in Eden, right? But when Eden is full, a minor gc starts reclaiming unreferenced objects and pushing still-referenced ones into the survivor spaces. I think that with a long enough running loop inside this single method, this code will eventually fill up the whole young generation, and the amount of memory that is not really referenced through objects, but is still in scope of the running method, will cause this OOM error.)

2) MessageFormat.format - not sure if this will save memory or not. The API docs only go as far back as 1.3.1, so if this wasn't present in Java 1.2, we can't use it and retain compilability on JDK 1.2. The work being done inside the format method (source) looks like a lot, so my guess is that it will actually cost more memory to implement logging using MessageFormat.

3) static final strings, plus more complex logic to only build the message if the verbosity is set above a threshold. This will save memory by not creating the string for the log unless the user has selected to run in verbose mode, but it complicates the code, and it kinda subverts the verbosity flag of the log method.
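
A minimal sketch combining options 1 and 3, under stated assumptions: the class, the threshold field, isEnabled and the level constants are hypothetical, and how a task would actually learn the current threshold is left open. The message is only built when it will be written, and a single StringBuffer is reused across files.

```java
final class GuardedLogSketch {
    // Hypothetical levels: a lower number means a more important message.
    static final int MSG_INFO = 2;
    static final int MSG_VERBOSE = 3;

    private static final String COPYING = "Copying ";
    private static final String TO = " to ";

    private final int threshold;                              // assumed to be known to the task
    private final StringBuffer scratch = new StringBuffer();  // reused for every message (option 1)
    int built;                                                // messages actually constructed

    GuardedLogSketch(int threshold) {
        this.threshold = threshold;
    }

    boolean isEnabled(int level) {
        return level <= threshold;
    }

    void logCopy(String fromFile, String toFile, int level) {
        if (!isEnabled(level)) {
            return; // option 3: nothing is built for a message that would be dropped
        }
        scratch.setLength(0); // clear the buffer without allocating a new one
        scratch.append(COPYING).append(fromFile).append(TO).append(toFile);
        built++;
        System.out.println(scratch.toString());
    }

    public static void main(String[] args) {
        GuardedLogSketch quiet = new GuardedLogSketch(MSG_INFO);
        for (int i = 0; i < 1000; i++) {
            quiet.logCopy("src/f" + i, "dest/f" + i, MSG_VERBOSE);
        }
        System.out.println(quiet.built + " messages built");
    }
}
```

The cost is the one raised above: the caller has to duplicate the level check that the log method already performs.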

I could spend some time hacking up a revised Copy task with some tweaks to try to reduce the memory consumption for copies involving a large number of files, but I'd rather get input from the rest of you about the possible consequences before starting anything. There are other places I'd like to change things (slightly) within the Copy task, but this log + String concatenation looks like a space-inefficient implementation, although it only manifests itself for large filesets.

Thanks
Kev


