Presuming this is reproducible, you should enter this state, then take thread dumps (via ctrl-break, ctrl-\, jconsole, jstack, jvisualvm, etc).
If you want more info on what's happening with gc, use -verbose:gc and/ or -XX:+PrintGCDetails If you want another level of analysis, use a performance profiler like YourKit or a heap profiler. On Apr 14, 9:57 pm, Ken Wesson <kwess...@gmail.com> wrote: > I've been using Clojure to automate some file management tasks over > here; in particular, I whipped together about 100 lines of code to > build and maintain a hash database of certain files and thereby > efficiently discover duplications among these files. (Basically, it > uses a subdirectory of the filesystem as a persisted hashmap, with the > first few bytes of the hash used to make a tree of subdirectories, the > leaves of which are hash buckets that collect files; > FileInputStream/MessageDigest is used to compute file hashes, and > byte-by-byte comparisons detect identical files from among those with > identical hashes.) > > I noticed something odd after spending a while working in other file > management tools, then switching to my REPL and hammering off an > (add-files! "/some/directory"). I flipped to NB's *out* tab to monitor > its progress and ... what progress? It had printed nothing, and as I > watched, it continued to print nothing. I switched back to the REPL > and it looked like it had submitted the sexp to the evaluator. But no > output. Then I checked the process list and saw the java task > consuming no CPU (unsusprising since I'd given it an I/O bound task) > but rapidly growing in memory use. It bloated up by several tens of > megs before I finally saw "Entering /some/directory" in *out*. > > The funny thing is, the code is basically this: > > (defn add-files! [dir] > (sysprint "Entering " dir) > (doseq [f (file-seq dir)] > ...)) > > and sysprint is basically this: > > (defn sysprint [& objs] > (.println System/out (apply str objs))) > > (so the output goes to *out* instead of blitzkrieging the REPL tab's > scrollback with thousands of lines of output, which then makes looking > things up from earlier in the REPL session very annoying). > > Since the thing froze for a significant amount of time allocating a > significant amount of memory before the (.println System.out ...) > succeeded, I conclude that the rest of add-files! is irrelevant here > and it was something in (apply str "Entering " "/some/directory") or > in (.println System/out ...) that spent a long time allocating tens of > megabytes of heap. I'm mystified as to why. The add-files! function > doesn't usually do this, and it recurses for subdirectories, so the > "Entering ..." message is printed several times in a typical use. > After the one long pause, this particular run generated several more > such messages without any noticeable pauses. > > What the heck was it? A GC pause? Why would that grow the heap? The > heap should be near max size already and perhaps shrink somewhat in a > GC pause. On the other hand, a long pause accompanied by rapid heap > growth suggests the construction of a large data structure, so large > it's not only big in memory but slow to build. But there was no > significant CPU use by the Java process during the pause, which > suggests that it was performing *blocking I/O*. That suggests it was > loading something big from disk, but a) it hadn't even reached the > doseq yet to start reading and hashing files, b) the files never > reside completely in live heap at one time (only in 4KB chunks), and > c) the files themselves were smallish, few exceeding 1MB and none even > approaching 10MB. > > The nearest thing I can figure out is that the process was going ahead > and hashing files, but the *out* display was lagging way behind. So, > the sysprint ran, then the doseq was well underway, and the heap grew > for a while as 4KB chunks were allocated ... > > Except a single (byte-array 4096) is being recycled in the hashing > loop. It's a mutable Java array and DigestInputStream.read() is > overwriting its contents for each new chunk. And unless the attached > MessageDigest is building huge temporary data structures, which seems > unlikely, that's basically it. The heap should have grown by a few > megs, total, from String, String[], and File object churn, and > creating one more 4KB array for each new entire file (not each 4KB of > data), over the entire operation. If there were 20 subdirectories and > 1000 files, a reasonable estimate, we're looking at: > > 20 String[] from File.list() on the subdirectories, average length 51 > (1020/20), average size 212 bytes or so, for 4240 bytes. > > 1020 String for file paths, average length say around 30, for 30600 bytes. > > 1000 4KB byte[] arrays for just under 4MB. > > 1000 byte[]s of length 20 to hold the 160-bit hashes generated by > MessageDigests, for around 28000 bytes. > > 1000 MessageDigest objects. If we assume that they contain 20-element > byte[]s but return defensive copies, so those aren't the 28000 bytes > just counted, and (generously!) that they contain another 1KB of > temporary state at peak size, that's another meg. > > 1020 seq objects over arrays and a few layers of filter, remove, map, > etc. over these, say 10,000 seq objects total, backed by arrays -- > generously call that another whole megabyte. > > Assorted small temporary strings and such per object -- assume, > generously, yet another full meg total. > > = about 7 megabytes. > > With a 64 meg heap and a much smaller initial process size, I'd have > expected it to just grow by 7 megs (or less, probably quite a lot > less) and not trigger ANY garbage collection. > > Any way I figure it, the code I wrote had no business causing the JVM > to allocate even ten megs, let alone over thirty, let alone > accompanied by a lengthy pause, whether for GC or disk I/O. Nor is > there an obvious reason for the System/out println's return to precede > the actual appearance of the text in the *out* tab by any significant > span of time. In particular, the Java task wasn't using detectable > levels of CPU (< 1% the whole time) so it couldn't have been hogging > the CPU and making Netbeans slow to update its display; and it wasn't > remote, so network latency is not a factor. NB was talking to it via > loopback interface and NB's own maintaining of the display is > essentially a CPU bound task that would not be impacted by contention > for access to the network or the disk controller (so, the Java task > sending the println to *out* and then heavily hogging the disk > controller during the file hashing should not have blocked NB from > receiving the text in the pipe from *out* and updating its display). > > In fact the only thing that would seem like it could explain it is if > the operating system paused one or both processes to page a big chunk > of something in from disk itself, but with the memory usage by the > system under 60% at the time nothing should have been paged out to > begin with. And of course any paging of the NB process back in should > have concluded before I was able to issue any command at the REPL in > it, whereas the Java process's size was (before it began growing) only > a few megs, which should page in virtually instantaneously as judged > at human perception speeds. And of course that rapid process size > growth would remain unexplained. > > Something's hinky. Either I'm not understanding something in how > System.out.println works, or in the Clojure code that called it, or in > how NB and an attached Clojure runtime communicate; or else something > even stranger is going on, like the operating system misreporting > process sizes or something. (If the Java process size was actually > much larger than it was claiming, then perhaps it both needed paging > and needed a lot of time to page in, for instance. Though the free > memory % suggested as much as a whole GB of unallocated RAM, meaning > that process would have to have been a LOT larger to force the OS to > page -- close to the -Xmx 1G I gave it, in fact, and then the pause > could have been a GC pause after all...then again, if the OS-reported > process sizes are suspect, then so is the OS-reported free memory %! > Damn, once you distrust any of your diagnostic tools, it opens a > Pandora's box that leads one to doubt everything and be sure of > nothing, unless you successfully predict the output and it was > statistically impossible to get by accident. The files all have the > same SHA-1 as reported by multiple tools as well as the MessageDigest > code, for instance, and that's highly unlikely to happen unless the > various SHA-1 algorithm implementations in these tools are in fact all > working correctly.) -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en