Presuming this is reproducible, you should enter this state, then take
thread dumps (via ctrl-break, ctrl-\, jconsole, jstack, jvisualvm,
etc).

If you want more info on what's happening with gc, use -verbose:gc and/
or -XX:+PrintGCDetails

If you want another level of analysis, use a performance profiler like
YourKit or a heap profiler.


On Apr 14, 9:57 pm, Ken Wesson <kwess...@gmail.com> wrote:
> I've been using Clojure to automate some file management tasks over
> here; in particular, I whipped together about 100 lines of code to
> build and maintain a hash database of certain files and thereby
> efficiently discover duplications among these files. (Basically, it
> uses a subdirectory of the filesystem as a persisted hashmap, with the
> first few bytes of the hash used to make a tree of subdirectories, the
> leaves of which are hash buckets that collect files;
> FileInputStream/MessageDigest is used to compute file hashes, and
> byte-by-byte comparisons detect identical files from among those with
> identical hashes.)
>
> I noticed something odd after spending a while working in other file
> management tools, then switching to my REPL and hammering off an
> (add-files! "/some/directory"). I flipped to NB's *out* tab to monitor
> its progress and ... what progress? It had printed nothing, and as I
> watched, it continued to print nothing. I switched back to the REPL
> and it looked like it had submitted the sexp to the evaluator. But no
> output. Then I checked the process list and saw the java task
> consuming no CPU (unsusprising since I'd given it an I/O bound task)
> but rapidly growing in memory use. It bloated up by several tens of
> megs before I finally saw "Entering /some/directory" in *out*.
>
> The funny thing is, the code is basically this:
>
> (defn add-files! [dir]
>   (sysprint "Entering " dir)
>   (doseq [f (file-seq dir)]
>     ...))
>
> and sysprint is basically this:
>
> (defn sysprint [& objs]
>   (.println System/out (apply str objs)))
>
> (so the output goes to *out* instead of blitzkrieging the REPL tab's
> scrollback with thousands of lines of output, which then makes looking
> things up from earlier in the REPL session very annoying).
>
> Since the thing froze for a significant amount of time allocating a
> significant amount of memory before the (.println System.out ...)
> succeeded, I conclude that the rest of add-files! is irrelevant here
> and it was something in (apply str "Entering " "/some/directory") or
> in (.println System/out ...) that spent a long time allocating tens of
> megabytes of heap. I'm mystified as to why. The add-files! function
> doesn't usually do this, and it recurses for subdirectories, so the
> "Entering ..." message is printed several times in a typical use.
> After the one long pause, this particular run generated several more
> such messages without any noticeable pauses.
>
> What the heck was it? A GC pause? Why would that grow the heap? The
> heap should be near max size already and perhaps shrink somewhat in a
> GC pause. On the other hand, a long pause accompanied by rapid heap
> growth suggests the construction of a large data structure, so large
> it's not only big in memory but slow to build. But there was no
> significant CPU use by the Java process during the pause, which
> suggests that it was performing *blocking I/O*. That suggests it was
> loading something big from disk, but a) it hadn't even reached the
> doseq yet to start reading and hashing files, b) the files never
> reside completely in live heap at one time (only in 4KB chunks), and
> c) the files themselves were smallish, few exceeding 1MB and none even
> approaching 10MB.
>
> The nearest thing I can figure out is that the process was going ahead
> and hashing files, but the *out* display was lagging way behind. So,
> the sysprint ran, then the doseq was well underway, and the heap grew
> for a while as 4KB chunks were allocated ...
>
> Except a single (byte-array 4096) is being recycled in the hashing
> loop. It's a mutable Java array and DigestInputStream.read() is
> overwriting its contents for each new chunk. And unless the attached
> MessageDigest is building huge temporary data structures, which seems
> unlikely, that's basically it. The heap should have grown by a few
> megs, total, from String, String[], and File object churn, and
> creating one more 4KB array for each new entire file (not each 4KB of
> data), over the entire operation. If there were 20 subdirectories and
> 1000 files, a reasonable estimate, we're looking at:
>
> 20 String[] from File.list() on the subdirectories, average length 51
> (1020/20), average size 212 bytes or so, for 4240 bytes.
>
> 1020 String for file paths, average length say around 30, for 30600 bytes.
>
> 1000 4KB byte[] arrays for just under 4MB.
>
> 1000 byte[]s of length 20 to hold the 160-bit hashes generated by
> MessageDigests, for around 28000 bytes.
>
> 1000 MessageDigest objects. If we assume that they contain 20-element
> byte[]s but return defensive copies, so those aren't the 28000 bytes
> just counted, and (generously!) that they contain another 1KB of
> temporary state at peak size, that's another meg.
>
> 1020 seq objects over arrays and a few layers of filter, remove, map,
> etc. over these, say 10,000 seq objects total, backed by arrays --
> generously call that another whole megabyte.
>
> Assorted small temporary strings and such per object -- assume,
> generously, yet another full meg total.
>
> = about 7 megabytes.
>
> With a 64 meg heap and a much smaller initial process size, I'd have
> expected it to just grow by 7 megs (or less, probably quite a lot
> less) and not trigger ANY garbage collection.
>
> Any way I figure it, the code I wrote had no business causing the JVM
> to allocate even ten megs, let alone over thirty, let alone
> accompanied by a lengthy pause, whether for GC or disk I/O. Nor is
> there an obvious reason for the System/out println's return to precede
> the actual appearance of the text in the *out* tab by any significant
> span of time. In particular, the Java task wasn't using detectable
> levels of CPU (< 1% the whole time) so it couldn't have been hogging
> the CPU and making Netbeans slow to update its display; and it wasn't
> remote, so network latency is not a factor. NB was talking to it via
> loopback interface and NB's own maintaining of the display is
> essentially a CPU bound task that would not be impacted by contention
> for access to the network or the disk controller (so, the Java task
> sending the println to *out* and then heavily hogging the disk
> controller during the file hashing should not have blocked NB from
> receiving the text in the pipe from *out* and updating its display).
>
> In fact the only thing that would seem like it could explain it is if
> the operating system paused one or both processes to page a big chunk
> of something in from disk itself, but with the memory usage by the
> system under 60% at the time nothing should have been paged out to
> begin with. And of course any paging of the NB process back in should
> have concluded before I was able to issue any command at the REPL in
> it, whereas the Java process's size was (before it began growing) only
> a few megs, which should page in virtually instantaneously as judged
> at human perception speeds. And of course that rapid process size
> growth would remain unexplained.
>
> Something's hinky. Either I'm not understanding something in how
> System.out.println works, or in the Clojure code that called it, or in
> how NB and an attached Clojure runtime communicate; or else something
> even stranger is going on, like the operating system misreporting
> process sizes or something. (If the Java process size was actually
> much larger than it was claiming, then perhaps it both needed paging
> and needed a lot of time to page in, for instance. Though the free
> memory % suggested as much as a whole GB of unallocated RAM, meaning
> that process would have to have been a LOT larger to force the OS to
> page -- close to the -Xmx 1G I gave it, in fact, and then the pause
> could have been a GC pause after all...then again, if the OS-reported
> process sizes are suspect, then so is the OS-reported free memory %!
> Damn, once you distrust any of your diagnostic tools, it opens a
> Pandora's box that leads one to doubt everything and be sure of
> nothing, unless you successfully predict the output and it was
> statistically impossible to get by accident. The files all have the
> same SHA-1 as reported by multiple tools as well as the MessageDigest
> code, for instance, and that's highly unlikely to happen unless the
> various SHA-1 algorithm implementations in these tools are in fact all
> working correctly.)

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Reply via email to