Re: review the clojure.string code

Steven E. Harris Sun, 30 May 2010 15:45:32 -0700

Why do some of the functions use StringBuilder (no internal
synchronization) and some use StringBuffer (provides internal
synchronization). Using the latter is probably a mistake.


The first function -- reverse -- uses StringBuilder#reverse() to reverse
the character sequence in place, and then calls StringBuilder#toString()
to yield a String. Why is the final step necessary? Since StringBuilder
implements CharSequence, isn't it sufficient to just return the
StringBuilder? That's what the function signature promises. Calling
toString() makes yet another copy of the data, so it's best avoided.

Some of these functions construct instances of class StringBuilder using
its default constructor, which gives it a default capacity. But most of
the functions also know at the outset some reasonable size for the
buffer. For instance, function `replace-first-by' could reasonably use
the length of its argument "s" as the size of the buffer, even though
the replacement for some matched substring may increase the overall
length.

Why does function `replace-first' unconditional call
CharSequence#toString() on its argument "s", when in several cases just
using "s" as a CharSequence would be fine. Again, calling
CharSequence#toString() might make a copy of the data, depending on
whether the CharSequence was actually a String or something like a
StringBuilder.

The implementation of a function like `trim' could be more efficient,
given that it's allowed to return a CharSequence. You can construct a
StringBuilder of length equal to the source CharSequence, then walk the
source sequence from the beginning, skipping over all whitespace,
copying the non-whitespace characters, repeating until you hit the end
of the source string. The "right trim" behavior is a little tricky, as
you have to suspend the copying upon encountering whitespace, but but
back up and copy that whitespace upon encountering something else before
the end of the source string. Alternately, just walk backward from the
end of the source string.

The bonus is that you only copy the data once. With the current
implementation, calling CharSequence#toString() might copy the data
once, and calling String#trim() will copy it again.

Rounding out the review, function `trim-newline' doesn't have to call
toString() on the CharSequence yielded by CharSequence#subSequence(), as
it already promises to return a CharSequence.

Most Java code poisons its string manipulation efficiency by always
promising to return String rather than CharSequence. You've done better
in your signatures here, so I'm just encouraging you to avoid String and
the extra copies it forces.

-- 
Steven E. Harris

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: review the clojure.string code

Reply via email to