Re: review the clojure.string code

Stuart Halloway Sun, 30 May 2010 20:02:24 -0700

Steven, thanks for the detailed feedback! Responses inline:

> Why do some of the functions use StringBuilder (no internal
> synchronization) and some use StringBuffer (provides internal
> synchronization). Using the latter is probably a mistake.


Stuck with this thanks to the Java API: .appendReplacement and .appendTail 
*require* a StringBuffer. :-(

> The first function -- reverse -- uses StringBuilder#reverse() to reverse
> the character sequence in place, and then calls StringBuilder#toString()
> to yield a String. Why is the final step necessary? Since StringBuilder
> implements CharSequence, isn't it sufficient to just return the
> StringBuilder? That's what the function signature promises. Calling
> toString() makes yet another copy of the data, so it's best avoided.

Switching to returning String everywhere (see feedback from ataggart).

> Some of these functions construct instances of class StringBuilder using
> its default constructor, which gives it a default capacity. But most of
> the functions also know at the outset some reasonable size for the
> buffer. For instance, function `replace-first-by' could reasonably use
> the length of its argument "s" as the size of the buffer, even though
> the replacement for some matched substring may increase the overall
> length.

Saw this in one place (on a StringBuffer): agreed and fixed.

> Why does function `replace-first' unconditional call
> CharSequence#toString() on its argument "s", when in several cases just
> using "s" as a CharSequence would be fine. Again, calling
> CharSequence#toString() might make a copy of the data, depending on
> whether the CharSequence was actually a String or something like a
> StringBuilder.

Won't work for most of the branches, which depend on String methods. IMO not 
worth doing for the few cases where it could be done.

> The implementation of a function like `trim' could be more efficient,
> given that it's allowed to return a CharSequence. You can construct a
> StringBuilder of length equal to the source CharSequence, then walk the
> source sequence from the beginning, skipping over all whitespace,
> copying the non-whitespace characters, repeating until you hit the end
> of the source string. The "right trim" behavior is a little tricky, as
> you have to suspend the copying upon encountering whitespace, but but
> back up and copy that whitespace upon encountering something else before
> the end of the source string. Alternately, just walk backward from the
> end of the source string.
> 
> The bonus is that you only copy the data once. With the current
> implementation, calling CharSequence#toString() might copy the data
> once, and calling String#trim() will copy it again.
> 
> Rounding out the review, function `trim-newline' doesn't have to call
> toString() on the CharSequence yielded by CharSequence#subSequence(), as
> it already promises to return a CharSequence.

The Java String methods are already smarter than this, and avoid naive 
unnecessary copying. Moreover, subSequence on StringBuilder already returns a 
String, so the copying damage is already done.

Stu 

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: review the clojure.string code

Reply via email to