Type-hinting args as a CharSequence is a GoodThing; type-hinting that
you're returning a CharSequence when you're actually returning a
String is not.

I disagree with Steven that some functions should return the
StringBuilder instance due to being type-hinted as CharSequence.
CharSequence is barely a step above being a marker interface, useful
only insofar as calling .toString() is likely to be useful (which is
precisely what StringBuilder's constructor does anyway).  Further, any
purported performance benefit from getting the mutable object back is
lost if/when it is passed to another clojure.string function since the
very first usage of a CharSequence is always to call .toString() on
it.

I'd go the other way and document functions as returning a String, and
type-hinting them as such.  This also has practical effects:

user=> (defn ^CharSequence reverse
  [^CharSequence s]
  (.toString (.reverse (StringBuilder. s))))
user=> (set! *warn-on-reflection* true)
true
user=> (.substring (reverse "hello") 3)
Reflection warning, NO_SOURCE_PATH:7 - call to substring can't be
resolved.
"eh"
user=> (defn ^String reverse
  [^CharSequence s]
  (.toString (.reverse (StringBuilder. s))))
#'user/reverse
user=> (.substring (reverse "hello") 3)
"eh"

As for the concern about data copying, there are reasonable
optimizations already in place such that the String returned from
StringBuilder.toString() is likely to be created with the same char
array instance; no copying needed.

As for the use of StringBuffer, it appears that they are used solely
when dealing with regex, since that API requires them.


On May 30, 3:45 pm, "Steven E. Harris" <s...@panix.com> wrote:
> Why do some of the functions use StringBuilder (no internal
> synchronization) and some use StringBuffer (provides internal
> synchronization). Using the latter is probably a mistake.
>
> The first function -- reverse -- uses StringBuilder#reverse() to reverse
> the character sequence in place, and then calls StringBuilder#toString()
> to yield a String. Why is the final step necessary? Since StringBuilder
> implements CharSequence, isn't it sufficient to just return the
> StringBuilder? That's what the function signature promises. Calling
> toString() makes yet another copy of the data, so it's best avoided.
>
> Some of these functions construct instances of class StringBuilder using
> its default constructor, which gives it a default capacity. But most of
> the functions also know at the outset some reasonable size for the
> buffer. For instance, function `replace-first-by' could reasonably use
> the length of its argument "s" as the size of the buffer, even though
> the replacement for some matched substring may increase the overall
> length.
>
> Why does function `replace-first' unconditional call
> CharSequence#toString() on its argument "s", when in several cases just
> using "s" as a CharSequence would be fine. Again, calling
> CharSequence#toString() might make a copy of the data, depending on
> whether the CharSequence was actually a String or something like a
> StringBuilder.
>
> The implementation of a function like `trim' could be more efficient,
> given that it's allowed to return a CharSequence. You can construct a
> StringBuilder of length equal to the source CharSequence, then walk the
> source sequence from the beginning, skipping over all whitespace,
> copying the non-whitespace characters, repeating until you hit the end
> of the source string. The "right trim" behavior is a little tricky, as
> you have to suspend the copying upon encountering whitespace, but but
> back up and copy that whitespace upon encountering something else before
> the end of the source string. Alternately, just walk backward from the
> end of the source string.
>
> The bonus is that you only copy the data once. With the current
> implementation, calling CharSequence#toString() might copy the data
> once, and calling String#trim() will copy it again.
>
> Rounding out the review, function `trim-newline' doesn't have to call
> toString() on the CharSequence yielded by CharSequence#subSequence(), as
> it already promises to return a CharSequence.
>
> Most Java code poisons its string manipulation efficiency by always
> promising to return String rather than CharSequence. You've done better
> in your signatures here, so I'm just encouraging you to avoid String and
> the extra copies it forces.
>
> --
> Steven E. Harris

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Reply via email to