Type-hinting args as a CharSequence is a GoodThing; type-hinting that you're returning a CharSequence when you're actually returning a String is not.
I disagree with Steven that some functions should return the StringBuilder instance due to being type-hinted as CharSequence. CharSequence is barely a step above being a marker interface, useful only insofar as calling .toString() is likely to be useful (which is precisely what StringBuilder's constructor does anyway). Further, any purported performance benefit from getting the mutable object back is lost if/when it is passed to another clojure.string function since the very first usage of a CharSequence is always to call .toString() on it. I'd go the other way and document functions as returning a String, and type-hinting them as such. This also has practical effects: user=> (defn ^CharSequence reverse [^CharSequence s] (.toString (.reverse (StringBuilder. s)))) user=> (set! *warn-on-reflection* true) true user=> (.substring (reverse "hello") 3) Reflection warning, NO_SOURCE_PATH:7 - call to substring can't be resolved. "eh" user=> (defn ^String reverse [^CharSequence s] (.toString (.reverse (StringBuilder. s)))) #'user/reverse user=> (.substring (reverse "hello") 3) "eh" As for the concern about data copying, there are reasonable optimizations already in place such that the String returned from StringBuilder.toString() is likely to be created with the same char array instance; no copying needed. As for the use of StringBuffer, it appears that they are used solely when dealing with regex, since that API requires them. On May 30, 3:45 pm, "Steven E. Harris" <s...@panix.com> wrote: > Why do some of the functions use StringBuilder (no internal > synchronization) and some use StringBuffer (provides internal > synchronization). Using the latter is probably a mistake. > > The first function -- reverse -- uses StringBuilder#reverse() to reverse > the character sequence in place, and then calls StringBuilder#toString() > to yield a String. Why is the final step necessary? Since StringBuilder > implements CharSequence, isn't it sufficient to just return the > StringBuilder? That's what the function signature promises. Calling > toString() makes yet another copy of the data, so it's best avoided. > > Some of these functions construct instances of class StringBuilder using > its default constructor, which gives it a default capacity. But most of > the functions also know at the outset some reasonable size for the > buffer. For instance, function `replace-first-by' could reasonably use > the length of its argument "s" as the size of the buffer, even though > the replacement for some matched substring may increase the overall > length. > > Why does function `replace-first' unconditional call > CharSequence#toString() on its argument "s", when in several cases just > using "s" as a CharSequence would be fine. Again, calling > CharSequence#toString() might make a copy of the data, depending on > whether the CharSequence was actually a String or something like a > StringBuilder. > > The implementation of a function like `trim' could be more efficient, > given that it's allowed to return a CharSequence. You can construct a > StringBuilder of length equal to the source CharSequence, then walk the > source sequence from the beginning, skipping over all whitespace, > copying the non-whitespace characters, repeating until you hit the end > of the source string. The "right trim" behavior is a little tricky, as > you have to suspend the copying upon encountering whitespace, but but > back up and copy that whitespace upon encountering something else before > the end of the source string. Alternately, just walk backward from the > end of the source string. > > The bonus is that you only copy the data once. With the current > implementation, calling CharSequence#toString() might copy the data > once, and calling String#trim() will copy it again. > > Rounding out the review, function `trim-newline' doesn't have to call > toString() on the CharSequence yielded by CharSequence#subSequence(), as > it already promises to return a CharSequence. > > Most Java code poisons its string manipulation efficiency by always > promising to return String rather than CharSequence. You've done better > in your signatures here, so I'm just encouraging you to avoid String and > the extra copies it forces. > > -- > Steven E. Harris -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en