Thanks! Trying to pass through non-strings was overreaching. Ease of use first: the API should return immutable strings. If you really need to optimize more than this, roll your own.
Stu > Type-hinting args as a CharSequence is a GoodThing; type-hinting that > you're returning a CharSequence when you're actually returning a > String is not. > > I disagree with Steven that some functions should return the > StringBuilder instance due to being type-hinted as CharSequence. > CharSequence is barely a step above being a marker interface, useful > only insofar as calling .toString() is likely to be useful (which is > precisely what StringBuilder's constructor does anyway). Further, any > purported performance benefit from getting the mutable object back is > lost if/when it is passed to another clojure.string function since the > very first usage of a CharSequence is always to call .toString() on > it. > > I'd go the other way and document functions as returning a String, and > type-hinting them as such. This also has practical effects: > > user=> (defn ^CharSequence reverse > [^CharSequence s] > (.toString (.reverse (StringBuilder. s)))) > user=> (set! *warn-on-reflection* true) > true > user=> (.substring (reverse "hello") 3) > Reflection warning, NO_SOURCE_PATH:7 - call to substring can't be > resolved. > "eh" > user=> (defn ^String reverse > [^CharSequence s] > (.toString (.reverse (StringBuilder. s)))) > #'user/reverse > user=> (.substring (reverse "hello") 3) > "eh" > > As for the concern about data copying, there are reasonable > optimizations already in place such that the String returned from > StringBuilder.toString() is likely to be created with the same char > array instance; no copying needed. > > As for the use of StringBuffer, it appears that they are used solely > when dealing with regex, since that API requires them. > > > On May 30, 3:45 pm, "Steven E. Harris" <s...@panix.com> wrote: >> Why do some of the functions use StringBuilder (no internal >> synchronization) and some use StringBuffer (provides internal >> synchronization). Using the latter is probably a mistake. >> >> The first function -- reverse -- uses StringBuilder#reverse() to reverse >> the character sequence in place, and then calls StringBuilder#toString() >> to yield a String. Why is the final step necessary? Since StringBuilder >> implements CharSequence, isn't it sufficient to just return the >> StringBuilder? That's what the function signature promises. Calling >> toString() makes yet another copy of the data, so it's best avoided. >> >> Some of these functions construct instances of class StringBuilder using >> its default constructor, which gives it a default capacity. But most of >> the functions also know at the outset some reasonable size for the >> buffer. For instance, function `replace-first-by' could reasonably use >> the length of its argument "s" as the size of the buffer, even though >> the replacement for some matched substring may increase the overall >> length. >> >> Why does function `replace-first' unconditional call >> CharSequence#toString() on its argument "s", when in several cases just >> using "s" as a CharSequence would be fine. Again, calling >> CharSequence#toString() might make a copy of the data, depending on >> whether the CharSequence was actually a String or something like a >> StringBuilder. >> >> The implementation of a function like `trim' could be more efficient, >> given that it's allowed to return a CharSequence. You can construct a >> StringBuilder of length equal to the source CharSequence, then walk the >> source sequence from the beginning, skipping over all whitespace, >> copying the non-whitespace characters, repeating until you hit the end >> of the source string. The "right trim" behavior is a little tricky, as >> you have to suspend the copying upon encountering whitespace, but but >> back up and copy that whitespace upon encountering something else before >> the end of the source string. Alternately, just walk backward from the >> end of the source string. >> >> The bonus is that you only copy the data once. With the current >> implementation, calling CharSequence#toString() might copy the data >> once, and calling String#trim() will copy it again. >> >> Rounding out the review, function `trim-newline' doesn't have to call >> toString() on the CharSequence yielded by CharSequence#subSequence(), as >> it already promises to return a CharSequence. >> >> Most Java code poisons its string manipulation efficiency by always >> promising to return String rather than CharSequence. You've done better >> in your signatures here, so I'm just encouraging you to avoid String and >> the extra copies it forces. >> >> -- >> Steven E. Harris > > -- > You received this message because you are subscribed to the Google > Groups "Clojure" group. > To post to this group, send email to clojure@googlegroups.com > Note that posts from new members are moderated - please be patient with your > first post. > To unsubscribe from this group, send email to > clojure+unsubscr...@googlegroups.com > For more options, visit this group at > http://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en