Let me start by giving some figures from my Smalltalk (astc), on a laptop with an Intel Core i5-6200U @ 2.3 GHz CPU and 8GB of memory, running Ubuntu 22.04 and gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0. Smalltalk is compiled to C and then finished with the system C compiler. Static whole-program compilation is allowed by the ANSI standard, and the system was originally written to serve as a baseline for Bryce's JIT.

  nsec  technique
   249  replaceFrom:to:with:startingAt:*5
   128  withAll:*5
   486  ,,,,
   492  (,,),(,)
   521  streamContents:
   367  StringWriteStream
   860  StringBuffer>>addAllLast:
   385  StringBuffer>>nextPutAll:
replaceFrom:to:with:startingAt:*5 makes a string of the right size, then fills it in using #replaceAll:from:to:startingAt:.

withAll:*5 is String withAll: a withAll: b withAll: c withAll: d withAll: e (supported for up to 6 withAll: keywords). This is interesting because the result can be a ByteArray (UTF-8), a ShortArray (UTF-16) or a String (UTF-32), each optionally ReadOnly, and each of the up to 6 operands can independently be any of these. It wasn't intended as a fast alternative to #,.

,,,, is a,b,c,d,e and (,,),(,) is (a,b,c),(d,e).

streamContents: is what you had.

StringWriteStream is basically the same as streamContents:, but uses a WriteStream specialised to Strings, with some extra primitive support. There are also StringReadStream and StringReadWriteStream.

StringBuffer is my version of Java's StringBuilder; it's a cross between a String, an OrderedCollection, and a WriteStream. It can change size like an OrderedCollection, it has most of the "writing" methods (but not the "position" ones) of a WriteStream, and at all times you can use it as a String without having to copy the contents. You would expect #addAllLast: and #nextPutAll: to have the same result, and they do, but they were written at different times, and #nextPutAll: was optimised for the case where the operand is a string while #addAllLast: wasn't.

What does all that mean in practice? It means that a benchmark like this is VERY SENSITIVE to the details of how the library is written. Even just bracketing the commas differently gives you a different time. It also means that techniques which are more efficient for LARGE volumes of data may have startup costs that make them less efficient for SMALL volumes of data, and this is a very small benchmark. The cost of a,b,c,d,e is proportional to |a|*4 + |b|*4 + |c|*3 + |d|*2 + |e|, because each comma copies the whole intermediate result again, while the other techniques are proportional to |a| + |b| + |c| + |d| + |e|, BUT have overheads of their own.

Well, that was astc. What about Pharo?
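Before the Pharo numbers, the copy-count arithmetic can be checked mechanically. The sketch below is in Java only so that it is easy to run; the class and method names are made up for illustration and are not from astc or Pharo. It assumes each , allocates a fresh string and copies both operands into it, and compares that with allocating the result once at its final size and copying each part into place, which is the idea behind replaceFrom:to:with:startingAt:*5 and withAll:*5. It counts characters copied; it is not a timing benchmark.

```java
import java.util.List;

public class ConcatCost {
    // Characters copied by left-nested concatenation ((((a,b),c),d),e) when
    // every step allocates a fresh string and copies both operands into it.
    static long leftNestedCopies(List<String> parts) {
        if (parts.isEmpty()) {
            return 0;
        }
        long copied = 0;
        long soFar = parts.get(0).length();
        for (int i = 1; i < parts.size(); i++) {
            soFar += parts.get(i).length();
            copied += soFar; // previous result plus the new part are copied
        }
        return copied;
    }

    // Characters copied when the result is allocated once at its final size
    // and each part is copied into place exactly once.
    static long presizedCopies(List<String> parts) {
        long copied = 0;
        for (String p : parts) {
            copied += p.length();
        }
        return copied;
    }

    // Presize-then-fill: sum the sizes, allocate once, copy each part in.
    static String presizedConcat(List<String> parts) {
        int total = 0;
        for (String p : parts) {
            total += p.length();
        }
        char[] out = new char[total];
        int pos = 0;
        for (String p : parts) {
            p.getChars(0, p.length(), out, pos);
            pos += p.length();
        }
        return new String(out);
    }

    public static void main(String[] args) {
        List<String> parts = List.of("aaaaa", "bbbbb", "ccccc", "ddddd", "eeeeee");
        System.out.println(leftNestedCopies(parts)); // 71 = 4*5 + 4*5 + 3*5 + 2*5 + 6
        System.out.println(presizedCopies(parts));   // 26
        System.out.println(presizedConcat(parts));
    }
}
```

For the operands in Noury's benchmark (four 5-character strings and one 6-character string), the left-nested model copies 71 characters against 26 for presize-then-fill; with operands this short, fixed per-call overheads can easily matter as much as that difference.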
    ,,,,         1,950,528 per second
    withAll:*5   6,509,256 per second

Here it is. I've added withAll:*2 to withAll:*6 to ArrayedCollection class; this is the five-operand one:

    withAll: c1 withAll: c2 withAll: c3 withAll: c4 withAll: c5
        |e1 e2 e3 e4 e5|
        e1 := c1 size.
        e2 := c2 size + e1.
        e3 := c3 size + e2.
        e4 := c4 size + e3.
        e5 := c5 size + e4.
        ^(self new: e5)
            replaceFrom: 1 to: e1 with: c1 startingAt: 1;
            replaceFrom: e1+1 to: e2 with: c2 startingAt: 1;
            replaceFrom: e2+1 to: e3 with: c3 startingAt: 1;
            replaceFrom: e3+1 to: e4 with: c4 startingAt: 1;
            replaceFrom: e4+1 to: e5 with: c5 startingAt: 1;
            yourself

What's the lesson here? Just because A is faster than B doesn't mean there isn't a fairly obvious C, D, ... that will beat A.

Now, what is the real argument in favour of StringBuilder in Java and streamContents: in Smalltalk?

    s := ''.
    1 to: n do: [:i | s := s , 'X'].

makes a string of n Xs but takes O(n**2) time and turns over O(n**2) memory.

    s := String streamContents: [:o | 1 to: n do: [:i | o nextPut: $X]].

makes a string of n Xs while taking O(n) time and turning over O(n) memory. n does not have to be very big before this becomes a HUGE difference.

For what it's worth, the Java compiler turns a+b+c+d+e into code that creates a StringBuilder, stuffs a ... e into it, and then pulls a string out. (Since JDK 9, javac emits an invokedynamic call to StringConcatFactory instead, which serves the same purpose.) There is no point in benchmarking a fixed number of concatenations against a StringBuilder in Java, because they amount to the same thing. Smalltalk compilers don't do that.

In Java and in Smalltalk you should seldom concatenate strings; instead, send the fragments directly to their final destination. I've never quite made up my mind whether being toString()-centric was Java's biggest blunder or just the second biggest, but it was a pretty darned big one for sure. Smalltalk got this right: #printOn: is the basic notion, and #printString is the derived one, best avoided.
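A rough Java rendering of the last few paragraphs, offered as a sketch rather than anything taken from astc or Pharo; all names here are hypothetical. naive builds n Xs by repeated concatenation (quadratic copying), withBuilder uses one StringBuilder (linear, the counterpart of streamContents:), and the made-up Point class follows the printOn:/printString split: its basic operation writes into a caller-supplied StringBuilder, and printString is derived from it.

```java
import java.util.List;

public class BuildXs {
    // Repeated concatenation: every step allocates a new string and copies
    // everything built so far, so O(n**2) characters are copied in total.
    static String naive(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s = s + "X";
        }
        return s;
    }

    // One growable buffer, like String streamContents: [:o | ... o nextPut: $X]:
    // O(n) time and O(n) memory turnover. Presizing to n just avoids regrowth.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('X');
        }
        return sb.toString();
    }

    // Hypothetical example type illustrating the printOn:/printString split.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Basic operation: append this point's text to the caller's destination.
        void printOn(StringBuilder out) {
            out.append(x).append('@').append(y);
        }

        // Derived convenience, like #printString: a String is made only on request.
        String printString() {
            StringBuilder sb = new StringBuilder();
            printOn(sb);
            return sb.toString();
        }
    }

    // Every element prints straight into the same builder instead of producing
    // its own printString that would then be concatenated.
    static String printAll(List<Point> points) {
        StringBuilder out = new StringBuilder();
        out.append('(');
        for (int i = 0; i < points.size(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            points.get(i).printOn(out);
        }
        out.append(')');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(naive(5));       // XXXXX
        System.out.println(withBuilder(5)); // XXXXX
        System.out.println(printAll(List.of(new Point(1, 2), new Point(3, 4)))); // (1@2 3@4)
    }
}
```

The design choice worth noticing is that printAll never asks an element for a String; the destination is passed down, which is the Java equivalent of preferring #printOn: over #printString.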
On Sat, 16 Mar 2024 at 08:12, Noury Bouraqadi <bouraq...@gmail.com> wrote:
>
> I thought streamContents: was faster than using a comma binary message...
>
> I was wrong. Pharo is not Java :-)
>
> Noury
>
> "Run in P11"
> a := 'aaaaa'.
> b := 'bbbbb'.
> c := 'ccccc'.
> d := 'ddddd'.
> e := 'eeeeee'.
> [ a , b , c , d , e ] bench.
> "'3958888.090 per second'"
> "'3808242.503 per second'"
>
> [ String streamContents: [ :str |
>     str
>       << a;
>       << b;
>       << c;
>       << d;
>       << e ] ] bench
> "'3083603.838 per second'"
> "'2927641.144 per second'"