On 10 Oct 2014, at 03:40, Benjamin Pollack <benja...@bitquabit.com> wrote:

> On Sun, 05 Oct 2014 16:51:25 -0400, Sven Van Caekenberghe <s...@stfx.eu> 
> wrote:
> [snip]
>> Apart from that, the tokenisation is not very efficient, #lines is a copy of 
>> your whole contents, so is the #split: and #trimmed. The algorithm sounds a 
>> bit lazy as well, writing it 'on purpose' with an eye for performance might 
>> yield better results.
> 
> So I was reflecting on this more.  If String and WideString were immutable, 
> then it'd be easy to avoid all of these copies; you could instead pass around 
> very tiny objects that had only three members (a String, a start position, a 
> stop position), and avoid copying very much data.  It's that String and
> WideString are mutable that precludes that.  For fun, since I know I won't
> mutate the strings in this example, I actually did a quick spike where I
> replaced #copyFrom:to: with a new method I introduced called #viewFrom:to: 
> that returned a StringView.  I'll post the code when I have a chance to clean 
> it up if there's interest, but it looks like it pretty handily chops off
> 120-150ms from that runtime (i.e., double the speed).
> 
> Has there been any thought to introducing some immutable collections?  Or 
> maybe I'm just missing them?  They'd be useful not just for String and 
> WideString, but really for basically any of the collection types.  The 
> implementation in most cases would be as simple as overriding #at:put: and 
> friends to throw "self shouldNotImplement", and then providing 
> methods/classes like the one I introduced to allow taking advantage of the 
> newfound immutability.
> 
> If there's interest, I'd be happy to submit a Slice we could use as a 
> concrete RFC of what this could look like.
> 
> --Benjamin

I think it is interesting that you get real measurable improvements with
user-defined string views.
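
For readers who have not seen the idea, here is a minimal sketch of such a view. It is not Benjamin's code (that has not been posted), and it is written in Python rather than Pharo purely for illustration, since Python strings are already immutable. The class name StringView and the selector #viewFrom:to: come from his mail; everything else (view_from_to, trimmed, the 0-based half-open indexes) is an assumption of this sketch.

```python
class StringView:
    """Read-only window onto base[start:stop]; no characters are copied.

    Hypothetical sketch: three members only (base, start, stop),
    with 0-based, half-open indexes (Smalltalk would use 1-based, inclusive).
    """

    __slots__ = ("base", "start", "stop")

    def __init__(self, base, start=0, stop=None):
        if stop is None:
            stop = len(base)
        if not 0 <= start <= stop <= len(base):
            raise IndexError("view bounds out of range")
        self.base = base
        self.start = start
        self.stop = stop

    def __len__(self):
        return self.stop - self.start

    def __getitem__(self, index):
        # Integer indexes only; the lookup is redirected into the base string.
        if not 0 <= index < len(self):
            raise IndexError("view index out of range")
        return self.base[self.start + index]

    def view_from_to(self, start, stop):
        """Sub-view relative to this view; still shares the same base."""
        if not 0 <= start <= stop <= len(self):
            raise IndexError("sub-view bounds out of range")
        return StringView(self.base, self.start + start, self.start + stop)

    def trimmed(self):
        """View without leading/trailing whitespace; only indexes move."""
        start, stop = self.start, self.stop
        while start < stop and self.base[start].isspace():
            start += 1
        while stop > start and self.base[stop - 1].isspace():
            stop -= 1
        return StringView(self.base, start, stop)

    def __str__(self):
        # The one place where a real copy of the characters is made.
        return self.base[self.start:self.stop]


line = "  key = value  "
view = StringView(line).trimmed()
print(str(view))  # prints "key = value"
```

Trimming and sub-viewing only adjust two integers, so the cost of tokenising is paid in small objects rather than in copied characters. This is only safe as long as nobody mutates the base string, which is exactly the point about immutability made above.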

I have always felt that a problem for going in that direction is that most 
primitives (which are important to get good basic string performance) are not 
flexible enough to be used really efficiently. More concretely, they should 
take start/stop indexes on all string arguments. For example, ByteString 
class>>#compare:with:collated: or #stringHash:initialHash: - it should be 
possible to do these operations on substrings *without creating the 
substrings*. 
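
To make that concrete, here is a rough sketch of what range-taking versions of those two operations could look like. It is Python for illustration only, not the Pharo primitives: the function names, the argument order, the 0-based half-open indexes and the FNV-1a-style hash over code points are all assumptions of this sketch, and the hash is not the algorithm #stringHash:initialHash: actually uses.

```python
def compare_ranges(a, a_start, a_stop, b, b_start, b_stop):
    """Compare a[a_start:a_stop] with b[b_start:b_stop] without slicing.

    Returns -1, 0 or 1. Plain code-point order; no collation table.
    """
    a_len = a_stop - a_start
    b_len = b_stop - b_start
    for offset in range(min(a_len, b_len)):
        x = a[a_start + offset]
        y = b[b_start + offset]
        if x != y:
            return -1 if x < y else 1
    # Common prefix is equal: the shorter range sorts first.
    return (a_len > b_len) - (a_len < b_len)


def hash_range(s, start, stop, initial_hash=0x811C9DC5):
    """FNV-1a-style 32-bit hash of s[start:stop], read in place.

    The defaults are the standard FNV-1a offset basis and prime, applied
    here to code points rather than bytes (an assumption of this sketch).
    """
    h = initial_hash
    for i in range(start, stop):
        h = ((h ^ ord(s[i])) * 0x01000193) & 0xFFFFFFFF
    return h


text = "hello world"
# "world" inside text versus a standalone "world": equal, nothing copied.
print(compare_ranges(text, 6, 11, "world", 0, 5))  # prints 0
print(hash_range(text, 6, 11) == hash_range("world", 0, 5))  # prints True
```

With signatures like these, a string view only has to forward its base string plus its start and stop to the primitive, instead of materialising a substring first.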

Sven

