On Sun, 05 Oct 2014 16:51:25 -0400, Sven Van Caekenberghe <s...@stfx.eu> wrote:
[snip]
> Apart from that, the tokenisation is not very efficient: #lines makes a
> copy of your whole contents, and so do #split: and #trimmed. The
> algorithm sounds a bit lazy as well; writing it 'on purpose' with an eye
> for performance might yield better results.
So I was reflecting on this more. If String and WideString were
immutable, it'd be easy to avoid all of these copies: you could instead
pass around very small objects with only three instance variables (a
String, a start position, and a stop position) and copy almost no data.
It's the mutability of String and WideString that precludes this.
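To make the idea concrete in runnable form, here is a minimal Python sketch (not Smalltalk, and not the StringView from the spike, which hasn't been posted) of a view object that holds a source string plus start and stop positions. The class StringView, its methods view_from_to (standing in for #viewFrom:to:) and trimmed, and the helper view_lines are all hypothetical names; indices are 1-based and inclusive to mirror #copyFrom:to:. Python strings are already immutable, which is exactly what makes sharing the source safe here.

```python
class StringView:
    """Hypothetical read-only window onto a larger string: (source, start, stop).

    Indices are 1-based and inclusive to mirror Smalltalk's #copyFrom:to:.
    Sharing `source` is safe only because the string is never mutated.
    """

    __slots__ = ("source", "start", "stop")

    def __init__(self, source, start, stop):
        # An empty view is allowed (stop == start - 1), as with #copyFrom:to:.
        if not (1 <= start <= stop + 1 <= len(source) + 1):
            raise IndexError("view out of bounds")
        self.source = source
        self.start = start
        self.stop = stop

    def __len__(self):
        return self.stop - self.start + 1

    def at(self, index):
        # 1-based access relative to the view, like Smalltalk #at:.
        if not 1 <= index <= len(self):
            raise IndexError(index)
        return self.source[self.start + index - 2]

    def view_from_to(self, start, stop):
        # Hypothetical analogue of #viewFrom:to:. A view of a view still
        # points at the original source, so no characters are copied.
        if not (1 <= start <= stop + 1 <= len(self) + 1):
            raise IndexError("view out of bounds")
        return StringView(self.source, self.start + start - 1,
                          self.start + stop - 1)

    def trimmed(self):
        # Narrow the bounds past leading/trailing whitespace; no copy made.
        s, e = self.start, self.stop
        while s <= e and self.source[s - 1].isspace():
            s += 1
        while e >= s and self.source[e - 1].isspace():
            e -= 1
        return StringView(self.source, s, e)

    def __str__(self):
        # The only place characters are actually copied.
        return self.source[self.start - 1:self.stop]


def view_lines(source):
    """Hypothetical helper: split on '\\n' (like str.split('\\n'))
    into views over `source` instead of new strings."""
    views = []
    start = 1
    while True:
        nl = source.find("\n", start - 1)  # 0-based index of next newline
        if nl == -1:
            views.append(StringView(source, start, len(source)))
            return views
        # 0-based newline index == 1-based index of the char before it.
        views.append(StringView(source, start, nl))
        start = nl + 2
```

Only str(view) copies characters; splitting and trimming merely compute new start/stop pairs over the same source object.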
For fun, since I know I won't mutate the strings in this example, I did
a quick spike where I replaced #copyFrom:to: with a new method,
#viewFrom:to:, that returns a StringView. I'll post the code once I've
had a chance to clean it up, if there's interest, but it looks like it
handily chops 120-150ms off that runtime (i.e., roughly doubling the
speed).
Has any thought been given to introducing immutable collections? Or am
I just missing them? They'd be useful not just for String and
WideString, but for essentially any of the collection types. In most
cases the implementation would be as simple as overriding #at:put: and
friends to send "self shouldNotImplement", and then providing
methods/classes like the one I introduced to take advantage of the
newfound immutability.
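A minimal Python sketch of the first half of that proposal, assuming a subclass that disables every mutator (Python's rough analogue of overriding #at:put: and friends). ImmutableList is a hypothetical name, and TypeError stands in for the error that "self shouldNotImplement" would signal. In everyday Python a tuple would normally serve this purpose; subclassing is used here only to mirror the approach described above.

```python
class ImmutableList(list):
    """Hypothetical sketch: a list whose mutators all refuse to run,
    analogous to overriding #at:put: and friends in a Smalltalk subclass."""

    def _refuse(self, *args, **kwargs):
        # Stand-in for Smalltalk's `self shouldNotImplement`.
        raise TypeError(f"{type(self).__name__} is immutable")

    # Element assignment/deletion and in-place operators.
    __setitem__ = __delitem__ = __iadd__ = __imul__ = _refuse
    # Every public mutating method of list.
    append = extend = insert = pop = remove = clear = sort = reverse = _refuse

    # Read-only operations (indexing, len, iteration, slicing) are inherited
    # unchanged; copy() still returns an ordinary, mutable list.
```

Overriding only the public mutators is protection by convention: in Python a caller can still bypass the overrides (for example with list.append(xs, 4)), so the guarantee is only as strong as the discipline of the code using it.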
If there's interest, I'd be happy to submit a Slice we could use as a
concrete RFC of what this could look like.
--Benjamin