On Monday, 30 December 2013 at 21:40:58 UTC, Thomas Gann wrote:
I've written a Markov bot in D, and I have function whose job
it is to take an input string, convert all newline characters
to spaces and all uppercase letters to lowercase, and then
return an array of words that are generated by splitting the
string up by whitespace. Here is the function is question:
string[] split_sentence(string input)
{
string line;
foreach(c; input)
{
if(c == '\n' || c == '\r')
line ~= ' ';
else
line ~= c.toLower();
}
return line.splitter(' ').filter!(a => a.length).array;
}
Obviously, one issue is that because the string is immutable, I
can't modify it directly, and so I actually build an entirely
new string in place. I would have just made a mutable duplicate
of the input and modify that, but then I would get errors
returning, because it expects string[] and not char[][]. Is
there a more elegant way to do what I'm doing?
A few points:
by declaring a new string and appending to it you are risking a
lot of allocations. Either use std.array.appender or allocate the
array with the correct size at the beginning.
using .array on the end of the ufcs chain is yet another
allocation. It can be avoided using std.algorithm.copy to copy
the result back in to 'line'
In my opinion the whole API would be better as range-based:
auto splitSentence(R)(R input)
if(isInputRange!R)
{
return input
.map!(c => (c == "\n"[0] || c == "\r"[0]) ? ' ' :
c.toLower)
.splitter!(' ')
.filter!(a => !(a.empty));
}