Re: [racket] help me speed up string split?

Robby Findler Wed, 18 Jun 2014 16:02:28 -0700

Sorry-- I certainly don't dispute any of your comments on the relative
merit of the code. I was just trying to understand which parts of the
function touch bad performance in Racket.


Robby


On Wed, Jun 18, 2014 at 4:44 PM, Ryan Davis <zenspi...@gmail.com> wrote:
>
> On Jun 18, 2014, at 1:26, Robby Findler <ro...@eecs.northwestern.edu> wrote:
>
>> I think the ruby version is smarter than your Racket version.
>> Specifically, if you remove the .size (but don't print the output),
>> things slow down by a factor of about 2 for me.
>>
>> $  time ruby -e 'p File.read("X_train.txt").split(/\s+/).map(&:to_f)'
>>> /dev/null
>> real 0m1.127s
>> user 0m1.071s
>> sys 0m0.054s
>> $  time ruby -e 'p File.read("X_train.txt").split(/\s+/).map(&:to_f).size'
>> 655360
>>
>> real 0m0.561s
>> user 0m0.512s
>> sys 0m0.047s
>
> It's possible I grabbed the wrong input. Doing the equivalent take(10) 
> results in roughly the same time. Either way, ruby isn't using lazy lists or 
> anything to cheat in the case of .size. You're probably just seeing a time 
> discrepancy for output generation.
>
> % time ruby -e 'p File.read("X_train.txt").split(/\s+/).map(&:to_f).take(10)'
> [0.0, 0.28858451, -0.020294171, -0.13290514, -0.9952786, -0.98311061, 
> -0.91352645, -0.99511208, -0.98318457, -0.92352702]
>
> real    0m3.896s
> user    0m3.611s
> sys     0m0.281s
>
> % time ruby -e 'p File.read("X_train.txt").split(/\s+/).map(&:to_f).size'
> 4124473
>
> real    0m3.886s
> user    0m3.612s
> sys     0m0.263s
>
>> But meanwhile, it looks like the regexp matching is slower than in
>> Ruby. Below are some variations on the code that explore some speedups
>> for the computation you intended to happen, I believe. The middle
>> version (fetch-whitespace-delimited-string2) takes the same amount of
>> time as the version of ruby that is forced to actually compute the
>> list. So maybe there are still more improvements to be done to regexp
>> port matching, at least for certain regexps, I'm not sure.
>
> The first one works, but takes ~55k ms. :(
>
> I could never get the second one to work. It comes back instantly w/ 0. I 
> think my input file doesn't QUITE conform to what you're looking for. I snuck 
> in an extra read-char in before the loop, but still no go.
>
> Either way, it's a really painful way to think about this type of task.
>
> "Read it all in, split on whitespace, convert everything to floats"
>
> vs
>
> "Create a function that takes a function, opens the input, sets up a loop, 
> calls the given function, compares the result and if it is truthy, appends 
> the string converted to a number to the front of a list (which must be later 
> reversed). The passed function needs to set up a loop, read in a character, 
> and if that character is not eof, space, or newline it loops concatinating 
> that character with a list of previously found characters. If eof, or 
> whitespace is discovered, it reverses the collected list and converts it to a 
> string."
>
> That reads more like what you'd write in C rather than a beautiful functional 
> language with a very complete library. It took me several times longer to 
> convert your code to English as it did to write several of these variants in 
> ruby, racket, or mathematica. While it is much more nuanced, I just don't 
> think it comes as naturally or gracefully as the task as laid out in my head. 
> Even my cleanest (and fastest!) versions took several iterations to come by 
> and are a bit of a stretch:
>
>         (define path (path->string (expand-user-path 
> "~/Desktop/X_train.txt")))
>
>         (define (gc)
>           (collect-garbage)(collect-garbage)(collect-garbage))
>
>         (define (go fn)
>           (time (take (with-input-from-file path fn) 10)))
>
>         (gc)
>         (go (lambda ()    ; cpu time: 10428 real time: 10410 gc time:  631 
> (consistently ~1/2 the gc)
>               (for/list ([s (in-port read)])
>                 s)))
>
>         (gc)
>         (go (lambda ()    ; cpu time:  9883 real time:  9872 gc time: 1182
>               (for/list ([s (in-port (curry read-string 16))])
>                 (string->number (string-trim s)))))
>
>         (gc)
>         (go (lambda ()    ; cpu time: 15006 real time: 14981 gc time: 1154
>               (for/list ([s (in-port (curry read-string 16))])
>                 (read (open-input-string s)))))
>
> I would like to think that the regexp engine can be sped up to the point 
> where "read it, split it, convert it" is a valid option. Until then, I'll 
> stick to for/list + in-port + read, but remembering that that particular 
> combination is the right one is going to be difficult.
>
> P.S. Mathematica only takes ~8 seconds and is no speed demon for this type of 
> task:
>
> In[2]:= AbsoluteTiming[
>  Take[Flatten[Import["~/Desktop/X_train.txt", "Table"]], 10]]
>
> Out[2]= {8.443284, {0.288585, -0.0202942, -0.132905, -0.995279, \
> -0.983111, -0.913526, -0.995112, -0.983185, -0.923527, -0.934724}}
>
> more importantly, I wrote that from memory.
>

____________________
  Racket Users list:
  http://lists.racket-lang.org/users

Re: [racket] help me speed up string split?

Reply via email to