Sorry-- I certainly don't dispute any of your comments on the relative merit of the code. I was just trying to understand which parts of the function touch bad performance in Racket.
Robby On Wed, Jun 18, 2014 at 4:44 PM, Ryan Davis <zenspi...@gmail.com> wrote: > > On Jun 18, 2014, at 1:26, Robby Findler <ro...@eecs.northwestern.edu> wrote: > >> I think the ruby version is smarter than your Racket version. >> Specifically, if you remove the .size (but don't print the output), >> things slow down by a factor of about 2 for me. >> >> $ time ruby -e 'p File.read("X_train.txt").split(/\s+/).map(&:to_f)' >>> /dev/null >> real 0m1.127s >> user 0m1.071s >> sys 0m0.054s >> $ time ruby -e 'p File.read("X_train.txt").split(/\s+/).map(&:to_f).size' >> 655360 >> >> real 0m0.561s >> user 0m0.512s >> sys 0m0.047s > > It's possible I grabbed the wrong input. Doing the equivalent take(10) > results in roughly the same time. Either way, ruby isn't using lazy lists or > anything to cheat in the case of .size. You're probably just seeing a time > discrepancy for output generation. > > % time ruby -e 'p File.read("X_train.txt").split(/\s+/).map(&:to_f).take(10)' > [0.0, 0.28858451, -0.020294171, -0.13290514, -0.9952786, -0.98311061, > -0.91352645, -0.99511208, -0.98318457, -0.92352702] > > real 0m3.896s > user 0m3.611s > sys 0m0.281s > > % time ruby -e 'p File.read("X_train.txt").split(/\s+/).map(&:to_f).size' > 4124473 > > real 0m3.886s > user 0m3.612s > sys 0m0.263s > >> But meanwhile, it looks like the regexp matching is slower than in >> Ruby. Below are some variations on the code that explore some speedups >> for the computation you intended to happen, I believe. The middle >> version (fetch-whitespace-delimited-string2) takes the same amount of >> time as the version of ruby that is forced to actually compute the >> list. So maybe there are still more improvements to be done to regexp >> port matching, at least for certain regexps, I'm not sure. > > The first one works, but takes ~55k ms. :( > > I could never get the second one to work. It comes back instantly w/ 0. I > think my input file doesn't QUITE conform to what you're looking for. I snuck > in an extra read-char in before the loop, but still no go. > > Either way, it's a really painful way to think about this type of task. > > "Read it all in, split on whitespace, convert everything to floats" > > vs > > "Create a function that takes a function, opens the input, sets up a loop, > calls the given function, compares the result and if it is truthy, appends > the string converted to a number to the front of a list (which must be later > reversed). The passed function needs to set up a loop, read in a character, > and if that character is not eof, space, or newline it loops concatinating > that character with a list of previously found characters. If eof, or > whitespace is discovered, it reverses the collected list and converts it to a > string." > > That reads more like what you'd write in C rather than a beautiful functional > language with a very complete library. It took me several times longer to > convert your code to English as it did to write several of these variants in > ruby, racket, or mathematica. While it is much more nuanced, I just don't > think it comes as naturally or gracefully as the task as laid out in my head. > Even my cleanest (and fastest!) versions took several iterations to come by > and are a bit of a stretch: > > (define path (path->string (expand-user-path > "~/Desktop/X_train.txt"))) > > (define (gc) > (collect-garbage)(collect-garbage)(collect-garbage)) > > (define (go fn) > (time (take (with-input-from-file path fn) 10))) > > (gc) > (go (lambda () ; cpu time: 10428 real time: 10410 gc time: 631 > (consistently ~1/2 the gc) > (for/list ([s (in-port read)]) > s))) > > (gc) > (go (lambda () ; cpu time: 9883 real time: 9872 gc time: 1182 > (for/list ([s (in-port (curry read-string 16))]) > (string->number (string-trim s))))) > > (gc) > (go (lambda () ; cpu time: 15006 real time: 14981 gc time: 1154 > (for/list ([s (in-port (curry read-string 16))]) > (read (open-input-string s))))) > > I would like to think that the regexp engine can be sped up to the point > where "read it, split it, convert it" is a valid option. Until then, I'll > stick to for/list + in-port + read, but remembering that that particular > combination is the right one is going to be difficult. > > P.S. Mathematica only takes ~8 seconds and is no speed demon for this type of > task: > > In[2]:= AbsoluteTiming[ > Take[Flatten[Import["~/Desktop/X_train.txt", "Table"]], 10]] > > Out[2]= {8.443284, {0.288585, -0.0202942, -0.132905, -0.995279, \ > -0.983111, -0.913526, -0.995112, -0.983185, -0.923527, -0.934724}} > > more importantly, I wrote that from memory. > ____________________ Racket Users list: http://lists.racket-lang.org/users