Re: Need help sorting by specific fields in file.

James Edward Gray II Mon, 02 Feb 2004 13:34:25 -0800

On Feb 2, 2004, at 1:49 PM, Dennis G. Wicks wrote:

Greetings;
I have a file that I need to sort and currently I am just
sorting it by
@datalist = sort(@datalist);

Okay, but you're not sorting a file there. You're sorting an array. Maybe that array was loaded from a file, but we're dealing with in-memory arrays now.

You mention below that this is numeric data, so that should be:

@datalist = sort { $a <=> $b } @datalist;
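
To see why the comparison block matters, here is a small sketch with made-up values (the numbers are mine, not from your file). The default sort compares as strings, so the two orders come out differently:

```perl
use strict;
use warnings;

# Made-up sample values, purely for illustration.
my @datalist = (10, 9, 100, 25);

# Default sort compares as strings, so "100" lands before "25" and "9".
my @as_strings = sort @datalist;

# The <=> operator compares as numbers.
my @as_numbers = sort { $a <=> $b } @datalist;

print "string order:  @as_strings\n";   # 10 100 25 9
print "numeric order: @as_numbers\n";   # 9 10 25 100
```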

but it will eventually have many more records and many of
them may be quite large, but I only need to sort on the
first six characters which would be faster. Wouldn't it?

Honestly, I don't know, but I suspect that it wouldn't. I could be wrong. You could benchmark to find out. Thing is, first you have to grab the first 6 characters off of all of them, and that takes time. You're also assuming that the default comparison, whatever it is, is comparing every single character. I would hope it short-circuits as soon as it has enough information to compare things. If it does, you might slow it down instead of speeding it up.

More importantly, this is a problem to consider AFTER it becomes slow. You say it's fast now. Great. Don't touch it. Computers can handle a lot of information very fast these days and humans who spend time coding a "faster" solution for something that was already happening in the blink of an eye are silly.

My worry, just from reading your message, was: Is the dataset too big to be placed into a single array in memory, as it seems you are doing?

I have looked at perldoc and it shows things like

@articles = sort {$a <=> $b} @files;
but I can't figure out how to tell the sort that $a and $b
are the first six characters of @datalist. That is numeric
data BTW.
Any help or pointers appreciated.

@datalist = sort { substr($a, 0, 6) <=> substr($b, 0, 6) } @datalist;

That's just a basic grab-the-first-6-characters-and-compare approach. For big data sets, though, we can possibly get faster:

@datalist =
        map  { $_->[1] }
        sort { $a->[0] <=> $b->[0] }
        map  { [ substr($_, 0, 6), $_ ] } @datalist;

That's a little trickier. Reading from the bottom up: first, we build a list that pairs each substring with its original record, then we compare on the substrings, then we restore the original records, now in sorted order. Usually this is faster if a sort needs complex transformations to compare data and the list is big. However, I'm not sure how much faster a dereference of those array refs I used is going to be than a substr() call.
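
If the chained version is hard to follow, the same three steps can be written out one at a time. This idiom is commonly known as the Schwartzian Transform. The sample records and the intermediate variable names below are made up for illustration; I'm assuming each record starts with a six-digit key.

```perl
use strict;
use warnings;

# Hypothetical records: a six-digit key followed by arbitrary text.
my @datalist = (
    "000300 third record",
    "000020 second record",
    "000001 first record",
);

# Step 1: pair each record with its extracted key: [ key, record ].
my @decorated = map { [ substr($_, 0, 6), $_ ] } @datalist;

# Step 2: sort the pairs numerically by key.
my @sorted_pairs = sort { $a->[0] <=> $b->[0] } @decorated;

# Step 3: throw the keys away and keep only the records.
my @sorted = map { $_->[1] } @sorted_pairs;

print "$_\n" for @sorted;
```

The point of the extra work is that substr() runs once per record, instead of twice per comparison inside the sort block.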

Again, you would have to benchmark to see if we're making any meaningful gains here, which I'm doubtful of. Perl includes a standard Benchmark module for this.
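
A rough sketch of such a benchmark, using cmpthese() from the Benchmark module. The test data is randomly generated and entirely hypothetical (10,000 records with a six-digit key and some filler), and the labels are just names I picked, so treat any numbers it prints as a starting point and rerun it against your real data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: a random six-digit key plus filler text.
my @datalist =
    map { sprintf("%06d", int rand 1_000_000) . " filler text" } 1 .. 10_000;

my %candidates = (
    # Call substr() inside every comparison.
    substr_each_time => sub {
        my @sorted = sort { substr($a, 0, 6) <=> substr($b, 0, 6) } @datalist;
    },
    # Extract each key once, sort on it, then recover the records.
    precomputed_keys => sub {
        my @sorted =
            map  { $_->[1] }
            sort { $a->[0] <=> $b->[0] }
            map  { [ substr($_, 0, 6), $_ ] } @datalist;
    },
);

# Run each candidate 20 times and print a comparison table.
cmpthese(20, \%candidates);
```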

I hope that helps.

James

