Ok, I have looked at the example that was the slowest: 5 times slower than gcc.
k-nucleitide: http://shootout.alioth.debian.org/u32/benchmark.php?test=knucleotide&lang=all The C++ code is giving me headaches from only looking at it. I am still not sure what the hell they have done (or cheated) to reach this speed. Looking at the Python example explains what the benchmark actually does in less than a screenpage. I am trying to implement a completely new Pascal version. Below is what I have already, it will be about 50% faster than the previous Pascal example. I am cheating too. Instead of a hash table in the commonly known sense my "hash table" is implemented as a trie. It still does what the hash table would be supposed to do, and therefore does not violate the spirit of this benchmark, it will hold key-value pairs, the key is the DNA fragment (string) and the value is its count. in a wider sense this could be seen as just a different way to implement a hash table, to the outside it is behaving like one. The main goal is to first count the occurrences of fragments of several sizes and store the count for each of them in a hash table (I store them in a trie because I think it is about storing an looking up vast amounts of key-value pairs and not how exactly this hash table is internally implemented). Then in the second step use this to look up and print a few of them. 'GTCA' or 'gtca', when you shr 1 and $03 each char you end up with numbers from 0..3, so the first thing I do is transform the entire thing into a string of numbers 0..3. Then my alphabet has only 4 letters. Each node in my trie has an array of 4 sub nodes for the possible next letter of the key. I am counting all the frequencies all at once, each time a node is missing I insert it and if it is already there I increase its count. This will give me all counts of all fragments of all lengths from 1..18 with one pass. Then I look up and print the required ones. The sorted output of 'aa'..'tt' sorted by frequency is still missing. I don't have much time for this today anymore, I will continue later. Also I doubt they will ever accept it. I only wanted to know why on earth the C++ version can be so fast and why the existing Pascal version is so slow and how fast this problem can be solved at all. Bernd
kn2.lpr
Description: Binary data
callgrind.out.3803
Description: Binary data
_______________________________________________ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-pascal