Hey all,

I've been poking around the sources and documentation for some insight into the 
details of how ansistrings are implemented, and I am left with some questions.


It would be nice if, when comparing two ansistrings, fpc would first check whether 
the two pointers point to the same spot in memory, i.e. the same TAnsiRec. If they 
do, a potentially long operation is reduced to a simple comparison of two memory 
addresses, which probably takes only a cycle or so.

Looking at fpc_ansistr_compare in astrings.inc, and at cgadd.addstring (the only 
function that seems to call fpc_ansistr_compare), it appears not to do this. Perhaps 
I'm wrong? I don't _really_ understand what the code is doing. I believe the sources 
I'm looking at are 1.0.10.

If this quick comparison is in fact not implemented, I'd like to do it myself. (There 
are a number of places where I am checking long ansistrings for equality, and there is 
a reasonable chance that both pointers are pointing to the same address.)

( @s1[1] = @s2[1] ) seems to give the right result. Is this the best way? Or is it 
quicker/slower to use ( pointer(s1) = pointer(s2) ), which is no doubt more elegant?
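
A minimal sketch of the shortcut described above, assuming objfpc mode with {$H+} so that strings are ansistrings. The name SameOrEqual is made up for illustration and is not an RTL routine, and this has not been checked against 1.0.10. It uses the pointer cast rather than @s[1] because an empty ansistring is a nil pointer, so the cast also works for empty strings without indexing into them.

```pascal
program FastEq;
{$mode objfpc}{$H+}

{ Hypothetical helper: try the cheap address test first and fall back
  to the ordinary character-by-character comparison. }
function SameOrEqual(const s1, s2: AnsiString): Boolean;
begin
  if Pointer(s1) = Pointer(s2) then
    Result := True          { same record (or both empty): equal }
  else
    Result := (s1 = s2);    { different records: compare contents }
end;

var
  a, b, c: AnsiString;
begin
  a := Copy('xhello', 2, 5);   { built at run time, lives on the heap }
  b := a;                      { shares a's record }
  c := Copy('xhello', 2, 5);   { same contents, separate record }

  writeln(SameOrEqual(a, b));  { answered by the pointer test }
  writeln(SameOrEqual(a, c));  { answered by the full comparison }
end.
```

Both calls report equality. Only the first one avoids walking the string data.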



Now on to the second question... getting those ansistrings pointed to the same 
address! (Some of them already are, but I'd like to get more...)

I was kind of surprised to find that

  s1 := 'hello';
  s2 := 'hello';
  writeln ( pointer(s1) = pointer(s2) );     { writes FALSE }

Thus I assume that
  readln (s1);
  readln (s2);    { would NEVER point them at the same address }
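
For contrast, a plain assignment between two ansistrings does share one record, since ansistrings are reference counted. The sketch below is only an illustration, assuming objfpc mode with {$H+}. Copy is used just to get a string built at run time, standing in for a readln result.

```pascal
program ShareDemo;
{$mode objfpc}{$H+}

var
  s1, s2: AnsiString;
begin
  { Build s1 at run time so it lives on the heap, as a readln result would. }
  s1 := Copy('xhello', 2, 5);

  { Plain assignment copies the pointer and bumps the reference count. }
  s2 := s1;
  writeln(Pointer(s1) = Pointer(s2));   { TRUE: both share one record }

  { Appending to s2 gives it a fresh record; s1 keeps the old one. }
  s2 := s2 + '!';
  writeln(Pointer(s1) = Pointer(s2));   { FALSE: the records now differ }
end.
```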

Of course, checking every string against every other string would incur an absurd 
performance hit in most cases. What I'd really like is to have a relatively small 
number of constant strings that could be compared against, and only when reading in a 
data file, or perhaps certain fields in a data file (yes, reading will take 
longer). Then if my data file (and thus my filled pascal array) has 1,000 instances 
of "some_complex_but_often_identically_repeated_data_value", I get the following:
  - lots of memory savings
  - operator "=" gives a very fast TRUE result when both point to the same address

In fact, every string comparison operator returns a known value when its two operands 
are equal, so they could all give a fast result in this case. This could happen both 
when comparing data values against each other, and when comparing them to a constant 
string in my code.
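
The idea of a small pool of known strings could be sketched roughly as follows. Everything here is hypothetical (the names Pool, PoolCount, MaxPool and Intern, the fixed array size, the linear search), and it assumes objfpc mode with {$H+}. Each string read from the data file would be passed through Intern once. Equal strings then come back sharing one record, so a later pointer test can succeed.

```pascal
program InternDemo;
{$mode objfpc}{$H+}

const
  MaxPool = 256;   { hypothetical limit on the number of pooled strings }

var
  Pool: array[0..MaxPool - 1] of AnsiString;
  PoolCount: Integer;

{ Hypothetical helper: return the pooled copy of s, adding s to the
  pool if it is not there yet and there is still room. }
function Intern(const s: AnsiString): AnsiString;
var
  i: Integer;
begin
  for i := 0 to PoolCount - 1 do
    if Pool[i] = s then
    begin
      Result := Pool[i];   { reuse the record already in the pool }
      Exit;
    end;
  if PoolCount < MaxPool then
  begin
    Pool[PoolCount] := s;  { the pool now shares s's record }
    Inc(PoolCount);
  end;
  Result := s;             { pool full: s is returned unshared }
end;

var
  a, b: AnsiString;
begin
  PoolCount := 0;
  { Two separately built strings with identical contents. }
  a := Intern(Copy('xsome_value', 2, 10));
  b := Intern(Copy('xsome_value', 2, 10));
  writeln(Pointer(a) = Pointer(b));   { should now refer to one record }
end.
```

The linear search is only reasonable for a small pool. With many distinct values a sorted list or hash table would be the obvious replacement, at the cost of more code.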

Do resource strings offer this kind of intelligence, or are they designed for a 
completely different purpose, and offer no performance improvement for this special 
application?

Is there some special mode or compiler directive that does more string uniqueness 
checking, including at compile time (e.g. to find my two identical 'hello' 
assignments)?

Or shall I bite the bullet and decide whether I need to implement some of this stuff 
myself?

(( Typecasting my data into something that's not literal strings would of course be the 
best thing to do. However, I'm not quite ready for it yet, and would like to do what I 
can to help performance until I get to the point where I can put my data in pascal 
records. ))


Cheers!
-David



_______________________________________________
fpc-pascal maillist  -  [EMAIL PROTECTED]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
