Not actually a hash-table, but an AvgLvlTree, which can be used the same way. The AvgLvlTree unit comes with Lazarus; if you don't have that, you can download avglvltree.pas here:
http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/components/lazutils/avglvltree.pas?root=lazarus&view=log There doesn't seem to be a lot of information available on using regular expressions and hash-tables in Free Pascal, so I decided to post this. The program reads a text file and counts the number of unique words, and also displays the number of times the most common word was found. Since TStringToPointerTree actually maps strings to pointers, I wrote some tiny utility functions to handle the casting from integers to pointers and back. Although I was worried that this tree-structure would be slower than a hash-table, it turned out to be quite fast. {$mode objfpc} uses AvgLvlTree, regexpr, classes, dateutils, Sysutils; const file_to_process = 'Bartlett--Quotations.txt'; type // string-integer table t_si_table = TStringToPointerTree; t_si_item = PStringToPointerItem; procedure si_table_put( table: t_si_table; k: string; v: longint ); begin table[ k ] := pointer( v ) end; function si_table_get( table : t_si_table; key : string ): longint; begin exit( longint( table[ key ] ) ) end; procedure si_table_inc( table: t_si_table; k: string ); begin table[ k ] := pointer( si_table_get( table, k ) + 1 ) ; end; function si_table_key( item : t_si_item ): string; begin exit( item^.name ) end; function si_table_val( item : t_si_item ): longint; begin exit( longint( item^.value ) ) end; function si_table_new( sensitive : boolean ): t_si_table; begin exit( t_si_table.create( sensitive ) ) end; var table : t_si_table ; table_item : t_si_item ; line, word : string; re : TRegExpr; max_count, count : int32; lines : TStringList; time_1 : TDateTime; begin time_1 := time; re := TRegExpr.Create; re.Expression := '(?i)[a-z]+' ; // case-insensitive table := si_table_new( true ) ; // true to make it case-sensitive. lines := TStringList.Create; lines.LoadFromFile( file_to_process ); for line in lines do begin if re.exec( line ) then begin si_table_inc( table, re.match[0] ); while re.execNext do si_table_inc( table, re.match[0] ); end; end; lines.free; writeln( 'Unique word count: ', table.count ); max_count := 0; for table_item in table do begin count := si_table_val( table_item ); if count > max_count then begin word := si_table_key( table_item ); max_count := count end; end; writeln( 'Commonest word: ', word, ' (', max_count, ')' ); writeln( 'Elapsed time: ', milliSecondsBetween( time_1, time ), ' milliseconds.' ); table.free; re.free end. Processing the file "Bartlett--Quotations.txt", which is about 319 kilobytes in size, produces this output: Unique word count: 8911 Commonest word: the (2200) Elapsed time: 94 milliseconds. Documentation for TRegExpr can be found here: http://regexpstudio.com/TRegExpr/Help/tregexpr_interface.html _______________________________________________ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-pascal