Re: statistics of text

Shawn Corey Wed, 02 Nov 2005 07:51:02 -0800

Ing. Branislav Gerzo wrote:

I did this by hand... but does anyone know how to do this efficiently in Perl?
I think I have to build a hash of all possible 2-word sequences (only
[0-9a-z ] is allowed in the input text), keep the lines of the input text
in a list, and iterate every key in the hash over the array, writing the
number of occurrences as the hash value ("foo bar" => 5)... hm?
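
The flat-hash idea in the question could be sketched roughly as below. This is only an illustration of counting adjacent pairs keyed by "word1 word2"; the sub name count_pairs and the sample lines are made up for the example, and it counts pairs as it meets them instead of pre-building every possible key.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count adjacent word pairs in a list of lines.
# Returns a hash reference mapping "word1 word2" => number of occurrences.
sub count_pairs {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        my @words = split ' ', $line;    # split on whitespace, skipping leading blanks
        # Pair each word with the one after it; the range is empty for 0 or 1 words.
        for my $i ( 0 .. $#words - 1 ) {
            $count{"$words[$i] $words[$i + 1]"}++;
        }
    }
    return \%count;
}

my $count = count_pairs( 'foo bar', 'foo bar bar', 'bar foo bar' );
print qq{"$_" => $count->{$_}\n} for sort keys %$count;
```

A flat hash like this gives counts only; keeping line numbers as well needs a list per pair.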


Note that this program reports that "foo bar" occurs twice on line 8.
Its output for the sample data at the end of the script:
"bar bar" - 2 times (lines: 4, 6)
"bar foo" - 3 times (lines: 5, 6, 8)
"foo bar" - 6 times (lines: 3, 4, 5, 7, 8, 8)
"foo bars" - 1 time (lines: 10)
"foo foo" - 1 time (lines: 7)
"foob bar" - 1 time (lines: 9)

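If a line should be listed only once per pair while the count still reflects every occurrence, the recorded line numbers can be de-duplicated before printing. A minimal sketch, assuming a hypothetical helper named unique_lines that is not part of the program in this post:

```perl
use strict;
use warnings;

# Hypothetical helper: drop repeated line numbers, keeping first-seen order.
sub unique_lines {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @lines = ( 3, 4, 5, 7, 8, 8 );    # line numbers recorded for "foo bar"
my @uniq  = unique_lines(@lines);
print scalar(@lines), " times on ", scalar(@uniq), " lines: ",
    join( ', ', @uniq ), "\n";
```
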
To see the structure of %Pairs, uncomment the print Dumper line.
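
For orientation, the nested structure is first word => second word => list of line numbers. The fragment below is written by hand from the sample output, not copied from Dumper, and only sketches how such a structure can be read:

```perl
use strict;
use warnings;

# Hand-built fragment: first word => second word => line numbers.
my %Pairs = (
    foo => {
        bar  => [ 3, 4, 5, 7, 8, 8 ],
        bars => [10],
        foo  => [7],
    },
    foob => { bar => [9] },
);

# Number of times "foo bar" was seen, and the lines it was seen on.
my @lines = @{ $Pairs{foo}{bar} };
print scalar(@lines), " occurrences on lines @lines\n";

# Check each level with exists so that $Pairs{bar} is not autovivified.
print "no 'bar foo' in this fragment\n"
    unless exists $Pairs{bar} && exists $Pairs{bar}{foo};
```
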

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

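# %Pairs maps first word => second word => list of line numbers
my %Pairs = ();

while( <DATA> ){
  chomp;
  # split ' ' skips leading whitespace, so there is no empty first field
  my @words = split ' ';
  # take each word in turn; $words[0] is then the word that follows it
  for( my $word = shift @words;
       @words;
       $word = shift @words
  ){
    # $. is the current input line number
    push @{ $Pairs{$word}{$words[0]}}, $.;
  }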
}
# print Dumper( \%Pairs );

for my $first ( sort keys %Pairs ){
  for my $second ( sort keys %{ $Pairs{$first} } ){
    my @lines = @{ $Pairs{$first}{$second} };
    my $count = scalar( @lines ) . " time";
    $count .= 's' unless scalar( @lines ) == 1;

    print "\"$first $second\" - $count (lines: ", join( ', ', @lines ),")\n";

  }
}

__DATA__
foo
bar
foo bar
foo bar bar
bar foo bar
bar bar foo
foo foo bar
foo bar foo bar
foob bar
foo bars


--

Just my 0.00000002 million dollars worth,
   --- Shawn

"Probability is now one. Any problems that are left are your own."
   SS Heart of Gold, _The Hitchhiker's Guide to the Galaxy_

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>

Re: statistics of text

Reply via email to