Ing. Branislav Gerzo on Wednesday, 2 November 2005 14:52:
> Hi all,
>
> I have quite an interesting task. Example:
>
> In a txt file I have some words (up to 100,000) - words.txt (without line
> numbers):
> 1. foo
> 2. bar
> 3. foo bar
> 4. foo bar bar
> 5. bar foo bar
> 6. bar bar foo
> 7. foo foo bar
> 8. foo bar foo bar
> 9. foob bar
> 10. foo bars
>
> and so on...
>
> Now, I have to find all 2-word sentences in the list together with their counts.
> For example, for this list it could be (without reporting lines):
> "foo bar" - 5 times (lines: 3, 4, 5, 7, 8)
> "bar bar" - 2 times (lines: 4, 6)
> "bar foo" - 3 times (lines: 5, 6, 8)
> "foo foo" - 1 time (line: 7)
> "foob bar" - 1 time (line: 9)
> "foo bars" - 1 time (line: 10)
>
> I did this by hand... but does anyone know how to do this efficiently in Perl?
> I think I have to build a hash of all possible 2-word sentences (only
> [0-9a-z ] is allowed in the input txt), keep the lines of the input txt in
> a list, iterate every hash key over that array, and store each key's
> occurrence count as its hash value ("foo bar" => 5)... hm?
Here is another variant:
- combining only adjacent words
- counting every pair occurrence within a line (so "foo bar" in line 8 counts twice)
- the keyword must be part of the pair

#!/usr/bin/perl
use strict;
use warnings;

my $prereq = qr/\bfoo\b/;    # pair must match this
my %found;

while (<DATA>) {
    /$prereq/ or next;       # pre-filter lines; not sufficient on its own
    my @w = split ' ';       # split on whitespace, ignoring leading/trailing blanks and the newline
    next if @w < 2;
    $found{"$w[$_] $w[$_ + 1]"}++ for 0 .. $#w - 1;    # count each adjacent pair
}

print map  { "$_: $found{$_}\n" }
      grep { /$prereq/ } keys %found;    # filter the rest: keep only pairs containing the keyword

__END__
foo
bar
foo bar
foo bar bar
bar foo bar
bar bar foo
foo foo bar
foo bar foo bar
foob bar
foo bars

prints (hash order may vary):
foo foo: 1
foo bar: 6
bar foo: 3
foo bars: 1
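The variant above counts a pair every time it occurs in a line, which is why "foo bar" comes out as 6. If you want the numbers from the original question instead (each pair counted at most once per line, with the line numbers reported, and no keyword filter), a minimal sketch could look like the following. It assumes the whole word list fits in memory (100,000 short lines should be fine); count_pairs is just a hypothetical helper name, and the sample lines are hard-coded instead of being read from words.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: for each distinct adjacent word pair, collect the
# 1-based numbers of the lines it occurs on, recording each line only once.
sub count_pairs {
    my @lines = @_;
    my %lines_of;                        # pair => [line numbers]
    for my $i (0 .. $#lines) {
        my @w = split ' ', $lines[$i];   # whitespace split, no empty fields
        my %seen;                        # pairs already recorded for this line
        for my $j (0 .. $#w - 1) {
            my $pair = "$w[$j] $w[$j + 1]";
            push @{ $lines_of{$pair} }, $i + 1 unless $seen{$pair}++;
        }
    }
    return \%lines_of;
}

# Sample data from the question; in practice read words.txt, e.g. chomp(my @lines = <>);
my @lines = (
    'foo',         'bar',         'foo bar',         'foo bar bar',
    'bar foo bar', 'bar bar foo', 'foo foo bar',     'foo bar foo bar',
    'foob bar',    'foo bars',
);

my $lines_of = count_pairs(@lines);

# Most frequent pairs first, ties sorted alphabetically.
for my $pair (sort { @{ $lines_of->{$b} } <=> @{ $lines_of->{$a} } or $a cmp $b }
              keys %$lines_of) {
    my @nums = @{ $lines_of->{$pair} };
    my $s    = @nums == 1 ? '' : 's';
    printf qq{"%s" - %d time%s (line%s: %s)\n},
        $pair, scalar @nums, $s, $s, join(', ', @nums);
}
```

Keeping a per-line %seen hash is what turns "occurrences" into "lines containing the pair". If you only want pairs with a keyword, as in the variant above, filter the keys with grep { /\bfoo\b/ } before printing.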