Hi all, I am new to the list!
I am new to Linux and new to Perl too. I am trying to get this Perl script up and running. I have installed openSUSE Linux 11.3.

What is wanted: I have a bunch of HTML files, stored in a folder. With the Perl script (see below) I want to parse the HTML files. I have stored the script in the following place:

Basisordner (German word for base folder) > user > perl >

My question is how to name the paths:

a. to the folder that contains the HTML files that need to be parsed (I named this folder html.files)
b. to the file that has to be created. I suggest that this file is also located in the same directory: Basisordner (German word for base folder) > user > perl >

I guess that this makes it easy.

Please bear with me for the noob questions. If I have to explain more, please let me know! I would love to hear from you. Many thanks in advance for any and all help.

floobee

Here is the code:

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use File::Find::Rule;
use HTML::TokeParser;

# folder that contains the HTML files
# (relative to the directory the script is run from)
my $html_dir   = 'html.files';
my @html_files = File::Find::Rule->file->name('*.html')->in($html_dir);

foreach my $file (@html_files) {
    my $p = HTML::TokeParser->new($file) or die "Can't open $file: $!";
    my %school;

    while (my $tag = $p->get_tag('div', '/html')) {
        # first move to the right div that contains the information
        last if $tag->[0] eq '/html';
        next unless exists $tag->[1]{'id'} and $tag->[1]{'id'} eq 'inhalt_large';

        $p->get_tag('h1');
        $school{'location'} = $p->get_text('/h1');

        while (my $tag = $p->get_tag('div')) {
            last if exists $tag->[1]{'id'} and $tag->[1]{'id'} eq 'fusszeile';

            # get the school name from the heading
            next unless exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'fm_linkeSpalte';
            $p->get_tag('h2');
            $school{'name'} = $p->get_text('/h2');

            # verify format for school type
            $tag = $p->get_tag('span');
            unless (exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'schulart_text') {
                warn "unexpected format: parsing stopped";
                last;
            }
            $school{'type'} = $p->get_text('/span');

            # verify format for address
            $tag = $p->get_tag('p');
            unless (exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'einzel_text') {
                warn "unexpected format: parsing stopped";
                last;
            }
            $school{'address'} = clean_address($p->get_text('/p'));

            # find the description
            $tag = $p->get_tag('p');
            $school{'description'} = $p->get_text('/p');
        }
    }

    print qq/$school{'name'}\n/;
    print qq/$school{'location'}\n/;
    print qq/$school{'type'}\n/;
    foreach (@{ $school{'address'} || [] }) {
        print "$_\n";
    }
    print qq/\nDescription: $school{'description'}\n/;
}

# split the address block into lines and trim surrounding whitespace;
# returns an array reference, because the caller dereferences it with @{...}
sub clean_address {
    my $text  = shift;
    my @lines = split "\n", $text;
    foreach (@lines) {
        s/^\s+//;
        s/\s+$//;
    }
    return \@lines;
}
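
To make the path question concrete, here is a minimal sketch of the layout I have in mind. It is only an illustration under assumptions: that the base folder is the home directory (taken from $ENV{HOME}) with a perl subfolder in it, that the result file is called schools.txt (a made-up name), and that a small helper, list_html_files (also made up), collects the files with core Perl only instead of File::Find::Rule.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Assumption: everything lives under <home>/perl, as described in the post.
my $base_dir = File::Spec->catdir( $ENV{HOME} // '.', 'perl' );

# a. the folder that holds the HTML files to be parsed
my $html_dir = File::Spec->catdir( $base_dir, 'html.files' );

# b. the file to be created, in the same folder as the script
#    (the name schools.txt is hypothetical)
my $out_file = File::Spec->catfile( $base_dir, 'schools.txt' );

# Hypothetical helper: return the full paths of all *.htm / *.html files
# directly inside $dir, sorted by file name. Uses core Perl only.
sub list_html_files {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Can't open directory $dir: $!";
    my @names = grep { /\.html?\z/i } readdir $dh;
    closedir $dh;
    return map { File::Spec->catfile( $dir, $_ ) } sort @names;
}

if ( -d $html_dir ) {
    print "$_\n" for list_html_files($html_dir);
    print "Results would be written to $out_file\n";
}
else {
    warn "No such folder: $html_dir\n";
}
```

Building the paths with File::Spec avoids hard-coding slashes, and using an absolute base folder means the script would not depend on the directory it is started from. Whether this matches the intended setup is exactly what I am asking about.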