On Wed, May 27, 2009 at 6:27 PM, Stephen Reese <rsre...@gmail.com> wrote:
> List,
>
> I've been working on a method to parse a PDF or TXT document and
> output the results to XML over at Experts Exchange.
> http://www.experts-exchange.com/Programming/Languages/Scripting/Perl/Q_24439630.html
>
> You may view the attached document or if the mailing list doesn't
> allow here is a copy of the document I would like to parse:
> http://filedb.experts-exchange.com/incoming/2009/05_w22/143310/XenApp-Secure-Gateway-Server-VL0.txt
>
> Basically I would like to take the following code and modify it to
> parse a TXT instead of a PDF document:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Data::Dumper;
> use CAM::PDF;
>
> my $pdf = CAM::PDF->new('XenApp_WebInterface_Server_VL04.pdf');
> my $text;
> foreach (1..$pdf->numPages) {
>     $text .= $pdf->getPageText($_);
> }
>
> while($text =~ /Vulnerability Key:\s*
> (\S+)\s+STIG ID:\s*
> (\S+)\s+Release Number:\s*
> (\S+)\s+Status:\s*
> (\S+)\s+Short Name:\s*
> (\S+)\s+Long Name:\s*
> (\S+)\s+IA Controls:\s*
> (\S+)\s+Categories:\s*
> (\S+)\s+Effective Date:\s*
> (\S+)\s+Condition:\s*
> (\S+)\s+Policy:\s*
> (\S+)/g) {
>
> print "<Vuln>
> <Vulnerability_Key_>$1</Vulnerability_Key_>
> <STIG_ID>$2</STIG_ID_>
> <Release_Number_>$3</Release_Number_>
> <Status_>$4</Status_>
> <Short_Name_>$5</Short_Name_>
> <Long_Name_>$6</Long_Name_>
> <IA_Controls_><IA_Control><ID>$7<ID></IA_Control></IA_Controls_>
> <Categories_>$8</Categories_>
> <Effective_Date_>$9</Effective_Date_>
> <Condition_><subitem><title>$10</title><data></data></subitem></Condition_>
> <Policy_>$11</Policy_>
> </Vuln>\n";
> }
>

I've tried to modify the script but I'm all over the place. Should I
use a WHILE statement to read the FILE and then a FOREACH to parse each
set of data? Or the other way around?

Thanks

#!/usr/bin/perl
use strict;
use warnings;

open (FILE, 'XenApp_WebInterface_Server_VL04.txt');

while(<FILE>) {

foreach($_ =~ /Vulnerability Key:\s*
(\S+)\s+STIG ID:\s*
(\S+)\s+Release Number:\s*
(\S+)\s+Status:\s*
(\S+)\s+Short Name:\s*
(\S+)\s+Long Name:\s*
(\S+)\s+IA Controls:\s*
(\S+)\s+Categories:\s*
(\S+)\s+Effective Date:\s*
(\S+)\s+Condition:\s*
(\S+)\s+Policy:\s*
(\S+)/g) {

print "<Vuln>
<Vulnerability_Key_>$1</Vulnerability_Key_>
<STIG_ID_>$2</STIG_ID_>
<Release_Number_>$3</Release_Number_>
<Status_>$4</Status_>
<Short_Name_>$5</Short_Name_>
<Long_Name_>$6</Long_Name_>
<IA_Controls_><IA_Control><ID>$7</ID></IA_Control></IA_Controls_>
<Categories_>$8</Categories_>
<Effective_Date_>$9</Effective_Date_>
<Condition_><subitem><title>$10</title><data></data></subitem></Condition_>
<Policy_>$11</Policy_>
</Vuln>\n";
}
}