On Wed, May 27, 2009 at 6:27 PM, Stephen Reese <rsre...@gmail.com> wrote:
> List,
>
> I've been working on a method to parse a PDF or TXT document and
> output the results to XML over at Experts Exchange.
> http://www.experts-exchange.com/Programming/Languages/Scripting/Perl/Q_24439630.html
>
> You may view the attached document or if the mailing list doesn't
> allow here is a copy of the document I would like to parse:
> http://filedb.experts-exchange.com/incoming/2009/05_w22/143310/XenApp-Secure-Gateway-Server-VL0.txt
>
> Basically I would like to take the following code and modify it to
> parse a TXT instead of a PDF document:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Data::Dumper;
> use CAM::PDF;
>
> my $pdf = CAM::PDF->new('XenApp_WebInterface_Server_VL04.pdf');
> my $text;
> foreach (1..$pdf->numPages) {
>        $text .= $pdf->getPageText($_);
> }
>
> while($text =~ /Vulnerability Key:\s*
> (\S+)\s+STIG ID:\s*
> (\S+)\s+Release Number:\s*
> (\S+)\s+Status:\s*
> (\S+)\s+Short Name:\s*
> (\S+)\s+Long Name:\s*
> (\S+)\s+IA Controls:\s*
> (\S+)\s+Categories:\s*
> (\S+)\s+Effective Date:\s*
> (\S+)\s+Condition:\s*
> (\S+)\s+Policy:\s*
> (\S+)/g) {
>
> print "<Vuln>
> <Vulnerability_Key_>$1</Vulnerability_Key_>
> <STIG_ID>$2</STIG_ID_>
> <Release_Number_>$3</Release_Number_>
> <Status_>$4</Status_>
> <Short_Name_>$5</Short_Name_>
> <Long_Name_>$6</Long_Name_>
> <IA_Controls_><IA_Control><ID>$7<ID></IA_Control></IA_Controls_>
> <Categories_>$8</Categories_>
> <Effective_Date_>$9</Effective_Date_>
> <Condition_><subitem><title>$10</title><data></data></subitem></Condition_>
> <Policy_>$11</Policy_>
> </Vuln>\n";
> }
>

I've tried to modify the script but I'm all over the place. Should I
use a WHILE statement to read the FILE and then FOREACH to parse
each set of data? Or the other way around? Thanks

#!/usr/bin/perl
use strict;
use warnings;

open (FILE, 'XenApp_WebInterface_Server_VL04.txt');

while(<FILE>)
{
foreach($_ =~ /Vulnerability Key:\s*
(\S+)\s+STIG ID:\s*
(\S+)\s+Release Number:\s*
(\S+)\s+Status:\s*
(\S+)\s+Short Name:\s*
(\S+)\s+Long Name:\s*
(\S+)\s+IA Controls:\s*
(\S+)\s+Categories:\s*
(\S+)\s+Effective Date:\s*
(\S+)\s+Condition:\s*
(\S+)\s+Policy:\s*
(\S+)/g) {

print "<Vuln>
<Vulnerability_Key_>$1</Vulnerability_Key_>
<STIG_ID>$2</STIG_ID_>
<Release_Number_>$3</Release_Number_>
<Status_>$4</Status_>
<Short_Name_>$5</Short_Name_>
<Long_Name_>$6</Long_Name_>
<IA_Controls_><IA_Control><ID>$7<ID></IA_Control></IA_Controls_>
<Categories_>$8</Categories_>
<Effective_Date_>$9</Effective_Date_>
<Condition_><subitem><title>$10</title><data></data></subitem></Condition_>
<Policy_>$11</Policy_>
</Vuln>\n";
}
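
One possible direction, offered as a sketch rather than a tested fix for the real document: each record spans several lines, so a regex applied to one line at a time inside while(<FILE>) never sees a whole record. Reading the entire file into a single string first lets the same while (... =~ /.../g) loop from the PDF version do the work. The names slurp, xml_escape, records_to_xml and @labels are made up for this illustration. The sketch also assumes that values may contain several words and that each record ends at a blank line or at end of file, which may not hold for the actual TXT export. The nested IA_Control and subitem wrappers are left out for brevity.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Field labels, in the order the original regex expects them.
my @labels = (
    'Vulnerability Key', 'STIG ID', 'Release Number', 'Status',
    'Short Name', 'Long Name', 'IA Controls', 'Categories',
    'Effective Date', 'Condition', 'Policy',
);

# "Label:" followed by a lazily captured value, joined by whitespace.
# ASSUMPTION: the last value ends at a blank line or at end of input.
my $pattern   = join '\s+', map { quotemeta($_) . ':\s*(.*?)' } @labels;
my $record_re = qr/$pattern\s*(?=\n\s*\n|\z)/s;

# Read a whole file into one string so a record can span many lines.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;    # no record separator: <$fh> returns the entire file
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Turn every record found in $text into one <Vuln> element.
sub records_to_xml {
    my ($text) = @_;
    my $xml = '';
    while ($text =~ /$record_re/g) {
        # Copy the captures before any other regex overwrites them.
        my @values = ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11);
        $xml .= "<Vuln>\n";
        for my $i (0 .. $#labels) {
            (my $tag = $labels[$i]) =~ tr/ /_/;
            $tag .= '_';    # keep the trailing underscore used in the post
            $xml .= "<$tag>" . xml_escape($values[$i]) . "</$tag>\n";
        }
        $xml .= "</Vuln>\n";
    }
    return $xml;
}

# Usage: perl script.pl XenApp_WebInterface_Server_VL04.txt
print records_to_xml(slurp($ARGV[0])) if @ARGV;
```

Because the captures are lazy and the pattern is matched with /s, a record that is missing one of the labels would cause a capture to run on into the next record. That would need extra handling before relying on the output.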


