> -----Original Message-----
> From: Southworth, Harry [mailto:[EMAIL PROTECTED]]
> Sent: Friday, January 24, 2003 10:38 PM
> To: [EMAIL PROTECTED]
> Subject: MS Word question
> 
> 
> I'm running Perl on Cygwin on top of Windows 2000.
> 
> I have a lot of ASCII text files that someone has thoughtfully saved as
> Microsoft Word documents. If I open them in a text editor, I can see the
> ASCII text, but there is some junk at the top and bottom. Testing the files
> with -T tells me they're not text.
> 
> grep seems able to search through the text in the files, but I'd like to
> strip the junk out and save them as simple text files. I tried a few things
> in Perl, but didn't get anywhere.
> 
> I'd be grateful if someone were able to provide some pointers.
> 

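Well, you could use Win32::OLE to automate saving the docs as text.
Word does the conversion itself, so it has to be installed on the machine.
The process is fairly simple: open each document, then save it with the
plain-text format code (2).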

<snip>
use strict;
use Win32::OLE;

Win32::OLE->Option(Warn => sub {&error});

use constant TEXTFORMAT => 2;

my $wd;
my $doc;

my $docin  = 'c:\temp\doccy.doc';
my $docout = 'c:\temp\doccy.txt';

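# Start Word; the 'Quit' destructor shuts it down when $wd is released,
# so no stray WINWORD.EXE process is left behind.
$wd = Win32::OLE->new('Word.Application', 'Quit')
    or die Win32::OLE->LastError;
$doc = $wd->Documents->Open($docin);
$doc->SaveAs($docout, TEXTFORMAT);   # 2 = save as plain text
$doc->Close();

undef $wd;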

sub error
{
        die Win32::OLE->LastError;
}
</snip>
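
Since you have a lot of files, here is a rough sketch of the same idea
in a loop, reusing one Word instance for all of them. The names
txt_path_for, convert_all and WD_FORMAT_TEXT are made up for this
example, and it assumes the documents are passed on the command line as
full Windows paths ending in .doc, on a Perl that has Win32::OLE
installed. Untested against your files, so try it on copies first.

```perl
use strict;
use warnings;

use constant WD_FORMAT_TEXT => 2;    # same value as TEXTFORMAT above

# Derive the output name: doccy.doc -> doccy.txt (plain string work, no OLE).
sub txt_path_for {
    my ($doc_path) = @_;
    (my $txt_path = $doc_path) =~ s/\.doc$/.txt/i;
    return $txt_path;
}

# Convert every document given, sharing a single Word instance.
sub convert_all {
    my @docs = @_;

    require Win32::OLE;              # loaded only when a conversion is requested
    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die "Cannot start Word: " . Win32::OLE->LastError . "\n";

    for my $in (@docs) {
        # Skip anything not named *.doc so an input file is never overwritten.
        next unless $in =~ /\.doc$/i;

        my $doc = $word->Documents->Open($in);
        unless ($doc) {
            warn "Skipping $in: " . Win32::OLE->LastError . "\n";
            next;
        }
        $doc->SaveAs(txt_path_for($in), WD_FORMAT_TEXT);
        $doc->Close();
    }
    return;
}

convert_all(@ARGV) if @ARGV;
```

Called as, say, perl doc2txt.pl 'c:\temp\one.doc' 'c:\temp\two.doc'
(doc2txt.pl being whatever you name the script), it should leave a .txt
file next to each .doc. Files that Word cannot open are reported and
skipped, so one bad document does not stop the run.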
