Hi

I need to extract the text from a Word document.

I know how to:

Print a Word document using the default printer:

------------------------------------------------------------------------
use strict;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Word';

my $Word = Win32::OLE->new('Word.Application', 'Quit');
# $Word->{'Visible'} = 1; # if you want to see what's going on
$Word->Documents->Open("C:\\DOCUMENTS\\test.doc")
    or die("Unable to open document ", Win32::OLE->LastError());
$Word->ActiveDocument->PrintOut({
    Background => 0,
    Append     => 0,
    Range      => wdPrintAllDocument,
    Item       => wdPrintDocumentContent,
    Copies     => 1,
    PageType   => wdPrintAllPages,
});
------------------------------------------------------------------------
Create a new Word document:
------------------------------------------------------------------------
use Win32::OLE;
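# check if Word is installed: GetActiveObject croaks if the server is
# not registered, so trap that with eval before testing $@
my $x = eval { Win32::OLE->GetActiveObject('Word.Application') };
die "Word not installed" if $@;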

# start Word if it is not already running; die if unable to
unless (defined $x) {
        $x = Win32::OLE->new('Word.Application', sub { $_[0]->Quit; } )
                or die 'Cannot start Word';
}

# Create new document
my $d = $x->Documents->Add;
# define selection
my $s = $x->Selection;
# set lines to be written to document
my @line = ('This is a test line',
        'This is second test Line',
        'This is the third line',
);

# $c is the color
# $start is the start of Range
# $end is the end of Range
# $r is the Range object
my ($c, $start, $end, $r) = (2, 0, 0, );
foreach (@line)
{ 
        $end += length($_) + 1;
        # put the text
        $s->TypeText($_);
        # define the Range
        $r = $d->Range($start, $end);
        # Set font to 12 and color
        $r->Font->{Size} = 12;
        $r->Font->{ColorIndex} = $c++;
        $s->TypeText("\n");
        $start = $end;
}

# List the Range object's properties
ListObj($r);
# List the Document object's properties
ListObj($d);
sub ListObj
{
        # use the object passed in, not the global $r
        my ($obj) = @_;
        foreach (sort keys %$obj)
        {
                print "Keys: $_ - $obj->{$_}\n";
        }
}
undef $x;
------------------------------------------------------------------------
Now I want to extract the text from the Word document and store it in a
string for manipulation.
I've been hacking at it for two days now, but success remains elusive :(
I would also like to use this as an MS Word to plain text conversion
utility.
Any help would be greatly appreciated.

-aman




