Thinking about this: about a week ago there was a discussion here about parsing
Word documents, just to dig the text out of a .doc file. It sounded interesting,
and since I have this crazy idea of building a database of every document ever
created in our company, I'd also need to know what's in those docs. So I
followed the link that was given, and it seems to be pretty easy: just install
a program, pass it a document, and it will extract the text. The link is:
http://wvware.sourceforge.net/
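
A minimal sketch of the "install a program and pass it a document" step. It is
written in Python only to keep it short; PHP's exec() could play the same role.
The command name `wvText` and its "input file, output file" argument order are
assumptions about the wv package, and `doc_to_text` is a made-up helper name,
so check the tool's own documentation before relying on this.

```python
import shutil
import subprocess
from pathlib import Path


def doc_to_text(doc_path, converter="wvText"):
    """Run an external converter on a .doc file and return the extracted text.

    Assumption: the converter is called as  <converter> <input.doc> <output.txt>
    """
    # Fail early with a clear message if the converter is not installed.
    if shutil.which(converter) is None:
        raise FileNotFoundError(f"{converter} is not installed or not on PATH")

    # Write the text next to the original, e.g. report.doc -> report.txt
    out_path = Path(doc_path).with_suffix(".txt")
    subprocess.run([converter, str(doc_path), str(out_path)], check=True)

    # Undecodable bytes are replaced rather than raising an error.
    return out_path.read_text(errors="replace")
```

The returned string is what would go into the database column for searching.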
Check out the site; maybe you can find something similar for RTF documents,
'cause I think that format is much easier to "crack".
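
To show why RTF looks easier to "crack": it is plain text made of {groups},
\control words and ordinary characters. Below is a rough sketch (Python again)
that throws away the markup and keeps the text. The function name and the list
of skipped groups are illustrative choices, it assumes a Windows-1252 code page
for \'hh escapes, and it ignores large parts of the RTF spec (Unicode escapes,
tables, embedded objects), so treat it as a starting point only.

```python
import re

# Groups whose contents are metadata rather than document text.
SKIPPED_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"   # control word, optional number, optional space
    r"|\\'([0-9a-fA-F]{2})"      # hex-escaped byte, e.g. \'e9
    r"|\\([^a-zA-Z])"            # control symbol, e.g. \\ \{ \} \*
    r"|([{}])"                   # group open / close
    r"|[\r\n]+"                  # raw line breaks carry no meaning in RTF
    r"|(.)"                      # ordinary text character
)


def rtf_to_text(rtf):
    out = []
    stack = []        # "skipping" state saved at each opening brace
    skipping = False
    for m in TOKEN.finditer(rtf):
        word, _num, hexbyte, symbol, brace, char = m.groups()
        if brace == "{":
            stack.append(skipping)
        elif brace == "}":
            # Leaving a group restores the state from before it opened.
            skipping = stack.pop() if stack else False
        elif word:
            if word in SKIPPED_DESTINATIONS:
                skipping = True
            elif not skipping and word in ("par", "line"):
                out.append("\n")
            elif not skipping and word == "tab":
                out.append("\t")
        elif symbol:
            if symbol == "*":
                skipping = True          # {\*\...} marks an ignorable group
            elif not skipping and symbol in "\\{}":
                out.append(symbol)       # escaped literal character
        elif hexbyte:
            if not skipping:
                out.append(bytes.fromhex(hexbyte).decode("cp1252", errors="replace"))
        elif char and not skipping:
            out.append(char)
    return "".join(out)


sample = r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0 Hello, \b world\b0 !\par Caf\'e9}"
print(rtf_to_text(sample))
```

The font table is dropped, the bold switches vanish, \par becomes a newline and
\'e9 becomes an accented e.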
hth
Dezider.
Plutarck wrote:
> Ick...I'd say it's a good idea, but it's going to be a bi...tter fight with
> technology.
>
> First, you have to have some application do the loading/unloading. PHP can't
> do that, of course.
>
> But, you could use some form of java...but you'd have to get fancy. Or you
> could just use file upload in a form, which is easier.
>
> If you do that, you need only parse out the file.
>
> The best way to do that is pick a text format that does what you want it to
> do, and is universal across platforms. You don't even need to worry about
> the editor they use, as long as it's saved in the proper format.
>
> I recommend you use either a Word document, or perhaps Rich Text Format
> (RTF) is best.
>
> Then you just have to figure out how text is saved in that format, and
> voila. You just use PHP to go from there...
>
> ...I'm sure it's easier said than done, and I have absolutely no clue how
> the content of rtf files is different from txt (but I'd love to know!), but
> I can see it being very possible if you pick only a few standard file
> formats, and use the file upload features.
>
> It's actually a very good idea. I'm surprised no one has done it...which
> should probably worry you ;)
>
> --
> Plutarck
> Should be working on something...
> ....but forgot what it was.
--
PHP General Mailing List (http://www.php.net/)