The tokenizer extension that Erik was referring to is actually a lexer for PHP source code, not a general-purpose string tokenizer. Perhaps you are looking for the strtok() function?

www.php.net/strtok
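
For what it's worth, here is a rough sketch of how strtok() could be used to count words once you have the document's text as a plain string. The helper name count_words_strtok and the choice of whitespace as the only delimiters are my own assumptions, and this does nothing about parsing the .DOC format itself:

```php
<?php
// Hypothetical helper (not a built-in): counts whitespace-separated
// words by walking the string with strtok().
function count_words_strtok(string $text): int
{
    // Assumption: a "word" is any run of characters between whitespace.
    $delimiters = " \t\r\n";
    $count = 0;

    // The first call takes the string; later calls take only the delimiters.
    $token = strtok($text, $delimiters);
    while ($token !== false) {
        $count++;
        $token = strtok($delimiters);
    }
    return $count;
}

echo count_words_strtok("In wine there is truth"), "\n";
```

strtok() skips over runs of consecutive delimiters, so double spaces and blank lines should not inflate the count.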

Brad Wright wrote:
Erik, thanks, are you able to point me to some good reference sources on
tokenizers... I have never come across them before


I have been scouring the web, and am coming up a decided blank. :)


Cheers,


Brad


Nel vino la verità, nella birra la forza, nell'acqua i bacilli
--------------------------------------------------------------------------
In wine there is truth, in beer there is strength, in water there are bacteria


From: Erik Price <[EMAIL PROTECTED]>
Date: Tue, 18 Mar 2003 16:45:47 -0500
To: Brad Wright <[EMAIL PROTECTED]>
Cc: PHP General List <[EMAIL PROTECTED]>
Subject: Re: [PHP] Using PHP to get a word count of a MSword doc



Brad Wright wrote:

Thanks for the reply Rene,

Any chance of a code sample of how you did this? I'm not at all experienced in
Java.

According to the manual, PHP does have some tokenizer functions:


http://www.php.net/manual/en/ref.tokenizer.php

However, the documentation appears to be lacking as they are still under
development.  Using it might be somewhat straightforward if you are
accustomed to using a tokenizer in another language (like Java) but if
not, it's really a little too difficult to explain in an email.

A less elegant but ultimately quicker and probably more reliable
solution might be to investigate some kind of external word-counting
program that knows how to parse .DOC files (good luck on that part), and
call this from your PHP script using system().  Catch-22: the only
libraries I am familiar with that can parse .DOC files are the Jakarta
POI libraries, which are written in Java.  But I am sure that if you
scour the web you can find some Perl, Python, or maybe even PHP-based
solution.
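
A minimal sketch of the external-program approach, under some loud assumptions: that a command-line tool able to dump a .DOC file as plain text is installed (antiword is one such tool, but it is not mentioned above and may not be on your server), and that splitting the output on whitespace is an acceptable definition of a word. The helper name count_words_via_command is hypothetical. It uses shell_exec() rather than system(), because shell_exec() returns the command's full output as a string instead of printing it:

```php
<?php
// Hypothetical helper: run an external text extractor on a file and
// count the whitespace-separated words in whatever it prints.
// $extractor is an assumption, e.g. 'antiword' if it is installed.
function count_words_via_command(string $extractor, string $path): ?int
{
    // Escape both parts so a strange file name cannot inject shell commands.
    $command = escapeshellcmd($extractor) . ' ' . escapeshellarg($path);

    // shell_exec() returns null on failure or when there is no output.
    $output = shell_exec($command);
    if (!is_string($output)) {
        return null;
    }

    // Split on runs of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', $output, -1, PREG_SPLIT_NO_EMPTY);
    return count($words);
}

// Example call (assumes antiword exists and report.doc is a real file):
// $total = count_words_via_command('antiword', 'report.doc');
```

The count will only be as good as the extractor's output, so headers, footers, and table text may or may not be included depending on the tool.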


Erik








-- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php


