Have you tried antiword?
http://www.winfield.demon.nl/
karl
On 11 Jan 2010 at 21:04, maxSchlein wrote:
I was looking for an option for text extraction from a Word doc.
Currently I am using POI; however, when there is a table in the doc, for
each column POI brings back a . The whites
I was looking for an option for text extraction from a Word doc.
Currently I am using POI; however, when there is a table in the doc, for
each column POI brings back a . The whitespace analyzer is not filtering
out this character. So whatever word or phrase that is the last word or
phrase wi
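
One way to work around a stray character that the whitespace analyzer leaves attached to a word is to clean the extracted text before it reaches the analyzer. The following is only a minimal sketch under assumptions: the message does not show which character POI returns, so the code treats any non-whitespace control character as a separator, and the class and method names (ExtractedTextCleaner, replaceControlChars) are hypothetical, not part of POI or Lucene.

```java
// Hypothetical helper, not part of POI or Lucene.
final class ExtractedTextCleaner {
    private ExtractedTextCleaner() {}

    /**
     * Replaces every control character that is not already whitespace
     * with a single space, so a whitespace-based tokenizer splits on it.
     * Assumption: the unwanted cell marker is a control character.
     */
    static String replaceControlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isISOControl(c) && !Character.isWhitespace(c)) {
                out.append(' ');   // turn the marker into a token boundary
            } else {
                out.append(c);     // keep letters, digits, tabs, newlines
            }
        }
        return out.toString();
    }
}
```

The extracted string would be passed through replaceControlChars before being handed to the Field or analyzer. If the offending character turns out not to be a control character, the condition in the loop would need to be adapted to match it.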
Hi,
I'm using Lucene 2.4.1 and am seeing occasional index corruption. It shows
up when I call MultiSearcher.search(). MultiSearcher.search() throws the
following exception:
ArrayIndexOutOfBoundsException. The error is: Array index out of range: ###
where ### is a number representing an index
Super! Thanks for bringing closure.
Mike
On Mon, Jan 11, 2010 at 12:55 PM, Yuliya Palchaninava wrote:
> Thanks again.
>
> Disabling norms, where it was possible without influencing the search quality,
> has solved the problem:
> - The non-optimized version of the index has become smaller.
> - T
Thanks again.
Disabling norms, where it was possible without influencing the search quality,
has solved the problem:
- The non-optimized version of the index has become smaller.
- The optimized index is practically the same size as the non-optimized one.
Yuliya
> -Original Message---
Hey out there,
In Lucene it's not possible to create a Field based on a TokenStream
AND supply a stored value.
Is there a reason why a Field constructor in the form of
public Field(String name, TokenStream tokenStream, String storedValue)
does not exist?
I am using trees of TeeSinkTokenFilter
If you're searching for the terms "giving" and "and", it will only
highlight those terms, not the whole sentence. That's how the
highlighter is meant to work: it highlights what the user queried. Also,
there's no built-in concept of a sentence.
regards,
Sanne
2010/1/11 Li Leon :
> Just figured out, misse
Changing the MultiTermRewriteMethod fixed all the previous incompatibility issues.
After setting this:
myQueryParser.setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
the highlighter is compatible with rewrite, query.rewrite().toString() works as
before, and scoring works fine for wildc