I also use JTidy, but not for Lucene parsing. There is no easy way of
handling this; you simply have to remove all the crappy Microsoft inserts
as they come.
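A minimal sketch of what "removing the inserts as they come" could look like, assuming the two most common kinds of Word residue: conditional comments and namespaced tags such as <o:p>. The class name OfficeHtmlCleaner and both patterns are my own illustration, not part of JTidy or Lucene, and regular expressions are only a rough pre-filter to run before the real HTML parser.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not a JTidy or Lucene class): strips some
// Office-specific markup from an HTML string before it is parsed.
class OfficeHtmlCleaner {
    // Conditional comments, e.g. <!--[if gte mso 9]> ... <![endif]-->
    private static final Pattern CONDITIONAL_COMMENT = Pattern.compile(
            "<!--\\[if[^\\]]*\\]>.*?<!\\[endif\\]-->",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Namespaced tags, e.g. <o:p>, </o:p>, <v:shape ...>.
    // Only the tags are removed; any text between them is kept.
    private static final Pattern NAMESPACED_TAG = Pattern.compile(
            "</?[a-z]+:[^>]*>",
            Pattern.CASE_INSENSITIVE);

    static String clean(String html) {
        String out = CONDITIONAL_COMMENT.matcher(html).replaceAll("");
        return NAMESPACED_TAG.matcher(out).replaceAll("");
    }

    public static void main(String[] args) {
        String sample = "<!--[if gte mso 9]><xml><w:WordDocument/></xml><![endif]-->"
                + "<p>Hello<o:p></o:p></p>";
        System.out.println(clean(sample));
    }
}
```

Ordinary tags such as <a href="http://..."> are left alone, because the second pattern only matches when a colon directly follows the tag prefix. Other Office residue (mso-* style attributes, for example) would need further patterns added as it turns up.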
-Original Message-
From: Gaston [mailto:[EMAIL PROTECTED]
Sent: 03 December 2005 13:49
To: java-user@lucene.apache.org
Subject: best ht
On Dec 3, 2005, at 1:26 PM, Jeff Rodenburg wrote:
In one of the Google Labs whitepapers (
http://labs.google.com/papers/mapreduce-osdi04.pdf), a programming construct
known as MapReduce is used in a variety of jobs/tasks within Google's
operation. As an example of the application of MapRedu
On Saturday 03 December 2005 14:09, Andrzej Bialecki wrote:
> Paul Elschot wrote:
>
> >In somewhat more readable layout:
> >
> >+(url:term1^4.0 anchor:term1^2.0 content:term1
> > title:term1^1.5 host:term1^2.0)
> >+(url:term2^4.0 anchor:term2^2.0 content:term2
> > title:term2^1.5 host:term2^
In one of the Google Labs whitepapers (
http://labs.google.com/papers/mapreduce-osdi04.pdf), a programming construct
known as MapReduce is used in a variety of jobs/tasks within Google's
operation. As an example of the application of MapReduce, the whitepaper
refers to Distributed Sorting.
Essent
Hello,
JTidy is a very good HTML parser, but for HTML pages made with Microsoft
Office products such as Word it is not optimal, because it returns
"Microsoft-specific HTML tags" instead of only text.
Or how should I handle HTML pages whose source begins like this:
"
http://ww
Paul Elschot wrote:
In somewhat more readable layout:
+(url:term1^4.0 anchor:term1^2.0 content:term1
title:term1^1.5 host:term1^2.0)
+(url:term2^4.0 anchor:term2^2.0 content:term2
title:term2^1.5 host:term2^2.0)
url:"term1 term2"~2147483647^4.0
anchor:"term1 term2"~4^2.0
content:"term1 t
Doug Cutting wrote:
Andrzej Bialecki wrote:
For a simple TermQuery, if the DF(term) is above 10%, the response
time from IndexSearcher.search() is around 400ms (repeatable, after
warm-up). For such complex phrase queries the response time is around
1 sec or more (again, after warm-up).
Ar
On Dec 2, 2005, at 6:21 PM, John Powers wrote:
Hello,
Lucene only lets you use a wildcard after a term, not before, correct?
What workarounds are there for that?
If I have an item 108585-123
And another 332323-123
How can I look for all the -123 family of items?
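One workaround that is often suggested for this kind of suffix lookup is to store each term a second time in reversed form, so that "ends with -123" becomes "starts with 321-", which is an ordinary prefix search. The sketch below shows only the idea in plain Java, with a sorted set standing in for the extra index field; the class ReversedTermIndex and its methods are hypothetical names of mine, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration (not Lucene API): a sorted set of reversed
// terms stands in for a second index field holding reversed tokens.
class ReversedTermIndex {
    private final NavigableSet<String> reversedTerms = new TreeSet<>();

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    public void add(String term) {
        reversedTerms.add(reverse(term));
    }

    // A term ends with `suffix` exactly when its reversed form
    // starts with the reversed suffix, so a prefix scan is enough.
    public List<String> endingWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> hits = new ArrayList<>();
        for (String reversed : reversedTerms.tailSet(prefix, true)) {
            if (!reversed.startsWith(prefix)) {
                break; // sorted order: no later entry can match
            }
            hits.add(reverse(reversed));
        }
        return hits;
    }

    public static void main(String[] args) {
        ReversedTermIndex index = new ReversedTermIndex();
        index.add("108585-123");
        index.add("332323-123");
        index.add("108585-456");
        // Finds the two items of the -123 family.
        System.out.println(index.endingWith("-123"));
    }
}
```

In a real index the same effect would presumably come from adding a reversed copy of the item number as an extra field at indexing time and running a prefix query against it; the cost is a larger index, in exchange for avoiding a scan over every term.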
To clarify something that no