Chris Hostetter wrote:
: I think though, that I will need a setter on the reader, rather than the
: writer. That is, I don't know what factor we want until I know how
: large the index is. And I don't know how large the index will be at the
: time of creating the writer, but I can just ask for
Overall avg. freq.? No, but you should be able to calculate that
yourself.
Otis
--- Supheakmungkol SARIN <[EMAIL PROTECTED]> wrote:
> Thanks for your help.
>
> By the way does Lucene provide any API to retrieve the
> average frequency of a term in the index directly? My
> goal is to compare the
Dear all,
I'd like to add some other stopwords to the
StandardAnalyzer. How do I do this?
Thanks a lot in advance,
Mungkol
--
Thanks for your help.
By the way, does Lucene provide any API to retrieve the
average frequency of a term in the index directly? My
goal is to compare the freq. of a term in a doc. with
the average freq. of that term across all the indexed docs
in order to retrieve the good keywords.
Regards,
Mungkol
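The comparison described above (a term's frequency in one document against its average frequency over all indexed documents) can be sketched without any Lucene calls. This is only a minimal illustration of the arithmetic, assuming the documents are already tokenized into lists of strings; the class and method names (TermFrequencySketch, frequency, averageFrequency) are hypothetical and are not part of the Lucene API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Lucene class: plain-Java sketch of the arithmetic.
final class TermFrequencySketch {

    /** Counts how many times the term occurs in one tokenized document. */
    static int frequency(List<String> docTokens, String term) {
        int count = 0;
        for (String token : docTokens) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    /**
     * Mean number of occurrences of the term per document, taken over all
     * documents (documents that do not contain the term count as zero).
     */
    static double averageFrequency(List<List<String>> docs, String term) {
        if (docs.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (List<String> doc : docs) {
            total += frequency(doc, term);
        }
        return (double) total / docs.size();
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "index", "lucene", "lucene"),
                Arrays.asList("index", "search"),
                Arrays.asList("lucene", "search"));

        double average = averageFrequency(docs, "lucene");
        int inFirstDoc = frequency(docs.get(0), "lucene");

        // A ratio well above 1 suggests the term is a keyword candidate for that doc.
        System.out.println("freq in doc 0 / average freq = " + (inFirstDoc / average));
    }
}
```

Against a real index the counts would come from the index itself rather than from in-memory token lists; whether to average over all documents or only over those containing the term is a design choice this sketch does not settle.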
On Nov 13, 2005, at 6:27 PM, Chris Hostetter wrote:
I believe if you really want to determine settings like this after
building the index, you'll need to do an initial build of the index using
best guess values -- then if the calculations you do once the index is
built aren't close enough to your
On Nov 13, 2005, at 8:19 PM, Friedland, Zachary (EDS - Strategy) wrote:
What is the largest lucene index that has been built? We're looking to
build a sort of data warehouse that will hold transaction log files as
long as possible. This index would grow at the rate of 10 million
documents per m
Largest index? Who knows! :)
Lucene's internal limit is the size of the doc Id (max Integer).
People typically roll their indices when they reach a certain size, but
if you don't need your queries to be fast and always need all the data,
then this may not make sense for you (well, it still may, a
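As a rough back-of-the-envelope check, the doc Id ceiling mentioned above can be set against the growth rate of 10 million documents per month from the question. The sketch below assumes a single index, a constant growth rate, and that every added document consumes one doc Id; the class and method names are hypothetical.

```java
// Hypothetical helper: estimates how long a single index can grow
// before the number of documents exceeds the largest int value.
final class DocIdHeadroom {

    /** Whole months of growth that fit under Integer.MAX_VALUE documents. */
    static long monthsUntilLimit(long docsPerMonth) {
        if (docsPerMonth <= 0) {
            throw new IllegalArgumentException("docsPerMonth must be positive");
        }
        return Integer.MAX_VALUE / docsPerMonth;
    }

    public static void main(String[] args) {
        long months = monthsUntilLimit(10000000L);
        System.out.println(months + " months, roughly " + (months / 12.0) + " years");
    }
}
```

Under these assumptions the ceiling is about 214 months away, a little under 18 years, so query speed and index size on disk are likely to matter well before the doc Id limit does.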
Check out Lucene from CVS and look in the contrib/ directory:
contrib/miscellaneous/src/java/org/apache/lucene/misc/HighFreqTerms.java
Otis
--- Supheakmungkol SARIN <[EMAIL PROTECTED]> wrote:
> Dear all,
>
> I'd like to extract each term and its frequency in the
> index and each file in order
What is the largest lucene index that has been built? We're looking to
build a sort of data warehouse that will hold transaction log files as
long as possible. This index would grow at the rate of 10 million
documents per month indefinitely. Is there a limit where lucene will
fail? What should
Dear all,
I'd like to extract each term and its frequency in the
index and each file in order to get the potential
keywords of each file. Does Lucene provide any
built-in method to do that?
Thank you in advance,
Mungkol
: I think though, that I will need a setter on the reader, rather than the
: writer. That is, I don't know what factor we want until I know how
: large the index is. And I don't know how large the index will be at the
: time of creating the writer, but I can just ask for maxDoc() at the time
: o
On Nov 13, 2005, at 6:11 PM, Daniel Noll wrote:
Now, to figure out how to set it.
There's no setter that I can see... then again it may be in trunk,
and just not in the version we're stuck on for the time being.
I haven't checked 1.4.3, but yes, I'm looking at the subversion
trunk. It's
Marvin Humphrey wrote:
You want indexInterval. Here's an excerpt from the docs in
TermInfosWriter.
Excellent, that looks like exactly what we're after. Now, to figure out
how to set it.
There's no setter that I can see... then again it may be in trunk, and
just not in the version we're
Hi, Karl,
Looking at the Lucene 1.2 source code, it looks to me as though the
MultiFieldQueryParser generates a BooleanQuery. Each sub-query within the
BooleanQuery is for one field. The actual calculation of the score is done
in BooleanScorer.java, where the scores from the sub-queries are
accumulated.
So,
Hello all,
I have a question about searching within multiple fields. I have the
following code for doing that (searchFields provides two fields in which I
want to search):
IndexSearcher searcher = new IndexSearcher(indexDirectory);
// search over multiple index fields
Query query = MultiFieldQuer
: Oh...ok. Where is this method created then, I can't seem to find it in
: QueryParser?
grep for "Query Query"
-Hoss
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Hello Sebastian,
thank you for sharing your experience. I am happy that I am not the only
person with this problem.
I have read the previous paper by Robertson et al.
http://citeseer.ist.psu.edu/robertson04simple.html
where they wrote about the danger of using combined scores and provided a
solut
Oh...ok. Where is this method created then, I can't seem to find it in
QueryParser?
Thanks.
--
Regards,
Eugene
Erik Hatcher wrote:
:)
Query(field) in this case is a method call.
Erik
On 13 Nov 2005, at 13:39, Eugene Ezekiel wrote:
I got this nagging problem that I can't figure out in the source
code of Lucene.
In the file org/apache/lucene/queryParser/QueryParser.java,
there's a method called parse that returns a Query (see below):
public Query parse(String query) thr
I got this nagging problem that I can't figure out in the source code of
Lucene.
In the file org/apache/lucene/queryParser/QueryParser.java,
there's a method called parse that returns a Query (see below):
public Query parse(String query) throws ParseException {
ReInit(new FastCharStream(n
On Sun, Nov 13, 2005 at 12:04:41AM +0100, Karl Koch wrote:
> My aim is to combine these two scores. The Lucene score is normalised
> between 0.0 and 1.0 (if the score exceeded 1.0 at some point) or less than
> 1.0 (if it did not). The user model looks the same in this perspective -
> although base