Hello,
I wrote an indexer that parses some files and indexes them using Lucene. I
want to benchmark the whole thing, so I'd like to count the tokens
being indexed so I can calculate the average number of indexed tokens
per second. Is there a way to count the number of tokens in a document?
While I'm
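
For the benchmarking question above, here is a minimal sketch in plain Java of the arithmetic involved. It assumes tokens are split on whitespace only (roughly what a whitespace-only analyzer would produce), and TokenRate, countTokens and tokensPerSecond are hypothetical names, not Lucene API. The real count depends on the analyzer in use, so treat this as an approximation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: counts whitespace-separated
// tokens and converts a count plus elapsed time into tokens per second.
class TokenRate {
    // Assumption: a token is any maximal run of non-whitespace characters.
    static long countTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split("\\s+").length;
    }

    // Average throughput for a token count measured over elapsedNanos.
    static double tokensPerSecond(long tokens, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        return tokens / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("homeowners work", "a  b\tc");
        long start = System.nanoTime();
        long total = 0;
        for (String doc : docs) {
            total += countTokens(doc); // the real indexing call would go here
        }
        long elapsed = Math.max(1, System.nanoTime() - start);
        System.out.println(total + " tokens, "
                + tokensPerSecond(total, elapsed) + " tokens/s");
    }
}
```

Counting inside the same loop that feeds the indexer keeps the token total and the timing window aligned.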
On Oct 30, 2008, at 7:28 PM, Anshul jain wrote:
I want to give more weight to some terms in the document. For example, the
title of the book should be given more weight than the contents. And we are
testing over a wide variety of Lucene queries: with quotes, w/o quotes,
phrase, span, etc.
If the w
For indexing, I use the following:
===
writer = new IndexWriter(INDEX_DIR,new WhitespaceAnalyzer(),true
,IndexWriter.MaxFieldLength.UNLIMITED);
Document doc = new Document();
String tmpword = this.getProperForm(word1, word2);
doc.add(new Field("WORDS", tmpword, Field.Store.YES,
Yes, the problem goes away when I do the following:
synchronized(doc)
{
doc.add(field);
}
Thanks.
[I'll use a Lock to do this properly]
-glen
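
A minimal sketch of the Lock-based variant mentioned above, in plain Java. SimpleDoc is a hypothetical stand-in for a non-thread-safe document (it is not Lucene's Document class), and LockedAdds is an illustrative name. The idea is only that every add goes through the same lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a document that is not thread-safe.
class SimpleDoc {
    private final List<String> fields = new ArrayList<String>();

    void add(String field) {
        fields.add(field);
    }

    int size() {
        return fields.size();
    }
}

class LockedAdds {
    // Starts `threads` workers that each add `perThread` fields to one
    // shared document, guarding every add with the same lock.
    static int addConcurrently(int threads, final int perThread) {
        final SimpleDoc doc = new SimpleDoc();
        final ReentrantLock lock = new ReentrantLock();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try {
                        doc.add("field-" + id + "-" + i);
                    } finally {
                        lock.unlock(); // always release, even if add throws
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return doc.size();
    }
}
```

An alternative that avoids the lock entirely is to let each thread build its own fields and have a single thread add them to the document.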
2008/10/31 Yonik Seeley <[EMAIL PROTECTED]>:
> On Fri, Oct 31, 2008 at 11:53 AM, Glen Newton <[EMAIL PROTECTED]> wrote:
>> I have concurrent threads
On Fri, Oct 31, 2008 at 11:53 AM, Glen Newton <[EMAIL PROTECTED]> wrote:
> I have concurrent threads adding Fields to the same Document, but
> getting some odd behaviour.
> Before going into too much depth, is Document thread-safe?
No, it's not.
synchronizing on Document when adding a new field wo
Hello,
I am using Lucene 2.3.1.
I have concurrent threads adding Fields to the same Document, but
getting some odd behaviour.
Before going into too much depth, is Document thread-safe?
thanks,
Glen
http://zzzoot.blogspot.com/
Thanks very much for your help. I will let you know
if we can improve it later in any way.
Best regards, Lisheng
-----Original Message-----
From: Albert Juhe [mailto:[EMAIL PROTECTED]
Sent: Friday, October 31, 2008 5:49 AM
To: java-user@lucene.apache.org
Subject: Re: Any Spanish analyzer available?
Hi
Hi,
This is my first version, it isn't fast, because I want to get this
information without modifying index.
Now I'm working to improve it (including freeling).
public String docsTerme(IndexReader reader, String terme) {
String resultat = "";
TermPositions tP;
ArrayList a
You need to give us more information for meaningful replies, like
the analyzers you use when indexing and searching, the exact
query you use, perhaps the snippets of the code, etc.
That said, things to check:
Get a copy of Luke and examine your index. You can even
run queries through that tool and
Was my message sent successfully?
I received this automated response from [EMAIL PROTECTED] right after sending
the message!
===
Dear sender,
Delivery of your message has failed. This is an automatic reply.
The domain magentanews.com has been changed and is no longer in use. Please rese
Hi,
Actually, I'm using a Spanish analyzer for my search engine. I don't know if
it's the best, but it's useful for my purpose.
http://www.nabble.com/file/p20265229/SpanishAnalyzer.java
SpanishAnalyzer.java
http://www.nabble.com/file/p20265229/SpanishStemFilter.java
SpanishStemFilter.java
http:/
I have documents containing multiple words in the field "word".
For example, one of the documents contains the following in the field "word":
homeowners work
When searching for single words (e.g. homeowners) I get hits.
However, searching for the exact phrase "homeowners work" gives me no hits
Thanks for the quick reply :). For now, I'd settle with just storing cache
values in soft references so at least the GC would be able to free up some
space when it needs to.
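
A rough sketch of what such a soft-reference cache could look like in plain Java. SoftCache is a hypothetical class for illustration, not Lucene's own cache; the values are wrapped in java.lang.ref.SoftReference so the GC may reclaim them under memory pressure.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache for illustration: values are held only softly, so the
// garbage collector is allowed to clear them when memory runs low.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map =
            new ConcurrentHashMap<K, SoftReference<V>>();

    void put(K key, V value) {
        map.put(key, new SoftReference<V>(value));
    }

    // Returns the cached value, or null if it was never stored or has
    // already been reclaimed; callers must be ready to recompute it.
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key, ref); // drop the stale entry for this key
        }
        return value;
    }
}
```

The trade-off is that a cleared entry has to be rebuilt on the next access, which can be slow for large sort caches.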
I think I'll just try to override the default sorting mechanism by
subclassing FieldSortedHitQueue. I'll let you know how i
20 fields on a huge index? Wow - not sure there is a ton you can do with
that...anyone have any suggestions for that one? Distributed should help
I suppose, but that's a lot of sort fields for a large index.
If LUCENE-831 ever gets off the ground you will be able to change the
cache used, and p
Hi,
I'm having a similar problem with my application, although we are using
Lucene 2.3.2. The problem we have is that we are required to sort on most of
the fields (20 at least). Is there any way of changing the cache being used?
I can't seem to find a way, since the cache is being accessed using
Erick Erickson wrote:
I'm not sure what *could* be easier than looping with IndexSearcher.doc(),
looping from 0 to maxDoc - 1. Of course you'll have to pay some attention to
whether you get a document back or not, and I'm not quite sure whether you'd
have to worry about getting deleted documents. But
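
A sketch of the loop described above, written against a hypothetical DocSource interface rather than the real Lucene classes so that it runs on its own. The method names are chosen to mirror IndexReader.maxDoc(), isDeleted(int) and document(int) from Lucene 2.x; ArrayDocSource and LiveDocs are illustrative only. It assumes document numbers start at 0 and that deleted slots must be skipped explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an index reader; not a Lucene type.
interface DocSource {
    int maxDoc();                 // one greater than the largest doc number
    boolean isDeleted(int docId); // true if the slot holds a deleted doc
    String document(int docId);   // stored content for a live doc
}

// Array-backed implementation for illustration: a null slot means "deleted".
class ArrayDocSource implements DocSource {
    private final String[] docs;

    ArrayDocSource(String[] docs) {
        this.docs = docs;
    }

    public int maxDoc() {
        return docs.length;
    }

    public boolean isDeleted(int docId) {
        return docs[docId] == null;
    }

    public String document(int docId) {
        return docs[docId];
    }
}

class LiveDocs {
    // Visits every document number from 0 to maxDoc - 1 and keeps only
    // the documents that have not been deleted.
    static List<String> collect(DocSource source) {
        List<String> live = new ArrayList<String>();
        for (int i = 0; i < source.maxDoc(); i++) {
            if (source.isDeleted(i)) {
                continue; // skip deleted slots instead of fetching them
            }
            live.add(source.document(i));
        }
        return live;
    }
}
```

For example, collecting from new ArrayDocSource(new String[] {"a", null, "c"}) yields the two live documents "a" and "c".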