document ids in "cached" in Hits and index merge

2005-06-24 Thread jian chen
Hi, I have a stupid question regarding the transient nature of the document ids. As I understand it, documents obtain new doc ids during an index merge. Suppose you do a search and get the Hits object. While you iterate through the documents by id, an index merge happens. How the merge and

Re: Span query performance issue

2005-06-24 Thread jian chen
Hi, I think Span query in general should do more work than a simple Phrase query. Phrase query, in its simplest form, should just try to find all terms that are adjacent to each other. Meanwhile, the terms in a Span query need not be adjacent to each other, but may have other words in between. Therefore, I

issues building a large index

2005-06-24 Thread Lokesh Bajaj
Hi; I am a newcomer to this list and trying out Lucene for the first time. It looks really useful and I am evaluating it for a potentially very large index that my company might need to build. As I was investigating using Lucene, I wanted to know what the performance of optimize/index merg

Span query performance issue

2005-06-24 Thread yahootintin . 11533894
Hi, I'm comparing SpanNearQuery to PhraseQuery results and noticing about an 8x difference on Linux. Is a SpanNearQuery doing 8x as much work? I'm considering diving into the code if the results sound unusual to people. But if it's really doing that much more work, I won't spend time optimiz

Make Document Class a Interface

2005-06-24 Thread Rohit Lodha
Hi all, Why can we not make the Document class an interface? -Rohit

Re: Alternate Lucene Query Highlighter

2005-06-24 Thread Erik Hatcher
David - please create a new issue in Lucene's Bugzilla system (see the Lucene website for a link) and then add your code to the newly created issue. This sounds like a very valuable contribution! Erik On Jun 24, 2005, at 4:46 PM, Bohl, David wrote: I created a lucene query highlighter

Alternate Lucene Query Highlighter

2005-06-24 Thread Bohl, David
I created a lucene query highlighter (borrowing some code from the one in the sandbox) that my company is using. It better handles phrase queries, doesn't break HTML entities, and has the ability to either highlight terms in an entire document or to highlight fragments from the document. I would

Re: IndexSearcher

2005-06-24 Thread Dan Funk
Lucene uses a lock file to prevent simultaneous writes to the index. You can just delete the file at C:\DOCUME~1\tom\LOCALS~1\Temp\Lucene-81022e186820264e5b78801c219b8e8b-commit.lock and be on your way. avrootshell wrote: Hi, I'm using lucene for full text search. It worked great. But now

Re: Field.Keyword vs new Field(String, String, true, true, true)

2005-06-24 Thread Erik Hatcher
On Jun 24, 2005, at 2:46 PM, Yousef Ourabi wrote: I have a quick question on the Field class. What is the difference between this: for () Field content = new Field("content", contentArray[i], true, true, true, true); doc.add(content); and this: doc.add(Field.Keyword("userAlias", userAli

Re: Document ID

2005-06-24 Thread Mario Ivankovits
Hi! Is there any way to force the document id inside the lucene index, if I have my own internal numbering scheme, it would be nice to have that reflected inside the lucene index...anyway? Simply put your ID as an additional field in your document. You should never rely on Lucene's document id as
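
The advice in this reply can be illustrated without Lucene itself. What follows is a toy sketch in plain Java, not Lucene code: the `ToyIndex` class and its `add`, `delete`, `merge`, and `find` methods are hypothetical names invented for illustration. It assumes that a merge compacts away deleted documents, so the position (internal id) of each surviving document can shift, while an application-supplied id stored with the document stays valid.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyIndex {
    // Each slot holds the application-level id of a document, or null if deleted.
    // The slot position plays the role of the internal document id.
    private List<String> slots = new ArrayList<>();

    // Adds a document and returns its current internal id (its position).
    public int add(String myId) {
        slots.add(myId);
        return slots.size() - 1;
    }

    // Marks a document as deleted; the slot stays occupied until the next merge.
    public void delete(int internalId) {
        slots.set(internalId, null);
    }

    // Compacts away deleted slots, which renumbers the surviving documents.
    public void merge() {
        List<String> compacted = new ArrayList<>();
        for (String s : slots) {
            if (s != null) {
                compacted.add(s);
            }
        }
        slots = compacted;
    }

    // Looks a document up by the application-level id; returns -1 if absent.
    public int find(String myId) {
        return slots.indexOf(myId);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("A");
        index.add("B");
        int before = index.add("C");
        index.delete(0);   // delete "A"
        index.merge();     // survivors shift down by one
        int after = index.find("C");
        System.out.println("internal id of C before merge: " + before); // 2
        System.out.println("internal id of C after merge: " + after);   // 1
    }
}
```

A cached internal id (2 in this run) no longer points at "C" after the merge, whereas a lookup by the stored application id still finds it.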

Re: Re-indexing

2005-06-24 Thread Erik Hatcher
On Jun 24, 2005, at 3:15 PM, [EMAIL PROTECTED] wrote: Does lucene have an adaptive re-indexing option? I have indexed several large tables. I need to add extra documents to the tables every now and then. Do I need to re-index the whole table all the time, or is there any way to add the new doc

Re: Document ID

2005-06-24 Thread Erik Hatcher
On Jun 24, 2005, at 3:08 PM, Yousef Ourabi wrote: Hello: Is there any way to force the document id inside the lucene index, if I have my own internal numbering scheme, it would be nice to have that reflected inside the lucene index...anyway? For a domain-centric identifier, use a custom field

Re: QueryParser implicit conjunction

2005-06-24 Thread Erik Hatcher
On Jun 24, 2005, at 2:54 PM, John Fereira wrote: Last month there was a brief thread about changing the implicit conjunction for search terms from an OR to AND with a response that the API provides a setOperator method for doing so. A site I am developing also required that "AND" be the i

Re-indexing

2005-06-24 Thread tareque
Does lucene have an adaptive re-indexing option? I have indexed several large tables. I need to add extra documents to the tables every now and then. Do I need to re-index the whole table all the time, or is there any way to add the new documents to the index with less fuss? Thanks Tareque --

Document ID

2005-06-24 Thread Yousef Ourabi
Hello: Is there any way to force the document id inside the lucene index, if I have my own internal numbering scheme, it would be nice to have that reflected inside the lucene index...anyway? If not, is the document ID created on creation or on addition to the index? And is there a way to retrieve

QueryParser implicit conjunction

2005-06-24 Thread John Fereira
Last month there was a brief thread about changing the implicit conjunction for search terms from an OR to AND with a response that the API provides a setOperator method for doing so. A site I am developing also required that "AND" be the implicit conjunction so I've tried changing that using

Field.Keyword vs new Field(String, String, true, true, true)

2005-06-24 Thread Yousef Ourabi
I have a quick question on the Field class. What is the difference between this: for () Field content = new Field("content", contentArray[i], true, true, true, true); doc.add(content); and this: doc.add(Field.Keyword("userAlias", userAlias )); In the first example where the constructor is used

IndexSearcher

2005-06-24 Thread avrootshell
Hi, I'm using lucene for full text search. It worked great. But now when I try to search, it's throwing an error like this: .java.io.IOException: Lock obtain timed out: [EMAIL PROTECTED]:\DOCUME~1\tom\LOCALS~1\Temp\lucene-81022e186820264e5b78801c219b8e8b-commit.lock at org.apache.l

SimpleHTMLFormatter Class

2005-06-24 Thread Giovanni Dima
I would like to create a text extract that highlights the parts in which certain keywords appear (typical of search engine results). Can the Java classes SimpleHTMLFormatter and Highlighter be used for this? Do they belong to the Lucene package, or are they in another Java package? Thanks. Giovanni

Re: Sorting by field from a given 'offset'

2005-06-24 Thread Stephane Bailliez
Stephane Bailliez wrote: Hi all, "starting from document that has field f1 = a, give me the first n documents sorted by field 'z' ordered by asc/desc" To be more specific, that's something like: [...] // get the document field 'dt' from a search ... // search must occur starting from the d

Re: Re: How Lucene and Nutch work together?

2005-06-24 Thread Giovanni Dima
Hi Andrzej. Thanks a lot for your advice! Using Luke I've resolved my problem! Bye! Giovanni

Sorting by field from a given 'offset'

2005-06-24 Thread Stephane Bailliez
Hi all, I'm trying to do the following: "starting from document that has field f1 = a, give me the first n documents sorted by field 'z' ordered by asc/desc" Is it possible to build a complex query more efficient than: 1) search for the document where f1 = a 2) get the field f2 from this do

Re: Best way to index document page by page?

2005-06-24 Thread Erik Hatcher
On Jun 24, 2005, at 3:28 AM, JMA wrote: Greetings, I have a requirement to search documents page by page. For example, in a 500 page document, if someone searches for "foo", I need to return "Found foo on page 4,6,24,100,223,401, and 455". The way I've implemented this is to index eac

Best way to index document page by page?

2005-06-24 Thread JMA
Greetings, I have a requirement to search documents page by page. For example, in a 500 page document, if someone searches for "foo", I need to return "Found foo on page 4,6,24,100,223,401, and 455". The way I've implemented this is to index each *page* separately, so my 500 page document is a
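
The per-page approach described in this message can be sketched in plain Java. This is a toy inverted index, not Lucene code and not the poster's implementation: the `PageIndex` class and its `addPage` and `pagesContaining` methods are hypothetical names, and the whitespace/punctuation tokenizer is an assumption made to keep the example short. It treats every page as its own indexed unit and records which page numbers contain each term.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class PageIndex {
    // Maps each lower-cased term to the sorted set of page numbers containing it.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Indexes one page as a separate unit, keyed by its page number.
    public void addPage(int pageNumber, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split can yield an empty leading token
            }
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(pageNumber);
        }
    }

    // Returns the pages on which the term occurs, in ascending order.
    public SortedSet<Integer> pagesContaining(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        PageIndex index = new PageIndex();
        index.addPage(4, "foo bar");
        index.addPage(5, "baz");
        index.addPage(6, "Foo again");
        System.out.println("Found foo on page " + index.pagesContaining("foo")); // [4, 6]
    }
}
```

With a real search library, the same idea would presumably mean one indexed document per page, each carrying a page-number field alongside an identifier for the parent document.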