Re: Cross-field multi-word and query

2005-10-24 Thread Chris Hostetter
I may be wrong, but I think what you are talking about is a BooleanQuery containing several MaxDisjunctionQuery clauses. Take a look at the code in this patch... http://issues.apache.org/jira/browse/LUCENE-323 : Date: Mon, 24 Oct 2005 20:13:55 +0300 : From: Maxim Patramanskij <[EMAIL PROTECTED

Re: Delete doesn't delete?

2005-10-24 Thread Chris Hostetter
Can you provide some sample code that demonstrates this problem? ... preferably something that uses hardcoded data to build up an index in a RAMDirectory so anyone can run the test without needing any external data? It would make it a lot easier for other people to help you diagnose what's going o

Re: Webfarm and Index Location

2005-10-24 Thread Chris Hostetter
This thread proved very useful for several of us when discussing this topic in the past... http://mail-archives.apache.org/mod_mbox/lucene-java-user/200503.mbox/[EMAIL PROTECTED] : Date: Mon, 24 Oct 2005 07:06:49 -0400 : From: [EMAIL PROTECTED] : Reply-To: java-user@lucene.apache.org : To: j

Re: Frustrated with tokenized listing terms

2005-10-24 Thread Chris Hostetter
: I've solved this by indexing the field twice, once as author:(searchable/not : stored/tokenized) : and once as author_phrased:(not searchable/stored/not tokenized). : This works, but is it the proper way to do it? It's the most effective/efficient method I can think of. -Hoss
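
To illustrate why the two-field approach helps with listing distinct authors, here is a minimal sketch in plain Java. It does not use the Lucene API; the class and method names are hypothetical, and the whitespace-and-lowercase "tokenizer" is only a rough stand-in for a real analyzer. It shows that a tokenized field yields word-level terms, while an untokenized copy keeps each full author name as a single value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; this is not Lucene code.
final class AuthorFieldSketch {

    // Rough stand-in for a tokenized field: lowercase, then split on whitespace.
    static Set<String> tokenizedTerms(List<String> authors) {
        Set<String> terms = new TreeSet<>();
        for (String author : authors) {
            for (String token : author.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    terms.add(token);
                }
            }
        }
        return terms;
    }

    // Rough stand-in for an untokenized field: each whole value is kept as one entry.
    static Set<String> untokenizedValues(List<String> authors) {
        return new TreeSet<>(authors);
    }

    public static void main(String[] args) {
        List<String> authors = Arrays.asList("John Smith", "Steve Smith", "John Smith");
        System.out.println(tokenizedTerms(authors));    // [john, smith, steve]
        System.out.println(untokenizedValues(authors)); // [John Smith, Steve Smith]
    }
}
```

The tokenized view is what makes a search for "smith" possible; the untokenized view is what gives back the distinct full names.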

Another index corruption problem

2005-10-24 Thread Bill Tschumy
Many months ago I wrote this list about a corrupted index that one of my customers had. It was a mystery that was never really solved. Well, it has happened again and the stack trace looks almost identical. Here is the exception: java.io.FileNotFoundException: /Users/samegan/Library/Pref

Re: Index downwards compatible?

2005-10-24 Thread Otis Gospodnetic
Eva, Please see the CHANGES file (you can see it directly in the repository), where we record all important changes to the code, including index compatibility changes. Otis --- Eva Rissmann <[EMAIL PROTECTED]> wrote: > Hi all, > currently we are using Lucene 1.3, but soon we'd like to switch t

RE: Indexing problem - empty index files!

2005-10-24 Thread Koji Sekiguchi
Samuel, IndexWriter should be opened once and kept open until all documents are added to the writer; then close the writer. Modified sample code: final IndexWriter writer = new IndexWriter(indexLocation, new StandardAnalyzer(), true); for (Iterator iter = someData.iterator(); iter.hasNext();

Re: lucene and databases

2005-10-24 Thread Rick Hillegas
Thanks, Steven. This is an interesting approach which looks like it gets the user up and running pretty fast. Cheers, -Rick Steven Rowe wrote: Code and examples for embedding Lucene in HSQLDB and Derby relational databases: Rick Hillegas wro

Re: lucene and databases

2005-10-24 Thread Rick Hillegas
Thanks, Chris. I have a couple follow-on questions: 1) Thanks for the pointer to DBSight.net. It seems that DBSight has built some integration support for MySQL. Do you know if there are any plans to build integration support for Derby, the Apache open source database (http://db.apache.org/der

Indexing problem - empty index files!

2005-10-24 Thread Samuel Jackson
Hi to all! I'm new to Lucene and wanted to create a sample application to index certain database fields. But there seems to be some problem because the created files in the index target directory are only 1kb --> So I don't get any results of course. Here is what I did - can anyone give me a hi

Delete doesn't delete?

2005-10-24 Thread Dan Quaroni
I know there's a little bit of trickery when it comes to deletes (i.e. it's still in the index until optimize, still available to open readers, etc) however I'm having this problem: I've implemented a call to delete by term. It tells me that it deleted 1 item, but then I go and open a new read

Re: lucene and databases

2005-10-24 Thread Chris Lu
Also, you can try Compass. I remember it stores the index when you use Hibernate. Chris Lu -- Lucene Full-Text Search on Any Database http://www.DBSight.net On 10/24/05, Chris Lu <[EMAIL PROTECTED]> wrote: > JDBCDirectory doesn't help you to index content in an RDBMS. > It

Re: Is there a way to get absolutely exact phrase matching (no stop words, etc)

2005-10-24 Thread Steven Rowe
Hi Bob, StandardAnalyzer filters the token stream created by StandardTokenizer through StandardFilter, LowercaseFilter, and then StopFilter. Unless you supply a stoplist to the StandardAnalyzer constructor, you get the default set of English stopwords, from StopAnalyzer: public static fin
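
The effect described in this thread can be sketched in plain Java without the Lucene API. The class below is hypothetical: its stop list is a small illustrative subset rather than the real default list, and it assumes that dropped stop words leave no gap between the remaining tokens, which is consistent with the behaviour reported in the question. Under those assumptions, "group effect" matches text that actually reads "group of ~a- The effect".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; this is not Lucene code.
final class StopWordSketch {

    // Small illustrative subset, NOT the full default English stop list.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "of", "the"));

    // Lowercase, split on anything that is not a letter or digit, drop stop words.
    static List<String> analyze(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // True if the analyzed phrase occurs as consecutive tokens in the analyzed text.
    static boolean phraseMatches(String phrase, String text) {
        List<String> phraseTokens = analyze(phrase);
        return !phraseTokens.isEmpty()
                && Collections.indexOfSubList(analyze(text), phraseTokens) >= 0;
    }

    public static void main(String[] args) {
        String text = "...group of ~a- The effect...";
        System.out.println(analyze(text));                       // [group, effect]
        System.out.println(phraseMatches("group effect", text)); // true
    }
}
```

With an empty stop list the same text would analyze to five tokens and the phrase would no longer match, which is the direction the reply points in.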

Index downwards compatible?

2005-10-24 Thread Eva Rissmann
Hi all, currently we are using Lucene 1.3, but soon we'd like to switch to Lucene 1.4. Can the old index be used or does it have to be recreated? And what about Lucene 1.9/2.0. Is the index downwards compatible? Thanks Eva - To

Is there a way to get absolutely exact phrase matching (no stop words, etc)

2005-10-24 Thread Bob Mason
We have a large body of documents that have xml and ocr embedded within one of the xml fields. Searches such as "group effect" are returning hits for docs such as ones that include the following: ...group of ~a- The effect... because, I take it, stop words like 'of' and 'the' and punctuation

Re: Non-scoring fields

2005-10-24 Thread Andrzej Bialecki
Daniel Naber wrote: On Montag 24 Oktober 2005 14:29, Maik Schreiber wrote: Just a quick question: How do I add non-scoring fields to a query? Set boost to 0? Yes, just use permissions:blah^0 However, a side effect of this is that Explanations are broken (return always "0.0: match require

RE: Non-scoring fields

2005-10-24 Thread Mordo, Aviran (EXP N-NANNATEK)
You can also use a filter to filter your results. As far as I know, Filter does not affect the score. HTH Aviran http://www.aviransplace.com -Original Message- From: Maik Schreiber [mailto:[EMAIL PROTECTED] Sent: Monday, October 24, 2005 2:24 PM To: java-user@lucene.apache.org Subject: R

Re: Non-scoring fields

2005-10-24 Thread Maik Schreiber
Just a quick question: How do I add non-scoring fields to a query? Set boost to 0? Yes, just use permissions:blah^0 Cool, thanks. -- Maik Schreiber * http://www.blizzy.de GPG public key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x1F11D713 Key fingerprint: CF19 AFCE 6E3D 5443 959

Re: Non-scoring fields

2005-10-24 Thread Daniel Naber
On Montag 24 Oktober 2005 14:29, Maik Schreiber wrote: > Just a quick question: How do I add non-scoring fields to a query? Set > boost to 0? Yes, just use permissions:blah^0 Regards Daniel -- http://www.danielnaber.de - To

Re: indexwriter and index searcher

2005-10-24 Thread Otis Gospodnetic
Hi Malcolm, --- MALCOLM CLARK <[EMAIL PROTECTED]> wrote: > Hi all, > > I am relatively new and scared by Lucene so please don't flame me. I won't flame, but you just hijacked somebody else's thread baaad boy! > I have abandoned Digester and am now just using other SAX stuff. No need to be

Re: lucene and databases

2005-10-24 Thread Steven Rowe
Code and examples for embedding Lucene in HSQLDB and Derby relational databases: Rick Hillegas wrote: Thanks to Yonik for replying to my last question about queries and filters. Now I have another issue. I would appreciate any pointers to attem

Re: lucene and databases

2005-10-24 Thread Chris Lu
JDBCDirectory doesn't help you to index content in an RDBMS. It just stores the Lucene index in the RDBMS. This approach will be slower than a file system based approach. For your first question, "Indexing content that is stored in a dbms", you can take a look at DBSight. It's a generic tool to easily extr

lucene and databases

2005-10-24 Thread Rick Hillegas
Thanks to Yonik for replying to my last question about queries and filters. Now I have another issue. I would appreciate any pointers to attempts to integrate Lucene with databases. There's a tantalizing reference to a class called JDBCDirectory mentioned at http://wiki.apache.org/jakarta-luce

Re: How Fast is MemoryIndex? How Much Resource Does It Use?

2005-10-24 Thread mark harwood
It is fast. >> so, why not use it for the normal operation as well? Because it only stores one document. Given the number of queries you have I'm not sure I'd run them all. How about putting them as docs into a categorisation index then using the subject document as a query to select a subset of t

Cross-field multi-word and query

2005-10-24 Thread Maxim Patramanskij
I have the following problem: I need to programmatically construct a Boolean query against n fields, with m words in my query. All possible unique combinations (sub-queries) are disjunctive with each other, while the boolean clauses of each combination are combined with the AND operator. The reason of su
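
One way to picture the query shape being asked about is the sketch below, in plain Java. It only builds a query string in the usual Lucene query syntax (`+` marks a required clause); the class name and the field names in the example are hypothetical, and the words are assumed to be plain terms that need no escaping. By distributivity, an OR over every "one field per word" AND-combination is logically the same as requiring, for each word, that it appears in at least one field, so the n^m combinations never have to be enumerated. Scoring may still differ from the enumerated form, which is presumably why the reply points to MaxDisjunctionQuery.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; builds a query string, does not call Lucene.
final class CrossFieldQuerySketch {

    // For each word, emit one required group that ORs the word across all fields.
    static String build(List<String> fields, List<String> words) {
        List<String> requiredGroups = new ArrayList<>();
        for (String word : words) {
            List<String> alternatives = new ArrayList<>();
            for (String field : fields) {
                alternatives.add(field + ":" + word);
            }
            requiredGroups.add("+(" + String.join(" ", alternatives) + ")");
        }
        return String.join(" ", requiredGroups);
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("title", "body"),
                                 Arrays.asList("lucene", "search")));
        // +(title:lucene body:lucene) +(title:search body:search)
    }
}
```

The same nesting can be built programmatically as an outer boolean query whose required clauses are per-word disjunctions over the fields.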

Re: How Fast is MemoryIndex? How Much Resource Does It Use?

2005-10-24 Thread Christophe
Hi, Sam, Is there a reason you couldn't build a test case and try it, in your environment and on your hardware? That seems to be the only way to really answer the question. On 24 Oct 2005, at 09:54, Sam Lee wrote: How much of a performance impact if I store queries as documents first? A

Re: How Fast is MemoryIndex? How Much Resource Does It Use?

2005-10-24 Thread Sam Lee
How much of a performance impact if I store queries as documents first? Actually, I just thought of a way to first select queries with certain quality before doing memoryindex, so it will trim it to much less than 10. But has anyone done MemoryIndex? I need some real-world examples that

Non-scoring fields

2005-10-24 Thread Maik Schreiber
Hi, Just a quick question: How do I add non-scoring fields to a query? Set boost to 0? To be more specific, my documents have a "permissions" field containing the names of groups who are allowed to access the document. When searching, I search for the particular user's group (a user is in exactl

Re: indexwriter and index searcher

2005-10-24 Thread Erik Hatcher
I think you really need to show us some code. If your XML documents are small enough, then perhaps DOM (via JDOM) would be a much simpler way to navigate XML via XPath. Erik On 24 Oct 2005, at 11:07, MALCOLM CLARK wrote: Hi all, I am relatively new and scared by Lucene so please don

Re: indexwriter and index searcher

2005-10-24 Thread MALCOLM CLARK
Hi all, I am relatively new and scared by Lucene so please don't flame me. I have abandoned Digester and am now just using other SAX stuff. I have used the sandbox stuff to parse an XML file with SAX which then bungs it into a document in a Lucene index. The bit I'm stuck on is how is a element

Re: How Fast is MemoryIndex? How Much Resource Does It Use?

2005-10-24 Thread markharw00d
If so, why not use it for the normal operation as well? Because MemoryIndex only allows you to store/query one document. It is fast, but I would not suggest running 1 queries against it. Why not try storing the queries as documents in a special index and query them using the subject documen

Re: indexwriter and index searcher

2005-10-24 Thread Erik Hatcher
On 24 Oct 2005, at 10:07, Dan Adams wrote: If I have a directory open and I open an index writer and add a document do I have to close the directory and re-open it before I can open a searcher and have the new document be included in the search? Yes. In general, is it good to keep the dire

indexwriter and index searcher

2005-10-24 Thread Dan Adams
If I have a directory open and I open an index writer and add a document do I have to close the directory and re-open it before I can open a searcher and have the new document be included in the search? In general, is it good to keep the directory open or is it better to open the document each tim

Re: Recommendation on Reading or Websites or Examples of How to Use Lucene?

2005-10-24 Thread Grant Ingersoll
I find the unit tests in the actual code to be quite helpful. Have also found various talks and articles sprinkled throughout the web. I think the Wiki lists some of these, see: http://wiki.apache.org/jakarta-lucene/HowTo http://wiki.apache.org/jakarta-lucene/Resources Sam Lee wrote: Hi,

Webfarm and Index Location

2005-10-24 Thread msftblows
Hey- I would like to store my index in one location, and then have all my IIS servers on the farm call that one index. Basically, I am looking for the best approach here...and any ideas anyone has... Options: 1. Store index on SAN and have each server call that location...seems this is an

Re: How Fast is MemoryIndex? How Much Resource Does It Use?

2005-10-24 Thread Olena Medelyan
Hi Sam, to do such matching you first of all need something that keeps semantic information about words: e.g. a thesaurus, where "red", "blue" and "black" are all grouped under the same term "colour". Otherwise, how will your system know that "nike red shoes" should match to "nike shoes -black" an

Frustrated with tokenized listing terms

2005-10-24 Thread JMA
Greetings... Quick question, perhaps I am missing something. I have a bunch of documents where one of the indexed fields is "author". For example: book1, by "John Smith" book2, by "Steve Smith" book3, by "John Smith" I would like to find all distinct authors in my index. I want to support sear