Does Lucene performance suffer with a lot of empty fields?
I have one generic index, but I am indexing a lot of different kinds of things: actors, politicians, scientists, sportsmen. Though there are some common fields, like name & DOB, there are also fields that differ for each of these types of people. For example, actors will have "Movies", "TV shows", politicians will have "Political party...", scientists will have "publications", "inventions", and so on.

I do not want to create multiple indexes, because the number of such types, and hence the number of indices, can get out of hand; e.g. I could decide to add "footballers" or "tennis players". I am sure I am not the first person facing this problem.

From what I gather, I can go ahead and create one index and for each Document only add the relevant fields. Is this correct? I should still be able to search with queries like "mel Movies:braveheart", right? Would this impact search performance? Any other words of caution for me?

Thanks, mek
FileNotFoundException
When the indexing process is still running on an index and I try to search on that index, I get this error message: java.io.FileNotFoundException: \\tradluxstmp01\JavaIndex\tra\index_EN\_2hea.fnm (The system cannot find the file specified). How can I solve this?
RE: Sorting
: I take your point that Berkeley DB would be much less clumsy, but an
: application that's already using a relational database for other purposes
: might as well use that relational database, no?

if you already have some need to access data about each matching doc from a relational DB, then sure, you might as well let it sort for you -- but just because your APP has some DB connections open doesn't mean that's a worthwhile reason to ask it to do the sort ... your app might have some network connections open to an IMAP server as well .. that doesn't mean you should convert the docs to email messages and ask the IMAP server to sort them :)

: I'm not really with you on the random access file, Chris. Here's where I am
: up to with my [mis-]understanding...
:
: I want to sort on 2 terms. Happily these can be ints (the first is an INT
: corresponding to a 10 minute timestamp "YYMMDDHHI" and the second INT is a
: hash of a string, used to group similar documents together within those 10
: minute timestamps). When I initially warm up the FieldCache (first search
: after opening the Searcher), I start by generating two random access files
: with int values at offsets corresponding to document IDs for each of these;
: the first file would have ints corresponding to the timestamp and the second
: would have integers corresponding to the hash. I'd then need to generate a
: third file which is equivalent to an array dimensioned by document ID, with
: document IDs in compound sort order??

i'm not sure why you think you need the third file ... you should be able to use the two files you created exactly the way the existing code would use the two arrays if you were using an in-memory FieldCache (with file seeks instead of array lookups) ..
i think the class you want to look at is FieldSortedHitQueue

: In a big index, it will take a while to walk through all of the documents to
: generate the first two random access files and the sort process required to
: generate the sorted file is going to be hard work.

well .. yes. but that's the trade-off; the reason for the RAM-based FieldCache is speed .. if you don't have that RAM to use, then doing the same things on disk gets slower.

Bear in mind, there have been some improvements recently to the ability to grab individual stored fields per document (FieldSelector is the name of the class i think) ... i haven't tried those out yet, but they could make sorting on a stored field (which wouldn't require building up any cache - RAM or disk based) feasible regardless of the size of your result sets ... but i haven't tried that yet.

-Hoss

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
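The disk-backed approach Hoss describes could be sketched as a custom comparator that seeks into a pre-built file of per-document ints instead of indexing into an in-memory FieldCache array. The following is only a sketch against the Lucene 2.0-era SortComparatorSource/ScoreDocComparator interfaces; the file layout (one 4-byte int per document ID, doc N's value at byte offset 4*N) and the class name are my own assumptions, not code from this thread.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.ScoreDocComparator;
import org.apache.lucene.search.SortComparatorSource;
import org.apache.lucene.search.SortField;

/**
 * Sketch of a file-backed sort: compares documents by seeking into a
 * pre-built file holding one int per document ID, instead of holding
 * an int[] in RAM the way FieldCache does.
 */
public class FileBackedComparatorSource implements SortComparatorSource {
    private final RandomAccessFile values; // built once while warming the Searcher

    public FileBackedComparatorSource(RandomAccessFile values) {
        this.values = values;
    }

    public ScoreDocComparator newComparator(IndexReader reader, String fieldname)
            throws IOException {
        return new ScoreDocComparator() {
            private int valueOf(int doc) {
                try {
                    values.seek(4L * doc);  // file seek replaces array lookup
                    return values.readInt();
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }
            public int compare(ScoreDoc i, ScoreDoc j) {
                int a = valueOf(i.doc), b = valueOf(j.doc);
                return a < b ? -1 : (a == b ? 0 : 1);
            }
            public Comparable sortValue(ScoreDoc i) {
                return new Integer(valueOf(i.doc));
            }
            public int sortType() {
                return SortField.INT;
            }
        };
    }
}
```

This would plug into a Sort via `new SortField("timestamp", new FileBackedComparatorSource(file))`; as Hoss notes, the per-comparison seeks make this far slower than the in-memory cache.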
Re: EMAIL ADDRESS: Tokenize (i.e. an EmailAnalyzer)
: Sure I would love to! Can you ping me at [EMAIL PROTECTED] and
: let me know what I need to do? Do I just post it to JIRA?

instructions on submitting code can be found in the wiki: http://wiki.apache.org/jakarta-lucene/HowToContribute

note in particular that since you are primarily submitting new files, you'll need to "svn add" them locally in order for them to be included in patches created by "svn diff". As for where they might make sense to live: there is an existing "contrib/analyzers" package which might make the most sense.

Also note that while test cases aren't strictly mandatory for newly contributed code, they go a long way towards documenting expected behavior, and encouraging committers to commit it :)

-Hoss
Re: dash-words
Hi Yonik,

>> So a phrase search for "The xmen story" will fail. With a slop of 1 the
>> doc will be found.
>>
>> But when generating the query I won't know when to use a slop. So adding
>> slops isn't a nice solution.
>
> If you can't tolerate slop, this is a problem.

I use the WordDelimiterFilter now without slop, because in other cases it's an improvement. But I (or rather my app) have now stumbled over a non-phrase query: if I am searching for a title named (sorry for the German example) "lage der arbeiterjugend in westberlin" (indexed with WordDelimiterFilter + lowercase) with a query like this:

+arbeiterjugend +west-berlin

I get no results. org.apache.lucene.queryParser.QueryParser.parse produces this query (with WordDelimiterFilter and the default QueryParser.AND_OPERATOR):

+titel:arbeiterjugend +titel:"west (berlin westberlin)"

With +arbeiterjugend +westberlin I get the result. It seems the synonyms don't work within the phrase query. How do you solve this in Solr? Do I have to build a TermQuery?

thanks in advance, martin
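One way around the parser turning the split token into a phrase query is what martin suggests at the end: build the query programmatically, with one TermQuery per alternative spelling, combined so that either form satisfies the clause. This is only an illustrative sketch against the Lucene 2.0-era query API; the field name "titel" comes from the thread, but which terms to OR together would have to come from your own analyzer's output.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class DashWordQuery {
    public static void main(String[] args) {
        // Required term that needs no special handling.
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("titel", "arbeiterjugend")),
                  BooleanClause.Occur.MUST);

        // For "west-berlin", OR the catenated form with the split parts
        // instead of letting the parser build a phrase query.
        BooleanQuery dashWord = new BooleanQuery();
        dashWord.add(new TermQuery(new Term("titel", "westberlin")),
                     BooleanClause.Occur.SHOULD);
        BooleanQuery parts = new BooleanQuery();
        parts.add(new TermQuery(new Term("titel", "west")),
                  BooleanClause.Occur.MUST);
        parts.add(new TermQuery(new Term("titel", "berlin")),
                  BooleanClause.Occur.MUST);
        dashWord.add(parts, BooleanClause.Occur.SHOULD);

        query.add(dashWord, BooleanClause.Occur.MUST);
        System.out.println(query); // rough check of the generated query
    }
}
```

Because the alternatives are plain term clauses rather than a phrase, no slop is involved and the document indexed as "westberlin" matches either input form.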
RE: Sorting
> file seeks instead of array lookups

I'm with you now. So you do seeks in your comparator. For a large index you might as well use java.io.RandomAccessFile for the "array", because there would be little value in buffering when the comparator is liable to jump all around the file.

This sounds very expensive, though. If you don't open a Searcher too frequently, it makes sense (in my muddled mind) to pre-sort to reduce the number of seeks. That was the half-baked idea of the third file, which essentially orders document IDs.

> Bear in mind, there have been some improvements recently to the ability
> to grab individual stored fields per document

I can't see anything like that in 2.0. Is that something in the Lucene HEAD build?

-Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: 01 August 2006 09:37 To: java-user@lucene.apache.org Subject: RE: Sorting
searching Oracle database records using Apache Lucene
Hi All, I am confused about Apache Lucene. I want to search my database table records using Lucene, but what I found is that Lucene is a full-text search engine. Does this mean it is only used to search document text, or something more? I want to search my database, e.g.

select * from tableName where username="abc";

using Lucene. I am using Oracle 9i/Java for this. Any idea/link/suggestions will be much appreciated.

Thanks in advance, Sandip Patil.

-- View this message in context: http://www.nabble.com/searching-oracle-databse-records-using-apache-Lucene-tf2032743.html#a5591986 Sent from the Lucene - Java Users forum at Nabble.com.
Search in Slide with Lucene
Dear All, I am facing an unknown situation. I am using WebDAV search; it is working fine, though I know it is slower than Lucene. I am using jakarta-slide-2.1 and lucene-2.1. I have configured my domain.xml file as: ./index I saw that the store/index folder is getting created, but when searching, Slide is not using the Lucene index. Can anybody tell me:
1) if I am using the right versions of Slide and Lucene;
2) how I can search Slide using the Lucene index;
3) what the structure of the query will be.
Thanks in advance...
Re: searching Oracle database records using Apache Lucene
hi sandip, first get all those fields on which you want to search and store them in some variables. then build the index from these variables. then fire your search query. regards, amit kumar

DISCLAIMER == This e-mail may contain privileged and confidential information which is the property of Persistent Systems Pvt. Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Pvt. Ltd. does not accept any liability for virus infected mails.
Re: FileNotFoundException
> When the indexing process is still running on an index and I try to search
> something on this index I get this error message:
> java.io.FileNotFoundException:
> \\tradluxstmp01\JavaIndex\tra\index_EN\_2hea.fnm (The system cannot find
> the file specified)
>
> How can I solve this?

Could you provide some more context about your application, or a small test case that shows the error happening? This sounds likely to be a locking issue.

Mike
RE: FileNotFoundException
For the index process I use the IndexModifier class. The error happens when I try to search the index at the same time that the index process is still running.

The code for indexing:

System.setProperty("org.apache.lucene.lockDir",
        System.getProperty("user.dir"));
File folder = new File(getIndexPath());
Directory dir = null;
if (folder.isDirectory() && folder.exists()) {
    dir = FSDirectory.getDirectory(getIndexPath(), false);
} else if (!folder.isFile() && !folder.exists()) {
    dir = FSDirectory.getDirectory(getIndexPath(), true);
} else {
    System.out.println("Bad index folder");
    System.exit(1);
}
boolean newIndex = true;
if (dir.fileExists("segments")) {
    newIndex = false;
}
// long lastindexation = dir.fileModified("segments");
writer = new IndexModifier(dir, new SimpleAnalyzer(), newIndex);
dir.close();
writer.setUseCompoundFile(true);
...

The code for searching:

MultiSearcher multisearch = new MultiSearcher(indexsearcher);
Hits hits = this.multisearch.search(this.getBoolQuery());
...

-Original Message- From: Michael McCandless [mailto:[EMAIL PROTECTED] Sent: 01 August 2006 13:45 To: java-user@lucene.apache.org Subject: Re: FileNotFoundException
Re: Seach In slide with lucene
I believe you'll need to inquire with the Slide community, which unfortunately is a bit inactive lately.

Erik

On Aug 1, 2006, at 7:31 AM, aslam bari wrote:
Re: Does lucene performance suffer with a lot of empty fields ?
I can't speak to performance, but there's no problem having different fields for different documents. Stated differently, you don't need to have all fields in all documents. It took me a while to get my head out of database tables and accept this.

I doubt there's a problem with speed, but as always, some measurements over your particular data count most.

Erick

On 8/1/06, Mek <[EMAIL PROTECTED]> wrote:
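A sketch of the pattern Erick describes: each Document gets only the fields that apply to its type, and a type-specific field like "Movies" is simply absent from the other documents. This uses the Lucene 2.0-era Field constructor; the index path, field names, and values are illustrative, taken loosely from the original question.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class SparseFieldsExample {
    public static void main(String[] args) throws Exception {
        IndexWriter writer =
            new IndexWriter("people-index", new StandardAnalyzer(), true);

        // An actor: has a "Movies" field but no "party" field.
        Document actor = new Document();
        actor.add(new Field("name", "Mel Gibson",
                            Field.Store.YES, Field.Index.TOKENIZED));
        actor.add(new Field("Movies", "Braveheart Mad Max",
                            Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(actor);

        // A politician: shares "name" but carries a different field.
        Document politician = new Document();
        politician.add(new Field("name", "Some Politician",
                                 Field.Store.YES, Field.Index.TOKENIZED));
        politician.add(new Field("party", "Some Party",
                                 Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(politician);

        writer.close();
        // A query such as  name:mel Movies:braveheart  can only ever
        // match documents that actually contain those fields.
    }
}
```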
Re: searching Oracle database records using Apache Lucene
You're absolutely right: Lucene is a text searching tool, not a database tool. There's no point in trying to jump through hoops to use Lucene if your database already works for you.

If you're trying to do text searches, particularly if you want to ask questions like "find the words biggest and large within 5 words of each other", then you might want to think about Lucene. Or even if you just want to make simple searches over text.

But to select rows from a database table, there's no reason to try to use Lucene. Use a database API instead.

Best
Erick

On 8/1/06, Sandip <[EMAIL PROTECTED]> wrote:
Re: FileNotFoundException
two things come to mind:

1> Are you absolutely sure that your reader and writer are pointing to the same place? Really, absolutely, positively sure? You've hard-coded the path into both writer and reader just to be really, absolutely, positively sure? Or, you could let the writer close and *then* try the reader, to see if it's a timing issue or a path issue.

2> You say that the indexer is still open. Is there any chance it hasn't yet written anything to disk? I'm not sure of the internals, but there has been some discussion that internally a writer uses a RAMDirectory for a while and then periodically flushes the results to disk. It's possible that your writer hasn't written anything yet.

3> (so I can't count). Have you used Luke to open your index to see if that works (and the file is in the place you expect)?

FWIW
Erick

On 8/1/06, WATHELET Thomas <[EMAIL PROTECTED]> wrote:
RE: FileNotFoundException
It's the same when I try to open the index with Luke.

-Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: 01 August 2006 15:24 To: java-user@lucene.apache.org Subject: Re: FileNotFoundException
Re: searching Oracle database records using Apache Lucene
Erick, I'm not sure that is entirely true. E.g. in the E&P industry we have a bunch of legacy relational databases that are tremendously complex. The presentation layer for them is therefore never good, since the user is exposed to the data model complexity every time he uses the database. So giving up on the structured API idea, flattening out the content, and indexing it seems to be not a bad idea at all, especially if you are facing the problem of integrating results from several legacy databases, or integrating a flat file store with a database.

Regards,
Vasily

On Tue, 2006-08-01 at 09:18 -0400, Erick Erickson wrote:

---
Vasily Borisov Director Business Development Kadme AS Tel: +47 51 87 42 54 Fax: +47 51 87 42 51 Mob: +47 45 20 40 42
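The flattening approach Vasily describes can be sketched as a small ETL step: run a query against the relational source, concatenate the columns worth searching into one full-text field, and index one Lucene Document per row. This is only an illustrative sketch against plain JDBC and the Lucene 2.0-era API; the connection URL, table, and column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class DbFlattener {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
            "jdbc:oracle:thin:@host:1521:db", "user", "password"); // hypothetical
        IndexWriter writer =
            new IndexWriter("db-index", new StandardAnalyzer(), true);

        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery(
            "select id, username, bio from users"); // hypothetical table
        while (rs.next()) {
            Document doc = new Document();
            // Keep the primary key stored but untokenized, so a search
            // hit can be traced back to the source row.
            doc.add(new Field("id", rs.getString("id"),
                              Field.Store.YES, Field.Index.UN_TOKENIZED));
            // Flatten the searchable columns into one full-text field.
            doc.add(new Field("content",
                              rs.getString("username") + " " + rs.getString("bio"),
                              Field.Store.NO, Field.Index.TOKENIZED));
            writer.addDocument(doc);
        }
        writer.optimize();
        writer.close();
        conn.close();
    }
}
```

The design choice here is that Lucene holds a denormalized, search-friendly copy of the data, while the database (or flat file store) remains the system of record that hits resolve back to.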
Re: FileNotFoundException
So it sounds like you're not writing the index to the place you think you are. Have you just looked in the directories and checked that there are files there? If Luke can't find them, they're not where you think they are, especially if your writer had closed before you looked.

Erick

On 8/1/06, WATHELET Thomas <[EMAIL PROTECTED]> wrote:
Re: searching Oracle database records using Apache Lucene
I agree completely. I was mostly responding to what appeared to be an attempt to use Lucene to actually execute a database query, which is entirely different from restructuring legacy data into a more usable form, as you point out, in which case all bets are off.

Erick

On 8/1/06, Vasily Borisov <[EMAIL PROTECTED]> wrote:
Re: searching oracle database records using apache Lucene
On Tue, 2006-08-01 at 15:32 +0200, Vasily Borisov wrote: > the presentation layer for them is never good since the user is > exposed to the data model complexity Isn't that why we have facades?
RE: FileNotFoundException
I'm sure that it's the right location. When the indexing process is finished then I can access the index. I know why, but I don't know how to solve it. When I index, a lot of files with the .cfs extension are created, and after a few seconds the files are merged into another file. E.g. I have a file with the name _8df.cfs, and after a few seconds this file disappears (because it is merged into another file with a new name), so the IndexSearcher can't find it. -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: 01 August 2006 15:49 To: java-user@lucene.apache.org Subject: Re: FileNotFoundException So it sounds like you're not writing the index to the place you think you are. Have you just looked in the directories and checked that there are files there? If Luke can't find them, they're not where you think they are. Especially if your writer had closed before you looked. Erick On 8/1/06, WATHELET Thomas <[EMAIL PROTECTED]> wrote: > > It's the same when I try to open the index with Luke > > -Original Message- > From: Erick Erickson [mailto:[EMAIL PROTECTED] > Sent: 01 August 2006 15:24 > To: java-user@lucene.apache.org > Subject: Re: FileNotFoundException > > Two things come to mind: > > 1> Are you absolutely sure that your reader and writer are pointing to the > same place? Really, absolutely, positively sure? You've hard-coded the path > into both writer and reader just to be really, absolutely, positively sure? > Or, you could let the writer close and *then* try the reader to see if it's > a timing issue or a path issue. > > 2> You say that the indexer is still open. Is there any chance it hasn't yet > written anything to disk? I'm not sure of the internals, but there has been > some discussion that internally a writer uses a RAMdir for a while then > periodically flushes the results to disk. It's possible that your writer > hasn't written anything yet. > > 3> (so I can't count). 
Have you used Luke to open your index to see if that > works (and the file is in the place you expect)? > > FWIW > Erick > > On 8/1/06, WATHELET Thomas <[EMAIL PROTECTED]> wrote: > > > > For the index process I use the IndexModifier class. > > That happens when I try to search something in the index at the same > > time that the index process is still running. > > > > the code for indexing: > > System.setProperty("org.apache.lucene.lockDir", System > > .getProperty("user.dir")); > > File folder = new File(getIndexPath()); > > Directory dir = null; > > if (folder.isDirectory() && folder.exists()) { > > dir = FSDirectory.getDirectory(getIndexPath(), false); > > } else if (!folder.isFile() && !folder.exists()) { > > dir = FSDirectory.getDirectory(getIndexPath(), true); > > } else { > > System.out.println("Bad index folder"); > > System.exit(1); > > } > > boolean newIndex = true; > > if (dir.fileExists("segments")) { > > newIndex = false; > > } > > // long lastindexation = dir.fileModified("segments"); > > writer = new IndexModifier(dir, new SimpleAnalyzer(), newIndex); > > dir.close(); > > writer.setUseCompoundFile(true); > > ... > > > > Code For searching: > > > > MultiSearcher multisearch = new MultiSearcher(indexsearcher); > > Hits hits = this.multisearch.search(this.getBoolQuery()); > > ... > > > > -Original Message- > > From: Michael McCandless [mailto:[EMAIL PROTECTED] > > Sent: 01 August 2006 13:45 > > To: java-user@lucene.apache.org > > Subject: Re: FileNotFoundException > > > > > When the indexing process is still running on an index and I try to > > > search something on this index I retrieve this error message: > > > java.io.FileNotFoundException: > > > \\tradluxstmp01\JavaIndex\tra\index_EN\_2hea.fnm (The system cannot > > > find the file specified) > > > > > > How can I solve this? > > > > Could you provide some more context about your application or a small > > test case that shows the error happening? This sounds likely to be a > > locking issue. 
> > Mike
Re: FileNotFoundException
I think it's a directory access synchronisation problem; I have also posted about this before. The scenario can be like this: when the IndexWriter object is created it reads the segment information from the "segments" file, which is nothing but a list of index files (.cfs and other types). At the same time an IndexSearcher object is created, which also makes a list of index files from the segments file. Then you invoke some write operation, which triggers index merging, fragmenting, etc., and modifies the file list in the segments file. But we still have the IndexSearcher object with the old file list, and that probably throws the FileNotFoundException because physically the file is not there. Maybe I am wrong, but I am trying to shed some light on this issue. I posted a similar problem with the subject "FileNotFoundException: occurs during the optimization of index"; I am also experiencing the similar problem when the index optimization task runs on the index and the search function is running in parallel. thx, supriya WATHELET Thomas wrote: I'm sure that it's the right location. When the indexing process is finished then I can access the index. I know why, but I don't know how to solve it. When I index, a lot of files with the .cfs extension are created, and after a few seconds the files are merged into another file. E.g. I have a file with the name _8df.cfs, and after a few seconds this file disappears (because it is merged into another file with a new name), so the IndexSearcher can't find it. -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: 01 August 2006 15:49 To: java-user@lucene.apache.org Subject: Re: FileNotFoundException So it sounds like you're not writing the index to the place you think you are. Have you just looked in the directories and checked that there are files there? If Luke can't find them, they're not where you think they are. Especially if your writer had closed before you looked. 
RE: FileNotFoundException
Have you solved this problem? -Original Message- From: Supriya Kumar Shyamal [mailto:[EMAIL PROTECTED] Sent: 01 August 2006 16:30 To: java-user@lucene.apache.org Subject: Re: FileNotFoundException I think it's a directory access synchronisation problem; I have also posted about this before. The scenario can be like this: when the IndexWriter object is created it reads the segment information from the "segments" file, which is nothing but a list of index files (.cfs and other types). At the same time an IndexSearcher object is created, which also makes a list of index files from the segments file. Then you invoke some write operation, which triggers index merging, fragmenting, etc., and modifies the file list in the segments file. But we still have the IndexSearcher object with the old file list, and that probably throws the FileNotFoundException because physically the file is not there. Maybe I am wrong, but I am trying to shed some light on this issue. I posted a similar problem with the subject "FileNotFoundException: occurs during the optimization of index"; I am also experiencing the similar problem when the index optimization task runs on the index and the search function is running in parallel. thx, supriya WATHELET Thomas wrote: > I'm sure that it's the right location. > When the indexing process is finished then I can access the index. > I know why, but I don't know how to solve it. > When I index, a lot of files with the .cfs extension are created, and > after a few seconds the files are merged into another file. > E.g. I have a file with the name _8df.cfs, and after a few seconds this > file disappears (because it is merged into another file with a new name), so > the IndexSearcher can't find it. > > -Original Message- > From: Erick Erickson [mailto:[EMAIL PROTECTED] > Sent: 01 August 2006 15:49 > To: java-user@lucene.apache.org > Subject: Re: FileNotFoundException > > So it sounds like you're not writing the index to the place you think > you are. 
Re: FileNotFoundException
I should say not exactly. The temporary solution I made is that I always copy the existing index to a different directory, run the modification or optimization task, and then copy back; something like a flip-flop mechanism:

1. current index <-- searcher
2. copy to --> temp index
3. run optimization on the temp index
4. switch the searcher, so the searcher points to the temp index
5. copy back --> current index <-- switch the searcher back again

This is a somewhat critical issue, and there is some promise in Lucene that the locking mechanism will be much more sophisticated in a future release. Thanks, supriya WATHELET Thomas wrote: Have you solved this problem? -Original Message- From: Supriya Kumar Shyamal [mailto:[EMAIL PROTECTED] Sent: 01 August 2006 16:30 To: java-user@lucene.apache.org Subject: Re: FileNotFoundException I think it's a directory access synchronisation problem; I have also posted about this before. The scenario can be like this: when the IndexWriter object is created it reads the segment information from the "segments" file, which is nothing but a list of index files (.cfs and other types). At the same time an IndexSearcher object is created, which also makes a list of index files from the segments file. Then you invoke some write operation, which triggers index merging, fragmenting, etc., and modifies the file list in the segments file. But we still have the IndexSearcher object with the old file list, and that probably throws the FileNotFoundException because physically the file is not there. Maybe I am wrong, but I am trying to shed some light on this issue. I posted a similar problem with the subject "FileNotFoundException: occurs during the optimization of index"; I am also experiencing the similar problem when the index optimization task runs on the index and the search function is running in parallel. thx, supriya WATHELET Thomas wrote: I'm sure that it's the right location. When the indexing process is finished then I can access the index. 
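Supriya's flip-flop copy scheme could be sketched roughly as below. This is only an illustration, not his actual code: the directory names are invented, and the optimize step is a placeholder comment where the real IndexModifier/optimize work would run against the temp copy while searchers keep reading the live index.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

public class FlipFlopSnapshot {

    // Recursively copy an index directory tree.
    static void copyTree(Path from, Path to) throws IOException {
        try (Stream<Path> s = Files.walk(from)) {
            s.forEach(src -> {
                try {
                    Path dst = to.resolve(from.relativize(src).toString());
                    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    // Recursively delete a directory tree, if it exists.
    static void deleteTree(Path p) throws IOException {
        if (!Files.exists(p)) return;
        try (Stream<Path> s = Files.walk(p)) {
            s.sorted(Comparator.reverseOrder()).forEach(q -> {
                try {
                    Files.delete(q);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    static void snapshotAndSwap(Path current, Path temp) throws IOException {
        // 1. Copy the live index aside; searchers keep reading `current`.
        deleteTree(temp);
        copyTree(current, temp);

        // 2. (Placeholder) run the modification/optimization task on `temp`.

        // 3. Swap: retire the old index and promote the optimized copy,
        //    then searchers are switched over to the new `current`.
        Path old = current.resolveSibling(current.getFileName() + ".old");
        deleteTree(old);
        Files.move(current, old);
        Files.move(temp, current);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an existing live index so the sketch runs end-to-end.
        Path current = Paths.get("index_current");
        Files.createDirectories(current);
        Files.write(current.resolve("segments"), "demo".getBytes());
        snapshotAndSwap(current, Paths.get("index_temp"));
    }
}
```

The swap at the end is two renames rather than a copy-back, which keeps the window during which neither directory is complete as short as possible; the previous index is retained as a backup until the next cycle.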
Re: FileNotFoundException
I think it's a directory access synchronisation problem; I have also posted about this before. The scenario can be like this: when the IndexWriter object is created it reads the segment information from the "segments" file, which is nothing but a list of index files (.cfs and other types). At the same time an IndexSearcher object is created, which also makes a list of index files from the segments file. Then you invoke some write operation, which triggers index merging, fragmenting, etc., and modifies the file list in the segments file. But we still have the IndexSearcher object with the old file list, and that probably throws the FileNotFoundException because physically the file is not there. Maybe I am wrong, but I am trying to shed some light on this issue. I posted a similar problem with the subject "FileNotFoundException: occurs during the optimization of index"; I am also experiencing the similar problem when the index optimization task runs on the index and the search function is running in parallel. Lucene has file-based locking for exactly this reason. Can you double-check that the same lockDir is being used in both your IndexModifier process and your searching process? Also: this directory can't be an NFS mount -- there are known problems with the current Lucene locking implementation and NFS file systems. Are you using NFS? Mike
RE: FileNotFoundException
Yes -Original Message- From: Michael McCandless [mailto:[EMAIL PROTECTED] Sent: 01 August 2006 17:10 To: java-user@lucene.apache.org Subject: Re: FileNotFoundException Lucene has file-based locking for exactly this reason. Can you double-check that the same lockDir is being used in both your IndexModifier process and your searching process? Also: this directory can't be an NFS mount -- there are known problems with the current Lucene locking implementation and NFS file systems. Are you using NFS? Mike
Re: FileNotFoundException
> Yes

Yes, you're certain you have the same lock dir for both the modifier & search process? Or, yes, you're using NFS as your lock dir? Or both? Mike
RE: FileNotFoundException
OK, if I understood correctly, I have to put the lock file in the same place in my indexing process and in my searching process. -Original Message- From: Michael McCandless [mailto:[EMAIL PROTECTED] Sent: 01 August 2006 17:14 To: java-user@lucene.apache.org Subject: Re: FileNotFoundException > Yes Yes, you're certain you have the same lock dir for both the modifier & search process? Or, yes, you're using NFS as your lock dir? Or both? Mike
Re: FileNotFoundException
> OK, if I understood correctly, I have to put the lock file in the same place > in my indexing process and in my searching process. That's correct. And, that place can't be an NFS-mounted directory (until we fix the locking implementation...). The two different processes will use this lock file to make sure it's safe to read from or write to the files in the index. Mike
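Concretely, a minimal sketch of Mike's advice (the directory path here is made up for illustration): both the indexing JVM and the searching JVM set the same `org.apache.lucene.lockDir` before opening the index, instead of each defaulting to its own `user.dir` as in the indexing code earlier in the thread.

```java
public class LockDirConfig {
    // Hypothetical shared, local (non-NFS) location. The value must be
    // identical in BOTH the indexing process and the searching process.
    static final String SHARED_LOCK_DIR = "/var/lucene/locks";

    public static void main(String[] args) {
        // Must be set before any IndexModifier/IndexSearcher is constructed,
        // so that both processes contend on the same lock files.
        System.setProperty("org.apache.lucene.lockDir", SHARED_LOCK_DIR);
        System.out.println(System.getProperty("org.apache.lucene.lockDir"));
    }
}
```

The point is not the particular path but that it is the same in both processes and lives on a local filesystem, so the commit lock actually coordinates the writer and the searchers.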
RE: FileNotFoundException
Ok, thanks a lot. -Original Message- From: Michael McCandless [mailto:[EMAIL PROTECTED] Sent: 01 August 2006 17:19 To: java-user@lucene.apache.org Subject: Re: FileNotFoundException > OK, if I understood correctly, I have to put the lock file in the same place > in my indexing process and in my searching process. That's correct. And, that place can't be an NFS-mounted directory (until we fix the locking implementation...). The two different processes will use this lock file to make sure it's safe to read from or write to the files in the index. Mike
Re: FileNotFoundException
Yes, I use the NFS mount to share the index with the other search instances, and all the instances have the same lock directory configured. The only difference is that the NFS mount is read-only, so I have to disable the lock mechanism for the search instances; the lock is only enabled for the index modification instance. We have a 6-node JBoss cluster for our application, so 5 JBoss instances search on the same index and the 6th instance is used for index updates. supriya Michael McCandless wrote: Lucene has file-based locking for exactly this reason. Can you double-check that the same lockDir is being used in both your IndexModifier process and your searching process? Also: this directory can't be an NFS mount -- there are known problems with the current Lucene locking implementation and NFS file systems. Are you using NFS? 
Mike -- Mit freundlichen Grüßen / Regards Supriya Kumar Shyamal Software Developer artnology GmbH Milastr. 4 10437 Berlin http://www.artnology.com
Re: FileNotFoundException
Yes, I use the NFS mount to share the index with the other search instances, and all the instances have the same lock directory configured. The only difference is that the NFS mount is read-only, so I have to disable the lock mechanism for the search instances; the lock is only enabled for the index modification instance. We have a 6-node JBoss cluster for our application, so 5 JBoss instances search on the same index and the 6th instance is used for index updates. OK, unfortunately this won't work. Well, it will "work", but you'll hit occasional FileNotFoundExceptions on your searchers, whenever a searcher tries to restart itself while the updater is writing a new segments file. Even though the searchers are read-only, they still need to briefly hold the commit lock to ensure the updater doesn't write a new segments file while the searcher is reading it (and opening each segment). We are working towards a fix for lock files over NFS mounts, first by decoupling locking from the directory implementation (http://issues.apache.org/jira/browse/LUCENE-635) and second by creating better LockFactory implementations for different cases (e.g. at least a locking implementation based on native OS locks). But this is still in process... I think the best workaround for now is to take an approach like Solr: http://incubator.apache.org/solr/features.html http://incubator.apache.org/solr/tutorial.html whereby the single writer will occasionally (at a known safe time) make a snapshot of its index, and then the multiple searchers can switch to that index once it's safe. Mike
Re: Search matching
I guess so, but without any information about your code nobody can tell what. If you provide more information you will get help! regards simon On 8/1/06, Rajiv Roopan <[EMAIL PROTECTED]> wrote: Hello, I have an index of locations, for example. I'm indexing one field using SimpleAnalyzer. doc1: albany ny doc2: hudson ny doc3: new york ny doc4: new york mills ny when I search for "new york ny", the first result returned is always "new york mills ny". Am I doing something incorrect? thanks in advance, rajiv
Re: Search matching
Ok, this is how I'm indexing. Both in indexing and searching I'm using SimpleAnalyzer() String loc = "New York, NY"; doc.add(new Field("location", loc, Field.Store.NO, Field.Index.TOKENIZED)); String loc2 = "New York Mills, NY"; doc.add(new Field("location", loc2, Field.Store.NO, Field.Index.TOKENIZED )); and this is how I'm searching... String searchStr = "New York, NY"; Analyzer analyzer = new SimpleAnalyzer(); QueryParser parser = new QueryParser("location", analyzer); parser.setDefaultOperator(QueryParser.AND_OPERATOR); Query query = parser.parse( searchStr ); Hits hits = searcher.search( query ); I've tried all query types and every time "new york mills, ny" is in hits(0). Both results have a score of 1.0. I know I can add some kind of sort to always make the shorter field first. But shouldn't the first, by default, due to the scoring algorithm, be "new york, ny" because it's a shorter field? let me know if I'm missing something. thanks! rajiv On 8/1/06, Simon Willnauer <[EMAIL PROTECTED]> wrote: I guess so, but without any information about your code nobody can tell what. If you provide more information you will get help! regards simon On 8/1/06, Rajiv Roopan <[EMAIL PROTECTED]> wrote: > Hello, I have an index of locations, for example. I'm indexing one field > using SimpleAnalyzer. > > doc1: albany ny > doc2: hudson ny > doc3: new york ny > doc4: new york mills ny > > when I search for "new york ny", the first result returned is always "new > york mills ny". Am I doing something incorrect? > > thanks in advance, > rajiv
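For context on the length question: Lucene's default Similarity of this era computes a field's length norm as 1/sqrt(numTerms), so a 3-term field does get a nominally higher norm than a 4-term one. A quick stdlib-only check of that formula (no Lucene needed; the class and method names here are ours, not Lucene's):

```java
public class LengthNormDemo {
    // Mirrors the formula of Lucene's DefaultSimilarity.lengthNorm(field,
    // numTerms), which returns 1/sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        System.out.println(lengthNorm(3)); // "new york ny"      -> ~0.577
        System.out.println(lengthNorm(4)); // "new york mills ny" -> 0.5
    }
}
```

Note, though, that Lucene stores norms with limited single-byte precision, so small differences like this can be blurred in practice; overriding lengthNorm() in a custom Similarity (as suggested in the reply below, in Query.toString-era API terms) lets you exaggerate the penalty for longer fields.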
Re: FileNotFoundException
For the index process I use the IndexModifier class. That happens when I try to search something in the index at the same time that the index process is still running. the code for indexing: System.setProperty("org.apache.lucene.lockDir", System .getProperty("user.dir")); File folder = new File(getIndexPath()); Directory dir = null; if (folder.isDirectory() && folder.exists()) { dir = FSDirectory.getDirectory(getIndexPath(), false); } else if (!folder.isFile() && !folder.exists()) { dir = FSDirectory.getDirectory(getIndexPath(), true); } else { System.out.println("Bad index folder"); System.exit(1); } boolean newIndex = true; if (dir.fileExists("segments")) { newIndex = false; } // long lastindexation = dir.fileModified("segments"); writer = new IndexModifier(dir, new SimpleAnalyzer(), newIndex); dir.close(); writer.setUseCompoundFile(true); ... BTW, one thing that I don't think is right is the "dir.close()" statement after you create the IndexModifier. I think you should not call dir.close() until you are done with the IndexModifier (i.e., at the same time you call IndexModifier.close()). It sounds like it's unrelated to your NFS locking issue but still could cause other problems... Mike
Re: Search matching
Rajiv,

Have a look at the details provided by IndexSearcher.explain() for those documents, and you'll get some insight into the factors used to rank them. Since both scores are 1.0, you'll probably want to implement your own custom Similarity and override lengthNorm() to adjust that factor.

Another technique you can use is to expand a user's query into a more sophisticated boolean query, such that a user's query for "new york ny" would become (in Query.toString format): +new +york +ny "new york ny", which would boost exact matches.

Erik

On Aug 1, 2006, at 1:19 PM, Rajiv Roopan wrote:
> Ok, this is how I'm indexing. Both in indexing and searching I'm using
> SimpleAnalyzer() [...]
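A hedged sketch of the Similarity override Erik suggests, against the 1.9/2.0-era API where DefaultSimilarity exposes float lengthNorm(String, int). The class name and the exact boost curve are illustrative only; tune against explain() output:

```java
import org.apache.lucene.search.DefaultSimilarity;

// Sketch: penalize longer fields more aggressively than the default
// 1/sqrt(numTerms) curve, so "new york ny" outranks "new york mills ny".
public class ShortFieldSimilarity extends DefaultSimilarity {
    public float lengthNorm(String fieldName, int numTerms) {
        // Illustrative curve only, not a recommendation.
        return numTerms > 0 ? 1.0f / numTerms : 1.0f;
    }
}
```

Note that lengthNorm is baked into the norms at index time, so this would need to be set on the IndexWriter before indexing (and on the Searcher for consistency), not just at search time.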
Search with accents
Hello there, I have a Brazilian Portuguese index, which has been analyzed with BrazilianAnalyzer. When searching words with accents, however, they're not found -- for instance, if the index contains some text with the word "maçã" and I search for that very word, I get no hits, but if I search "maca" (which is another Portuguese word) then the document containing "maçã" is found. I've seen posts in the archive indicating that I should use ISOLatin1AccentFilter to handle this, but I don't quite see how: should I leave indexing as it is and use this filter only for search queries, or should I apply it in both cases? Thank you, Eduardo Cordeiro
RE: Search with accents
Hi,

Have you used the same BrazilianAnalyzer when searching?

Best regards, Lisheng

-Original Message-
From: Eduardo S. Cordeiro [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 01, 2006 1:40 PM
To: java-user@lucene.apache.org
Subject: Search with accents
Re: Search with accents
Yes... here's how I create my QueryParser:

    QueryParser parser = new QueryParser("text", new BrazilianAnalyzer());

2006/8/1, Zhang, Lisheng <[EMAIL PROTECTED]>:
> Hi,
>
> Have you used the same BrazilianAnalyzer when searching?
>
> Best regards, Lisheng
RE: Search with accents
Hi,

In this case I guess we may need to find out what exactly BrazilianAnalyzer does on the input string:

    BrazilianAnalyzer braAnalyzer = new BrazilianAnalyzer();
    TokenStream ts1 = braAnalyzer.tokenStream("text", new StringReader(queryStr));
    ... // what does BrazilianAnalyzer do?

Also what exactly ISOLatin1AccentFilter can do:

    WhitespaceAnalyzer wsAnalyzer = new WhitespaceAnalyzer();
    TokenStream tmpts = wsAnalyzer.tokenStream("text", new StringReader(queryStr));
    TokenStream ts2 = new ISOLatin1AccentFilter(tmpts);
    // what does ISOLatin1AccentFilter do?

to see what is wrong with ts1 and whether ts2 can do a better job. I have never used ISOLatin1AccentFilter before, so I am not sure this is really the right way to test it; I am merely suggesting a way to test.

Best regards, Lisheng

-Original Message-
From: Eduardo S. Cordeiro [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 01, 2006 2:34 PM
To: java-user@lucene.apache.org
Subject: Re: Search with accents

> Yes... here's how I create my QueryParser:
> QueryParser parser = new QueryParser("text", new BrazilianAnalyzer());
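On the "query time, index time, or both?" question: the filter has to see the same text on both sides, so the usual approach is to bake it into one analyzer and use that analyzer for both indexing and searching. A hedged sketch (the class name AccentFoldingBrazilianAnalyzer is made up; ISOLatin1AccentFilter lived in the sandbox/contrib analyzers in this era, so its package may differ by version, and stemming accented vs. unaccented forms may still diverge):

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ISOLatin1AccentFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.br.BrazilianAnalyzer;

// Sketch: strip accents on top of BrazilianAnalyzer's output. The key
// point is to use this SAME analyzer at index time and at query time;
// otherwise indexed terms and query terms won't agree.
public class AccentFoldingBrazilianAnalyzer extends Analyzer {
    private final Analyzer delegate = new BrazilianAnalyzer();

    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new ISOLatin1AccentFilter(delegate.tokenStream(fieldName, reader));
    }
}
```

Documents indexed with the plain BrazilianAnalyzer would need reindexing with the new analyzer for this to work.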
Re: Does lucene performance suffer with a lot of empty fields ?
: From what I gather, I can go ahead & create an Index & for each Document &
: only add the relevant fields. Is this correct?
: I should still be able to search with queries like "mel Movies:braveheart".
: Right ?
:
: Would this impact the search performance ?
: Any other words of caution for me ?

it will absolutely work -- the one performance issue you may want to consider is that by default a "fieldNorm" is computed for every document and every field, and these are kept in memory -- there is a way to turn them off on a per field basis (you have to turn them off for every doc; if even one doc wants a norm for field X, then every doc gets a norm for field X)

how to "omitNorms" for a field, and what the pros (save space) and cons (no "lengthNorm" or "field boosts") are has been discussed extensively in the past. search the archives for anything i've put in quotes and you'll find lots of info on this.

-Hoss
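For reference, one way norms can be turned off per field in the Lucene of this era is Field.Index.NO_NORMS (a sketch, not Hoss's prescription; the field name and value are made up, and note the trade-off that NO_NORMS also indexes the value as a single untokenized term, losing lengthNorm and index-time field boosts -- later Lucene versions expose omitNorms separately from tokenization):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Sketch: a sparse, type-specific field indexed without norms, so the
// many documents that lack it don't pay the one-byte-per-doc norm cost.
Document doc = new Document();
doc.add(new Field("politicalParty", "Whig",
                  Field.Store.YES, Field.Index.NO_NORMS));
```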
RE: Sorting
: I'm with you now. So you do seeks in your comparator. For a large index you
: might as well use java.io.RandomAccessFile for the "array", because there
: would be little value in buffering when the comparator is liable to jump all

yep .. that's what i was getting at ... but i'm not so sure that buffering won't be useful. If i'm not mistaken, all Scorers are by contract expected to score docs in docId order, so when your hits are being collected for sorting, you should always be moving forward in the file -- but you may skip ahead a lot when the result set isn't a high percentage of the total number of docs. (i may be wrong about all Scorers going in docId order ... if you explicitly use the 1.4 BooleanScorer you may not get that behavior, but i think everything else works that way ... perhaps someone else can verify that)

: around the file. This sounds very expensive, though. If you don't open a
: Searcher too frequently, it makes sense (in my muddled mind) to pre-sort to
: reduce the number of seeks. That was the half-baked idea of the third file,
: which essentially orders document IDs.

presort on what exactly, the field you want to sort on? -- that's essentially what the TermEnum is. I'm not sure how having that helps you ... let's assume you've got some data structure (let's not worry about the file/ram or TermEnum distinction just yet) containing every document in your index of 100,000,000 products sorted on the price field, and you've done a search for "apple" and there are 1,000,000 docIds for matching products ready to be collected by your new custom scoring code ... how does the full list of all docIds sorted by price help you as you are given docIds and have to decide if that doc is better or worse than the docs you've already collected?

: > Bear in mind, there have been some improvements recently to the ability to
: > grab individual stored fields per document
:
: I can't see anything like that in 2.0. Is that something in the Lucene HEAD
: build?

I guess so ... search the java-dev archives for "lazy field loading" or "Fieldable" .. that should find some of the discussion about it and the jira issue with the changes.

-Hoss
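The file-as-array idea in this thread can be sketched in plain Java: during warm-up, write one int per document at offset docId * 4; during collection, seek instead of indexing into an in-memory array. (File name, values, and layout here are made up for illustration.)

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class IntFileArray {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("sortvals", ".bin");
        f.deleteOnExit();

        // "Warm-up": one sort value (e.g. a YYMMDDHHI timestamp int)
        // per docId, written at offset docId * 4.
        int[] values = {60801130, 60801140, 60731090, 60801130, 60730235};
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        for (int docId = 0; docId < values.length; docId++) {
            raf.seek((long) docId * 4);
            raf.writeInt(values[docId]);
        }

        // In the comparator: a file seek replaces the array lookup.
        int docId = 2;
        raf.seek((long) docId * 4);
        System.out.println(raf.readInt()); // prints the value for docId 2
        raf.close();
    }
}
```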
Re: About search performance
My question is about dealing with a multi-clause BooleanQuery: the number of clauses is huge and it hurts performance, so I wanted some other method to replace this query and improve performance. I have now achieved that goal by using a filter instead. Thanks for the suggestions.
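The filter approach mentioned above can be sketched against the 1.9/2.0-era Filter API, where a Filter returns a BitSet over docIds: instead of OR-ing thousands of TermQuery clauses (and hitting BooleanQuery.maxClauseCount), mark the matching docs directly. The class name and fields are illustrative, not the poster's code:

```java
import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.Filter;

// Sketch: set a bit for every doc containing any of the given values.
public class ManyTermsFilter extends Filter {
    private final String field;
    private final String[] values; // stand-in for the huge clause list

    public ManyTermsFilter(String field, String[] values) {
        this.field = field;
        this.values = values;
    }

    public BitSet bits(IndexReader reader) throws IOException {
        BitSet bits = new BitSet(reader.maxDoc());
        TermDocs termDocs = reader.termDocs();
        for (int i = 0; i < values.length; i++) {
            termDocs.seek(new Term(field, values[i]));
            while (termDocs.next()) {
                bits.set(termDocs.doc());
            }
        }
        termDocs.close();
        return bits;
    }
}
```

Wrapping it in a CachingWrapperFilter lets the BitSet be reused across searches against the same reader.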
Indexsearcher - one instance in PHP via javabridge?
Hello everyone,

I'm having tons of fun right now with Lucene indexing a large (15 million documents) library. I'm developing the web front end, and I read on this mailing list that it's better to have one instance of IndexSearcher. I'm using Lucene in PHP via JavaBridge (and Tomcat), but I can't figure out how to instantiate a unique copy of IndexSearcher, somehow get it in my webpage for each thread, and destroy it before I add stuff to the index (at the end of each day). I'm trying to follow these instructions, but I have zero experience with Java, JVMs, Tomcat, etc. Could somebody help me with this one? Thanks in advance!

Instructions:

I commend you for giving all the information that's relevant. For the sake of simplicity, and because it is the vast majority of use cases, could you endorse the following as the simplest, most correct way (i.e. a best practice) to implement Lucene for web applications:

1- create an IndexSearcher instance in the servlet's init() method, and cache it in the web application context,
2- in the doGet() or doPost() methods, look up the index searcher instance in the web application context and use it to run queries,
3- close the IndexSearcher in the destroy() method.

This is *simple*, and *correct*. It doesn't create a new IndexSearcher per query, doesn't use a static field, nor a singleton, nor a pool. All ideas that have been suggested but have issues, or are more difficult to implement.
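The three quoted steps can be sketched as a servlet (a hedged illustration against the javax.servlet and Lucene 1.9/2.0 APIs; the attribute name, index path, and query handling are all made up):

```java
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.lucene.search.IndexSearcher;

public class SearchServlet extends HttpServlet {
    private static final String ATTR = "indexSearcher"; // made-up name

    public void init() throws ServletException {
        try {
            // 1. one IndexSearcher, cached in the web application context
            getServletContext().setAttribute(ATTR,
                new IndexSearcher("/path/to/index")); // path is illustrative
        } catch (IOException e) {
            throw new ServletException(e);
        }
    }

    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // 2. look it up and run queries against it
        IndexSearcher searcher =
            (IndexSearcher) getServletContext().getAttribute(ATTR);
        // Query query = ...; Hits hits = searcher.search(query);
    }

    public void destroy() {
        // 3. close it when the webapp shuts down
        try {
            ((IndexSearcher) getServletContext().getAttribute(ATTR)).close();
        } catch (IOException e) {
            // log and ignore on shutdown
        }
    }
}
```

The PHP side would then talk to this servlet over HTTP rather than holding the searcher itself.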
Re: Indexsearcher - one instance in PHP via javabridge?
: I'm trying to follow these instructions, but I have zero experience with
: Java, JVMs, Tomcat, etc. Could somebody help me with this one? Thanks in
: advance!

if you want to eliminate your need to write java code (or servlets) completely, take a look at Solr ... it provides a webservices-ish API for indexing and searching, and handles all of the Lucene "best practices" for you...

http://incubator.apache.org/solr/
http://incubator.apache.org/solr/tutorial.html

There's even some examples on the wiki about how to talk to Solr via PHP (but i don't really know anything about PHP so I can't comment on the quality of the code)

http://wiki.apache.org/solr/SolPHP

-Hoss