Re: Memory Leak when using Custom Sort (i.e., DistanceSortSource) of LocalLucene with Lucene

2008-06-10 Thread Otis Gospodnetic
Hi Ethan, Yes, it would be good to have this in JIRA. Please see http://wiki.apache.org/lucene-java/HowToContribute for info about generating the patch, etc. Thanks, Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Ethan Tao <[EMAIL PROTEC

Re: swapping Lucene's on RAM drive

2008-06-10 Thread Anshum
Hi Tom, Just a suggestion: why don't you try explicitly using tmpfs (assuming you use a Unix/Linux server)? I tried both the RAMDirectory implementation and tmpfs, and found the latter more appealing. Firstly, in this case you'll have more control over cleaning up/retaining data as per your needs. Also, u

Re: retrieve all docs efficiently - just one field

2008-06-10 Thread Karl Wettin
On 11 Jun 2008 at 00:35, 1world1love wrote: We have our lucene index and we want to search the section text for the word "panama" AND We want to select from the demographics table where age > 50. -- Now I need to intersect the master table IDs from my lucene hits and my table results. I
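
The intersection step described in the quoted question can be sketched in plain Java. This is only a minimal illustration, assuming the master-table IDs from the Lucene hits and from the SQL query have already been collected into two lists of long values. The names `IdIntersection` and `intersect` are hypothetical and not part of Lucene.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: intersects two lists of master-table IDs.
class IdIntersection {

    // Returns the IDs present in both inputs, keeping the order of luceneIds.
    static Set<Long> intersect(List<Long> luceneIds, List<Long> tableIds) {
        Set<Long> result = new LinkedHashSet<>(luceneIds);
        // Wrap tableIds in a HashSet so each membership check is O(1).
        result.retainAll(new HashSet<>(tableIds));
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample IDs standing in for the two result sets.
        List<Long> fromLucene = Arrays.asList(7L, 3L, 42L, 19L);
        List<Long> fromTable = Arrays.asList(19L, 7L, 100L);
        System.out.println(intersect(fromLucene, fromTable));
    }
}
```

Whether it is cheaper to intersect in memory like this or to push one ID list into the other system depends on the sizes of the two result sets, which the thread does not state.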

Scoring filters

2008-06-10 Thread Karl Wettin
Each of my filters represents a single boosting term query. But when using the filter instead of the boosting term query I lose the score (not sure this is true) and payload boost (if any), both essential for the quality of my results. If I was to add payloads to the bits that are set, what

retrieve all docs efficiently - just one field

2008-06-10 Thread 1world1love
Greetings all. I have read many posts concerning similar use cases, but I am still a little hazy on the best way to achieve what I need to do. Here is the background: 2 million documents with multiple sections, some sections contain structured data, some unstructured. We parse the docs and place

Re: number of hits per document

2008-06-10 Thread Spencer Tickner
Hi John, Sorry I don't have a solution for you but I'm trying to do the same thing. I would love to hear from you if you have any success with this. Cheers, Spencer [EMAIL PROTECTED] On Tue, Jun 10, 2008 at 6:28 AM, John Byrne <[EMAIL PROTECTED]> wrote: > Hi, > > I could do it that way, but cou

Re: FileNotFoundException in ConcurrentMergeScheduler

2008-06-10 Thread Grant Ingersoll
On Jun 10, 2008, at 3:35 PM, Michael McCandless wrote: Grant, Can you describe any details on how this app is using Lucene? It's in Solr using the trunk. EG are you using autoCommit=false or true? ac=false Is more than one thread adding documents to the index? I don't believe so, b

Re: FileNotFoundException in ConcurrentMergeScheduler

2008-06-10 Thread Michael McCandless
Grant, Can you describe any details on how this app is using Lucene? EG are you using autoCommit=false or true? Is more than one thread adding documents to the index? Any changes to the defaults in IndexWriter? After seeing that exception, does IndexReader.open also hit that exception

RE: indexing images of a document

2008-06-10 Thread Steven A Rowe
Hi Bernd, On 06/10/2008 at 2:40 PM, Bernd Mueller wrote: > Actually it is not about searching the binary data. I just > wanna have the whole document (together with the "image stuff") > to be stored in the index. So when I got a hit of a search on a > document that I can extract the document with

RE: indexing images of a document

2008-06-10 Thread Bernd Mueller
Hi Steve, Actually it is not about searching the binary data. I just wanna have the whole document (together with the "image stuff") to be stored in the index. So when I got a hit of a search on a document that I can extract the document with its images (and the images' meta information). Nice w

Re: FileNotFoundException in ConcurrentMergeScheduler

2008-06-10 Thread Grant Ingersoll
Hi Paul, Not sure if this was resolved, but I don't think it was. Can you try reproducing this with setCompoundFile(false)? That is, turn off compound files. I have an intermittent report of an exception that looks eerily similar that I am trying to track down and I am not using CFS and

Re: How to get the error position in QueryParser/ParseException

2008-06-10 Thread Chris Hostetter
: : When using "new QueryParser(...).parse(...)" I'd like to get the : position where the error was detected (to show it to the user). : See (and run) the code below. : : This is not possible via "e.currentToken" (that's null). Nevertheless : this position will be printed within the getMessage(

Re: Concurrent query benchmarks

2008-06-10 Thread Glen Newton
Thanks for the positive feedback. :-) Yes, right now the benchmark only uses one IndexSearcher for all threads, but I have completed an extension that allows you to either 1) have multiple searchers for the same index; or 2) have multiple indexes (copies of one another) with a single searcher per

Re: Concurrent query benchmarks

2008-06-10 Thread Chris Lu
Good work! I would like to see how it performs with several index reader instances, which is said to increase concurrency. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight.com Lucene Datab

RE: Running Lucene in a Clustered Environment

2008-06-10 Thread lutan
I have the same problem to solve, and I have also been researching it recently. I heard that: Lucene is a highly optimized inverted index search engine. It stores a number of inverted indexes in a custom file format that is highly optimized to ensure that the indexes can be loaded by searchers quic

RE: The performance of lucene searching(web entironment) test

2008-06-10 Thread lutan
Thanks for the reply! In my test case, I ran LoadRunner for just 5 minutes, and response times grew slowly. The TPS (transactions per second) finally seemed to level off at 10. I will run the test again for a longer time. In addition, does Lucene have a bottleneck related to the number of documents or inde

RE: indexing images of a document

2008-06-10 Thread Steven A Rowe
Hi Bernd, It's still not clear what you want to do. What will a search look like? On 06/10/2008 at 8:36 AM, Bernd Mueller wrote: > I will try to explain what I mean with image stuff. An image in > xml-documents is usually an url to the location where the image is > stored. Additionally, such an

Re: Concurrent query benchmarks

2008-06-10 Thread Glen Newton
2008/6/9 Otis Gospodnetic <[EMAIL PROTECTED]>: > Hi Glen, > > Thanks for sharing. Does your benchmarking tool build on top of > contrib/benchmark? (not sure if that one lets you specify the number of > concurrent threads -- if it does not, perhaps this is an opportunity to add > this functional

Re: The performance of lucene searching(web entironment) test

2008-06-10 Thread Toke Eskildsen
On Tue, 2008-06-10 at 21:11 +0800, lutan wrote: > [A lot of text with code and no newlines, making it very hard to read] In your test you're reusing the searcher. For each search your program performs, you will see faster response times, until the searcher is fully warmed. If your production-syst

Re: swapping Lucene's on RAM drive

2008-06-10 Thread Tom
Thanks for responding, Anshum. I also posted this on the Compass forum, but I was not sure where the responsibility lay. But your information did give me a good start. From Compass' RAMDirectoryStore source code: "uses org.apache.lucene.store.RAMDirectory" So apparently I use two separate

Re: number of hits per document

2008-06-10 Thread John Byrne
Hi, I could do it that way, but counting the spans per document is specific to SpanQuerys. I would still have to count hits for TermQuerys separately. I was looking for a generic way to count hits for any instance of Query within a document. To put it another way, the ability to find the Term

The performance of lucene searching(web entironment) test

2008-06-10 Thread lutan
I have recently done some tests on Lucene. I do not know whether the test results are normal. Hardware environment: Intel(R) Xeon(R) CPU 5110 @ 1.60GHz, 4GB RAM. Software environment: CentOS 4.6 + Sun JDK 1.5 + JBoss + Lucene 2.3.2 + je-analysis (a Chinese analyzer). There are 10 million+ documents, which total about 3GB. test

Re: number of hits per document

2008-06-10 Thread Grant Ingersoll
A SpanQuery is just a Query, so the traditional way of Querying still applies, i.e. you get back a list of matching documents. Beyond that, if you just want to operate on the spans, just keep track of how often the doc() method changes. HTH, Grant On Jun 9, 2008, at 11:21 AM, John Byrne wr
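
The suggestion above, tracking how often doc() changes, can be sketched without any Lucene dependency. The example assumes the span matches arrive as a sequence of document IDs in non-decreasing order, as they would if one recorded doc() after each successful next() call. `SpanHitCounter` and `countPerDoc` are hypothetical names, and the int array merely stands in for the real span enumeration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: counts span matches per document by watching for doc ID changes.
class SpanHitCounter {

    // docIdsInSpanOrder: one entry per span match, grouped by document (assumption).
    static Map<Integer, Integer> countPerDoc(int[] docIdsInSpanOrder) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        int currentDoc = -1; // no document seen yet
        int count = 0;
        for (int doc : docIdsInSpanOrder) {
            if (doc != currentDoc) {
                // The doc ID changed: store the finished document's count.
                if (count > 0) {
                    counts.put(currentDoc, count);
                }
                currentDoc = doc;
                count = 0;
            }
            count++;
        }
        // Store the count for the last document, if any.
        if (count > 0) {
            counts.put(currentDoc, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sequence: two spans in doc 0, three in doc 3, one in doc 8.
        System.out.println(countPerDoc(new int[] {0, 0, 3, 3, 3, 8}));
    }
}
```

As the original poster notes elsewhere in this thread, this only covers span-based queries. It does not give a generic per-document hit count for arbitrary Query instances.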

Re: indexing images of a document

2008-06-10 Thread Bernd Mueller
Hi Erick, Thanks for your fast reply. I will try to explain what I mean with image stuff. An image in xml-documents is usually an url to the location where the image is stored. Additionally, such an image tag contains information like offset, width and height. I am thinking about storing the

Re: indexing images of a document

2008-06-10 Thread Erick Erickson
You add as many fields to the document while indexing as you need to correctly contain your "image stuff". If this answer seems cryptic, it's as clear as your problem statement. To give you a meaningful answer, we need a much clearer problem statement. What is "image stuff", what format is it in,

indexing images of a document

2008-06-10 Thread Bernd Mueller
Hello, I have XML-documents containing image information. These images should be indexed with the document by having one additional field with the image stuff. Could anyone please give me some hints how I can manage this? Regards, Bernd ---

Re: Running Lucene in a Clustered Environment

2008-06-10 Thread Shalin Shekhar Mangar
Hi Kalani, Are you aware of Apache Solr? On Tue, Jun 10, 2008 at 2:32 PM, Eric Bowman <[EMAIL PROTECTED]> wrote: > Kalani Ruwanpathirana wrote: > >> Hi, >> >> Thanks for the reply. It seems that SAN is not an option for my case. >> However the other option is acceptable though I have to do some

Memory Leak when using Custom Sort (i.e., DistanceSortSource) of LocalLucene with Lucene

2008-06-10 Thread Ethan Tao
Hi, We hit a memory leak when using DistanceSortSource of LocalLucene for repeated queries/searches. After about 450 queries, we get an out-of-memory error. After digging into the code, we found that the source of the problem is in the Lucene package, in the way it handles "custom" type compara

Re: swapping Lucene's on RAM drive

2008-06-10 Thread Anshum
Hi Tom, perhaps this concerns Compass and you may want to post this query there. In case you are asking about running 2 instances of Lucene on the machine with 2 separate indexes in RAM, could you specify: * Do you use tmpfs or do you load it using the RAMDirectory class?

swapping Lucene's on RAM drive

2008-06-10 Thread Tom
I have not been able to find much information about this, hence this question. Currently I use Lucene through Compass with the data stored in RAM. The indexed information is updated daily and therefore I create a new Compass/Lucene combination every day, let it load the new data and then swap

Re: Running Lucene in a Clustered Environment

2008-06-10 Thread Eric Bowman
Kalani Ruwanpathirana wrote: Hi, Thanks for the reply. It seems that SAN is not an option for my case. However the other option is acceptable though I have to do some extra work with clustering. I got to know that there is an option called "Database clustered local search" ( http://bugs.sakai

Re: Running Lucene in a Clustered Environment

2008-06-10 Thread Kalani Ruwanpathirana
Hi, Thanks for the reply. It seems that SAN is not an option for my case. However the other option is acceptable though I have to do some extra work with clustering. I got to know that there is an option called "Database clustered local search" ( http://bugs.sakaiproject.org/confluence/display/

Re: Compass - Reloading Domain Object Defintiion Files

2008-06-10 Thread Konstantyn Smirnov
I was having a similar problem. See here: http://www.nabble.com/Alternative-to-Compass-Searchable-plugin-tp17248352p17248352.html