Error Tolerant Query Parser

2007-04-03 Thread Mohsen Saboorian
Sorry for dual posting. I inadvertently submitted the form before writing the body :) Has an error-tolerant query parser ever been written for Lucene? How do websites handle advanced searching with Lucene?

Error-Tolerant

2007-04-03 Thread Mohsen Saboorian

Re: Limitations of lucene

2007-04-03 Thread Otis Gospodnetic
speed: depends on query complexity, concurrency, hardware, and size of index; document size: no practical limit, as far as I know; security: Lucene doesn't deal with it, your app has to implement security. Otis

Re: Lucene Optimizations

2007-04-03 Thread Otis Gospodnetic
Hi Nilesh, I have a few queries regarding optimizing lucene for search performance. 1. We index around 50 million text documents with index size greater than 40GB, and hence runtime performance is crucial. Our system has only simple keyword queries. Each search returns an object of type Hits whic

Lucene Optimizations

2007-04-03 Thread Nilesh Bansal
Hi all, I have a few queries regarding optimizing lucene for search performance. 1. We index around 50 million text documents with index size greater than 40GB, and hence runtime performance is crucial. Our system has only simple keyword queries. Each search returns an object of type Hits which

OT re Emulating Pages Search

2007-04-03 Thread Jason Pump
If the documents have some sort of fixed ranking value (pageweight) and the documents are arranged in the index in that order then at some point you can say there is no reason to look for more matches, e.g. even if the words were next to each other in query order, the document couldn't possibly

Re: Limitations of lucene

2007-04-03 Thread ashwin kumar
Yes, please give me the limitations of Lucene in speed, size, document length and other such factors, and also security issues concerning Lucene. Regards, ashwin On 4/3/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote: Is there some area in particular you are interested in? Speed, size, document length,

Unique City, State results from index based on zip

2007-04-03 Thread freaktet
I am having the following problem. I have an index I built from a standard US Zip Code table. Users can search for any combination of City, State and Zip. If they search for City, State, I want to find unique results, instead of one result for every zip code that city state has. For instance, a se
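One way to get unique City, State results is to collapse the hits in application code after the search returns. The sketch below assumes each hit has already been read into a {city, state, zip} row. The class name UniqueCityState, the method distinct, and the sample rows are illustrative only; none of them come from Lucene or from the original post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniqueCityState {
    /**
     * Each row is {city, state, zip}. Returns the distinct "City, State"
     * labels in the order they were first seen, ignoring the zip column.
     */
    static List<String> distinct(List<String[]> rows) {
        Set<String> seen = new LinkedHashSet<String>();
        for (String[] row : rows) {
            seen.add(row[0] + ", " + row[1]);
        }
        return new ArrayList<String>(seen);
    }

    public static void main(String[] args) {
        // Illustrative rows: two zip codes for the same city and state, plus one other state.
        List<String[]> rows = Arrays.asList(
            new String[] {"Springfield", "IL", "62701"},
            new String[] {"Springfield", "IL", "62702"},
            new String[] {"Springfield", "MA", "01101"});
        for (String label : distinct(rows)) {
            System.out.println(label);
        }
    }
}
```

Collapsing after the search keeps the index unchanged, but it still reads one hit per zip code, so it may be slow for broad queries. Indexing a separate document per distinct city and state pair is another option.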

Re: Range search in numeric fields

2007-04-03 Thread Antony Bowesman
Ivan Vasilev wrote: Hi All, I have the following problem: I have to implement range search for fields that contain numbers. For example the field size that contains file size. The problem is that the numbers are not kept in strings with strict length. There are field values like this: "32", "4

Re: Design Problem: Searching large set of protected documents

2007-04-03 Thread Erick Erickson
I thought you could simply add a ConstantScoreQuery (whose constructor takes a Filter) to a BooleanQuery. It seems that doing this at the very top level with a MUST would do the trick. Erick On 4/3/07, Paul Elschot <[EMAIL PROTECTED]> wrote: On Tuesday 03 April 2007 17:44, Erick Erickson w

Re: Design Problem: Searching large set of protected documents

2007-04-03 Thread Paul Elschot
On Tuesday 03 April 2007 17:44, Erick Erickson wrote: > ... > Then simply add the users filter to a BooleanQuery (MUST) > that you use when you search. > Adding a Filter to a BooleanQuery is not (yet) possible. For the moment one needs to use the Searcher methods that take a filter and a query.

Re: Index updates between machines

2007-04-03 Thread Andy Liu
Sounds like you might have an I/O issue. If you have multiple partitions / disks on the searching server you can search from one partition and copy to another and alternate. If you're using RAID, different RAID levels are optimized for simultaneous reads and writes. If you have a 3rd machine you

Re: Index updates between machines

2007-04-03 Thread Otis Gospodnetic
How fast are your disks? Perhaps they are having trouble keeping up with simultaneous searches and massive file copying. Otis - Original Message From: Chun Wei Ho <[

Re: TestSpellCheck not working

2007-04-03 Thread davep626
Sorry for the lack of detail, first error is: junit.framework.AssertionFailedError: expected:<56> but was:<29> at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.failNotEquals(Assert.java:282) at junit.framework.Assert.assertEquals(Assert.java:64)

TestSpellCheck not working

2007-04-03 Thread davep626
I can't get TestSpellCheck to work. Documents appear to be added but all queries return zero hits. Is this TestCase working for anyone?

Re: Range search in numeric fields

2007-04-03 Thread Andy Liu
You can try using MemoryCachedRangeFilter. https://issues.apache.org/jira/browse/LUCENE-855 It stores field values in memory as longs so your values don't have to be lexicographically comparable. Also, MemoryCachedRangeFilter can be orders of magnitude faster than standard RangeFilter, depending

Re: Design Problem: Searching large set of protected documents

2007-04-03 Thread Grant Ingersoll
I seem to recall this type of question coming up from time to time over the years, but don't have any specific pointers, so you may find it useful to dig into the archives. -Grant On Apr 3, 2007, at 12:00 PM, Jonathan O'Connor wrote: Erick, thanks for the tips. I guess I'll have to dust of

Range search in numeric fields

2007-04-03 Thread Ivan Vasilev
Hi All, I have the following problem: I have to implement range search for fields that contain numbers. For example the field size that contains file size. The problem is that the numbers are not kept in strings with strict length. There are field values like this: "32", "421", "1201". So when
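The mismatch between string order and numeric order can be seen with plain Java strings, no Lucene needed. The sketch below shows one common workaround, zero-padding to a fixed width, which is an assumption here and not something proposed in the post itself. The class PaddedNumbers and its methods are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PaddedNumbers {
    /** Left-pads a non-negative number with zeros to a fixed width. */
    static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        String digits = Long.toString(value);
        if (digits.length() > width) {
            throw new IllegalArgumentException("value does not fit in " + width + " digits");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    /** Returns a copy of the terms sorted the way plain strings compare. */
    static List<String> sortedCopy(List<String> terms) {
        List<String> copy = new ArrayList<String>(terms);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Unpadded: string order puts "1201" before "32".
        System.out.println(sortedCopy(Arrays.asList("32", "421", "1201")));
        // Padded to 4 digits: string order now matches numeric order.
        System.out.println(sortedCopy(Arrays.asList(pad(32, 4), pad(421, 4), pad(1201, 4))));
    }
}
```

The width has to be chosen at index time and used again when building range bounds, so it should cover the largest value the field can hold.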

Re: Limitations of lucene

2007-04-03 Thread Grant Ingersoll
Is there some area in particular you are interested in? Speed, size, document length, other? On Apr 3, 2007, at 11:55 AM, ashwin kumar wrote: Hi all, I would like to know the limitations of the Lucene index. If anybody has any links or ebooks, please forward them. Thanks, regards, ashwin

Re: Extracting a subset of an index

2007-04-03 Thread Steven Rowe
Karl Wettin's code to facilitate index copying may be useful (the below link is to a post of Karl's to the java-dev mailing list): Steve Erick Erickson wrote: > In the immortal words of Erik H. ...it depends... >

IndexWriter Quandary

2007-04-03 Thread Kvailis
Hey, all - I'm pretty new to Lucene (2.0.0) and am having an issue with the IndexWriter: if I set the boolean argument to 'true' it goes ahead and writes indexes that turn out to be perfectly usable; taking the same exact code and switching the boolean to 'false' immediately throws a FileNotFoundE

Re: Design Problem: Searching large set of protected documents

2007-04-03 Thread Jonathan O'Connor
Erick, thanks for the tips. I guess I'll have to dust off my copy of LIA and get cracking. BTW, in our system a user is just a group with a single fixed member of itself. Ciao, Jonathan O'Connor XCOM Dublin

Limitations of lucene

2007-04-03 Thread ashwin kumar
Hi all, I would like to know the limitations of the Lucene index. If anybody has any links or ebooks, please forward them. Thanks, regards, ashwin

Re: Using Lucene to apply LSI

2007-04-03 Thread Faizan Ahmed
José Ramón Pérez Agüera wrote: you need to use JAMA combined with Lucene, using the vectors that are built by Lucene to compute SVD with JAMA http://math.nist.gov/javanumerics/jama/ Thanks for your help. I am new to Lucene and do not know how to build a vector. I will have my input data s

Re: Design Problem: Searching large set of protected documents

2007-04-03 Thread Erick Erickson
Storage isn't too much of a problem: about 12.5 MB, since a Lucene Filter is just a BitSet, one bit per document (plus some trivial overhead). Computational costs... only you know. But, is every user allowed individual permissions or are users part of groups that have permissions? Filters have logi
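The storage figure can be checked with a little arithmetic, taking the numbers from the original question in this thread (a million documents, about 100 users) and assuming one filter per user at one bit per document. The sketch below uses java.util.BitSet directly rather than any Lucene class. FilterStorage and its methods are hypothetical names, and the real footprint would include some JVM overhead on top.

```java
import java.util.BitSet;

public class FilterStorage {
    /** Bytes needed for one bit per document, rounded up to whole bytes. */
    static long bytesPerFilter(long documents) {
        return (documents + 7) / 8;
    }

    /** Total bytes if every user gets a separate filter. */
    static long bytesForAllUsers(long documents, long users) {
        return bytesPerFilter(documents) * users;
    }

    /** Builds an illustrative filter: bit i is set when the user may see document i. */
    static BitSet filterFor(int documents, int[] visibleDocIds) {
        BitSet bits = new BitSet(documents);
        for (int id : visibleDocIds) {
            bits.set(id);
        }
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerFilter(1000000L));         // 125000
        System.out.println(bytesForAllUsers(1000000L, 100L)); // 12500000
        BitSet filter = filterFor(10, new int[] {2, 5});
        System.out.println(filter.get(5) + " " + filter.get(6)); // true false
    }
}
```

At 125,000 bytes per user, 100 users come to 12,500,000 bytes, which matches the roughly 12.5 MB quoted above.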

Index updates between machines

2007-04-03 Thread Chun Wei Ho
We are running a search service on the internet using two machines. We have a crawler machine which crawls the web and merges new documents found into the Lucene index. We have a searcher machine which allows users to perform searches on the Lucene index. Periodically, we would copy the newest ve

Re: search-time boosting

2007-04-03 Thread Erick Erickson
I don't think index time boosts are the way to go. From a message on this list that I printed out (from Otis? Yonik? Chris? someone who knows way more about this topic than I do, probably Chris given the lack of capitalization ) '...index time field boosts are a way to express things like

Re: Design Problem: Searching large set of protected documents

2007-04-03 Thread Jonathan O'Connor
Michael, as usual it's never so easy! Some users can see almost all documents, and some other users can see very few. I did find an interesting document that describes the problem (but offers no solutions :-() http://www.ideaeng.com/pub/entsrch/v3n4/article01.html. This article talks about early a

Re: Extracting a subset of an index

2007-04-03 Thread Erick Erickson
In the immortal words of Erik H. ...it depends... The big issue is whether you have fields in your index that are NOT stored (i.e. Field.Store.NO). If this is the case, your documents will not be complete, and adding it to the fresh index will not include the un-stored data. It's actually prett

Extracting a subset of an index

2007-04-03 Thread jafarim
Hi folks, I need to extract a subset of an index so that I can move some documents to another isolated machine to be searched locally. I'm not sure whether the following scenario is correct: - extracting the documents from the index by using one of the doc(i) methods - adding the same Document obj

Re: search-time boosting

2007-04-03 Thread Daniel Rosher
Hi Ofer, I think your best option is to boost the field for your category field during index time with Field.setBoost(floatBoost) You will have to reindex your corpus however. Regards, Dan On 4/2/07, Ofer Nave <[EMAIL PROTECTED]> wrote: I'd like to be able to boost documents at search-time,

Re: Index updates between machines

2007-04-03 Thread Daniel Rosher
Hi CW, You might find this email from Doug Cutting useful, not NFS but using rsync and hard links ... besides NFS without failover introduces a single point of failure. http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg12709.html Regards, Dan On 4/3/07, Chun Wei Ho <[EMAIL PROTECT

Re: Design Problem: Searching large set of protected documents

2007-04-03 Thread Daniel Rosher
Hi Jonathan, Since the number of users in your application is small, perhaps you could pre-generate a filter per user and apply this to the search; however, this won't scale well if the number of users grows. Another idea might be to have several filters, each of which detail a particular t

Re: Design Problem: Searching large set of protected documents

2007-04-03 Thread Michael D. Curtin
Jonathan O'Connor wrote: I have a database of a million documents and about 100 users. The documents can have an access control list, and there is a complex, recursive algorithm to say if a particular user can see a particular document. My problem is that my search algorithm is to first do a st

Design Problem: Searching large set of protected documents

2007-04-03 Thread Jonathan O'Connor
Hi, I have a database of a million documents and about 100 users. The documents can have an access control list, and there is a complex, recursive algorithm to say if a particular user can see a particular document. My problem is that my search algorithm is to first do a standard lucene search for

Re: flush, optimize and FileNotFound exceptions

2007-04-03 Thread Simon Wistow
On Tue, Apr 03, 2007 at 08:31:20AM -0400, Michael McCandless said: > Optimize actually does its own flush before optimizing, so you don't > need to call it yourself and in fact calling it after optimize will > just be a harmless no-op. Ah, that's good to know. > You should be worried about this

Re: flush, optimize and FileNotFound exceptions

2007-04-03 Thread Michael McCandless
"Simon Wistow" <[EMAIL PROTECTED]> wrote: > I have an Indexer which inserts tasks onto a queue and then has a thread > which consumes the tasks (Index, Update or Delete) and executes them. If > the Indexer is shut down it stops the thread, waits until it's finished > its current task and then co

Re: Benchmarking LUCENE-584 with contrib/benchmark

2007-04-03 Thread Grant Ingersoll
On Apr 3, 2007, at 2:05 AM, Antony Bowesman wrote: Otis Gospodnetic wrote: Here is one more related question. It looks like the o.a.l.benchmark.Driver class is supposed to be a generic driver class that uses the Benchmarker configured in one of those conf/*.xml files. However, I see Sta

Re: How to calculate centroid from HITS?

2007-04-03 Thread Grant Ingersoll
You could use Term Vectors (TVs) to do this, but I don't know of any existing code for it. Might be a good contrib module, though. Search this list or see Lucene In Action or I have some TV sample code at http://www.cnlp.org/apachecon2005/ You might also check the Carrot2 project, which h

flush, optimize and FileNotFound exceptions

2007-04-03 Thread Simon Wistow
I have an Indexer which inserts tasks onto a queue and then has a thread which consumes the tasks (Index, Update or Delete) and executes them. If the Indexer is shut down it stops the thread, waits until it's finished its current task and then consumes any other tasks on the queue. Then it runs