Re: ParallelReader and synchronization between indexes

2008-04-30 Thread Rajesh parab
My apologies for the quick follow-ups, and thanks for the pointers and suggestions, Grant and Otis. I did check various threads on the Java user forum around this topic, but could not find a solution. The most relevant threads end with the same question I currently have: http://www.gossamer-threads.com/lists

Re: Does Lucene Support Billions of data

2008-04-30 Thread Yonik Seeley
On Wed, Apr 30, 2008 at 7:10 PM, Daniel Noll <[EMAIL PROTECTED]> wrote:
> On Thursday 01 May 2008 00:01:48 John Wang wrote:
> > I am not sure how well lucene would perform with > 2 Billion docs in a
> > single index anyway.
>
> Even if they're in multiple indexes, the doc IDs being ints will sti

Re: ParallelReader and synchronization between indexes

2008-04-30 Thread Otis Gospodnetic
Bravo Grant! Rajesh, I believe the following will work:
- delete your small index
- optimize your big index (needed? Not 100% sure, but I think it is)
- loop through the docs in your "big" index
- for each document in the big index, add a document to the small index
When you are done you have b

Re: Please help with Gradient Formatter

2008-04-30 Thread markharw00d
Here you go:
    Analyzer a = new StandardAnalyzer();
    // open an index
    String textFieldName = "contents";
    IndexReader reader = IndexReader.open("E:/indexes/uksites");
    IndexSearcher searcher = new IndexSearcher(reader);
    QueryParser qp=new QueryParser(textFieldNa

Re: Does Lucene Support Billions of data

2008-04-30 Thread Daniel Noll
On Thursday 01 May 2008 00:01:48 John Wang wrote:
> I am not sure how well lucene would perform with > 2 Billion docs in a
> single index anyway.
Even if they're in multiple indexes, the doc IDs being ints will still prevent it going past 2Gi unless you wrap your own framework around it.
Daniel
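One way to picture the "wrap your own framework" idea is to address a document by a pair of (index number, per-index doc ID) packed into a single long. The following is only a minimal sketch under that assumption; the class `GlobalDocId` and its methods are hypothetical and are not part of Lucene.

```java
public class GlobalDocId {
    // Pack a hypothetical index (shard) number and a per-index int doc ID into one long.
    // The shard goes in the high 32 bits, the doc ID in the low 32 bits.
    static long pack(int shard, int docId) {
        return ((long) shard << 32) | (docId & 0xFFFFFFFFL);
    }

    // Recover the shard number from the high 32 bits.
    static int shard(long globalId) {
        return (int) (globalId >>> 32);
    }

    // Recover the per-index doc ID from the low 32 bits.
    static int docId(long globalId) {
        return (int) globalId;
    }

    public static void main(String[] args) {
        // A single int-based ID space tops out at 2^31 - 1.
        System.out.println(Integer.MAX_VALUE); // 2147483647

        long id = pack(3, 2_000_000_000);
        System.out.println(shard(id) + " " + docId(id)); // 3 2000000000
    }
}
```

Each underlying index still holds at most Integer.MAX_VALUE documents; only the combined address space grows, and the surrounding code has to route every lookup to the right index.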

Re: Simple query API question

2008-04-30 Thread Mark Miller
When using the API, you create a Term object that specifies the field for each term, so visually it's more like field1:x or field1:y or field1:z, and then a RangeQuery set to field2, all joined using the BooleanQuery object with the appropriate Occur setting (Occur.MUST, Occur.SHOULD, Occur.MUST_NOT). Take a look at the range

Simple query API question

2008-04-30 Thread Preston Price
This should be a pretty easy question to answer, but I haven't been able to figure out how to do this with the API. I want to search two fields in my index; field 1 is an ID, field 2 is a date of the form mmdd. Now I can write a query string by hand to do a search like this on both fiel

Re: ParallelReader and synchronization between indexes

2008-04-30 Thread Grant Ingersoll
Rajesh, You are asking a fairly complicated question on a seldom-used piece of functionality. Constantly pinging the list is just making it less likely that someone will respond with an answer. The likelihood that the one person who understands that code (and trust me, it really is likely

Re: ParallelReader and synchronization between indexes

2008-04-30 Thread Rajesh parab
Hi Guys, Any comments on this? I was looking into the Lucene archive and came across this thread that asks the same question. http://www.gossamer-threads.com/lists/lucene/java-user/50477?search_string=parallelreader;#50477 Any pointers will be helpful. Regards, Rajesh --- Rajesh parab <[EMAIL PRO

RE: lucene farsi problem

2008-04-30 Thread Steven A Rowe
On 04/30/2008 at 12:50 PM, Steven A Rowe wrote:
> Caveat: I don't speak, read, write, or dream in Farsi - I
> just know that it mostly shares its orthography with Arabic,
> and that they are both written and read right-to-left.
>
> How are you constructing the queries? Using QueryParser? If
> so

RE: lucene farsi problem

2008-04-30 Thread Steven A Rowe
Hi Esra, Caveat: I don't speak, read, write, or dream in Farsi - I just know that it mostly shares its orthography with Arabic, and that they are both written and read right-to-left. How are you constructing the queries? Using QueryParser? If so, then I suspect the problem is that you intend

Re: Removing duplicate entries

2008-04-30 Thread João Rodrigues
>Probably something very like that, although you see none of that. Just
>doing a deleteDocument(term) does it all for you. And I learned long ago
>that the folks who write this kind of stuff can probably do it more
>efficiently than I can.
And probably more efficiently than I can as well :) Than

Re: Removing duplicate entries

2008-04-30 Thread Erick Erickson
See below:
On Tue, Apr 29, 2008 at 9:51 PM, João Rodrigues <[EMAIL PROTECTED]> wrote:
> First of all, let me apologize for the double post but I got some strange
> error message =\
>
> >The first question is what do you mean the document
> >is already in the index? Lucene doc IDs are useless
> >h

bug in MultiPhraseQuery toString() method, ArrayIndexOutOfBoundsException

2008-04-30 Thread Robert . Hastings
Using Lucene 2.3.0, I'm seeing an ArrayIndexOutOfBoundsException: 0 at line 291 of MultiPhraseQuery. A test should be added for (terms.length == 0). I'm checking to see why the terms array is empty. Bob Hastings
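The kind of guard being suggested can be sketched in isolation. This is not the MultiPhraseQuery source; it is a hypothetical stand-in (`EmptyTermsGuard.formatPosition`, working on plain strings) that shows how checking for a zero-length array avoids indexing element 0 of an empty array.

```java
public class EmptyTermsGuard {
    // Hypothetical formatter for the terms at one phrase position.
    // Without the length check, terms[0] would throw
    // ArrayIndexOutOfBoundsException for an empty array.
    static String formatPosition(String[] terms) {
        if (terms.length == 0) {
            return "()"; // nothing at this position; the choice of "()" is arbitrary
        }
        if (terms.length == 1) {
            return terms[0];
        }
        return "(" + String.join(" ", terms) + ")";
    }

    public static void main(String[] args) {
        System.out.println(formatPosition(new String[0]));
        System.out.println(formatPosition(new String[] {"quick"}));
        System.out.println(formatPosition(new String[] {"quick", "fast"}));
    }
}
```

Whether an empty array should be rendered, skipped, or rejected when the query is built is a separate design question that the message leaves open.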

Re: Does Lucene Support Billions of data

2008-04-30 Thread Glen Newton
I understand. But it depends on implementation: if there are things in Lucene that are O(n^2) or worse, then Moore's Law will not help with large numbers. But if they are mostly O(n) or O(nlogn) on the large numbers, then we can wait for bigger, faster, more cores to allow us to use Lucene for bill

Re: Does Lucene Support Billions of data

2008-04-30 Thread John Wang
I am not sure how well Lucene would perform with > 2 billion docs in a single index anyway. I posted a while ago about considering different ways of building distributed search. A master-slave hierarchical model has been the norm; I was hoping to see more of a system built on top of a Hadoop l

Re: Does Lucene Support Billions of data

2008-04-30 Thread Glen Newton
I have created indexes with 1.5 billion documents. It was experimental: I took an index with 25 million documents and merged it with itself many times. While not definitive, as there were only 25m unique documents that were duplicated, it did prove that Lucene should be able to handle this number

Re: lucene farsi problem

2008-04-30 Thread Grant Ingersoll
I am not sure how StandardAnalyzer will perform on Farsi. The thing to do now would be to get Luke and have a look at the actual document that matches and see what its tokens look like. You might also try using the explain() method to see why that document matches. Also, are you sure yo

Re: lucene farsi problem

2008-04-30 Thread esra
Hi, thanks for your reply. I am using StandardAnalyzer now and my XML document is like below: I googled for a Farsi analyzer and found nothing; also, I am not sure if it would solve my problem or not. Thanks, Esra Grant Ingersoll-6 wrote: > > What Analyzer are you using? You might

Re: Exact string

2008-04-30 Thread Grant Ingersoll
On Apr 30, 2008, at 6:02 AM, WATHELET Thomas wrote: Hello, How can I proceed to find an exact string match in Lucene with some articles in my search query? For example: if I search for "a ball" I just want results with "a ball" and not "the ball" included in the result? Is it possible to h

Re: lucene farsi problem

2008-04-30 Thread Grant Ingersoll
What Analyzer are you using? You might try looking in Luke to see what is in your index, etc. It also isn't clear to me what your documents look like. As for a Farsi analyzer, I would Google "Farsi analyzer Lucene" and see if you can find anything. Otherwise, you will have to write your

Exact string

2008-04-30 Thread WATHELET Thomas
Hello, How can I proceed to find an exact string match in Lucene with some articles in my search query? For example: if I search for "a ball" I just want results with "a ball" and not "the ball" included in the result? Is it possible to have a blank stop word list? I have to set something special t

Re: Does Lucene Support Billions of data

2008-04-30 Thread John Wang
Lucene doc IDs are represented as a Java int, so the max signed int would be the limit, a little over 2 billion. -John
On Wed, Apr 30, 2008 at 11:54 AM, Sebastin <[EMAIL PROTECTED]> wrote:
> Hi All,
> Does Lucene support billions of data in a single index store of size 14 GB
> for every search.I

lucene farsi problem

2008-04-30 Thread esra
Hi, I am using Lucene's "IndexSearcher" to search the given XML, which contains Farsi information, by keyword. While searching I use ranges like آ-ث | ج-خ | د-ژ | س-ظ | ع-ق | ک-ل | م-ی. When I search for the "د-ژ" range the results are wrong: they are the results of the "س-ظ" range.
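One thing worth checking, offered as an assumption and not as a diagnosis confirmed in this thread: Java string comparison follows UTF-16 code unit order, which is not Farsi alphabetical order. The letter ژ is U+0698, well after the basic Arabic letters, so a range from د (U+062F) to ژ also spans س (U+0633) through ظ (U+0638). The sketch below uses plain `String.compareTo` as a stand-in for how terms might be ordered; the class and method names are illustrative only.

```java
public class FarsiRangeOrder {
    // True when term falls within [lower, upper] under String.compareTo,
    // that is, UTF-16 code unit order.
    static boolean inRange(String term, String lower, String upper) {
        return lower.compareTo(term) <= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String dal  = "\u062F"; // د, lower bound of the third range
        String zhe  = "\u0698"; // ژ, upper bound of the third range
        String seen = "\u0633"; // س, first letter of the fourth range
        String zah  = "\u0638"; // ظ, last letter of the fourth range

        // Both letters of the fourth range sort inside the third range.
        System.out.println(inRange(seen, dal, zhe)); // true
        System.out.println(inRange(zah, dal, zhe));  // true
    }
}
```

If this turns out to be the cause, the ranges would need to be expressed in code point order, or the field indexed with keys that sort in Farsi alphabetical order.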