My apologies for quick follow-ups and thanks for
pointers/suggestions Grant and Otis.
I did check various threads on the Java user forum around
this topic, but could not find a solution. Some of the most
relevant threads end with the same question I currently
have.
http://www.gossamer-threads.com/lists
On Wed, Apr 30, 2008 at 7:10 PM, Daniel Noll <[EMAIL PROTECTED]> wrote:
> On Thursday 01 May 2008 00:01:48 John Wang wrote:
> > I am not sure how well lucene would perform with > 2 Billion docs in a
> > single index anyway.
>
> Even if they're in multiple indexes, the doc IDs being ints will sti
Bravo Grant!
Rajesh, I believe the following will work:
- delete your small index
- optimize your big index (needed? Not 100% sure, but I think it is)
- loop through the docs in your "big" index
- for each document in the big index, add a document to the small index
When you are done you have b
Here you go:
Analyzer a=new StandardAnalyzer();
//open an index
String textFieldName="contents";
IndexReader reader=IndexReader.open("E:/indexes/uksites");
IndexSearcher searcher=new IndexSearcher(reader);
QueryParser qp=new QueryParser(textFieldNa
On Thursday 01 May 2008 00:01:48 John Wang wrote:
> I am not sure how well lucene would perform with > 2 Billion docs in a
> single index anyway.
Even if they're in multiple indexes, the doc IDs being ints will still prevent
it going past 2Gi unless you wrap your own framework around it.
Daniel
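
A minimal sketch of what "wrapping your own framework" around the int doc IDs could look like, assuming each index is given a small integer number. GlobalDocId and its methods are hypothetical names for illustration and are not part of Lucene; the idea is only that a 64-bit long has room for an index number plus a per-index int doc ID.

```java
// Hypothetical helper, NOT part of Lucene: combines an index number and a
// per-index doc ID (both non-negative ints) into one 64-bit identifier.
public final class GlobalDocId {
    private GlobalDocId() {}

    // High 32 bits hold the index number, low 32 bits hold the local doc ID.
    public static long pack(int indexNumber, int localDocId) {
        if (indexNumber < 0 || localDocId < 0) {
            throw new IllegalArgumentException("both values must be non-negative");
        }
        return ((long) indexNumber << 32) | localDocId;
    }

    public static int indexNumber(long globalId) {
        return (int) (globalId >>> 32);
    }

    public static int localDocId(long globalId) {
        return (int) globalId; // keeps the low 32 bits
    }

    public static void main(String[] args) {
        // The largest value an int doc ID can take.
        System.out.println(Integer.MAX_VALUE); // prints 2147483647

        // Adding two per-index counts as ints overflows; as longs it does not.
        int docsInFirst = 1_500_000_000;
        int docsInSecond = 1_500_000_000;
        System.out.println((long) docsInFirst + docsInSecond); // prints 3000000000

        long id = pack(3, 42);
        System.out.println(indexNumber(id) + " / " + localDocId(id)); // prints 3 / 42
    }
}
```

Anything that returns hits to the caller would then have to carry the long, not the raw int, which is the part Lucene does not do for you.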
When using the API you will create a Term object that specifies the
field for each term...so visually it's more like field1:x or field1:y or
field1:z
and then a RangeQuery set to field2, all joined using the BooleanQuery
object, setting Occur.MUST, Occur.SHOULD, or Occur.MUST_NOT.
Take a look at the range
This should be a pretty easy question to answer but I haven't been
able to figure out how to do this with the API.
I want to search two fields in my index; field 1 is an ID, field 2 is
a date of the form mmdd.
Now I can write a query string by hand to do a search like this on
both fiel
Rajesh,
You are asking a fairly complicated question on a seldom used piece of
functionality. Constantly pinging the list is just making it less
likely that someone will respond with an answer. The likelihood that
the 1 person who understands that code (and trust me, it really is
likely
Hi Guys,
Any comments on this?
I was looking into the Lucene archive and came across this
thread that asks the same question.
http://www.gossamer-threads.com/lists/lucene/java-user/50477?search_string=parallelreader;#50477
Any pointers will be helpful.
Regards,
Rajesh
--- Rajesh parab <[EMAIL PRO
On 04/30/2008 at 12:50 PM, Steven A Rowe wrote:
> Caveat: I don't speak, read, write, or dream in Farsi - I
> just know that it mostly shares its orthography with Arabic,
> and that they are both written and read right-to-left.
>
> How are you constructing the queries? Using QueryParser? If
> so
Hi Esra,
Caveat: I don't speak, read, write, or dream in Farsi - I just know that it
mostly shares its orthography with Arabic, and that they are both written and
read right-to-left.
How are you constructing the queries? Using QueryParser? If so, then I
suspect the problem is that you intend
>Probably something very like that, although you see none of that. Just
>doing a deleteDocument(term) does it all for you. And I learned long ago
>that the folks who write this kind of stuff can probably do it more
>efficiently
>than I can .
And probably more efficiently than I can as well :) Than
See below:
On Tue, Apr 29, 2008 at 9:51 PM, João Rodrigues <[EMAIL PROTECTED]> wrote:
> First of all, let me apologize for the double post but I got some strange
> error message =\
>
> >The first question is what do you mean the document
> >is already in the index? Lucene doc IDs are useless
> >h
Using Lucene 2.3.0 I'm seeing an ArrayIndexOutOfBoundsException: 0 at
line 291 of MultiPhraseQuery.
A test should be added for (terms.length == 0).
I'm checking to see why the terms array is 0.
Bob Hastings
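
A small sketch of the kind of guard suggested above, written against a plain array. This is not the MultiPhraseQuery source; EmptyTermsGuard and firstOrNull are hypothetical names, and the only point is that reading element 0 of an empty array has to be preceded by a length check.

```java
// Hypothetical illustration, NOT Lucene code: guard an array before
// reading its first element.
public final class EmptyTermsGuard {
    private EmptyTermsGuard() {}

    // Returns the first element, or null when the array is empty.
    public static String firstOrNull(String[] terms) {
        if (terms.length == 0) {
            return null; // the (terms.length == 0) case handled explicitly
        }
        return terms[0];
    }

    public static void main(String[] args) {
        String[] empty = new String[0];

        // Without the guard, this access fails at runtime.
        try {
            System.out.println(empty[0]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded access failed");
        }

        // With the guard, the empty case is reported instead of thrown.
        System.out.println(firstOrNull(empty) == null); // prints true
    }
}
```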
I understand. But it depends on implementation: if there are things in
Lucene that are O(n^2) or worse, then Moore's Law will not help with
large numbers. But if they are mostly O(n) or O(nlogn) on the large
numbers, then we can wait for bigger, faster, more cores to allow us
to use Lucene for bill
I am not sure how well lucene would perform with > 2 Billion docs in a
single index anyway.
I have posted a while ago about considering different ways of building
distributed search. A master-slave hierarchical model has been the norm, I
was hoping to see more of a system built on top of a Hadoop l
I have created Indexes with 1.5 billion documents.
It was experimental: I took an index with 25 million documents, and
merged it with itself many times. While not definitive as there were
only 25m unique documents that were duplicated, it did prove that
Lucene should be able to handle this number
I am not sure how Standard Analyzer will perform on Farsi. The thing
to do now would be to get Luke and have a look at the actual document
that matches and see what its tokens look like. You might also try
using the explain() method to see why that document matches.
Also, are you sure yo
Hi,
thanks for your reply.
I am using StandardAnalyzer now and my xml document is like below:
I googled for a Farsi analyzer and found nothing; I am also not sure if it
would solve my problem or not.
Thanks,
Esra
Grant Ingersoll-6 wrote:
>
> What Analyzer are you using? You might
On Apr 30, 2008, at 6:02 AM, WATHELET Thomas wrote:
Hello,
How can I proceed to find an exact string match in Lucene with
some articles in my search query?
For example: if I search for "a ball" I just want results with "a
ball" and not "the ball" included in the result?
Is it possible to h
What Analyzer are you using? You might try looking in Luke to see
what is in your index, etc. It also isn't clear to me what your
documents look like.
As for a Farsi analyzer, I would Google "Farsi analyzer Lucene" and
see if you can find anything. Otherwise, you will have to write your
Hello,
How can I proceed to find an exact string match in Lucene with some
articles in my search query?
For example: if I search for "a ball" I just want results with "a ball"
and not "the ball" included in the result?
Is it possible to have a blank stop word list?
I have to set something special t
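
To illustrate why a stop word list matters for this question, here is a simplified model in plain Java. It is not Lucene's analyzer: StopWordSketch, analyze, and the three-word stop list are assumptions made up for the example. It only shows that once articles are dropped at analysis time, "a ball" and "the ball" reduce to the same tokens, while an empty stop list keeps them distinct.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified, hypothetical model of stop word removal; NOT Lucene's analyzer.
public final class StopWordSketch {
    private StopWordSketch() {}

    // Lower-cases the text, splits on whitespace, and drops any stop words.
    public static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Illustrative subset of English stop words (an assumption).
        Set<String> articles = new HashSet<>(Arrays.asList("a", "an", "the"));
        Set<String> none = Collections.emptySet();

        // With articles removed, both phrases collapse to the same tokens.
        System.out.println(analyze("a ball", articles));   // prints [ball]
        System.out.println(analyze("the ball", articles)); // prints [ball]

        // With a blank stop word list, the phrases stay different.
        System.out.println(analyze("a ball", none));       // prints [a, ball]
        System.out.println(analyze("the ball", none));     // prints [the, ball]
    }
}
```

In Lucene itself, as far as I know, StandardAnalyzer in the 2.x line has a constructor that takes a stop word array, so passing an empty array would be the thing to try; check the Javadoc for your version, and use the same analyzer for indexing and searching.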
Lucene doc IDs are represented as a Java int, so the max signed int would be the
limit, a little over 2 billion.
-John
On Wed, Apr 30, 2008 at 11:54 AM, Sebastin <[EMAIL PROTECTED]> wrote:
>
> Hi All,
> Does Lucene support billions of data in a single index store of size 14
> GB
> for every search.I
hi,
I am using Lucene's "IndexSearcher" to search the given XML by keyword, which
contains Farsi information.
While searching I use ranges like
آ-ث | ج-خ | د-ژ | س-ظ | ع-ق | ک-ل | م-ی
When I search for the "د-ژ" range the results are wrong; they are the
results of the "س-ظ" range.
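
One thing that may be worth checking here, offered as an assumption and not as a diagnosis from this thread: if the range is evaluated by comparing term text as Java strings, the bounds are ordered by Unicode value and not by Farsi alphabetical order. The sketch below uses plain Java only (FarsiRangeOrder and inRange are hypothetical names) and shows that the Farsi letter jeh (U+0698) sorts after seen (U+0633) and zah (U+0638), so a range from dal (U+062F) to jeh would also cover the seen-to-zah letters.

```java
// Hypothetical illustration in plain Java, NOT Lucene code: checks range
// membership using String.compareTo, i.e. by Unicode (UTF-16) order.
public final class FarsiRangeOrder {
    private FarsiRangeOrder() {}

    // True when lower <= term <= upper under String.compareTo ordering.
    public static boolean inRange(String lower, String upper, String term) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String dal = "\u062F";  // ARABIC LETTER DAL
        String jeh = "\u0698";  // ARABIC LETTER JEH (used in Farsi)
        String seen = "\u0633"; // ARABIC LETTER SEEN
        String zah = "\u0638";  // ARABIC LETTER ZAH

        // Both bounds of the seen-to-zah range fall inside dal-to-jeh.
        System.out.println(inRange(dal, jeh, seen)); // prints true
        System.out.println(inRange(dal, jeh, zah));  // prints true

        // Going the other way, jeh is outside seen-to-zah.
        System.out.println(inRange(seen, zah, jeh)); // prints false
    }
}
```

If this is what is happening, the ranges would need bounds chosen by code point order, or the indexed field would need a sort key that follows the Farsi alphabet.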
24 matches