Hi,
I am using Lucene's IndexSearcher to search the given XML by keyword; the
content contains Farsi text.
While searching I use ranges like
آ-ث | ج-خ | د-ژ | س-ظ | ع-ق | ک-ل | م-ی
When I search the "د-ژ" range, the results are wrong: they are the
results of the "س-ظ" range.
lucene docids are represented in a java int, so max signed int would be the
limit, a little over 2 billion.
-John
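A quick sanity check of that ceiling in plain Java:

public class DocIdLimit {
    public static void main(String[] args) {
        // Lucene doc IDs are Java ints, so Integer.MAX_VALUE is the per-index ceiling.
        System.out.println(Integer.MAX_VALUE);  // prints 2147483647
    }
}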
On Wed, Apr 30, 2008 at 11:54 AM, Sebastin <[EMAIL PROTECTED]> wrote:
>
> Hi All,
> Does Lucene support billions of documents in a single index store of size 14
> GB
> for every search? I
Hello,
How can I proceed to find an exact string match in Lucene with some
articles in my search query?
For example: if I search for "a ball" I just want results with "a ball"
and not "the ball" included in the result?
Is it possible to have a blank stop word list?
I have to set something special t
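One way to get a blank stop word list, as a minimal sketch (assuming the
Lucene 2.3-era API), is to hand StandardAnalyzer an empty array and use it at
both index and query time:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// Empty stop word list: nothing is filtered, so the "a" in "a ball" survives.
Analyzer noStops = new StandardAnalyzer(new String[0]);
// Index and parse queries with this same analyzer, then search with a quoted
// phrase such as "a ball" to require the exact phrase.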
What Analyzer are you using? You might try looking in Luke to see
what is in your index, etc. It also isn't clear to me what your
documents look like.
As for a Farsi analyzer, I would Google "Farsi analyzer Lucene" and
see if you can find anything. Otherwise, you will have to write your
Hi,
thanks for your reply.
I am using StandardAnalyzer now and my XML document is like below:
I googled for a Farsi analyzer and found nothing; also, I am not sure whether
it would solve my problem or not.
Thanks,
Esra
Grant Ingersoll-6 wrote:
>
> What Analyzer are you using? You might
On Apr 30, 2008, at 6:02 AM, WATHELET Thomas wrote:
Hello,
How can I proceed to find an exact string match in Lucene with
some articles in my search query?
For example: if I search for "a ball" I just want results with "a
ball" and not "the ball" included in the result?
Is it possible to h
I am not sure how Standard Analyzer will perform on Farsi. The thing
to do now would be to get Luke and have a look at the actual document
that matches and see what its tokens look like. You might also try
using the explain() method to see why that document matches.
Also, are you sure yo
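A minimal sketch of the explain() call (Lucene 2.3-era API; the index path,
field name, and query string below are only placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

IndexSearcher searcher = new IndexSearcher("/path/to/farsi-index");
Query q = new QueryParser("contents", new StandardAnalyzer()).parse("[د TO ژ]");
Hits hits = searcher.search(q);
if (hits.length() > 0) {
    int docId = hits.id(0);                          // internal doc id of the first hit
    System.out.println(searcher.explain(q, docId));  // why that document matched the query
}
searcher.close();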
I have created Indexes with 1.5 billion documents.
It was experimental: I took an index with 25 million documents, and
merged it with itself many times. While not definitive as there were
only 25m unique documents that were duplicated, it did prove that
Lucene should be able to handle this number
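A rough sketch of that kind of self-merge (Lucene 2.3-era API; the paths and
the number of copies are placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

Directory src = FSDirectory.getDirectory("/path/to/25m-doc-index");
IndexWriter writer = new IndexWriter("/path/to/big-index", new StandardAnalyzer(), true);
for (int i = 0; i < 4; i++) {
    writer.addIndexes(new Directory[] { src });  // each pass adds another copy of the 25m docs
}
writer.optimize();
writer.close();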
I am not sure how well lucene would perform with > 2 Billion docs in a
single index anyway.
I posted a while ago about considering different ways of building
distributed search. A master-slave hierarchical model has been the norm; I
was hoping to see more of a system built on top of a Hadoop l
I understand. But it depends on implementation: if there are things in
Lucene that are O(n^2) or worse, then Moore's Law will not help with
large numbers. But if they are mostly O(n) or O(n log n) on the large
numbers, then we can wait for bigger, faster, more cores to allow us
to use Lucene for bill
Using Lucene 2.3.0 I'm seeing an ArrayIndexOutOfBoundsException: 0 at
line 291 of MultiPhraseQuery.
A test should be added for (terms.length == 0).
I'm checking to see why the terms array length is 0.
Bob Hastings
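For what it's worth, a minimal sketch (field name and terms are placeholders)
of guarding against a zero-length term array when building a MultiPhraseQuery:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.MultiPhraseQuery;

Term[] expansions = new Term[] { new Term("contents", "quick"), new Term("contents", "quiet") };
MultiPhraseQuery mpq = new MultiPhraseQuery();
if (expansions.length > 0) {
    mpq.add(expansions);              // never add an empty array as a position
}
mpq.add(new Term("contents", "fox"));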
See below:
On Tue, Apr 29, 2008 at 9:51 PM, João Rodrigues <[EMAIL PROTECTED]> wrote:
> First of all, let me apologize for the double post but I got some strange
> error message =\
>
> >The first question is what do you mean the document
> >is already in the index? Lucene doc IDs are useless
> >h
>Probably something very like that, although you see none of that. Just
>doing a deleteDocument(term) does it all for you. And I learned long ago
>that the folks who write this kind of stuff can probably do it more
>efficiently
>than I can.
And probably more efficiently than I can as well :) Than
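A minimal sketch of that delete-by-term update pattern (Lucene 2.3-era API;
the path, field names, and values are placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
writer.deleteDocuments(new Term("id", "doc-42"));   // remove any existing copy of the document
Document doc = new Document();
doc.add(new Field("id", "doc-42", Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.add(new Field("contents", "updated text", Field.Store.NO, Field.Index.TOKENIZED));
writer.addDocument(doc);                            // add the new version
writer.close();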
Hi Esra,
Caveat: I don't speak, read, write, or dream in Farsi - I just know that it
mostly shares its orthography with Arabic, and that they are both written and
read right-to-left.
How are you constructing the queries? Using QueryParser? If so, then I
suspect the problem is that you intend
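If it helps to take QueryParser out of the picture, here is a minimal sketch
(Lucene 2.3-era API; the field name is a placeholder) of building the range
programmatically with explicit endpoints:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.RangeQuery;

RangeQuery range = new RangeQuery(
    new Term("contents", "د"),    // lower term
    new Term("contents", "ژ"),    // upper term
    true);                        // inclusive of both endpoints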
On 04/30/2008 at 12:50 PM, Steven A Rowe wrote:
> Caveat: I don't speak, read, write, or dream in Farsi - I
> just know that it mostly shares its orthography with Arabic,
> and that they are both written and read right-to-left.
>
> How are you constructing the queries? Using QueryParser? If
> so
Hi Guys,
Any comments on this?
I was looking into the Lucene archive and came across this
thread that asks the same question:
http://www.gossamer-threads.com/lists/lucene/java-user/50477?search_string=parallelreader;#50477
Any pointers will be helpful.
Regards,
Rajesh
--- Rajesh parab <[EMAIL PRO
Rajesh,
You are asking a fairly complicated question about a seldom-used piece of
functionality. Constantly pinging the list is just making it less
likely that someone will respond with an answer. The likelihood that
the 1 person who understands that code (and trust me, it really is
likely
This should be a pretty easy question to answer but I haven't been
able to figure out how to do this with the API.
I want to search two fields in my index; field 1 is an ID, field 2 is
a date of the form mmdd.
Now I can write a query string by hand to do a search like this on
both fiel
When using the API you will create a Term object that specifies the
field for each term, so visually it's more like field1:x OR field1:y OR
field1:z,
and then a RangeQuery set to field2, all joined using the BooleanQuery
object with BooleanClause.Occur.MUST, SHOULD, and MUST_NOT.
Take a look at the range
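A minimal sketch of that combination (Lucene 2.3-era API; the field names and
values are placeholders):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.RangeQuery;
import org.apache.lucene.search.TermQuery;

// Any of several IDs in field1...
BooleanQuery ids = new BooleanQuery();
ids.add(new TermQuery(new Term("field1", "x")), BooleanClause.Occur.SHOULD);
ids.add(new TermQuery(new Term("field1", "y")), BooleanClause.Occur.SHOULD);
ids.add(new TermQuery(new Term("field1", "z")), BooleanClause.Occur.SHOULD);

// ...AND a date range on field2 (fixed-width mmdd strings sort correctly as terms).
RangeQuery dates = new RangeQuery(
    new Term("field2", "0101"), new Term("field2", "0430"), true);

BooleanQuery query = new BooleanQuery();
query.add(ids, BooleanClause.Occur.MUST);
query.add(dates, BooleanClause.Occur.MUST);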
On Thursday 01 May 2008 00:01:48 John Wang wrote:
> I am not sure how well lucene would perform with > 2 Billion docs in a
> single index anyway.
Even if they're in multiple indexes, the doc IDs being ints will still prevent
it from going past 2Gi unless you wrap your own framework around it.
Daniel
Here you go:
Analyzer a = new StandardAnalyzer();
// open an index
String textFieldName = "contents";
IndexReader reader = IndexReader.open("E:/indexes/uksites");
IndexSearcher searcher = new IndexSearcher(reader);
QueryParser qp = new QueryParser(textFieldName, a);
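Presumably followed by something along these lines (a sketch; the phrase here
is a placeholder):

Query q = qp.parse("\"a ball\"");   // quoted phrase, parsed as an exact phrase query
Hits hits = searcher.search(q);
System.out.println(hits.length() + " hits");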
Bravo Grant!
Rajesh, I believe the following will work:
- delete your small index
- optimize your big index (needed? Not 100% sure, but I think it is)
- loop through the docs in your "big" index
- for each document in the big index, add a document to the small index
When you are done you have b
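A rough sketch of that loop (Lucene 2.3-era API; the paths are placeholders,
and note that only stored fields survive this kind of copy):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

IndexReader bigReader = IndexReader.open("/path/to/big-index");
IndexWriter smallWriter = new IndexWriter("/path/to/small-index", new StandardAnalyzer(), true);
for (int i = 0; i < bigReader.maxDoc(); i++) {
    if (bigReader.isDeleted(i)) continue;     // skip deleted slots
    Document d = bigReader.document(i);       // retrieves only the stored fields
    smallWriter.addDocument(d);               // doc order (and thus doc ids) follows the big index
}
smallWriter.optimize();
smallWriter.close();
bigReader.close();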
On Wed, Apr 30, 2008 at 7:10 PM, Daniel Noll <[EMAIL PROTECTED]> wrote:
> On Thursday 01 May 2008 00:01:48 John Wang wrote:
> > I am not sure how well lucene would perform with > 2 Billion docs in a
> > single index anyway.
>
> Even if they're in multiple indexes, the doc IDs being ints will sti
My apologies for the quick follow-ups, and thanks for the
pointers/suggestions, Grant and Otis.
I did check various threads on the Java user forum around
this topic, but could not find a solution. Some of the most
relevant threads end with the same question I am
currently asking:
http://www.gossamer-threads.com/lists