Tom,
Very cool! Thanks for sharing your technique, which works well for
prefixed and suffixed wildcard queries. However, it doesn't address
an * in the middle of a term, say W*D. Obviously your usage doesn't
require better performance for a wildcard in the middle, so you've
done well -
Harini,
Do you close the IndexReader every time a search finishes?
If so, a 10G index will take a long time to warm up the IndexReader each time.
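For reference, a minimal sketch of keeping one warmed-up IndexSearcher open
across searches instead of reopening it per query; the class name and index
path are illustrative assumptions, not something from this thread.

    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    public class SearcherHolder {
        private static IndexSearcher searcher;   // shared, stays warm between queries

        public static synchronized IndexSearcher get() throws java.io.IOException {
            if (searcher == null) {
                searcher = new IndexSearcher("/path/to/index");   // assumed location
            }
            return searcher;
        }

        public static Hits search(Query query) throws java.io.IOException {
            // Reuse the open reader so the OS cache and Lucene's caches stay warm.
            return get().search(query);
        }
    }

Reopen the searcher only when the index actually changes, so the 10G index is
not re-warmed on every request.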
Chris
--
Full-Text Search on Any Databases
http://www.dbsight.net
On 10/10/05, Koji Sekiguchi <[EMAIL PROTECTED]> wrote:
> I
Hello,
What is MMapDirectory?
I've searched the mailing list archive, but cannot find it.
I could find the following explanation in the Lucene 1.9 CHANGES.txt:
8. Add MMapDirectory, which uses nio to mmap input files. This is
still somewhat slower than FSDirectory. However it uses less
memory
Is that part of Lucene really the slow one?
Please take thread dumps every 15 seconds, 3 to 4 times.
What do you see in them?
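In case it helps, a rough sketch of trying MMapDirectory in the Lucene 1.9 era.
It assumes the system-property switch described in the full CHANGES.txt entry
(not quoted above), and the index path is made up.

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.FSDirectory;

    public class MMapExample {
        public static void main(String[] args) throws Exception {
            // Assumption: Lucene 1.9 selects MMapDirectory through this system
            // property rather than a direct constructor; verify for your version.
            System.setProperty("org.apache.lucene.FSDirectory.class",
                               "org.apache.lucene.store.MMapDirectory");
            FSDirectory dir = FSDirectory.getDirectory("/path/to/index", false);
            IndexSearcher searcher = new IndexSearcher(dir);
            System.out.println("maxDoc: " + searcher.maxDoc());
            searcher.close();
        }
    }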
Koji
> -Original Message-
> From: Harini Raghavan [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 11, 2005 12:38 AM
> To: java-user@lucene.apache.org
> Subject: Lucene
Sorry about that, "download" was a poor word choice.
By download, I meant that after the applet opens an input stream to the
URL, it will need to read from the stream to get all the index data from
the web server to the user's machine so the applet can perform the
search. Whether the index files a
Marc Hadfield wrote:
I'll give SpanQueries a try as they can handle the 0 increment issue.
Note that PhraseQuery can now handle this too.
Doug
Thanks Doug -
I'll give SpanQueries a try as they can handle the 0 increment issue.
My original desire to have more than one field comes from my document
representation, which includes multiple fields containing (the same)
document text using different stemmers, as, depending on the type of
que
Doug Cutting once said, back in 2003:
" The *HitCollector*-based search API is not meant to work remotely. To do
so would involve an RPC-callback for every non-zero score, which would be
extremely expensive. Also, just making *HitCollector* serializable would not
be sufficient. You'd also need to
Marc Hadfield wrote:
I actually mention your option in my email:
In principle I could store the full text in two fields with the second
field containing the types without incrementing the token index.
Then, do a SpanQuery for "Johnson" and "name" with a distance of 0.
The resulting match w
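A sketch of the query Marc describes, i.e. the word and its type token at a
distance of 0; the field name and the underscore prefix for type tokens are
assumptions, and the exact overlap semantics of SpanNearQuery are worth
verifying on a small test index. (Doug's note that PhraseQuery can handle this
too presumably refers to adding terms at explicit positions with
PhraseQuery.add(Term, int).)

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    public class SamePositionQuery {
        // Match documents where "johnson" was indexed with a "_name" type token
        // at the same position (positionIncrement=0).
        public static SpanNearQuery johnsonAsName() {
            SpanQuery word = new SpanTermQuery(new Term("contents", "johnson"));
            SpanQuery type = new SpanTermQuery(new Term("contents", "_name"));
            // slop 0, unordered: the two spans must sit at the same position
            return new SpanNearQuery(new SpanQuery[] { word, type }, 0, false);
        }
    }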
: I am not sure if I understand the BitSet solution though. Can you give me
: implementation specifics around that?
: Are you suggesting storing BitSet information in the document of each
: cat/subcat and that the boolean value of each bit will correspond to whether
: the product is blocked or not
Doug Cutting wrote:
Why not store them in the same field using positionIncrement=0 for the
types? Then they won't change positions of non-type tokens. You
should distinguish the types syntactically, e.g., prefix them with a
space or other character that does not occur within words. That way
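A rough sketch of how Doug's suggestion might look as a TokenFilter in the old
(pre-2.9) TokenStream API; the filter name, the underscore prefix, and
lookupType() are illustrative stand-ins, not a real Lucene API.

    import java.io.IOException;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    public class TypeInjectingFilter extends TokenFilter {
        private Token pendingType;   // type token queued for the current word

        public TypeInjectingFilter(TokenStream input) {
            super(input);
        }

        public Token next() throws IOException {
            if (pendingType != null) {            // emit the queued type token
                Token t = pendingType;
                pendingType = null;
                return t;
            }
            Token word = input.next();
            if (word == null) return null;
            String type = lookupType(word.termText());
            if (type != null) {
                // Prefix with '_' so type tokens cannot collide with real words.
                pendingType = new Token("_" + type, word.startOffset(), word.endOffset());
                pendingType.setPositionIncrement(0);   // same position as the word
            }
            return word;
        }

        private String lookupType(String term) {
            // stand-in for whatever entity/type tagger the application uses
            return "johnson".equals(term) ? "name" : null;
        }
    }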
Hoss
Thanks for the reply. The posting was an excellent write-up and helped me
visualize my problem domain and solution better.
I like the idea about storing filter information in the contract index
indexed by company. It might work in my case.
I am not sure if I understand the BitSet solution t
Marc Hadfield wrote:
I would prefer not to mix the full text and "types" in the same field as
it would make the term positions inconsistent, which I depend on for
other queries.
Why not store them in the same field using positionIncrement=0 for the
types? Then they won't change positions of n
hello -
I am looking to perform queries efficiently across multiple fields that
have their token order synchronized, i.e.:
Field_A[100] has some relationship to Field_B[100]
for example, consider two fields, one the full text of an article and
the other the "type" of the token where type could
Jon Schuster wrote:
> The suggestion that others have made to make the search web based is
> generally the preferred route.
>
> But it is fairly straightforward to make an unsigned applet use a remote
> Lucene index. You wouldn't need to write the index and PDF files to the
> local disk; you only
The use case is when there is some data that changes frequently, but
some data is static, _and_ the volatile index can be rebuilt in
the same order that the static one was built. The indexes must be
"parallel" in terms of the document index order. If you delete, then
you should delet
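For what it's worth, a minimal sketch of that setup; the index paths are
assumptions, and both indexes must contain the documents in exactly the same
order for ParallelReader to line them up.

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.ParallelReader;
    import org.apache.lucene.search.IndexSearcher;

    public class ParallelSetup {
        public static IndexSearcher open() throws Exception {
            ParallelReader pr = new ParallelReader();
            pr.add(IndexReader.open("/index/static"));     // large, rarely rebuilt
            pr.add(IndexReader.open("/index/volatile"));   // small, rebuilt in the same doc order
            return new IndexSearcher(pr);
        }
    }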
Sorry to bug people on this again and again.
I might be missing something or be totally confused, but what is the use case for
a ParallelReader if it does not address the situation where we have
an index that changes frequently (meaning deletes and reindexing) and an index that
does not change, but has s
The suggestion that others have made to make the search web based is
generally the preferred route.
But it is fairly straightforward to make an unsigned applet use a remote
Lucene index. You wouldn't need to write the index and PDF files to the
local disk; you only need to be able to open an input
A while ago I asked a question about what would be a good solution for the
situation mentioned below, and I was pointed in the direction of
ParallelReader. Looks like that will not work.
Thank you for alerting me to this.
So other than delete and reindex the document to a single index, there is
On Monday, 10 October 2005 20:24, John Smith wrote:
> My understanding is ParallelReader works for situations where you have a
> static index and a dynamic index.
That's not correct. Quoting the documentation:
It is up to you to make sure all indexes
are created and modified the same way. For exam
Hi
I am using the ParallelReader feature from Lucene 1.9.
I have 2 indexes, one that doesn't change and the other that changes often. I
delete and re-index documents from the dynamic index often.
I am indexing the documents with a keyword field "id" and giving it a
unique number. Th
Peter Kim wrote:
> I'm not sure about Perl or PHP--perhaps there are some ports of Lucene
> that'll let you do that. But the most straightforward way is to just
> write a simple Java web application with a servlet that uses an
> IndexSearcher to execute a form-entered query and have it return
> res
You can use the FieldCache to access the values of multiple fields (the same
source that default sorting uses).
Alternately, if you want to generate a score based on a function of multiple
fields rather than doing an absolute sort, you can use FunctionQuery:
http://issues.apache.org/jira/browse/LUCENE-
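As an example of the FieldCache route, here is a sketch that folds two numeric
fields into the score inside a HitCollector; the field names and weights are
made up, and FunctionQuery (the issue linked above) would be the more
structured way to do the same thing.

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.search.HitCollector;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    public class MultiFieldScoring {
        // Combine the Lucene score with two indexed numeric fields via FieldCache.
        public static void searchWithBoosts(IndexSearcher searcher, IndexReader reader,
                                            Query query) throws Exception {
            final int[] popularity = FieldCache.DEFAULT.getInts(reader, "popularity");
            final int[] recency    = FieldCache.DEFAULT.getInts(reader, "recency");
            searcher.search(query, new HitCollector() {
                public void collect(int doc, float score) {
                    float combined = score + 0.1f * popularity[doc] + 0.05f * recency[doc];
                    // keep the best N docs here, e.g. in a PriorityQueue (omitted)
                }
            });
        }
    }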
or php. here's help
http://www.devx.com/Java/Article/20509/1954?pf=true
rgds,
sameer
On 10/10/05, Dan Armbrust <[EMAIL PROTECTED]> wrote:
>
>
> serving java/jsp applications) would be to write the necessary code to
> make perl talk to java - We have done this before (for a different
> purpos
I see your words, but I hate to admit that I don't understand them in
totality!
When you say that the search is executed on the web server, that means
that we would need to code it in Perl or some such, no?
I don't see (except for a Perl or PHP script) how the search could
execute on the website
I'm not sure about Perl or PHP--perhaps there are some ports of Lucene
that'll let you do that. But the most straightforward way is to just
write a simple Java web application with a servlet that uses an
IndexSearcher to execute a form-entered query and have it return
results.
It seems like you m
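A bare-bones sketch of the servlet approach described above; the index path,
field names, and the "q" request parameter are assumptions for illustration.

    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    public class SearchServlet extends HttpServlet {
        private IndexSearcher searcher;   // opened once and shared by all requests

        public void init() throws ServletException {
            try {
                searcher = new IndexSearcher("/path/to/index");   // assumed location
            } catch (IOException e) {
                throw new ServletException(e);
            }
        }

        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            try {
                Query query = new QueryParser("contents", new StandardAnalyzer())
                        .parse(req.getParameter("q"));
                Hits hits = searcher.search(query);
                PrintWriter out = resp.getWriter();
                for (int i = 0; i < hits.length() && i < 10; i++) {
                    out.println(hits.doc(i).get("title"));   // assumed stored field
                }
            } catch (Exception e) {
                throw new ServletException(e);
            }
        }
    }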
Dan Armbrust wrote:
> J. David Boyd wrote:
>
>> Here's my dilemma.
>>
>> For years, we have supplied paper documentation to our customers. Many
>> pages of paper. All together, it makes a 3 foot stack when printed.
>>
>> Also for many years, customers have been asking for docs in electronic
>> f
Hi,
I am using lucene for search functionality in my j2ee application using
JBoss as the app server. The Lucene index directory size is almost 10G. The
performance has been quite good until now. But after the last deploy,
when the server was restarted, the Lucene search has become very slow.
It t
J. David Boyd wrote:
Here's my dilemma.
For years, we have supplied paper documentation to our customers. Many
pages of paper. All together, it makes a 3 foot stack when printed.
Also for many years, customers have been asking for docs in electronic
format, so, recently, I wrote some Perl scr
Here's my dilemma.
For years, we have supplied paper documentation to our customers. Many
pages of paper. All together, it makes a 3 foot stack when printed.
Also for many years, customers have been asking for docs in electronic
format, so, recently, I wrote some Perl scripts that convert our m
On Oct 10, 2005, at 1:44 AM, Anand Kishore wrote:
Does stemming result in failure of exact phrase matches???
It shouldn't. Please provide a simple scenario where you're seeing
such a failure. Stemming will allow you to find more than the exact
phrase, but it should always match an exact
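A small self-contained check of Erik's point, assuming a toy stemming analyzer
and the pre-3.0 API; the field name and sample text are made up.

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseTokenizer;
    import org.apache.lucene.analysis.PorterStemFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.RAMDirectory;

    public class StemmedPhraseCheck {
        public static void main(String[] args) throws Exception {
            Analyzer analyzer = new Analyzer() {
                public TokenStream tokenStream(String fieldName, Reader reader) {
                    return new PorterStemFilter(new LowerCaseTokenizer(reader));
                }
            };

            RAMDirectory dir = new RAMDirectory();
            IndexWriter writer = new IndexWriter(dir, analyzer, true);
            Document doc = new Document();
            doc.add(new Field("body", "the running dogs", Field.Store.YES, Field.Index.TOKENIZED));
            writer.addDocument(doc);
            writer.close();

            // The phrase query is analyzed with the same stemmer ("running dogs" -> "run dog"),
            // so it still matches the stemmed terms in the index.
            IndexSearcher searcher = new IndexSearcher(dir);
            Hits hits = searcher.search(new QueryParser("body", analyzer).parse("\"running dogs\""));
            System.out.println("hits: " + hits.length());   // expect 1
            searcher.close();
        }
    }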