I've also seen FileNotFound exceptions when attempting a search on an index
while it's being updated, and the searcher is in a different JVM. This is
supposed to be supported, but on Windows seems to regularly fail (for me
anyway).
The simplest solution to this would be a service oriented approach
e" which is what should have
cone in your doc when it was indexed using that analyzer.
:
: On 9/3/06, Jason Polites <[EMAIL PROTECTED]> wrote:
: >
: > Roger that. I'll double check my code.
: >
: > Thanks.
: >
: >
: > On 9/3/06, Otis Gospodnetic <[EMAIL PROT
", but not "on".
This is fine, and if the user searches for:
Disney on Ice
They will get a match. But, it seems that a search for:
"Disney on Ice"
With the quotation marks indicating the desire for an "exact match", the absence
of stop words in the index means this
----- Original Message -----
From: Jason Polites <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Saturday, September 2, 2006 9:05:27 AM
Subject: Stop words in index
Hey all,
I am using the StandardAnalyzer with my own list of stop words (which is
more comprehensive than the default list), and my expectation was that this
would omit these stop words from the index when data is indexed using this
analyzer. However, I am seeing stop words in the term vector fo
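Conceptually, what stop-word removal should do at analysis time can be shown with a minimal plain-Java sketch. This is independent of the Lucene API, and the stop set here is illustrative rather than the StandardAnalyzer default list:

```java
import java.util.*;

// Minimal sketch of stop-word filtering as an analyzer would apply it.
// The stop set is illustrative, not Lucene's default list.
public class StopFilterSketch {
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "on", "of"));

    // Lower-cases tokens and drops stop words, mimicking what should
    // happen at index time when a stop filter is in the analyzer chain.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

With this filter, "Disney on Ice" analyzes to [disney, ice]; if the term vector still shows "on", the likely cause is that the field was indexed with a different analyzer than the one expected.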
Hi all,
I understand that it is possible to "re-create" fields which are indexed but
not stored (as is done by Luke), and that this is a lossy process, however I
am wondering whether the indexed version of this remains consistent.
That is, if I re-create a non-stored field, then re-index this fi
Have you looked at the MoreLikeThis class in the similarity package?
On 8/30/06, Winton Davies <[EMAIL PROTECTED]> wrote:
Hi All,
I'm scratching my head - can someone tell me which class implements
an efficient multiple term TF.IDF Cosine similarity scoring mechanism?
There is clearly the sin
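The scoring being asked about, cosine similarity over weighted term vectors, can be sketched in plain Java. This shows only the math, not Lucene's actual Similarity implementation; the weights passed in would be TF.IDF values computed elsewhere:

```java
import java.util.*;

// Conceptual sketch: cosine similarity between two weighted term vectors.
// Lucene's real scoring lives in its Similarity classes; this just
// illustrates dot product over the product of vector norms.
public class CosineSketch {
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * w;
        }
        double normA = 0.0, normB = 0.0;
        for (double w : a.values()) normA += w * w;
        for (double w : b.values()) normB += w * w;
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```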
ound.. if that
helps.
On 8/28/06, Michael McCandless <[EMAIL PROTECTED]> wrote:
Jason Polites wrote:
> Yeah.. I had a think about this, and I now remember why I originally
> came to
> the conclusion about cross-JVM access.
>
> When I was adding documents to the index, and searc
Yeah.. I had a think about this, and I now remember why I originally came to
the conclusion about cross-JVM access.
When I was adding documents to the index, and searching at the same time
(from a different JVM) I would get the occasional (but regular)
FileNotFoundException.
I don't recall the
Not sure what the desired end result is here, but you shouldn't need to
update the document just to give it a boost factor. This can be done in the
query string used to search the index.
As for updating affecting search order, I don't think you can assume any
guarantees in this regard. You're pr
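For example, Lucene's query syntax accepts a boost via the caret operator, applied to a term or to a whole clause at search time; the field names below are illustrative:

```text
title:lucene^2.0 OR body:lucene
(title:lucene body:lucene)^1.5
```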
]> wrote:
Doron Cohen wrote:
> "Jason Polites" <[EMAIL PROTECTED]> wrote on 27/08/2006 09:36:07:
>
>> I would have thought that simultaneous cross-JVM access to an index was
>> outside of scope of the core Lucene API (although it would be great), but
due to any reason can be thought of as the same
thing, regardless of the reason (so long as it's logged).
Seems like the simplest solution too.
On 8/28/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
On 8/26/06, Jason Polites <[EMAIL PROTECTED]> wrote:
> Synchronization at this
On 8/26/06, Michael McCandless <[EMAIL PROTECTED]> wrote:
Are you also running searchers against this index? Are they re-init'ing
frequently or being opened and then held open?
No searches running in my initial test, although I can't be certain what is
happening under the Compass hood.
This
Hi all,
When indexing with multiple threads, and under heavy load, I get the
following exception:
java.io.IOException: Access is denied
    at java.io.WinNTFileSystem.createFileExclusively(Native Method)
    at java.io.File.createNewFile(File.java:850)
    at org.apache.lucene.store.FSDirectory$1.o
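Assuming the exception comes from concurrent writers contending for index files, one common workaround is to funnel all writes through a single worker thread, so only one writer ever touches the directory. A pure-JDK sketch of the pattern follows; the counter increment stands in for the actual addDocument call:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: serialize all index writes through one worker thread so the
// index directory is only ever written by a single thread at a time.
public class SingleWriterSketch {
    // Submits n "documents" from the calling thread, but all work is
    // applied by one worker; the increment stands in for addDocument.
    static int indexAll(int n) {
        ExecutorService writer = Executors.newSingleThreadExecutor();
        AtomicInteger indexed = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            writer.submit(() -> { indexed.incrementAndGet(); });
        }
        writer.shutdown();
        try {
            writer.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return indexed.get();
    }
}
```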
I'm not sure about the solution in the referenced thread. It will work, but
doesn't it run the risk of breaching the transaction isolation of the
database write?
The issue is when the index is notified of a database update. If it is
notified prior to the transaction commit, and the commit fails
dex all data that way. The database is not required.
To address your search complexity concern, you can create queries that
search only those Field(s) the user wants -- there is no need to have a
Field for each possible combination of content type.
Steve
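The per-field searching described above can be sketched by building the query string from whichever fields the user selected; the helper and field names here are hypothetical, not part of the Lucene API:

```java
import java.util.*;

// Sketch: restrict a search to the fields the user selected, rather
// than keeping a separate field per combination of content types.
// The helper and field names are illustrative only.
public class FieldQuerySketch {
    static String acrossFields(String term, List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (String field : fields) {
            if (sb.length() > 0) sb.append(" OR ");
            sb.append(field).append(':').append(term);
        }
        return sb.toString();
    }
}
```

The resulting string can then be handed to Lucene's query parser, or the equivalent clauses combined programmatically in a BooleanQuery.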
Jason Polites wrote:
> Maybe I'm not u
fferent threads accessing the
index. This would also explain why you see the problem in production and
not testing.
On 8/15/06, Jason Polites <[EMAIL PROTECTED]> wrote:
My advice would be the "back-to-basics" approach. Create a test case
which creates a simple index with a few do
My advice would be the "back-to-basics" approach. Create a test case which
creates a simple index with a few documents, verify the index is as you
expect, then re-create the index and verify again. Run this test case on
your production environment (if you are able). This will determine once and
This strategy can also be nicely abstracted from your main app. Whilst I
haven't yet implemented it, my plan is to create a template style structure
which tells me which fields are in lucene, and which are externalized. This
way I don't bother storing data in lucene that is stored elsewhere, but
Sounds like you're a bit frustrated. Cheer up, the simple fact is that
engineering and business rarely see eye-to-eye. Just focus on the fact that
what you have learnt from the process will help you, and they paid for it ;)
On the issue at hand...Lucene should scale to this level, but you need
IMO you should avoid storing any data in the index that you don't need for
display. Lucene is an index (and a damn good one), not a database. If you
find yourself storing large amounts of data in the index, this could be an
indication that you may need to re-think your architecture.
In its simp
Maybe I'm not understanding your requirement, but this should be fairly
simple in Lucene.
Each document in your document management system would be represented by a
single Lucene document in the index. Each lucene document will then have
several fields, each field representing the values of the
Yes you could use lucene for this, but it may be overkill for your
requirement. If I understand you correctly, all you need to do is find
documents which match "any" of the words in your list? Do you need to rank
the results? If not, it's probably easier just to create your own inverted
index of
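A hand-rolled inverted index of the sort suggested can be very small: a map from word to the set of documents containing it. A sketch, with illustrative names:

```java
import java.util.*;

// Minimal inverted index: word -> set of document ids. Enough for
// "match ANY word in my list" lookups without ranking.
public class MiniInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Documents containing any of the given words.
    Set<Integer> matchAny(Collection<String> words) {
        Set<Integer> hits = new TreeSet<>();
        for (String w : words) {
            hits.addAll(postings.getOrDefault(w.toLowerCase(Locale.ROOT),
                    Collections.emptySet()));
        }
        return hits;
    }

    // Tiny usage demo: index two docs, query two words.
    static Set<Integer> demo() {
        MiniInvertedIndex idx = new MiniInvertedIndex();
        idx.add(1, "Disney on Ice");
        idx.add(2, "ice cream flavours");
        return idx.matchAny(Arrays.asList("ice", "disney"));
    }
}
```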
the index. Lucene works
best when the index is light-weight. My recommendation is to think
carefully about the "role" of the index, vs the role of your data storage
approach.
On 8/11/06, Deepan Chakravarthy <[EMAIL PROTECTED]> wrote:
On Fri, 2006-08-11 at 01:58 +1000, Jason Po
I can share the data.. but it would be quicker for you to just pull out some
random text from anywhere you like.
The issue is that the text was in an email, which was one of about 2,000 and
I don't know which one. I got the 4.5MB figure from the number of bytes in
the byte array reported in the
Are you storing the contents of the fields in the index? That is,
specifying Field.Store.YES when creating the field?
In my experience fields which are not stored are not recoverable from the
index (well.. they can be reconstructed but it's a lossy process). So when
you retrieve the document,
Thanks for the Jira issue...
one question on your synchronization comment...
I have "assumed" I can't have two threads writing to the index concurrently,
so have implemented my own read/write locking system. Are you saying I
don't need to bother with this? My reading of the doco suggests that y
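For reference, the hand-rolled locking described might look like the following pure-JDK sketch: many concurrent readers, writers exclusive. Note that within a single JVM, a shared IndexWriter already serializes its own updates, which is why this layer is often unnecessary:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of a read/write locking layer over an index: concurrent
// readers, exclusive writers. The counter stands in for the index.
public class IndexLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int documents = 0;

    void write() {
        lock.writeLock().lock();
        try {
            documents++; // stand-in for IndexWriter.addDocument(...)
        } finally {
            lock.writeLock().unlock();
        }
    }

    int read() {
        lock.readLock().lock();
        try {
            return documents; // stand-in for running a search
        } finally {
            lock.readLock().unlock();
        }
    }

    // Tiny usage demo: n writes, then one read.
    static int demo(int writes) {
        IndexLockSketch s = new IndexLockSketch();
        for (int i = 0; i < writes; i++) s.write();
        return s.read();
    }
}
```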
Hello all,
I am experiencing some performance problems indexing large(ish) amounts of
text using the IndexField.Store.COMPRESS option when creating a Field in
Lucene.
I have a sample document which has about 4.5MB of text to be stored as
compressed data within the field, and the indexing of this
There is also an open source Java anti-spam API which does a Bayesian scan of
email content (plus other stuff).
You could retro-fit to work with raw text.
www.jasen.org
(get the latest HEAD from CVS as the current release is a bit old... new
version imminent)
- Original Message -
From:
You could do it asynchronously. That is, separate off the actual Lucene
search into a different thread which performs the search; the calling thread
then waits up to a maximum time for the search thread to complete, and
queries the status of the search thread to get the results obtained
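The timeout pattern described can be sketched with a Future and a bounded get; the Callable here stands in for the real Lucene search, and the fallback value is an illustrative choice:

```java
import java.util.concurrent.*;

// Sketch: run the search on a separate thread and give the caller a
// bounded wait. searchTask stands in for the actual Lucene search.
public class TimedSearchSketch {
    static <T> T searchWithTimeout(Callable<T> searchTask,
                                   long timeoutMillis, T fallback) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<T> future = pool.submit(searchTask);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);   // give up on a slow search
            return fallback;
        } catch (Exception e) {
            return fallback;       // search thread failed
        } finally {
            pool.shutdownNow();
        }
    }
}
```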
if ((indexFile = new File(indexDir)).exists() &&
        indexFile.isDirectory())
{
    exists = false;
Isn't this backwards?
Couldn't you just do:
indexFile = new File(indexDir);
exists = (indexFile.exists() && indexFile.isDirectory());
-Original Message-
From: bib_lucene bib [mailto: