Re: Index corruption and repair

2022-05-25 Thread Antony Joseph
Hi Mike, Any updates? Regards, Antony On Wed, 11 May 2022 at 01:02, Antony Joseph wrote: > Hello Mike, > > 1. As requested, the full checkindex log is attached. > > 2. We haven't made any changes to the IndexDeletionPolicy - so the > assumption is the default policy i

Re: Index corruption and repair

2022-05-10 Thread Antony Joseph
as running fine on the same system. Thanks for your assistance. Regards, Antony On Thu, 5 May 2022 at 20:06, Michael McCandless wrote: > Antony, do you maybe have Microsoft Defender turned on, which might > quarantine files that it suspects are malicious? I'm not sure if it is on >

Re: Index corruption and repair

2022-05-04 Thread Antony Joseph
Hi Michael, Any update? Regards, Antony On Sun, 1 May 2022 at 19:35, Antony Joseph wrote: > Hi Michael, > > Thank you for your reply. Please find responses to your questions below. > > Regards, > Antony > > On Sat, 30 Apr 2022 at 18:59, Michael McCandless < > lu

Re: Index corruption and repair

2022-05-01 Thread Antony Joseph
Hi Michael, Thank you for your reply. Please find responses to your questions below. Regards, Antony On Sat, 30 Apr 2022 at 18:59, Michael McCandless wrote: > Hi Antony, > > Hmm it looks like the root cause is this: > > Caused by: java.nio.file.NoSuchFileException: D:\i\

Re: Index corruption and repair

2022-04-30 Thread Antony Joseph
fos.readCommit(SegmentInfos.java:288) ... 2 more Regards, Antony On Sat, 30 Apr 2022 at 10:59, Robert Muir wrote: > The most helpful thing would be the full stacktrace of the exception. > This exception should be chaining the original exception and call > site, and maybe tell us more about

Re: Index corruption and repair

2022-04-28 Thread Antony Joseph
gards, Antony On Thu, 28 Apr 2022 at 17:00, Adrien Grand wrote: > Hi Anthony, > > This isn't something that you should try to fix programmatically, > corruptions indicate that something is wrong with the environment, > like a broken disk or corrupt RAM. I would suggest running

Index corruption and repair

2022-04-28 Thread Antony Joseph
roblem - the application logic is the same. Also, while the application runs on both Linux and Windows, so far we have observed this situation only on various Windows platforms. Would really appreciate some assistance. Thanks in advance. Regards, Antony

Range query with Lucene7.7.1 on old indexes.

2021-09-01 Thread Antony Joseph
Hi all, Using: python 2.7.14, pylucene 4.10.0 Index: xdate = long("20190101183030") doc.add(LongField('xdate', xdate, Field.Store.YES)) # stored and not analyzed Query: query = NumericRangeQuery.newLongRange("xdate", long("2019010100"), long("20190101115959"), True, True) I am getting the
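The message is cut off before the symptom is described, so what follows is only a sanity check on the literal values quoted above, not a diagnosis. It is plain Java with no Lucene calls; the class and method names are made up for illustration. It shows that the indexed value 20190101183030 lies above the quoted upper bound 20190101115959, and that the quoted lower bound has 10 digits where the other values have 14.

```java
// Illustrative only: an inclusive long range test, mirroring the two
// "True, True" (min inclusive, max inclusive) flags in the quoted query.
final class RangeCheck {
    static boolean inRange(long value, long lower, long upper) {
        return value >= lower && value <= upper;
    }
}
```

With the quoted numbers, RangeCheck.inRange(20190101183030L, 2019010100L, 20190101115959L) is false, because 183030 is later in the day than 115959.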

In Pylucene using Apache Tika, with jnius.

2016-06-29 Thread Antony
(jnius\jnius.c:17342) File "jnius\jnius_env.pxi", line 11, in jnius.get_jnienv (jnius\jnius.c:3162) File "jnius\jnius_jvm_desktop.pxi", line 55, in jnius.get_platform_jnienv (jnius\jnius.c:3093) File "jnius\jnius_jvm_desk

Excessive mem usage with 32-bit app, on 64-bit server

2012-11-22 Thread Antony Joseph
search query is executed, when does the memory used by the result get free again? Is it after an idle period or when the JVM hits memory usage limits or what? 4. Could this be caused due to a memory leak in our code? Any "common mistakes" that we could chec

Improving indexing speed

2011-11-10 Thread antony jospeh
directory and pass the files into a pool of worker threads using a queue, all of which share the same index writer. However, there is no significant change in indexing speed. Any hints about what I am doing wrong, or any suggestions? Thanks Antony

Re: performance question - number of documents

2011-10-23 Thread Antony Sequeira
May be that assumption is wrong. I also haven't understood how search scales :( -Antony On Sun, Oct 23, 2011 at 10:18 AM, Erick Erickson wrote: > "Why would it matter...top 5 matches" Because Lucene has to calculate > the score of all documents in order to insure that it r

Re: ImportError: DLL load failed: The specified module could not be found.

2011-05-29 Thread Antony Joseph
Hi all, I finally resolved my problem: msvcp71.dll was missing. Thanks, On 25 May 2011 12:27, Antony Joseph wrote: > Hi, > > Please help me to resolve this import error. > > Thanks > Antony > > C:\Documents and Settings\Antony>java -version > > java version

ImportError: DLL load failed: The specified module could not be found.

2011-05-24 Thread Antony Joseph
Hi, Please help me to resolve this import error. Thanks Antony C:\Documents and Settings\Antony>java -version java version "1.6.0_24" Java(TM) SE Runtime Environment (build 1.6.0_24-b07) Java HotSpot(TM) Client VM (build 19.1-b02, mixed mode, sharing) C:\Documents and Settings\A

Re: What doc id to use on IndexReader with SetNextReader

2011-04-18 Thread Antony Bowesman
Thanks Uwe, I assumed as much. On 18/04/2011 7:28 PM, Uwe Schindler wrote: Document d = reader.document(doc) This is the correct way to do it. Uwe - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additi

What doc id to use on IndexReader with SetNextReader

2011-04-18 Thread Antony Bowesman
and how the APIs should be used? Thanks Antony

NullPointerException in FieldSortedHitQueue

2011-04-14 Thread Antony Bowesman
SortField containing a comparator? Thanks Antony

Index time boost question

2011-04-14 Thread Antony Bowesman
I have a test case written for 2.3.2 that tested an index time boost on a field of 0.0F and then did a search using Hits and got 0 results. I'm now in the process of upgrading to 2.9.4 and am removing all use of Hits in my test cases and using a Collector instead. Now the test case fails as it

DocIdSet to represent small number of hits in large Document set

2011-04-04 Thread Antony Bowesman
I'm converting a Lucene 2.3.2 to 2.4.1 (with a view to going to 2.9.4). Many of our indexes are 5M+ Documents, however, only a small subset of these are relevant to any user. As a DocIdSet, backed by a BitSet or OpenBitSet, is rather inefficient in terms of memory use, what is the recommended

TopFieldDocCollector and v3.0.0

2009-12-07 Thread Antony Bowesman
gested path for migrating TopFieldDocCollector usage? Antony

NumberFormatException when creating field cache

2009-09-09 Thread Antony Bowesman
f data tolerance when creating these caches? At least now the only solution is to delete that Document. Perhaps the values could then be returned as 0 in the Parser implementations for numeric failures. Antony

Re: TermEnum with deleted documents

2009-05-10 Thread Antony Bowesman
Hi Mike, Thanks for the response. I looked at that issue, but my case is trivial to fix. I just keep the Set of terms I have deleted and ignore those during my second iteration. Thanks Antony Michael McCandless wrote: This is known & expected. Lucene does not update the t
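The workaround described in this reply (remember which terms were deleted, then skip them on the next enumeration) can be sketched in plain Java. This is only an illustration under assumptions: terms are represented as strings, and the class and method names are hypothetical rather than anything from the Lucene API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: tracks terms deleted in a first pass so that a
// second enumeration can ignore them even if they are still reported.
final class DeletedTermFilter {
    private final Set<String> deleted = new HashSet<>();

    // First pass: record the text of every term whose documents were deleted.
    void markDeleted(String termText) {
        deleted.add(termText);
    }

    // Second pass: return the enumerated terms minus those recorded as deleted.
    List<String> liveTerms(Iterable<String> enumeratedTerms) {
        List<String> live = new ArrayList<>();
        for (String term : enumeratedTerms) {
            if (!deleted.contains(term)) {
                live.add(term);
            }
        }
        return live;
    }
}
```

The set only needs to live for the duration of the two passes, so its memory cost is bounded by the number of terms deleted in the first pass.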

TermEnum with deleted documents

2009-05-06 Thread Antony Bowesman
returns > 0 for those terms even though the docs are deleted. Should this be the case? I have tried closing the reader between enumerations, but no difference. Antony

Re: How to not overwrite a Document if it 'already exists'?

2009-05-05 Thread Antony Bowesman
roach for now and will try to get some performance data, so thanks for your comments Mike. Antony

Re: How to not overwrite a Document if it 'already exists'?

2009-05-05 Thread Antony Bowesman
icult to support something like this in the IndexWriter API and if not, would it end up being more efficient than using a reader/terms to check this? Antony

Which is more efficient

2009-05-05 Thread Antony Bowesman
Just wondered which was more efficient under the hood for (int i = 0; i < size; i++) terms[i] = new Term("id", doc_key[i]); This writer.deleteDocuments(terms); for (int i = 0; i < size; i++) writer.addDocument(doc[i]); Or this for (int i = 0; i < size; i++) writer.updateDoc

How to not overwrite a Document if it 'already exists'?

2009-05-05 Thread Antony Bowesman
term id:XXX? Given that opening a reader is expensive, is there any way to do this efficiently? I guess what I want is IndexWriter.addDocumentIfMissing(Term term, Document doc, Analyzer analyzer) Thanks Antony

RE: test

2009-04-07 Thread Antony Joseph
Hi, In a long-running process Lucene crashes in my application. Is there any way to diagnose this, or how can I turn on debug/trace logging for Lucene? Thanks Antony -- DigitalGlue, India

test

2009-04-07 Thread Antony Joseph
hi -- DigitalGlue, India

Re: Lucene 2.4 - Searching

2009-01-27 Thread Antony Bowesman
well as index it, then you can get the original back from the Document. Antony

Re: addIndexesNoOptimize question

2008-12-19 Thread Antony Bowesman
Thanks Mike, I'm still on 2.3.1, so will upgrade soon. Antony Michael McCandless wrote: This was an attempt on addIndexesNoOptimize's part to "respect" the maxMergeDocs (which prevents large segments from being merged) you had set on IndexWriter. However, the check was t

addIndexesNoOptimize question

2008-12-17 Thread Antony Bowesman
The javadocs state "This requires ... and the upper bound* of those segment doc counts not exceed maxMergeDocs." Can one of the gurus please explain what that means and what needs to be done to find out whether an index being merged fits that criteria. Tha

Re: Which is faster/better

2008-11-25 Thread Antony Bowesman
case for delete-by-docId is to perform a dBQ and so far, we have been using your suggestion from last year about how to delete documents for ALL terms. Antony

Which is faster/better

2008-11-24 Thread Antony Bowesman
better Javadocs, so it's unclear which is the 'right' one to use. Any pointers? Antony

Re: distinct field values

2008-10-14 Thread Antony Bowesman
use TermEnum + TermDocs to walk the tags / docs and see what tag the hit comes from. This would be different to walking the Hits/Documents to fetch the tag from the Document. Not sure if this is very efficient though, depends on the Document count. Antony

Re: Phrase Query

2008-09-16 Thread Antony Bowesman
Is it possible to write a document with different analyzers in different fields? PerFieldAnalyzerWrapper

Caching Filters and docIds when using MultiSearcher/IndexSearcher(MultiReader)...

2008-09-11 Thread Antony Bowesman
which I should use - there seem to be comments in the dev list to avoid MultiSearcher... Any thoughts or have I spiralled too far into Lucene's depths to see where I am...? Antony

Re: Merging indexes - which is best option?

2008-09-08 Thread Antony Bowesman
Thanks Karsten, I decided first to delete all duplicates from master(iW) and then to insert all temporary indices(other). I reached the same conclusion. As your code shows, it's a simple enough solution. You had a good point with the iW.abort() in the rollback case. A

Merging indexes - which is best option?

2008-09-04 Thread Antony Bowesman
e Document from the reader. Any views? Antony

Javadoc wording in IndexWriter.addIndexesNoOptimize()

2008-09-04 Thread Antony Bowesman
GE_DOCS is deprecated. Thanks Antony

Re: Can TermDocs.skipTo() go backwards

2008-08-27 Thread Antony Bowesman
- It's marked as 3.0, but there was some hope for a 2.4 release. Are there any estimates for when this might get to a release - this is an exciting development for me. Thanks Antony

Re: Can TermDocs.skipTo() go backwards

2008-08-27 Thread Antony Bowesman
Document. What would fit my usage would be something like byte[] b = doc.getPayload("owner", ownerId); where for the given OID, I can retrieve the payload I associated with it, when I did doc.add(new Field("owner", ownerId, accessPayload); but that's no

Can TermDocs.skipTo() go backwards

2008-08-27 Thread Antony Bowesman
resort the scoreDocs by docId order and then loop with termPositions.skipTo(scoreDoc.doc). The number of hits will be typically small so it'll be fast enough. Antony
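The idea in this message is to re-sort the hits by ascending document id so that a forward-only cursor never has to move backwards. A minimal plain-Java sketch of the sorting step follows. Hit is a hypothetical stand-in for a search hit (a doc id plus a score) and is not a Lucene class; the forward-only skipTo loop itself is left out because it depends on the index API in use.

```java
import java.util.Arrays;
import java.util.Comparator;

final class DocOrder {
    // Hypothetical stand-in for a search hit: document id and score.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Returns a copy sorted by ascending doc id; the score-ordered input
    // array is left untouched so it can still be used for display.
    static Hit[] byDocId(Hit[] hits) {
        Hit[] copy = Arrays.copyOf(hits, hits.length);
        Arrays.sort(copy, Comparator.comparingInt((Hit h) -> h.doc));
        return copy;
    }
}
```

Sorting a copy costs O(n log n) for n hits, which fits the message's expectation that the number of hits is typically small.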

Re: Multiple index performance

2008-08-18 Thread Antony Bowesman
maybe I misunderstood your use case. Antony

Re: Multiple index performance

2008-08-18 Thread Antony Bowesman
single index. We also support sharding across multiple index files for performance/scaling considerations, via a hash of the ownerId, but in practice have not needed it. Much will depend on your search usage. YMMV Antony

Re: Fields with the same name?? - Was Re: Payloads and tokenizers

2008-08-18 Thread Antony Bowesman
implications of this method. I will be using caches, but my volumes are potentially so large that I may never be able to cache everything (perhaps 500M Docs), so this has to be very quick. I'll play with both approaches and see which works best. Thanks for your time an

Fields with the same name?? - Was Re: Payloads and tokenizers

2008-08-17 Thread Antony Bowesman
, Index.NO_NORMS); doc.add(f); } then will the array elements for the corresponding Field arrays returned by Document.getFields("ownerId") Document.getFields("accessId") **guarantee** that the array element order is the same as the order they were added? Antony

Re: Payloads and tokenizers

2008-08-14 Thread Antony Bowesman
nd used in invertField()? I'd rather stick with core Lucene than start making proprietary changes, but it seems I can't quite get to where I want to be without some quite cludgy code for a very simple use case :( Antony Doron Cohen wrote: IIRC first versions of patches that added paylo

Payloads and tokenizers

2008-08-13 Thread Antony Bowesman
ex.UNTOKENIZED); f.setPayload("B1"); doc.add(f); and avoid the whole unnecessary Tokenizer/Analyzer overhead and give support for payloads in untokenized fields. It looks like it would be trivial to implement in DocumentsWriter.invertField(). Or would this corrupt the Fieldabl

Re: Per user data store

2008-08-05 Thread Antony Bowesman
ce the score. If it is part of the query, the complete document set for other users will influence the hits for this user. Antony

Re: Modifying a document by updating a payloads?

2008-07-30 Thread Antony Bowesman
I guess they ultimately equate to the same thing - i.e. using a stored field to hold the document's "payload", but it would be an extra field to load. Antony

Modifying a document by updating a payloads?

2008-07-30 Thread Antony Bowesman
Documents, but is it possible to update a payload for an existing Document? Antony

Memory leaks during indexing.

2008-07-22 Thread Antony Joseph
mb, after 3 hours the python consumption shows 140mb. The performance of indexing becomes poor and memory leaks. Please help me to solve the problem. Thanks Antony -- Antony Joseph A DigitalGlue [EMAIL PROTECTED] T: +91 22 30601091

Re: Rebuilding parallel indexes

2008-06-09 Thread Antony Bowesman
would need recreation (I'm assuming the optimization would muck up the Ids if only the parallel index was optimized). You'd also need to get the new doc Id for each doc that is added. Are docIds allocated during addDocument or during the c

Rebuilding parallel indexes

2008-06-09 Thread Antony Bowesman
' could not allow the original docId to be re-used, thus keeping the two parallel indexes in sync without requiring a rebuild. If this could be overcome, this would make this parallel index pattern so much more useful for large volume data sets.

OT: Parsing Russian text from RTF

2008-05-15 Thread Bowesman Antony
n, apparently CP1251, but there's a lovely line in the RTFReader class /* TODO: per-font font encodings ( \fcharset control word ) ? */ Does anyone know if the RTF above is correct - the only place the translation table is set during the parse is when the 'ansi'

Re: Numerical Range Query

2008-05-13 Thread Bowesman Antony
An alternative to Lucene's NumberTools, is Solr's NumberUtils, which is more space efficient for indexing numbers, but not as pretty to look at http://lucene.apache.org/solr/api/org/apache/solr/util/NumberUtils.html Dan Hardiker wrote: > Hi, > > I've got an application which stores ratings fo

Re: Can POI provide reliable text extraction results for production search engine for Word, Excel and PowerPoint formats?

2008-05-13 Thread Bowesman Antony
arsing framework and am using it in our product and have tested all of the above and the priority for Word parsing is TextMining v0.4, before POI and then the other two which I plugged in via the parse-ext parser. HTH Antony Lukas Vlcek wrote: > Hi, > > I need to find a reliable w

Re: Binding lucene instance/threads to a particular processor(or core)

2008-04-21 Thread Antony Bowesman
ed a 1:1 model of Solaris threads to LWPs. That new library had dramatic performance improvements over the old. Some background info for Java and threading http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html Antony Glen Newton wrote: I realised that not everyone on this lis

Re: Using Lucene partly as DB and 'joining' search results.

2008-04-14 Thread Antony Bowesman
mer: all of this is purely brainstorming, i've never actually tried anything like this, it may be more trouble than it's worth. :) Thanks for the sounding board - it's always useful to get new ideas! Antony

Re: Using Lucene partly as DB and 'joining' search results.

2008-04-14 Thread Antony Bowesman
multiple field, and using stored fields can 'modify' that Document. However, what happens to the DocId when the delete+add occurs and how do I ensure it stays the same. I'm on 2.3.1. I seem to recall a discussion on this in another thread, but cannot find it. Antony Chris

Re: Using Lucene partly as DB and 'joining' search results.

2008-04-11 Thread Antony Bowesman
he RO data can easily be re-created, which means I can't just create the filter as part of the base search. Regards Antony

Using Lucene partly as DB and 'joining' search results.

2008-04-11 Thread Antony Bowesman
eadache of scaling just Lucene, which is a simple beast, than the whole bundle of 'stuff' that comes with the database as well. Antony

Re: How to improve performance of large numbers of successive searches?

2008-04-10 Thread Antony Bowesman
. Antony

Re: Search emails - parsing mailbox (mbox) files

2008-04-04 Thread Antony Bowesman
fer some support , but couldn't find much documentation about it. Apache James' MIME4J is one parser and Javamail also can parse mail. I found Javamail more intuitive, but have not tested either against a large mail set for reliability and per

Re: Biggest index

2008-03-16 Thread Antony Bowesman
We're about to embark on a 25-40M documents (email data) per annum, no deletes over 10 years. Planning for index distribution, but haven't decided on the partitioning yet. Antony

Re: Using RangeFilter

2008-01-24 Thread Antony Bowesman
vivek sar wrote: I've a field as NO_NORM, does it has to be untokenized to be able to sort on it? NO_NORMS is the same as UNTOKENIZED + omitNorms, so you can sort on that. Antony

Re: Multiple searchers (Was: CachingWrapperFilter: why cache per IndexReader?)

2008-01-23 Thread Antony Bowesman
even slower than 2.3 with 2.1 index. It catches up in the longer result set. Any ideas why that might be. The shared searcher multiple threads is probably quite a common use case. Antony

DateTools UTC/GMT mismatch

2008-01-22 Thread Antony Bowesman
Hi, I just noticed that although the Javadocs for Lucene 2.2 state that the dates for DateTools use UTC as a timezone, they are actually using GMT. Should either the Javadocs be corrected or the code corrected to use UTC instead? Antony

Re: Using RangeFilter

2008-01-21 Thread Antony Bowesman
vivek sar wrote: I need to be able to sort on optime as well, thus need to store it . Lucene's default sorting does not need the field to be stored, only indexed as untokenized. Antony

Re: Lucene sorting case-sensitive by default?

2008-01-15 Thread Antony Bowesman
he original and indexed as lower case into multiple tokens, you will get the RuntimeException from FieldCache. Antony

Re: how do I get my own TopDocHitCollector?

2008-01-10 Thread Antony Bowesman
termDocs.close(); termEnum.close(); } return retArray; I do allow for a partial cache, in which case, as you suggest, the searcher uses a FieldSelector to get the external Id from the document which then is stored to cache. Antony -Original Message- From:

Re: how do I get my own TopDocHitCollector?

2008-01-09 Thread Antony Bowesman
minScore = pq.peek().score; } else remaining++; } } HTH Antony
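Only the tail of the collector code survives in this snippet, so the original logic cannot be reconstructed. As a rough illustration of the general pattern it hints at (a bounded priority queue whose weakest entry sets a minimum score), here is a plain-Java sketch built on java.util.PriorityQueue. It keeps scores only, and the names TopNScores and dropped are assumptions of this sketch, not taken from the message or from Lucene.

```java
import java.util.PriorityQueue;

// Keeps the N best scores seen so far. The queue is a min-heap, so peek()
// is always the weakest score currently retained.
final class TopNScores {
    private final int capacity;
    private final PriorityQueue<Float> pq = new PriorityQueue<>();
    private float minScore = Float.NEGATIVE_INFINITY;
    private int dropped = 0; // hits seen but no longer (or never) retained

    TopNScores(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
    }

    void collect(float score) {
        if (pq.size() < capacity) {
            // Still filling up: keep everything.
            pq.add(score);
            if (pq.size() == capacity) {
                minScore = pq.peek();
            }
        } else if (score > minScore) {
            // Better than the weakest retained hit: evict it and update the bar.
            pq.poll();
            pq.add(score);
            minScore = pq.peek();
            dropped++;
        } else {
            // Not competitive: count it and move on without touching the queue.
            dropped++;
        }
    }

    float minScore() {
        return minScore;
    }

    int dropped() {
        return dropped;
    }
}
```

Caching the minimum score avoids a queue operation for every non-competitive hit, which is usually the majority once the queue is full.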

Re: Why is lucene so slow indexing in nfs file system ?

2008-01-09 Thread Antony Bowesman
http://sourceforge.net/forum/message.php?msg_id=3947448 Antony

Re: Deleting a single TermPosition for a Document

2008-01-08 Thread Antony Bowesman
cument and most are not stored Antony Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Antony Bowesman <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, January 8, 2008 12:47:05 AM Subject: Deleting a single TermPo

Deleting a single TermPosition for a Document

2008-01-07 Thread Antony Bowesman
dexed. Is this something others have wanted or are there other solutions to this problem? Thanks Antony

Re: Concurrency between IndexReader and IndexWriter

2007-12-09 Thread Antony Bowesman
Looks like I got myself into a twist for nothing - the reader will see a consistent view, despite what the writer does as long as the reader remains open. Apologies for the noise... Antony

Re: Concurrency between IndexReader and IndexWriter

2007-12-09 Thread Antony Bowesman
Using Lucene 2.1 Antony Bowesman wrote: My application batch adds documents to the index using IndexWriter.addDocument. Another thread handles searchers, creating new ones as needed, based on a policy. These searchers open a new IndexReader and there is currently no synchronisation between

Concurrency between IndexReader and IndexWriter

2007-12-09 Thread Antony Bowesman
following logic if (!reader.isDeleted(n)) doc = reader.document(n) can fail with an IllegalArgumentException if the concurrent writer flushes in between the test and read. Thanks Antony

Re: deleteDocuments by Term[] for ALL terms

2007-12-04 Thread Antony Bowesman
Thanks Mike, just what I was after. Antony Michael McCandless wrote: You can just create a query with your and'd terms, and then do this: Weight weight = query.weight(indexSearcher); IndexReader reader = indexSearcher.getIndexReader(); Scorer scorer = weight.scorer(reader);

deleteDocuments by Term[] for ALL terms

2007-11-25 Thread Antony Bowesman
o with the searcher TopDocs mechanism and do that also in batches to avoid the risk of a large memory hit. I know there's lots of clever 'expert-mode' stuff under the Lucene API hood, but does anyone know any good way to do this or have

query parser behavior with operator AND

2007-06-20 Thread Antony Sequeira
operator is set to AND. Is this a bug? Can someone point me to a bug report if it is, or help me understand so I can explain this behavior? -Antony Sequeira Test code output follows: Testing with default operator set to OR (fo AND ba OR "fo ba") -> +:fo +:ba

Re: efficient way to filter out unwanted results

2007-06-15 Thread Antony Bowesman
using the docid. Then just check the external Id of the matched document against the exclusion list. As long as you have your searcher open, the cache will remain valid. Antony Thanks again for your help. Jay Sawan Sharma wrote: Hello Jay, I am not sure up to what level I understood

negative queries

2007-06-14 Thread Antony Sequeira
l negative. Writing the above paragraph I am beginning to realize that although my example shows the problem, it might be a wrong example in terms of me getting a solution to it :) Thanks in advance for any feedback and help. -Antony

Re: "Contains" query parsed to PrefixQuery

2007-06-11 Thread Antony Bowesman
It's a bug in 2.1, fixed by Doron Cohen http://issues.apache.org/jira/browse/LUCENE-813 Antony dontspamterry wrote: Hi all, I was experimenting with queries using wildcard on an untokenized field and noticed that a query with both a starting and trailing wildcard, e.g. *abc*, gets pars

Re: How can I search over all documents NOT in a certain subset?

2007-06-08 Thread Antony Bowesman
evaluating. Antony

Re: How can I search over all documents NOT in a certain subset?

2007-06-06 Thread Antony Bowesman
n the document ids from the old reader may not represent the same documents in the new reader, so the Filter for the old reader will not be valid for the new search against the new reader and you may get false matches. I don't think there will be a problem if there are no deletion

Re: Does Lucene search over memory too?

2007-05-29 Thread Antony Bowesman
Doron Cohen wrote: Antony Bowesman <[EMAIL PROTECTED]> wrote on 28/05/2007 22:48:41: I read the new IndexWriter Javadoc and I'm unclear about this autocommit. In 2.1, I thought an IndexReader opened in an IndexSearcher does not "see" additions to an index made by an Ind

Re: Does Lucene search over memory too?

2007-05-28 Thread Antony Bowesman
" makes me wonder if my assumptions are wrong. Can you clarify what it means by the IndexReader "seeing" changes to the index? Thanks Antony

Re: maxDoc and arrays

2007-05-24 Thread Antony Bowesman
need to regenerate your array cache. As Hoss has said, this is pretty much what FieldCache does and it holds the caches keyed by the IndexReader. Antony
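The message says the caches are held keyed by the IndexReader, so a new reader naturally gets a fresh cache. A minimal plain-Java sketch of that keying idea follows. It is not how FieldCache is implemented; PerReaderCache, the generic reader type R and the use of WeakHashMap are all assumptions chosen for illustration.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// One cached value per reader instance. Weak keys let an entry be dropped
// once its reader is no longer referenced anywhere else.
final class PerReaderCache<R, V> {
    private final Map<R, V> cache = new WeakHashMap<>();

    // Returns the cached value for this reader, building it on first use.
    synchronized V get(R reader, Function<R, V> loader) {
        return cache.computeIfAbsent(reader, loader);
    }
}
```

Asking twice with the same reader object runs the loader once; a reopened reader is a different key, so its array is rebuilt rather than reused.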

Re: Memory leak (JVM 1.6 only)

2007-05-16 Thread Antony Bowesman
Daniel Noll wrote: On Tuesday 15 May 2007 21:59:31 Narednra Singh Panwar wrote: try using -Xmx option with your Application. and specify maximum/ minimum memory for your Application. It's funny how a lot of people instantly suggest this. What if it isn't possible? There was a situation a wh

Re: Turning PrefixQuery into a TermQuery

2007-04-11 Thread Antony Bowesman
within one document. Use the PerFieldAnalyzerWrapper. http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/PerFieldAnalyzerWrapper.html It allows different analyzers to be used for different fields. Antony

Re: Not able to search on UN_TOKENIZED fields

2007-04-09 Thread Antony Bowesman
the Analyzer for QueryParser. Alternatively, override QueryParser's getFieldQuery() and then choose your Analyzer there based on the field being searched. Antony Ryan O'Hara wrote: Hey Erick, Thanks for the quick response. I need a truly exact match. What I ended up doing w

Re: Range search in numeric fields

2007-04-03 Thread Antony Bowesman
for ints. It converts numbers to a 3 char Unicode representation which is sortable and therefore range searchable. Antony
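The snippet does not show which utility is meant, so the following is only one possible way to pack a 32-bit int into three chars whose string order matches numeric order. The class name and the 8/12/12 bit split are choices made for this sketch, not a description of any particular library.

```java
// Encodes an int as a 3-char string that sorts (by String.compareTo) in the
// same order as the original signed values.
final class SortableInt {
    static String encode(int value) {
        // Flip the sign bit so that signed order becomes unsigned order.
        int v = value ^ 0x80000000;
        char[] out = new char[3];
        out[0] = (char) (v >>> 24);            // top 8 bits
        out[1] = (char) ((v >>> 12) & 0x0fff); // middle 12 bits
        out[2] = (char) (v & 0x0fff);          // low 12 bits
        return new String(out);
    }

    static int decode(String s) {
        int v = (s.charAt(0) << 24) | (s.charAt(1) << 12) | s.charAt(2);
        return v ^ 0x80000000; // undo the sign-bit flip
    }
}
```

Because every char stays below 0x1000, no surrogate values are produced, and comparing the strings char by char gives the same result as comparing the ints, which is what makes a term range over the encoded field behave like a numeric range.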

Re: Benchmarking LUCENE-584 with contrib/benchmark

2007-04-02 Thread Antony Bowesman
: digester.addObjectCreate("benchmark/benchmarker", "class", StandardBenchmarker.class); <== Maybe I'm missing something, but isn't the 3rd param to addObjectCreate just a default and the real class is defined by the "class"

Re: Help - FileNotFoundException during IndexWriter.init()

2007-04-01 Thread Antony Bowesman
eekend with no virus checker in the DB directory and haven't managed to reproduce the problem. Thanks for the help Mike. Nothing like an exception never seen before, two days before the product is due to go live, to induce mild panic ;) Antony

Re: Help - FileNotFoundException during IndexWriter.init()

2007-03-31 Thread Antony Bowesman
'll re-run the test a few more times and see if I can re-create the problem. Thanks for the rapid response Mike Antony

Help - FileNotFoundException during IndexWriter.init()

2007-03-31 Thread Antony Bowesman
popped up at some point, so my suspicions are that it is the cause. I am running the test again, but can any of the gurus give any ideas what can cause this. It did have to happen the day after my deadline :( Antony

Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Antony Bowesman
to the original object, so I'm using == to locate it. I've not used equals() as I've not yet worked out whether that will cause me any problems with hashing. Antony Peter On 3/29/07, Antony Bowesman <[EMAIL PROTECTED]> wrote: I've got a similar duplicate ca

Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Antony Bowesman
hieve 'last wins' as you must presumably remove first from the PQ? Antony Peter Keegan wrote: The duplicate check would just be on the doc ID. I'm using TreeSet to detect duplicates with no noticeable affect on performance. The PQ only has to be checked for a previous value I
