Hi Mike,
Any updates?
Regards,
Antony
On Wed, 11 May 2022 at 01:02, Antony Joseph
wrote:
> Hello Mike,
>
> 1. As requested, the full checkindex log is attached.
>
> 2. We haven't made any changes to the IndexDeletionPolicy - so the
> assumption is the default policy i
as running
fine on the same system.
Thanks for your assistance.
Regards,
Antony
On Thu, 5 May 2022 at 20:06, Michael McCandless
wrote:
> Antony, do you maybe have Microsoft Defender turned on, which might
> quarantine files that it suspects are malicious? I'm not sure if it is on
>
Hi Michael,
Any update?
Regards,
Antony
On Sun, 1 May 2022 at 19:35, Antony Joseph
wrote:
> Hi Michael,
>
> Thank you for your reply. Please find responses to your questions below.
>
> Regards,
> Antony
>
> On Sat, 30 Apr 2022 at 18:59, Michael McCandless <
> lu
Hi Michael,
Thank you for your reply. Please find responses to your questions below.
Regards,
Antony
On Sat, 30 Apr 2022 at 18:59, Michael McCandless
wrote:
> Hi Antony,
>
> Hmm it looks like the root cause is this:
>
> Caused by: java.nio.file.NoSuchFileException: D:\i\
fos.readCommit(SegmentInfos.java:288)
... 2 more
Regards,
Antony
On Sat, 30 Apr 2022 at 10:59, Robert Muir wrote:
> The most helpful thing would be the full stacktrace of the exception.
> This exception should be chaining the original exception and call
> site, and maybe tell us more about
Regards,
Antony
On Thu, 28 Apr 2022 at 17:00, Adrien Grand wrote:
> Hi Anthony,
>
> This isn't something that you should try to fix programmatically,
> corruptions indicate that something is wrong with the environment,
> like a broken disk or corrupt RAM. I would suggest running
problem -
the application logic is the same.
Also, while the application runs on both Linux and Windows, so far we have
observed this situation only on various Windows platforms.
Would really appreciate some assistance. Thanks in advance.
Regards,
Antony
Hi all,
Using: python 2.7.14, pylucene 4.10.0
Index:
xdate = long("20190101183030")
doc.add(LongField('xdate', xdate, Field.Store.YES)) # stored and not
analyzed
Query:
query = NumericRangeQuery.newLongRange("xdate", long("2019010100"),
long("20190101115959"), True, True)
I am getting the
(jnius\jnius.c:17342)
File "jnius\jnius_env.pxi", line 11, in jnius.get_jnienv
(jnius\jnius.c:3162)
File "jnius\jnius_jvm_desktop.pxi", line 55, in
jnius.get_platform_jnienv (jnius\jnius.c:3093)
File "jnius\jnius_jvm_desk
search query is executed, when does the memory
used by the result get free again? Is it after an idle period or when
the JVM hits memory usage limits or what?
4. Could this be caused due to a memory leak in our code? Any "common
mistakes" that we could chec
directory and pass the files into a pool of worker threads using a queue,
all of which share the same IndexWriter. However, there is no significant
change in indexing speed.
Any hints on what I am doing wrong, or any suggestions?
Thanks
Antony
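A minimal sketch of the described pattern (one shared IndexWriter fed by a pool of workers), assuming a recent (5.x+) Lucene API; older versions differ in constructor signatures. The writer itself is thread-safe, so flat throughput usually points at analysis cost, RAM buffer size, or disk I/O rather than the writer:

    import java.nio.file.Paths;
    import java.util.Arrays;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    public class SharedWriterIndexer {
        public static void main(String[] args) throws Exception {
            IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
            cfg.setRAMBufferSizeMB(128);    // often matters more than thread count
            final IndexWriter writer =
                    new IndexWriter(FSDirectory.open(Paths.get("index")), cfg);

            // One queue of file paths; every worker shares the single writer
            final BlockingQueue<String> files =
                    new LinkedBlockingQueue<>(Arrays.asList(args));
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 0; i < 4; i++) {
                pool.submit(() -> {
                    String path;
                    while ((path = files.poll()) != null) {
                        Document doc = new Document();
                        doc.add(new TextField("path", path, Field.Store.YES));
                        try {
                            writer.addDocument(doc);   // IndexWriter is thread-safe
                        } catch (Exception e) {
                            throw new RuntimeException(e);
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            writer.close();
        }
    }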
Maybe that assumption is wrong.
I also haven't understood how search scales :(
-Antony
On Sun, Oct 23, 2011 at 10:18 AM, Erick Erickson wrote:
> "Why would it matter...top 5 matches" Because Lucene has to calculate
> the score of all documents in order to ensure that it r
Hi all,
Finally I resolved my problem: msvcp71.dll was missing.
Thanks,
On 25 May 2011 12:27, Antony Joseph wrote:
> Hi,
>
> Please help me to resolve this import error.
>
> Thanks
> Antony
>
> C:\Documents and Settings\Antony>java -version
>
> java version
Hi,
Please help me to resolve this import error.
Thanks
Antony
C:\Documents and Settings\Antony>java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) Client VM (build 19.1-b02, mixed mode, sharing)
C:\Documents and Settings\A
Thanks Uwe, I assumed as much.
On 18/04/2011 7:28 PM, Uwe Schindler wrote:
Document d = reader.document(doc)
This is the correct way to do it.
Uwe
and how the APIs should be used?
Thanks
Antony
SortField containing a comparator?
Thanks
Antony
I have a test case written for 2.3.2 that tested an index time boost on a field
of 0.0F and then did a search using Hits and got 0 results.
I'm now in the process of upgrading to 2.9.4 and am removing all use of Hits in
my test cases and using a Collector instead. Now the test case fails as it
I'm converting a Lucene 2.3.2 to 2.4.1 (with a view to going to 2.9.4).
Many of our indexes are 5M+ Documents, however, only a small subset of these are
relevant to any user. As a DocIdSet, backed by a BitSet or OpenBitSet, is
rather inefficient in terms of memory use, what is the recommended
suggested path for migrating TopFieldDocCollector usage?
Antony
of data tolerance when creating these
caches? At present, the only solution is to delete that Document. Perhaps the
values could then be returned as 0 in the Parser implementations for numeric
failures.
Antony
Hi Mike,
Thanks for the response.
I looked at that issue, but my case is trivial to fix. I just keep the Set of
terms I have deleted and ignore those during my second iteration.
Thanks
Antony
Michael McCandless wrote:
This is known & expected.
Lucene does not update the t
returns > 0 for those terms even though the docs are
deleted.
Should this be the case? I have tried closing the reader between enumerations,
but no difference.
Antony
roach for now and will try to get
some performance data, so thanks for your comments Mike.
Antony
difficult to support something like this in the
IndexWriter API and, if not, would it end up being more efficient than using a
reader/terms to check this?
Antony
Just wondered which was more efficient under the hood
for (int i = 0; i < size; i++)
terms[i] = new Term("id", doc_key[i]);
This
writer.deleteDocuments(terms);
for (int i = 0; i < size; i++)
writer.addDocument(doc[i]);
Or this
for (int i = 0; i < size; i++)
writer.updateDocument(terms[i], doc[i]);
term id:XXX? Given that opening a reader is expensive, is there
any way to do this efficiently?
I guess what I want is
IndexWriter.addDocumentIfMissing(Term term, Document doc, Analyzer analyzer)
Thanks
Antony
Hi,
In a long-running process, Lucene crashes in my application. Is there any
way to diagnose this, or how can I turn on debug/trace logging for
Lucene?
Thanks
Antony
--
DigitalGlue, India
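Lucene's main built-in diagnostic hook is the IndexWriter infoStream, which logs flushes and merges. A one-line sketch, assuming the 2.x/3.x setter ('writer' is the application's IndexWriter; later versions configure this on IndexWriterConfig instead):

    // route the writer's internal diagnostics to stderr
    writer.setInfoStream(System.err);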
hi
--
DigitalGlue, India
well as
index it, then you can get the original back from the Document.
Antony
Thanks Mike, I'm still on 2.3.1, so will upgrade soon.
Antony
Michael McCandless wrote:
This was an attempt on addIndexesNoOptimize's part to "respect" the
maxMergeDocs (which prevents large segments from being merged) you had
set on IndexWriter.
However, the check was t
The javadocs state
"This requires ... and the upper bound* of those segment doc counts not exceed
maxMergeDocs."
Can one of the gurus please explain what that means, and what needs to be done to
find out whether an index being merged fits that criterion?
Thanks
case for
delete-by-docId is to perform a dBQ and so far, we have been using your
suggestion from last year about how to do delete documents for ALL terms.
Antony
better Javadocs, so it's unclear
which is the 'right' one to use.
Any pointers?
Antony
use TermEnum + TermDocs to walk the tags / docs and see what tag the hit comes
from. This would be different to walking the Hits/Documents to fetch the tag
from the Document. Not sure if this is very efficient though, depends on the
Document count.
Antony
Is it possible to write a document with different analyzers in different fields?
PerFieldAnalyzerWrapper
which I should use -
there seem to be comments in the dev list to avoid MultiSearcher...
Any thoughts or have I spiralled too far into Lucene's depths to see where I
am...?
Antony
Thanks Karsten,
I decided first to delete all duplicates from master(iW) and then to insert
all temporary indices(other).
I reached the same conclusion. As your code shows, it's a simple enough
solution. You had a good point with the iW.abort() in the rollback case.
Antony
e Document from the reader.
Any views?
Antony
GE_DOCS is deprecated.
Thanks
Antony
- It's marked as 3.0, but
there was some hope for a 2.4 release. Are there any estimates for when this
might get to a release - this is an exciting development for me.
Thanks
Antony
Document.
What would fit my usage would be something like
byte[] b = doc.getPayload("owner", ownerId);
where for the given OID, I can retrieve the payload I associated with it, when
I did
doc.add(new Field("owner", ownerId, accessPayload));
but that's no
resort the scoreDocs by docId order and then loop with
termPositions.skipTo(scoreDoc.doc). The number of hits will be typically small
so it'll be fast enough.
Antony
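A sketch of that resort-and-skip loop, assuming the 2.x TermPositions API (reader, term and hits are the obvious inputs):

    import java.util.Arrays;
    import java.util.Comparator;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermPositions;
    import org.apache.lucene.search.ScoreDoc;

    public class PositionLookup {
        // Visit the positions of 'term' in each hit, walking hits in docId order
        public static void visit(IndexReader reader, Term term, ScoreDoc[] hits)
                throws Exception {
            Arrays.sort(hits, new Comparator<ScoreDoc>() {
                public int compare(ScoreDoc a, ScoreDoc b) { return a.doc - b.doc; }
            });
            TermPositions tp = reader.termPositions(term);
            try {
                for (ScoreDoc sd : hits) {
                    // skipTo advances to the first doc >= sd.doc
                    if (tp.skipTo(sd.doc) && tp.doc() == sd.doc) {
                        for (int i = 0; i < tp.freq(); i++) {
                            int position = tp.nextPosition();
                            // ... use position ...
                        }
                    }
                }
            } finally {
                tp.close();
            }
        }
    }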
maybe I
misunderstood your use case.
Antony
single index.
We also support sharding across multiple index files for performance/scaling
considerations, via a hash of the ownerId, but in practice have not needed it.
Much will depend on your search usage.
YMMV
Antony
implications of this method. I will be using caches, but my
volumes are potentially so large that I may never be able to cache everything
(perhaps 500M Docs), so this has to be very quick.
I'll play with both approaches and see which works best.
Thanks for your time an
, Index.NO_NORMS);
doc.add(f);
}
then will the array elements for the corresponding Field arrays returned by
Document.getFields("ownerId")
Document.getFields("accessId")
**guarantee** that the array element order is the same as the order they were
added?
Antony
nd used in invertField()?
I'd rather stick with core Lucene than start making proprietary changes, but it
seems I can't quite get to where I want to be without some quite cludgy code for
a very simple use case :(
Antony
Doron Cohen wrote:
IIRC first versions of patches that added paylo
ex.UNTOKENIZED);
f.setPayload("B1");
doc.add(f);
and avoid the whole unnecessary Tokenizer/Analyzer overhead and give support for
payloads in untokenized fields.
It looks like it would be trivial to implement in DocumentsWriter.invertField().
Or would this corrupt the Fieldabl
ce the score. If it is part of the
query, the complete document set for other users will influence the hits for
this user.
Antony
I
guess they ultimately equate to the same thing - i.e. using a stored field to
hold the document's "payload", but it would be an extra field to load.
Antony
Documents, but is it possible to update a payload for an
existing Document?
Antony
MB; after 3 hours the Python consumption shows 140 MB. The performance of
indexing becomes poor and memory leaks.
Please help me to solve the problem.
Thanks
Antony
--
Antony Joseph A
DigitalGlue
[EMAIL PROTECTED]
T: +91 22 30601091
would need recreation (I'm assuming the
optimization would muck up the Ids if only the parallel index was optimized).
You'd also need to get the new doc Id for each doc that is added. Are docIds
allocated during addDocument or during the c
could not
allow the original docId to be re-used, thus keeping the two parallel indexes in
sync without requiring a rebuild.
If this could be overcome, this would make this parallel index pattern so much
more useful for large volume data sets.
n, apparently CP1251, but there's
a lovely line in the RTFReader class
/* TODO: per-font font encodings ( \fcharset control word ) ? */
Does anyone know if the RTF above is correct - the only place the translation
table is set during the parse is when the 'ansi'
An alternative to Lucene's NumberTools is Solr's NumberUtils, which is more
space-efficient for indexing numbers, but not as pretty to look at:
http://lucene.apache.org/solr/api/org/apache/solr/util/NumberUtils.html
Dan Hardiker wrote:
> Hi,
>
> I've got an application which stores ratings fo
parsing framework and am using it in our product. I have tested all of the
above; the priority for Word parsing is TextMining v0.4, then POI, and then
the other two, which I plugged in via the parse-ext parser.
HTH
Antony
Lukas Vlcek wrote:
> Hi,
>
> I need to find a reliable w
ed a
1:1 model of Solaris threads to LWPs. That new library had dramatic performance
improvements over the old.
Some background info for Java and threading
http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
Antony
Glen Newton wrote:
I realised that not everyone on this lis
disclaimer: all of this is purely brainstorming, I've never actually tried
anything like this; it may be more trouble than it's worth.
:) Thanks for the sounding board - it's always useful to get new ideas!
Antony
multiple field, and using
stored fields can 'modify' that Document. However, what happens to the DocId
when the delete+add occurs, and how do I ensure it stays the same?
I'm on 2.3.1. I seem to recall a discussion on this in another thread, but
cannot find it.
Antony
Chris
he
RO data can easily be re-created, which means I can't just create the filter as
part of the base search.
Regards
Antony
headache of scaling just
Lucene, which is a simple beast, than the whole bundle of 'stuff' that comes
with the database as well.
Antony
.
Antony
fer some support, but couldn't
find much documentation about it.
Apache James' MIME4J is one parser and Javamail also can parse mail. I found
Javamail more intuitive, but have not tested either against a large mail set for
reliability and per
We're about to embark on 25-40M documents (email data) per annum, with no
deletes, over 10 years. Planning for index distribution, but haven't decided on the
partitioning yet.
Antony
vivek sar wrote:
I've a field as NO_NORM; does it have to be untokenized to be able to
sort on it?
NO_NORMS is the same as UNTOKENIZED + omitNorms, so you can sort on that.
Antony
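A minimal sketch of both halves, assuming the 2.x Field and Sort APIs (field name illustrative, 'searcher' and 'query' assumed given):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.search.Sort;

    // Indexing: one value per document, untokenized, no norms
    Document doc = new Document();
    doc.add(new Field("optime", "20080115103000",
            Field.Store.NO, Field.Index.NO_NORMS));

    // Searching: sort on the indexed field; it need not be stored
    // Hits hits = searcher.search(query, new Sort("optime"));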
even slower than 2.3 with a 2.1 index. It catches up in the longer result set.
Any ideas why that might be? A shared searcher across multiple threads is probably
quite a common use case.
Antony
Hi,
I just noticed that although the Javadocs for Lucene 2.2 state that the dates
for DateTools use UTC as a timezone, they are actually using GMT.
Should either the Javadocs be corrected, or the code changed to use UTC
instead?
Antony
vivek sar wrote:
I need to be able to sort on optime as well, thus need to store it.
Lucene's default sorting does not need the field to be stored, only indexed as
untokenized.
Antony
he original and indexed as lower case into multiple tokens, you will
get the RuntimeException from FieldCache.
Antony
termDocs.close();
termEnum.close();
}
return retArray;
I do allow for a partial cache, in which case, as you suggest, the searcher uses
a FieldSelector to get the external Id from the document, which is then stored
to the cache.
Antony
-Original Message-
From:
minScore = pq.peek().score;
}
else
remaining++;
}
}
HTH
Antony
http://sourceforge.net/forum/message.php?msg_id=3947448
Antony
cument and most are not stored
Antony
Otis
-- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message From: Antony Bowesman <[EMAIL PROTECTED]> To:
java-user@lucene.apache.org Sent: Tuesday, January 8, 2008 12:47:05 AM
Subject: Deleting a single TermPo
dexed.
Is this something others have wanted, or are there other solutions to this
problem?
Thanks
Antony
Looks like I got myself into a twist for nothing - the reader will see a
consistent view, despite what the writer does, as long as the reader remains open.
Apologies for the noise...
Antony
Using Lucene 2.1
Antony Bowesman wrote:
My application batch adds documents to the index using
IndexWriter.addDocument. Another thread handles searchers, creating new
ones as needed, based on a policy. These searchers open a new
IndexReader and there is currently no synchronisation between
following logic
if (!reader.isDeleted(n))
doc = reader.document(n)
can fail with an IllegalArgumentException if the concurrent writer flushes in
between the test and read.
Thanks
Antony
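A defensive variant of that check-then-read, purely as a sketch ('n' is the candidate docId; the catch treats a concurrent flush as a deletion):

    Document doc = null;
    try {
        if (n < reader.maxDoc() && !reader.isDeleted(n)) {
            doc = reader.document(n);
        }
    } catch (IllegalArgumentException e) {
        // the writer flushed between the test and the read; skip this doc
    }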
Thanks Mike, just what I was after.
Antony
Michael McCandless wrote:
You can just create a query with your and'd terms, and then do this:
Weight weight = query.weight(indexSearcher);
IndexReader reader = indexSearcher.getIndexReader();
Scorer scorer = weight.scorer(reader);
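To consume that Scorer, a short loop, assuming the 2.x API (next()/doc()/score()):

    while (scorer.next()) {
        int docId = scorer.doc();
        float score = scorer.score();
        // ... process docId ...
    }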
o with the searcher
TopDocs mechanism and do that also in batches to avoid the risk of a large
memory hit.
I know there's lots of clever 'expert-mode' stuff under the Lucene API hood, but
does anyone know any good way to do this or have
operator is set to AND
Is this a bug? Can someone point me to a bug if it is, or help me
understand so I can explain this behavior.
-Antony Sequeira
Test code output follows:
Testing with default operator set to OR
(fo AND ba OR "fo ba") -> +:fo +:ba
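A self-contained repro of that parse, as a sketch using the 2.x package names; the empty default field reproduces the ":fo" form in the output above:

    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;

    public class OperatorRepro {
        public static void main(String[] args) throws Exception {
            QueryParser qp = new QueryParser("", new WhitespaceAnalyzer());
            qp.setDefaultOperator(QueryParser.OR_OPERATOR);  // or AND_OPERATOR
            Query q = qp.parse("(fo AND ba OR \"fo ba\")");
            System.out.println(q);   // shows how the clauses were combined
        }
    }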
using the docid. Then just check the external Id of the
matched document against the exclusion list.
As long as you have your searcher open, the cache will remain valid.
Antony
Thanks again for your help.
Jay
Sawan Sharma wrote:
Hello Jay,
I am not sure up to what level I understood
l negative.
Writing the above paragraph I am beginning to realize that although my
example shows the problem, it might be a wrong example in terms of me
getting a solution to it :)
Thanks in advance for any feedback and help.
-Antony
It's a bug in 2.1, fixed by Doron Cohen
http://issues.apache.org/jira/browse/LUCENE-813
Antony
dontspamterry wrote:
Hi all,
I was experimenting with queries using wildcard on an untokenized field and
noticed that a query with both a starting and trailing wildcard, e.g. *abc*,
gets pars
evaluating.
Antony
n the document ids from the old reader may
not represent the same documents in the new reader, so the Filter for the old
reader will not be valid for the new search against the new reader and you may
get false matches.
I don't think there will be a problem if there are no deletion
Doron Cohen wrote:
Antony Bowesman <[EMAIL PROTECTED]> wrote on 28/05/2007 22:48:41:
I read the new IndexWriter Javadoc and I'm unclear about this
autocommit. In
2.1, I thought an IndexReader opened in an IndexSearcher does not "see"
additions to an index made by an Ind
makes me wonder if my assumptions are wrong. Can you clarify what it means by
the IndexReader "seeing" changes to the index?
Thanks
Antony
need to
regenerate your array cache.
As Hoss has said, this is pretty much what FieldCache does and it holds the
caches keyed by the IndexReader.
Antony
Daniel Noll wrote:
On Tuesday 15 May 2007 21:59:31 Narednra Singh Panwar wrote:
try using the -Xmx option with your application, and specify maximum/minimum
memory for your application.
It's funny how a lot of people instantly suggest this. What if it isn't
possible? There was a situation a wh
within
one document.
Use the PerFieldAnalyzerWrapper.
http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/PerFieldAnalyzerWrapper.html
It allows different analyzers to be used for different fields.
Antony
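A short sketch of its use, assuming the 2.x/3.x API (newer versions take a Map in the constructor instead of addAnalyzer; the "id" field is illustrative):

    import org.apache.lucene.analysis.KeywordAnalyzer;
    import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    // StandardAnalyzer by default, KeywordAnalyzer for the "id" field
    PerFieldAnalyzerWrapper analyzer =
            new PerFieldAnalyzerWrapper(new StandardAnalyzer());
    analyzer.addAnalyzer("id", new KeywordAnalyzer());
    // hand the same wrapper to IndexWriter and QueryParser so both agree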
the Analyzer for
QueryParser. Alternatively, override QueryParser's getFieldQuery() and then
choose your Analyzer there based on the field being searched.
Antony
Ryan O'Hara wrote:
Hey Erick,
Thanks for the quick response. I need a truly exact match. What I
ended up doing w
for ints. It converts numbers to a
3 char Unicode representation which is sortable and therefore range searchable.
Antony
:
digester.addObjectCreate("benchmark/benchmarker", "class",
StandardBenchmarker.class); <==
Maybe I'm missing something, but isn't the 3rd param to addObjectCreate just a
default and the real class is defined by the "class"
weekend with no virus checker in the DB
directory and haven't managed to reproduce the problem.
Thanks for the help Mike. Nothing like an exception never seen before, two days
before the product is due to go live, to induce mild panic ;)
Antony
I'll
re-run the test a few more times and see if I can re-create the problem.
Thanks for the rapid response Mike
Antony
popped up at some point, so my suspicions are
that it is the cause. I am running the test again, but can any of the gurus
give any ideas what can cause this.
It did have to happen the day after my deadline :(
Antony
to the original object, so I'm using == to locate it. I've
not used equals() as I've not yet worked out whether that will cause me any
problems with hashing.
Antony
Peter
On 3/29/07, Antony Bowesman <[EMAIL PROTECTED]> wrote:
I've got a similar duplicate ca
achieve 'last wins', as you must presumably remove the first one from
the PQ?
Antony
Peter Keegan wrote:
The duplicate check would just be on the doc ID. I'm using TreeSet to detect
duplicates with no noticeable effect on performance. The PQ only has to be
checked for a previous value I
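A rough sketch of such a collector, assuming the 2.x HitCollector API. This version is 'first wins'; the 'last wins' variant discussed above would additionally need to remove the earlier entry from the queue:

    import java.util.Comparator;
    import java.util.PriorityQueue;
    import java.util.TreeSet;
    import org.apache.lucene.search.HitCollector;

    public class DedupTopN extends HitCollector {
        private final int n;
        private final TreeSet<Integer> seen = new TreeSet<Integer>();
        // head of the queue is the current minimum score
        private final PriorityQueue<float[]> pq;

        public DedupTopN(int n) {
            this.n = n;
            this.pq = new PriorityQueue<float[]>(n, new Comparator<float[]>() {
                public int compare(float[] a, float[] b) {
                    return Float.compare(a[1], b[1]);
                }
            });
        }

        public void collect(int doc, float score) {
            if (!seen.add(doc)) {
                return;                      // duplicate doc ID: skip it
            }
            // doc is stored as a float purely for brevity in this sketch
            pq.offer(new float[] { doc, score });
            if (pq.size() > n) {
                pq.poll();                   // evict the lowest-scoring entry
            }
        }
    }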