Hi there,
Is Java 7 now safe to use with Lucene? If so, is there a minimum Lucene version
I must use with it?
Thanks,
- Chris
Hi
Can anyone tell me what happens to the memory when Lucene opens an index? Is
it loaded into the JVM's heap or is it mapped into virtual memory outside of it?
I am running on Linux and if I use pmap on the PID of my JVM, I can see lots of
entries for index cfs files.
Does this mean that inde
t the O/S cache.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Chris Bamford [mailto:chris.bamf...@talktalk.net]
> Sent: Tuesday, May 15, 2012 4:47 PM
> To: java-user@lucene.apache.org
ory
as well.
John
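For reference, a minimal sketch of what the (truncated) answer above is describing: with MMapDirectory, Lucene maps the index files into virtual memory managed by the OS page cache, outside the JVM heap, which is exactly why pmap shows entries for the .cfs files. This assumes a 3.x-era API and a hypothetical index path:

```java
import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;

// Sketch only: the mapped pages live in OS-managed virtual memory (the
// page cache), not the JVM heap, and are reclaimable by the OS.
Directory dir = new MMapDirectory(new File("/path/to/index")); // hypothetical path
IndexReader reader = IndexReader.open(dir, true);              // read-only
System.out.println("maxDoc=" + reader.maxDoc());
reader.close();
dir.close();
```

The memory pmap reports for these mappings is shared with the OS cache, so it does not count against heap sizing.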
On 5/15/12 3:38 PM, "Chris Bamford" wrote:
>Thanks Uwe.
>
>What I'd like to understand is the implications of this on a server which
>opens a large number of indexes over a long period. Will this non-heap
>memory continue to grow? Will gc be
or apps that are sensitive (from a user
>experience) from hanging during GC time.
>
>See http://docs.oracle.com/javase/6/docs/technotes/guides/vm/cms-6.html
>
>Best Regards
>
>Lutz
>
>-Original Message-
>From: Chris Bamford [mailto:chris.bamf...@
This is a progress update on the issue:
I have tried several things and they all gave improvements. In order of
magnitude they are
1) Reduced heap space from 6GB to 3GB.
This on its own has so far been the biggest win, as swapping almost completely
stopped after this step.
2) Began limiting t
Hi
Can anyone explain to me how to use ToParentBlockJoinCollector in Lucene 4.0.0?
I can successfully query with a ToParentBlockJoinQuery, but the results come
back are not grouped by parent doc. I believe that ToParentBlockJoinCollector
is the way to go, but I cannot make it work.
Is there s
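For the archives, the usual Lucene 4.0 pattern is roughly the following sketch (the parentsFilter must identify parent documents, e.g. via a marker field like type:parent, and should be cached; field names here are assumptions):

```java
// Sketch (Lucene 4.0 join module): group child hits under their parents.
Filter parentsFilter = new CachingWrapperFilter(
    new QueryWrapperFilter(new TermQuery(new Term("type", "parent"))));
ToParentBlockJoinQuery joinQuery =
    new ToParentBlockJoinQuery(childQuery, parentsFilter, ScoreMode.Avg);
ToParentBlockJoinCollector collector =
    new ToParentBlockJoinCollector(Sort.RELEVANCE, 10, true, false);
searcher.search(joinQuery, collector);
// One group per parent document, with its matching children nested inside:
TopGroups<Integer> groups = collector.getTopGroups(joinQuery, null, 0, 10, 0, true);
```

Searching with the query alone returns flat parent hits; it is getTopGroups() on the collector that produces the per-parent grouping.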
Hi Mike,
I have a question about your post "Searching relational content with Lucene's
BlockJoinQuery"
(http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html).
I am
actually trying to use Lucene 4.0.0, so am having to translate your example to
the newer
ToParentBlockJo
Hi Mike,
> Could you please send this to the java-user@lucene.apache.org list?
I thought I did! :-) Here it is again:
I have a question about your post "Searching relational content with Lucene's
BlockJoinQuery"
(http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html).
To: java-user@lucene.apache.org
Sent: Tue, 12 Feb 2013 15:17
Subject: Re: More questions on BlockJoinQuery
On Tue, Feb 12, 2013 at 7:43 AM, Chris Bamford
wrote:
>> Could you please send this to the java-user@lucene.apache.org list?
>
> I thought I did! :-) Here it is again:
issues, see dev-tools/maven/README.maven.
Steve
On Feb 20, 2013, at 10:48 AM, Chris Bamford wrote:
>
> Thanks Mike.
> I have downloaded the source tarball for 4.1.0 and have tried to get it
working, but am having a few problems getting it to fit with my environment
(IntelliJ /
svn checkout
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_1_0
Steve
You can get the required files by downloading
On Feb 20, 2013, at 11:45 AM, Chris Bamford wrote:
>
> Thanks Steve, sounds very useful. These are my steps:
>
> tar xzvf ~/Downloads/lucen
was File > Open Project), navigate in the directory navigation dialog that
comes
up to the *directory* containing Lucene and Solr (*not* a project file).
>
> I see on the wiki page this could be clearer - I'll try a reword there.
>
> Steve
>
> On Feb 28, 2013, at 5:
t: Re: Loading lucene_solr_4_1_0 into IntelliJ
Hi Chris,
Those steps sound correct to me.
On Mar 5, 2013, at 9:58 AM, Chris Bamford wrote:
> Thanks for all your help here. I just tried it all again and this time I get
"Cannot Open Project /Users/cbamford/projects/lucene_solr_4_1_0 contains
Hi,
If I index several similar values in a multivalued field (e.g. many authors to
one book), is there any way to know which of these matched during a query?
e.g.
Book "The art of Stuff", with authors "Bob Thingummy" and "Belinda Bootstrap"
If we queried for +(author:Be*) and matched this doc
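One workable technique (an assumption on my part, not something from this thread): re-test each stored value against the query with a single-document MemoryIndex from the lucene-memory module, which scores one field value at a time:

```java
// Sketch: find which author value(s) the query actually matched.
for (String author : doc.getValues("author")) {
    MemoryIndex mi = new MemoryIndex();
    mi.addField("author", author, analyzer);
    if (mi.search(query) > 0.0f) {
        System.out.println("matched value: " + author);
    }
}
```

For the example above, only "Belinda Bootstrap" would score non-zero against +(author:Be*). This costs one tiny in-memory analysis pass per value, so it is only sensible for short multivalued fields.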
Hi
I am trying to speed up access to the data in my results Documents and
was wondering if FieldSelector might be the way forward? After my
search, I end up with an ArrayList of Documents, from each of which I
need to extract certain fields and their values as key/value pairs in a
HashMap.
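A sketch of the FieldSelector approach being asked about (2.x/3.x API; field names are placeholders): MapFieldSelector loads only the listed fields at read time instead of materialising the whole document.

```java
// Sketch: skip deserialising unwanted stored fields.
FieldSelector selector = new MapFieldSelector(new String[] { "id", "title" });
Document d = reader.document(scoreDoc.doc, selector);
Map<String, String> kv = new HashMap<String, String>();
kv.put("id", d.get("id"));
kv.put("title", d.get("title"));
```

The win is largest when documents carry big stored fields (e.g. body text) that the results page never needs.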
, Chris Bamford wrote:
> Hi
>
> I am trying to speed up access to the data in my results Documents
> and was wondering if FieldSelector might be the way forward?
> After my search, I end up with an ArrayList of Documents, from each
> of which I need to extract certain field
Hi,
I recently discovered that I need to add a single field to every document in an
existing (very large) index. Reindexing from scratch is not an option I want
to consider right now, so I wrote a utility to add the field by rewriting the
index - but this seemed to lose some of the fields (in
issing data
via TermFreqVector but that has always sounded dodgy and lossy to me.
The safest way is to reindex, however painful it might be. Maybe you
could take the opportunity to upgrade lucene at the same time!
--
Ian.
On Fri, Apr 8, 2011 at 3:44 PM, Chris Bamford
wrote:
> Hi,
>
> I recentl
Hi,
I need to load a huge amount of TermPositions in a short space of time
(millions of Documents, sub-second).
Does the IndexReader's API support multiple accesses to allow several
parallel threads to consume a chunk each?
Thanks for any ideas / pointers.
- Chris
Hi,
I have been experimenting with using a int payload as a unique identifier, one
per Document. I have successfully loaded them in using the TermPositions API
with something like:
public static void loadPayloadIntArray(IndexReader reader, Term term, int[]
intArray, int from, int to) thro
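For completeness, a sketch of the read loop behind a helper like that (3.x-era TermPositions API; assumes the payload was written as a 4-byte int, e.g. via PayloadHelper.encodeInt):

```java
TermPositions tp = reader.termPositions(term);
try {
    byte[] buf = new byte[4];
    while (tp.next()) {            // one entry per matching document
        tp.nextPosition();         // payloads are per-position: advance first
        if (tp.isPayloadAvailable()) {
            tp.getPayload(buf, 0);
            int id = PayloadHelper.decodeInt(buf, 0);  // the int payload
            // record id against tp.doc() ...
        }
    }
} finally {
    tp.close();
}
```

The common mistake is calling getPayload() before nextPosition(); the payload belongs to a position, not to the document entry.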
Hi there,
Is there something special I should be doing here? This is my sequence:
open writer
add doc #1
add doc #2
get reader from writer
do a search on reader - matches doc #1
delete doc #1 from writer
commit writer
add doc #3
optimise writer
close writer
So by my reckoning, my index should
docs when using NRT?
That should have worked.
There's nothing special about deleting docs when using NRT reader.
Can you boil it down to a test case?
Mike
http://blog.mikemccandless.com
On Fri, May 20, 2011 at 11:30 AM, Chris Bamford
wrote:
> Hi there,
>
> Is there something sp
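The sequence in question, sketched with the 3.x NRT API (names approximate): each getReader() call returns a point-in-time view, so a delete only becomes visible in a reader obtained after the delete.

```java
IndexWriter w = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
w.addDocument(doc1);
w.addDocument(doc2);

IndexReader r1 = w.getReader();              // NRT view: sees doc1 and doc2
w.deleteDocuments(new Term("id", "1"));      // delete via the writer
IndexReader r2 = w.getReader();              // fresh NRT view: doc1 gone
// r1 still sees doc1, because readers are point-in-time snapshots.
r1.close();
r2.close();
w.close();
```

If the delete appears not to stick, the usual culprit is searching an old reader instead of one obtained after deleteDocuments().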
Hi
I was wondering how to improve search performance over a set of indexes like
this:
27G  K1-1/index
19G  K1-2/index
24G  K1-3/index
15G  K1-4/index
19G  K1-5/index
31G  K2-1/index
16G  K2-2/index
8.1G K2-3/index
12G  K2-4/index
15G  K2-5/index
In total it is
Hi there,
I have my own Collector implementation which I use for searching, something
like this skeleton:
public class LightweightHitCollector extends Collector {
private int maxHits;
private int numHits;
private int docBase;
private boolean collecting;
private Scorer scorer
ell wrote:
> 2011/7/20 Chris Bamford :
>> Hi there,
>>
>> I have my own Collector implementation which I use for searching, something
like this skeleton:
>
> [snip]
>
>> Question: is there a way to prevent collect() being called after it has
collected its qu
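On the quota question: Lucene keeps calling collect() for every hit, and the standard idiom for bailing out early (the same one TimeLimitingCollector uses with its TimeExceededException) is to throw an unchecked exception from collect() and catch it at the call site. A sketch against the skeleton above:

```java
// Hypothetical marker exception for early termination.
class QuotaReachedException extends RuntimeException {}

// inside LightweightHitCollector:
public void collect(int doc) {
    if (numHits >= maxHits) {
        throw new QuotaReachedException();
    }
    numHits++;
    // record docBase + doc as usual ...
}

// at the call site:
try {
    searcher.search(query, collector);
} catch (QuotaReachedException expected) {
    // quota reached; the hits collected so far remain valid
}
```

This is a sketch, not an official API: there is no supported "stop collecting" return value in this Collector generation, so the exception is the accepted workaround.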
Hi
I think I must be doing something wrong, but not sure what.
I have some long running indexing code which sometimes needs to be shutdown in
a hurry. To achieve this, I set a shutdown flag which causes it to break from
the loop and call first abort() and then close(). The problem is that w
Hi,
Does anyone know a way of identifying which entry in a multi-value field
actually matches during a search?
e.g. in this example:
String[] entryList = { "entry one", "entry two", "entry three", "entry four"
};
Document doc = new Document();
for (String entry : Arrays.asList(entryList))
ou
could re-analyze to figure out which field index is your hit. I have some
helper code that I built as part of LUCENE-5317 that might be of use, but I
haven't submitted that updated patch yet.
Stay tuned, though, for the demise of SpanQueries...
Best,
Tim
_
ilter >> " + matchTag + "
matched " + i + " [" + strval + "]");
}
}
}
}
return new DVDocSetId(bitSet); // just wraps a FixedBitSet
}
}
Chris Bamford
Senior Developer
m: +44 7860 405292
p: +44 207 847 8700
w: www.mimecast.com
Address click here: www.mimecast.com/About-us/Contact-us/
nedFilter in 4.x) alongside your
>> BooleanQuery? Seems more logical and I suspect would solve the problem.
>> Caching filters can be good too, depending on how often your data changes.
>> See CachingWrapperFilter.
>>
>> --
>> Ian.
>>
>>
>> On
Additional -
I'm on lucene 4.10.2
If I use a BooleanFilter as per Ian's suggestion I still get a null acceptDocs
being passed to my NDV filter.
Sent from my iPhone
> On 11 Mar 2015, at 17:19, Chris Bamford wrote:
>
> Hi Shai
>
> I thought that might be what acce
Hi Uwe
Thanks for the suggestion, I tried to use a BooleanQuery with clause1 =
termquery and clause2 = ConstantScoreQuery(MyNDVFilter), joined by SHOULD. I
also applied the term filter at the top level (as before). Unfortunately it
doesn't work in that the MyNDVFilter still receives null accep
Hi
I recently heard about an alternative (API?) to Luke for examining indexes.
Can someone please point me to it?
Thanks
Chris
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail:
axsearch/marple <https://github.com/flaxsearch/marple>.
>> Very much a project in development, but more testers and contributors are
>> always welcome!
>>
>> Alan Woodward
>> www.flax.co.uk
>>
>>
>>> On 10 Nov 2016, at 13:23, Chri
Hello
I am in the process of moving from indexing with 3.6.0 to 4.10.3 (albeit in
3.6.0 compatibility mode). Examination of the resulting indexes with Luke shows
that text fields now contain null markers where stop words have been removed
whereas the previous indexes had nothing:
Indexed phrase
Hello
I have observed that sometimes my index size temporarily increases by a large
amount, presumably while it merges segments.
Is there some documentation on this subject? I am trying to estimate total disk
space I'll need for a project.
Thanks
Chris
-
he index size.
>
> Uwe
>
> -
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>> -Original Message-
>> From: Chris Bamford [mailto:ch...@bammers.net]
>> Sent: Friday, March 3, 2017 7:24 P
Thanks Mike, looking forward to it! Great work folks.
Chris
Sent from my iPhone
> On 15 Mar 2017, at 21:19, Adrien Grand wrote:
>
> Excellent!
>
> Le mer. 15 mars 2017 à 15:46, Michael McCandless
> a écrit :
>
>> Hi all,
>>
>> I just posted a blog post describing the changes coming in our
Hello Adrien,
>
> There is no way to compute the byte size of a document.
I feared that!
> Also note that the
> relationship between the size of a document and how much space it will use
> in the Lucene index is quite complex.
>
I understand. I was wondering if there was maybe some sneaky way
> IndexWriter.ramBytesUsed() gives you access to the current memory usage of
> IndexWriter's buffers, but it can't tell you by how much it increased for a
> given document assuming concurrent access to the IndexWriter.
>
Thanks, although I can’t find that API. Is there an equivalent call for Lucen
Hi Erick
Yes, size on disk is what I’m after as it will feed into an eventual
calculation regarding actual bytes written (not interested in the source data
document size, just real disk usage).
Thanks
Chris
Sent from my iPhone
> On 4 Jul 2018, at 17:08, Erick Erickson wrote:
>
> But does s
into the index from that document. Not to even
> mention that you could, for instance, choose to index only the title
> and throw everything else away so the size of the raw document on disk
> doesn't seem useful for your case.
>
> Best,
> Erick
>
>> On Wed, Jul 4, 2
Can you combine these two queries somehow so that they behave like a
PhraseQuery?
I have a custom query parser which takes a phrase like "*at sat" and
produces a BooleanQuery consisting of a WildcardQuery ('*at') and a
TermQuery ('sat'). This works, but matches more widely than expected
(by
nstag, 26. August 2008, Chris Bamford wrote:
Can you combine these two queries somehow so that they behave like a
PhraseQuery?
You can use MultiPhraseQuery, see
http://lucene.apache.org/java/2_3_2/api/core/org/apache/lucene/search/MultiPhraseQuery.html
Regards
D
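A sketch of how MultiPhraseQuery fits the "*at sat" case (2.x/3.x TermEnum API; the leading-wildcard expansion here is deliberately naive and assumes the term dictionary is small enough to enumerate):

```java
MultiPhraseQuery mpq = new MultiPhraseQuery();
List<Term> expansions = new ArrayList<Term>();
TermEnum te = reader.terms(new Term("body", ""));
try {
    do {
        Term t = te.term();
        if (t == null || !"body".equals(t.field())) break;
        if (t.text().endsWith("at")) {     // naive "*at" expansion
            expansions.add(t);
        }
    } while (te.next());
} finally {
    te.close();
}
mpq.add(expansions.toArray(new Term[expansions.size()])); // position 0: any of these
mpq.add(new Term("body", "sat"));                         // position 1: literal term
```

Unlike the BooleanQuery of WildcardQuery + TermQuery, this keeps the adjacency requirement, so "cat sat" matches but "cat ... sat" in separate positions does not.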
Hi
I have not been following Lucene developments for a while ... can
someone please tell me if you can now modify documents in an index in
place - or does it still use the "delete old" / "add new" model?
Thanks,
- Chris
--
Chris Bamford
Senior Development Engineer
*Scalix*
[EMAIL PROTECTED]
Tel
Hi
Can anyone tell me what the latest stable release is?
http://lucene.apache.org/java/docs/index.html doesn't say.
Thanks,
- Chris
Thanks Ian.
Is that the convention - the top of the list on
http://lucene.apache.org/java/docs/index.html is always the latest
stable release - or do you know that by some other means?
Cheers,
- Chris
Ian Lea wrote:
2.4.0.
--
Ian.
On Fri, Nov 28, 2008 at 11:16 AM, Chris Bamford
Hi
Can someone guide me please?
I have inherited a Lucene application and am attempting to update the
API from 2.0 to 2.4.
I note that the 2.4 CHANGELOG talks of opening an IndexReader with
read-only=true to improve performance. Does anyone know how to do this?
I have been combing my predeces
ones -
FilterIndexReader, InstantiatedIndexReader, MultiReader, ParallelReader
- all seem too complicated for what I need. My only requirement is to
open it read-only!
Am I missing something?
Mark Miller wrote:
Chris Bamford wrote:
So does that mean if you don't explicitly op
der();
Put another way, how do I associate the static IndexReader with an
IndexSearcher object so I can use getIndexReader() to get it again?
Thanks for your continued help with this :-)
Chris
Mark Miller wrote:
Look for the static factory methods on IndexReader.
- Mark
Chris Bamford wr
Thanks Mark, worked a treat.
Mark Miller wrote:
Chris Bamford wrote:
Mark
> Look for the static factory methods on IndexReader.
I take it you mean IndexReader.open(dir, true) ?
Yeah.
If so, how do I then pass that into DelayCloseIndexSearcher() so that
I can continue to rely on all
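To summarise the resolution of this thread as a sketch (2.4-era API; dir is a placeholder Directory):

```java
// Open the index read-only via the static factory method.
IndexReader reader = IndexReader.open(dir, true);
// Build the searcher on top of it; getIndexReader() hands the same instance back.
IndexSearcher searcher = new IndexSearcher(reader);
assert searcher.getIndexReader() == reader;
```

The boolean flag is the read-only switch mentioned in the 2.4 CHANGELOG; a read-only reader skips deletion bookkeeping and is cheaper to share between threads.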
Hi
I have a situation where I have two related indexes which are logically
linked by a common field called INDEXID. All other fields differ between
the two indexes. For any given INDEXID I would like to be able to
retrieve the matching pair of documents, one from each index. (Logically
this i
keep them
separate. If they total 100G, that's another story. Some more
details would be helpful.
5> I almost guarantee that if you've merely translated database tables
into Lucene indexes on a one-for-one basis, you won't be very
satisfied with the result
sDoc :collector.topDocs().scoreDocs) {
result.add(contentSearcher.doc(sDoc.doc));
}
And use result.
On Wed, Dec 17, 2008 at 13:36, Chris Bamford wrote:
Hi
In a search I am doing, there may be thousands of hits, of which I only want
the 10 with the highest score. Will the following code do this for m
Hi
In a search I am doing, there may be thousands of hits, of which I only
want the 10 with the highest score. Will the following code do this for
me, or will it simply return the first 10 it finds?
TopDocCollector collector = new TopDocCollector(10);
contentSearcher.search(q, collector);
If
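For what it's worth, TopDocCollector maintains an internal priority queue, so the code above returns the 10 highest-scoring hits across the whole result set, not the first 10 encountered. A sketch of the complete retrieval (2.x-era API, names as in the snippet):

```java
TopDocCollector collector = new TopDocCollector(10);
contentSearcher.search(q, collector);
// scoreDocs is sorted by descending score; these are the global top 10.
for (ScoreDoc sd : collector.topDocs().scoreDocs) {
    Document d = contentSearcher.doc(sd.doc);
    // use d ...
}
```

collector.getTotalHits() still reports the full match count, so you can show "10 of N" style results cheaply.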
left open and no matter how
high ulimit is set, you will run out? Or is there a policy of recycling that
we are failing to utilise properly?
I am happy to provide more information, just don't know what at this point!
Please ask
Thanks in advance
- Chris
Chris Bamford
S
ptors
This is not normal. As long as you are certain you close every
IndexReader/Searcher that you opened, the number of file descriptors
should stay "contained".
Though: how many files are there in your index directory?
Mike
On Wed, Aug 26, 2009 at 9:18 AM, Chris Bamford wrote:
&g
Hi,
Since moving our app to Java 6 and Tomcat 6, we have started getting occasional
exceptions of the form:
java.io.IOException: Stream closed
at sun.nio.cs.StreamDecoder.ensureOpen(Unknown Source)
at sun.nio.cs.StreamDecoder.read(Unknown Source)
at java.io.InputStreamReader.read(Un
n to
the underlying Reader you are using? Can you share that little bit of
indexing code?
On Aug 27, 2009, at 10:11 AM, Chris Bamford wrote:
> Hi,
>
> Since moving our app to Java 6 and Tomcat 6, we have started getting
> occasional exceptions of the form:
>
> java.io.IOExce
quot;IOException on doc: " +
doc.toString() +
" - " + ex.toString());
}
Thanks,
- Chris
Chris Bamford
Senior Development Engineer
Scalix
chris.bamf...@scalix.com
Tel: +44 (0)1344 381814
www.scalix.com
- Original Message -
From: Grant Ingersoll
Sent: Sat, 29/8/2009
ders and NOT open/close for every request.
FWIW
Erick
On Thu, Aug 27, 2009 at 9:10 AM, Chris Bamford wrote:
> I'm glad its not normal. That means we can fix it! I will conduct a
> review of IndexReader/Searcher open/close ops.
>
> Thanks!
>
> Chris
>
> - Original Me
null) {
currentSearcher.getIndexReader().setUseCompoundFile(true);
}
However, the setUseCompoundFile() is not available :-(
Thanks again,
- Chris
- Original Message -
From: Michael McCan
eady using compound file format. If
you look in your index directory and see only *.cfs (plus segments_N
and segments.gen) then you are using compound file format.
Mike
On Tue, Sep 1, 2009 at 8:20 AM, Chris Bamford wrote:
> Hi Mike,
>
> Thanks for the suggestions, very useful. I wo
t;CorruptIndexException on doc: " + doc.toString(),
ex);
Daniel Shane
Chris Bamford wrote:
> Hi Grant,
>
>
>>> I think you code there needs to show the underlying exception, too, so
>>> we can see that stack trace.
>>>
On Sep 2, 2009, at 7:45 AM, Chris Bamford wrote:
> Hi Grant,
>
> I have now followed Daniel's advice and catch the exception with:
>
>try {
>indexWriter.addDocument(doc);
What does your Document/Field creation code look like? In other
words, how do you const
Thanks for your input Mark and Chris. I will take all into account
Chris
- Original Message -
From: Mark Miller
Sent: Tue, 8/9/2009 8:06pm
To: java-user@lucene.apache.org
Subject: Re: New "Stream closed" exception with Java 6
Chris Hostetter wrote:
> : I'm coming to the same conclusio
Hi Hoss,
I have been thinking more about what you said (below) - could you please expand
on the indented part of this sentence:
"it's possibly you just have a simple bug where you are closing the reader
before you pass it to Lucene,
or maybe you are mistakenly adding the same field twi
Mark
It appears you are right - it *IS* something tricky. My code is single
threaded, so there is no contention. I still get intermittent "Stream Close"
exceptions (about 1 in every 800 indexWriter.addDocument() calls) which I
cannot explain. By moving code around / recompiling, I have manag
Hoss,
It turns out that the cause of the exceptions is in fact adding an item twice -
so you were correct right at the start :-) I ran a test where I attempt to
insert the same item twice and guess what ... I get a "Stream closed"
exception on the 2nd attempt.
Understanding this is a great r
Hoss,
> not really ... adding a document multiple times is a perfectly legal use
> case, adding a document with a "Reader" based field where the reader is
> already closed ... that's not legal (And Lucene doesn't really have any
> way of knowing if the Reader is closed because *it* closed it.
>
Understood. Thanks Hoss.
- Chris
- Original Message -
From: Chris Hostetter
Sent: Fri, 18/9/2009 5:58pm
To: java-user@lucene.apache.org
Subject: RE: New "Stream closed" exception with Java 6 - solved
: > not really ... adding a document multiple times is a perfectly legal use
: > ca
Hi,
In an attempt to balance searching efficiency against the number of open file
descriptors on my system, I cache IndexSearchers with a "last used" timestamp.
A background cache manager thread then periodically checks the cache for any
that haven't been used in a while and removes them from
Hi,
I was researching LockObtainFailedExceptions and came across this thread.
I
don't use Solr, just regular Lucene deployed via Tomcat - but I have
started getting these exceptions which coincides with our recent
upgrade from 2.0.0 to 2.4.0.
I have found that just removing the lock file
seems to
ukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw
On Mon, Nov 2, 2009 at 3:27 PM, Chris Bamford wrote:
> Hi,
>
> I was researching LockObtainFailedExceptions and came across this threa
Hi,
Can someone point me in the right direction please?
How can I trap this situation correctly? I receive user queries like
this (quotes included):
/from:"fred flintston*"/
Which produces a query string of
/+from:fred body:flintston/ (where /body/ is the default field)
What I
oleanQuery API
docs).
Use the toString() method on the BooleanQuery after it's created to make
sure you did it correctly.
John G.
-Original Message-----
From: Chris Bamford [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 03, 2008 7:39 AM
To: java-user@lucene.apache.org
Subject:
Hi John,
Just continuing from an earlier question where I asked you how to handle
strings like "from:fred flintston*" (sorry I have lost the original email).
You advised me to write my own BooleanQuery and add to it Prefix- /
Term- / Phrase- Querys as appropriate. I have done so, but am having
you explain why? I would expect the first
test to deliver 2 hits.
I have tried with Lucene 2.0 and 2.3.2 jars and both fail.
Thanks again,
- Chris
Chris Bamford wrote:
Hi John,
Just continuing from an earlier question where I asked you how to
handle strings like "from:fred flin
Hi John,
Please ignore my earlier questions on this subject, as I have got to the
bottom of it.
I was not passing each word in the phrase as a separate Term to the
query; instead I was passing the whole string (doh!).
Thanks.
- Chris
Chris Bamford wrote:
Hi John,
Further to my question
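A sketch of the parsing approach this thread converged on, for input like from:"fred flintston*" (2.x-era API; field name and clause logic are assumptions): each word becomes its own clause, with the trailing wildcard mapped to a PrefixQuery.

```java
// Build the query by hand rather than feeding the quoted string to QueryParser.
BooleanQuery bq = new BooleanQuery();
bq.add(new TermQuery(new Term("from", "fred")), BooleanClause.Occur.MUST);
bq.add(new PrefixQuery(new Term("from", "flintston")), BooleanClause.Occur.MUST);
// Note: this loses the adjacency a real phrase implies; MultiPhraseQuery
// is the route if phrase semantics with an expanded last term are needed.
```

As noted above, the key point is passing each word as a separate Term or clause rather than the whole quoted string.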
and 2.3 jars.
Please advise!
Thanks,
-Chris
BTW thanks for the tip about Luke
John Griffin wrote:
Chris,
-Original Message-
From: Chris Bamford [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 10, 2008 9:15 AM
To: java-user@lucene.apache.org
Subject: Re: newbie question (for John
hen it sees input of this sort.)
The way that it worked for you - adding terms one at a time, with no quotes and
no spaces - is the correct usage pattern.
Steve
On 07/15/2008 at 8:20 AM, Chris Bamford wrote:
Hi John
Thanks for your continued interest in my travails!
==I'm not sure
Hi.
I am using the SnowballAnalyzer because of its multi-language stemming
capabilities - and am very happy with that.
There is one small glitch which I'm hoping to overcome - can I get it to
split up internet domain names in the same way that StopAnalyzer does?
i.e. for the sentence "This is
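One possible approach (a sketch against the 2.x contrib API, untested): reuse the LetterTokenizer that StopAnalyzer is built on, which splits on every non-letter (so "john.doe.org" becomes john / doe / org), and feed the result into a SnowballFilter to keep the stemming:

```java
import java.io.Reader;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.snowball.SnowballFilter;

// Hypothetical analyzer: StopAnalyzer-style tokenisation + Snowball stemming.
class DomainSplittingSnowballAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream ts = new LetterTokenizer(reader);  // splits at '.', '@', digits, etc.
        ts = new LowerCaseFilter(ts);
        ts = new StopFilter(ts, StopAnalyzer.ENGLISH_STOP_WORDS);
        return new SnowballFilter(ts, "English");
    }
}
```

The trade-off is that LetterTokenizer also splits tokens like "3.6.0" or "foo_bar" that StandardTokenizer would keep whole, so this is only right if StopAnalyzer's splitting is genuinely what is wanted everywhere.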