Cloud
* Hadoop integration
Thanks,
Jason Rutherglen, Jack Krupansky, and Ryan Tabora
http://shop.oreilly.com/product/0636920028765.do
t. Is that right?
>
> What about the ByteBufferDirectory? Can this specific directory utilize the
> 2GB memory I grant to the app?
>
> On Mon, Jun 4, 2012 at 10:58 PM, Jason Rutherglen <
> jason.rutherg...@gmail.com> wrote:
If you want the index to be stored completely in RAM, there is the
ByteBuffer directory [1]. Though I do not see the point in putting an
index in RAM: it will be cached in RAM regardless by the OS's IO
cache.
1.
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/ap
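For reference, loading an existing on-disk index into Lucene's stock
RAMDirectory (not the ES ByteBufferDirectory) looks roughly like this;
a sketch from the 4.x-era API, path hypothetical:

  import java.io.File;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.store.IOContext;
  import org.apache.lucene.store.RAMDirectory;

  // Copies the on-disk index into a heap-resident directory; note this
  // allocates on the Java heap, so the 2GB -Xmx question applies here too
  Directory onDisk = FSDirectory.open(new File("/path/to/index"));
  Directory inRam = new RAMDirectory(onDisk, IOContext.READ);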
> SUM, stats would do it.
>
> Erick
>
> On Thu, Jan 5, 2012 at 7:23 PM, Jason Rutherglen
> wrote:
>>> Short answer is that no, there isn't an aggregate
>>> function. And you shouldn't even try
>>
>> If that is the case why does a 'stats' component exist for Solr with
>> the SUM function built in?
> Short answer is that no, there isn't an aggregate
> function. And you shouldn't even try
If that is the case why does a 'stats' component exist for Solr with
the SUM function built in?
http://wiki.apache.org/solr/StatsComponent
On Thu, Jan 5, 2012 at 1:37 PM, Erick Erickson wrote:
> You will
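For the archives, a StatsComponent request is just a couple of
parameters (field name hypothetical):

  http://localhost:8983/solr/select?q=*:*&stats=true&stats.field=price

The response includes sum, min, max, count, missing, sumOfSquares,
mean, and stddev for the field.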
Even though the NumericRangeQuery.new* methods do not support
BigInteger, the underlying recursive algorithm supports any sized
number.
Has this been explored?
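For context, the existing factory methods top out at long; e.g. (a
sketch, field and bounds hypothetical):

  import org.apache.lucene.search.NumericRangeQuery;
  import org.apache.lucene.search.Query;

  // precisionStep 4: smaller steps index more terms but speed up ranges
  Query q = NumericRangeQuery.newLongRange("price", 4, 10L, 100L, true, true);

A newBigIntegerRange would presumably need its own trie encoding on top
of the same recursive range-splitting algorithm.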
The docs are slim on examples.
On Wed, Nov 16, 2011 at 3:35 PM, Peter Karich wrote:
>
>>> even high complexity as ES supports lucene-like query nesting via JSON
>> That sounds interesting. Where is it described in the ES docs? Thanks.
>
> "Think of the Query DSL as an AST of queries"
> http://w
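To make the nesting concrete, a sketch against the ES Java client
(field names hypothetical; method names have shifted across ES
versions):

  import org.elasticsearch.index.query.QueryBuilder;
  import org.elasticsearch.index.query.QueryBuilders;

  // A bool query nesting another bool query, mirroring a Lucene
  // BooleanQuery tree; the JSON form composes the same way
  QueryBuilder q = QueryBuilders.boolQuery()
      .must(QueryBuilders.matchQuery("title", "lucene"))
      .must(QueryBuilders.boolQuery()
          .should(QueryBuilders.termQuery("tag", "nrt"))
          .should(QueryBuilders.termQuery("tag", "realtime")));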
> even high complexity as ES supports lucene-like query nesting via JSON
That sounds interesting. Where is it described in the ES docs? Thanks.
On Wed, Nov 16, 2011 at 1:36 PM, Peter Karich wrote:
> Hi,
>
> it's not really fair to compare NRT of Solr to ElasticSearch.
> ElasticSearch provides
> deletions made by readers merely mark it for
> deletion, and once a doc has been marked for deletion it is deleted for all
> intents and purposes, right?
There's the point-in-timeness of a reader to consider.
> Does the N in NRT represent only the cost of reopening a searcher?
Aptly put, and
> I don't think we'd do the post-filtering solution, but instead maybe
> resolve the deletes "live" and store them in a transactional data
I think Michael B. aptly described the sequence ID approach for 'live' deletes?
On Mon, Jun 13, 2011 at 3:00 PM, Michael McCandless
wrote:
> Yes, adding dele
Is http://code.google.com/a/apache-extras.org/p/luceneutil/ designed
to replace or augment the contrib benchmark? For example it looks
like SearchPerfTest would be useful for executing queries over a
pre-built index. Though there's no indexing tool in the code tree?
I think Solr has a HashDocSet implementation?
On Tue, Apr 5, 2011 at 3:19 AM, Michael McCandless
wrote:
> Can we simply factor out (poach!) those useful-sounding classes from
> Nutch into Lucene?
>
> Mike
>
> http://blog.mikemccandless.com
>
> On Tue, Apr 5, 2011 at 2:24 AM, Antony Bowesman
> wrote:
I'm seeing an error when using the misc Append codec.
java.lang.AssertionError
        at org.apache.lucene.store.ByteArrayDataInput.readBytes(ByteArrayDataInput.java:107)
        at org.apache.lucene.index.codecs.BlockTermsReader$FieldReader$SegmentTermsEnum._next(BlockTermsReader.java:661)
        at org.apache.luce
ConcurrentMergeScheduler is tied to a specific IndexWriter, so if
we're running in an environment with many writers (such as Solr's
multiple cores and other similar scenarios) we'd have a CMS per IW. I
think this effectively disables CMS's max thread merge throttling
feature?
ordered IDs stored in the index, so that
remaining documents (that, let's say, were left in RAM prior to process
termination) can be indexed. It's an inferred transaction checkpoint.
On Mon, Feb 21, 2011 at 5:31 AM, Michael McCandless
wrote:
> On Sun, Feb 20, 2011 at 8:47 PM, Jason Rutherglen
rd. How
would I seek to the last term in the index using VarGaps? Or do I
need to interact directly with the FST class (and if so I'm not sure
what to do there either).
Thanks Mike.
On Sun, Feb 20, 2011 at 2:51 PM, Michael McCandless
wrote:
> On Sat, Feb 19, 2011 at 8:42 AM, Jason Rutherg
that supports ord (eg FixedGap).
>
> Mike
>
> On Fri, Feb 18, 2011 at 9:24 PM, Jason Rutherglen
> wrote:
>> This could be a rhetorical question. The way to find the last/max
>> term that is a unique per document is to use TermsEnum to seek to the
>> first term of a
This could be a rhetorical question. The way to find the last/max
term that is unique per document is to use TermsEnum to seek to the
first term of a field, then seek to docFreq-1 for the last ord, then
get the term. Or is there a better/faster way?
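Roughly what I have in mind, against the flex API (untested sketch;
assumes a codec that supports ord and a 4.x-era iterator signature):

  import org.apache.lucene.index.MultiFields;
  import org.apache.lucene.index.Terms;
  import org.apache.lucene.index.TermsEnum;
  import org.apache.lucene.util.BytesRef;

  Terms terms = MultiFields.getTerms(reader, "id"); // hypothetical field
  TermsEnum te = terms.iterator(null);
  te.seekExact(terms.size() - 1);  // seek straight to the last ord
  BytesRef last = te.term();

Note terms.size() can return -1 when the codec doesn't know the term
count, so this only works where ord is supported.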
> there is an entire RAM resident part and an Iterator API that reads /
> streams data directly from disk.
> look at DocValuesEnum vs. Source
Nice, thanks!
On Thu, Feb 3, 2011 at 12:20 AM, Simon Willnauer
wrote:
> On Thu, Feb 3, 2011 at 3:23 AM, Jason Rutherglen
> wrote:
>>
s branch)
>
> -Yonik
> http://lucidimagination.com
>
>
> On Wed, Feb 2, 2011 at 1:03 PM, Jason Rutherglen wrote:
>
>> I'm curious if there's a new way (using flex or term states) to store
>> IDs alongside a document and retrieve the IDs of the top N resul
I'm curious if there's a new way (using flex or term states) to store
IDs alongside a document and retrieve the IDs of the top N results?
The goal would be to minimize HD seeks, and not use field caches
(because they consume too much heap space) or the doc stores (which
require two seeks). One pos
Yeah that's customizing the Lucene source. :) I should have gone into
more detail; I will next time.
On Wed, Nov 10, 2010 at 2:10 PM, Michael McCandless
wrote:
> Actually, the .tii file pre-flex (3.x) is nearly identical to the .tis
> file, just that it only contains every 128th term.
>
> If you
In a word, no. You'd need to customize the Lucene source to accomplish this.
On Wed, Nov 10, 2010 at 1:02 PM, Burton-West, Tom wrote:
> Hello all,
>
> We have an extremely large number of terms in our indexes. I want to be able
> to extract a sample of the terms, say something like every 128th
egment is given the same name as the first segment that
> shares it. However, unfortunately, because of merging, it's possible
> that this mapping is not easy (maybe not possible, depending on the
> merge policy...) to reconstruct. I think this'll be the hardest part
> :)
>
Let's say the segment infos file is missing; I'm aware of
CheckIndex, however is there a tool to recreate a segment infos file?
Grant,
I can probably do the 3 billion document one from Prague, or a
realtime search one... I spaced on submitting for ApacheCon.
Are there cool places in the Carolinas to hang?
Cheers bro,
Jason
On Tue, Jun 22, 2010 at 10:51 AM, Grant Ingersoll
wrote:
> Lucene Revolution Call For Particip
This is more of a Unix-related question than a Lucene-specific one,
however because Lucene is being used, I'm asking here as perhaps
other people have run into a similar issue.
On Amazon EC2, merge, read, and write operations are possibly
blocking due to underlying IO. Is there a tool that you have
use
long - whatever
> happened to CSF? That feature is so 2006, and we still
> don't have it? I'm completely disturbed about the whole situation myself.
>
> Who the heck is in charge here?
>
> On 02/25/2010 12:51 PM, Jason Rutherglen wrote:
>>
>> It'd be great to
Peter,
Perhaps other concurrent operations?
Jason
On Tue, Feb 23, 2010 at 10:43 AM, Peter Keegan wrote:
> Using Lucene 2.9.1, I have the following pseudocode which gets repeated at
> regular intervals:
>
> 1. FSDirectory dir = FSDirectory.open(java.io.File);
> 2. dir.setLockFactory(new SingleIn
Answering my own question... PatternReplaceFilter doesn't output
multiple tokens...
Which means messing with capture state...
On Thu, Feb 4, 2010 at 2:16 PM, Jason Rutherglen
wrote:
> Transferred partially to solr-user...
>
> Steven, thanks for the reply!
>
> I wonder if
wrote:
> Hi Jason,
>
> Solr's PatternReplaceFilter(ts, "\\P{Alnum}+$", "", false) should work,
> chained after an appropriate tokenizer.
>
> Steve
>
> On 02/04/2010 at 12:18 PM, Jason Rutherglen wrote:
>> Is there an anal
Is there an analyzer that easily strips non-alphanumeric characters
from the end of a token?
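For the archives, Steve's filter (quoted above) wired into an analysis
chain would look roughly like this; a sketch, noting the filter's
import path has moved between Solr and Lucene over the years and
tokenizer signatures vary by version:

  import java.io.Reader;
  import java.util.regex.Pattern;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.WhitespaceTokenizer;
  import org.apache.lucene.analysis.pattern.PatternReplaceFilter;

  // Strips trailing non-alphanumerics per token, e.g. "foo!!" -> "foo"
  TokenStream ts = new WhitespaceTokenizer(reader);
  ts = new PatternReplaceFilter(ts, Pattern.compile("\\P{Alnum}+$"), "", false);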
246663 /var/index/vol201001/_5q5.cfs (deleted)
>
> On 2010/01/26 10:09 PM, Jamie wrote:
>>
>> HI Jason
>>
>> Thanks a ton. Problem solved. No more stray file handles!
>>
>> Jamie
>>
>> On 2010/01/26 10:03 PM, Jason Rutherglen wrote:
>>>
st switched over to
> using the writer.getReader() method and was worried if I closed the
> Reader that the Writer would be closed too. Is this misguided?
>
> Jamie
>
>
> On 2010/01/26 09:40 PM, Jason Rutherglen wrote:
>>
>> Jamie,
>>
>> Are you calling c
Jamie,
Are you calling close on the reader?
Jason
On Tue, Jan 26, 2010 at 11:23 AM, Jamie wrote:
> Hi Erick
>
> Our app is a long running server. Is it a problem if indexes are never
> closed? Our searchers
> do see the latest snapshot as we use writer.getReader() method for fast
> searches.
>
E-1879 stuff), then do I need to manually create two indexes, one
> for my static fields and one for my tags? (I would need to be careful
> about how I coordinated these indexes, so I could use a ParallelReader
> with them.) Or is there only one index, and the tag fields are
> updat
Hi Chris,
It's not actively being worked on. Are you interested in working on it?
Jason
On Tue, Jan 19, 2010 at 4:42 PM, Chris Harris wrote:
> I'm interested in the Tag Index patch (LUCENE-1292), in particular
> because of how it enables you to modify certain fields without
> reindexing a whol
______
> From: Jason Rutherglen
> To: java-user@lucene.apache.org
> Sent: Wed, January 13, 2010 5:54:38 PM
> Subject: Re: Max Segmentation Size when Optimizing Index
>
> Yes... You could hack LogMergePolicy to do something else.
>
> I use optimise(numse
Chavalittumrong wrote:
> Seems like optimize() only cares about final number of segments rather than
> the size of the segment. Is it so?
>
> On Wed, Jan 13, 2010 at 2:35 PM, Jason Rutherglen <
> jason.rutherg...@gmail.com> wrote:
>
>> There's a different method
is only used during index time and will be ignored
> by the Optimize() process?
>
>
> On Wed, Jan 13, 2010 at 1:57 PM, Jason Rutherglen <
> jason.rutherg...@gmail.com> wrote:
>
>> Oh ok, you're asking about optimizing... I think that's a different
olicy.setMaxMergeMB(100)
> will prevent
> merging of two segments that is larger than 100 Mb each at the optimizing
> time?
>
> If so, why do think would I still see segment that is larger than 200 MB?
>
>
>
> On Wed, Jan 13, 2010 at 1:43 PM, Jason Rutherglen <
Hi Trin,
There was recently a discussion about this: the max size is
for the segments before merging, rather than the resultant merged
segment (if that makes sense). It'd be great if we had a merge
policy that limited the resultant merged segment, though that'd
be a rough approximation at best.
Jason
I'm not going to go into too much code-level detail, however I'd index
the phrases using tri-gram shingles, and as uni-grams. I think
this'll give you the results you're looking for. You'll be able to
quickly recall the count of a given phrase, aka tri-gram, such as
"blue_shorts_burough".
On Fri, J
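A sketch of the shingle setup (ShingleFilter settings from memory;
tokens is the upstream TokenStream):

  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.shingle.ShingleFilter;

  // Emit tri-gram shingles like "blue_shorts_burough" plus the uni-grams
  ShingleFilter shingles = new ShingleFilter(tokens, 3, 3);
  shingles.setTokenSeparator("_");
  shingles.setOutputUnigrams(true);

Counting a phrase then becomes a docFreq lookup on a single shingled
term instead of running a PhraseQuery.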
The naming is unclear; when I looked at this I had to thumb
through the code a fair bit before discerning whether it was the
input segments or the output segment of a merge (it's the
former). Though I find the current functionality somewhat odd
because it will inherently exceed the given size with a mer
Does CJK support phrase slop? (I'm assuming no)
f it already fell on a prior span.
>
> Mike
>
> On Wed, Dec 9, 2009 at 11:25 AM, Jason Rutherglen
> wrote:
>> Right we're getting the spans, however it's just the payloads that are
>> missing, randomly...
>>
>> On Wed, Dec 9, 2009 at 2:23 AM, Michae
if that included sometimes
> missing payloads...
>
> Mike
>
> On Tue, Dec 8, 2009 at 7:34 PM, Jason Rutherglen
> wrote:
>> Howdy,
>>
>> I am wondering if anyone has seen
>> NearSpansUnordered.getPayload() not return payloads that are
>> verifiably ac
Howdy,
I am wondering if anyone has seen
NearSpansUnordered.getPayload() not return payloads that are
verifiably accessible via IR.termPositions? It's a bit confusing
because most of the time they're returned properly.
I suspect the payload logic gets tripped up in
NearSpansUnordered. I'll put to
m I mentioned above (but
>> I haven't looked at the code yet).
>>
>> It's an apache license - but you mentioned something about no third party
>> libraries. Is that a policy for Lucene?
>>
>> Thanks,
>>
>> Tom
>>
>>
>> On Mon, Dec 7
> Thanks,
>
> Tom
>
>
> On Mon, Dec 7, 2009 at 4:44 PM, Jason Rutherglen wrote:
>
>> I wonder if Google Collections (even though we don't use third party
>> libraries) concurrent map, which supports weak keys, handles the
>> removal of weakly referenc
RB,
That's expected behavior; each .cfs corresponds to all of a
segment's files. You could write your own directory
implementation that underneath writes to a single file. It's
usually good to present what you're trying to accomplish (i.e.
the why).
Jason
On Mon, Dec 7, 2009 at 10:25 PM, Cool Th
I wonder if Google Collections' concurrent map (even though we don't
use third-party libraries), which supports weak keys, handles the
removal of weakly referenced keys more elegantly than Java's
WeakHashMap?
On Mon, Dec 7, 2009 at 4:38 PM, Tom Hill wrote:
> Hi -
>
> If I understand correct
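For reference, the Google Collections version of this is roughly (a
sketch; class names from the google-collect era, now Guava):

  import com.google.common.collect.MapMaker;
  import java.util.concurrent.ConcurrentMap;

  // Entries vanish once a key is no longer strongly referenced,
  // without WeakHashMap's external-synchronization requirements
  ConcurrentMap<Object, Object> cache = new MapMaker()
      .weakKeys()
      .makeMap();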
Siraj,
You could estimate the maximum disk space used during optimization at
2.5 times your current index size (a rough maximum; e.g. a 10GB index
could transiently need ~25GB), and not optimize if that would exceed
your allowable disk space.
Jason
On Mon, Nov 30, 2009 at 2:50 PM, Siraj Haider wrote:
> Index optimizati
I don't mind adding the "positions" of the payloads in them. However,
maybe we can be a little more clear in the javadocs about what's going
on underneath?
On Wed, Nov 25, 2009 at 5:36 AM, Mark Miller wrote:
> Grant Ingersoll wrote:
>> On Nov 20, 2009, at 6:49 PM, Jason Ru
A sharded architecture (i.e. smaller indexes), used by Google for
example and implemented in open source by the Katta project, may be
best for scaling to sizable levels. Katta is also useful for
redundancy and fault tolerance.
On Mon, Nov 23, 2009 at 6:35 PM, fulin tang wrote:
> We are going to ad
Teruhiko,
The index remains consistent even when a background merge fails,
meaning commit truly represents a valid index after it's called.
You can share merge schedulers, though in practice it's not
going to improve anything.
Jason
2009/11/20 Teruhiko Kurosaka :
> I was experimenting how Lucene
I'm interested in getting the payload information from the
matching span, however it's unclear from the javadocs why
NearSpansUnordered is different than NearSpansOrdered in this
regard.
NearSpansUnordered returns payloads in a hash set that's
computed on each method call by iterating over the SpanCe
> Raise -Xmx, there is a setting in common-build.xml or build.xml
>>
>> -
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>>> -Original Message-
>>> From: Jason R
Is there a setting to fix this?
[junit] Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
[junit]         at java.util.Arrays.copyOf(Arrays.java:2882)
[junit]         at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
[junit]         at java.lang
If there's a bug you're seeing, it's helpful to open an issue and post
code reproducing it.
On Wed, Nov 11, 2009 at 3:41 AM, Albert Juhe wrote:
>
> I think that this is the best way to proceed.
>
> thank you Mike
>
>
>
> Michael McCandless-2 wrote:
>>
>> Can you narrow the leak down to a small se
Hi Cedric,
There is a wiki page on NRT at:
http://wiki.apache.org/lucene-java/NearRealtimeSearch
Feel free to ask questions if there's not enough information.
-J
On Mon, Oct 12, 2009 at 2:24 AM, melix wrote:
>
> Hi,
>
> I'm going to replace an old reader/writer synchronization mechanism we had
ust plain
> disappointing.*
>
> Thanks Jake for the clarification, and Eric, let me know if you want to
> know more in detail how we are dealing with realtime indexing/search
> with Zoie here at linkedin in a production environment powering a real
> internet company with real
variety of configurations. The best way to go about
>> this is to post benchmarks that others may run in their
>> environment which can then be tweaked for their unique edge
>> cases. I wish I had more time to work on it.
>>
>> -J
>>
>> On Thu, Oct 8, 2009
on it.
-J
On Thu, Oct 8, 2009 at 8:18 PM, Jake Mannix wrote:
> Jason,
>
> On Thu, Oct 8, 2009 at 7:56 PM, Jason Rutherglen wrote:
>
>> Today near realtime search (with or without SSDs) comes at a
>> price, that is reduced indexing speed due to continued in RAM
>&g
Eric,
Katta doesn't require HDFS, which would be slow to search on;
rather, Katta can be used to copy indexes out of HDFS onto local
servers. The best bet is hardware that uses SSDs, because merges
and update latency will greatly decrease and there won't be a
synchronous IO issue as there is with har
Out of curiosity, and perhaps for practical purposes, how does one
handle mixed-language documents? I suppose one could extract the
words of a particular language and place them in a language-specific
field? Are there libraries to perform this (yet)?
On Thu, Oct 8, 2009 at 6:32 AM, Christian Reuschling
We have a way to merge indexes together with IW.addIndexes,
however not the opposite: splitting up an index with multiple
segments. I think I can simply manufacture a new SegmentInfos in
a new directory, copy over the segments files from those
segments, delete the copied segments from the source, and
Maarten,
Depending on the hardware available you can use a Hadoop cluster
to reindex more quickly. With Amazon EC2 one can spin up several
nodes, reindex, then tear them down when they're no longer
needed. Also you can simply update in place the existing
documents in the index, though you'd need t
Chris,
It sounds like you're on the right track. Have you looked at
Solr, which uses the rsync/Java replication method you mentioned?
Replication and near realtime in Solr aren't quite there yet,
however it wouldn't be too hard to add.
-J
On Tue, Oct 6, 2009 at 3:57 PM, Chris Were wrote:
> Hi
I'm not sure I understand the question. You're trying to reopen
the segments that you've replicated and you're wondering what's
changed in Lucene?
On Mon, Oct 5, 2009 at 5:30 PM, Nigel wrote:
> Anyone have any ideas here? I imagine a lot of other people will have a
> similar question when trying
It depends on whether or not the commit completes before the
reopen. Lucene 2.9 adds an IndexWriter.getReader method that
will always return with the latest modifications to your index.
So if you're adding many documents, you can at any time call
IW.getReader and you will be able to search the cha
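The pattern is just this (a 2.9-era sketch; exact constructor
signatures vary by version):

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.search.IndexSearcher;

  // Writer-backed NRT reader: sees adds/deletes without a commit
  IndexWriter writer = new IndexWriter(dir, analyzer,
      IndexWriter.MaxFieldLength.UNLIMITED);
  writer.addDocument(doc);
  IndexReader reader = writer.getReader();  // LUCENE-1516
  IndexSearcher searcher = new IndexSearcher(reader);

Reopen cost is proportional to what changed, not to the index size.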
he fdx file
> size is 3748 (= 4 + 468*8), yet the file size is far larger than that
> (298404).
>
> How repeatable is it? Can you turn on infoStream, get the exception
> to happen, then post the resulting output?
>
> Mike
>
> On Thu, Sep 10, 2009 at 7:19 PM, Jason Ruther
I'm seeing a strange exception when indexing using the latest Solr rev on EC2.
org.apache.solr.client.solrj.SolrServerException:
org.apache.solr.client.solrj.SolrServerException:
java.lang.RuntimeException: after flush: fdx size mismatch: 468 docs
vs 298404 length in bytes of _0.fdx
at
or
I think CSF hasn't been implemented because it's only marginally
useful yet requires fairly significant rewrites of core code
(i.e. SegmentMerger), so no one's picked it up, including myself.
An interim solution that fulfills the same function (quickly
loading field cache values) using what works rel
> - Mark
>
> http://www.lucidimagination.com
>
>
>
> Jason Rutherglen wrote:
>> While indexing with the latest nightly build of Solr on Amazon EC2 the
>> following JVM bug has occurred twice on two different servers.
>>
>> Post the log to a Jira issue?
>>
While indexing with the latest nightly build of Solr on Amazon EC2 the
following JVM bug has occurred twice on two different servers.
Post the log to a Jira issue?
java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed
Daniel,
You may want to look at SOLR-1375, which enables ID checking
using a BloomFilter (with a specified error rate of false
positives). Otherwise, for what you're trying to do, you'd need
to create a hash map?
-J
On Thu, Aug 13, 2009 at 7:33 AM, Daniel Shane wrote:
> Hi all!
>
> I'm currently ru
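To illustrate the idea (not SOLR-1375's implementation; a sketch using
the later Guava BloomFilter API):

  import com.google.common.hash.BloomFilter;
  import com.google.common.hash.Funnels;
  import java.nio.charset.StandardCharsets;

  // ~10M IDs at a 1% false-positive rate; no false negatives
  BloomFilter<CharSequence> seen = BloomFilter.create(
      Funnels.stringFunnel(StandardCharsets.UTF_8), 10_000_000, 0.01);
  seen.put("doc-42");
  boolean maybeSeen = seen.mightContain("doc-42");

A hit may be a false positive, so you'd confirm against the index; a
miss is definitive.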
even hits.
>
> Is there no way to limit the sorting to only the documents that were found
> in the query?
>
> Thanks
>
>
>
> Jason Rutherglen-2 wrote:
>>
>> Take a look at contrib/spatial.
>>
>> On Fri, Aug 21, 2009 at 7:00 AM, javaguy44 wrot
Take a look at contrib/spatial.
On Fri, Aug 21, 2009 at 7:00 AM, javaguy44 wrote:
>
> Hi,
>
> I'm currently looking at sorting in lucene, and to get started I took a look
> at the distance sorting example from the Lucene in Action book.
>
> Working through the test DistanceSortingTest, I've notice
Micah,
If you can post some of your code, it may be easier to identify the
problem you're experiencing.
-J
On Tue, Aug 18, 2009 at 9:55 AM, Micah Jaffe wrote:
> Hi, thanks for the response! The (custom) searchers that are falling out of
> cache are indeed calling close on their IndexReader in f
In trying to calculate the cost of various slop settings for phrase
queries, what's the time complexity? O(n) or O(n^2)?
http://arstechnica.com/hardware/news/2009/07/intels-new-34nm-ssds-cut-prices-by-60-percent-boost-speed.ars
For me the price on the 80GB is now within reason for a $1300
SuperMicro quad-core 12GB RAM type of server.
be honest, I do not know if anyone today runs high-volume search from disk
> (maybe SSD); even then, a significant portion has to be in RAM...
>
> One day we could throw many CPUs at Query... but this is not an easy one...
>
>
>
>
>
> - Original Message
>> F
Do we think that we'll be able to support indexing stop words
using PFOR (with relaxation of the compression to gain
performance)? Today it seems like the best approach to indexing
stop words is to use shingles? However, this blows up the term
dict because shingles concatenate phrases together.
On
Just wondering if it works and if it's a good fit for autosuggest?
Ah ok, I was thinking we'd wait for the new flex indexing patch.
I had started working along these lines before and will take it
on as a project (which is, I believe, reducing the memory
consumption of the term dictionary).
I plan to segue it into the tag index at some point.
On Tue, Jul 7, 2009 at
This requires tracking the genealogy of docids as they are merged inside
IndexWriter. It's doable, so if you're particularly interested feel free to
open a jira issue.
On Sun, Jun 28, 2009 at 2:21 AM, Shay Banon wrote:
>
> Hi,
>
> I have a case where deleting documents by doc id make sense (I
On the topic of RAM consumption, it seems like field caches
could return estimated RAM usage (given they're arrays of
standard Java types)? There are methods of calculating this per
platform (I believe relatively accurately).
On Fri, Jun 19, 2009 at 12:11 PM, Michael McCandless <
luc...@mikemccandless.co
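Back-of-the-envelope for primitive-array caches (hypothetical numbers;
object headers and pointer sizes differ per platform/VM):

  int maxDoc = 10_000_000;
  long intCacheBytes = (long) maxDoc * 4;   // int[] cache: ~40 MB
  long longCacheBytes = (long) maxDoc * 8;  // long[] cache: ~80 MB

String field caches are the hard case, since per-object overhead
varies by VM.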
> As I understand it, the user won't see any changes to the
index until a new Searcher is created.
Correct.
> How much memory will caching the searcher cost? Are there
> other tradeoffs I need to consider?
If you're updating the index frequently (every N seconds) and
the searcher/reader is closed
>terms, and is slurped into the arrays on init.
>
> This is a sizable RAM savings over what's done now because you save 2
> objects, 3 pointers, 2 longs, 2 ints (I think), per indexed term.
>
> Mike
>
> On Wed, Jun 10, 2009 at 2:02 PM, Jason
> Rutherglen wrote:
> LUCENE-1458 (flexible indexing) has these improvements,
Mike, can you explain how it's different? I looked through the code once
but yeah, it's in with a lot of other changes.
On Wed, Jun 10, 2009 at 5:40 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> This (very large number of
On the topic of user groups, is there a Bay Area Lucene users group?
Hi Dan,
You are looking to throttle the merging? I'd recommend setting
ConcurrentMergeScheduler.setMaxThreadCount(1). This way IW.addDocument
doesn't wait while a merge occurs (as it would with
SerialMergeScheduler); however, it should not use as much CPU, as only
one merge will occur at a time.
In regards to
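The wiring is roughly this (a sketch; setter names from the
pre-IndexWriterConfig era and they vary by version):

  import org.apache.lucene.index.ConcurrentMergeScheduler;
  import org.apache.lucene.index.IndexWriter;

  ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
  cms.setMaxThreadCount(1);       // at most one merge thread at a time
  writer.setMergeScheduler(cms);  // addDocument still doesn't block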
Hi Shay,
I think IndexWriter.getReader from LUCENE-1516 in trunk is what
you're talking about? It pools readers internally, so there's no
need to call IndexReader.reopen; one simply calls IW.getReader
to get new readers containing recent updates.
-J
BTW I replied to the message on java-u...@lucen
John,
We looked at implementing delete by doc id for LUCENE-1516, however it
seemed to be something that if enough people wanted we could implement it at
as a later patch.
The implementation involves maintaining a genealogy of SegmentReaders within
IndexWriter so that deletes to a reader that has
e segments with enough deletes need to be merged
away in 1-2 hours. Meaning optimizing may not be best as it requires later
large merges. Also an interleaving system that does not perform merges if a
flush is occurring could be useful for minimizing disk thrash.
On Wed, Mar 25, 2009 at 3:39 PM, J
LuceneError when executed should reproduce the failure. The
contrib/benchmark libraries are required. MultiThreadDocAdd is a
multithreaded indexing utility class.
On Wed, Mar 25, 2009 at 1:06 PM, Jason Rutherglen <
jason.rutherg...@gmail.com> wrote:
> Each document is being created in
It looks like you are reusing a Field (the f.setValue(...) calls); are
> you sure you're not changing a Document/Field while another thread is
> adding it to the index?
>
> If you can post the full code, then I can try to run it on my
> wikipedia dump locally.
>
> Mi
12:25 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> H.
>
> Jason is this easily/compactly repeated? EG, try to index the N docs
> before that one.
>
> If you remove the SinglePayloadTokenStream field, does the exception
> still happen?
>
> Mike