Re: BlockTreeTermsReader consumes crazy amount of memory

2014-09-11 Thread Vitaly Funstein
ingly impossible... but this is a separate issue. On Wed, Sep 10, 2014 at 6:35 PM, Robert Muir wrote: > Yes, there is also a safety check, but IMO it should be removed. > > See the patch on the issue, the test passes now. > > On Wed, Sep 10, 2014 at 9:31 PM, Vitaly Funstein > wro

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-09-10 Thread Vitaly Funstein
i think you should be able to do this, we just have to > add the hasDeletions check to #2 > > On Wed, Sep 10, 2014 at 7:46 PM, Vitaly Funstein > wrote: > > One other observation - if instead of a reader opened at a later commit > > point (T1), I pass in an NRT reader *w

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-09-10 Thread Vitaly Funstein
new segment files, as well... unfortunately, our system can't make either assumption. On Wed, Sep 10, 2014 at 4:30 PM, Vitaly Funstein wrote: > Normally, reopens only go forwards in time, so if you could ensure >> that when you reopen one reader to another, the 2nd one is always

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-09-10 Thread Vitaly Funstein
> > Normally, reopens only go forwards in time, so if you could ensure > that when you reopen one reader to another, the 2nd one is always > "newer", then I think you should never hit this issue Mike, I'm not sure if I fully understand your suggestion. In a nutshell, the use case here is as follo

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-09-09 Thread Vitaly Funstein
er versions. > > > > But that being said, I think the bug is real: if you try to reopen > > from a newer NRT reader down to an older (commit point) reader then > > you can hit this. > > > > Can you open an issue and maybe post a test case showing it? Thanks. >

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-09-08 Thread Vitaly Funstein
ince merge timings aren't deterministic. On Mon, Sep 8, 2014 at 11:45 AM, Vitaly Funstein wrote: > UPDATE: > > After making the changes we discussed to enable sharing of SegmentReaders > between the NRT reader and a commit point reader, specifically calling > through to DirectoryReader.o

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-09-08 Thread Vitaly Funstein
> On Thu, Aug 28, 2014 at 5:38 PM, Vitaly Funstein > wrote: > > On Thu, Aug 28, 2014 at 1:25 PM, Michael McCandless < > > luc...@mikemccandless.com> wrote: > > > >> > >> The segments_N file can be different, that's fine: after that, we then

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-08-28 Thread Vitaly Funstein
On Thu, Aug 28, 2014 at 2:38 PM, Vitaly Funstein wrote: > > Looks like this is used inside Lucene41PostingsFormat, which simply passes > in those defaults - so you are effectively saying the minimum (and > therefore, maximum) block size can be raised to reduce the size of the terms

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-08-28 Thread Vitaly Funstein
On Thu, Aug 28, 2014 at 1:25 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > > The segments_N file can be different, that's fine: after that, we then > re-use SegmentReaders when they are in common between the two commit > points. Each segments_N file refers to many segments... > > Y

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-08-28 Thread Vitaly Funstein
block sizes used by the terms index (see > BlockTreeTermsWriter). Larger blocks = smaller terms index (FST) but > possibly slower searches, especially MultiTermQueries ... > > Mike McCandless > > http://blog.mikemccandless.com > > > On Thu, Aug 28, 2014 at 2:50 PM, Vitaly Fun

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-08-28 Thread Vitaly Funstein
there > are 88 fields totaling ~46 MB so ~0.5 MB per indexed field ... > > Mike McCandless > > http://blog.mikemccandless.com > > > On Thu, Aug 28, 2014 at 1:56 PM, Vitaly Funstein > wrote: > > Here's the link: > > > https://drive.google.com/file/d/0B5e

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-08-28 Thread Vitaly Funstein
> N commit points that you have readers open for, they will be sharing > SegmentReaders for segments they have in common. > > How many unique fields are you adding? > > Mike McCandless > > http://blog.mikemccandless.com > > > On Wed, Aug 27, 2014 at 7:41 PM, Vitaly Fun

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-08-27 Thread Vitaly Funstein
unique fields? > > Can you post screen shots of the heap usage? > > Mike McCandless > > http://blog.mikemccandless.com > > > On Tue, Aug 26, 2014 at 3:53 PM, Vitaly Funstein > wrote: > > This is a follow up to the earlier thread I started to understand memory >

BlockTreeTermsReader consumes crazy amount of memory

2014-08-26 Thread Vitaly Funstein
This is a follow up to the earlier thread I started to understand memory usage patterns of SegmentReader instances, but I decided to create a separate post since this issue is much more serious than the heap overhead created by use of stored field compression. Here is the use case, once again. The

SegmentReader heap usage with stored field compression on

2014-08-23 Thread Vitaly Funstein
Is it reasonable to assume that using stored field compression with a lot of stored fields per document in a very large index (100+ GB) could potentially lead to significant heap utilization? If I am reading the code in CompressingStoredFieldsIndexReader correctly, there's a non-trivial accounti

Re: Problem of calling indexWriterConfig.clone()

2014-08-11 Thread Vitaly Funstein
y sure calling indexWriterConfig.clone() in the middle > of indexing documents used to work for my code(same Lucene 4.7). It is > since recently I had to do faceted indexing as well that this problem > started to emerge. Is it related? > > > On Mon, Aug 11, 2014 at 11:31 PM, Vit

Re: Problem of calling indexWriterConfig.clone()

2014-08-11 Thread Vitaly Funstein
mean whenever the indexWriter gets called for > commit/prepareCommit, etc., the corresponding indexWriterConfig object > cannot be called with .clone() at all? > > > On Mon, Aug 11, 2014 at 9:52 PM, Vitaly Funstein > wrote: > > > Looks like you have to clone it

Re: Problem of calling indexWriterConfig.clone()

2014-08-11 Thread Vitaly Funstein
Looks like you have to clone it prior to using with any IndexWriter instances. On Mon, Aug 11, 2014 at 2:49 PM, Sheng wrote: > I tried to create a clone of indexwriteconfig with > "indexWriterConfig.clone()" for re-creating a new indexwriter, but then I > got this very annoying illegalstateex

Re: Custom Sorting

2014-06-25 Thread Vitaly Funstein
As a compromise, you can base your custom sort function on values of stored fields in the same index - as opposed to fetching them from an external data store, or relying on internal sorting implementation in Lucene. It will still be relatively slow, but not nearly as slow as going out to a DB... t

Re: Concurrent Indexing

2014-06-20 Thread Vitaly Funstein
on. The same version maintained in lucene as one document. During > startup these numbers define what has to be syncd up. Unfortunately lucene > is used in a webapp, so this happens "only" during a jetty restart. > > - Vidhya > > > > On 21-Jun-2014, at 11:08 am,

Re: Concurrent Indexing

2014-06-20 Thread Vitaly Funstein
so verify if the time taken for commit() is longer when more > data piled up to commit. But definitely should be better than committing > for every thread.. > > Will post back after tests. > > - Vidhya > > > > On 21-Jun-2014, at 10:28 am, "Vitaly Funstein"

Re: Concurrent Indexing

2014-06-20 Thread Vitaly Funstein
the auto commit parameters appropriately, do I still need the > committer thread ? Because its job is to call commit. Anyway > add/updateDocument is already done in my writer threads. > > Thanks for your time and your suggestions! > > - Vidhya > > > > On 21-Jun-2014, a

Re: search performance

2014-06-20 Thread Vitaly Funstein
If you are using stored fields in your index, consider playing with compression settings, or perhaps turning stored field compression off altogether. Ways to do this have been discussed in this forum on numerous occasions. This is highly use case dependent though, as your indexing performance may o

Re: Concurrent Indexing

2014-06-20 Thread Vitaly Funstein
You could just avoid calling commit() altogether if your application's semantics allow this (i.e. it's non-transactional in nature). This way, Lucene will do commits when appropriate, based on the buffering settings you chose. It's generally unnecessary and undesirable to call commit at the end of

Re: search performance

2014-06-03 Thread Vitaly Funstein
wrote: > Vitaly > > See below: > > > On 2014/06/03, 12:09 PM, Vitaly Funstein wrote: > >> A couple of questions. >> >> 1. What are you trying to achieve by setting the current thread's priority >> to max possible value? Is it grabbing as much CPU t

Re: search performance

2014-06-03 Thread Vitaly Funstein
A couple of questions. 1. What are you trying to achieve by setting the current thread's priority to max possible value? Is it grabbing as much CPU time as possible? In my experience, mucking with thread priorities like this is at best futile, and at worst quite detrimental to responsiveness and o

Re: search performance

2014-06-03 Thread Vitaly Funstein
Something doesn't quite add up. TopFieldCollector fieldCollector = TopFieldCollector.create(sort, max,true, > false, false, true); > > We use pagination, so only returning 1000 documents or so at a time. > > You say you are using pagination, yet the API you are using to create your collector isn't

Re: NewBie To Lucene || Perfect configuration on a 64 bit server

2014-05-23 Thread Vitaly Funstein
At the risk of sounding overly critical here, I would say you need to scrap your entire approach of building one small index per request, and just build your entire searchable data store in Lucene/Solr. This is the simplest and probably most maintainable and scalable solution. Even if your index co

Searching fields whose names match a pattern

2014-04-14 Thread Vitaly Funstein
Does Lucene have support for queries that operate on fields that match a specific name pattern? Let's say that I am modeling an indexed field that can have a collection of values, but don't want the default behavior of these values appended together within the field, for the purposes of search. So

Re: Stored fields and OS file caching

2014-04-04 Thread Vitaly Funstein
the > filesystem cache that likely contains other fields' values that you > are not interested in. > > > > On Sat, Apr 5, 2014 at 12:23 AM, Vitaly Funstein > wrote: > > I use stored fields to load values for the following use cases: > > - to return per-document v

Re: Stored fields and OS file caching

2014-04-04 Thread Vitaly Funstein
Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Vitaly Funstein [mailto:vfunst...@gmail.com] > > Sent: Friday, April 04, 2014 9:44 PM > > To: java-user@lucene.apache.org > > Subject: Stored fields and OS file caching &g

Stored fields and OS file caching

2014-04-04 Thread Vitaly Funstein
I have heard here that stored fields don't work well with OS file caching. Could someone elaborate on why that is? I am using Lucene 4.6 and we do use stored fields but not doc values; it appears most of the benefit from the latter comes as improvement in sorting performance, and I don't actually u

Re: Segments reusable across commits?

2014-03-21 Thread Vitaly Funstein
ne between two snapshots, of course > more files can change, because smaller segments may get combined with other > ones on newer snapshots. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de

Segments reusable across commits?

2014-03-20 Thread Vitaly Funstein
I have a usage pattern where I need to package up and store away all files from an index referenced by multiple commit points. To that end, I basically call IndexWriter.commit(), followed by SnapshotDeletionPolicy.snapshot(), followed by something like this: List<String> files = new ArrayList<>(dir.li

NRT index readers and new commits

2014-01-30 Thread Vitaly Funstein
Suppose I have an IndexReader instance obtained with this API: DirectoryReader.open(IndexWriter, boolean); (I actually use a ReaderManager in front of it, but that's beside the point). There is no manual commit happening prior to this call. Now, I would like to keep this reader around until no l

SnapshotDeletionPolicy API changes

2014-01-24 Thread Vitaly Funstein
I see that SnapshotDeletionPolicy no longer supports snapshotting by an app-supplied string id, as of Lucene 4.4. However, my use case relies on the policy's ability to maintain multiple snapshots simultaneously to provide index versioning semantics, of sorts. What is the new recommended way of doi

Re: Cost of keeping around IndexReader instances

2013-11-22 Thread Vitaly Funstein
, Oct 10, 2013 at 7:01 PM, Vitaly Funstein wrote: > Hello, > > I am trying to weigh some ideas for implementing paged search > functionality in our system, which has these basic requirements: > >- Using Solr is not an option (at the moment). >- Any Lucene 4.x version can b

RE: Lucene Empty Non-empty Fields

2013-11-04 Thread Vitaly Funstein
Or FieldValueFilter - that's probably easier to use. > -Original Message- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Monday, November 04, 2013 4:37 AM > To: Lucene Users > Subject: Re: Lucene Empty Non-empty Fields > > You can also use FieldCache.getDocsWithFiel

Cost of keeping around IndexReader instances

2013-10-10 Thread Vitaly Funstein
Hello, I am trying to weigh some ideas for implementing paged search functionality in our system, which has these basic requirements: - Using Solr is not an option (at the moment). - Any Lucene 4.x version can be used. - Result pagination is driven by the user application code. - User

Re: Lucene in-memory index

2013-10-09 Thread Vitaly Funstein
I don't think you want to load indexes of this size into a RAMDirectory. The reasons have been listed multiple times here... in short, just use MMapDirectory. On Wed, Oct 9, 2013 at 3:17 PM, Igor Shalyminov wrote: > Hello! > > I need to perform an experiment of loading the entire index in RAM an

Re: Query performance in Lucene 4.x

2013-10-02 Thread Vitaly Funstein
to focus on parallelism within > queries rather than across many queries. Batch processing performance is > still important, but we cannot sacrifice quick "online" responses. It would > be much easier to avoid this whole mess, but we cannot meet our performance > requirements w

Re: Query performance in Lucene 4.x

2013-10-02 Thread Vitaly Funstein
Matt, I think you are mostly on track with suspecting thread pool task overload as the possible culprit here. First, the old school (prior to Java 7) ThreadPoolExecutor only accepts a BlockingQueue to use internally for worker tasks, instead of a concurrent variant (not sure why). So this internal
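The overload scenario described here (an unbounded internal work queue silently absorbing tasks) can be avoided by constructing the executor over a bounded queue with a back-pressure rejection handler. A minimal JDK-only sketch; the class name and capacities are illustrative, not taken from the thread:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedQueryPool {
    // A fixed-size pool over a *bounded* queue; when the queue fills up,
    // CallerRunsPolicy executes the task on the submitting thread, which
    // throttles producers instead of letting tasks pile up unboundedly.
    public static ThreadPoolExecutor newPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newPool(2, 4);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < 100; i++) {
            pool.execute(done::incrementAndGet); // stands in for a query task
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(done.get()); // 100: every task ran, none dropped
    }
}
```

With this setup a burst of submissions slows the submitters down rather than exhausting memory, which matches the symptom being diagnosed in the thread.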

Re: Calling IndexWriter.commit() immediately after creating the writer

2013-05-31 Thread Vitaly Funstein
n, might be to list the > > directory: if there is only one file, segments_1, then it's a corrupt > > first commit and you can recreate the index. > > > > Mike McCandless > > > > http://blog.mikemccandless.com > > > > On Wed, May 29, 2013 at 8:09 PM, Vita

Calling IndexWriter.commit() immediately after creating the writer

2013-05-29 Thread Vitaly Funstein
I have encountered a strange issue, that appears to be pretty hard to hit, but is still a serious problem when it does occur. It seems that if the JVM crashes in a racy fashion with instantiation of IndexWriter, the index may be left in an inconsistent state. An attempt to reload such an index on r

Re: Toggling compression for stored fields

2013-05-15 Thread Vitaly Funstein
ustered in the codec > manager by adding META-INF files to your JAR and not using anonymous > subclasses. > > > > Vitaly Funstein schrieb: > > >Uwe, > > > >I may not be doing this correctly, but I tried to see what would happen > >if > >I were to a reo

Re: Toggling compression for stored fields

2013-05-15 Thread Vitaly Funstein
Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Vitaly Funstein [mailto:vfunst...@gmail.com] > > Sent: Wednesday, May 15, 2013 11:36 PM > > To: java-user@lucene.apache.org > > Subject: Re: Toggling compression

Re: Toggling compression for stored fields

2013-05-15 Thread Vitaly Funstein
nothing to do with "reindexing" as you are just changing > the encoding of the exact same data on disk. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Orig

Toggling compression for stored fields

2013-05-15 Thread Vitaly Funstein
Is it possible to have a mix of compressed and uncompressed documents within a single index? That is, can I load an index created with Lucene 4.0 into 4.1 and defer the decision of whether or not to use CompressingStoredFieldsFormat until a later time, or even go back and forth between compressed a

Re: Types of Queries

2013-03-29 Thread Vitaly Funstein
Something like this will work: BooleanQuery query = new BooleanQuery(); query.add(new MatchAllDocsQuery(), Occur.MUST); query.add(new BooleanClause(termQuery, Occur.MUST_NOT)); On Fri, Mar 29, 2013 at 1:06 PM, Paul Bell wrote: > Hi, > > I've done a few experiments in Lucene 4.2 wit

Simplifying Lucene 4 storage formats

2013-03-26 Thread Vitaly Funstein
This is probably a pretty general inquiry, but I'm just exploring this as an option at the moment. It seems that Lucene 4 adds some freedom to define how data is actually written to underlying storage by exposing the codec API. However, I find the learning curve for understanding what bits to chan

Min/max support in Lucene

2013-02-20 Thread Vitaly Funstein
I know that general questions about aggregate functions have been asked here before a number of times, but I would like to figure out how to solve at least one specific subset of this issue. Namely, given a specific indexed field, how do I efficiently get the min/max value of the field in the index

Re: Difference in behaviour between LowerCaseFilter and String.toLowerCase()

2012-12-03 Thread Vitaly Funstein
If you don't need to support case-sensitive search in your application, then you may be able to get away with adding string fields to your documents twice - lowercase version for indexing only, and verbatim to store. For example (this is Lucene 4 code, but same idea), // indexed - not stored d
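The difference named in the subject line comes down to locale sensitivity: Lucene's LowerCaseFilter lowercases locale-independently, while String.toLowerCase() with no explicit Locale follows the JVM's default locale, which trips over the well-known Turkish dotted/dotless I mapping. A JDK-only illustration (the term value is made up):

```java
import java.util.Locale;

public class LowerCaseLocale {
    public static void main(String[] args) {
        String term = "TITLE";
        // Locale-independent lowercasing, matching what analysis chains want:
        System.out.println(term.toLowerCase(Locale.ROOT));      // title
        // Under Turkish rules, 'I' lowercases to dotless 'ı', so the result
        // no longer matches a query term lowercased elsewhere:
        System.out.println(term.toLowerCase(new Locale("tr"))); // tıtle
    }
}
```

This is why lowercasing for indexing should never rely on the platform default locale.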

Re: Lucene API

2012-11-05 Thread Vitaly Funstein
this API and new users have a steeper > learning curve because of it. > > > Igal > > > > On 11/5/2012 11:38 AM, Vitaly Funstein wrote: > >> Are you critiquing CharTermAttribute in particular, or Lucene in general? >> It appears CharTermAttribute is DSL-style builder

Re: Lucene API

2012-11-05 Thread Vitaly Funstein
Are you critiquing CharTermAttribute in particular, or Lucene in general? It appears CharTermAttribute is DSL-style builder API, just like its superinterface Appendable - does that not appear intentional and self-explanatory? Further, I believe Term instances are meant to be immutable hence no dire

Re: Lucene 3.6.0 Index Size

2012-10-26 Thread Vitaly Funstein
One thing to keep in mind is that the default merge policy has changed in 3.6 from 2.3.2 (I'm almost certain of that). So it's just a hunch but you may have some unmerged segments left over at the end. Try calling IndexWriter.close(true) after you're done indexing. On Fri, Oct 26, 2012 at 10:50 AM

Re: query for documents WITHOUT a field?

2012-10-25 Thread Vitaly Funstein
would be "OR (*:* -allergies:[* TO *])" in > Lucene/Solr. > > -- Jack Krupansky > > -Original Message- From: Vitaly Funstein > Sent: Thursday, October 25, 2012 8:25 PM > To: java-user@lucene.apache.org > Subject: Re: query for documents WITHOUT a field? > >

Re: query for documents WITHOUT a field?

2012-10-25 Thread Vitaly Funstein
Sorry for resurrecting an old thread, but how would one go about writing a Lucene query similar to this? SELECT * FROM patient WHERE first_name = 'Zed' OR allergies IS NULL An AND case would be easy since one would just use a simple TermQuery with a FieldValueFilter added, but what about other bo

Re: Multiple Blocking Threads with search during an Index reload

2012-10-24 Thread Vitaly Funstein
Just curious - why not take your search feature offline during the reindexing? That would seem sensible from an operational perspective, I think. On Tue, Oct 23, 2012 at 2:03 PM, Raghavan Parthasarathy < raghavan8...@gmail.com> wrote: > Hi, > > We are using Lucene-core and we reindex once a day a

Re: Lucene 4.0-BETA : MultiReader isCurrent openIfChanged

2012-10-09 Thread Vitaly Funstein
You have probably figured it out by now, but my suggestion would be to use SearcherManager the way it is documented for maintaining a searcher backed by an NRT reader. On Sun, Aug 26, 2012 at 2:03 PM, Mossaab Bagdouri wrote: > Thanks for the quick reply. > > I've changed my code to the following

Re: Lucene index on NFS

2012-10-01 Thread Vitaly Funstein
How tolerant is your project of decreased search and indexing performance? You could probably write a simple test that compares search and write speeds of local and NFS-mounted indexes and make the decision based on the results. On Mon, Oct 1, 2012 at 3:06 PM, Jong Kim wrote: > Hi, > > According

Re: ReferenceManager.maybeRefreshBlocking() should not be declared throwing InterruptedException

2012-07-21 Thread Vitaly Funstein
at an older 4.x/5.x version? We recently > removed declaration of this (unchecked) exception... (LUCENE-4172). > > Mike McCandless > > http://blog.mikemccandless.com > > On Fri, Jul 20, 2012 at 11:26 PM, Vitaly Funstein > wrote: > > This probably belongs in the JIRA, an

ReferenceManager.maybeRefreshBlocking() should not be declared throwing InterruptedException

2012-07-20 Thread Vitaly Funstein
This probably belongs in the JIRA, and is related to https://issues.apache.org/jira/browse/LUCENE-4025, but java.util.Lock.lock() doesn't throw anything. I believe the author of the change originally meant to use lockInterruptibly() inside but forgot to adjust the method sig after changing it back
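The signature mismatch described here is visible in plain JDK code: Lock.lock() declares no checked exception, while lockInterruptibly() declares InterruptedException, so a method built on the former has no reason to declare the latter. A small sketch (class name is illustrative):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockInterruptDemo {
    public static void main(String[] args) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        lock.lock(); // main holds the lock, so the worker must wait
        CountDownLatch started = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            started.countDown();
            try {
                // lockInterruptibly() declares InterruptedException;
                // plain lock() declares nothing, hence the point of the post.
                lock.lockInterruptibly();
                lock.unlock();
            } catch (InterruptedException e) {
                System.out.println("interrupted while waiting for the lock");
            }
        });
        worker.start();
        started.await();
        worker.interrupt(); // wakes the worker out of its lock wait
        worker.join();
        lock.unlock();
    }
}
```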

Re: RAM or SSD...

2012-07-18 Thread Vitaly Funstein
I was referring to *RAMDirectory*. On Wed, Jul 18, 2012 at 11:04 PM, Lance Norskog wrote: >> You do not want to store 30 G of data in the JVM heap, no matter what library does this. > MMapDirectory does not store data in the JVM heap. It lets the > operating system manage the disk buffer space. E

Re: RAM or SSD...

2012-07-18 Thread Vitaly Funstein
You do not want to store 30 G of data in the JVM heap, no matter what library does this. On Wed, Jul 18, 2012 at 10:44 AM, Paul Jakubik wrote: > If only 30GB, go with RAM and MMAPDirectory (as long as you have the budget > for that hardware). > > My understanding is that RAMDirectory is intended

Re: In memory Lucene configuration

2012-07-15 Thread Vitaly Funstein
Have you tried sharding your data? Since you have a fast multi-core box, why not split your indices N-ways, say the smaller one into 4, and the larger into 8. Then you can have a pool of dedicated search threads, executing the same query against separate physical indices within each "logical" one i

Re: Direct memory footprint of NIOFSDirectory

2012-07-13 Thread Vitaly Funstein
, Jul 12, 2012 at 2:34 PM, Lance Norskog wrote: > You can choose another directory implementation. > > On Thu, Jul 12, 2012 at 1:42 PM, Vitaly Funstein wrote: >> Just thought I'd bump this. To clarify - for reasons outside my >> control, I can't just run the JVM hos

Re: Direct memory footprint of NIOFSDirectory

2012-07-12 Thread Vitaly Funstein
parameter has to be fairly close to the actual size used by the app (padded for Lucene and possibly other consumers). On Mon, Jul 9, 2012 at 7:59 PM, Vitaly Funstein wrote: > > Hello, > > I have recently run into the situation when there was not a sufficient amount > of di

Direct memory footprint of NIOFSDirectory

2012-07-09 Thread Vitaly Funstein
Hello, I have recently run into the situation when there was not a sufficient amount of direct memory available for IndexWriter to work. This was essentially caused by the embedding application making heavy use of JVM's direct memory buffers and not leaving enough headroom for NIOFSDirectory to op

Re: Deferring merging of index segments

2012-06-04 Thread Vitaly Funstein
, Michael McCandless < luc...@mikemccandless.com> wrote: > On Fri, Jun 1, 2012 at 8:09 PM, Vitaly Funstein > wrote: > > Yes, I am only calling IndexWriter.addDocument() > > OK. > > > Interestingly, relative performance of either approach seems to greatly > >

Re: Deferring merging of index segments

2012-06-01 Thread Vitaly Funstein
indexing > > Mike McCandless > > http://blog.mikemccandless.com > > > On Tue, May 29, 2012 at 9:42 PM, Vitaly Funstein wrote: >> Hello, >> >> I am trying to optimize the process of "warming up" an index prior to >> using the search subsystem, i.e. it is guarant

Re: Deferring merging of index segments

2012-06-01 Thread Vitaly Funstein
Any takers on this one or is my inquiry a bit too broad? I can post my test code if that helps... On Tue, May 29, 2012 at 6:42 PM, Vitaly Funstein wrote: > Hello, > > I am trying to optimize the process of "warming up" an index prior to > using the search subsystem, i.e. it

Deferring merging of index segments

2012-05-29 Thread Vitaly Funstein
Hello, I am trying to optimize the process of "warming up" an index prior to using the search subsystem, i.e. it is guaranteed that no other writes or searches can take place in parallel with the warmup. To that end, I have been toying with the idea of turning off segment merging altogether u

Impact of max merged segment setting

2012-02-22 Thread Vitaly Funstein
Hello, I am currently experimenting with tuning of max merged segment MB parameter on TieredMergePolicy in Lucene 3.5, and seeing significant gains in index writing speed from values dramatically lower than the default (5 GB). For instance, when setting it to 5 or 10 MB, I can see my writing tests

Re: Index writing performance of 3.5

2012-02-10 Thread Vitaly Funstein
olicy by default. can you try to use the same merge policy > on both 3.0.3 and 3.5 and report back? ie LogByteSizeMergePolicy or > whatever you are using... > > simon > > On Thu, Feb 9, 2012 at 5:28 AM, Vitaly Funstein wrote: >> Hello, >> >> I am currently evalua

Index writing performance of 3.5

2012-02-08 Thread Vitaly Funstein
Hello, I am currently evaluating Lucene 3.5.0 for upgrading from 3.0.3, and in the context of my usage, the most important parameter is index writing throughput. To that end, I have been running various tests, but seeing some contradictory results from different setups, which hopefully someone wit