Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-08-08 Thread Bernd Fehling
Hi Tom, I just saw that you are running Linux with a 2.6 kernel. Have you already enabled -XX:+UseLargePages as a performance option? Solaris 9 has it on by default, but on Linux HugePages must be enabled. http://www.oracle.com/technetwork/java/javase/tech/largememory-jsp-137182.html Just an id

find associations in a term-document matrix(index)

2013-08-08 Thread Antonio Brito
Hello, is there a way in Lucene to find associations in a term-document matrix (index)? In R, the tm package has a function called findAssocs. Basically, the term and a correlation bound are passed as parameters, and it returns an array with all terms (words) that correlate with term mo
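For readers unfamiliar with the R function, here is a minimal sketch of the kind of computation the question describes. It assumes the association measure is the Pearson correlation between per-document term counts, and that the term-document matrix is already available in memory. The names `pearson`, `find_assocs`, and the `matrix` layout (term mapped to a list of per-document counts) are hypothetical and are not part of Lucene or tm.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors.

    Returns 0.0 when either vector has zero variance (an assumption made
    here to avoid dividing by zero; other conventions are possible).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def find_assocs(matrix, term, cor_limit):
    """Return terms whose correlation with `term` is at least `cor_limit`.

    `matrix` is a hypothetical in-memory term-document matrix:
    {term: [count in doc 0, count in doc 1, ...]}.
    The result is ordered from strongest to weakest association.
    """
    target = matrix[term]
    hits = {}
    for other, counts in matrix.items():
        if other == term:
            continue
        r = pearson(target, counts)
        if r >= cor_limit:
            hits[other] = r
    return dict(sorted(hits.items(), key=lambda kv: kv[1], reverse=True))


# Toy matrix: four documents, counts per term (made-up data).
matrix = {
    "lucene": [2, 0, 1, 0],
    "index": [4, 0, 2, 0],
    "solr": [0, 1, 0, 3],
    "the": [1, 1, 1, 1],
}
assocs = find_assocs(matrix, "lucene", 0.5)
```

In this toy data only "index" passes the 0.5 bound, because its counts rise and fall with those of "lucene". Getting the per-document counts out of a real index is a separate problem that this sketch does not address.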

Re: Avoid automaton Memory Usage

2013-08-08 Thread Michael McCandless
On Thu, Aug 8, 2013 at 12:54 PM, Anna Björk Nikulásdóttir wrote: > > On 8.8.2013 at 12:37, Michael McCandless wrote: > >> >>> What would help in my case as I use the same FST for both analyzers, if the >>> same FST object could be shared among both analyzers. So what I am doing is >>> to use

Re: Avoid automaton Memory Usage

2013-08-08 Thread Anna Björk Nikulásdóttir
On 8.8.2013 at 12:37, Michael McCandless wrote: > >> What would help in my case as I use the same FST for both analyzers, if the >> same FST object could be shared among both analyzers. So what I am doing is >> to use AnalyzingSuggester.store() and use the stored file for >> AnalyzingSugges

Re: WeakIdentityMap high memory usage

2013-08-08 Thread Michael McCandless
I agree "file sitting" is not great, but at worst this causes higher transient disk usage, which already happens if you have readers open against those files, during merging, during CFS building, etc. A number of users have complained about the apparent RAM usage of WeakIdentityMap, and it adds

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-08-08 Thread Robert Muir
On Thu, Aug 8, 2013 at 11:18 AM, Tom Burton-West wrote: > Sure I should be able to build a lucene core and give it a try. I probably > won't run it until tomorrow night though because right now I'm running some > other tests on the machine I would run CheckIndex from and disk I/O (i.e. > CheckInd

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-08-08 Thread Tom Burton-West
Sure I should be able to build a lucene core and give it a try. I probably won't run it until tomorrow night though because right now I'm running some other tests on the machine I would run CheckIndex from and disk I/O (i.e. CheckIndex) would mess with the tests. Do I just need to check out revis

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-08-08 Thread Robert Muir
Hi Tom, I committed a fix for the root cause (https://issues.apache.org/jira/browse/LUCENE-5156). Thanks for reporting this! I don't know if it's feasible for you to build a lucene-core.jar from branch_4x and run CheckIndex with that jar file to confirm it really addresses the issue: if this is pos

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-08-08 Thread Tom Burton-West
Hi Robert, I've been running CheckIndex for over a week and it is still working through seekCeil() (See below.) I'm going to kill the CheckIndex. Admittedly, this index is an unusual one, but at one point we were considering using MLT in our regular index which would result in a large termvecto

RE: WeakIdentityMap high memory usage

2013-08-08 Thread Uwe Schindler
Hi Mike, I don't think disabling by default is a good idea. It is not only 64-bit address space that is wasted (which is not a problem at all, you are right), but the JVM also "sits" on those files: - On Windows they cannot be deleted (not even on Java 7 w/ Lucene trunk, where you can now delete them i

Re: Avoid automaton Memory Usage

2013-08-08 Thread Michael McCandless
On Wed, Aug 7, 2013 at 1:18 PM, Anna Björk Nikulásdóttir wrote: > Ah I see. I will look into the AnalyzingInfixSuggester. I suppose it could be > useful as an alternative to AnalyzingSuggester rather than > FuzzySuggester ? Yes, but it's very different (it does no fuzzing, and it matches

Re: Re-load Suggester question

2013-08-08 Thread Michael McCandless
In general, the build() method fully replaces all internal suggester state every time you call it. I.e., a whole new FST is built, or a whole new index is created (AnalyzingInfixSuggester). The build() process is not incremental, although with AnalyzingInfixSuggester this is in principle easy to do

Re: WeakIdentityMap high memory usage

2013-08-08 Thread Michael McCandless
Thanks for bringing closure. Note that you should still run a tight ship, ie don't give excess heap to Lucene, and instead let the OS take up the slack of any spare RAM for IO caching. Especially with unmap disabled, the JVM will now only unmap once a map is GC'd, so the larger your heap the long

Re: Join Util with Filter Queries

2013-08-08 Thread Martijn v Groningen
Maybe you can just get the `fq` parameter from the `params` parameter in the createParse method, then use that to create a FilteredQuery yourself and use that as the fromQuery? On 6 August 2013 22:53, Shane Strasser wrote: > So after looking into the problem, I've started to narrow it do