Re: Lucene search performance on Sun UltraSparc T2 (T5120) servers

2009-02-19 Thread Glen Newton
I will look a little deeper into the information you supplied and comment, but will suggest this on my initial cursory review: 1 - You have 32GB of memory. Using the 64-bit VM, try using a 16GB or 24GB heap; 2 - Turn on huge pages: -XX:+UseLargePages -XX:LargePageSizeInBytes=256m 3 - Tu

Re: Lucene search performance on Sun UltraSparc T2 (T5120) servers

2009-02-18 Thread Varun Dhussa
Hi, The details are as follows: Solaris version: Solaris 10 U5 and U6. For the Java setup, I have tried with: Sun JDK 1.5 (32 & 64), Sun JDK 1.6 (32 & 64). Heap space: 2G for 32-bit and 4G for 64-bit (set the same values for both -Xms and -Xmx). Disk: tried with ZFS (U6) and UFS (U5). I reduced the

Re: Lucene search performance on Sun UltraSparc T2 (T5120) servers

2009-02-18 Thread Michael Stoppelman
Fuzzy search tends to be very heavy on CPU because of the Levenshtein distance algorithm. We use it on a small (60MB) index for spell correction, and our QPS suffers as a result. There was recently a discussion of a new fuzzy algorithm: https://issues.apache.org/jira/browse/LUCENE-1513?page=com.atlassian
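
To illustrate why per-term edit distance is CPU-bound, here is a minimal sketch of the textbook dynamic-programming Levenshtein distance in Java. It is not Lucene's own implementation; the class name `EditDistanceSketch` and the `similarity` normalization (distance divided by the longer string's length) are assumptions chosen for illustration, and Lucene's actual similarity formula may differ.

```java
// Hypothetical illustration class; not part of Lucene.
final class EditDistanceSketch {
    private EditDistanceSketch() {}

    /** Textbook Levenshtein distance, O(|a| * |b|) time, two rows of memory. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // Swap rows so the row just filled becomes the previous row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalization: 1.0 means identical, 0.0 means nothing in common. */
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }
}
```

Every candidate term costs a full pass over an |a| by |b| table, so scanning all terms of an index multiplies that work by the number of terms.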

Re: Lucene search performance on Sun UltraSparc T2 (T5120) servers

2009-02-18 Thread Glen Newton
Could you give some configuration details: - Solaris version - Java VM version, heap size, and any other flags - disk setup You should also consider using huge pages (see http://zzzoot.blogspot.com/2009/02/java-mysql-increased-performance-with.html) I will also be posting performance gains using

Re: Lucene search performance on Sun UltraSparc T2 (T5120) servers

2009-02-18 Thread eks dev
> I was having some thoughts recently about speeding up fuzzy search. The current system does edit-distance on all terms A-Z, single threaded. Prefix length can reduce the search space a

Re: Lucene search performance on Sun UltraSparc T2 (T5120) servers

2009-02-18 Thread Varun Dhussa
The suggested method would speed things up, but I doubt the gain would be substantial on processors with slower clock speeds. Keeping in mind that most processors are going multi-core, it would make sense to multi-thread the scan. Any remarks are welcome! Varun Dhussa Product Architect
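
As a rough sketch of what multi-threading the term scan could look like, the Java below filters an in-memory term list by prefix and then scores the survivors in parallel against a minimum similarity threshold. Everything here is an assumption for illustration: the class `ParallelFuzzyScanSketch`, its method names, the in-memory `List<String>` standing in for an index's term dictionary, and the similarity normalization are all hypothetical, and none of it is Lucene's API. It also relies on Java 8 parallel streams, which are not available on the JDK 1.5 and 1.6 releases discussed in this thread; on those, an `ExecutorService` over term ranges would play the same role.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of Lucene.
final class ParallelFuzzyScanSketch {
    private ParallelFuzzyScanSketch() {}

    /** Compact textbook Levenshtein distance used as the per-term cost. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalization: distance relative to the longer string. */
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) editDistance(a, b) / maxLen;
    }

    /**
     * Scans terms in parallel. The cheap prefix check runs first so the
     * expensive edit-distance computation only sees plausible candidates.
     */
    static List<String> scan(List<String> terms, String query,
                             int prefixLength, double minSimilarity) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        return terms.parallelStream()
                .filter(term -> term.startsWith(prefix))
                .filter(term -> similarity(term, query) >= minSimilarity)
                .collect(Collectors.toList()); // keeps the list's encounter order
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("lucene", "lucent", "lucerne", "lunch", "solaris");
        System.out.println(scan(terms, "lucene", 2, 0.7));
    }
}
```

Because each term is scored independently, the work splits cleanly across hardware threads, which is the property that matters on a many-core, low-clock-speed part like the T2. Whether the gain is substantial in practice would need measuring on the actual index.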

Re: Lucene search performance on Sun UltraSparc T2 (T5120) servers

2009-02-18 Thread mark harwood
I was having some thoughts recently about speeding up fuzzy search. The current system does edit-distance on all terms A-Z, single-threaded. Prefix length can reduce the search space and there is a "minimum similarity" threshold, but that's roughly where we are. Multithreading this to make use o