MMapDirectory performance - Are searchable field values contiguously stored in FS block?

2013-01-23 Thread Gili Nachum
ields share the same FS blocks, then the hot 2 fields' values will be too scattered across the FS, rendering the OS cache useless and degrading performance back to I/O bound. Which is the case with Lucene 3.6? Thanks. Gili Nachum.

MMapDirectory performance - Are searchable field values contiguously stored in FS block?

2013-01-26 Thread Gili Nachum
ields share the same FS blocks, then the hot 2 fields' values will be too scattered across the FS, rendering the OS cache useless and degrading performance back to I/O bound. Which is the case with Lucene 3.6? Thanks. Gili Nachum.

Re: MMapDirectory performance - Are searchable field values contiguously stored in FS block?

2013-01-31 Thread Gili Nachum
and rows are documents). But for stored fields, or term vectors, which are "row stride", you won't see efficient use of the OS's IO cache. Mike McCandless http://blog.mikemccandless.com On Wed, Jan 23, 2013 at 7:59 AM, Gili Nachum wrote: Hi, I have a search workload

AUTO: Gili Nachum is out of the office (returning 20/02/2013)

2013-02-18 Thread Gili Nachum
I am out of the office until 20/02/2013. For Search/CCM - Noga Tor For AS-Search/Social People Typeahead - Sharon Krisher Or my manager Eitan Shapiro. Note: This is an automated response to your message "What is equivalent to Document.setBoost() from Lucene 3.6 in Lucene 4.1 ?" sent on 18/02/20

Re: Lightweight detection of whether a keyword is CJK or not (language detection)

2013-03-10 Thread Gili Nachum
Answering myself for next generations' sake. Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS does the job. Example: import junit.framework.Assert; import org.junit.Test; public class DetectCJK { @Test public void test1() { Assert.assertEquals(Character.UnicodeBlock.BASIC_LATIN, Ch

Re: Lightweight detection of whether a keyword is CJK or not (language detection)

2013-03-11 Thread Gili Nachum
tCJK(Character character, String message) { UnicodeBlock unicodeBlock = Character.UnicodeBlock.of(character); Assert.assertTrue(message, cjkUnicodeBlocks.contains(unicodeBlock)); } } On Mon, Mar 11, 2013 at 12:10 AM, Trejkaz wrote: > On Sun, Mar 10, 2013 at 8:19 PM, Gili Nachu
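
The approach in this thread (look up the `Character.UnicodeBlock` of a character and check it against a set of CJK blocks) can be sketched as a small standalone helper. The class name `CjkDetector`, the method `containsCjk`, and the exact blocks in the set are assumptions for illustration; the thread's own `cjkUnicodeBlocks` list is truncated above and may differ.

```java
import java.lang.Character.UnicodeBlock;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not the code from the thread.
final class CjkDetector {

    // Assumption: which blocks count as "CJK" depends on the application.
    private static final Set<UnicodeBlock> CJK_BLOCKS = new HashSet<>(Arrays.asList(
            UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS,
            UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A,
            UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B,
            UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS,
            UnicodeBlock.HIRAGANA,
            UnicodeBlock.KATAKANA,
            UnicodeBlock.HANGUL_SYLLABLES));

    // Returns true if at least one code point of the keyword lies in a CJK block.
    static boolean containsCjk(String keyword) {
        return keyword.codePoints().anyMatch(cp -> {
            // UnicodeBlock.of returns null for code points outside any defined block.
            UnicodeBlock block = UnicodeBlock.of(cp);
            return block != null && CJK_BLOCKS.contains(block);
        });
    }

    public static void main(String[] args) {
        System.out.println(containsCjk("lucene"));        // Latin only
        System.out.println(containsCjk("\u6f22\u5b57"));  // two CJK ideographs
    }
}
```

Iterating over code points rather than `char` values keeps supplementary-plane ideographs (Extension B) from being split into surrogate halves.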

Re: Should heap size be proportionate to the size of the index I'm opening?

2013-03-11 Thread Gili Nachum
y still be applicable. > > 512MB for a 70GB index sounds very conservative. > > > > -- > Ian. > > > On Mon, Mar 11, 2013 at 9:08 AM, Gili Nachum wrote: > > Hello. > > > > I'm getting an OOME with a heap size of 512MB while trying to open an > >

Best practices in boosting by proximity?

2013-05-04 Thread Gili Nachum
Hi. *I would like hits that contain the search terms in proximity to each other to be ranked higher than hits in which the terms are scattered across the doc. Is there a best practice to achieve that?* I also want all hits to contain all of the search terms (implicit AND): *

Re: Best practices in boosting by proximity?

2013-05-05 Thread Gili Nachum
s on the corpora your index > represents, your queries and your needs. > > > > Given your question it looks like you're using the query parser. Try > something like "your proximity query"~20, but consider the cost of a large > slop. > > > > >
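
One way to combine the two requirements in this thread (every term required, closer matches ranked higher) is to pair required terms with an optional sloppy phrase in classic query parser syntax. The sketch below only builds the query string; the class `ProximityQueryStrings`, the method `build`, and the slop and boost values are assumptions for illustration, and the terms are assumed to be plain words that need no escaping.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that assembles a classic-syntax query string.
final class ProximityQueryStrings {

    // Produces: +t1 +t2 ... "t1 t2 ..."~slop^boost
    // The +terms make every term mandatory; the optional sloppy phrase
    // only adds score when the terms occur near each other.
    static String build(List<String> terms, int slop, int boost) {
        StringBuilder sb = new StringBuilder();
        for (String term : terms) {
            sb.append('+').append(term).append(' ');
        }
        sb.append('"').append(String.join(" ", terms)).append('"')
          .append('~').append(slop)
          .append('^').append(boost);
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: +quick +fox "quick fox"~20^2
        System.out.println(build(Arrays.asList("quick", "fox"), 20, 2));
    }
}
```

As the reply notes, a larger slop costs more at search time, so the value is worth tuning against real queries.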

How important is single segment optimization to Search time performance?

2013-08-18 Thread Gili Nachum
ndering if anyone has tested # of segments against search time performance?* I should add I have ~10 indexes, at a total size of 50GB, and I use a multi-index searcher to search over them (Lucene 3.0.3 - yeah it's old I know). The index is updated every 15min. Gili Nachum.

Corrupt Index with IndexWriter.addIndexes(IndexReader readers[])

2013-11-05 Thread Gili Nachum
Hello, I got an index corruption in production, and was wondering if it might be a known bug (still with Lucene 3.1), or is my code doing something wrong. It's a local disk index. No known machine power loss. Not supposed to even happen, right? This index that got corrupted is updated every 30sec; a

Re: Corrupt Index with IndexWriter.addIndexes(IndexReader readers[])

2013-11-06 Thread Gili Nachum
AM, Gili Nachum wrote: > Hello, > I got an index corruption in production, and was wondering if it might be > a known bug (still with Lucene 3.1), or is my code doing something wrong. > It's a local disk index. No known machine power loss. Not supposed to even > happen, right?

Re: Corrupt Index with IndexWriter.addIndexes(IndexReader readers[])

2013-11-07 Thread Gili Nachum
Thanks Mike and Uwe. I already reindexed in production; my goal is to get to the root cause to make sure it doesn't happen again. Will remove the flush(). No idea why it's there. Attaching CheckIndex.main() output (why did I bother writing my own output :#) *Output:* Opening index @ C:\\customers\

FSTs to drive type ahead search?

2013-11-23 Thread Gili Nachum
Hello! I've implemented a type ahead search by indexing all possible terms' prefixes as fields on the docs. The resulting index is about 1GB in size and fits in the filesystem cache. Would implementing this differently, with FSTs instead of prefixes, bear any performance/size/feature advantag
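
The prefix-indexing approach described in this message can be illustrated with a small sketch that expands a term into the prefixes that would be indexed. The class `TypeAheadPrefixes`, the method `prefixes`, and the `minLength` parameter are assumptions for illustration, not the poster's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing how one term expands into indexable prefixes.
final class TypeAheadPrefixes {

    // Returns every prefix of the term with at least minLength characters,
    // shortest first. Works on char units, so terms with surrogate pairs
    // would need code-point-aware slicing instead.
    static List<String> prefixes(String term, int minLength) {
        List<String> result = new ArrayList<>();
        for (int end = Math.max(1, minLength); end <= term.length(); end++) {
            result.add(term.substring(0, end));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints: [l, lu, luc, luce, lucen, lucene]
        System.out.println(prefixes("lucene", 1));
    }
}
```

Each term of length n yields up to n prefix entries, which is why an index built this way grows quickly and why a more compact structure is worth asking about.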

Optimal FS block size for "small" documents?

2015-05-25 Thread Gili Nachum
Hi, What FS block size should I use? I have a RAID-5 of SSD drives currently configured with a 128KB block size. Can I expect better indexing/query time performance with a smaller block size (say 8KB)? Considering my documents are almost always smaller than 8KB. I assume all stored fields would fit into