> (3) Makes it a lot faster to update the index. I find this to be the main 
> selling point myself.

Yes, of course :-) We want to update the index frequently, which is why keeping 
the index optimized is not really an option.
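A minimal sketch of why frequent updates favour a multi-segment index, under simplifying assumptions: the segment names and document counts are hypothetical, and deletions (which also touch existing segments) are ignored. A reopened reader can share unchanged segments with the old reader, so only segments that are new in the latest commit have to be opened. After an optimize, the single merged segment is new in its entirety.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReopenCost {
    // Documents in segments that the new commit has but the old one lacks.
    // Segments present in both commits can be shared by the reopened reader,
    // so only these new segments must be opened (and read from disk when cold).
    // Simplification: deletions on existing segments are ignored.
    static long docsToOpen(Map<String, Long> before, Map<String, Long> after) {
        long total = 0;
        for (Map.Entry<String, Long> e : after.entrySet()) {
            if (!before.containsKey(e.getKey())) {
                total += e.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical index: 16 segments ("_0".."_f") of 100,000 docs each.
        Map<String, Long> before = new LinkedHashMap<>();
        for (int i = 0; i < 16; i++) {
            before.put("_" + Integer.toString(i, 36), 100_000L);
        }

        // Update without optimizing: 1,000 new docs land in one small new segment.
        Map<String, Long> incremental = new LinkedHashMap<>(before);
        incremental.put("_g", 1_000L);

        // Same update followed by an optimize: everything merged into one new segment.
        Map<String, Long> optimized = new LinkedHashMap<>();
        optimized.put("_h", 1_601_000L);

        System.out.println("incremental: " + docsToOpen(before, incremental)); // 1000
        System.out.println("optimized:   " + docsToOpen(before, optimized));   // 1601000
    }
}
```

With these made-up numbers, the incremental reopen opens 1,000 documents' worth of new segment data, while the optimized index forces all 1,601,000 documents to be opened again, and none of the old files' pages in the OS cache are reused.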

> Do you have some typical response times from the optimized index and the 
> segmented one, after some hundred or thousand queries have been processed and 
> the OS cache is properly warmed?

Each query whose data is not in the OS cache takes much longer to complete than 
on the optimized index, so the server simply falls apart before it can be 
"warmed". With the optimized index, incoming queries are served quickly even 
after the OS cache has been cleared. For instance, with a cold cache a big query 
took 700 ms on the optimized index and now takes more than 2000 ms.
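As a back-of-envelope check on the extra I/O question, here is a rough count, not a measurement. In Lucene each segment has its own terms dictionary, so every term in a query is looked up in every segment. The sketch extracts the distinct field:term pairs from the two-term query in this message and multiplies them by the segment count. It assumes each lookup may need a disk read when the OS cache is cold; in practice the in-memory terms index can rule out some terms without touching disk, so this is an upper bound on term seeks, and postings reads come on top of it.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TermLookups {
    // The representative query from this message, flattened to one string.
    // \u00ba is the "º" character that marks the stemmed terms.
    static final String QUERY = "filtered(("
        + "+(Content:system^6.75 Content:system\u00ba^1.5 Title:system^47.25 "
        + "Title:system\u00ba^10.5 (MT_Book:system^81.0 MT_Book:system\u00ba^18.0 "
        + "MT_Author:system^20.25 MT_Author:system\u00ba^4.5 MT_Pub:system^16.875 "
        + "MT_Pub:system\u00ba^3.75 MT_Cats:system^27.0 MT_Cats:system\u00ba^6.0)) "
        + "+(Content:data^6.75 Content:data\u00ba^1.5 Title:data^47.25 "
        + "Title:data\u00ba^10.5 (MT_Book:data^81.0 MT_Book:data\u00ba^18.0 "
        + "MT_Author:data^20.25 MT_Author:data\u00ba^4.5 MT_Pub:data^16.875 "
        + "MT_Pub:data\u00ba^3.75 MT_Cats:data^27.0 MT_Cats:data\u00ba^6.0))"
        + ") (("
        + "spanNear([Content:system, Content:data], 1, true)^0.5 "
        + "spanNear([Title:system, Title:data], 1, true)^3.5 "
        + "(spanNear([MT_Book:system, MT_Book:data], 1, true)^6.0 "
        + "spanNear([MT_Author:system, MT_Author:data], 1, true)^1.5 "
        + "spanNear([MT_Pub:system, MT_Pub:data], 1, true)^1.25 "
        + "spanNear([MT_Cats:system, MT_Cats:data], 1, true)^2.0)"
        + ")^36.0))->MetaFilter[p=true]";

    // A field name, a colon, then the term text up to whitespace, '^', ',', ']' or ')'.
    private static final Pattern FIELD_TERM =
        Pattern.compile("(\\w+):([^\\s^,\\])]+)");

    // Distinct field:term pairs; the span terms repeat ones already seen.
    static Set<String> distinctTerms(String query) {
        Set<String> terms = new LinkedHashSet<>();
        Matcher m = FIELD_TERM.matcher(query);
        while (m.find()) {
            terms.add(m.group(1) + ":" + m.group(2));
        }
        return terms;
    }

    public static void main(String[] args) {
        int perSegment = distinctTerms(QUERY).size();
        for (int segments : new int[] {1, 16}) {
            System.out.println(segments + " segment(s): "
                + perSegment * segments + " terms-dictionary lookups");
        }
    }
}
```

For this query that gives 24 distinct terms, so 24 lookups on an optimized index versus 384 with 16 segments.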

> Can you give us a representative query?

Here is a query with two terms. Basically, we search for documents that contain 
all of the search terms (in any field, stems included) or that contain an exact 
match (span queries).

filtered(
  (
    +(Content:system^6.75 Content:systemº^1.5 Title:system^47.25 Title:systemº^10.5
      (MT_Book:system^81.0 MT_Book:systemº^18.0 MT_Author:system^20.25
       MT_Author:systemº^4.5 MT_Pub:system^16.875 MT_Pub:systemº^3.75
       MT_Cats:system^27.0 MT_Cats:systemº^6.0))

    +(Content:data^6.75 Content:dataº^1.5 Title:data^47.25 Title:dataº^10.5
      (MT_Book:data^81.0 MT_Book:dataº^18.0 MT_Author:data^20.25 MT_Author:dataº^4.5
       MT_Pub:data^16.875 MT_Pub:dataº^3.75 MT_Cats:data^27.0 MT_Cats:dataº^6.0))
  )
  (
    (
      spanNear([Content:system, Content:data], 1, true)^0.5
      spanNear([Title:system, Title:data], 1, true)^3.5
      (
        spanNear([MT_Book:system, MT_Book:data], 1, true)^6.0
        spanNear([MT_Author:system, MT_Author:data], 1, true)^1.5
        spanNear([MT_Pub:system, MT_Pub:data], 1, true)^1.25
        spanNear([MT_Cats:system, MT_Cats:data], 1, true)^2.0
      )
    )^36.0
  )
)->MetaFilter[p=true]
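The boosts in this query appear to follow a regular pattern: each field has a base weight equal to its spanNear boost (Content 0.5, Title 3.5, MT_Book 6.0, MT_Author 1.5, MT_Pub 1.25, MT_Cats 2.0), the plain term boost is 13.5 times that weight and the º (stemmed) term boost is 3 times it. The multipliers 13.5 and 3 are inferred from the numbers, not stated in the thread, so the sketch below is only a consistency check under that assumption; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoostPattern {
    // Base weight per field, read off the spanNear boosts in the query above.
    static final Map<String, Double> BASE = new LinkedHashMap<>();
    static {
        BASE.put("Content", 0.5);
        BASE.put("Title", 3.5);
        BASE.put("MT_Book", 6.0);
        BASE.put("MT_Author", 1.5);
        BASE.put("MT_Pub", 1.25);
        BASE.put("MT_Cats", 2.0);
    }
    // Inferred multipliers (an assumption about how the query was built).
    static final double EXACT = 13.5;
    static final double STEMMED = 3.0;
    static final String STEM_MARK = "\u00ba"; // the "º" suffix seen in the query

    // e.g. "Content:system^6.75 Content:systemº^1.5" for one field and one word.
    static String fieldClauses(String field, String word) {
        double base = BASE.get(field);
        return field + ":" + word + "^" + (base * EXACT) + " "
             + field + ":" + word + STEM_MARK + "^" + (base * STEMMED);
    }

    // Rebuilds one required (+) clause: Content and Title at the top level,
    // the MT_* metadata fields grouped in a nested clause, as in the query.
    static String wordClause(String word) {
        StringBuilder top = new StringBuilder();
        StringBuilder meta = new StringBuilder();
        for (String field : BASE.keySet()) {
            StringBuilder target = field.startsWith("MT_") ? meta : top;
            if (target.length() > 0) {
                target.append(' ');
            }
            target.append(fieldClauses(field, word));
        }
        return "+(" + top + " (" + meta + "))";
    }

    public static void main(String[] args) {
        System.out.println(wordClause("system"));
        System.out.println(wordClause("data"));
    }
}
```

All of these products are exact in binary floating point, so the printed clauses reproduce the boosts in the query text character for character (apart from line breaks).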

Alessandro De Simone

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: Tuesday, 20 May 2014 22:09
To: java-user@lucene.apache.org
Subject: RE: search time & number of segments

De Simone, Alessandro [alessandro.desim...@bvdinfo.com] wrote:
> We have stopped optimizing the index because everybody told us it was a bad 
> idea.
> It makes sense if you think about it. When you reopen the index, not all 
> segments have to be reopened, so you get:
> (1)     better reload time
> (2)     maximum use of the OS file cache

(3) Makes it a lot faster to update the index. I find this to be the main 
selling point myself.

> I have never read any warning saying that doing so will have a big impact on 
> performance.

And we're back to the puzzle of why you get so many more I/O operations with 
your 16 segments.


Do you have some typical response times from the optimized index and the 
segmented one, after some hundred or thousand queries have been processed and 
the OS cache is properly warmed?

Can you give us a representative query?

- Toke Eskildsen

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

