how to adjust buffer size of reading file?

2010-08-04 Thread Li Li
I traced the system calls Java makes when reading a file, and the buffer size is always 1024. Can I change this value to reduce the number of system calls?
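For plain java.io reads, the size of each read request follows the size of the byte array passed to read(). The sketch below is a minimal illustration under that assumption; the class name, method name, and the 64 KB figure are hypothetical and not from the thread. It does not change how Lucene itself reads index files, and the thread does not say whether the 1024 observed above comes from Lucene's own buffering.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class LargeBufferRead {
    // Hypothetical buffer size chosen for illustration only.
    static final int BUFFER_SIZE = 64 * 1024;

    /** Counts the bytes in a file, requesting up to BUFFER_SIZE bytes per read call. */
    static long countBytes(String path) throws IOException {
        long total = 0;
        byte[] chunk = new byte[BUFFER_SIZE];
        try (InputStream in = new FileInputStream(path)) {
            int n;
            // Each call asks for up to chunk.length bytes; -1 signals end of file.
            while ((n = in.read(chunk)) != -1) {
                total += n;
            }
        }
        return total;
    }
}
```

A larger array means fewer read calls for the same file, at the cost of more memory per open reader.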

Re: Boost and ordering based on most recently updated

2010-08-04 Thread jayendra patil
You can probably try using the sort parameter, with the primary sort on score and the secondary sort on the recent update date, e.g. sort=score desc, recentUpdateDate desc. The recent update date would then take effect within each group of documents that have the same score. Regards, Jayendra
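The ordering that such a two-level sort produces can be sketched in plain Java. This is only an illustration of the semantics, not the search engine's API: the Hit class and its fields are hypothetical stand-ins, with recentUpdateDate modeled as a timestamp in milliseconds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ScoreThenDateSort {
    /** Hypothetical stand-in for a search hit; not a Lucene or Solr class. */
    static final class Hit {
        final String id;
        final float score;
        final long recentUpdateDate; // assumed to be epoch milliseconds

        Hit(String id, float score, long recentUpdateDate) {
            this.id = id;
            this.score = score;
            this.recentUpdateDate = recentUpdateDate;
        }
    }

    // Primary key: score descending. Secondary key: update date descending,
    // consulted only when two hits have equal scores.
    static final Comparator<Hit> SCORE_THEN_DATE =
        Comparator.comparingDouble((Hit h) -> h.score).reversed()
            .thenComparing(Comparator.comparingLong((Hit h) -> h.recentUpdateDate).reversed());

    /** Returns the ids of the hits in sorted order, leaving the input list untouched. */
    static List<String> sortedIds(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(SCORE_THEN_DATE);
        List<String> ids = new ArrayList<>();
        for (Hit h : copy) {
            ids.add(h.id);
        }
        return ids;
    }
}
```

The secondary key never reorders hits with different scores, so it only matters where boosted documents end up with exactly the same score.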

Boost and ordering based on most recently updated

2010-08-04 Thread Brian Pontarelli
I have a situation where I'm using a Boost on documents to bump them up in the search results when a search has multiple documents with the same hits in the search query. However, it looks like if two or more documents have the same rank after the Boost is applied, the search results are ordered

Re: get wordno, lineno, pageno for term/phrase

2010-08-04 Thread Itamar Syn-Hershko
I quite liked the idea Erick brought up in his last response - using a special field for storing this data. See if you can define its structure in a way that would help you do that and save both performance and index size. Each term in it signaling lineno and pageno (term text is "p1", "p2"...

Re: get wordno, lineno, pageno for term/phrase

2010-08-04 Thread arun r
Thanks for your responses. In this case, retrieval time will be more important than index size. Each document will be indexed separately, and the data (wordno, lineno, pageno) will be extracted for certain terms/phrases for each document and stored. I define linebreak and pagebreak and add them to

Re: get wordno, lineno, pageno for term/phrase

2010-08-04 Thread Erick Erickson
It depends (TM). Yes, it would bloat the index. But nothing in the original post indicates that this is a concern. The index could be 10M or 100G, in one case it matters a lot and in the other it doesn't. It's also unclear whether query response time matters at all or whether this is some sort of b

Re: get wordno, lineno, pageno for term/phrase

2010-08-04 Thread Itamar Syn-Hershko
Storing all that info per token as payloads will bloat the index. Wouldn't it be wiser to use a special token to mark page feed and end of paragraph (numbers of which could then be stored as payloads), and scan the token stream per document to retrieve them? Some extra operations for retri
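The marker-token idea can be sketched in plain Java without any Lucene classes. Everything named here is an assumption for illustration: the marker strings, the class and method names, the choice to restart line numbering on each page, and the choice to count words across the whole document.

```java
import java.util.ArrayList;
import java.util.List;

class PositionScan {
    // Hypothetical marker tokens; real markers must not collide with document text.
    static final String LINE_BREAK = "__LB__";
    static final String PAGE_BREAK = "__PB__";

    /**
     * Scans a document's token stream once and returns {wordno, lineno, pageno}
     * for every occurrence of the given term. All three numbers are 1-based.
     */
    static List<int[]> locate(List<String> tokens, String term) {
        List<int[]> hits = new ArrayList<>();
        int wordNo = 0;
        int lineNo = 1;
        int pageNo = 1;
        for (String t : tokens) {
            if (t.equals(LINE_BREAK)) {
                lineNo++;
            } else if (t.equals(PAGE_BREAK)) {
                pageNo++;
                lineNo = 1; // assumption: line numbers restart on each page
            } else {
                wordNo++; // markers are not counted as words
                if (t.equals(term)) {
                    hits.add(new int[] {wordNo, lineNo, pageNo});
                }
            }
        }
        return hits;
    }
}
```

This trades retrieval time for index size: nothing is stored per token, but each lookup costs a linear scan of the document's tokens.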