RE: Lucene VSM scoring

2013-07-09 Thread Uwe Schindler
Hi, TF-IDF is just the default (and fast) scoring scheme. You can modify that (the "Similarity") as you want (since Lucene 4.0): http://lucene.apache.org/core/4_3_1/core/org/apache/lucene/search/similarities/package-summary.html There are already various other ones available, like BM25. You have
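To make the difference between weighting schemes concrete, here is a minimal plain-Java sketch (no Lucene dependency) comparing a tf-idf style term weight with a BM25 style one. The class and method names are hypothetical, and the formulas are simplified textbook forms that leave out norms, boosts and query normalization, so this is not Lucene's actual scoring code. In Lucene 4.x the scheme itself is swapped by passing a Similarity to IndexSearcher.setSimilarity and IndexWriterConfig.setSimilarity.

```java
// Plain-Java illustration only; names and simplifications are assumptions,
// not Lucene's Similarity API.
final class TermWeightSketch {

    /** tf-idf style weight: sqrt(freq) * (1 + ln(numDocs / (docFreq + 1))). */
    static double tfIdf(int freq, long docFreq, long numDocs) {
        double tf = Math.sqrt(freq);
        double idf = 1.0 + Math.log((double) numDocs / (docFreq + 1));
        return tf * idf;
    }

    /** BM25 style weight with the usual k1 (saturation) and b (length normalization) parameters. */
    static double bm25(int freq, long docFreq, long numDocs,
                       double docLen, double avgDocLen, double k1, double b) {
        double idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
        double lengthNorm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf * freq * (k1 + 1.0) / (freq + lengthNorm);
    }

    public static void main(String[] args) {
        // Same term statistics, two different weighting schemes.
        System.out.println("tf-idf: " + tfIdf(3, 10, 1000));
        System.out.println("BM25:   " + bm25(3, 10, 1000, 120, 100, 1.2, 0.75));
    }
}
```

The main practical difference visible here is that the BM25 weight saturates as the term frequency grows, while the square-root tf keeps increasing.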

RE: getLocale of SortField

2013-07-09 Thread Uwe Schindler
Hi, it was not completely removed (it was intended to be removed). See the migration guide @ http://lucene.apache.org/core/4_3_1/MIGRATE.html, section about LUCENE-2514. The constructor and the logic were removed from SortField. The "fast" replacement (means sorting works as fast without collati

Lucene VSM scoring

2013-07-09 Thread Jason Z.
Hi, The Lucene docs mention that Lucene implements a tf-idf weighting scheme for scoring. Is there any way to modify Lucene to implement a custom weighting scheme for the VSM? Thank you.

Re: getLocale of SortField

2013-07-09 Thread Trejkaz
On Wed, Jul 10, 2013 at 12:53 AM, Uwe Schindler wrote: > Hi, > > there is no more locale-based sorting in Lucene 4.x. It was deprecated in 3.x, > so you should get a warning about deprecation already! I wasn't sure about this because we are on 3.6 and I didn't see a deprecation warning in our cod

RE: posting list strings

2013-07-09 Thread Uwe Schindler
Hi, You can replace the terms with their hashes directly in the analyzer chain. Just write a custom TermToBytesRef attribute that hashes the term to a constant-length byte[] (using an AttributeFactory)! :-) This would give you all features of hashed, constant length terms, but you would lose prefix a
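The hashing step on its own can be sketched in plain Java as follows. This is only an illustration under assumptions: the class and method names are hypothetical, MD5 is an arbitrary choice of hash, and the wiring into a custom attribute and AttributeFactory that the message describes is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Plain-Java illustration; the Lucene attribute wiring is deliberately left out.
final class TermHashSketch {

    /** Maps a term of any length to a constant-length (16 byte) hash. */
    static byte[] hashTerm(String term) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return md5.digest(term.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Terms sharing a prefix end up with unrelated hashes of equal length.
        for (String term : new String[] {"luc", "lucene", "lucenes"}) {
            StringBuilder hex = new StringBuilder();
            for (byte b : hashTerm(term)) {
                hex.append(String.format("%02x", b));
            }
            System.out.println(term + " -> " + hex);
        }
    }
}
```

Because terms that share a prefix hash to unrelated byte sequences, the sorted order of the original strings is not preserved, which is why prefix and range queries stop working on hashed terms.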

Re: posting list strings

2013-07-09 Thread Adrien Grand
Hi, Lucene stores the string because it may need it to run prefix or range queries. We don't have a hash-based terms dictionary right now but I know some people wrote one since they don't need support for these queries, see for instance the Earlybird paper[1]. Then if you can find a perfect hashin

Re: NRT + static rank based sorting

2013-07-09 Thread Adrien Grand
Hi Sriram, On Tue, Jul 9, 2013 at 5:06 AM, Sriram Sankar wrote: > I've finally got something running and will send you some performance > numbers as promised shortly. In the meanwhile, I've a question regarding > the use of real time indexing along with ordering by static rank. Before > each se

Re: Lucene in Action

2013-07-09 Thread Vinh Dang
Please try, I am new on Lucene also, and willing to study and share :) Sent from my BlackBerry® smartphone from Viettel -Original Message- From: "Vinh Dang" Date: Tue, 9 Jul 2013 15:19:41 To: Reply-To: dqvin...@gmail.com Subject: Re: Lucene in Action You have my yesterday question :)

Re: Lucene in Action

2013-07-09 Thread Vinh Dang
You have my question from yesterday :) After unzipping Lucene, you just need to import the Lucene core JAR file into your project to use it (with Eclipse, just drag and drop). The Lucene core JAR (I do not remember the exact name, but the file is easy to find) provides the core functions of Lucene --Original Message

RE: getLocale of SortField

2013-07-09 Thread Uwe Schindler
You should look at using the cool new Analysis features of Lucene 4. But that needs rewriting your code. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Yonghui Zhao [mailto:zhaoyong...@gmail.com] >

RE: getLocale of SortField

2013-07-09 Thread Yonghui Zhao
got it. Thanks On 2013-7-9 at 10:54 PM, "Uwe Schindler" wrote: > Hi, > > there is no more locale-based sorting in Lucene 4.x. It was deprecated in > 3.x, so you should get a warning about deprecation already! > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de

RE: getLocale of SortField

2013-07-09 Thread Uwe Schindler
Hi, there is no more locale-based sorting in Lucene 4.x. It was deprecated in 3.x, so you should get a warning about deprecation already! Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Yonghui Zh

Lucene in Action

2013-07-09 Thread Šimun Šunjić
I am learning about Apache Lucene from the Manning book Lucene in Action. However, the examples in the book are for Lucene v3.0.3, and today Lucene is at version 4.3.1. I can't find any good newer Lucene tutorial for learning; can you guys from the community suggest some? :) Thanks -- mag.inf. Šunjić Šimun

getLocale of SortField

2013-07-09 Thread Yonghui Zhao
I am updating a project from Lucene 3.x to Lucene 4.x. I found that getLocale of SortField has been removed. How can I fix it?

Suggestion for Autocomplete/Autosuggest

2013-07-09 Thread sivakumar
We have indexed file names and contents in a Lucene index and also provided search, which is working fine. Now we are planning to provide auto-suggestion (similar to Google search). Lucene version used: 4.0. We have tested with the spell package (org.apache.lucene.search.spell) but in latest/next lucene

Re: Search for a token appearing after another

2013-07-09 Thread Alan Woodward
IIRC, SpanQueries try and match on the smallest interval possible. So if you've got T1 … T1 … T2, then SpanNear(T1, T2) will match from the second T1. Alan Woodward www.flax.co.uk On 9 Jul 2013, at 09:56, Sébastien Druon wrote: > Thanks Alan, > > Do you know if the search would exclude other
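The "smallest interval" behaviour described in this reply can be illustrated with a small plain-Java sketch. It is not Lucene's SpanNearQuery implementation; the class and method names are hypothetical, and it simply pairs each occurrence of the second token with the closest preceding occurrence of the first token, with unlimited distance between them.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of minimal-interval matching; not Lucene code.
final class MinimalIntervalSketch {

    /**
     * Returns {start, end} position pairs where `first` precedes `second`
     * and no shorter match is contained inside the pair.
     */
    static List<int[]> minimalIntervals(String[] tokens, String first, String second) {
        List<int[]> result = new ArrayList<>();
        int lastFirst = -1; // position of the most recent unmatched `first`
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(first)) {
                lastFirst = i; // a later `first` always gives a shorter interval
            } else if (tokens[i].equals(second) && lastFirst >= 0) {
                result.add(new int[] {lastFirst, i});
                lastFirst = -1; // a later `second` would only give a longer interval
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"T1", "a", "T1", "b", "T2"};
        for (int[] m : minimalIntervals(tokens, "T1", "T2")) {
            System.out.println("match from " + m[0] + " to " + m[1]);
        }
    }
}
```

For the token sequence T1 a T1 b T2 the sketch reports a single match starting at the second T1, which is the outcome the reply describes.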

Re: Search for a token appearing after another

2013-07-09 Thread Sébastien Druon
Thanks Alan, Do you know if the search would exclude other occurrences of T1 between T1 and T2? ex: T1 (...)* T1 (...)* T2 would not match? Thanks again Sébastien On 9 July 2013 09:48, Alan Woodward wrote: > You can use Integer.MAX_VALUE as the slop parameter. > > Alan Woodward > www.flax.co

Re: Search for a token appearing after another

2013-07-09 Thread Alan Woodward
You can use Integer.MAX_VALUE as the slop parameter. Alan Woodward www.flax.co.uk On 9 Jul 2013, at 07:55, Sébastien Druon wrote: > Hello, > > I am looking for a way to search for a token appearing after another and > retrieve their positions. > > ex: T1 (...)* T2 > > I know the SpanTermQuer

Re: TermDocs

2013-07-09 Thread lukai
The code snippet you posted is an implementation of MatchAllQuery; it only gives you the live doc IDs in the specified segment. If you want to get extra information about a term, e.g. freq or payload, you need to do some calculation. The good thing is that the FST is sorted, so you can maintain a list of TermsEnu

RE: TermDocs

2013-07-09 Thread Uwe Schindler
Hi, > I don't find an elegant solution. reader.termDocs(null) returns AllTermDocs > which doesn't exist in lucene 4.3. > > > I use this piece of code > > Bits liveDocs = reader.getLiveDocs(); > for (int i = 0; i < reader.maxDoc(); ++i) { > if (liveDocs != null && !liveDocs.get(i)