Re: What is the difference between PhraseQuery and BooleanQuery with BooleanClause.Occur.SHOULD

2016-04-01 Thread Sachin Kulkarni
I got the answer. Somehow I missed it. The PhraseQuery requires the terms to be in a fixed order whereas the BooleanQuery does not require the terms to be in a particular order. On Thu, Mar 31, 2016 at 3:07 PM, Sachin Kulkarni wrote: > Hi, > > I am using Lucene-5.0.0. > If I had
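
The ordering difference described in the reply above can be sketched without Lucene itself. The following is a minimal illustration in plain Java, not Lucene's implementation: the class name PhraseVsBoolean and both helper methods are hypothetical, a document is assumed to be a simple list of already-tokenized terms, the boolean case assumes every clause is Occur.MUST (as in the original question; with Occur.SHOULD a single matching term would be enough), and the phrase case assumes a slop of 0.

```java
import java.util.Arrays;
import java.util.List;

public class PhraseVsBoolean {

    // Hypothetical model of a BooleanQuery whose clauses are all Occur.MUST:
    // every term has to occur somewhere in the document, in any order.
    public static boolean matchesAllTerms(List<String> docTokens, String... terms) {
        for (String term : terms) {
            if (!docTokens.contains(term)) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical model of a PhraseQuery with slop 0: the terms have to
    // occur at consecutive positions, in the given order.
    public static boolean matchesPhrase(List<String> docTokens, String... terms) {
        for (int start = 0; start + terms.length <= docTokens.size(); start++) {
            boolean match = true;
            for (int i = 0; i < terms.length; i++) {
                if (!docTokens.get(start + i).equals(terms[i])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> adjacent = Arrays.asList("i", "love", "new", "york");
        List<String> scattered = Arrays.asList("york", "has", "a", "new", "mayor");

        System.out.println(matchesAllTerms(adjacent, "new", "york"));  // true
        System.out.println(matchesPhrase(adjacent, "new", "york"));    // true
        System.out.println(matchesAllTerms(scattered, "new", "york")); // true
        System.out.println(matchesPhrase(scattered, "new", "york"));   // false
    }
}
```

The second document contains both terms but not next to each other in that order, so only the boolean-style check matches it. That is one way the two query types can return different result sets for the same 2-gram.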

What is the difference between PhraseQuery and BooleanQuery with BooleanClause.Occur.SHOULD

2016-03-31 Thread Sachin Kulkarni
Hi, I am using Lucene-5.0.0. If I had a query "New York" and used a BooleanQuery with the BooleanClause set to MUST on the two terms, is it the same as doing a PhraseQuery with the two terms? I am doing some 2-gram type queries and they are giving me different results with these two methods.

Re: Learning to Rank algorithms in Lucene

2015-08-18 Thread Sachin Kulkarni
Where do you plan to use it? So far there are no built-in learning to rank implementations, in Lucene at least. There are suggestions to include them. I do not know about Solr. I worked on research projects on Learning to Rank algorithms and used Lucene to generate the features, which I then r

Re: Can lucene index tokenized files?

2014-09-25 Thread Sachin Kulkarni
searched in the index that I have created? Thank you in advance. Regards, Sachin On Mon, Sep 15, 2014 at 4:36 PM, Sachin Kulkarni wrote: > Hi Erick, > > Thank you. > > Yes the data is in text form with the space delimited tokens. > The queries are categories that the documents bel

Re: Can lucene index tokenized files?

2014-09-15 Thread Sachin Kulkarni
alysis chain that mimics the pre-tokenization > you have at index time? > > Best, > Erick > > On Sun, Sep 14, 2014 at 8:34 PM, Sachin Kulkarni > wrote: > > Hi Uwe, > > > > Thank you. > > I do not have the tokens serialized, so that reduces one

Re: Can lucene index tokenized files?

2014-09-14 Thread Sachin Kulkarni
> > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Sachin Kulkarni [mailto:kulk...@hawk.iit.edu] > > Sent: Sunday, September 14, 2014 10:06 PM >

Can lucene index tokenized files?

2014-09-14 Thread Sachin Kulkarni
Hi, I have a dataset which has files in the form of tokens where the original data has been tokenized, stemmed, stopworded. Is it possible to skip the lucene analyzers and index this dataset in Lucene? So far the dataset I have dealt with was raw and used Lucene's tokenization and stemming schem

Re: How does Lucene decides which fields have termvectors stored and which not?

2014-08-22 Thread Sachin Kulkarni
o. It works well once I fixed the parser. Regards, Sachin Kulkarni On Tue, Aug 19, 2014 at 9:53 PM, Sachin Kulkarni wrote: > Hi Kumaran, > > See below some part of the code and the .alg file. > Here is the function from DocMaker.java from the package "package > org.apache.luce

Re: How does Lucene decides which fields have termvectors stored and which not?

2014-08-19 Thread Sachin Kulkarni
zed.norms=true content.source.excludeIteration=true ResetSystemErase CreateIndex { AddDoc } : * CloseIndex ### END OF FILE Regards, Sachin Kulkarni On Tue, Aug 19, 2014 at 1:59 PM, Sachin Kulkarni wrote: > Hi Kumaran, > > I am using the benchmark utility from Lucene and doing the

Re: How does Lucene decides which fields have termvectors stored and which not?

2014-08-19 Thread Sachin Kulkarni
indexing code. please share it > > - > Kumaran R > > > > > > On Tue, Aug 19, 2014 at 7:18 PM, Sachin Kulkarni > wrote: > > > Hi, > > > > Sorry for all the code, It got sent out accidentally. > > > > The following code is part of t

Re: How does Lucene decides which fields have termvectors stored and which not?

2014-08-19 Thread Sachin Kulkarni
reTermVectorOffsets : false list field is : docdate Field storeTermVectorOffsets : false list field is : doctitle Field storeTermVectorOffsets : false list field is : body Field storeTermVectorOffsets : false ***/ Hope this code comes out legible in the email. Thank you. Regards, Sachin K

Re: How does Lucene decides which fields have termvectors stored and which not?

2014-08-19 Thread Sachin Kulkarni
" + IFT.stored()); //for (FieldInfo.IndexOptions c : IFT.indexOptions().values()) // System.out.println(c); } // *88 // On Tue, Aug 19, 2014 at 2:04 AM, Kumaran Ramasubramanian wrote: > Hi Sachin Kulkarni, > > If possible, Please share your code. &

How does Lucene decides which fields have termvectors stored and which not?

2014-08-18 Thread Sachin Kulkarni
Hi, I am using Lucene 4.6.0. I have been storing 5 fields for my documents in the index, namely body, title, docname, docdate and docid. But when I get the fields using IndexReader.getTermVectors(indexedDocID) I only get the docname and body fields and can retrieve the term vectors for those fie

Re: How does Lucene decide which fields to index?

2014-08-04 Thread Sachin Kulkarni
You should know two things to get this. > 1.Indexed fields can be searched. > 2.Stored fields can be fetched. > > Check your code whether you are storing all fields. > > > -- > Kumaran R > Sent from Phone > > > On 04-Aug-2014, at 7:13 pm, Sachin Kulkarni > wro

How does Lucene decide which fields to index?

2014-08-04 Thread Sachin Kulkarni
. Regards, Sachin Kulkarni

how to extract feature vectors.

2013-04-14 Thread Sachin Kulkarni
Dear all, I would like to extract feature vectors for each document that is relevant to a query and write it out to a file. Is there a way in Lucene where I can specify a parameter to do this? or which part of the code deals with the feature vectors related to the documents so that I can modify th

Re: Lucene 4 architecture - paper available

2012-10-09 Thread Sachin Kulkarni
. Kind Regards, Sachin Kulkarni On Tue, Oct 9, 2012 at 6:59 AM, Andrzej Bialecki wrote: > Hi all, > > Together with Grant Ingersoll and Robert Muir we have submitted a paper to > the "SIGIR 2012 Workshop on Open Source Information Retrieval" held on 16 > Aug

TREC document Parser questions..

2012-10-06 Thread Sachin Kulkarni
Hi, I am using the TRECParserByPath in Lucene to index the TREC disc 4-5 data. This covers all the file types except the CR collection. Is Lucene using the default Gov2parser to parse the CR collection? Is there a parser that can be used for the CR collection directly? Thank you. Regards, Sachin

Re: setting different similarity in config (.alg) file at indexing

2012-09-28 Thread Sachin Kulkarni
your index to the new format with IndexUpgrader first." So basically in my case I do not need to set it in the .alg file. On Wed, Sep 5, 2012 at 7:58 AM, Sachin Kulkarni wrote: > Hi, > > For Lucene core 4.0. BETA, under the search.similarities help page it says > the followin

setting different similarity in config (.alg) file at indexing

2012-09-05 Thread Sachin Kulkarni
Hi, For Lucene core 4.0. BETA, under the search.similarities help page it says the following "To change Similarity, one must do so for both indexing and searching, and the changes must happen befo

Open Relevance Project.

2012-08-08 Thread Sachin Kulkarni
Dear All, I was wondering if the Open Relevance Project (ORP) is currently active and available for users. I just installed Lucene and was hoping to use the ORP to do some relevance testing and work with their dataset. When I search on Google I see that the ORP website and wiki have not been upda