Re: Auto commit when flush

2012-06-27 Thread Shai Erera
You could extend IndexWriter to AutoCommitIndexWriter and override flush() to call super.flush() then commit() (or simply just commit()). I haven't tested it but I think it should work. However, make sure you understand the implications of commit() -- it's heavier than just flush. Perhaps you can

Re: Auto commit when flush

2012-06-27 Thread Li Li
Flush is not commit. On Thu, Jun 28, 2012 at 2:42 PM, Aditya wrote: > Hi Ram, > > I guess IndexWriter.setMaxBufferedDocs will help... > > Regards > Aditya > www.findbestopensource.com > > > On Wed, Jun 27, 2012 at 11:25 AM, Ramprakash Ramamoorthy < > youngestachie...@gmail.com> wrote: > >> Dear,

Re: Question about chinese and WildcardQuery

2012-06-27 Thread wangjing
It is best to keep the Analyzer used for searching consistent with the Analyzer used to build the index. On Thu, Jun 28, 2012 at 2:31 PM, Paco Avila wrote: > Thanks, using WhitespaceAnalyzer works, but I don't understand why > StandardAnalyzer does not work if, according to the ChineseAnalyzer > deprecation, I should use StandardAnalyzer: > > @deprecated Use {@li

Re: Question about chinese and WildcardQuery

2012-06-27 Thread Li Li
In Chinese, there are no word boundaries between words. It is written like: Iamok. You should tokenize it to "I am ok". If you want to search *amo*, you should treat "I am ok" as one token. In Chinese, fuzzy search is not very useful. Even with StandardAnalyzer, it's OK to use a boolean query, because "Iamok" is

Re: Auto commit when flush

2012-06-27 Thread Aditya
Hi Ram, I guess IndexWriter.setMaxBufferedDocs will help... Regards Aditya www.findbestopensource.com On Wed, Jun 27, 2012 at 11:25 AM, Ramprakash Ramamoorthy < youngestachie...@gmail.com> wrote: > Dear, > > I am using Lucene for my log search tool. Is there a way I can > automatically

Re: Question about chinese and WildcardQuery

2012-06-27 Thread Paco Avila
Thanks, using WhitespaceAnalyzer works, but I don't understand why StandardAnalyzer does not work if, according to the ChineseAnalyzer deprecation, I should use StandardAnalyzer: @deprecated Use {@link StandardAnalyzer} instead, which has the same functionality. It is very annoying. 2012/6/27 Li Li

Re: about .frq file format in doc

2012-06-27 Thread wangjing
Thanks a lot. BTW: the docIds are 0,1,2 but the deltas are 0,1,2, not 0,1,1. The delta value is useless at this moment. In which scenario will the delta value be useful? On Thu, Jun 28, 2012 at 11:48 AM, Li Li wrote: > On Thu, Jun 28, 2012 at 11:14 AM, wangjing wrote: >> thanks >> >> co

Re: about .frq file format in doc

2012-06-27 Thread Li Li
On Thu, Jun 28, 2012 at 11:14 AM, wangjing wrote: > thanks > > could you help me to solve another problem, > > why does Lucene reset lastDocID = 0 when it finishes adding one doc? It does not call finish after adding a document. Read the JavaDoc of FormatPostingsDocsConsumer: /** Called when w

Re: about .frq file format in doc

2012-06-27 Thread Li Li
lastDocID represents the last document that contains this term. Because this FormatPostingsDocsConsumer is reused, you need to clear all member variables in the finish method. On Thu, Jun 28, 2012 at 11:14 AM, wangjing wrote: > thanks > > could you help me to solve another problem, > > why lucene will
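
To make the per-term reset concrete, here is a minimal sketch of the DocDelta encoding as described in the 3.6 file formats page for the .frq file. It is not Lucene source code: the class DocDeltaSketch and the method encode are hypothetical names, and the sketch assumes term frequencies are stored. Each call to encode stands for one term's postings, so lastDocID starts from 0 for every term, which is the effect of the reset in finish().

```java
import java.util.ArrayList;
import java.util.List;

public class DocDeltaSketch {
    // Hypothetical helper, not part of Lucene. Encodes ONE term's postings.
    // docs must be strictly increasing; freqs[i] is the term frequency in docs[i].
    public static List<Integer> encode(int[] docs, int[] freqs) {
        List<Integer> out = new ArrayList<>();
        int lastDocID = 0; // starts at 0 for every term
        for (int i = 0; i < docs.length; i++) {
            int delta = docs[i] - lastDocID; // gap to the previous doc containing this term
            lastDocID = docs[i];
            if (freqs[i] == 1) {
                out.add((delta << 1) | 1); // odd value: frequency is exactly one
            } else {
                out.add(delta << 1);       // even value: frequency follows as its own number
                out.add(freqs[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Example from the file formats page: once in doc 7, three times in doc 11.
        System.out.println(encode(new int[] {7, 11}, new int[] {1, 3}));
        // Docs 0, 1, 2 for a single term: the gaps are 0, 1, 1.
        System.out.println(encode(new int[] {0, 1, 2}, new int[] {1, 1, 1}));
    }
}
```

The gaps only become smaller than the docIds when one term occurs in several documents. If each of the three documents contains a different term, every term's first gap is measured from 0, so the gaps equal the docIds.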

Re: about .frq file format in doc

2012-06-27 Thread wangjing
Thanks. Could you help me solve another problem? Why does Lucene reset lastDocID = 0 when it finishes adding one doc? In the source code, FormatPostingsDocsWriter.java: @Override void finish() throws IOException { long skipPointer = skipListWriter.writeSkip(out);

Re: find meaningful words through Lucene

2012-06-27 Thread Mike Sokolov
Maybe high frequency terms that are not evenly distributed throughout the corpus would be a better definition. Discriminative terms. I'm sure there is something in the machine learning literature about unsupervised clustering that would help here. But I don't know what it is :) -Mike On 0

Re: about .frq file format in doc

2012-06-27 Thread Simon Willnauer
see definitions: http://lucene.apache.org/core/3_6_0/fileformats.html#Definitions simon On Wed, Jun 27, 2012 at 6:08 PM, Simon Willnauer wrote: > a term in this context is a (field,text) tuple - does this make sense? > simon > > On Wed, Jun 27, 2012 at 11:40 AM, wangjing wrote: >> http://lucene

Re: about .frq file format in doc

2012-06-27 Thread Simon Willnauer
a term in this context is a (field,text) tuple - does this make sense? simon On Wed, Jun 27, 2012 at 11:40 AM, wangjing wrote: > http://lucene.apache.org/core/3_6_0/fileformats.html#Frequencies > > The .frq file contains the lists of documents which contain each term, > along with the frequency o

Re: Question about chinese and WildcardQuery

2012-06-27 Thread Li Li
StandardAnalyzer will segment each character into a token. You should use WhitespaceAnalyzer or your own analyzer that can tokenize it as one token for wildcard search. On 2012-6-27 at 6:20 PM, "Paco Avila" wrote: > Hi there, > > I have to index Chinese content and I don't get the expected results when >
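
A rough illustration of why the tokenization matters for a prefix or wildcard query. This is plain Java that only imitates the two behaviours described in this thread, not the real analyzers: the class WildcardTokenSketch and its methods are hypothetical, and the real StandardAnalyzer does more than split characters. The point is that a prefix such as "专项信*" is compared against individual indexed tokens, so it can only match if some single token starts with "专项信".

```java
import java.util.ArrayList;
import java.util.List;

public class WildcardTokenSketch {
    // Imitates "one token per character", as described for Chinese text above.
    public static List<String> perCharacter(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Imitates whitespace tokenization: text without spaces stays one token.
    public static List<String> byWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // A prefix query matches a document only if one of its tokens starts with the prefix.
    public static boolean anyTokenStartsWith(List<String> tokens, String prefix) {
        for (String t : tokens) {
            if (t.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String name = "专项信息管理.doc";
        System.out.println(anyTokenStartsWith(perCharacter(name), "专项信"));
        System.out.println(anyTokenStartsWith(byWhitespace(name), "专项信"));
    }
}
```

With one token per character, no token is longer than one character, so a three-character prefix cannot match. With whitespace tokenization the file name is a single token and the prefix matches it.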

Re: Question about chinese and WildcardQuery

2012-06-27 Thread 齐保元
Maybe you did not enable the PrefixQuery feature. At 2012-06-27 18:19:52, "Paco Avila" wrote: Hi there, I have to index Chinese content and I don't get the expected results when searching. It seems that the WildcardQuery does not work properly with the Chinese characters. See attached sample code

Question about chinese and WildcardQuery

2012-06-27 Thread Paco Avila
Hi there, I have to index Chinese content and I don't get the expected results when searching. It seems that the WildcardQuery does not work properly with the Chinese characters. See attached sample code. I store the string "专项信息管理.doc" using the StandardAnalyzer and after that search for "专项信*"

Re: Lucene Query About Sorting

2012-06-27 Thread Apostolis Xekoukoulotakis
Can't he synchronously iterate over both fields' posting lists and use one PriorityQueue that picks the docs that contain the query and have the best order according to the second field? It requires more work, but this should be feasible. 2012/6/27 Ian Lea > I think he wants 1, sort all matched doc

Re: Question

2012-06-27 Thread Ian Lea
Add imageid as a stored field; no need to index it unless you want to be able to search by it. Add the tags as an analyzed indexed field; no need to store it unless you want to read/display the values. StandardAnalyzer will work fine. Then use QueryParser to build a query like "tags: car", execute

Re: Lucene Query About Sorting

2012-06-27 Thread Ian Lea
I think he wants 1, sort all matched docs by field A. If Lucene sorting doesn't work for you, you can always sort the hits yourself using whatever technique you want. Sorting large numbers of docs is always going to be expensive. -- Ian. On Wed, Jun 27, 2012 at 8:54 AM, Li Li wrote: > what do

Re: Re: find meaningful words through Lucene

2012-06-27 Thread Ian Lea
All words are important if they help people find what they want. Maybe you want high frequency terms. See contrib class org.apache.lucene.misc.HighFreqTerms. -- Ian. On Wed, Jun 27, 2012 at 3:04 AM, 齐保元 wrote: > > meaningful just means the word is more important than others, like > keywords/keyph

Re: Lucene Query About Sorting

2012-06-27 Thread Li Li
What do you want to do? 1. Sort all matched docs by field A. 2. Sort all matched docs by relevance score, select the top 100 docs, and then sort those by field A. On Wed, Jun 27, 2012 at 1:44 PM, Yogesh patel wrote: > Thanks for the reply, Ian, > > But I just gave a supposed document number... I have a 2-3 GB index
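
The two options give different results and have different costs, which a small sketch over an in-memory list of hits can show. This is not the Lucene sorting API: the class SortOptionsSketch, the Hit holder, and both method names are hypothetical, and field A is assumed to be a string.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortOptionsSketch {
    // Hypothetical holder for one matched document.
    public static final class Hit {
        public final int doc;
        public final float score;
        public final String fieldA;

        public Hit(int doc, float score, String fieldA) {
            this.doc = doc;
            this.score = score;
            this.fieldA = fieldA;
        }
    }

    // Option 1: sort every matched doc by field A.
    public static List<Hit> sortAllByFieldA(List<Hit> hits) {
        List<Hit> out = new ArrayList<>(hits);
        out.sort(Comparator.comparing((Hit h) -> h.fieldA));
        return out;
    }

    // Option 2: keep the n best-scoring docs, then order only those by field A.
    public static List<Hit> topNByScoreThenFieldA(List<Hit> hits, int n) {
        List<Hit> byScore = new ArrayList<>(hits);
        byScore.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        List<Hit> top = new ArrayList<>(byScore.subList(0, Math.min(n, byScore.size())));
        top.sort(Comparator.comparing((Hit h) -> h.fieldA));
        return top;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(1, 0.9f, "c"));
        hits.add(new Hit(2, 0.5f, "a"));
        hits.add(new Hit(3, 0.7f, "b"));
        for (Hit h : sortAllByFieldA(hits)) {
            System.out.println("all: doc " + h.doc);
        }
        for (Hit h : topNByScoreThenFieldA(hits, 2)) {
            System.out.println("top2: doc " + h.doc);
        }
    }
}
```

Option 1 has to order every match, which is the expensive case for a large index. Option 2 only ever orders n docs by field A, but a doc with a low score never appears, even if its field A value would sort first.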