You could extend IndexWriter to AutoCommitIndexWriter and override flush()
to call super.flush() and then commit() (or simply just commit()). I haven't
tested it, but I think it should work.
However, make sure you understand the implications of commit() -- it's
heavier than just flush.
flush is not commit.
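An untested sketch of the idea above: rather than overriding flush() (whose
exact signature varies across 3.x releases), you could wrap the writer and
call commit() every N added documents. The class and counter names here are
illustrative, not Lucene API:

    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;

    // Hypothetical wrapper, not part of Lucene: commits after every n docs.
    public class AutoCommitIndexWriter {
        private final IndexWriter writer;
        private final int n;
        private int addedSinceCommit = 0;

        public AutoCommitIndexWriter(IndexWriter writer, int n) {
            this.writer = writer;
            this.n = n;
        }

        public void addDocument(Document doc) throws IOException {
            writer.addDocument(doc);
            // commit() is heavier than a flush: it syncs index files and
            // publishes a new commit point that readers can open.
            if (++addedSinceCommit >= n) {
                writer.commit();
                addedSinceCommit = 0;
            }
        }
    }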
On Thu, Jun 28, 2012 at 2:42 PM, Aditya wrote:
> Hi Ram,
>
> I guess IndexWriter.setMaxBufferedDocs will help...
>
> Regards
> Aditya
> www.findbestopensource.com
>
>
> On Wed, Jun 27, 2012 at 11:25 AM, Ramprakash Ramamoorthy <
> youngestachie...@gmail.com> wrote:
>
>> Dear,
It is best to keep the Analyzer used for searching consistent with the
Analyzer used to build the index.
On Thu, Jun 28, 2012 at 2:31 PM, Paco Avila wrote:
> Thanks, using WhitespaceAnalyzer works, but I don't understand why
> StandardAnalyzer does not work if, according to the ChineseAnalyzer
> deprecation notice, I should use StandardAnalyzer:
>
> @deprecated Use {@link StandardAnalyzer} instead, which has the same
> functionality.
In Chinese there is no word boundary between words; text is written like
"Iamok", and you should tokenize it into "I am ok". If you want to search
for *amo*, you have to treat "I am ok" as one token. In Chinese, fuzzy
search is not very useful. Even with StandardAnalyzer it's OK to use a
boolean query, because "Iamok" is segmented into one token per character.
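A minimal sketch of that point, using "Iamok" as a stand-in for a Chinese
string: since StandardAnalyzer indexes one token per CJK character, a
BooleanQuery over the per-character terms can still match the document
(field name is illustrative):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    public class PerCharQuerySketch {
        // Build a query requiring every single-character token of `text`.
        static BooleanQuery allChars(String field, String text) {
            BooleanQuery query = new BooleanQuery();
            for (int i = 0; i < text.length(); i++) {
                Term term = new Term(field, String.valueOf(text.charAt(i)));
                query.add(new TermQuery(term), BooleanClause.Occur.MUST);
            }
            return query;
        }
    }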
Hi Ram,
I guess IndexWriter.setMaxBufferedDocs will help...
Regards
Aditya
www.findbestopensource.com
On Wed, Jun 27, 2012 at 11:25 AM, Ramprakash Ramamoorthy <
youngestachie...@gmail.com> wrote:
> Dear,
>
> I am using Lucene for my log search tool. Is there a way I can
> automatically
Thanks, using WhitespaceAnalyzer works, but I don't understand why
StandardAnalyzer does not work if, according to the ChineseAnalyzer
deprecation notice, I should use StandardAnalyzer:
@deprecated Use {@link StandardAnalyzer} instead, which has the same
functionality.
It is very annoying.
2012/6/27 Li Li
thanks a lot.
btw: the docIDs are 0, 1, 2 but the deltas are 0, 1, 2, not 0, 1, 1.
The delta value seems useless at the moment. In which scenario will the
delta value be useful?
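For context, a sketch of why the raw deltas may not look like 0, 1, 1: per
the 3.6 file-formats page, when frequencies are kept each DocDelta is
(delta << 1), with the low bit set when freq == 1, and deltas stay small so
they compress well as VInts. This is an illustration of the format
definition, not Lucene's actual writer code:

    import java.util.ArrayList;
    import java.util.List;

    public class DocDeltaSketch {
        // Encode one term's postings (docIDs ascending) the way the .frq
        // format defines DocDelta when frequencies are stored.
        static List<Integer> encode(int[] docIDs, int[] freqs) {
            List<Integer> out = new ArrayList<Integer>();
            int lastDocID = 0;
            for (int i = 0; i < docIDs.length; i++) {
                int delta = docIDs[i] - lastDocID; // first delta equals first docID
                lastDocID = docIDs[i];
                if (freqs[i] == 1) {
                    out.add((delta << 1) | 1);     // odd => freq is 1, nothing follows
                } else {
                    out.add(delta << 1);           // even => freq value follows
                    out.add(freqs[i]);
                }
            }
            return out;
        }

        public static void main(String[] args) {
            // docIDs 0,1,2 with freq 1 each: the raw deltas are 0,1,1, but
            // the written values are [1, 3, 3] because of the freq bit.
            System.out.println(encode(new int[] {0, 1, 2}, new int[] {1, 1, 1}));
        }
    }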
On Thu, Jun 28, 2012 at 11:48 AM, Li Li wrote:
> On Thu, Jun 28, 2012 at 11:14 AM, wangjing wrote:
>> thanks
>>
>> could you help me to solve another problem:
On Thu, Jun 28, 2012 at 11:14 AM, wangjing wrote:
> thanks
>
> could you help me to solve another problem:
>
> why does lucene reset lastDocID = 0 when it finishes adding one doc?
It will not call finish after adding a document. Read the JavaDoc of
FormatPostingsDocsConsumer:
/** Called when we are done adding docs to this term */
lastDocID represents the last document which contains this term. Because
Lucene reuses this FormatPostingsDocsConsumer, you need to clear all member
variables in the finish method.
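A minimal sketch (not Lucene's actual code) of why a reused consumer must
reset lastDocID in finish(): deltas are computed against it, so a stale
value left over from the previous term would corrupt the next term's
postings:

    // Illustrative stand-in for a per-term postings consumer that is reused.
    class ReusableConsumerSketch {
        private int lastDocID = 0;          // last doc seen for the current term

        int addDoc(int docID) {
            int delta = docID - lastDocID;  // this delta is what gets written
            lastDocID = docID;
            return delta;
        }

        void finish() {
            lastDocID = 0;                  // reset so the next term starts from 0
        }
    }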
On Thu, Jun 28, 2012 at 11:14 AM, wangjing wrote:
> thanks
>
> could you help me to solve another problem:
>
> why does lucene reset lastDocID = 0 when it finishes adding one doc?
thanks
could you help me to solve another problem:
why does lucene reset lastDocID = 0 when it finishes adding one doc?
In the source code, FormatPostingsDocsWriter.java has:
@Override
void finish() throws IOException {
  long skipPointer = skipListWriter.writeSkip(out);
Maybe high frequency terms that are not evenly distributed throughout
the corpus would be a better definition. Discriminative terms. I'm
sure there is something in the machine learning literature about
unsupervised clustering that would help here. But I don't know what it
is :)
-Mike
see definitions:
http://lucene.apache.org/core/3_6_0/fileformats.html#Definitions
simon
On Wed, Jun 27, 2012 at 6:08 PM, Simon Willnauer
wrote:
> a term in this context is a (field,text) tuple - does this make sense?
> simon
>
> On Wed, Jun 27, 2012 at 11:40 AM, wangjing wrote:
>> http://lucene.apache.org/core/3_6_0/fileformats.html#Frequencies
a term in this context is a (field,text) tuple - does this make sense?
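Concretely (the field and text here are just examples):

    import org.apache.lucene.index.Term;

    public class TermTupleExample {
        public static void main(String[] args) {
            // Same text, different field: two distinct terms.
            Term inTitle = new Term("title", "lucene");
            Term inBody = new Term("body", "lucene");
            System.out.println(inTitle.equals(inBody)); // false
        }
    }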
simon
On Wed, Jun 27, 2012 at 11:40 AM, wangjing wrote:
> http://lucene.apache.org/core/3_6_0/fileformats.html#Frequencies
>
> The .frq file contains the lists of documents which contain each term,
> along with the frequency of the term in that document.
StandardAnalyzer will segment each character into a token; you should use
WhitespaceAnalyzer, or your own analyzer that can tokenize the text as one
token, for wildcard search.
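A hedged sketch of that advice against the Lucene 3.6 API, using the string
from this thread (the field name and in-memory directory are just for the
example):

    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.WildcardQuery;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class ChineseWildcardSketch {
        public static void main(String[] args) throws Exception {
            RAMDirectory dir = new RAMDirectory();
            // WhitespaceAnalyzer keeps "专项信息管理.doc" as a single token;
            // StandardAnalyzer would emit one token per CJK character.
            IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(
                    Version.LUCENE_36, new WhitespaceAnalyzer(Version.LUCENE_36)));
            Document doc = new Document();
            doc.add(new Field("name", "专项信息管理.doc",
                    Field.Store.YES, Field.Index.ANALYZED));
            writer.addDocument(doc);
            writer.close();

            IndexSearcher searcher = new IndexSearcher(IndexReader.open(dir));
            WildcardQuery query = new WildcardQuery(new Term("name", "专项信*"));
            System.out.println(searcher.search(query, 10).totalHits); // 1
        }
    }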
On 2012-6-27 at 6:20 PM, "Paco Avila" wrote:
> Hi there,
>
> I have to index chinese content and I don't get the expected results when
>
Maybe you did not enable the prefix-query feature.
At 2012-06-27 18:19:52, "Paco Avila" wrote:
Hi there,
I have to index Chinese content and I don't get the expected results when
searching. It seems that WildcardQuery does not work properly with
Chinese characters. See the attached sample code.
Hi there,
I have to index Chinese content and I don't get the expected results when
searching. It seems that WildcardQuery does not work properly with
Chinese characters. See the attached sample code.
I store the string "专项信息管理.doc" using the StandardAnalyzer and after that
I search for "专项信*".
Can't he synchronously iterate over both fields' posting lists and use one
priority queue that picks the docs that contain the query and have the best
order according to the second field?
It requires more work, but this should be feasible.
2012/6/27 Ian Lea
> I think he wants 1, sort all matched docs by field A.
Add imageid as a stored field; no need to index it unless you want to
be able to search by it.
Add the tags as an analyzed, indexed field; no need to store them unless you
want to read/display the values. StandardAnalyzer will work fine.
Then use QueryParser to build a query like "tags:car", execute it, and read
the imageid from each matching document, as in the sketch below.
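A sketch of that recipe (Lucene 3.x API; the field names are from the
thread, everything else is illustrative):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.util.Version;

    public class ImageTagsSketch {
        // Stored-only id, analyzed tags, as suggested above.
        static Document toDoc(String imageId, String tags) {
            Document doc = new Document();
            doc.add(new Field("imageid", imageId, Field.Store.YES, Field.Index.NO));
            doc.add(new Field("tags", tags, Field.Store.NO, Field.Index.ANALYZED));
            return doc;
        }

        // Parse user input such as "car" against the tags field.
        static Query tagQuery(String userInput) throws Exception {
            QueryParser parser = new QueryParser(Version.LUCENE_36, "tags",
                    new StandardAnalyzer(Version.LUCENE_36));
            return parser.parse(userInput);
        }
    }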
I think he wants 1, sort all matched docs by field A.
If Lucene sorting doesn't work for you, you can always sort the hits
yourself using whatever technique you want. Sorting large numbers of
docs is always going to be expensive.
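For option 1, a hedged 3.x sketch of letting Lucene do the sort (field "A"
must be indexed as a single un-tokenized term per document):

    import java.io.IOException;

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopFieldDocs;

    public class SortByFieldSketch {
        // Return the top n matches ordered by field "A" instead of by score.
        static TopFieldDocs topByFieldA(IndexSearcher searcher, Query query, int n)
                throws IOException {
            Sort byA = new Sort(new SortField("A", SortField.STRING));
            return searcher.search(query, n, byA);
        }
    }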
--
Ian.
On Wed, Jun 27, 2012 at 8:54 AM, Li Li wrote:
> what do you want to do?
All words are important if they help people find what they want.
Maybe you want high frequency terms. See contrib class
org.apache.lucene.misc.HighFreqTerms.
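If the contrib jar isn't handy, a hedged sketch of the same idea using the
core 3.x TermEnum API, printing the terms of one field whose document
frequency passes a threshold:

    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermEnum;

    public class HighFreqTermsSketch {
        // Print every term of `field` appearing in at least minDocFreq docs.
        static void printFrequentTerms(IndexReader reader, String field,
                int minDocFreq) throws IOException {
            TermEnum terms = reader.terms();
            try {
                while (terms.next()) {
                    if (field.equals(terms.term().field())
                            && terms.docFreq() >= minDocFreq) {
                        System.out.println(terms.term().text()
                                + "\t" + terms.docFreq());
                    }
                }
            } finally {
                terms.close();
            }
        }
    }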
--
Ian.
On Wed, Jun 27, 2012 at 3:04 AM, 齐保元 wrote:
>
> meaningful just means the word is more important than others, like
> keywords/keyphrases
what do you want to do?
1. sort all matched docs by field A.
2. sort all matched docs by relevance score, selecting the top 100 docs and
then sorting by field A?
On Wed, Jun 27, 2012 at 1:44 PM, Yogesh patel
wrote:
> Thanks for the reply Ian,
>
> But I just gave a hypothetical document number.. I have a 2-3 GB index