Re: how to get payload of a term after IndexSearch.search ?

2013-06-27 Thread wgggfiy
- -- Email: wuqiu.m...@qq.com -- -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-get-payload-of-a-term-after-IndexSearch-search-tp4021789p4073708.html Sent from the Lucene - Java Users mailing list archive at Nabble

Re: why did I build index slower and slower ?

2013-05-14 Thread wgggfiy
up

Re: [PhraseQuery] Can "jakarta apache"~10 be searched by offset ?

2013-05-13 Thread wgggfiy
Jack, following your suggestion, how can I implement this requirement? Could you give me a clue? Thank you very much. The regex query did not seem to work. I build the field like this: FieldType fieldType = new FieldType(); FieldInfo.IndexOptions indexOptions = FieldInfo.IndexOptions.DOCS

Re: why did I build index slower and slower ?

2013-05-13 Thread wgggfiy
Thank you. I also realized that I should make the writer a singleton; as it was, the writer was committed and closed after every batch. That is, in every build: IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40, analyzer); iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND); iwc.setR

why did I build index slower and slower ?

2013-05-12 Thread wgggfiy
I have 10,000,000 documents and build the index in batches of 5,000 documents. *In every build*, I follow these steps: IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40, analyzer); iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_O

RE: [PhraseQuery] Can "jakarta apache"~10 be searched by offset ?

2013-05-07 Thread wgggfiy
OK, thanks, but how can I implement this requirement now? Jack gave me a clue, but it did not work: a regex query like "jakarta.{1,10}apache" returns no docs. Are there limitations on regex queries, such as the field not being indexed, and so on?

Re: [PhraseQuery] Can "jakarta apache"~10 be searched by offset ?

2013-05-06 Thread wgggfiy
That's the question. When I get docs via QueryParser("jakarta apache"~10), they match the query syntax, but the match depends on word position, not on offset, and that is not what I intend. Some docs satisfy ("jakarta apache"~10) but do not satisfy the regex "jakarta.{1,1

[PhraseQuery] Can "jakarta apache"~10 be searched by offset ?

2013-05-06 Thread wgggfiy
As far as I know, the syntax *"jakarta apache"~10* is a PhraseQuery with slop=10 measured in positions, but what I want is a match *based on offset*, not on position. Can anyone help me? Thanks.
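To make the position/offset distinction concrete, here is a minimal sketch in plain Java, not a Lucene API. It assumes the field text is stored, so that each hit returned by the position-based phrase query can be re-checked against the raw text by character offset. The class and method names (OffsetProximity, withinOffset) are hypothetical. As far as I understand, Lucene's regex queries are matched against individual indexed terms rather than the whole field text, which would explain why a pattern spanning two tokens finds nothing on a tokenized field; a post-filter like this sidesteps that.

```java
import java.util.regex.Pattern;

// Hypothetical post-filter: check character-offset proximity on the raw (stored) text.
class OffsetProximity {

    // Returns true if `second` starts between minGap and maxGap characters
    // after the end of `first` somewhere in `text`.
    static boolean withinOffset(String text, String first, String second,
                                int minGap, int maxGap) {
        Pattern p = Pattern.compile(
                Pattern.quote(first) + ".{" + minGap + "," + maxGap + "}" + Pattern.quote(second),
                Pattern.DOTALL); // let '.' span line breaks too
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // Both texts have "apache" within 10 word positions of "jakarta",
        // but only the first has it within 10 characters.
        String near = "jakarta and apache";
        String far = "jakarta is a very long way from apache";
        System.out.println(withinOffset(near, "jakarta", "apache", 1, 10));
        System.out.println(withinOffset(far, "jakarta", "apache", 1, 10));
    }
}
```

The phrase query would still do the cheap candidate selection; the offset check only runs on the documents it returns.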

Re: Could group results be sorted by groupDocs.totalHits ?

2013-02-08 Thread wgggfiy
I found SortField and Sort, but they only sort by a field; what I want is to sort by groupDocs.totalHits. Does anyone know how? Thanks.
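One possible approach, sketched here under assumptions, is to reorder the groups after the grouping search has returned them rather than inside the search itself. The Group class below is a hypothetical stand-in for whatever per-group result object exposes a totalHits count; the same comparator idea would apply to the real result objects. Note that this only reorders the groups that were actually retrieved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class GroupSortSketch {

    // Hypothetical stand-in for a per-group result: group value plus hit count.
    static final class Group {
        final String value;
        final int totalHits;

        Group(String value, int totalHits) {
            this.value = value;
            this.totalHits = totalHits;
        }
    }

    // Returns a new list ordered by totalHits, largest first; the input is not modified.
    static List<Group> sortByTotalHits(List<Group> groups) {
        List<Group> sorted = new ArrayList<>(groups);
        sorted.sort(Comparator.comparingInt((Group g) -> g.totalHits).reversed());
        return sorted;
    }
}
```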

what's the difference of facet and group search ??

2013-02-01 Thread wgggfiy
As the title says, I'm totally puzzled. Can anyone explain it with an example? Thanks.

Re: How to find related words ?

2013-01-30 Thread wgggfiy
It seems nice, but I'm puzzled by your answer and Andrew Gilmartina's above: what is the difference between your two suggestions? I'm also reading the reference on how to *extract relevant terms from the top document(s)*. Anyway, thanks.

How to find related words ?

2013-01-30 Thread wgggfiy
In short, you put in a term like "Lucene", and the ideal output would be "solr", "index", "full-text search", and so on. How can I find such related words? Thanks. My idea is to use FuzzyQuery or MoreLikeThis, or to calculate a score for all the terms and then sort. Any ideas?
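The "score all the terms and then sort" idea can be illustrated with a very rough co-occurrence count in plain Java. This is only a sketch under simplifying assumptions: documents are plain strings, tokens are split on non-word characters, and the score is just the number of documents in which a term appears together with the query term. The names RelatedTermsSketch and relatedTerms are hypothetical, and no Lucene API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class RelatedTermsSketch {

    // Returns up to topN terms that co-occur with `query` in the most documents.
    static List<String> relatedTerms(List<String> docs, String query, int topN) {
        String q = query.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            // Distinct lowercase tokens of this document.
            Set<String> terms = new HashSet<>(
                    Arrays.asList(doc.toLowerCase(Locale.ROOT).split("\\W+")));
            if (!terms.contains(q)) {
                continue; // only documents containing the query term contribute
            }
            for (String t : terms) {
                if (!t.isEmpty() && !t.equals(q)) {
                    counts.merge(t, 1, Integer::sum);
                }
            }
        }
        List<String> result = new ArrayList<>(counts.keySet());
        // Highest count first; ties broken alphabetically for a stable order.
        result.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(result.subList(0, Math.min(topN, result.size())));
    }
}
```

A real implementation would probably restrict the input to the top-ranked hits for the query and weight terms by rarity, so that common words do not dominate.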

Re: IndexWriter deleteDocuments

2013-01-30 Thread wgggfiy
It seems that a doc is not really deleted until the next index merge or something like that. I'm not sure.

how to avoid OutOfMemoryError while indexing ?

2013-01-26 Thread wgggfiy
I find it very easy to run into an OutOfMemoryError. I hoped Lucene could set its RAM usage automatically, but I couldn't find an API for that. My code: IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40, analyzer); int mb = 1024 * 1024; double ram = Runtime.getRuntime().maxMemory
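Since the snippet above is cut off, the following is not the poster's code but a minimal sketch of one way to derive a buffer size from the JVM heap. It assumes a simple policy (a fixed fraction of the maximum heap, clamped to a range); the fraction and bounds are arbitrary illustrations, and the helper name ramBufferMb is hypothetical. With Lucene on the classpath, the result would be handed to IndexWriterConfig.setRAMBufferSizeMB.

```java
class RamBufferSketch {

    // Hypothetical policy: use `fraction` of the max heap for the indexing buffer,
    // clamped to the range [minMb, maxMb].
    static double ramBufferMb(long maxHeapBytes, double fraction, double minMb, double maxMb) {
        double bytesPerMb = 1024.0 * 1024.0;
        double budgetMb = (maxHeapBytes / bytesPerMb) * fraction;
        return Math.max(minMb, Math.min(maxMb, budgetMb));
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        // Illustrative values only: a quarter of the heap, between 16 MB and 512 MB.
        double bufferMb = ramBufferMb(maxHeap, 0.25, 16.0, 512.0);
        System.out.println("RAM buffer (MB): " + bufferMb);
        // Assumed usage with Lucene: iwc.setRAMBufferSizeMB(bufferMb);
    }
}
```

Leaving most of the heap to the rest of the application is deliberate here: the buffer is only one of several memory consumers during indexing.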

Re: how to add attributes to a field just like term's payload ?

2013-01-24 Thread wgggfiy
Hello, but there is no getCommitUserData in IndexReader. How can I get the user data? Thanks.

Re: how to add attributes to a field just like term's payload ?

2013-01-06 Thread wgggfiy
Sounds good, and it probably already satisfies my need. Thanks so much.

how to add attributes to a field just like term's payload ?

2013-01-05 Thread wgggfiy
Hello. As we know, we can add a payload to a term, but can we add extra custom info to a field, such as a description of the field, i.e. a property shared by that field across all documents? How can I do that? Thanks.

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-12 Thread wgggfiy
Thanks very much! Lingpipe and Gate are very useful, and new to me, but are they too heavyweight for implementing a custom posting item like class TestPostingItem { int termId; long startOffset; long endOffset; float score; int segId; long timeStamp; }?

Re: what is the offsets and payload in DocsAndPositionsEnum for ??

2012-11-23 Thread wgggfiy
After I finish "packing your information into a payload", is there some way to search using that information? What is "PayloadTermQuery" for? Thanks.

Re: what is the offsets and payload in DocsAndPositionsEnum for ??

2012-11-18 Thread wgggfiy
Thanks, Mike. About the 3rd question: is "encode them all into the payload" better than "a new postings format with the codec"? I mean replacing the original posting item (position, startOffset, endOffset, payload) with my own inverted item, such as class TestPostingItem { int termId; l

what is the offsets and payload in DocsAndPositionsEnum for ??

2012-11-18 Thread wgggfiy
s the offset of the term 'lucene', 33.2 is a score, and 2 is some id. My question is how I can get this indexed. My first idea is to implement my own posting list format, but is it possible to do it with the startOffset, endOffset and payload? Thanks. wgggfiy

Re: how do re-get the doc after the doc was indexed ?

2012-11-17 Thread wgggfiy
Exactly! Thanks, Jack. Good idea.

how do re-get the doc after the doc was indexed ?

2012-11-17 Thread wgggfiy
For example: I indexed a doc with path=c:/books/appach.txt, author=Mike. After a long time, I wanted to change the author to John. But the question is how I can get that exact doc quickly. My idea is to traverse the docs from id=0 to id=maxDoc(), retrieve each one with its stored fields, and check its

Re: Retrieval of the position of indexed terms

2012-11-16 Thread wgggfiy
Has anyone resolved this? Thanks.

Re: Lucene Index File Format

2012-11-16 Thread wgggfiy
I'm studying the index format in depth and writing Java utilities to log all of it. So far I have successfully logged .si, .fnm, .fdx, and .fdt, but .tim and .tip are too complicated...

Re: Lucene 4.0 Get All Index Terms

2012-11-16 Thread wgggfiy
Me too! Could you explain how you solved it?