ParallelMultiSearcher
Hello, java-user.

Does anyone here use ParallelMultiSearcher for searching large volumes of data? I have some questions about PrefixQuery search. Thanks in advance.

--
Yura Smolsky,
http://altervisionmedia.com/

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: ParallelMultiSearcher
Don't ask to ask, just ask! ;)

Quoting Yura Smolsky <[EMAIL PROTECTED]>:
> Does anyone here use ParallelMultiSearcher for searching large volumes
> of data? I have some questions about PrefixQuery search.
Re[2]: ParallelMultiSearcher
Hello, Ronnie.

RK> Don't ask to ask, just ask! ;)

OK. I have a big issue when I search a ParallelMultiSearcher with a PrefixQuery. The query is rewritten to a BooleanQuery during the search, which causes Similarity to calculate docFreq for each Term in the BooleanQuery. So if a PrefixQuery expands to many terms, we get many calls to the docFreq method of the Searchable objects passed to the ParallelMultiSearcher. In my case one of those Searchable objects lives on another computer, accessed over the network, and search becomes very slow because of those repeated docFreq calls over the wire.

I am not sure whether this question belongs on the users list, but I have spent about three days on this problem and do not see any solution. Maybe the Lucene developers could suggest something...

Thanks, and sorry for my bad English.

--
Yura Smolsky,
http://altervisionmedia.com/
Re: Re[2]: ParallelMultiSearcher
On 9/21/06, Yura Smolsky <[EMAIL PROTECTED]> wrote:
> OK. I have a big issue when I search a ParallelMultiSearcher with a
> PrefixQuery. The query is rewritten to a BooleanQuery during the search,
> which causes Similarity to calculate docFreq for each Term in the
> BooleanQuery. So if a PrefixQuery expands to many terms, we get many
> calls to the docFreq method of the Searchable objects passed to the
> ParallelMultiSearcher.

IDF often does not make sense for auto-expanding queries (range, prefix, etc.). If you don't need the idf factor that makes rarer terms count more, then use a PrefixFilter wrapped in a ConstantScoreQuery.

http://lucene.apache.org/java/docs/api/org/apache/lucene/search/ConstantScoreQuery.html
http://incubator.apache.org/solr/docs/api/org/apache/solr/search/PrefixFilter.html

-Yonik
http://incubator.apache.org/solr -- Solr, the open-source Lucene search server
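A minimal sketch of Yonik's suggestion, assuming the Solr PrefixFilter class from the second link (that class later moved into Lucene itself; the field name "title", the prefix "luc", and the searcher variable here are made-up illustrations):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Hits;
import org.apache.solr.search.PrefixFilter;

// Instead of new PrefixQuery(new Term("title", "luc")) -- which rewrites to a
// BooleanQuery and triggers one docFreq() call per expanded term -- wrap the
// prefix match in a filter and give every matching document a constant score.
PrefixFilter filter = new PrefixFilter(new Term("title", "luc"));
ConstantScoreQuery query = new ConstantScoreQuery(filter);
Hits hits = parallelMultiSearcher.search(query); // no per-term docFreq() calls over the network
```

The trade-off is that all hits score identically within the prefix match, which is usually what you want for wildcard-style queries anyway.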
RE: Analysis/tokenization of compound words
Aspell has some support for compound words that might be useful to look at: http://aspell.sourceforge.net/man-html/Compound-Words.html#Compound-Words

Peter

Peter Binkley
Digital Initiatives Technology Librarian
Information Technology Services
4-30 Cameron Library
University of Alberta Libraries
Edmonton, Alberta
Canada T6G 2J8
Phone: (780) 492-3743 Fax: (780) 492-9243
e-mail: [EMAIL PROTECTED]

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, September 19, 2006 10:22 AM
To: java-user@lucene.apache.org
Subject: Analysis/tokenization of compound words

Hi,

How do people typically analyze/tokenize text with compounds (e.g. German)? I took a look at GermanAnalyzer hoping to see how one can deal with that, but it turns out GermanAnalyzer doesn't treat compounds in any special way at all.

One way to go about this is to have a word dictionary and a tokenizer that processes input one character at a time, looking for a word match in the dictionary after each processed character. Then, CompoundWordLikeThis could be broken down into multiple tokens/words and returned as a set of tokens at the same position. However, somehow this doesn't strike me as a very smart or fast approach. What are some better approaches? If anyone has implemented anything that deals with this problem, I'd love to hear about it.

Thanks,
Otis
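The dictionary-based approach Otis describes can be sketched in plain Java. The greedy longest-match variant below is the simplest possible strategy; it is known to mis-split some compounds (a long dictionary word can swallow the start of the next sub-word), so treat it as a starting point only, not as what GermanAnalyzer or Aspell actually do:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Naive greedy decompounder: at each position, take the longest dictionary
// word that matches, then continue after it. If no full decomposition is
// found, return the compound unchanged.
public class Decompounder {
    private final Set<String> dict;
    private final int maxWordLen;

    public Decompounder(Set<String> dict) {
        this.dict = dict;
        int max = 0;
        for (String w : dict) max = Math.max(max, w.length());
        this.maxWordLen = max;
    }

    public List<String> split(String compound) {
        List<String> parts = new ArrayList<String>();
        int pos = 0;
        while (pos < compound.length()) {
            String match = null;
            int limit = Math.min(compound.length(), pos + maxWordLen);
            for (int end = limit; end > pos; end--) { // longest candidate first
                String cand = compound.substring(pos, end);
                if (dict.contains(cand)) { match = cand; break; }
            }
            if (match == null) return Collections.singletonList(compound);
            parts.add(match);
            pos += match.length();
        }
        return parts;
    }
}
```

In an Analyzer, each sub-word would then be emitted as a token with position increment 0 so all parts sit at the same position, as Otis suggests.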
writer.minMergeDocs in lucene 2.0
Hi all,

I am trying to index a database. Indexing is taking quite a long time, so I am trying to tune it by increasing minMergeDocs. In Lucene 1.4.3 there is a field called writer.minMergeDocs. I found the writer.setMaxMergeDocs() method in Lucene 2.0, but no method called writer.setMinMergeDocs(). Can anyone help? How can I increase the minimum document merge size?

Thanks all,
Ismail Siddiqui
Re: writer.minMergeDocs in lucene 2.0
On 9/21/06, Ismail Siddiqui <[EMAIL PROTECTED]> wrote:
> In Lucene 1.4.3 there is a field called writer.minMergeDocs. I found the
> writer.setMaxMergeDocs() method in Lucene 2.0, but no method called
> writer.setMinMergeDocs().

Try setMaxBufferedDocs()

-Yonik
http://incubator.apache.org/solr -- Solr, the open-source Lucene search server
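A hypothetical tuning sketch along these lines (the path, analyzer choice, and values are made-up examples, not recommendations): in Lucene 2.0, setMaxBufferedDocs() controls how many documents are buffered in RAM before a segment is flushed, which is the role the old minMergeDocs field played in 1.4.3.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
writer.setMaxBufferedDocs(1000); // replaces writer.minMergeDocs: buffer more docs in RAM before flushing
writer.setMergeFactor(10);       // how many segments get merged at once
```

Larger values mean fewer, larger flushes and faster bulk indexing at the cost of more memory.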
analyzer to populate more than one field of Lucene document
I need to create two fields for Lucene documents populated
1) by numbers
2) by other strings
3) by values of another specific format

What kind of Analyzer would do it?

Using the customized analyzer, the current code is like:

IndexWriter indexWriter = new IndexWriter(indexDir, analyzer, true);
Document doc = new Document();
doc.add(new Field("numeric_contents", new FileReader(f))); // numeric tokens
doc.add(new Field("other_contents", new FileReader(f))); // the same file, but the non-numeric tokens

Thanks
--
Boris Galitsky.
Re: analyzer to populate more than one field of Lucene document
I think you want a PerFieldAnalyzerWrapper. It allows you to use a different analyzer for each field in your document. You'll have to write the code to extract the file contents in your desired formats for each field, but you probably do that already. You can instantiate your IndexWriter with an instance of a PerFieldAnalyzerWrapper and it all "just happens" after that.

From the javadoc for PerFieldAnalyzerWrapper:
<<< This analyzer is used to facilitate scenarios where different fields require different analysis techniques. >>>

Best
Erick

On 9/21/06, Boris Galitsky <[EMAIL PROTECTED]> wrote:
> I need to create two fields for Lucene documents populated
> 1) by numbers
> 2) by other strings
> 3) by values of another specific format
>
> What kind of Analyzer would do it?
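A minimal sketch of Erick's suggestion. The field names follow Boris's example; the choice of WhitespaceAnalyzer for the numeric field is only an assumption for illustration — any Analyzer can be mapped to any field:

```java
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

// The constructor argument is the default analyzer, used for any field
// that has no explicit mapping.
PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
wrapper.addAnalyzer("numeric_contents", new WhitespaceAnalyzer()); // e.g. keep numeric tokens whole
// "other_contents" falls through to the StandardAnalyzer default.

IndexWriter indexWriter = new IndexWriter(indexDir, wrapper, true);
```

From here, indexing proceeds exactly as in Boris's snippet — the wrapper picks the right analyzer per field name automatically.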
Re: analyzer to populate more than one field of Lucene document
Thanks a lot, Erick.

Boris

* Erick Erickson <[EMAIL PROTECTED]> [Thu, 21 Sep 2006 20:53:42 -0400]:
> I think you want a PerFieldAnalyzerWrapper. It allows you to use a
> different analyzer for each field in your document. You can instantiate
> your IndexWriter with an instance of a PerFieldAnalyzerWrapper and it
> all "just happens" after that.

--
Boris Galitsky.