RE: Indexing with SnowballAnalyzer and multiple languages in a single index

2006-04-20 Thread jwang
You can have multiple languages in the same index. Just make sure that your language identification process is consistent. You might still get some false positives: for example, if a German root has the same letters as a French root but means something different, then it might still
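
One way to keep the language handling consistent, sketched under assumptions: route every document and every query through a single lookup, so indexing and searching always agree on the stemmer. The class StemmerNames and both of its methods are hypothetical names made up for this illustration. The returned strings are meant to be passed on to the SnowballAnalyzer constructor, so check them against the stemmers shipped with your Lucene release. The per-language field name is only one possible way to keep German and French stems apart and is not something proposed in the thread.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: one shared mapping used by both the indexing and the search code.
class StemmerNames {
    private static final Map<String, String> BY_LANGUAGE = new HashMap<>();
    static {
        // Assumed stemmer names; verify against your Snowball distribution.
        BY_LANGUAGE.put("en", "English");
        BY_LANGUAGE.put("de", "German");
        BY_LANGUAGE.put("fr", "French");
        BY_LANGUAGE.put("it", "Italian");
    }

    // Normalizes a language code such as " DE " to "de".
    private static String normalize(String languageCode) {
        if (languageCode == null) {
            throw new IllegalArgumentException("language code must not be null");
        }
        return languageCode.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the stemmer name for a language code, failing loudly on unknown codes
    // so that a document is never silently indexed with the wrong stemmer.
    static String forLanguage(String languageCode) {
        String name = BY_LANGUAGE.get(normalize(languageCode));
        if (name == null) {
            throw new IllegalArgumentException("No stemmer configured for: " + languageCode);
        }
        return name;
    }

    // Assumption: one field per language (for example "body_de"), so that stems
    // from different languages never share a term space.
    static String fieldFor(String baseField, String languageCode) {
        forLanguage(languageCode); // validates the code
        return baseField + "_" + normalize(languageCode);
    }
}
```

With this in place, both the indexing code and the query code would call StemmerNames.forLanguage(lang) before building their analyzer, and search only the field returned by StemmerNames.fieldFor("body", lang).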

Re: Most used words

2006-04-20 Thread karl wettin
On 20 Apr 2006 at 13:34, Daniel Cortes wrote: How do you obtain the most used words of an index? Use the terms() and termDocs() methods of IndexReader. Or, if available, use the term frequency vectors.
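
Once the term counts have been collected, picking the most used words is a small selection problem. The sketch below assumes the counts are already in a plain Map (for example filled while walking terms() and termDocs(), with whatever group restriction applies). It does not call Lucene itself, and the class TopTerms and its method topN are hypothetical names. It keeps a bounded min-heap so that only n entries are held at a time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: selects the n most frequent terms from precomputed counts.
class TopTerms {
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> freqs, int n) {
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        if (n <= 0 || freqs.isEmpty()) {
            return result;
        }
        Comparator<Map.Entry<String, Integer>> byFreq =
                (a, b) -> Integer.compare(a.getValue(), b.getValue());
        // Min-heap: the least frequent of the current candidates sits on top.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(Math.min(n, freqs.size()), byFreq);
        for (Map.Entry<String, Integer> e : freqs.entrySet()) {
            if (heap.size() < n) {
                heap.add(e);
            } else if (e.getValue() > heap.peek().getValue()) {
                heap.poll(); // drop the weakest candidate
                heap.add(e);
            }
        }
        result.addAll(heap);
        result.sort(byFreq.reversed()); // most frequent first
        return result;
    }
}
```

Calling TopTerms.topN(counts, 10) would return at most ten entries, ordered from most to least frequent. The order among terms with equal counts is not defined.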

Indexing with SnowballAnalyzer and multiple languages in a single index

2006-04-20 Thread Lorenzo Di Gaetano
Hi all, I'm working on the search API of a multi-language CMS, and I'm using the latest Lucene release. I'm using the SnowballAnalyzer in order to have stemmers for various languages. I know that I must use the same analyzer for indexing and searching in order to obtain correct hits, but can

Most used words

2006-04-20 Thread Daniel Cortes
Hi everybody, I have a simple question for you. How do you obtain the most used words of an index? In my case I want to obtain the 10 most used words in a group. I thought of using a TreeSet with all words and their hit frequencies (with the restriction of GROUPID). Does anyone have any

Re: BooleanQuery$TooManyClauses

2006-04-20 Thread Supriya Kumar Shyamal
Normally the default maximum clause count for a BooleanQuery is 1024; maybe your query produces more than 1024 clauses. One workaround is to raise the limit above 1024. You can do that by invoking the static method BooleanQuery.setMaxClauseCount(2048); supriya Flávio Marim