OK, I will explain the full 'problem' and then how I approach it. Let's divide it into three steps:
1. I have a 'dictionary' of words: for every word, I have a list of worlds, which are the ids of the text documents the word appears in. For example, for the word 'dog', the "worlds" field (which is tokenized when indexed) contains '1 1600 36000', which means that the word dog appears in worlds 1, 1600 and 36000.
2. This index is used to choose synonyms for the word dog using the "worlds" field: I run a search on this index with the query "1 1600 36000" as input and thus get the words that are closest to the word "dog". I take the 10 closest words.
3. These 10 synonyms are then used to expand the query.

Basically, I have 2 problems in this process:
a. When finding the synonyms, I would like the frequency of the word in each of the worlds to be taken into account, so that if 'dog' appeared 3 times in world 1, 10 times in world 1600 and 4 times in world 36000, this is reflected in the search. I want to avoid "expanding" the field to "1 1 1 1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 36000 36000 36000 36000". Accordingly, I want to be able to set the frequency myself.
b. When using the synonyms, I want to be able to apply a 'penalty' factor to the synonyms, and also give different weights to different synonyms according to their scores.

I looked at an old thread, "Search for synonyms - implemenetation for review":
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200603.mbox/%3c39b0fb508e5d7540aca5ad57225e150d392...@xmail.me.corp.entopia.com%3e
I don't know if it's part of Lucene now, and I didn't quite understand how to use it. Is there a better way to approach this?

I hope I explained it well.

Thanks,
Liat

2009/4/21 Doron Cohen <cdor...@gmail.com>

> Depending on the problem you are trying to solve there may be other
> solutions to it, not requiring setting wrong (?) values for term
> frequencies.
> If you can explain what you are trying to solve, people on the list may
> be able to suggest such alternatives.
> - Doron
>
> On Sun, Apr 19, 2009 at 2:39 PM, liat oren <oren.l...@gmail.com> wrote:
>
> > Hi,
> >
> > I would like to be able to set the term freq to different values at index
> > time, or at search time.
> >
> > So if a document has the following text: 1 2, the freq of 1 will get 100
> > and the freq of 2 will get 200. I want to avoid expanding it by writing 1
> > 100 times.
> >
> > I looked at the Similarity class and wanted to override it, but the tf
> > function gets only freq, so I don't know which term this freq relates to,
> > thus I can't change the value.
> >
> > Thanks,
> > Liat
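The two requirements in the thread, (a) weighting each world by how often the word occurs in it and (b) down-weighting synonyms by a penalty factor and by their similarity score, can be sketched outside Lucene in plain Java. This is a hypothetical illustration only, not a Lucene API and not the approach from the linked thread. It assumes the per-world frequencies are already available as a map from world id to count, and it swaps in cosine similarity over those frequency vectors as the "closeness" measure, whereas the setup described above relies on Lucene's own scoring of a "1 1600 36000" query. All names (`SynonymExpansion`, `cosine`, `topSynonyms`, `expand`) are made up for the example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Lucene API): picks synonyms by comparing
 * frequency-weighted "worlds" vectors and assigns penalized query weights.
 */
public class SynonymExpansion {

    /** Cosine similarity of two sparse vectors: world id -> frequency of the word in that world. */
    static double cosine(Map<Integer, Integer> a, Map<Integer, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<Integer, Integer> e : a.entrySet()) {
            double fa = e.getValue();
            normA += fa * fa;
            Integer fb = b.get(e.getKey());
            if (fb != null) {
                dot += fa * fb; // only worlds shared by both words contribute
            }
        }
        for (int fb : b.values()) {
            normB += (double) fb * fb;
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns up to k (word, similarity) pairs, most similar first, excluding the word itself. */
    static List<Map.Entry<String, Double>> topSynonyms(
            String word, Map<String, Map<Integer, Integer>> index, int k) {
        Map<Integer, Integer> target = index.get(word);
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        if (target == null) {
            return scored; // unknown word: no synonyms
        }
        for (Map.Entry<String, Map<Integer, Integer>> e : index.entrySet()) {
            if (!e.getKey().equals(word)) {
                scored.add(new AbstractMap.SimpleEntry<>(e.getKey(), cosine(target, e.getValue())));
            }
        }
        scored.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
        return new ArrayList<>(scored.subList(0, Math.min(k, scored.size())));
    }

    /** The original word keeps weight 1.0; each synonym gets penalty * similarity. */
    static Map<String, Double> expand(
            String word, List<Map.Entry<String, Double>> synonyms, double penalty) {
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put(word, 1.0);
        for (Map.Entry<String, Double> s : synonyms) {
            weights.put(s.getKey(), penalty * s.getValue());
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Integer>> index = new HashMap<>();
        // 'dog' occurs 3 times in world 1, 10 times in world 1600, 4 times in world 36000.
        index.put("dog", Map.of(1, 3, 1600, 10, 36000, 4));
        index.put("puppy", Map.of(1600, 8, 36000, 2)); // made-up sample data
        index.put("car", Map.of(1, 5));                // made-up sample data

        List<Map.Entry<String, Double>> synonyms = topSynonyms("dog", index, 10);
        Map<String, Double> query = expand("dog", synonyms, 0.5);
        // Printed in the term^boost style accepted by Lucene's query syntax.
        query.forEach((term, w) -> System.out.printf("%s^%.3f%n", term, w));
    }
}
```

Because the frequencies live in the vectors themselves, nothing has to be repeated in the field text, and the penalty and per-synonym score combine into a single weight per term. In a Lucene query those weights would most naturally become per-clause boosts on the synonym terms; how the raw frequencies themselves could be stored in the index depends on the Lucene version and is not shown here.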