Re: Spanquery problem

Erick Erickson Mon, 30 Apr 2007 05:52:40 -0700

The first thing I'd do is not use a HashSet when you collect your
SpanTermQuerys since the iteration order is not guaranteed. That is,
the order when putting them in is not necessarily the same as
when getting them out. So you may be searching for
"automatique climatisation" rather then "climatisation automatique".


You can easily test this by looking at snQ.toString() to see what query
is actually constructed.....

If that isn't it, post the results and we'll have another go at it....

Erick

On 4/30/07, axel.reymonet <[EMAIL PROTECTED]> wrote:


Hello,

I am having some issues with the SpanQuery functionality. As a matter of
fact, I index a single french file containing for instance "climatisation
automatique" (which means automatic air-conditioning) with the classical
FrenchAnalyzer, and when I search in this index with SpanQuery, I have the
following situation :
- I have 1 span for the "climatisation" request
- I have 1 span for the "automatique" request
- I have 0 span for the "climatisation automatique" request
Maybe I am doing something wrong, but I cannot spot my mistake. I have
given
the problematic portion of code below. Does anyone have an idea ? Thanks
in
advance,

Axel Reymonet


My code is as follows :
- FOR INDEXING
public void fileProcess(String filepath)
{
File index_dir = new File(this.currentProjectPath+"index");
if (index_dir.exists())
    {
    String[] files = index_dir.list();
    for (String f:files)
        {
        File file = new File (this.currentProjectPath+"index/"+f);
        file.delete();
        }
    index_dir.delete();
    }
File toBeIndexed = new File(filepath);
if (!toBeIndexed.exists() || !toBeIndexed.canRead())
    {
    System.out.println("Document directory '"
+toBeIndexed.getAbsolutePath()+ "' does not exist or is not readable,
please
check the path");
    System.exit(1);
    }
try {
IndexWriter writer = new IndexWriter(index_dir, new
org.apache.lucene.analysis.fr.FrenchAnalyzer(),true);
writer.addDocument(FileDocument.Document(toBeIndexed));
writer.close();
} catch (IOException e) {System.out.println(" caught a " + e.getClass()
+"\n
with message: " + e.getMessage());}
}

-----------------------------------------
- FOR SEARCHING
public void testSF(String searched,String indexPath)
{
try{
IndexReader reader = IndexReader.open(indexPath);
Analyzer analyzer = new org.apache.lucene.analysis.fr.FrenchAnalyzer();
TokenStream requestStream = analyzer.tokenStream("contents",new
StringReader(searched));
HashSet<SpanTermQuery> qSet = new HashSet<SpanTermQuery>();
Token currentToken = requestStream.next();
while (currentToken!=null)
    {
    qSet.add(new SpanTermQuery(new
Term("contents",currentToken.termText())));
    currentToken = requestStream.next();
    }
SpanQuery[] sQ = new SpanQuery [qSet.size()];
int k = 0;
for (SpanTermQuery stq:qSet)
    {
    sQ[k]=stq;
    k++;
    }
SpanNearQuery snQ;
snQ = new SpanNearQuery(sQ,0,true);
Spans spans = snQ.getSpans(reader);
int resultsCpt = 0;
while (spans.next())
    resultsCpt++;
System.out.println("Number of results: "+resultsCpt);
}
catch(IOException ioe){ioe.printStackTrace();}
}
--------------------------------------------
- OUTPUT
Query: climatisation
Number of results: 1
Query: automatique
Number of results: 1
Query: climatisation automatique
Number of results: 0

Re: Spanquery problem

Reply via email to