Denis Prost writes:
 >    Attached are 4 log files :
 >      * one from "recoll -t -q gazette" (155 results)
 >      * one from recollrunner with the same query (only "default query
 >        language" checked in recollrunner config) (3 results : only the
 >        ones among the 155 which do not contain spaces in their paths)
 >      * one from "recoll -t -f -q gazette" (46 results)
 >      * one from recollrunner with the same query ("default query language"
 >        and "match filenames" checked in recollrunner config) (0
 >        results)
 > 
 >    I hope it will help solving this issue.
 >    Regards
 >    Denis

Thanks a lot for the log files, my comments below:

First:
 > :4:../rcldb/rcldb.cpp:1525:Rcl::Db::filenameWildExp: pattern: [*gazette*]

My guess is that this is from the 3rd query (recoll -t -f -q gazette). The
"-q" option, which would select a "query language" query, is ignored
(because of how the options are parsed), so this is a filename query in
which gazette is transformed to *gazette* because it is neither capitalized
nor contains wildcards. It is supposed to return all documents with
[gazette] as part of their file name.
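
A minimal sketch of that rule, in Python rather than the actual C++ in
rcldb.cpp. The function name and the set of wildcard characters are
assumptions made for illustration only:

```python
# Assumed set of characters that count as wildcards (illustrative only).
WILDCARD_CHARS = "*?["

def filename_wild_exp(pattern: str) -> str:
    """Wrap a bare, uncapitalized pattern in '*' so it matches as a substring."""
    has_wildcard = any(c in WILDCARD_CHARS for c in pattern)
    is_capitalized = pattern[:1].isupper()
    if has_wildcard or is_capitalized:
        # Left untouched: the pattern is used as given.
        return pattern
    return "*" + pattern + "*"

print(filename_wild_exp("gazette"))    # *gazette*
print(filename_wild_exp("'gazette'"))  # *'gazette'*
```

The second call shows how single quotes added by the caller end up inside
the pattern, which is what the fourth log excerpt below shows.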

Second:
 > :4:../rcldb/searchdata.cpp:782:StringToXapianQ:: query string: [gazette]

This is from [recoll -t -q gazette], which is a regular text search query,
returning all documents with gazette or a derivative ([gazettes]) in the
contents, or possibly in the file name field processed as text.

Third:

 > :4:../rcldb/searchdata.cpp:782:StringToXapianQ:: query string: ['gazette']

This is probably from recollrunner with only 'default query language'
checked: there is excessive quoting, but it doesn't hurt much because this
is a full text search and the quotes get eliminated. I don't know why
recollrunner returns so few results, but since you mention that these are
only the ones without spaces in the file name, I'd suspect a problem parsing
the output from recoll.
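
To illustrate the kind of parsing problem I have in mind, here is a hedged
Python sketch. It assumes that each result line carries the document URL
inside square brackets; that line shape and the sample line are assumptions
for illustration, not copied from real recoll output, and I have not looked
at how recollrunner actually parses it:

```python
import re
from typing import Optional

# Assumed shape of a result line: the URL sits between '[' and ']'.
URL_IN_BRACKETS = re.compile(r"\[(file://[^\]]*)\]")

def extract_url(line: str) -> Optional[str]:
    """Return the bracketed file URL from a result line, spaces included."""
    match = URL_IN_BRACKETS.search(line)
    return match.group(1) if match else None

# Made-up sample line with a space in the file name.
sample = "text/plain\t[file:///home/me/la gazette.txt]\t[la gazette]\t1234\tbytes"

# Splitting on whitespace cuts the path at the first space.
print(sample.split()[1])    # [file:///home/me/la
# Matching on the brackets keeps the whole path.
print(extract_url(sample))  # file:///home/me/la gazette.txt
```

A file name that itself contains "]" would still defeat this pattern, so it
is only a starting point.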

Fourth:
 > :4:../rcldb/rcldb.cpp:1525:Rcl::Db::filenameWildExp: pattern: [*'gazette'*]

This is with recollrunner, "match filenames" and "default query language"
checked. "Match filenames" takes precedence and the query fails because of
the excessive quoting.

The only thing that I find strange in the logs is that the 3rd one seems to
indicate that the query actually returns more results than the 1st one,
when I would have thought that they are identical. But the quoting may have
affected the query. The actual Xapian query is truncated in the log for
some reason, so we can't be sure:

:4:../rcldb/rclquery.cpp:237:Query::SetQuery: Q: ((gazette:(wqf=11) OR gazettes 
OR gazet:4:../rcldb/rclquery.cpp:344:Fetching for first 50, count 50

So I think that the first fixes should be for recollrunner to:
 - Avoid excessive single quote quoting
 - Indicate somehow that "query language" and "file name search" are
   different and exclusive modes.
 - Try to better parse the query output when there are spaces in the file
   names.
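
The first two points could be sketched as follows in Python. This is only an
illustration under assumptions: I don't know how recollrunner currently
builds its command line, the function name is made up, and the flags are the
ones used in the commands quoted in this thread:

```python
from typing import List

def build_recoll_argv(term: str, filename_search: bool = False) -> List[str]:
    """Build the argument list for a recoll text-mode query.

    The two modes are mutually exclusive: exactly one of -f / -q is passed.
    The term is a separate list element, so no extra quoting is added.
    """
    mode_flag = "-f" if filename_search else "-q"
    return ["recoll", "-t", mode_flag, term]

print(build_recoll_argv("gazette"))
print(build_recoll_argv("la gazette", filename_search=True))
```

Passing such a list to subprocess.run(argv, capture_output=True, text=True)
runs the command without a shell, so the search term reaches recoll
unchanged, spaces included, and no surrounding single quotes are needed.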

And then we may get into possible Recoll issues. I'd be quite interested,
though, in the logs from the following two commands:

recoll -t -q gazette
recoll -t -q "'gazette'"

Cheers,

Jf


