Dear all,
I am using Lucene 3.0 to index the PDF reports that I generate
dynamically. I index the PDF file name (without extension), the file path
and its absolute path as fields. I search with the file name without
extension; it retrieves a list, as usually two or more files exist
with the same name
Hey there,
you might have to implement some kind of unique identifier using an
indexed Lucene field. When you are indexing, you should fire a query with the
UUID of your document (maybe the path to your PDF document) and check if the
document is already in the index. You could also do a boolean qu
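A minimal sketch of the unique-identifier idea, assuming the absolute path serves as the ID. The in-memory set is a hypothetical stand-in for querying the index for that ID; `DedupIndexer` and `indexIfAbsent` are illustrative names, not Lucene API. Because the ID is the full path, two reports with the same file name in different directories stay separate entries. In Lucene itself, `IndexWriter.updateDocument(Term, Document)` deletes any document matching the term before adding the new one, which replaces rather than skips but also avoids duplicates.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for an index lookup keyed on a unique document ID. */
class DedupIndexer {
    // IDs (here: absolute paths) of documents already in the "index".
    private final Set<String> indexedIds = new HashSet<>();

    /**
     * Adds the document if its ID is new.
     * Returns true if it was added, false if it was already indexed and is skipped.
     */
    boolean indexIfAbsent(String uniqueId) {
        // Set.add returns false when the ID is already present,
        // playing the role of "query the index for the ID first".
        return indexedIds.add(uniqueId);
    }

    int size() {
        return indexedIds.size();
    }
}
```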
Dear,
Thanks for your reply, Mr. Simon; I found it very useful.
I have another question: I create the index in a clustered environment (two
physical systems and two virtual ones). A shared system among the nodes is
where this index will be created. The scheduler runs on another remote
system, which will create an
Dear all,
As replied below, searching the index again for each document, skipping it
if found and indexing it otherwise: is this not similar to indexing all
the PDF documents once again? Isn't this overhead? As I am
not going to index the details of the PDF (so if an indexed PDF was
recreated in
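To illustrate the trade-off being asked about, here is a sketch assuming a last-modified time is recorded alongside each path (a hypothetical field; the in-memory map stands in for a stored field in the index). The per-document check is a key lookup plus a timestamp comparison; only new or changed PDFs go through text extraction and indexing. `IncrementalIndexPlanner` and its `Action` values are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical planner deciding whether a PDF needs (re)indexing. */
class IncrementalIndexPlanner {
    enum Action { ADD, UPDATE, SKIP }

    // Stand-in for a stored field in the index: absolute path -> last-modified time at indexing.
    private final Map<String, Long> indexedModified = new HashMap<>();

    /** Only ADD and UPDATE require extracting and indexing the PDF. */
    Action plan(String absolutePath, long lastModified) {
        Long previous = indexedModified.get(absolutePath);
        Action action;
        if (previous == null) {
            action = Action.ADD;
        } else if (previous < lastModified) {
            action = Action.UPDATE; // the PDF was recreated after it was indexed
        } else {
            action = Action.SKIP;
        }
        if (action != Action.SKIP) {
            indexedModified.put(absolutePath, lastModified);
        }
        return action;
    }
}
```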
Hi all,
In a clustered environment, I search the index from the web
application. In the web application I create a new IndexReader on each
request. Is it expensive to do it like this? I read somewhere on the web
that one should reuse the same reader as much as possible. Can I keep the
initially created IndexR
Regarding Part 3:
Data quality
For our search domain (catalog products) we very often face the problem that
the search data is full of acronyms and abbreviations like:
cable,nym-j,pvc,3x2.5mm²
or
dvd-/cd-/usb-carradio,4x50W,divx,bl
We solved this by a combination of normalization for better data
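The snippet is cut off before the details of the normalization, so the following is only a hypothetical example of one such step, not the poster's method: expanding truncated compounds such as `dvd-/cd-/usb-carradio` into full tokens so each can match a query on its own. `CatalogNormalizer.normalize` is an illustrative name; other comma-separated parts are only lowercased.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical normalization step for catalog descriptions. */
class CatalogNormalizer {
    static List<String> normalize(String field) {
        List<String> out = new ArrayList<>();
        for (String part : field.toLowerCase(Locale.ROOT).split(",")) {
            String p = part.trim();
            if (p.isEmpty()) {
                continue;
            }
            String[] alts = p.split("/");
            String last = alts[alts.length - 1];
            int dash = last.lastIndexOf('-');
            if (alts.length > 1 && dash >= 0) {
                // Shared tail of the compound, e.g. "carradio" in "usb-carradio".
                String suffix = last.substring(dash + 1);
                for (int i = 0; i < alts.length - 1; i++) {
                    String alt = alts[i];
                    // "dvd-" + "carradio" -> "dvd-carradio"; alternatives without a trailing dash stay as-is.
                    out.add(alt.endsWith("-") ? alt + suffix : alt);
                }
                out.add(last);
            } else {
                out.add(p);
            }
        }
        return out;
    }
}
```

For `dvd-/cd-/usb-carradio,4x50W,divx,bl` this yields `dvd-carradio`, `cd-carradio`, `usb-carradio`, `4x50w`, `divx`, `bl`.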
The quick answer is that the session is probably the wrong place to keep
an IndexReader, since that's per-user. I'd define a new server/servlet that
did my searching and have my webapps use that. That makes it really simple
to reuse index readers.
And reopening the IndexReader for each request will p
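A sketch of the "one shared reader" pattern, assuming a version number signals when the index has changed. The type parameter `R` stands in for `IndexReader`, and `SharedReaderHolder` is a hypothetical class. With Lucene 3.0 the equivalent step would be calling `IndexReader.reopen()`, which returns the same instance if the index is unchanged, and closing the old reader once no search is using it; that closing step is omitted here.

```java
import java.util.function.LongFunction;

/**
 * Hypothetical sketch: one reader shared by all requests, replaced only when the index changes.
 * R stands in for IndexReader; the opener stands in for opening a reader on the index.
 */
class SharedReaderHolder<R> {
    private final LongFunction<R> opener;
    private long cachedVersion = -1;
    private R cachedReader;
    private int openCount;

    SharedReaderHolder(LongFunction<R> opener) {
        this.opener = opener;
    }

    /** Returns the shared reader, opening a new one only if the index version differs. */
    synchronized R get(long indexVersion) {
        if (cachedReader == null || cachedVersion != indexVersion) {
            // A real implementation would also close the previous reader
            // once no in-flight search still uses it.
            cachedReader = opener.apply(indexVersion);
            cachedVersion = indexVersion;
            openCount++;
        }
        return cachedReader;
    }

    /** How many times a reader has been opened so far. */
    synchronized int openCount() {
        return openCount;
    }
}
```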
We discovered very soon after going to production that Lucene's scores were
often 'too precise'. For example, a page of 25 results may have several
different score values, all within 15% of each other, but to the end
user all 25 results were equally relevant. Thus we wanted the secondary sort
f
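The snippet stops before the actual solution, so this is a hypothetical sketch of one way to make scores "less precise": map each score to a coarse band relative to the top score, then sort by band and break ties with a secondary key (here, a title). `Hit`, `CoarseScoreSort` and the choice of 5 bands are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical search hit: a title plus Lucene's raw score. */
class Hit {
    final String title;
    final float score;

    Hit(String title, float score) {
        this.title = title;
        this.score = score;
    }
}

class CoarseScoreSort {
    /** Maps a score to one of `bands` bands relative to the top score (higher band = more relevant). */
    static int band(float score, float maxScore, int bands) {
        if (maxScore <= 0f) {
            return 0;
        }
        int b = (int) Math.floor(score / maxScore * bands);
        return Math.min(b, bands - 1); // the top score itself would otherwise land in band `bands`
    }

    /** Sorts by band (descending), then by title as the secondary key. */
    static List<Hit> sort(List<Hit> hits, int bands) {
        float max = 0f;
        for (Hit h : hits) {
            max = Math.max(max, h.score);
        }
        final float top = max;
        Comparator<Hit> byBandDesc =
                Comparator.comparingInt((Hit h) -> band(h.score, top, bands)).reversed();
        Comparator<Hit> byTitle = Comparator.comparing((Hit h) -> h.title);
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(byBandDesc.thenComparing(byTitle));
        return sorted;
    }
}
```

With 5 bands, every hit scoring at least 80% of the top score shares the top band, so a list whose scores all sit within 15% of its top score is ordered purely by the secondary key.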
Hello,
Lucene core doesn't seem to use relative word positioning (?) for scoring.
For example, after indexing the phrase "a b c d e f g h i j k l m n o p q r
s t u v w x y z", these queries give the same score (0.19308087):
- 1 : phrase:'e f g'
- 2 : phrase:'o k z'
I'm a bit familiar with lucen
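To make "relative word positioning" concrete, here is a simplified proximity measure, not Lucene's actual slop computation: it counts how many extra positions the query terms span beyond being adjacent, and turns that into a factor shaped like `1/(distance + 1)`, the form used by `DefaultSimilarity.sloppyFreq`. It ignores term order and assumes each query term is distinct; `Proximity` is a hypothetical class.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified proximity measure (not Lucene's slop algorithm). */
class Proximity {
    /**
     * Number of extra positions the query terms span beyond being adjacent
     * (0 = contiguous, in any order), or -1 if some term is missing.
     * Assumes distinct query terms and uses each term's first occurrence.
     */
    static int extraSpan(String[] docTokens, String[] queryTerms) {
        Map<String, Integer> firstPos = new HashMap<>();
        for (int i = 0; i < docTokens.length; i++) {
            firstPos.putIfAbsent(docTokens[i], i);
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (String term : queryTerms) {
            Integer p = firstPos.get(term);
            if (p == null) {
                return -1;
            }
            min = Math.min(min, p);
            max = Math.max(max, p);
        }
        return (max - min + 1) - queryTerms.length;
    }

    /** Factor shaped like 1/(distance + 1); 0 when the terms do not all occur. */
    static float proximityFactor(int extraSpan) {
        return extraSpan < 0 ? 0f : 1.0f / (extraSpan + 1);
    }
}
```

For the alphabet document, `e f g` spans no extra positions (factor 1.0), while `o k z` spans 13 (factor 1/14).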
Grant,
We are currently working on a relevancy improvement project. We took IBM's
paper from TREC 2007 and followed the approaches they described to improve
Lucene's relevance. It also gave us some idea of Lucene’s out-of-the-box
precision performance (MAP). In addition, we used som
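For reference, MAP is the mean, over all queries, of the average precision AP = (1/R) · Σ_k P(k) · rel(k), where R is the number of relevant documents for the query, P(k) is the precision at rank k, and rel(k) is 1 if the hit at rank k is relevant and 0 otherwise. A minimal sketch of that computation; the class and method names are illustrative.

```java
/** Illustrative mean average precision (MAP) computation. */
class MeanAveragePrecision {
    /**
     * relevant[k] is true if the hit at rank k+1 is relevant;
     * totalRelevant is R, the number of relevant documents for the query.
     */
    static double averagePrecision(boolean[] relevant, int totalRelevant) {
        if (totalRelevant == 0) {
            return 0.0;
        }
        int relevantSoFar = 0;
        double sum = 0.0;
        for (int k = 0; k < relevant.length; k++) {
            if (relevant[k]) {
                relevantSoFar++;
                sum += (double) relevantSoFar / (k + 1); // precision at this rank
            }
        }
        return sum / totalRelevant;
    }

    /** Mean of the per-query average precisions. */
    static double meanAveragePrecision(boolean[][] runs, int[] totalRelevant) {
        if (runs.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (int q = 0; q < runs.length; q++) {
            sum += averagePrecision(runs[q], totalRelevant[q]);
        }
        return sum / runs.length;
    }
}
```

For a run with relevance pattern [yes, no, yes] and R = 2, AP = (1/1 + 2/3) / 2 ≈ 0.833.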
On Mon, May 3, 2010 at 15:11, Adriano Crestani
wrote:
> I actually never liked how QueryNode -> query string is done today, using
> QueryNode.toQueryString(...) method. A QueryNode shouldn't be responsible
> for converting itself back to the string format, because different
> SyntaxParser(s) may c
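A hypothetical sketch of the separation being argued for: query nodes carry only structure, and each syntax supplies its own formatter. `Node`, `TermNode`, `AndNode` and the two formatters are illustrative names, not the flexible query parser's classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical query tree: nodes carry structure only, no string conversion. */
interface Node {}

class TermNode implements Node {
    final String field;
    final String text;

    TermNode(String field, String text) {
        this.field = field;
        this.text = text;
    }
}

class AndNode implements Node {
    final List<Node> children;

    AndNode(Node... children) {
        this.children = Arrays.asList(children);
    }
}

/** Each syntax provides its own way of turning a tree back into a string. */
interface QueryStringFormatter {
    String format(Node node);
}

/** Classic-style output, e.g. (title:lucene AND body:index). */
class ClassicFormatter implements QueryStringFormatter {
    public String format(Node node) {
        if (node instanceof TermNode) {
            TermNode t = (TermNode) node;
            return t.field + ":" + t.text;
        }
        if (node instanceof AndNode) {
            return ((AndNode) node).children.stream()
                    .map(this::format)
                    .collect(Collectors.joining(" AND ", "(", ")"));
        }
        throw new IllegalArgumentException("unsupported node: " + node);
    }
}

/** A second, hypothetical prefix syntax built from the same tree. */
class PrefixFormatter implements QueryStringFormatter {
    public String format(Node node) {
        if (node instanceof TermNode) {
            TermNode t = (TermNode) node;
            return "(term " + t.field + " " + t.text + ")";
        }
        if (node instanceof AndNode) {
            return ((AndNode) node).children.stream()
                    .map(this::format)
                    .collect(Collectors.joining(" ", "(and ", ")"));
        }
        throw new IllegalArgumentException("unsupported node: " + node);
    }
}
```

The same tree then prints as `(title:lucene AND body:index)` with one formatter and `(and (term title lucene) (term body index))` with the other, without the nodes knowing either syntax.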