On 30 Dec 2008 at 17:13, Lebiram wrote:
Hi Lebiram,
contrib/misc contains a couple of tools that might be of help.
Just wanted to reconstruct a new index based on an existing
index (but turning off norms), that's all.
If you want to create an identical index but without norms use
FieldNormModi
Tom:
Have a look at ASCIIFoldingFilter.
o...@lesina:~/workspace/asf-lucene$ svn log
./src/java/org/apache/lucene/analysis/ASCIIFoldingFilter.java
r724053 | markrmiller | 2008-12-06 18:25:42 -0500 (Sat, 06 Dec 2008) | 1 line
Hi All,
Thanks for the reply.
Just wanted to reconstruct a new index based on an existing index (but turning
off norms), that's all.
However, as it is nearly impossible to extract the terms of unstored fields,
we might think of other ways.
Thanks for the inputs guys!
Actually, you can reconstruct the text, but it's a lossy process. Stop words
aren't in the index for instance. And it's very time-consuming. Luke makes
a "best guess" at this process, so you might want to take a look at that
code. But even the very bright folks who put Luke together caution that
it
You might want to take a look at using the ISOLatin1AccentFilter or similar
at both index and query time. It basically folds accented characters into
their unaccented form.
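
A minimal sketch of the folding idea described above, for readers who want to see it in isolation. It does not use any Lucene filter; it is a plain-JDK stand-in built on java.text.Normalizer, and the class and method names (AccentFolding, fold) are hypothetical. The point it illustrates is that the same folding has to be applied to the indexed text and to the query text, so that both end up as the same term.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene: folds accented characters to
// their base letters using only the JDK.
final class AccentFolding {

    // Decompose each character (e.g. e-acute becomes 'e' plus a combining
    // accent), then drop every combining mark.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "m\u00e9tal" is "metal" with an e-acute, written as an escape so the
        // source file encoding does not matter.
        String indexedTerm = fold("m\u00e9tal"); // applied at index time
        String queryTerm = fold("metal");        // applied at query time
        System.out.println(indexedTerm.equals(queryTerm));
    }
}
```

Letters that have no canonical decomposition (for example "ø" or "æ") pass through this sketch unchanged, so a real filter would need an explicit mapping for them.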
Matthew:
You wrote:
<<>>
I also did this before realizing that the second field is unnecessary.
Storing is
orthogonal to in
That is my understanding of it too. Terms in the index will point to the
position of the tokens they map to. Since one index term can point at any
number of tokens, this isn't a sequence map, but just a search map. If you
still have the text that was indexed you could run it through an analyzer
Just thought I'd comment since I had to do word processing before indexing
in my application as well. Matt's method is pretty similar to what I did.
I wrote a filter that transforms the tokens as they get indexed (and also
use that for searching). Since I am indexing a block of words, rather than
If you are constrained in such a way as to not use the French Analyzer
you might instead consider transforming the input as an additional step
at both search/indexing time.
Use something like a regex that looks for é and always replaces it with
e in the index, and at search time. (expand this
Dear all,
I'd like my Lucene searches to be insensitive to (French) accents. For example,
considering an indexed term "métal", I want to get it when searching for "metal"
or "métal". I use lucene-2.3.2 and the searches are performed with:
IndexSearcher.search(query,filter,sorter), Another filte
I am not sure, but from my understanding fields that are only indexed and not
stored do not keep positions. So even if you get back all terms for a field
for a given document, you won't be able to reconstruct the original word
sequence.
And remember that not all words are indexed.
Alex
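
For readers following the reconstruction question in this thread, here is a small sketch of why rebuilding text from an index is lossy even in the case where term positions are available. It is a toy positional index written against plain Java collections; it is not Lucene's index format and not Luke's reconstruction code, and every name in it (PositionalIndexSketch, index, reconstruct) is hypothetical. Terms are mapped to the positions where they occurred, stop words are skipped at index time, and the reconstruction can only mark the resulting holes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical toy structure for illustration only.
final class PositionalIndexSketch {

    // Maps each non-stop-word term to the token positions where it occurs.
    // Stop words are not indexed, but they still consume a position.
    static Map<String, List<Integer>> index(String text, Set<String> stopWords) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        String[] tokens = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            if (stopWords.contains(tokens[pos])) {
                continue;
            }
            postings.computeIfAbsent(tokens[pos], t -> new ArrayList<>()).add(pos);
        }
        return postings;
    }

    // Re-sorts the terms by position. Positions with no term (dropped stop
    // words) are rendered as "_"; original case and punctuation are gone.
    static String reconstruct(Map<String, List<Integer>> postings) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int pos : entry.getValue()) {
                byPosition.put(pos, entry.getKey());
            }
        }
        StringBuilder out = new StringBuilder();
        int expected = 0;
        for (Map.Entry<Integer, String> entry : byPosition.entrySet()) {
            for (; expected < entry.getKey(); expected++) {
                out.append("_ ");
            }
            out.append(entry.getValue()).append(' ');
            expected++;
        }
        return out.toString().trim();
    }
}
```

Indexing "The norms of the index" with "the" and "of" as stop words and then calling reconstruct gives back only the two surviving terms with gaps between them, which is the kind of "best guess" output the thread describes.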
2008/12/30 Lebi
OK I think I see what's going on here... I'll open an issue & fix it.
Thanks Shalin!
Mike
Shalin Shekhar Mangar wrote:
> Hello,
>
> Solr uses IndexCommit#getFileNames() to get a list of files for
> replication.
> One windows user reported an exception which looks like it may have been
> caused
On Tuesday 30 December 2008 10:03:03, Claudia Santos wrote:
> Hello,
>
> I would like to know more about Lucene's retrieval model, more
> specifically about the boolean model.
> Is that a standard model or an extended model? I mean, it returns
> just documents that match the boolean expression or
Hi All,
Is it possible to extract the text that was indexed but not stored for a field
in a document?
Right now, reader.document() returns only fields that were stored. However, I'd
also like to get the text of the indexed-only field...
I'd appreciate your help
JDBM is surely a better way than an in-memory hash map.
But I feel since all previous documents are already in the index, although
not closed yet, there should be a way to read all previous terms.
It's ok to use additional data structure, like JDBM or hash map, to
duplicate the terms, in order to look
Hello,
I would like to know more about Lucene's retrieval model, more specifically
about the boolean model.
Is that a standard model or an extended model? I mean, it returns just
documents that match the boolean expression or include in the search result
all Documents
which correspond to the gi
15 matches