Hi,
Karl, thanks for the reply, but I am not able to follow you. Should I extend the Query
class, and how should I get the matching term? Can you please elaborate?
Regards,
Allahbaksh
-Original Message-
From: Karl Wettin [mailto:[EMAIL PROTECTED]
Sent: Monday, February 11, 2008 9:53 PM
To: ja
Excellent. MemoryIndex solves the problem. I didn't know about this
index. Thanks.
-Nilesh
On Feb 8, 2008 8:23 AM, Erick Erickson <[EMAIL PROTECTED]> wrote:
> You might want to check out MemoryIndex before rejecting putting a single
> doc in memory and searching against it. It's quite fast, altho
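A tiny sketch of that contrib class (org.apache.lucene.index.memory.MemoryIndex),
with a made-up field, text, and query purely for illustration:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.queryParser.QueryParser;

public class SingleDocMatch {
  public static void main(String[] args) throws Exception {
    // Build a throwaway in-memory index holding exactly one document.
    MemoryIndex index = new MemoryIndex();
    index.addField("content", "readings about salmon and other fish",
                   new StandardAnalyzer());
    // search() returns a relevance score; > 0.0f means the doc matches.
    float score = index.search(
        new QueryParser("content", new StandardAnalyzer()).parse("fish"));
    System.out.println(score > 0.0f ? "match, score " + score : "no match");
  }
}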
: I read the doc for the IndexReader.setNorm() API after I posted the question
: earlier. To use setNorm() to modify the field boost, it seems to me that
: one has to know how the boost is folded into the norm (in the default impl, it's
: boost * lengthNorm) and has to know the old norm value wh
thanks, Hoss!
I read the doc for the IndexReader.setNorm() API after I posted the
question earlier. To use setNorm() to modify the field boost, it
seems to me that one has to know how the boost is folded into the norm (in
the default impl, it's boost * lengthNorm) and has to know the old norm
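For what it's worth, a minimal sketch of that dance against the 2.3 API,
assuming the default Similarity (norm = boost * lengthNorm) and that you can
recover the field's token count; the class and parameter names here are
placeholders:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Similarity;

public class BoostUpdater {
  /** Re-derives and writes the norm for one doc's field, assuming the
   *  default Similarity, where norm = boost * lengthNorm(numTokens). */
  public static void updateBoost(String indexPath, int docId, String field,
                                 int numTokens, float newBoost)
      throws IOException {
    IndexReader reader = IndexReader.open(indexPath);
    float lengthNorm = Similarity.getDefault().lengthNorm(field, numTokens);
    reader.setNorm(docId, field, newBoost * lengthNorm);
    reader.close(); // commits the norm change
  }
}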
: It's clear that there is no easy way to do "in-place" doc update in the lucene
: index, but I think it should be theoretically possible to update the field and
: doc boostings in place, that is, without deleting and re-adding the doc and
: its fields. Does anyone know how?
boosts are folded in
The way I've always done this was to index two fields: say, "contents"
and "contents_unstemmed", (using a PerFieldAnalyzer) and then query
on both of them. This has the double effect of a) boosting unstemmed
hits, because every unstemmed match is also a stemmed one, so the
BooleanQuery combining
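A rough sketch of that setup, assuming the contrib Snowball analyzer for
stemming; the class, field, and method names are made up for illustration,
and the same text must be added to both fields at index time:

import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class StemmedPlusUnstemmed {
  // Index time: stem "contents", leave "contents_unstemmed" unstemmed.
  public static PerFieldAnalyzerWrapper analyzer() {
    PerFieldAnalyzerWrapper wrapper =
        new PerFieldAnalyzerWrapper(new SnowballAnalyzer("English"));
    wrapper.addAnalyzer("contents_unstemmed", new StandardAnalyzer());
    return wrapper;
  }

  // Query time: OR both fields. An exact word matches both clauses and so
  // scores higher than a match that only survives stemming.
  public static BooleanQuery query(String stemmedTerm, String exactTerm) {
    BooleanQuery q = new BooleanQuery();
    q.add(new TermQuery(new Term("contents", stemmedTerm)),
          BooleanClause.Occur.SHOULD);
    q.add(new TermQuery(new Term("contents_unstemmed", exactTerm)),
          BooleanClause.Occur.SHOULD);
    return q;
  }
}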
Well, it is done now.
In the end, I resigned myself to "double-storing". This way, I have
indexed the original text with the COMPRESSED option to save some space.
And to highlight the results correctly, I did some matching between the
unaccented words and the original words with regular expressions, an
Heads up!
We are working through what looks like an index corruption issue when
you use autoCommit=false with IndexWriter, in Lucene 2.3, so please
try to avoid doing so if you can...
Details are here:
https://issues.apache.org/jira/browse/LUCENE-1173
Mike
--
Hi,
It's clear that there is no easy way to do "in-place" doc update in the
lucene index, but I think it should be theoretically possible to update
the field and doc boostings in place, that is, without deleting and
re-adding the doc and its fields. Does anyone know how?
Thanks!
Jay
-
Cool man.
The Hits.id(int) worked fine. Thanks for the detailed info.
And hopefully your answer is going to be useful for future Google searches. ;)
Cesar
Steven A Rowe wrote:
>
> Hi Cesar,
>
> On 02/11/2008 at 2:19 PM, Cesar Ronchese wrote:
>> I'm running into problems with document deletion.
Basically, the index is big because there is a large number of
documents, but each individual document is very small. There is also a
lot of redundancy, which, I believe, is also why the index size is fairly
small.
Basically I am using the index to store the user's profile information,
and
On Feb 11, 2008, at 4:00 PM, Cesar Ronchese wrote:
For example:
Indexed word: usuário
Terms typed by the user, to find the word above: usuário or usuario or
usuãrio, etc.
If you feel ambitious, you can try something along the lines of Sean
M. Burke's Unidecode!:
http://interglacial.com/~s
Hi Cesar,
On 02/11/2008 at 2:19 PM, Cesar Ronchese wrote:
> I'm running into problems with document deletion.
> [...]
> This simply doesn't delete anything from the Index.
>
> //see the code sample:
> //"theFieldName" was previously stored as Field.Store.YES and
> Field.Index.TOKENIZED.
> Term t =
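The quoted code is cut off above; for context, a minimal hedged sketch of
delete-by-term against the 2.3 API, with placeholder names, illustrating the
usual failure mode:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;

public class Deleter {
  /** Deletes every doc whose indexed field contains exactly this term.
   *  With a TOKENIZED field the value must match a token the analyzer
   *  produced (typically lowercased), not the raw stored string; the
   *  usual fix is an UN_TOKENIZED id field and deleting by that. */
  public static int deleteByTerm(String indexPath, String field, String value)
      throws IOException {
    IndexReader reader = IndexReader.open(indexPath);
    int deleted = reader.deleteDocuments(new Term(field, value));
    reader.close(); // deletes are flushed on close
    return deleted; // 0 here usually means the term never matched anything
  }
}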
Oops!
Found a situation here Karl:
If the content is stored without accents, everything is OK.
But, as my content is stored with accents, and the ISOLatin1AccentFilter just
removes the accents from the search terms, nothing is returned to my Hits
collection.
Any idea how to fix it?
Woot, Karl.
It worked like a charm! It even worked with the Highlighter. THANKS!
karl wettin-3 wrote:
>
>
> On 11 Feb 2008, at 18:16, Cesar Ronchese wrote:
>
>> I don't know how to apply that filter to the Query object.
>
> It is a TokenStream you filter, not the Query. In your case the
> TokenStr
Hey All.
I'm running into problems with document deletion. I tried to use the
DeleteDocuments() and DeleteDocument() methods; both have problems, as
explained below:
1) DeleteDocuments(term)
This simply doesn't delete anything from the Index.
//see the code sample:
//"theFieldName" was pre
Ah, very cool. Thanks for the tip.
-M
On Feb 11, 2008 10:58 AM, Erick Erickson <[EMAIL PROTECTED]> wrote:
> You have to be a bit clever. You can certainly inject the original with
> an
> increment of 0. See SynonymAnalyzer in Lucene In Action. This will not
> break phrase queries since your two
You have to be a bit clever. You can certainly inject the original with an
increment of 0. See SynonymAnalyzer in Lucene In Action. This will not
break phrase queries since your two tokens occupy the same position.
But you'll have to do something like add a $ to the original at index time.
That w
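A sketch of what such a filter might look like against the 2.3 token API;
KeepOriginalFilter and its toy stem() are stand-ins invented here, not code
from the book:

import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

/** Emits the stemmed form plus the original prefixed with '$' at the
 *  same position (increment 0), so phrase queries keep working. */
public class KeepOriginalFilter extends TokenFilter {
  private Token pending; // marked original waiting to be emitted

  public KeepOriginalFilter(TokenStream in) { super(in); }

  public Token next() throws IOException {
    if (pending != null) { // emit the queued original first
      Token t = pending;
      pending = null;
      return t;
    }
    Token t = input.next();
    if (t == null) return null;
    String original = t.termText();
    String stemmed = stem(original);
    if (!stemmed.equals(original)) {
      pending = new Token("$" + original, t.startOffset(), t.endOffset());
      pending.setPositionIncrement(0); // same position as the stemmed token
      return new Token(stemmed, t.startOffset(), t.endOffset());
    }
    return t;
  }

  // Toy stand-in; a real filter would delegate to an actual stemmer.
  private String stem(String s) {
    return s.endsWith("s") ? s.substring(0, s.length() - 1) : s;
  }
}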
On 11 Feb 2008, at 18:16, Cesar Ronchese wrote:
I don't know how to apply that filter to the Query object.
It is a TokenStream you filter, not the Query. In your case the
TokenStream is produced by the QueryParser invoking
analyzer.tokenStream(field, new StringReader(input)). So what you have
to
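In other words, give the QueryParser an analyzer whose tokenStream() applies
the filter, and index with that same analyzer. A minimal sketch (the class
name is made up):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ISOLatin1AccentFilter;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;

/** Strips accents at both index and query time, so "usuário", "usuario"
 *  and "usuãrio" all reduce to the same term. */
public class AccentInsensitiveAnalyzer extends Analyzer {
  public TokenStream tokenStream(String field, Reader reader) {
    TokenStream stream = new StandardTokenizer(reader);
    stream = new LowerCaseFilter(stream);
    return new ISOLatin1AccentFilter(stream);
  }
}

Pass it to both the IndexWriter and the QueryParser, e.g.
new QueryParser("contents", new AccentInsensitiveAnalyzer()).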
See below...
On Feb 11, 2008 12:17 PM, Cesar Ronchese <[EMAIL PROTECTED]> wrote:
>
> Hey, Erick. You inferred right.
>
> I analyzed your code, and it looks like common indexing and searching
> code.
> Are you sure you pasted the correct code? :P
>
Did you try to run it? It's just a self-contai
Hi all,
I've got an index with tokens that are stemmed. Sometimes I really need to
boost the unstemmed version of a query word to get the most relevant
documents.
Example:
Query: [olives].
I don't want to match documents with the words: oliver, oliver's, etc...
Since I'm stemming when creating t
Hey, Erick. You inferred right.
I analyzed your code, and it looks like common indexing and searching code.
Are you sure you pasted the correct code? :P
Anyway, is the concept about double-storing the data, one copy with
accents and the other without? If yes, I did it earlier, but once I search i
> One more thing,
> are you aware of that you are supposed to apply that filter on the
> query too?
I don't know how to apply that filter to the Query object. I've searched to see
if it is possible, but I can't find any references. If it is possible, do you have
a quick example?
I'm searching this way:
Right now there is not a good way, other than to use the
TermPositions. See https://issues.apache.org/jira/browse/LUCENE-1001
for some thoughts on adding the ability. Unfortunately, I ran into a
roadblock, and haven't been able to get back to it. If you feel you
can submit a patch, it w
Cedric Ho wrote:
On Feb 9, 2008 12:07 AM, Ruslan Sivak <[EMAIL PROTECTED]> wrote:
The app does other things than search the index. I'm basically using
ColdFusion for the website and have four instances running on two
servers for load balancing. Each app does the searches, and the search
tim
You would have to collect the payloads from matching terms by
extending a query.
See this recent thread:
http://www.nabble.com/Faceting-with-payloads-td15322956.html#a15322956
Are you sure this is what you want to do? What is it you store in the
payloads, and how do you plan to use this info
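Short of extending a query, a hedged sketch of reading payloads directly
through TermPositions in 2.3, assuming the payload bytes are readable as a
string; the class and method names are placeholders:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;

public class PayloadDumper {
  /** Prints the payload stored at each position of the given term. */
  public static void dump(IndexReader reader, Term term) throws IOException {
    TermPositions tp = reader.termPositions(term);
    while (tp.next()) {                     // each doc containing the term
      for (int i = 0; i < tp.freq(); i++) { // each position within that doc
        tp.nextPosition();                  // must precede getPayload()
        if (tp.isPayloadAvailable()) {
          byte[] payload = tp.getPayload(new byte[tp.getPayloadLength()], 0);
          System.out.println("doc " + tp.doc() + ": " + new String(payload));
        }
      }
    }
    tp.close();
  }
}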
Hey Karl. Thanks for the response. I have some more doubts:
1) About the ISOLatin1AccentFilter class:
> What is the problem you have with this? Are they not unique enough?
I need to store the words the way they were written. So, if the text to be
indexed contains the word "usuário", my user expe
I'm inferring that you need the original text for display purposes or some
such,
but want to search a "canonical" form. So the following may be totally
irrelevant if my inference is wrong.
Indexed and stored are two very distinct things in Lucene. If you create
a field that is both stored and
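To make the distinction concrete, a small hypothetical example: with an
accent-stripping analyzer on the IndexWriter, one field can be stored with
the original accents for display while its indexed tokens are canonical:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Stored AND indexed: Document.get("contents") later returns the raw
// accented text, while searching runs against the analyzer's tokens.
Document doc = new Document();
doc.add(new Field("contents", "o usuário entrou",
                  Field.Store.YES, Field.Index.TOKENIZED));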
Hi,
Thanks for the reply, but is there any way I can get the Payload from the
search result?
My requirement is: when the user searches on some field, I also want to
display additional data which is stored as a Payload.
Regards,
Allahbaksh
-Original Message-
From: Karl Wettin [mailto:[E
Has anyone contributed an IndexDeletionPolicy that has been tested on an
NFS system?
Bob Hastings
Ancept Inc.
On 11 Feb 2008, at 14:46, Allahbaksh Mohammedali Asadullah wrote:
d.add(new Field("f1", "This field has no payloads", Field.Store.NO,
Field.Index.TOKENIZED));
d.add(new Field("f2", "This field has payloads in all docs",
Field.Store.YES, Field.Index.TOKENIZED));
Document doc = hits.doc(i);
He
On 11 Feb 2008, at 16:08, Karl Wettin wrote:
All I could find is about the ISOLatin1AccentFilter class, which, as far as I
could understand, just removes the accented chars so I can store the text in
its unaccented form.
What is the problem you have with this? Are they not unique enough?
One more
On 11 Feb 2008, at 16:00, Cesar Ronchese wrote:
Hello, guys.
I've been searching Google for a way to make Lucene perform
accent-insensitive searches.
All I could find is about the ISOLatin1AccentFilter class, which, as far as I
could understand, just removes the accented chars so I can store
it
http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/document/Field.Index.html#NO_NORMS
?
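That is, if a field can live without index-time boosts and length
normalization, something like this (hypothetical field name) avoids the
one-byte-per-document norms array entirely:

import org.apache.lucene.document.Field;

// NO_NORMS also indexes the value without an analyzer in 2.3, so it suits
// keyword-like fields; SegmentReader then caches no norms for this field.
Field f = new Field("category", "sports",
                    Field.Store.NO, Field.Index.NO_NORMS);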
On 11 Feb 2008, at 15:55, <[EMAIL PROTECTED]> wrote:
Hi Grant,
Lucene 2.2.0
I'm not actually explicitly storing term vectors. It seems the huge
amount of byte arrays is actually coming from SegmentR
Hello, guys.
I've been searching Google for a way to make Lucene perform
accent-insensitive searches.
All I could find is about the ISOLatin1AccentFilter class, which, as far as I
could understand, just removes the accented chars so I can store it in
its unaccented form.
What I would like to know is,
Hi Grant,
Lucene 2.2.0
I'm not actually explicitly storing term vectors. It seems the huge
amount of byte arrays is actually coming from SegmentReader.norms. Maybe
that cache grows constantly, as I read somewhere that it's loaded on demand.
I'm not using any field or document boosting... is there some way
No, it's split into about 100 individual indexes. But I'm running my
64-bit JVM with around 10 GB max memory in order to avoid running out of
memory after running all my unit tests (I have some other indexes as
well running as part of this application).
Upon further investigation, it seems to have
Hi,
I have saved payloads in my index. When the user types a query, I get a Hit
document. From the Hit document, how can I get the value of the Payload for a
particular term?
For example
_analyzer = new PayloadAnalyzer();
_writer = new IndexWriter(new File("d:/test1"), _analyz
Solr has a strategy using rsync that makes it relatively easy to copy
an index around to other servers. It uses rsync to just copy the
diffs, so you could easily mirror this in your application.
There is no SQL backend for Lucene, but at 4 MB you could certainly
serialize it as a blob to a
Hi Marc,
Can you give more info about what your field properties are? Your
subject line implies you are storing term vectors, is that the case?
Also, what version of Lucene are you using?
Cheers,
Grant
On Feb 8, 2008, at 10:51 AM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]
> wrote:
Hi,