Thanks for responding, Jonathan. I will look into the k-grams approach.
The objects could differ by small local changes. To provide some business
context, the application requires indexing email messages and attachments. If
the attachments differ by some threshold (users making edits/reviews), the
Hi, I need to compare two Base64 string representations of some MIME content
that I am storing within a Lucene index. I need to efficiently compare them to
find the closest match to a query Base64 string, post Lucene query.
I am not sure of the best way to approach this. Could I compare the hash
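One way to picture the k-grams idea mentioned in this thread is a set-overlap score between two strings. The sketch below is only an illustration, not something proposed by the posters: the class name `KGramSimilarity`, the choice of Jaccard similarity, and the sample strings are all assumptions. It compares the sets of length-k substrings of two Base64 strings and returns a value between 0 and 1.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: k-gram (shingle) overlap between two strings.
class KGramSimilarity {
    // Collect the set of all contiguous substrings of length k.
    static Set<String> kGrams(String s, int k) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            grams.add(s.substring(i, i + k));
        }
        return grams;
    }

    // Jaccard similarity: size of the intersection divided by size of the union.
    static double jaccard(String a, String b, int k) {
        Set<String> ga = kGrams(a, k);
        Set<String> gb = kGrams(b, k);
        if (ga.isEmpty() && gb.isEmpty()) {
            return 1.0; // both strings shorter than k: treat as identical
        }
        Set<String> intersection = new HashSet<>(ga);
        intersection.retainAll(gb);
        Set<String> union = new HashSet<>(ga);
        union.addAll(gb);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Made-up sample strings, for illustration only.
        String stored = "SGVsbG8gV29ybGQh";
        String query = "SGVsbG8gV29ybGQ/";
        System.out.println(jaccard(stored, query, 4));
    }
}
```

Note that Base64 maps each 3-byte group to 4 characters, so inserting a single byte can change every character after the edit. Running the same comparison on the decoded bytes may therefore be more robust than comparing the Base64 text directly.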
: The application which uses the index expects this in same field. So, can't use
: two fields.
Be careful about terminology here ... there are
"org.apache.lucene.document.Field" objects, and then there are "fields" or
"field names"
you can index a Document containing multiple "Field" objects
It's hard to predict the future of LUCENE-831. I would bet that it will
end up in Lucene at some point in one form or another, but it's hard to
say if that form will be what's in the available patches (I'm a contrib
committer so I won't have any real say in that, so take that prediction
with a gra
H, I don't understand payloads, but it seems to me that it *might*
apply. Search the mail list for "payload" and/or look at the docs. Payloads
were added after the last time I had to really dig into Lucene.
But from what I've seen going by on the thread, it may be what you need.
But then I cou
I have the same problem with cache and too many sorted fields, and had to
implement a big workaround to be able to plug my own cache implementation in
Lucene 2.3.2. What I'd really like to see in the new cache implementation is
easier pluggability and extension of the Lucene classes, which is curre
On Fri, Nov 14, 2008 at 12:05 PM, Teruhiko Kurosaka <[EMAIL PROTECTED]> wrote:
> My problem with PhraseQuery is that it requires
> all the terms to exist in the documents. I want it to be more
> permissive. I want it to match with a lower score.
> Does dismax also require all the terms?
The mandato
Hi,
I recently saw activity on LUCENE-831 (Complete overhaul of FieldCache
API/Implementation) which I have interest in.
I posted previously on this with my concern that given the current default
cache I sometimes get OOM-errors because I have a lot of fields which are
sorted on, which ultimate
Thanks Erick!
The application which uses the index expects this in same field. So,
can't use two fields.
Anyway, thank you guys for your quick responses!
thanks
ravi
On 14-Nov-08, at 6:38 PM, Erick Erickson wrote:
As far as I know you can't do this with just one field. Why do you
care?
Yonik,
Thank you for your reply.
My problem with PhraseQuery is that it requires
all the terms to exist in the documents. I want it to be more
permissive. I want it to match with a lower score.
Does dismax also require all the terms?
> Solr's dismax parser can generate queries that do most of
> t
Hi All,
Based on your valuable inputs, we tried a few experiments with number of
threads. The observation is, if the number of threads is one less than
the number of cores (we have 'main' as a separate thread. Essentially,
including 'main' number of threads equal to number of cores), the
indexi
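The thread-count rule described above (one worker fewer than the number of cores, with 'main' occupying the remaining core) could be sized along these lines. This is a minimal sketch under assumptions: the class name `IndexingThreads`, the `workerCount` helper, and the placeholder tasks are hypothetical, and the original poster's actual setup is not shown in the message.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sizing helper for an indexing thread pool.
class IndexingThreads {
    // Workers = cores - 1, leaving one core for 'main'; never fewer than 1.
    static int workerCount(int cores) {
        return Math.max(1, cores - 1);
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workerCount(cores));
        for (int i = 0; i < 8; i++) {
            final int batch = i;
            // Placeholder task; a real one would add documents to a shared index writer.
            pool.submit(() -> System.out.println("indexing batch " + batch));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

The best value depends on the hardware and on how much of the work is I/O-bound, so it is worth measuring a few settings rather than relying on the formula alone.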
Hi Mayur,
Solr has built-in support for facets. I don't understand what you mean by
scoped searches. Could you please give a concrete example?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
From: "Bapat, Mayur" <[EMAIL PROTECTED]>
To: ja
Solr's dismax parser can generate queries that do most of this... it's
a combination of term queries and sloppy phrase queries.
Simplest example:
+(DEF GHI) "DEF GHI"~10^5
The only thing that it doesn't work for is the terms out of order
(they will still be matched). You could use span queries i
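To make the shape of that example query concrete, here is a small sketch that assembles the same query string from a list of terms. It only builds text in the query-parser syntax shown above and does not call any Lucene or Solr API; the class name `RelaxedPhraseQuery` and the `build` method are hypothetical, and the terms are assumed to need no escaping.

```java
// Hypothetical helper that builds a query string of the form
//   +(t1 t2 ...) "t1 t2 ..."~slop^boost
class RelaxedPhraseQuery {
    static String build(String[] terms, int slop, int boost) {
        // Assumption: terms contain no query-syntax special characters.
        String joined = String.join(" ", terms);
        // Required clause over the individual terms,
        // followed by an optional, boosted sloppy phrase over the same terms.
        return "+(" + joined + ") \"" + joined + "\"~" + slop + "^" + boost;
    }

    public static void main(String[] args) {
        System.out.println(build(new String[] {"DEF", "GHI"}, 10, 5));
    }
}
```

The first clause decides which documents match, while the sloppy phrase only adds score when the terms appear near each other.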
PhraseQuery requires that all the terms in the phrase
exist in the field being searched. I am looking
for a more permissive version of PhraseQuery which
is sensitive to the order of the terms but
allows missing terms, which would lower the score
but still match.
For example, query "DEF GHI" would
I think to do this efficiently you'd need to modify Lucene's builtin
query classes (eg TermQuery) such that during the scoring process, in
addition to simply computing its contribution to the document's score,
it would also record further information like total number of
occurrences of ea
As far as I know you can't do this with just one field. Why do you
care? Storing two fields, one indexed but not stored and one stored
but not indexed shouldn't use very many resources.
Best
Erick
On Fri, Nov 14, 2008 at 3:06 AM, Ravi L <[EMAIL PROTECTED]> wrote:
> Thanks Anshum!
>
> This can be
>>BTW, if you have a small test index with multiple commit points could you
>>please send it to me off the list?
See the "setup" method in the junit test "TestTransactionRollbackCapability2"
attached here: https://issues.apache.org/jira/browse/LUCENE-1449
Cheers,
Mark
- Original Message
mark harwood wrote:
Hi Andrzej,
Thanks for the update. Looks like you've been busy adding some great
new features!
I think you may have a bug in opening an index with prior commit
points, though. I want to keep these in my index and so I opened it
in Luke selecting the "open read only" and "kee
Hi Andrzej,
Thanks for the update. Looks like you've been busy adding some great new
features!
I think you may have a bug in opening an index with prior commit points,
though. I want to keep these in my index and so I opened it in Luke selecting
the "open read only" and "keep all commit points
Thanks Anshum!
This is possible. But what I am looking for is a way to do this with only
one field.
thanks
ravi
On 14-Nov-08, at 1:32 PM, Anshum wrote:
Hi Ravi,
In that case, you could have 2 fields. One of them would be indexed
(i.e.
"foo bar") and you could use the other only to store as p
Hi,
Does Lucene support Scoped Searches? My intention is to index an XML
String and search for a matching element/attribute value from that XML
by specifying scope(path).
Also is there any direct support for Facets building in Lucene?
Regards,
Mayur
-
Hi Ravi,
In that case, you could have 2 fields. One of them would be indexed (i.e.
"foo bar") and you could use the other only to store as per your logic.
Hope this solves your purpose.
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the opin