Thanks for that info. These indexes will be large, in the 10s of millions.
id field is unique and is 29 bytes. I guess that's still a lot of data to
trawl through to get to the term.
Have you tested how long it takes to look up docs from your id?
Not in indexes that size in a live environment.
What is SimilarityQueries? I'd try the explain capabilities to see
more.
On May 5, 2009, at 2:23 PM, Kamal Najib wrote:
hi all,
I got the similarity score 0.3044460713863373 between two docs which
have the same text content; is that correct? I expected 1.0. Here is
my result line:
doc: "this expression of galectin-1 in blood vessel walls was correlated with vascular"
On Tue, May 5, 2009 at 7:24 PM, Antony Bowesman wrote:
> Michael McCandless wrote:
>>
>> Lucene doesn't provide any way to do this, except opening a reader.
>>
>> Opening a reader is not "that" expensive if you use it for this
>> purpose. EG neither norms nor FieldCache will be loaded if you just
Michael McCandless wrote:
Lucene doesn't provide any way to do this, except opening a reader.
Opening a reader is not "that" expensive if you use it for this
purpose. EG neither norms nor FieldCache will be loaded if you just
enumerate the term docs.
Thanks for that info. These indexes will be large, in the 10s of millions.
hi all,
I got the similarity score 0.3044460713863373 between two docs which have the
same text content; is that correct? I expected 1.0. Here is my result line:
doc: "this expression of galectin-1 in blood vessel walls was correlated with
vascular"
doc2: "this expression of galectin-1 in blood vessel walls was correlated with
vascular"
Even I have a similar requirement. I need the percentage match.
The way I am going about it is doing 2 searches,
e.g. if my search string is "pizza cheese" and my document has "pizza
cheese ketchup"
percentage match = (score of searching "pizza cheese" in "pizza
cheese ketchup") / (score of
Lucene/Solr Meetup / May 20th, Reston VA, 6-8:30 pm
http://www.meetup.com/NOVA-Lucene-Solr-Meetup/
Join us for an evening of presentations and discussion on
Lucene/Solr, the Apache Open Source Search Engine/Platform, featuring:
Erik Hatcher, Lucid Imagination, Apache Lucene/Solr PMC: Solr power
There isn't a very clean way to do this just yet, but it is doable.
Index with positions (you might find offsets useful too) and then use
the TermVectorMapper and TermVector API call on the IndexReader (not
the termPositions). Then, you will need to implement a
TermVectorMapper that takes
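A bare-bones sketch of such a mapper, assuming the Lucene 2.4 TermVectorMapper
API; the class name, the field name "content", and what gets collected per term
are placeholders for your own logic:

  import java.util.HashMap;
  import java.util.Map;
  import org.apache.lucene.index.TermVectorMapper;
  import org.apache.lucene.index.TermVectorOffsetInfo;

  // Collects term -> positions for one document's field.
  public class PositionCollectingMapper extends TermVectorMapper {
    public final Map<String, int[]> positionsByTerm = new HashMap<String, int[]>();

    public void setExpectations(String field, int numTerms,
                                boolean storeOffsets, boolean storePositions) {
      // nothing to pre-allocate in this sketch
    }

    public void map(String term, int frequency,
                    TermVectorOffsetInfo[] offsets, int[] positions) {
      // positions is only available if the field was indexed with
      // TermVector.WITH_POSITIONS (or WITH_POSITIONS_OFFSETS)
      positionsByTerm.put(term, positions);
    }
  }

  // usage, given an open IndexReader and a doc number:
  //   PositionCollectingMapper mapper = new PositionCollectingMapper();
  //   reader.getTermFreqVector(docNumber, "content", mapper);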
Hello,
I would like to ask if anyone has tried running Lucene 1.2 with JDK 1.5.
So far I could not find any documentation stating incompatibility and/or
known bugs.
Can anyone chime in with similar experience? (Running an older Lucene library
with a newer JDK.)
Thanks,
Dave
Hi,
I know that a new encoding technique, PFOR, is being implemented in the
Lucene project [1]. Have you heard about the "Group Varint" encoding
technique from Google? There is a technical explanation in Jeffrey Dean's
talk, "Challenges in Building Large-Scale Information Retrieval
Systems".
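For anyone curious, a toy sketch of the Group Varint idea (this is not Lucene
code): four ints share one tag byte giving each value's byte length, so the
decoder avoids the per-byte branching of classic varint:

  public class GroupVarint {
    // Encodes values[0..3] into out starting at offset; returns bytes written.
    // Tag byte layout: 2 bits per value = (length in bytes - 1); the value
    // bytes follow, little-endian.
    static int encodeGroup(int[] values, byte[] out, int offset) {
      int tagPos = offset;
      int pos = offset + 1;
      int tag = 0;
      for (int i = 0; i < 4; i++) {
        int v = values[i];
        int len = 1;
        if ((v >>> 8) != 0) len = 2;
        if ((v >>> 16) != 0) len = 3;
        if ((v >>> 24) != 0) len = 4;
        tag |= (len - 1) << (i * 2);
        for (int b = 0; b < len; b++) {
          out[pos++] = (byte) (v >>> (8 * b));
        }
      }
      out[tagPos] = (byte) tag;
      return pos - offset;
    }
  }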
Maybe joseph means 'percentage of the theoretical maximum score' for the
query?
See this thread:
http://www.gossamer-threads.com/lists/lucene/java-user/61075?search_string=theoretical%20maximum%20score;#61075
Peter
On Tue, May 5, 2009 at 8:36 AM, Erick Erickson wrote:
> But to echo Chris, what
But to echo Chris, what does percentage mean? The percent
of the words that matched? So, in your example,
would document one match 75%, doc two 50% and
doc three 100%? And what would that mean to a user?
I think it would help if you backed up and told us *why*
you want these percentages. A higher
They should be very nearly the same. Under the hood, when you call
updateDocument, IndexWriter buffers up the deleted terms, and flushes
them periodically.
Mike
On Tue, May 5, 2009 at 7:42 AM, Antony Bowesman wrote:
> Just wondered which was more efficient under the hood
>
> for (int i = 0; i
Lucene doesn't provide any way to do this, except opening a reader.
Opening a reader is not "that" expensive if you use it for this
purpose. EG neither norms nor FieldCache will be loaded if you just
enumerate the term docs.
But, you can let Lucene do the same thing for you by just always using updateDocument.
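A small sketch of the reader-based check, assuming Lucene 2.4-era APIs and an
"id" field holding the unique key; the class and method names are illustrative:

  import java.io.IOException;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermDocs;
  import org.apache.lucene.store.Directory;

  public class ExistsCheck {
    // True if any document contains the given id term; enumerating term docs
    // like this does not load norms or FieldCache.
    static boolean exists(Directory directory, String id) throws IOException {
      IndexReader reader = IndexReader.open(directory);
      try {
        TermDocs termDocs = reader.termDocs(new Term("id", id));
        try {
          return termDocs.next();
        } finally {
          termDocs.close();
        }
      } finally {
        reader.close();
      }
    }
  }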
Just wondered which was more efficient under the hood.
for (int i = 0; i < size; i++)
    terms[i] = new Term("id", doc_key[i]);
This:
writer.deleteDocuments(terms);
for (int i = 0; i < size; i++)
    writer.addDocument(doc[i]);
Or this:
for (int i = 0; i < size; i++)
    writer.updateDocument(terms[i], doc[i]);
It was in contrib, thank you!
Ian Lea wrote:
>
> You can find them in the source tarball. And maybe elsewhere
> (contrib?) but I'm not sure about that.
>
>
> --
> Ian.
>
>
> On Tue, May 5, 2009 at 9:40 AM, Huntsman84 wrote:
>>
>> Hi,
>>
>> Does anybody know why at lucene API documentati
I'm adding Documents in batches to an index with IndexWriter. In certain
circumstances, I do not want to add the Document if it already exists, where
existence is determined by field id=myId.
Is there any way to do this with IndexWriter or do I have to open a reader and
look for the term id:X
joseph.christopher wrote:
>
>
> thanks for the reply,
>
> By percentage, what I meant is how closely the retrieved result
> matches the search query.
>
> for example: if I have 3 indexed documents like
>
> 1) chicken onion cheese pizza
>
> 2) mixed vegetable cheese pizza
>
> 3
Hi Bradford,
Your mail reminds me of something I recently came across:
http://svn.apache.org/repos/asf/labs/clouds/apache_cloud_computing_edition.pdf
Perhaps if you have slides accompanying your talk, you might
consider making them publicly available. I for one would
love to see them.
Best regards,
Please elaborate...
Here's a code snippet; as you can see, I'm not trying to remove anything
or requesting to remove anything.
//Perform indexing
for (Class entityType : entityTypes){
//read the data from the database
//Scrollable res
I guess you are trying to remove, or requesting to remove, a null-referenced
object.
Manish B. Joshi
(Adserving Team)
On Tue, May 5, 2009 at 1:58 PM, Enrico Goosen
wrote:
> Hi,
>
>
>
> I’m new to Lucene, and I’m getting an exception while trying to do a manual
> indexing operation on one of my entities.
These query types are in the contrib package lucene-regex.jar.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Huntsman84 [mailto:tpgarci...@gmail.com]
> Sent: Tuesday, May 05, 2009 10:41 AM
> To: java-u
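Once lucene-regex.jar from contrib is on the classpath, a RegexQuery is used
like any other Query type; a minimal sketch, with the field name "title" and
the class/method names as placeholders:

  import java.io.IOException;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.TopDocs;
  import org.apache.lucene.search.regex.RegexQuery;
  import org.apache.lucene.store.Directory;

  public class RegexSearch {
    // Counts hits whose "title" field has a term matching the regular expression.
    static int countMatches(Directory directory, String pattern) throws IOException {
      IndexSearcher searcher = new IndexSearcher(directory);
      try {
        TopDocs hits = searcher.search(new RegexQuery(new Term("title", pattern)), 10);
        return hits.totalHits;
      } finally {
        searcher.close();
      }
    }
  }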
> Lucene needs to be able to ask a RAF opened for writing what its
> current "position" is during indexing, which it then stores away, and
> later during searching it needs to ask a RAF opened for reading to
> seek back to that position so it can read bytes from there. Would the
> encryption APIs
You can find them in the source tarball. And maybe elsewhere
(contrib?) but I'm not sure about that.
--
Ian.
On Tue, May 5, 2009 at 9:40 AM, Huntsman84 wrote:
>
> Hi,
>
> Does anybody know why at lucene API documentation you can find the package
> regex and its classes (RegexQuery, RegexTermE
Would you encrypt at the file level? Ie, the encryption would live
"under" a RandomAccessFile (RAF) and otherwise feel "normal" to
Lucene?
(I think I remember others exploring encryption at the individual term
level, which is interesting but does leak information in that you can
see individual terms.)
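To make the file-level idea concrete, here is a toy (cryptographically
meaningless) sketch of a position-keyed XOR layer under IndexInput, assuming
Lucene 2.4-era store APIs; the point is only that whatever cipher you pick has
to tolerate Lucene seeking to arbitrary positions:

  import java.io.IOException;
  import org.apache.lucene.store.IndexInput;

  // Toy only: XOR with a repeating key indexed by file position, so random
  // seeks still decode correctly. A real design needs a seekable cipher mode.
  public class XorIndexInput extends IndexInput {
    private final IndexInput in;
    private final byte[] key;

    public XorIndexInput(IndexInput in, byte[] key) {
      this.in = in;
      this.key = key;
    }

    public byte readByte() throws IOException {
      long pos = in.getFilePointer();
      return (byte) (in.readByte() ^ key[(int) (pos % key.length)]);
    }

    public void readBytes(byte[] b, int offset, int len) throws IOException {
      for (int i = 0; i < len; i++) {
        b[offset + i] = readByte();
      }
    }

    public long getFilePointer() { return in.getFilePointer(); }
    public void seek(long pos) throws IOException { in.seek(pos); }
    public long length() { return in.length(); }
    public void close() throws IOException { in.close(); }
  }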
Hi,
Does anybody know why, in the Lucene API documentation, you can find the package
regex and its classes (RegexQuery, RegexTermEnum, SpanRegexQuery...), but
they don't exist in the jar (so you can't use them)?
Can I find them somewhere?
Thank you so much!
Hi,
I'm new to Lucene, and I'm getting an exception while trying to do a
manual indexing operation on one of my entities.
It works fine for the Product entity, but fails for the ProductInfo
entity (see attached).
Versions:
hibernate-search 3.0.1.GA
Lucene 2.3
10:26:57,167 ERROR [Indexer
If you store data sensitive enough that you are thinking about index encryption,
then I may suggest simply isolating the host with the Lucene index:
- ssh only, with a VERY limited set of users allowed to log in
- provide Solr over https to search the index (avoids in-transit interception)
- set up firewall rules
This way Lu