Hi Otis,
Won't that delete all documents with term1, then all documents with
term2... rather than deleting all documents that contain only term1 and
term2... or am I missing the obvious and doing something wrong?
Thanks,
Josh
Otis Gospodnetic wrote:
> Heh, I have to try the obvious - two reader.delete(term) calls?
Otis,
What about the internals of Lucene? Are there any major changes in there?
LIA is such a great book. Any date when LIA2 is coming? I definitely must
get it :)
On 9/27/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Hi,
I think you'll find most of the book to still be useful (but then again, I'm
the co-author, so maybe I'm not 100% objective). One thing where the API
changed is Fields. They are now constructed differently, so the code in the
book won't match the current API.
We have LIA code working unde
The code works with Lucene 2.0, I've used it. However, it did change slightly.
If you look in JIRA you'll find some comments about it. If I recall
correctly, some changes I made to LuceneDictionary(?) class now require the
index directory to exist, I think.
Otis
- Original Message ---
Heh, I have to try the obvious - two reader.delete(term) calls?
Otis
- Original Message
From: Josh Joy <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, September 26, 2006 10:04:13 PM
Subject: Multiple Terms, Delete From Index
Hi All,
I need to delete from the index where 2 terms are matching, rather than just one term.
Thanks a lot Chris for the detailed & patient response.
The value of the field norm for any field named "A" is typically the
lengthNorm of the field, times the document boost, times the field boost
for *each* Field instance added to the document with the name "A".
(lengthNorm is by default
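The arithmetic described above can be sketched in plain Java. This is only an illustration of the multiplication, assuming DefaultSimilarity's default lengthNorm of 1/sqrt(numTerms); the concrete numbers (9 terms, doc boost 1.8, field boost 3.5 per instance) are made up, though they echo the boosts mentioned in this thread.

```java
public class FieldNormSketch {
    // fieldNorm = lengthNorm * docBoost * (field boost of EACH Field
    // instance added with that name). lengthNorm here is the
    // DefaultSimilarity default, 1 / sqrt(numTerms).
    static float fieldNorm(int numTerms, float docBoost, float[] fieldBoosts) {
        float norm = (float) (1.0 / Math.sqrt(numTerms)) * docBoost;
        for (float b : fieldBoosts) {
            norm *= b; // one multiply per Field instance with this name
        }
        return norm;
    }

    public static void main(String[] args) {
        // 9 terms -> lengthNorm ~ 1/3; times doc boost 1.8, times field boost 3.5
        System.out.println(fieldNorm(9, 1.8f, new float[]{3.5f}));       // ~2.1
        // Adding a SECOND Field instance with the same name and boost 3.5
        // multiplies the norm again -- a common source of surprisingly
        // large fieldNorm values:
        System.out.println(fieldNorm(9, 1.8f, new float[]{3.5f, 3.5f})); // ~7.35
    }
}
```

Note how quickly the norm grows when several boosted Field instances share one name; that is consistent with the "very high fieldNorm" symptom discussed here.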
Hi All,
I need to delete from the index where 2 terms are matching, rather than
just one term.
For example,
IndexReader reader = IndexReader.open(dir);
Term[] terms = new Term[2];
terms[0] = new Term("city","city1");
terms[1] = new Term("state","state1");
reader.delete(terms);
reader.close();
A
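As Josh suspects, two reader.delete(term) calls would delete the union (everything matching either term), not the intersection. One way people typically get the intersection is to search with a BooleanQuery where both clauses are required, then delete the hits by internal document number. A hedged sketch, assuming the Lucene 2.0 API (where the 1.4 delete methods were renamed deleteDocument/deleteDocuments):

```java
// Sketch only: delete documents matching BOTH terms (Lucene 2.0 API assumed).
IndexReader reader = IndexReader.open(dir);
IndexSearcher searcher = new IndexSearcher(reader);

BooleanQuery query = new BooleanQuery();
query.add(new TermQuery(new Term("city", "city1")), BooleanClause.Occur.MUST);
query.add(new TermQuery(new Term("state", "state1")), BooleanClause.Occur.MUST);

Hits hits = searcher.search(query);
for (int i = 0; i < hits.length(); i++) {
    reader.deleteDocument(hits.id(i)); // delete by internal doc number
}

searcher.close();
reader.close(); // closing the reader commits the deletions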
I've added a FAQ that may help you with this, "How do I get code written
for Lucene 1.4.x to work with Lucene 2.x?"
http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-d09fdfc8a6335eab4e3f3dc8ac41a40a3666318e
: Date: Tue, 26 Sep 2006 20:56:57 -
: From: Chris Salem <[EMAIL PROTECTED]>
: R
: > IndexSearcher[] searchers;
: > searchers=new IndexSearcher[3];
: > String path="/home/sn/public_html/";
: > searchers[0]=new IndexSearcher(path+"index1");
: > searchers[1]=new IndexSearcher(path+"index2");
: > searchers[2]=new IndexSearcher(path+"
Vlad,
Please check published papers on sampling inverted indexes and
multi-level caching - this is most probably what Google and other major
search engines use.
You can see a simple implementation of this principle in Nutch - the
index is sorted in decreasing order by a PageRank-like score (
Thanks, Mark, that clears things up a bit. No need to apologise - I am
quite a novice with Lucene.
To explain my concern a bit, assume that your inverted index is queried
with 'or' query for the most 'common' terms (ie, after excluding such
denominators as 'a', 'the', etc). Let's say, you have fo
Glad I could help. I don't read a word of German, but even I could see the
227 milliseconds at the bottom.
Glad things are working for you.
Erick
On 9/26/06, Gaston <[EMAIL PROTECTED]> wrote:
Hi Erick,
the problem was this piece of code I don't need anymore.
for(int i=0;i
http://www.suchste.de
>>- get the top 1000 results WITHOUT executing query across whole data set
(Apologies if this is telling you something you are already fully aware of)
- Counting matches doesn't involve scanning the text of all the docs so
may be less expensive than you think for a single index. It very quickly
l
Paul's Matcher in JIRA will almost enable this - indirect, but possible
- Original Message
From: karl wettin <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, 26 September, 2006 11:30:24 PM
Subject: Re: Re[2]: how to enhance speed of sorted search
On 9/26/06, Chris Hoste
Hi.
I couldn't find the answer to this question in the mailing list archive.
In case I missed it, please let me know the keyword phrase I should be
looking for, if not a direct link.
All the 'Lucene' powered implementations I saw (well, primarily those
utilizing Solr) return exact count of the
On 9/26/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
if you are seeing "slow" performance from sorted searches,
the time spent "scoring" the results isn't the biggest contributor to how
long the search takes -- it tends to be negligible for most queries.
I've many times wished for a visiting
Hi.
I have a question regarding Lucene scoring algorithm. Providing I have a
query "a OR b OR c OR d OR e OR f", and two documents: doc1 "a b c d"
and doc2 "d e", will doc1 score higher than doc2? In other words, does
Lucene take into account the number of terms matched in the document in
case o
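Part of the answer lies in Lucene's coordination factor: with the default Similarity, a document's score is multiplied by coord(q, d) = overlap / maxOverlap, the fraction of query terms the document matches. A minimal sketch of just that factor, using the numbers from the question above (the rest of the scoring formula is ignored here):

```java
public class CoordSketch {
    // coord(q, d) = (query terms matched in doc) / (total query terms),
    // mirroring DefaultSimilarity.coord(int overlap, int maxOverlap).
    static float coord(int overlap, int maxOverlap) {
        return (float) overlap / maxOverlap;
    }

    public static void main(String[] args) {
        // Query "a OR b OR c OR d OR e OR f" has 6 terms.
        System.out.println(coord(4, 6)); // doc1 "a b c d" matches 4 of 6
        System.out.println(coord(2, 6)); // doc2 "d e"    matches 2 of 6
        // All else being equal, doc1's score gets the larger multiplier.
    }
}
```

So yes, matching more of the OR'd terms pushes a document's score up, though term frequencies, idf, and norms also enter the full formula.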
Hi Erick,
the problem was this piece of code I don't need anymore.
for(int i=0;i
Now it is very fast, thank you very much for your detailed email.
Here is my application, that still is in development phase.
http://www.suchste.de
Greetings Gaston
P.S. The search for 'web' del
Does anyone have sample code on how to build a dictionary?
I found this article online but it uses version 1.4.3 and it doesn't seem
to work on 2.0.0:
http://today.java.net/pub/a/today/2005/08/09/didyoumean.html?page=1
Here's the code I have:
indexReader = IndexReader.open(originalIndexDi
See below.
On 9/26/06, Gaston <[EMAIL PROTECTED]> wrote:
hi,
first thank you for the fast reply.
I use MultiSearcher that opens 3 indexes, so this makes the whole
operation surely slower, but 20 seconds for 5260 results out of a 212MB
index is much too slow.
Another reason can of course be m
Hi,
I have bought the Lucene In Action Book for more than a year now, and was
using Lucene 1.x during that time. Now, I have a new project with Lucene and
Lucene is now 2.0. Many APIs seem to have changed.
I would like to ask the experts here, what are the important or substantial
changes from
hi,
first thank you for the fast reply.
I use MultiSearcher that opens 3 indexes, so this makes the whole
operation surely slower, but 20 seconds for 5260 results out of a 212MB
index is much too slow.
Another reason can of course be my ISP.
Here is my code:
IndexSearcher[] searcher
: The symptom:
: Very high fieldNorm for field A. (explain output pasted below) The boost I am
: applying to the troublesome field is 3.5 & the max boost applied per doc is
: 1.8
: Given that information, the very high fieldNorm is very surprising to me.
: Based on what I read, FieldNorm = 1 / s
: I am thinking this should be faster
The ConstantScoreQuery wrapped around the QueryFilter might in fact be
faster than the raw query -- have you tried it to see?
you might be able to shave a little bit of speed off by accessing the bits
from the Filter directly and iterating over them yourse
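That suggestion can be sketched with the JDK alone: in Lucene 2.0, QueryFilter.bits(reader) returns a java.util.BitSet, and the usual idiom for walking its set bits is the nextSetBit loop below. The doc numbers here are dummies standing in for matching documents.

```java
import java.util.BitSet;

public class BitsIterationSketch {
    // Collect the indexes of all set bits -- the same loop you would run
    // over the BitSet returned by QueryFilter.bits(reader).
    static int[] setBits(BitSet bits) {
        int[] docs = new int[bits.cardinality()];
        int n = 0;
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            docs[n++] = i;
        }
        return docs;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(3); bits.set(7); bits.set(42); // pretend these docs matched
        for (int doc : setBits(bits)) {
            System.out.println(doc); // prints 3, 7, 42
        }
    }
}
```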
Well, my index is over 1.4G, and others are reporting very large indexes in
the 10s of gigabytes. So I suspect your index size isn't the issue. I'd be
very, very, very surprised if it was.
Three things spring immediately to mind.
First, opening an IndexSearcher is a slow operation. Are you openi
: Since the overhead in the first is the speed of the system, I think adopting
: the second method will be better.
:
: Is there any other solution for this problem? Am I going in the right
: direction?
you're definitely on the right path -- those are the two big solutions I
can think of, which approach you
Either you grab the nearest svn client and check out the 2.0 branch,
or you just download the source dist from a mirror. Use this one:
http://mirrorspace.org/apache/lucene/java/
best regards simon
On 9/26/06, djd0383 <[EMAIL PROTECTED]> wrote:
Is there a link to a zip file where I can ge
Hi,
Lucene itself has a volatile caching mechanism provided by a WeakHashMap.
Is there a possibility to serialize the Hits object? I think of a HashMap
that caches the first 100 results for each search. Is it possible to
implement such a feature, or is there such an extension?
My problem
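Hits itself holds a reference back to the searcher and is not serializable, so a common workaround is to copy the first N document numbers (and scores) into a plain serializable holder and cache that, e.g. in an LRU map keyed by the query string. A hedged JDK-only sketch; the cache size, key scheme, and dummy values are invented:

```java
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

public class HitsCacheSketch {
    // A plain, serializable snapshot of the first results of a search.
    static class CachedHits implements Serializable {
        final int[] docs;
        final float[] scores;
        CachedHits(int[] docs, float[] scores) { this.docs = docs; this.scores = scores; }
    }

    // LRU cache keyed by query string: the eldest entry is evicted once
    // the map grows past MAX_ENTRIES (access-order LinkedHashMap).
    static final int MAX_ENTRIES = 50;
    static final Map<String, CachedHits> cache =
        new LinkedHashMap<String, CachedHits>(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<String, CachedHits> eldest) {
                return size() > MAX_ENTRIES;
            }
        };

    public static void main(String[] args) {
        // In real code the arrays would be filled from hits.id(i) / hits.score(i)
        // for the first 100 hits; here they are dummy values.
        cache.put("title:lucene", new CachedHits(new int[]{3, 7}, new float[]{0.9f, 0.4f}));
        System.out.println(cache.get("title:lucene").docs.length); // 2
    }
}
```

Because CachedHits holds only primitives, it can be serialized to disk or a distributed cache, unlike Hits.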
On Sep 26, 2006, at 8:50 AM, Bhavin Pandya wrote:
Hi,
Does anybody have an idea for a spell checker in Java?
I want to use it with Lucene... but it must work well for phrases
also...
-Bhavin pandya
When I googled "java spell check open source" I found
http://jazzy.sourceforge.net/
I have looked at
Is there a link to a zip file where I can get the entire package of source
files (version 2, please). I know I am able to view them in the Source
Repository (http://svn.apache.org/viewvc/lucene/java/trunk/), but I do not
really feel like going through each of those to download them all. I am
look
On Tuesday 26 September 2006 17:57, Virlouvet Olivier wrote:
> Hi
>
> In javadoc IndexReader.termPositions() maps to the definition :
> Term => <docNum, freq, <pos1, pos2, ... posfreq-1>>*
> where returned enumeration is ordered by doc number.
>
> Are positions ordered for each doc or not ?
The po
On 9/26/06, Bhavin Pandya <[EMAIL PROTECTED]> wrote:
Hi,
Does anybody have an idea for a spell checker in Java?
I want to use it with Lucene... but it must work well for phrases also...
You are welcome to try this:
https://issues.apache.org/jira/browse/LUCENE-626
it is good with phrases, is trained b
Thanks again Erick for taking the time.
I agree that the CachingWrapperFilter as described
under "using a custom filter" in LIA is probably my
best bet. I wanted to check if anything had been added
in Lucene releases since the book was written I wasn't
aware of.
Cheers again.
--- Erick Erickson
I'm running this on Windows 2003 server (NTFS). The Java VM version is
1.5.0_06. This exception is not consistent, but it is not intermittent
either. It does not throw it at any particular point while rebuilding
the index, but it will throw this exception at some point (it could be
1/3 way throu
Hi
In javadoc IndexReader.termPositions() maps to the definition :
Term => <docNum, freq, <pos1, pos2, ... posfreq-1>>*
where returned enumeration is ordered by doc number.
Are positions ordered for each doc or not ?
Thanks
Olivier
Hi Sharad,
I've done this on Simpy.com. You can see it in action if you create some
Watchlists and Watchlists Filters in Simpy.
The neighbourhood is pulled from the DB, and the search uses a MultiSearcher to
search that neighbourhood.
Otis
- Original Message
From: Sharad Agarwal <[EM
You could do it with a custom HitCollector, no?
Otis
- Original Message
From: Bhavin Pandya <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, September 26, 2006 8:43:56 AM
Subject: How to remove duplicate records from result
Hi,
I searched the index and I found, say, 100
Look at LingPipe from Alias-i.com. Look at Named Entity extraction and its
classifiers.
Otis
- Original Message
From: Vladimir Olenin <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, September 25, 2006 9:49:31 PM
Subject: does anyone know of a 'smart' categorizing tex
Van Nguyen wrote:
I only get this error when using the server version of jvm.dll with my
JBoss app server… but when I use the client version of jvm.dll, the same
index builds just fine.
This is an odd error. Which OS are you running on? And, what kind of
filesystem is the index directory
Lucene-based one is described on the Wiki. Another one is the one from
LingPipe. It may not be free, depending on what you do with it.
Otis
- Original Message
From: Bhavin Pandya <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, September 26, 2006 8:50:14 AM
Subject:
Hi,
While I was searching the forum for my problem of searching a substring, I got
a few very good links.
http://www.gossamer-threads.com/lists/lucene/java-user/39753?search_string=Bitset%20filter;#39753
http://www.gossamer-threads.com/lists/lucene/java-user/7813?search_string=substring;#7813
http://w
Hi,
Does anybody have an idea for a spell checker in Java?
I want to use it with Lucene... but it must work well for phrases also...
-Bhavin pandya
Hi,
I searched the index and I found, say, 1000 records, but out of those 1000
records I want to filter duplicate records based on the value of one field.
Is there any way except looping through the whole Hits object?
Because it won't work when the number of hits is too large...
Thanks.
Bhavin pandya
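Short of a custom HitCollector, the usual in-memory approach is a single pass that keeps the first hit per distinct field value. A JDK-only sketch of that dedup step; the record shape (array of strings) and the "company" field are invented for illustration, standing in for values read from each hit's Document:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {
    // Keep only the first record for each distinct key (the field value you
    // want to be unique), preserving the original result order.
    static List<String[]> dedupByField(List<String[]> records, int keyIndex) {
        Set<String> seen = new HashSet<String>();
        List<String[]> out = new ArrayList<String[]>();
        for (String[] rec : records) {
            if (seen.add(rec[keyIndex])) { // add() returns false for duplicates
                out.add(rec);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> hits = new ArrayList<String[]>();
        hits.add(new String[]{"doc1", "acme"});
        hits.add(new String[]{"doc2", "acme"});   // duplicate field value
        hits.add(new String[]{"doc3", "globex"});
        System.out.println(dedupByField(hits, 1).size()); // 2
    }
}
```

The HashSet makes each membership test O(1), so this stays a single linear pass, but it still touches every hit; for very large result sets a HitCollector that dedups while collecting avoids materializing them all.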
Thx a lot!
2006/9/26, markharw00d <[EMAIL PROTECTED]>:
Pass a field name to the QueryScorer constructor.
See "testFieldSpecificHighlighting" method in the Junit test for the
highlighter for an example.
Cheers
Mark
zhu jiang wrote:
> Hi all,
>
>For example, if I have a document with two f
I guess there are many possibilities to implement some control structure to
track the references to your searcher / reader. As it is best practice to
have one single searcher open, you can track the references to the searcher
while one reference is held by the class you request your searcher from. I
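One way to sketch that reference tracking with the JDK alone: the holder class keeps one reference, each request takes and releases one, and the underlying searcher is closed only when the count hits zero. Names and structure are invented; a real version would wrap an IndexSearcher.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountedSearcherSketch {
    // Stand-in for the shared searcher; a real version would hold an
    // IndexSearcher and call searcher.close() in release().
    static class SharedResource {
        private final AtomicInteger refCount = new AtomicInteger(1); // holder's ref
        private volatile boolean closed = false;

        void acquire() { refCount.incrementAndGet(); }

        // Close the underlying resource only when the last reference goes.
        void release() {
            if (refCount.decrementAndGet() == 0) {
                closed = true; // here you would close the real searcher
            }
        }

        boolean isClosed() { return closed; }
    }

    public static void main(String[] args) {
        SharedResource searcher = new SharedResource();
        searcher.acquire();                       // a request takes a reference
        searcher.release();                       // the request finishes
        System.out.println(searcher.isClosed()); // false: holder still has one
        searcher.release();                       // holder lets go
        System.out.println(searcher.isClosed()); // true
    }
}
```

This also answers the "is it closed?" question directly: the wrapper, not the searcher, carries that state.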
Hi all,
after I delete some entries from the index, I close the IndexSearcher to
ensure that the changes are done.
But after this I couldn't figure out a way to tell if the searcher is closed
or not.
Any ideas?
Regards
Frank
-
Hello, Chris.
CH> 3) most likely, if you are seeing "slow" performance from sorted searches,
CH> the time spent "scoring" the results isn't the biggest contributor to how
CH> long the search takes -- it tends to be negligible for most queries. A
CH> better question is: are you reusing the exact sa
Pass a field name to the QueryScorer constructor.
See "testFieldSpecificHighlighting" method in the Junit test for the
highlighter for an example.
Cheers
Mark
zhu jiang wrote:
Hi all,
For example, if I have a document with two fields text and num like
this:
text:foo bar