Re: Optimum way to find all document without particular field

2009-03-04 Thread Daniel Noll
Chris Lu wrote: Allahbaksh, If you ONLY want to find all document with a particular field that is not null, you can loop through the TermEnum and TermDocs to find all the document ids. But this can not easily be combined with other queries. Surely this would be equivalent to a PrefixQuery w

Re: crawler questions..

2009-03-04 Thread Tim Williams
On Wed, Mar 4, 2009 at 4:41 PM, Grant Ingersoll wrote: > You might have a look at Droids (http://incubator.apache.org/droids/) or > Nutch (http://lucene.apache.org/nutch) and their communities.  They are much > more focused on crawling (not to say there aren't people here who crawl, > just saying

Re: crawler questions..

2009-03-04 Thread Grant Ingersoll
You might have a look at Droids (http://incubator.apache.org/droids/) or Nutch (http://lucene.apache.org/nutch) and their communities. They are much more focused on crawling (not to say there aren't people here who crawl, just saying those projects are (mostly) about crawling) On Mar 4, 2

crawler questions..

2009-03-04 Thread bruce
Hi... Sorry that this is a bit off track. Ok, maybe way off track! But I don't have anyone to bounce this off of.. I'm working on a crawling project, crawling a college website, to extract course/class information. I've built a quick test app in python to crawl the site. I crawl at the top level

Re: Why do range queries work on fields only ?

2009-03-04 Thread Raymond Balmès
I'm in the design phase and have not used Lucene so far... that should come pretty soon though. The range query that I have been looking at in the API documentation and the code calls for a field name and subsequently searches a field for a range (example of date searching)... and I did not see operators

Re: Why do range queries work on fields only ?

2009-03-04 Thread Raymond Balmès
Erick, Sorry, I meant the first option, as in the range query for fields. OK, I will look at the span query. Most of the time the number of terms will be small, although there is one use case where it could go up to 50 consecutive terms. -Raymond- On Tue, Mar 3, 2009 at 9:30 PM, Erick Erickson wrote: >

Lucene: MultiSearcher

2009-03-04 Thread KrustyDerClown
Hello, I have a short question about the MultiSearcher. Is it possible to identify which index a result/hit comes from when I use a MultiSearcher (2 indexes)? Thank you for your help. Greetings, Oliver

Re: Lucene Demo

2009-03-04 Thread Michael McCandless
Woops, what Erick said ;) Mike Erick Erickson wrote: 10,000 tokens is the out-of-the-box default limit. You can set it to whatever you want via IndexWriter.setMaxFieldLength Best Erick On Wed, Mar 4, 2009 at 2:11 PM, Matt Kuenzel wrote: Is there a limit to the document size that Lucene

Re: Luke site is down?

2009-03-04 Thread Andrzej Bialecki
Hi all, I apologize for the inconvenience - the site went down without any prior notice from the ISP. I'm investigating the issue ... -- Best regards, Andrzej Bialecki

Re: Demo Question

2009-03-04 Thread Michael McCandless
There is no limit, except that if you try to index a way-too-large document you will hit an OutOfMemoryError. Mike Matt Kuenzel wrote: Is there a limit to the document size that Lucene will index in the demo (org.apache.lucene.demo.*)?

Re: Lucene Demo

2009-03-04 Thread Matt Kuenzel
thanks, erick! On Wed, Mar 4, 2009 at 2:53 PM, Erick Erickson wrote: > 10,000 tokens is the out-of-the-box default limit. You can > set it to whatever you want via IndexWriter.setMaxFieldLength > > Best > Erick > > On Wed, Mar 4, 2009 at 2:11 PM, Matt Kuenzel > wrote: > > > Is there a limit to t

Re: Lucene Demo

2009-03-04 Thread Erick Erickson
10,000 tokens is the out-of-the-box default limit. You can set it to whatever you want via IndexWriter.setMaxFieldLength Best Erick On Wed, Mar 4, 2009 at 2:11 PM, Matt Kuenzel wrote: > Is there a limit to the document size that Lucene will index in the demo > (org.apache.lucene.demo.*)? >
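Editor's note: as a rough illustration of what the token limit discussed in this thread means in practice, here is a toy model in plain Java. It is not Lucene code. The class `FieldLengthLimitSketch`, the method `tokensKept`, and the whitespace tokenizer are hypothetical stand-ins; only the 10,000 default and the `IndexWriter.setMaxFieldLength` name come from the thread. The sketch assumes that tokens past the limit are simply left out of the index.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, not Lucene: shows the effect of a per-field token cap.
final class FieldLengthLimitSketch {
    // Default limit quoted in the thread (10,000 tokens).
    static final int DEFAULT_MAX_FIELD_LENGTH = 10_000;

    // Returns the tokens that would survive the cap. Whitespace splitting is
    // an assumption made for illustration; a real analyzer behaves differently.
    static List<String> tokensKept(String text, int maxFieldLength) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input yields one empty token; skip it
            }
            if (kept.size() >= maxFieldLength) {
                break; // everything past the cap is dropped
            }
            kept.add(token);
        }
        return kept;
    }
}
```

With a limit of 2, the text "a b c d" keeps only "a" and "b"; a search for "c" would then find nothing in that field, which is why a large document can look only partly indexed until the limit is raised.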

Re: Optimum way to find all document without particular field

2009-03-04 Thread Erick Erickson
Remember, though, that this won't work *unless* there is a value to exclude, thus several of the suggestions to index a special token in the relevant field that's guaranteed to not be something you ever want to legitimately search on. Erick On Wed, Mar 4, 2009 at 2:10 PM, Uwe Schindler wrote: >

Demo Question

2009-03-04 Thread Matt Kuenzel
Is there a limit to the document size that Lucene will index in the demo (org.apache.lucene.demo.*)?

Lucene Demo

2009-03-04 Thread Matt Kuenzel
Is there a limit to the document size that Lucene will index in the demo (org.apache.lucene.demo.*)?

Re: Luke site is down?

2009-03-04 Thread Erik Hatcher
On Mar 4, 2009, at 2:08 PM, Ruslan Sivak wrote: Is there a separate mailing list for getopt? Perhaps someone can notify the site owner? I've just sent Andrzej "Luke" Bialecki an e-mail, though I imagine he monitors this list too. Erik

RE: Optimum way to find all document without particular field

2009-03-04 Thread Uwe Schindler
To find all documents that do not contain a term, you can combine a MatchAllDocsQuery with BooleanClause.Occur.MUST with one or more TermQueries with BooleanClause.Occur.MUST_NOT (the terms you do not want in the documents). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://ww
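Editor's note: the combination described in this message amounts to a set difference, which the following plain-Java sketch models with `java.util.BitSet`. It is a toy model under stated assumptions, not Lucene code: `MustNotSketch` and `matchAllExcept` are hypothetical names, the "match all" clause is modelled as every doc id in `[0, maxDoc)`, and each excluded bit set stands in for the documents matched by one MUST_NOT term.

```java
import java.util.BitSet;

// Toy model only, not Lucene: "match all docs, minus docs containing unwanted terms".
final class MustNotSketch {
    // maxDoc: number of documents (ids 0..maxDoc-1), playing the MUST match-all clause.
    // excluded: one bit set per unwanted term, playing the MUST_NOT clauses.
    static BitSet matchAllExcept(int maxDoc, BitSet... excluded) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);        // start from every document
        for (BitSet docsWithTerm : excluded) {
            result.andNot(docsWithTerm); // remove docs that contain the term
        }
        return result;
    }
}
```

The match-all clause matters because a prohibited clause only subtracts from other matches; with nothing positive to start from there is nothing to subtract from, which is the role `MatchAllDocsQuery` plays in the message above.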

Re: Luke site is down?

2009-03-04 Thread Ruslan Sivak
Steven A Rowe wrote: When I initially tried an hour or so ago to access , I was able to see the site. But now, I'm not able to get anything. Here's the response I see using curl: PROMPT$ curl http://www.getopt.org/luke/ curl: (52) Empty reply from server Loo

Re: Optimum way to find all document without particular field

2009-03-04 Thread Shashi Kant
A simple solution would be to store the string "NULL" instead of null and then query. On Wed, Mar 4, 2009 at 1:26 PM, Chris Lu wrote: > Allahbaksh, > > If you ONLY want to find all document with a particular field that is not > null, you can loop through the TermEnum and TermDocs to find all th

Re: Optimum way to find all document without particular field

2009-03-04 Thread Erick Erickson
Well, you could construct a Filter as you were looping and use the Filter with your queries Erick On Wed, Mar 4, 2009 at 1:26 PM, Chris Lu wrote: > Allahbaksh, > > If you ONLY want to find all document with a particular field that is not > null, you can loop through the TermEnum and TermD
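Editor's note: a minimal sketch of the idea in this message, building a reusable bit-set filter while looping over a field's terms and their documents, then applying it to the hits of another query. This is a plain-Java toy model, not the Lucene `Filter`, `TermEnum`, or `TermDocs` API: the class `FieldPresenceFilterSketch`, its methods, and the `Map<String, int[]>` postings layout are all assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.Map;

// Toy model only, not Lucene: a bit-set "filter" built from a field's postings.
final class FieldPresenceFilterSketch {
    // postingsForField maps each term of one field to the ids of the docs containing it.
    static BitSet docsWithField(Map<String, int[]> postingsForField, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] docIds : postingsForField.values()) { // outer loop: every term of the field
            for (int docId : docIds) {                    // inner loop: every doc for that term
                bits.set(docId);
            }
        }
        return bits;
    }

    // Complement within [0, maxDoc): documents that have no term in the field.
    static BitSet docsWithoutField(BitSet withField, int maxDoc) {
        BitSet bits = (BitSet) withField.clone();
        bits.flip(0, maxDoc);
        return bits;
    }

    // Restrict another query's hits to the documents the filter allows.
    static BitSet applyFilter(BitSet queryHits, BitSet filter) {
        BitSet out = (BitSet) queryHits.clone();
        out.and(filter);
        return out;
    }
}
```

Because the bit set is computed once and then intersected with any query's hits, it addresses the objection quoted in this message that the term loop cannot easily be combined with other queries.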

Re: Optimum way to find all document without particular field

2009-03-04 Thread Chris Lu
Allahbaksh, If you ONLY want to find all documents with a particular field that is not null, you can loop through the TermEnum and TermDocs to find all the document ids. But this cannot easily be combined with other queries. -- Chris Lu - Instant Scalable Full-Text

RE: Luke site is down?

2009-03-04 Thread Steven A Rowe
When I initially tried an hour or so ago to access , I was able to see the site. But now, I'm not able to get anything. Here's the response I see using curl: PROMPT$ curl http://www.getopt.org/luke/ curl: (52) Empty reply from server Looks like you can get a

Re: Luke site is down?

2009-03-04 Thread Michael Barbarelli
I'm not having any problems with the following. http://www.getopt.org/luke/ On Wed, Mar 4, 2009 at 5:07 PM, Ruslan Sivak wrote: > I'm not getting anything when I go to http://www.getopt.org/luke/, or > http://www.getopt.org. > > Does anyone know how long the site is expected to be down and is t

Luke site is down?

2009-03-04 Thread Ruslan Sivak
I'm not getting anything when I go to http://www.getopt.org/luke/, or http://www.getopt.org. Does anyone know how long the site is expected to be down and is there an alternate download location for luke? Russ

Re: Confidence scores at search time

2009-03-04 Thread Erik Hatcher
On Mar 4, 2009, at 9:05 AM, Michael McCandless wrote: I think (?) Explanation.toString() is in fact supposed to return the full explanation (not just the first line)? You're right... I just read the code wrong after seeing the output Ken posted originally. He followed up with a correct

Re: Confidence scores at search time

2009-03-04 Thread Michael McCandless
I think (?) Explanation.toString() is in fact supposed to return the full explanation (not just the first line)? Mike Ken Williams wrote: On 3/2/09 1:58 PM, "Erik Hatcher" wrote: On Mar 2, 2009, at 2:47 PM, Ken Williams wrote: In the output, I get explanations like "0.88922405 = (M

RE: relevance vs. score

2009-03-04 Thread spring
> It's the similarity scoring formula. EG see here: > >http://lucene.apache.org/java/2_4_0/scoring.html > > and here: > > > http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene > /search/Similarity.html OK; thank you -

Re: relevance vs. score

2009-03-04 Thread Michael McCandless
It's the similarity scoring formula. EG see here: http://lucene.apache.org/java/2_4_0/scoring.html and here: http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene/search/Similarity.html Mike wrote: I think for "ordinary" Lucene queries, "score" and "relevance" mean the same

RE: relevance vs. score

2009-03-04 Thread spring
> I think for "ordinary" Lucene queries, "score" and "relevance" mean > the same thing. > > But if you do eg function queries, or you "mixin" recency into your > scoring, etc., then "score" could be anything you computed, a value > from a field, etc. Hm, how is relevance then defined? ---

Re: Optimum way to find all document without particular field

2009-03-04 Thread Ganesh
- Original Message - From: "Ganesh" To: Sent: Wednesday, March 04, 2009 12:05 PM Subject: Re: Optimum way to find all document without particular field Allahbaksh, I don't think Lucene can filter out null and non-null values. If the field value is null, index the field

Re: relevance vs. score

2009-03-04 Thread Michael McCandless
I think for "ordinary" Lucene queries, "score" and "relevance" mean the same thing. But if you do eg function queries, or you "mixin" recency into your scoring, etc., then "score" could be anything you computed, a value from a field, etc. Mike wrote: Hi, When I say: sorted by relev

Re: not updating caching

2009-03-04 Thread Ian Lea
What exactly are you using: Solr or some other server or straight lucene? Lucene itself doesn't do caching. When you close and start the server what exactly are you closing and starting? If by server you mean something like Tomcat then perhaps you are not reopening index readers/searchers after

Re: not updating caching

2009-03-04 Thread sandyg
Hi, Thanks for the reply. But I am not explicitly using or creating any caching. So why is it happening? Otis Gospodnetic wrote: > > > I have a feeling you are using Solr or some other server and not straight > Lucene. To turn off Solr caching, comment it out from solrconfig.xml (but > you'll need

relevance vs. score

2009-03-04 Thread spring
Hi, When I say "sorted by relevance" or "sorted by score", are relevance and score synonyms for each other, or what is the difference in relation to sorting? Thank you