Calculated terms during a query

2009-01-07 Thread Joe MarkAnthony
Greetings, I would like to search for items based on 'calculated' terms. Specifically, say I am using Lucene to search a collection of tasks, with fields "start_date" and "end_date", among others. The question to solve is: "Find all tasks that took longer than 100 days". So the easy answer
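The post is cut off before the "easy answer" is spelled out. One possible approach, offered here as an assumption and not necessarily what the poster had in mind, is to compute the duration once at index time and store it as an extra field, zero-padded so that a term range query over strings behaves like a numeric comparison. The sketch below shows only the calculation and encoding in plain Java; `TaskDuration`, the field width, and the ISO date format are hypothetical choices, and no Lucene API is called.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

// Hypothetical helper (not part of Lucene): precomputes a sortable duration term.
final class TaskDuration {
    // Assumed term width: durations are expected to stay below 100000 days.
    static final int WIDTH = 5;

    // Whole days between two ISO dates (yyyy-MM-dd), end date exclusive.
    static long days(String startDate, String endDate) {
        return ChronoUnit.DAYS.between(LocalDate.parse(startDate), LocalDate.parse(endDate));
    }

    // Zero-pads the value so lexicographic term order matches numeric order.
    static String encode(long days) {
        if (days < 0 || days >= 100000) {
            throw new IllegalArgumentException("duration out of range: " + days);
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", days);
    }

    public static void main(String[] args) {
        // The encoded value would be stored in an extra, untokenized field per task.
        long d = days("2008-01-01", "2008-04-20");
        System.out.println("duration term: " + encode(d));
    }
}
```

With such a field in place, "longer than 100 days" becomes a range over the encoded terms, with `encode(100)` as the exclusive lower bound. The trade-off is that the value is fixed at index time, so changing a date means re-indexing the task.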

Re: Determining index term count

2009-01-07 Thread Andrzej Bialecki
Greg Shackles wrote: I'm not sure offhand how to write the code to do it, but I know when you open an index in Luke, that is one of the numbers it gives you. If you want to just get the number once that would be an easy way to do it. If you want the code for it, Luke is open source so you could

Re: Determining index term count

2009-01-07 Thread Greg Shackles
I'm not sure offhand how to write the code to do it, but I know when you open an index in Luke, that is one of the numbers it gives you. If you want to just get the number once that would be an easy way to do it. If you want the code for it, Luke is open source so you could see how they do it. (

Re: Help with installing Lucene

2009-01-07 Thread Glen Newton
> I'm not sure if it's a better idea to use something like Solr or start from
> scratch and customize the application as I move forward. What do you think

LuSql might be appropriate for your needs: "LuSql is a high-performance, simple tool for indexing data held in a DBMS into a Lucene index. It c

Re: Help with installing Lucene

2009-01-07 Thread ahammad
Greg Shackles wrote:
> Depending on what you need, there might be something already built that can
> do what you want. I can't look up links right now but you might want to
> look into Solr and see if that works for what you want. Otherwise, I think
> there are code samples and whatn

Re: Help with installing Lucene

2009-01-07 Thread Erick Erickson
See the others' comments, but be aware that the contrib area holds many valuable additions to Lucene; to use them you need to include the particular jar from contrib that you want in your CLASSPATH. That is, the contrib contributions do NOT reside in the lucene jar, they are separate
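A quick way to confirm that a contrib jar really is on the CLASSPATH is to try loading one of its classes at runtime. The following is a minimal sketch; `ClasspathCheck` is a hypothetical helper, and the highlighter class name is used only as an example of a class that lives in a contrib jar and not in the core lucene jar.

```java
// Hypothetical helper: reports whether a class can be loaded from the current classpath.
final class ClasspathCheck {
    static boolean isAvailable(String className) {
        try {
            // Look the class up without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing jar, or a jar that cannot be linked against what is present.
            return false;
        }
    }

    public static void main(String[] args) {
        // Example contrib class; it is found only if the highlighter jar is on the CLASSPATH.
        String contribClass = "org.apache.lucene.search.highlight.Highlighter";
        System.out.println(contribClass + " available: " + isAvailable(contribClass));
    }
}
```

If the check reports the class as unavailable, the usual cause is that only the core lucene jar was added to the CLASSPATH.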

Re: java.io.IOException: read past EOF non-corrupt index

2009-01-07 Thread 1world1love
Ok. Just to follow up, I performed the same steps with another of our indexes and did not have the same issue:

Opening index @ /lucenedata/index4

Segments file=segments_85 numSegments=1 version=FORMAT_HAS_PROX [Lucene 2.4]
  1 of 1: name=_42 docCount=3986767
    compound=true
    hasProx=true

Re: java.io.IOException: read past EOF non-corrupt index

2009-01-07 Thread Marcelo Ochoa
Hi: Could you try opening the index using Luke, but with the JDK bundled with the Oracle DB? I mean, try to use Luke as a standalone application on the same machine but outside the OJVM, using the JDK at: $ORACLE_HOME/jdk which was used to compile most of the classes running inside the OJV

Re: java.io.IOException: read past EOF non-corrupt index

2009-01-07 Thread 1world1love
Michael McCandless-2 wrote:
> That exception seems to indicate that the fdx file being opened by
> FieldsReader is 0 length (it's trying to read the first int from that
> file).
>
> Is the exception repeatable, if you try again to call
> IndexReader.open?
>
> It's odd that CheckIndex finds

Re: java.io.IOException: read past EOF non-corrupt index

2009-01-07 Thread 1world1love
Toke Eskildsen wrote:
> A quick check when a corrupt index problem is encountered:
> Does any of your machines run Java 1.6.0_04-1.6.0_10b25?

Thanks Toke. As I mentioned in my response to Erick, this is complicated by the fact that the error is within a java stored procedure in Oracle. Th

Re: java.io.IOException: read past EOF non-corrupt index

2009-01-07 Thread 1world1love
Erick Erickson wrote:
> I guess my first question, based on your statement that you ran
> checkindex from a different machine would be whether you have
> the same version of Lucene installed on both machines? And how
> did you get your index where it is now? Did you optimize it in place
> or d

Determining index term count

2009-01-07 Thread Christian Reuschling
Is there a fast way to determine the total number of terms inside an index? So far the only way I have found is to walk through the TermEnumeration, i.e.

    TermEnum termEnum4TermCount = reader.terms();
    int iTermCount = 0;
    while (termEnum4TermCount.next()) iTermCount++;
    termEnum4TermCount.close();

Re: Help with installing Lucene

2009-01-07 Thread Simon Willnauer
Hi there,

On Wed, Jan 7, 2009 at 3:39 PM, ahammad wrote:
> Hello,
>
> I have a side project coming up which requires writing a search engine. I
> came across Lucene but I'm having some problems figuring out how to install
> it. I'm trying to get it to work on a Windows box.
>
> On the Lucene we

Re: Help with installing Lucene

2009-01-07 Thread Greg Shackles
You don't really "install" it as it is not its own standalone application. You write the software that interfaces with the Lucene API. The src zip you mentioned has all the Lucene source, so you can use that if you want to compile the library yourself. If you want to use the precompiled binary of

Help with installing Lucene

2009-01-07 Thread ahammad
Hello, I have a side project coming up which requires writing a search engine. I came across Lucene but I'm having some problems figuring out how to install it. I'm trying to get it to work on a Windows box. On the Lucene website, there are two files: lucene-2.4.0-src.zip and lucene-2.4.0.zip (w

Re: java.io.IOException: read past EOF non-corrupt index

2009-01-07 Thread Michael McCandless
That exception seems to indicate that the fdx file being opened by FieldsReader is 0 length (it's trying to read the first int from that file). Is the exception repeatable, if you try again to call IndexReader.open? It's odd that CheckIndex finds no problem with the index, but opening an IndexR

Re: TermScorer default buffer size

2009-01-07 Thread Paul Elschot
On Wednesday 07 January 2009 07:25:17 John Wang wrote:
> Hi:
>
> The default buffer size (for docid, score etc.) is 32 in TermScorer.
>
> We have a large index in which some terms have very dense doc sets. By
> increasing the buffer size we see very dramatic performance improvements.

Re: java.io.IOException: read past EOF non-corrupt index

2009-01-07 Thread Toke Eskildsen
On Tue, 2009-01-06 at 23:07 +0100, 1world1love wrote:
> Greetings all. I have an index that I have optimized and when I try to open
> the index I get this:
>
> java.io.IOException: read past EOF

A quick check when a corrupt index problem is encountered: Does any of your machines run Java 1.6.0_04