RE: References to deleted file handles in long-running server application

2005-11-17 Thread Monsur Hossain
How often are you updating your index? Are you closing your old IndexSearchers after switching over to the new index? You'll need to close the searchers in order to release the file handle. This was the same issue I was experiencing: http://mail-archives.apache.org/mod_mbox/lucene-java-user/2
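The advice above (close the old searcher once the new index is live) can be sketched roughly as follows. This is only an illustration under stated assumptions: SearcherHolder and FakeSearcher are hypothetical names, not Lucene classes, and the type parameter stands in for any closeable searcher such as an IndexSearcher. It also assumes no search is still running on the old searcher when it is closed; a real server would probably need reference counting or a grace period.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder; T stands in for a searcher type that owns file handles.
class SearcherHolder<T extends Closeable> {
    private final AtomicReference<T> current = new AtomicReference<>();

    // Returns the searcher currently in use (null before the first swap).
    T get() {
        return current.get();
    }

    // Installs the new searcher, then closes the one it replaced so that
    // the file handles held by the old searcher are released.
    void swap(T fresh) {
        T old = current.getAndSet(fresh);
        if (old != null) {
            try {
                old.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}

// Stand-in resource used only to demonstrate the holder; not a Lucene class.
class FakeSearcher implements Closeable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }
}
```

Swapping through a single holder means the close cannot be forgotten when the index is refreshed, which is the failure mode that leaves "deleted" files open.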

Re: TermFreqVector

2005-11-17 Thread Chris Lamprecht
Can you post the code you're using to create the Document and add it to the IndexWriter? You have to tell Lucene to store term freq vectors (it isn't done by default). Also I'm not sure what you mean when you say your documents do not have fields. Do you have at least one field? -chris On

Re: References to deleted file handles in long-running server application

2005-11-17 Thread Matt Magoffin
I've been watching our servers today, and now there are 2500 "deleted" file handles open like this. Seems to be quite large. Still don't know why there are so many. I'm using the compound index format already to reduce the number of open files. -- m@ > Hello, I use Lucene in a long-running server

TermFreqVector

2005-11-17 Thread Anna Buczak
I have indexed a set of documents that do not have fields. I want to use the getTermFreqVector method from IndexReader to get the frequencies. However when I do that as: TermFreqVector[] z = ir.getTermFreqVectors(0); z is null. So I can't get the frequency vectors. Help will be very much appr

Re: Memory Usage

2005-11-17 Thread Daniel Noll
Doug Cutting wrote: Daniel Noll wrote: Doug Cutting wrote: Daniel Noll wrote: I actually did throw a lot of terms in, and eventually chose "one" for the tests because it was the slowest query to complete of them all (hence I figured it was already spending some fairly long time in I/O, a

Re: Memory Usage

2005-11-17 Thread Marvin Humphrey
On Nov 17, 2005, at 4:16 PM, Daniel Noll wrote: Doug Cutting wrote: Daniel Noll wrote: I actually did throw a lot of terms in, and eventually chose "one" for the tests because it was the slowest query to complete of them all (hence I figured it was already spending some fairly long tim

References to deleted file handles in long-running server application

2005-11-17 Thread Matt Magoffin
Hello, I use Lucene in a long-running server application on a Linux server, and the other day I got the "Too many open files" exception. I've increased the number of allowed file handles, but was checking out the open file handles using "lsof", and see about 300 files listed like the following: ja

Re: Memory Usage

2005-11-17 Thread Doug Cutting
Daniel Noll wrote: Doug Cutting wrote: Daniel Noll wrote: I actually did throw a lot of terms in, and eventually chose "one" for the tests because it was the slowest query to complete of them all (hence I figured it was already spending some fairly long time in I/O, and would be penalised t

Re: Memory Usage

2005-11-17 Thread Daniel Noll
Doug Cutting wrote: Daniel Noll wrote: I actually did throw a lot of terms in, and eventually chose "one" for the tests because it was the slowest query to complete of them all (hence I figured it was already spending some fairly long time in I/O, and would be penalised the most.) Every oth

Re: IndexReader question

2005-11-17 Thread Michael Curtin
I think you want to access the TermEnum from IndexReader's terms() method. Depending upon how many fields you have and which ones you're interested in for term frequencies, something like this should get you started: String dir = "topleveldir"; IndexReader ir = new IndexReader(FSDirectory.getDir

Re: Lucene & Transactional semantics

2005-11-17 Thread Marc Hadfield
The Compass Framework ( http://www.compassframework.org/display/SITE/Home) implements transactional semantics "on top" of Lucene, such that you can treat the Lucene Index as an ORM-style database. Compass uses a recent version of Lucene but I'm sure some functionality is abstracted out and p

IndexReader question

2005-11-17 Thread Anna Buczak
I built an index of my documents using Lucene. I am interested in exporting part of the information in the Lucene index to a file (and using that file in another application). The information that I want to export consists mainly of the frequencies of the words in each of the documents. Does an

Re: Lucene & Transactional semantics

2005-11-17 Thread Beto Siless
Hi, I have the transaction problem too: I have Documents which are represented by a Business Object (persisted in a DB with an ORM), indexed with Lucene and finally stored in the file system. So it's very difficult to maintain consistency in an error scenario. The main problem is that if

Re: Field Boosting

2005-11-17 Thread Paul Smith
This would be a good candidate for an IllegalStateException to be thrown if the user calls this method when it's not valid. Save the user some hassles? (one can JavaDoc until one is blue in the face, but throwing a good RuntimeException with a message trains the users much quicker... :) ) P

Re: Field Boosting

2005-11-17 Thread Yonik Seeley
Right. getBoost() is meaningless on retrieved documents (it isn't set when a doc is read from the index). There really should have been a separate class for documents retrieved from an index vs documents added... but that's water way under the bridge. -Yonik On 11/17/05, Erik Hatcher <[EMAIL PRO

Re: Field Boosting

2005-11-17 Thread Chris Hostetter
: I don't believe, though haven't checked, that doc.getBoost() is a : valid thing to call on documents retrieved from an index. The boost : factor gets collapsed into other factors computed at index time, so : it is incorrect to expect the exact boost factor set at indexing time : is available dur

Re: Wordnet JWLN

2005-11-17 Thread Stefan Gusenbauer
José Ramón Pérez Agüera wrote: For this task you can use GATE, where you can find a POS-Tagger very useful. http://gate.ac.uk/ (sorry for my english)

Re: Issues while doing ant on lucene source

2005-11-17 Thread Dan Armbrust
Pol, Parikshit wrote: Hi Folks. I downloaded the Lucene and tried to do an ant. It initially gave me the following error: ... Are you using a current version of ant? Lucene 1.4.3 should already be fully built when you downloaded it - you shouldn't have to compile it. If you want the "curre

Re: Deprecated API in BooleanQuery broken in Lucene from CVS?

2005-11-17 Thread Daniel Naber
On Tuesday 15 November 2005 11:24, Patrick Kimber wrote: > I have checked out the latest version of Lucene from CVS and have > found a change in the results compared to version 1.4.3. Lucene isn't in CVS anymore, it's in SVN. With the latest version from SVN, I cannot reproduce your problem. R

Re: fnm file disappear

2005-11-17 Thread Otis Gospodnetic
Are you using Windows and a compound index format (look at your index dir - does it have .cfs file(s))? This may be a bad combination, judging from people who reported this problem so far. Otis --- Gioni <[EMAIL PROTECTED]> wrote: > Hi all > > I'm using lucene to index some document, all work

RE: Wordnet JWLN

2005-11-17 Thread Rajesh Munavalli
There is also a package from the Stanford NLP group for POS tagging using WordNet. They claim to have the best accuracy. Here is the link: http://www-nlp.stanford.edu/

Re: Wordnet JWLN

2005-11-17 Thread José Ramón Pérez Agüera
For this task you can use GATE, where you will find a very useful POS tagger. http://gate.ac.uk/ (sorry for my English) jose

Wordnet JWLN

2005-11-17 Thread Stefan Gusenbauer
For my index I want to check whether a word is a noun. Is this possible with the WordNet package found under the Lucene contributions, or does anyone know a good tutorial or documentation for http://jwordnet.sourceforge.net/ ? Thanks Stefan

Re: Memory Usage

2005-11-17 Thread Doug Cutting
Daniel Noll wrote: I actually did throw a lot of terms in, and eventually chose "one" for the tests because it was the slowest query to complete of them all (hence I figured it was already spending some fairly long time in I/O, and would be penalised the most.) Every other query was around 7ms

Sorting: single field vs multiple fields

2005-11-17 Thread Monsur Hossain
Anyone have any ballpark stats about sorting a single field versus sorting multiple fields? I understand every implementation is different, but I'm just trying to get a sense of what to expect before I revamp my index. We need fairly fine-grained sorting of items, so I have a field with the dat

Re: Multiple Analyzers

2005-11-17 Thread Michael Curtin
> Hi everybody, I want to know how to create an analyzer whith this and > StopFilter and LowerCaseFilter. Exists some example anywhere? > thks for replies Not bad at all. StopAnalyzer by itself may do what you want. If not, here's an example of a custom analyzer: class MyAnalyzer extends Ana

Multiple Analyzers

2005-11-17 Thread Daniel Cortes
Hi everybody, I want to know how to create an analyzer with this and StopFilter and LowerCaseFilter. Is there an example anywhere? Thanks for any replies.

fnm file disappear

2005-11-17 Thread Gioni
Hi all, I'm using Lucene to index some documents. Everything worked without problems until I replaced Lucene 1.4.2 with 1.4.3. Now, at random, I get an exception: java.io.FileNotFoundException: /usr/local/tomcat-azalea/lucene/_3ax.fnm (No such file or directory) The problem is that I use Lucene to

Re: case-sensitive search

2005-11-17 Thread Erik Hatcher
On 17 Nov 2005, at 04:27, jibu mathew wrote: Is it possible to do both case-sensitive and non case-sensitive search on already indexed documents? If not, is there any way to implement it without making two indexes for each case? Please help me in this regard. On already indexed documents? N

Re: Field Boosting

2005-11-17 Thread Erik Hatcher
On 17 Nov 2005, at 09:23, [EMAIL PROTECTED] wrote: I have a similar problem, I have boosted documents in an index, when I run a query it shows boosted documents first, but when I loop the Documents through the Hits class, this is: Document doc = hits.doc(i); System.out.println("Query scorin

Re: Re: Field Boosting

2005-11-17 Thread dblanch
I have a similar problem, I have boosted documents in an index, when I run a query it shows boosted documents first, but when I loop the Documents through the Hits class, this is: Document doc = hits.doc(i); System.out.println("Query scoring: " + formatter.format(hits.score(i))); //never h

Re: Field Boosting

2005-11-17 Thread Erik Hatcher
Daniel, Could you give us a test case that shows the boost not working properly? I'm using document level boosting (which is really what field level boosting does under the covers) in some of my applications and it is working as expected. Erik On 17 Nov 2005, at 05:39, [EMAIL PROTECT

Re: Field Boosting

2005-11-17 Thread David Escuer
Hi Daniel, I faced the same problem a couple of days ago. I was trying to set the boost values while indexing, but the results weren't what I expected. I solved it by putting the boost values in the search query, using the '^' operator. Here is an example: ((+text:house)^25.0) (+title:house)^

Re: Date Indexing

2005-11-17 Thread Erik Hatcher
On 17 Nov 2005, at 07:06, [EMAIL PROTECTED] wrote: I have a copy of the book. It tells you how to index as I noted, but not how to retrieve the date from search results. document.get("date") only returns Strings. How do I get it to return the Date object? As mentioned, DateField is the

Re: Date Indexing

2005-11-17 Thread Erik Hatcher
Oh, and sorry to miss the sorting question. Lucene can sort search results by String or numeric values. Field.Keyword(String,Date) can only be sorted as a String though. If you truly want to index and sort dates but don't need hours, minutes, seconds, milliseconds, then index them as YYY
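A minimal sketch of the string-key idea in this thread, using only the JDK. The snippet above is cut off after "YYY", so the year-first yyyyMMdd pattern here is an assumption, as are the UTC time zone and the helper name DateKeys (hypothetical, not part of Lucene). Because the key is fixed-width and year-first, sorting the strings also sorts the dates, and parsing the stored string gives back a Date, which is one way to handle document.get("date") returning only a String.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper: converts between Date and a sortable day-resolution key.
class DateKeys {
    // SimpleDateFormat is not thread-safe, so build a fresh one per call.
    private static SimpleDateFormat format() {
        SimpleDateFormat f = new SimpleDateFormat("yyyyMMdd"); // assumed pattern
        f.setTimeZone(TimeZone.getTimeZone("UTC"));            // assumed zone
        f.setLenient(false);
        return f;
    }

    // Key to store in the index; hours and smaller units are dropped.
    static String toKey(Date date) {
        return format().format(date);
    }

    // Recovers a Date (midnight UTC of that day) from a stored key.
    static Date fromKey(String key) {
        try {
            return format().parse(key);
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a yyyyMMdd key: " + key, e);
        }
    }
}
```

With keys like these, a field value retrieved from a search hit can be passed to DateKeys.fromKey to obtain the Date again.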

Re: Date Indexing

2005-11-17 Thread Daniel . Clark
I have a copy of the book. It tells you how to index as I noted, but not how to retrieve the date from search results. document.get("date") only returns Strings. How do I get it to return the Date object?

Re: Date Indexing

2005-11-17 Thread Erik Hatcher
On 17 Nov 2005, at 05:43, [EMAIL PROTECTED] wrote: I indexed dates using Field.Keyword(String,Date). The values seem to be encoded when I retrieve them via document.get("date"). Luke confirmed it. How do I decode the Date when retrieving from Document object? Or does it not work in vers

Re: Is there a tool that merges Lucene indexes?

2005-11-17 Thread Erik Hatcher
On 17 Nov 2005, at 03:37, Oren Shir wrote: Does Luke, Lucli, or any of the existing tools enable merging Lucene indexes? No, none of those tools do it, but it is all of about 10 lines of code: public class IndexMergeTool { public static void main(String[] args) throws IOException { File

Date Indexing

2005-11-17 Thread Daniel . Clark
I indexed dates using Field.Keyword(String,Date). The values seem to be encoded when I retrieve them via document.get("date"). Luke confirmed it. How do I decode the Date when retrieving from Document object? Or does it not work in version 1.4.3? Also, does Lucene only sort String values?

Field Boosting

2005-11-17 Thread Daniel . Clark
When I boost fields while indexing, the fields still have a boost of 1.0 during searching. When I view the values via Luke, it confirms the value of 1.0. Do I have to boost it again during search? I want certain fields to have higher priority/score during search. How do I get it to work? I'm u

case-sensitive search

2005-11-17 Thread jibu mathew
Hi all, Is it possible to do both case-sensitive and non case-sensitive search on already indexed documents? If not, is there any way to implement it without making two indexes for each case? Please help me in this regard. Thanks in advance Jibu

Is there a tool that merges Lucene indexes?

2005-11-17 Thread Oren Shir
Hi, Does Luke, Lucli, or any of the existing tools enable merging Lucene indexes? Thanks, Oren Shir