How often are you updating your index? Are you closing your old
IndexSearchers after switching over to the new index? You'll need to
close the searchers in order to release the file handle. This was the
same issue I was experiencing:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/2
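In case it helps, the switch-over can be sketched like this (the currentSearcher field and the index path are made-up names, and this assumes the 1.4-era IndexSearcher API):

```java
// Open a searcher over the freshly rebuilt index, swap it in, then
// close the old one so its file handles are actually released.
IndexSearcher newSearcher = new IndexSearcher("/path/to/new/index");
IndexSearcher oldSearcher = currentSearcher;  // hypothetical field
currentSearcher = newSearcher;
if (oldSearcher != null) {
    oldSearcher.close();  // releases the underlying index files
}
```

Until close() is called, the deleted segment files stay open, which is exactly what shows up as "deleted" entries in lsof.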
Can you post the code you're using to create the Document and add
it to the IndexWriter? You have to tell Lucene to store term freq
vectors (it isn't done by default). Also I'm not sure what you mean
when you say your documents do not have fields. Do you have at least
one field?
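For reference, in the 1.4.x API the term-vector flag is an extra boolean on Field.Text (the field name, text variable, and writer here are assumptions):

```java
Document doc = new Document();
// the final 'true' asks Lucene to store the term frequency vector
doc.add(Field.Text("contents", bodyText, true));
writer.addDocument(doc);
```

Without that flag, getTermFreqVectors() will return null for the document.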
-chris
I've been watching our servers today, and now there are 2500 "deleted"
file handles open like this. Seems to be quite large. Still don't know why
there are so many. I'm using the compound index format already to reduce
the number of open files.
-- m@
> Hello, I use Lucene in a long-running server
I have indexed a set of documents that do not have fields. I want to
use the getTermFreqVector method from IndexReader to get the
frequencies. However when I do that as:
TermFreqVector[] z = ir.getTermFreqVectors(0);
z is null. So I can't get the frequency vectors.
Help will be very much appreciated.
Hello, I use Lucene in a long-running server application on a Linux
server, and the other day I got the "Too many open files" exception. I've
increased the number of allowed file handles, but was checking out the
open file handles using "lsof", and see about 300 files listed like the
following:
ja
I think you want to access the TermEnum from IndexReader's terms() method.
Depending upon how many fields you have and which ones you're interested in
for term frequencies, something like this should get you started:
String dir = "topleveldir";
IndexReader ir = IndexReader.open(FSDirectory.getDirectory(dir, false));
TermEnum terms = ir.terms();
while (terms.next()) {
    System.out.println(terms.term() + " docFreq=" + terms.docFreq());
}
terms.close();
ir.close();
The Compass Framework (
http://www.compassframework.org/display/SITE/Home) implements
transactional semantics "on top" of Lucene, such that you can treat the
Lucene Index as an ORM-style database. Compass uses a recent version of
Lucene but I'm sure some functionality is abstracted out and p
I built an index of my documents using Lucene. I am interested in
exporting part of the information in the Lucene index to a file (and
using that file in another application). The information that I want to
export consists mainly of the frequencies of the words in each of the
documents.
Does anyone know how I can do this?
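If term vectors weren't stored at indexing time, one way to pull per-document frequencies out of an existing index is TermEnum plus TermDocs. A sketch, assuming the 1.4 API and an index directory named "index":

```java
IndexReader reader = IndexReader.open("index");
TermEnum terms = reader.terms();
while (terms.next()) {
    TermDocs docs = reader.termDocs(terms.term());
    while (docs.next()) {
        // docs.doc() is the document number, docs.freq() the
        // frequency of the current term within that document
        System.out.println(docs.doc() + "\t"
            + terms.term().text() + "\t" + docs.freq());
    }
    docs.close();
}
terms.close();
reader.close();
```

Redirect the output to a file and you have the export you describe.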
Hi, I'm with the transaction problem too: I have Documents which are
represented by a Business Object (persisted in a DB with an ORM),
indexed with Lucene and finally stored in the file system. So it's very
difficult to maintain the consistency in an error scenario.
The main problem is that if
This would be a good candidate for an IllegalStateException to be
thrown if the user calls this method when it's not valid. Save the
user some hassles? (one can JavaDoc till one is blue in the face, but
throwing a good RuntimeException with a message trains the users much
quicker... :) )
P
Right. getBoost() is meaningless on retrieved documents (it isn't set
when a doc is read from the index).
There really should have been a separate class for documents retrieved
from an index vs documents added... but that's water way under the
bridge.
-Yonik
On 11/17/05, Erik Hatcher <[EMAIL PRO
: I don't believe, though haven't checked, that doc.getBoost() is a
: valid thing to call on documents retrieved from an index. The boost
: factor gets collapsed into other factors computed at index time, so
: it is incorrect to expect the exact boost factor set at indexing time
: is available dur
José Ramón Pérez Agüera wrote:
For this task you can use GATE, which includes a very useful POS-Tagger.
http://gate.ac.uk/
(sorry for my english)
jose
José Ramón Pérez Agüera
Despacho 411 tlf. 913947599
Dept. de Sistemas Informáticos y Programación
Facultad de Informática
Universidad Com
Pol, Parikshit wrote:
Hi Folks.
I downloaded Lucene and tried to run ant. It initially gave me the
following error:
...
Are you using a current version of ant?
Lucene 1.4.3 should already be fully built when you downloaded it - you
shouldn't have to compile it.
If you want the "curre
On Dienstag 15 November 2005 11:24, Patrick Kimber wrote:
> I have checked out the latest version of Lucene from CVS and have
> found a change in the results compared to version 1.4.3.
Lucene isn't in CVS anymore, it's in SVN. With the latest version from SVN,
I cannot reproduce your problem.
R
Are you using Windows and a compound index format (look at your index
dir - does it have .cfs file(s))?
This may be a bad combination, judging from people who reported this
problem so far.
Otis
--- Gioni <[EMAIL PROTECTED]> wrote:
> Hi all
>
> I'm using lucene to index some document, all work
There is also a package from Stanford NLP group for POS tagging using WordNet.
They claim to have the best accuracy. Here is the link.
http://www-nlp.stanford.edu/
-Original Message-
From: José Ramón Pérez Agüera [mailto:[EMAIL PROTECTED]
Sent: Thu 11/17/2005 9:52 AM
To: java-user@lucene
For my index I want to check whether a word is a noun. Is this possible
with the wordnet package which can be found under Lucene contributions,
or does anyone know a good tutorial or documentation for
http://jwordnet.sourceforge.net/ ?
Thanks
Stefan
-
Daniel Noll wrote:
I actually did throw a lot of terms in, and eventually chose "one" for
the tests because it was the slowest query to complete of them all
(hence I figured it was already spending some fairly long time in I/O,
and would be penalised the most.) Every other query was around 7ms
Anyone have any ballpark stats about sorting a single field versus sorting
multiple fields? I understand every implementation is different, but I'm
just trying to get a sense of what to expect before I revamp my index.
We need fairly fine-grained sorting of items, so I have a field with the
dat
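For what it's worth, multi-field sorting in the 1.4 API looks like this (field names are invented; the incremental cost of extra sort fields comes mostly from caching each field's values, so it depends on how many distinct values each field has):

```java
// sort by date descending, breaking ties by title ascending
Sort sort = new Sort(new SortField[] {
    new SortField("date", SortField.STRING, true),
    new SortField("title", SortField.STRING)
});
Hits hits = searcher.search(query, sort);
```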
> Hi everybody, I want to know how to create an analyzer with this and
> StopFilter and LowerCaseFilter. Is there an example anywhere?
> thks for replies
Not bad at all. StopAnalyzer by itself may do what you want. If not, here's
an example of a custom analyzer:
class MyAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new StandardTokenizer(reader);
        result = new LowerCaseFilter(result);
        result = new StopFilter(result, StopAnalyzer.ENGLISH_STOP_WORDS);
        return result;
    }
}
Hi everybody, I want to know how to create an analyzer with this and
StopFilter and LowerCaseFilter. Is there an example anywhere?
Thanks for any replies
Hi all
I'm using Lucene to index some documents. All worked without problems
until I replaced Lucene 1.4.2 with 1.4.3. Now, on a random basis, I get an
exception:
java.io.FileNotFoundException: /usr/local/tomcat-azalea/lucene/_3ax.fnm
(No such file or directory)
The problem is that I use lucene to
On 17 Nov 2005, at 04:27, jibu mathew wrote:
Is it possible to do both case-sensitive and non case-sensitive search
on already indexed documents? If not, is there any way to implement it
without making two indexes for each case? Please help me in this
regard.
On already indexed documents? No.
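A common workaround at indexing time is one index with two parallel fields per document, one analyzed by a lowercasing analyzer and one by an analyzer that preserves case. A sketch, assuming PerFieldAnalyzerWrapper is available in your Lucene version (field names are invented):

```java
// one index, two parallel fields with different analysis
PerFieldAnalyzerWrapper analyzer =
    new PerFieldAnalyzerWrapper(new StandardAnalyzer());
analyzer.addAnalyzer("contentsCS", new WhitespaceAnalyzer());  // keeps case

IndexWriter writer = new IndexWriter("index", analyzer, true);
Document doc = new Document();
doc.add(Field.Text("contents", text));    // lowercased by StandardAnalyzer
doc.add(Field.Text("contentsCS", text));  // case preserved
writer.addDocument(doc);
```

Case-insensitive queries go against "contents", case-sensitive ones against "contentsCS", with the matching analyzer used at query time.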
On 17 Nov 2005, at 09:23, [EMAIL PROTECTED] wrote:
I have a similar problem, I have boosted documents in an index,
when I run
a query it shows boosted documents first, but when I loop the
Documents
through the Hits class, this is:
Document doc = hits.doc(i);
System.out.println("Query scorin
I have a similar problem, I have boosted documents in an index, when I run
a query it shows boosted documents first, but when I loop the Documents
through the Hits class, this is:
Document doc = hits.doc(i);
System.out.println("Query scoring: "
+ formatter.format(hits.score(i))); //never h
Daniel,
Could you give us a test case that shows the boost not working properly?
I'm using document level boosting (which is really what field level
boosting does under the covers) in some of my applications and it is
working as expected.
Erik
On 17 Nov 2005, at 05:39, [EMAIL PROTECT
Hi Daniel,
I faced the same problem a couple of days ago. I was trying to set the
boost values while indexing, but the results weren't what I expected. I
solved it by putting the boost values in the search query instead, using
the '^' operator. Here is an example:
((+text:house)^25.0) (+title:house)^
On 17 Nov 2005, at 07:06, [EMAIL PROTECTED] wrote:
I have a copy of the book. It tells you how to index as I noted,
but not
how to retrieve the date from search results. document.get("date")
only
returns Strings. How do I get it to return the Date object?
As mentioned, DateField is the class that encodes those values;
DateField.stringToDate() converts the stored string back into a Date.
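A minimal decoding sketch, assuming the stored field is named "date" and was written via Field.Keyword(String, Date):

```java
// doc is a Document retrieved from search results
String encoded = doc.get("date");          // DateField-encoded string
java.util.Date date = DateField.stringToDate(encoded);
```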
Oh, and sorry to miss the sorting question. Lucene can sort search
results by String or numeric values. Field.Keyword(String,Date) can
only be sorted as a String though. If you truly want to index and
sort dates but don't need hours, minutes, seconds, milliseconds, then
index them as YYYYMMDD strings, which sort chronologically when sorted as Strings.
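The YYYYMMDD trick needs nothing beyond the JDK; for example (class and method names are mine):

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

public class DateKey {
    // yyyyMMdd keys sort the same way lexicographically and chronologically
    static String dateKey(Date d) {
        return new SimpleDateFormat("yyyyMMdd").format(d);
    }

    public static void main(String[] args) {
        Calendar c = Calendar.getInstance();
        c.set(2005, Calendar.NOVEMBER, 17);
        System.out.println(dateKey(c.getTime()));  // prints 20051117
    }
}
```

Index the resulting string with Field.Keyword(String, String) and sort on it as a plain String field.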
I have a copy of the book. It tells you how to index as I noted, but not
how to retrieve the date from search results. document.get("date") only
returns Strings. How do I get it to return the Date object?
~
Daniel Clark, Senior Consultant
Sybase Federal P
On 17 Nov 2005, at 05:43, [EMAIL PROTECTED] wrote:
I indexed dates using Field.Keyword(String,Date). The values seem
to be
encoded when I retrieve them via document.get("date"). Luke
confirmed it.
How do I decode the Date when retrieving from Document object? Or
does it
not work in vers
On 17 Nov 2005, at 03:37, Oren Shir wrote:
Does Luke, Lucli, or any of the existing tools enable merging Lucene
indexes?
No, none of those tools do it, but it is all of about 10 lines of code:
public class IndexMergeTool {
    public static void main(String[] args) throws IOException {
        File mergedIndex = new File(args[0]);
        IndexWriter writer = new IndexWriter(mergedIndex,
            new SimpleAnalyzer(), true);
        Directory[] indexes = new Directory[args.length - 1];
        for (int i = 1; i < args.length; i++)
            indexes[i - 1] = FSDirectory.getDirectory(args[i], false);
        writer.addIndexes(indexes);
        writer.optimize();
        writer.close();
    }
}
I indexed dates using Field.Keyword(String,Date). The values seem to be
encoded when I retrieve them via document.get("date"). Luke confirmed it.
How do I decode the Date when retrieving from Document object? Or does it
not work in version 1.4.3? Also, does Lucene only sort String values?
~~~
When I boost fields while indexing, the fields still have a boost of 1.0
during searching. When I view the values via Luke, it confirms the value
of 1.0. Do I have to boost it again during search? I want certain fields
to have higher priority/score during search. How do I get it to work? I'm
u
Hi all,
Is it possible to do both case-sensitive and non case-sensitive search
on already indexed documents? If not, is there any way to implement it
without making two indexes for each case? Please help me in this regard.
Thanks in advance
Jibu
Hi,
Does Luke, Lucli, or any of the existing tools enable merging Lucene
indexes?
Thanks,
Oren Shir