Re: Exceptions in merge thread (while optimizing) causing problems with subsequent reopens

2009-04-09 Thread Khawaja Shams
Mike, I am sorry for wasting your time :). There were indeed two threads that were performing this operation. Out of curiosity, which part of this is not thread safe? An IndexReader reopening while a commit is going on? Thanks again for your help. Regards, Khawaja On Thu, Apr 9, 2009 at 5:44

Re: Exceptions in merge thread (while optimizing) causing problems with subsequent reopens

2009-04-09 Thread Michael McCandless
That code looks right. Are there multiple threads that may enter it? Can you show the code where you create the IndexWriter, add docs, etc? Can you call IndexWriter.setInfoStream for the entire life of the index, up until when the optimize error happens, and post back? Mike On Thu, Apr 9, 2009

Re: Exceptions in merge thread (while optimizing) causing problems with subsequent reopens

2009-04-09 Thread Khawaja Shams
Hi Michael, Thanks for the quick response. I only have one IndexWriter, and there are no other processes accessing this particular index. I have tried deleting the entire index and reconstructing it, but the index corruption is repeatable. Incidentally, there are no new writes since the last comm

Re: Help to determine why an optimized index is proportionaly too big.

2009-04-09 Thread Koji Sekiguchi
Dan OConnor wrote: Thanks for the feed back Chris. Can you (or someone else on the list) tell me about the IndexMerge tool? Please see: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/misc/IndexMergeTool.html Koji -

Re: Exceptions in merge thread (while optimizing) causing problems with subsequent reopens

2009-04-09 Thread Michael McCandless
These are serious corruption exceptions. Is it at all possible two writers are accessing the index at the same time? Can you describe more about how you're using Lucene? Mike On Thu, Apr 9, 2009 at 7:59 PM, Khawaja Shams wrote: > Hello, >  I am having a problem with reopening the IndexReader w

RE: Help to determine why an optimized index is proportionaly too big.

2009-04-09 Thread Dan OConnor
Thanks for the feedback Chris. Can you (or someone else on the list) tell me about the IndexMerge tool? Thanks Dan -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Thursday, April 09, 2009 6:46 PM To: java-user@lucene.apache.org Subject: Re: Help to det

Exceptions in merge thread (while optimizing) causing problems with subsequent reopens

2009-04-09 Thread Khawaja Shams
Hello, I am having a problem with reopening the IndexReader with Lucene 2.4 (I updated to 2.4.1, but still no luck). The exception is preceded by an exception in optimizing the index. I am not reopening the reader while the commit or optimization is going on in the writer (optimizing happens in

Re: Lucene help with query

2009-04-09 Thread Koji Sekiguchi
John Seer wrote: Koji Sekiguchi-2 wrote: If you omit norms when indexing the name field, you'll get same score back. Koji During building I set omit norms, but result doesn't change at all. I am still getting the same score I meant if you set nameField.setOmitNorms( true ), you'

Re: Help to determine why an optimized index is proportionaly too big.

2009-04-09 Thread Michael McCandless
On Thu, Apr 9, 2009 at 6:46 PM, Chris Hostetter wrote: > > : The second stage index failed an optimization with a disk full exception > : (I had to move it to another lucene machine with a larger disk partition > : to complete the optimization. Is there a reason why a 22 day index would > : be 10x

Re: Lucene Filtering

2009-04-09 Thread Chris Hostetter
: How do you create a Lucene Filter to check if a field has a value? It is : part for a ChainedFilter that I am creating. take a look at RangeFilter ... you want a RangeFilter on your field name where the upper and lower bounds are both null. -Hoss
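Conceptually, a "field has a value" filter selects the set of document ids that carry at least one value in that field. The following is a minimal plain-Java sketch of that idea only. It does not use any Lucene classes, and the names `HasValueSketch` and `docsWithField`, as well as the map-based document model, are assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Plain-Java illustration (no Lucene classes): documents are modeled as
// field-name -> value maps, which is an assumption of this sketch.
final class HasValueSketch {

    /** Returns a bit set in which bit i is set when document i has a non-empty value for the field. */
    static BitSet docsWithField(List<Map<String, String>> docs, String field) {
        BitSet bits = new BitSet(docs.size());
        for (int i = 0; i < docs.size(); i++) {
            String value = docs.get(i).get(field);
            if (value != null && !value.isEmpty()) {
                bits.set(i);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("title", "a", "author", "x"),
                Map.of("title", "b"),
                Map.of("author", "y"));
        // Documents 0 and 2 have an "author" value; document 1 does not.
        System.out.println(docsWithField(docs, "author"));
    }
}
```

A range over the field with no lower or upper bound selects the same set, because every indexed term of the field falls inside an unbounded range.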

Re: SpellChecker AlreadyClosedException issue

2009-04-09 Thread Chris Hostetter
: My code looks like this: : : Directory dir = null; : try { :dir = FSDirectory.getDirectory("/path/to/dictionary"); :SpellChecker spell = new SpellChecker(dir); // exception thrown here :// ... :dir.close(); : This code works, but in a highly concurrent situation AlreadyClosedEx

Re: Help to determine why an optimized index is proportionaly too big.

2009-04-09 Thread Chris Hostetter
: The second stage index failed an optimization with a disk full exception : (I had to move it to another lucene machine with a larger disk partition : to complete the optimization. Is there a reason why a 22 day index would : be 10x the size of an 8 day index when the document indexing rate is

Re: Index in text format

2009-04-09 Thread Paul Elschot
On Thursday 09 April 2009 21:56:44 Andy wrote: > Is there a way to have lucene to write index in a txt file? No. You could try a hexdump of the index file(s), but that isn't really human readable. Instead of that you may want to try Luke: http://www.getopt.org/luke/ Regards, Paul Elschot

Index in text format

2009-04-09 Thread Andy
Is there a way to have Lucene write the index to a text file?

Re: Vector space implemantion

2009-04-09 Thread Grant Ingersoll
Sounds quite interesting. You might be interested in http://wiki.apache.org/solr/TermVectorComponent . If anything, the Solr code will show you how to get the information out of the Lucene index. On Apr 9, 2009, at 1:01 PM, Andy wrote: Well, I'm planning to have the term weights (assu

Re: Lucene help with query

2009-04-09 Thread John Seer
Koji Sekiguchi-2 wrote: > > If you omit norms when indexing the name field, you'll get same score > back. > > Koji > During building I set omit norms, but result doesn't change at all. I am still getting the same score -- View this message in context: http://www.nabble.com/Lucene-help-w

Re: Query any data

2009-04-09 Thread Erick Erickson
searching for fieldname:* will be *extremely* expensive as it will, by default, build a giant OR clause consisting of every term in the field. You'll throw MaxClauses exceptions right and left. I'd follow Tim's thread lead first Best Erick 2009/4/8 王巍巍 > first you should change your querypa
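To make the cost concrete: expanding `fieldname:*` yields one OR clause per distinct term in the field, so the clause count grows with the field's vocabulary. Below is a minimal plain-Java sketch of that arithmetic. It uses no Lucene classes; `WildcardExpansionSketch`, `clausesForMatchAll`, and the use of `IllegalStateException` are hypothetical stand-ins, and the limit of 1024 is assumed here to mirror Lucene's default maximum clause count.

```java
import java.util.Set;
import java.util.TreeSet;

// Plain-Java illustration (no Lucene classes) of why a match-all wildcard is expensive.
final class WildcardExpansionSketch {

    // Assumed limit for this sketch, chosen to mirror Lucene's default max clause count.
    static final int MAX_CLAUSES = 1024;

    /** One OR clause per distinct term; fails once the clause limit is exceeded. */
    static int clausesForMatchAll(Set<String> distinctTermsInField) {
        int clauses = distinctTermsInField.size();
        if (clauses > MAX_CLAUSES) {
            throw new IllegalStateException(
                    "too many clauses: " + clauses + " > " + MAX_CLAUSES);
        }
        return clauses;
    }

    public static void main(String[] args) {
        Set<String> terms = new TreeSet<>();
        for (int i = 0; i < 5000; i++) {
            terms.add("term" + i);
        }
        try {
            clausesForMatchAll(terms);
        } catch (IllegalStateException e) {
            // A field with 5000 distinct terms exceeds the assumed limit.
            System.out.println(e.getMessage());
        }
    }
}
```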

Re: Vector space implemantion

2009-04-09 Thread Andy
Well, I'm planning to have the term weights (assume in a matrix) and then, using an adaptive learning system, transform them into new weights in such a way that the index formed from these is optimized. It's just a test to see if this hypothesis works or not. --- On Thu, 4/9/09, Grant Ingersoll

Re: Problem with ranking in lucene

2009-04-09 Thread Grant Ingersoll
Your best bet is to look into the explanations of each of these documents in the context of your query via the explain() method on the Searcher (IndexSearcher). If I had to venture a guess, the docs w/ only one term have a higher TF/IDF value (I would even venture to guess that they contai

Re: Suggestive Search

2009-04-09 Thread Konstantyn Smirnov
I implemented the suggestions feature for a couple of websites. An example can be seen on http://www.genios.de/r_firmen/webcgi?START=016&SEITE=firmenk_d.ein&DBN=&WID=01852-8850939-00904_3 genios.de . Type something in the Firma and Person fields. The Firma index has 3+ million records, Person ~ 1.

Re: Vector space implemantion

2009-04-09 Thread Grant Ingersoll
Assuming you want to handle the vectors yourself, as opposed to relying on the fact that Lucene itself implements the VSM, you should index your documents with TermVector.YES. That will give you the term freq on a per doc basis, but you will have to use the TermEnum to get the Doc Freq. A
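Once the per-document term frequencies and the per-term document frequencies are in hand, the weights can be combined outside Lucene. The sketch below is a plain-Java illustration using the textbook weighting tf * ln(N / df), which is an assumption of this example and not Lucene's own Similarity scoring; `TfIdfSketch` and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Plain-Java illustration (no Lucene classes) of building TF-IDF weights
// from term frequencies and document frequencies.
final class TfIdfSketch {

    /** Counts how often each term occurs in one tokenized document. */
    static Map<String, Integer> termFreq(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : tokens) {
            tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    /** Counts in how many documents each term occurs at least once. */
    static Map<String, Integer> docFreq(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Textbook weight tf * ln(N / df); assumed for this sketch, not Lucene's formula. */
    static double weight(int tf, int df, int numDocs) {
        return tf * Math.log((double) numDocs / df);
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("bank", "bank", "loan"),
                List.of("bank", "river"));
        Map<String, Integer> df = docFreq(docs);
        // Print one row of the document-by-term weight matrix per document.
        for (List<String> doc : docs) {
            Map<String, Double> row = new HashMap<>();
            for (Map.Entry<String, Integer> e : termFreq(doc).entrySet()) {
                row.put(e.getKey(), weight(e.getValue(), df.get(e.getKey()), docs.size()));
            }
            System.out.println(row);
        }
    }
}
```

A term that occurs in every document gets weight zero under this formula, which is why "bank" contributes nothing in the example.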

Problem with ranking in lucene

2009-04-09 Thread Ariel
Hi everybody: I have a question about ranking in Lucene. Here is my problem: when I do a search in my index like this: bank OR transference I get 10 results. The first two documents that are returned have both terms in the content field, but then the 3rd, 4th and 5th only have the word

Re: Suggestive Search

2009-04-09 Thread 王巍巍
In my project, I stored the user input keywords in the database; I then build an index from the database and use it to do suggestive search. The code example was found via Google, and I changed the analyzer and query function. I attach the code, but you have to modify it to make it run. For chines

Vector space implemantion

2009-04-09 Thread Andy
Hello all, I'm new to Lucene and trying to implement a vector space model using Lucene. I need to have a file (or an in-memory structure) with the TF/IDF weight of each term in each document. (In fact, that is a matrix with documents represented as vectors, in which the elements of each vector are the TF weight ...

[Fwd: Vector space implemantion]

2009-04-09 Thread John Byrne
Hi - wrong address! Forwarding this to the mailing list... --- Begin Message --- Hello all, I'm new to Lucene and trying to implement a vector space model using Lucene. I need to have a file (or an in-memory structure) with the TF/IDF weight of each term in each document. (In fact, that is a matrix with docume

Re: query c++

2009-04-09 Thread John Byrne
Hi, This came up before, a while ago: http://www.nabble.com/searching-for-C%2B%2B-to18093942.html#a18093942 I don't think there is an easier way than modifying the standard analyzer. As I suggested in that earlier thread, I would make the analyzer recognize token patterns that consist of wor

Re: query c++

2009-04-09 Thread 王巍巍
To be more detailed, I implemented an FTP search engine for campus students. I have to handle many different words, including Chinese words, so I can't use only WhitespaceAnalyzer. My analyzer now looks like this: StandardTokenizer tokenStream = new StandardTokenizer(reader, replaceInvalidAcronym)

Re: query c++

2009-04-09 Thread hyj
Hello 王巍巍! WhitespaceAnalyzer can work. === 2009-04-09 15:15:14 You wrote: === >I want to make my lucene can search word like c++, c#, how can i modify my >analyzer to achieve this goal? > >-- >王巍巍(Weiwei Wang) >Department of Computer Science >Gulou Campus of Nanjing University >Nanj
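The reason whitespace-based tokenization can work here is that it never splits inside a token, so "c++" and "c#" survive intact, whereas a tokenizer that breaks on punctuation reduces both to "c". The following is a minimal plain-Java sketch of that difference. It uses no Lucene classes; `TokenizeSketch` and its two methods are hypothetical stand-ins for the two tokenization styles, not the actual analyzers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration (no Lucene classes) of two tokenization styles.
final class TokenizeSketch {

    /** Splits on runs of whitespace only, so punctuation stays inside tokens. */
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Splits on anything that is not a letter or digit, so '+' and '#' are dropped. */
    static List<String> letterOrDigitTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "I like C++ and C#";
        System.out.println(whitespaceTokens(text));    // keeps "C++" and "C#"
        System.out.println(letterOrDigitTokens(text)); // both collapse to "C"
    }
}
```

The trade-off is that whitespace splitting also keeps trailing punctuation such as commas and periods attached to words, so it suits controlled input better than free text.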

query c++

2009-04-09 Thread 王巍巍
I want my Lucene index to support searching for words like c++ and c#. How can I modify my analyzer to achieve this goal? -- 王巍巍(Weiwei Wang) Department of Computer Science Gulou Campus of Nanjing University Nanjing, P.R.China, 210093 Mobile: 86-13913310569 MSN: ww.wang...@gmail.com Homepage: http://cs.nju.edu