Rails and lucene

2008-02-18 Thread coolgeng coolgeng
Hi guys, I have an idea: I want to integrate Lucene into my Ruby application, and the newest Lucene API has an interface for connecting to Ruby applications. Unfortunately, I have no experience with it. Let's talk about it. -- Best Regards Cooper Geng

Re: How to index word-pairs and phrases

2008-02-18 Thread Grant Ingersoll
Hi Ghinwa, A Term is simply a unit of tokenization that has been indexed for a Field, produced by a TokenStream. In the demo, on the main site, this can be seen in the file called IndexFiles.java on line 56: IndexWriter writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(), true, Ind

How to index word-pairs and phrases

2008-02-18 Thread Ghinwa Choueiter
Hi, I am new to Lucene and have been reading the documentation. I would like to use Lucene to query a song database by lyrics. The query could potentially contain typos, or even wrong words, word contractions (can't versus cannot), etc. I would like to create an inverted list by word pairs and
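The word-pair idea itself can be illustrated without any Lucene classes. This is a minimal sketch, assuming a simple lowercase-and-split tokenizer; `WordPairs`, `tokenize` and `pairs` are hypothetical names, not Lucene API. Each adjacent pair (a "shingle") could be indexed as an extra term next to the single words, so a query containing one wrong word still shares most of its pairs with the correct lyric.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: turns lyrics into single words and overlapping word pairs. */
class WordPairs {

    /** Lowercases the text and splits it on anything that is not a letter, digit or apostrophe. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+")) {
            if (!t.isEmpty()) { // a leading separator yields one empty string; skip it
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Joins each token with its right neighbour, e.g. [a, b, c] -> ["a b", "b c"]. */
    static List<String> pairs(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            result.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return result;
    }
}
```

Normalizing contractions such as "can't" versus "cannot" would be a separate step run before pairing; this sketch leaves tokens as they appear.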

Re: regex expressions within phrase queries

2008-02-18 Thread Jim Bogan
By "custom phrase query class" I was trying to ask whether it would be possible, or even a good idea, to create a modified PhraseQuery class that is more efficient than span queries (as I only want to use it for phrases). This class might have multiple possible terms generated from a regex at a certain po
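Lucene's MultiPhraseQuery already accepts several alternative terms at a single phrase position, so one option worth checking before writing a new class is to expand the regex into concrete terms first and hand those to it. The sketch below shows only the expansion step, under stated assumptions: the `vocabulary` collection is a hypothetical stand-in for enumerating the index's term dictionary, and the `maxTerms` cap is an illustrative guard against a regex that matches far too many terms.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: expands a regex into the concrete terms it matches. */
class RegexExpansion {

    /**
     * Returns every term in the vocabulary that fully matches the regex, in iteration order.
     * Throws IllegalStateException as soon as more than maxTerms terms would match.
     */
    static List<String> expand(Collection<String> vocabulary, String regex, int maxTerms) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matches = new ArrayList<>();
        for (String term : vocabulary) {
            if (pattern.matcher(term).matches()) {
                if (matches.size() == maxTerms) {
                    throw new IllegalStateException("regex matches more than " + maxTerms + " terms");
                }
                matches.add(term);
            }
        }
        return matches;
    }
}
```

Scanning every term is linear in the vocabulary size; a real implementation would likely start from the longest literal prefix of the regex to narrow the scan.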

Re: Using lucene with a Geospatial catalog

2008-02-18 Thread Otis Gospodnetic
Stephane, check out the last 2 links in http://www.simpy.com/group/363 , they are for geospatial searching with Lucene. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Stephane Nicoll <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Se

Re: Lucene in EJB environment

2008-02-18 Thread Otis Gospodnetic
Hello - opening a new IndexSearcher for every request is not the thing to do. Reuse a single IndexSearcher instance. This must be in the FAQ. :) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: techkatta <[EMAIL PROTECTED]> > To: java-user@l

FieldSortedHitQueue rise in memory

2008-02-18 Thread Brian Doyle
We've implemented a custom sort class and use it to sort by distance. We have implemented equals and hashCode in the sort comparator. After running for a few hours we're reaching peak memory usage and eventually the server runs out of memory. We did some profiling and noticed that a large
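The message does not show the comparator itself, so the following is only a stdlib sketch of a distance sort, not Brian's code or a Lucene class; `GeoPoint` and `DistanceComparator` are hypothetical names. It orders points by haversine distance from an origin, and its equals/hashCode compare only the origin. One thing worth checking against the profile (an assumption, not a diagnosis): if the sort machinery caches per-comparator data keyed by equals/hashCode, then a comparator built for every new origin is a new key each time, and the cache can keep growing while the same reader stays open.

```java
import java.util.Comparator;
import java.util.Objects;

/** Hypothetical point type: latitude/longitude in degrees. */
final class GeoPoint {
    final double lat;
    final double lon;

    GeoPoint(double lat, double lon) { this.lat = lat; this.lon = lon; }
}

/** Orders points by great-circle (haversine) distance from a fixed origin. */
final class DistanceComparator implements Comparator<GeoPoint> {
    private static final double EARTH_RADIUS_KM = 6371.0;
    private final GeoPoint origin;

    DistanceComparator(GeoPoint origin) { this.origin = origin; }

    /** Haversine distance in kilometres between two points. */
    static double distanceKm(GeoPoint a, GeoPoint b) {
        double dLat = Math.toRadians(b.lat - a.lat);
        double dLon = Math.toRadians(b.lon - a.lon);
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(a.lat)) * Math.cos(Math.toRadians(b.lat))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // min() guards against rounding pushing the argument slightly above 1
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }

    @Override
    public int compare(GeoPoint p, GeoPoint q) {
        return Double.compare(distanceKm(origin, p), distanceKm(origin, q));
    }

    // Two comparators are equal only if they sort from the same origin.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DistanceComparator)) return false;
        DistanceComparator other = (DistanceComparator) o;
        return Double.compare(origin.lat, other.origin.lat) == 0
                && Double.compare(origin.lon, other.origin.lon) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(origin.lat, origin.lon);
    }
}
```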

Re: how to safely periodically reopen the IndexReader?

2008-02-18 Thread Robert.Hastings
We have the same situation and use an atomic counter. Basically, we have a SearcherHolder class and a SearcherManager class. The SearcherHolder holds the searcher and the number of threads referencing the searcher. When the thread that writes to the index closes the index, it sends an event
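The holder/manager scheme described above can be sketched with the JDK's atomic classes. This is a minimal sketch under stated assumptions: the class names follow Robert's description but are not Lucene API, the searcher is modeled as any `AutoCloseable`, and `swap()` is the hypothetical hook that the index-writing thread's event would call after a fresh searcher has been opened. A retired searcher is closed only when its last in-flight search releases it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder: a searcher plus the number of threads currently referencing it. */
final class SearcherHolder<T extends AutoCloseable> {
    final T searcher;
    // Starts at 1: the manager's own reference.
    private final AtomicInteger refCount = new AtomicInteger(1);

    SearcherHolder(T searcher) { this.searcher = searcher; }

    /** Increments the count unless the holder has already dropped to zero (been retired). */
    boolean tryIncRef() {
        while (true) {
            int count = refCount.get();
            if (count <= 0) return false;
            if (refCount.compareAndSet(count, count + 1)) return true;
        }
    }

    /** Decrements the count and closes the searcher once nobody references it. */
    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            try {
                searcher.close();
            } catch (Exception e) {
                throw new IllegalStateException("failed to close searcher", e);
            }
        }
    }
}

/** Hypothetical manager: hands out the current holder and swaps in a new one after index changes. */
final class SearcherManager<T extends AutoCloseable> {
    private final AtomicReference<SearcherHolder<T>> current;

    SearcherManager(T initial) { current = new AtomicReference<>(new SearcherHolder<>(initial)); }

    /** Callers must pass the returned holder to release() when their search is done. */
    SearcherHolder<T> acquire() {
        while (true) {
            SearcherHolder<T> holder = current.get();
            if (holder.tryIncRef()) return holder;
            // The holder was retired between get() and tryIncRef(); retry with the new one.
        }
    }

    void release(SearcherHolder<T> holder) { holder.decRef(); }

    /** Installs a freshly opened searcher; the old one closes when its last user releases it. */
    void swap(T newSearcher) {
        SearcherHolder<T> old = current.getAndSet(new SearcherHolder<>(newSearcher));
        old.decRef(); // drop the manager's own reference to the old holder
    }
}
```

Every search thread brackets its work with `acquire()` and `release()` (ideally in a try/finally), so a single searcher is shared across requests instead of being opened per request.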

Re: Using lucene with a Geospatial catalog

2008-02-18 Thread John Wang
Check out www.browseengine.com; it is an open-source meta engine on top of Lucene. -John On Feb 17, 2008 2:22 AM, Stephane Nicoll <[EMAIL PROTECTED]> wrote: > Hi, > > I've been browsing the archive and the documentation about Lucene. It > really seems that it could help implementing my use case b

Lucene in EJB environment

2008-02-18 Thread techkatta
I am using Lucene in an EJB environment, with Berkeley DB JE as a data store, via JCA on JBoss 4.2.0. My question is whether using Lucene in an EJB environment is advisable or not. For every request I open an IndexSearcher object, and I close it when exiting the EJB. It's g

Re: Problem using Lucene on Ubuntu

2008-02-18 Thread Grant Ingersoll
Good point Jan! On Feb 18, 2008, at 9:00 AM, Jan Peter Stotz wrote: Grant Ingersoll wrote: Note: ENCODING is whatever encoding the file is in, as in "UTF-8", if that is what your files are in. I think there is a misunderstanding, the WordExtractor extracts text from MS Word (.doc) files.

Re: Problem using Lucene on Ubuntu

2008-02-18 Thread Jan Peter Stotz
Grant Ingersoll wrote: Note: ENCODING is whatever encoding the file is in, as in "UTF-8", if that is what your files are in. I think there is a misunderstanding, the WordExtractor extracts text from MS Word (.doc) files. Those files are binary and therefore do not have an encoding. I wou

Re: Problem using Lucene on Ubuntu

2008-02-18 Thread Grant Ingersoll
Not sure about WordExtractor, does it also take a Reader? I would try: Reader input = new InputStreamReader(new FileInputStream(file), "ENCODING"); WordExtractor extractor = new WordExtractor(input); content = extractor.getText(); Note: ENCODING is whatever encoding the file is in, as in "UT
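As Jan points out, .doc files are binary, so an explicit encoding only matters where the source is plain text; still, Grant's underlying suspicion (that Windows and Ubuntu decode bytes with different default charsets) is easy to rule out for text input. A minimal sketch, assuming plain-text input; `ExplicitEncoding.readAll` is a hypothetical helper, not part of Lucene or POI. It always decodes with the charset it is given, never the platform default.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/** Hypothetical helper: reads a whole stream as text using an explicit charset. */
class ExplicitEncoding {

    /** Decodes all bytes with the given charset; I/O errors are rethrown unchecked. */
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

For a file, the stream would be `new FileInputStream(file)` and the charset whatever the files were actually saved in, for example `Charset.forName("UTF-8")`, or `Charset.forName("ISO-8859-7")` for legacy Greek text (available in common JDKs, though not one of the charsets every Java platform is required to support).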

Re: Problem using Lucene on Ubuntu

2008-02-18 Thread kratoras
No problem about the misunderstanding. I am using InputStream input = new URL("file:///" + file.getAbsolutePath()).openStream(); WordExtractor extractor = new WordExtractor(input); content = extractor.getText(); where WordExtractor is org.apache.poi.hwpf.extractor.WordExtractor; The word

Re: Problem using Lucene on Ubuntu

2008-02-18 Thread Grant Ingersoll
How are you loading the document into the content variable below? My guess is still that you have different locales on Windows and Ubuntu. (Btw, sorry about the java-user comment. I should wake up before sending responses. For some reason I thought the email was sent to java-dev) -Gran

Re: Problem using Lucene on Ubuntu

2008-02-18 Thread kratoras
Actually, what I figured out just now is that the problem is in the indexing part. A 15MB document produces a 23MB index, which is not normal, since on Windows the same document produces a 3MB index. For the indexing I use: writer = new IndexWriter(index, new GreekAnalyzer(), !in

Re: Problem using Lucene on Ubuntu

2008-02-18 Thread Grant Ingersoll
This question is best asked on java-user. However, my guess is that it is related to your Locale and that you need to set the character encoding to Greek on Ubuntu when reading in your files. Something like: Reader reader = new InputStreamReader(new FileInputStream(file), "GREEK Char Enco

Problem using Lucene on Ubuntu

2008-02-18 Thread kratoras
Hello! I've written a sample application which indexes documents written in Greek using the GreekAnalyzer and searches these documents with both Greek and English words. Though on Windows the search returns correct results, if I try it on Ubuntu the search does not return any results for any g