Hi guys,
An idea just came to me: I want to integrate Lucene into my Ruby
application, and the newest Lucene API provides an interface that a Ruby
application can use. Unfortunately I have no experience with it. Let's talk
about it.
--
Best Regards
Cooper Geng
Hi Ghinwa,
A Term is simply a unit of tokenization that has been indexed for a
Field, produced by a TokenStream. In the demo, on the main site,
this can be seen in the file called IndexFiles.java on line 56:
IndexWriter writer = new IndexWriter(INDEX_DIR, new
StandardAnalyzer(), true, Ind
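To make the idea of a Term concrete, here is a toy sketch in plain Java of how an analyzer-style pass turns a Field's text into the terms that get indexed. This is only an illustration of the concept, not Lucene's actual StandardAnalyzer or TokenStream API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy tokenizer: lower-cases and splits on non-alphanumeric characters,
// roughly what StandardAnalyzer does before each token is indexed as a
// Term for the field.
public class ToyTokenizer {
    public static List<String> terms(String fieldText) {
        List<String> terms = new ArrayList<>();
        for (String tok : fieldText.split("[^\\p{L}\\p{Nd}]+")) {
            if (!tok.isEmpty()) {
                terms.add(tok.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }
}
```

Each element of the returned list corresponds to one unit of tokenization; in Lucene, each such unit paired with its field name is what a Term represents.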
Hi,
I am new to Lucene and have been reading the documentation. I would like to use
Lucene to query a song database by lyrics. The query could potentially contain
typos, or even wrong words, word contractions (can't versus cannot), etc.
I would like to create an inverted list by word pairs and
By custom phrase query class I was trying to ask if it would be possible, or
even a good idea, to create a modified PhraseQuery class that is more
efficient than span queries (as I only want to use it for phrases). This
class might have multiple possible terms generated from a regex at a certain
po
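The typo-tolerance part of this question maps to Lucene's FuzzyQuery, which matches terms within a Levenshtein edit distance of the query term. Here is a small stand-alone sketch of that distance measure (plain dynamic programming, not Lucene's optimized implementation):

```java
// Classic Levenshtein distance: the minimum number of single-character
// insertions, deletions, and substitutions to turn one string into the
// other. FuzzyQuery uses this measure to match misspelled query terms
// against indexed terms.
public class EditDistance {
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, "cannot" and "cant" are two edits apart, so a fuzzy match with a large enough allowed distance would relate the contraction to the full word.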
Stephane, check out the last 2 links in http://www.simpy.com/group/363 , they
are for geospatial searching with Lucene.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Stephane Nicoll <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Se
Hello - opening a new IndexSearcher for every request is not the thing to do.
Reuse a single IndexSearcher instance. This must be in the FAQ. :)
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: techkatta <[EMAIL PROTECTED]>
> To: java-user@l
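The pattern Otis describes can be sketched as a lazily initialized shared instance. The SearcherStub class below is a placeholder for Lucene's IndexSearcher, which is expensive to open; the point is that it is created once and reused across requests:

```java
// Thread-safe lazy initialization of a single shared searcher.
// SearcherStub stands in for org.apache.lucene.search.IndexSearcher.
public class SharedSearcher {
    static class SearcherStub {
        // a real implementation would open the index here
    }

    private static volatile SearcherStub instance;

    public static SearcherStub get() {
        SearcherStub s = instance;
        if (s == null) {
            synchronized (SharedSearcher.class) {
                s = instance;
                if (s == null) {
                    s = new SearcherStub(); // opened exactly once
                    instance = s;
                }
            }
        }
        return s;
    }
}
```

Every request then calls SharedSearcher.get() instead of constructing a new searcher, which avoids the per-request open/close cost.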
We've implemented a custom sort class and use it to sort by distance. We
have implemented the equals and hashcode in the sort comparator. After
running for a few hours we're reaching peak memory usage and eventually the
server runs out of memory. We did some profiling and noticed that a large
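One common cause of this kind of leak is that cached per-comparator sort data is keyed on the comparator object itself, so equals and hashCode must be value-based or every request creates a fresh, never-reused cache entry. The sketch below is a hypothetical distance-sort key (plain Java, using a HashMap as a stand-in for the internal cache) showing why value equality matters:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key for a distance-based sort: two keys with the same
// origin point must compare equal, otherwise each request's key maps
// to a brand-new cache entry and old entries accumulate.
public class DistanceComparatorKey {
    final double lat, lon; // origin the distance sort is relative to

    DistanceComparatorKey(double lat, double lon) {
        this.lat = lat;
        this.lon = lon;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof DistanceComparatorKey)) return false;
        DistanceComparatorKey k = (DistanceComparatorKey) o;
        return lat == k.lat && lon == k.lon;
    }

    @Override public int hashCode() {
        return Objects.hash(lat, lon);
    }
}
```

With these overrides, a second request sorting by the same origin point finds the existing cache entry; without them, identity equality guarantees a miss and an ever-growing cache.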
We have the same situation and use an atomic counter. Basically, we have
a SearcherHolder class and a SearcherManager class. The SearcherHolder
holds the searcher and the number of threads referencing the searcher.
When the thread that writes to the index closes the index, it sends an
event
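The SearcherHolder idea above can be sketched with an atomic counter in plain Java. The real class would hold an IndexSearcher; here the close is represented by a flag, since the point is the reference-counting discipline, not the Lucene API:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Reference-counted holder: each searching thread acquires before use
// and releases after. The writer's close just drops the holder's own
// reference, so the underlying searcher is only really closed once the
// last in-flight search finishes.
public class SearcherHolder {
    private final AtomicInteger refCount = new AtomicInteger(1); // holder owns one ref
    private volatile boolean closed = false;

    public void acquire() {
        refCount.incrementAndGet();
    }

    public void release() {
        if (refCount.decrementAndGet() == 0) {
            closed = true; // the real IndexSearcher would be closed here
        }
    }

    public boolean isClosed() {
        return closed;
    }
}
```

A thread calls acquire() before searching and release() when done; the writer thread calls release() once when it swaps in a new searcher, which is the event the message above refers to.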
Check out www.browseengine.com, it is an open source meta engine on top of
lucene.
-John
On Feb 17, 2008 2:22 AM, Stephane Nicoll <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I've been browsing the archive and the documentation about Lucene. It
> really seems that it could help implementing my use case b
I am using Lucene in an EJB environment with Berkeley DB JE as a data
store, using the JCA on JBoss 4.2.0.
My question is: is using Lucene in an EJB environment advisable or not?
For every request I open the IndexSearcher object, and while
exiting from the EJB I close it. It's g
Good point Jan!
On Feb 18, 2008, at 9:00 AM, Jan Peter Stotz wrote:
Grant Ingersoll wrote:
Note: ENCODING is whatever encoding the file is in, as in "UTF-8",
if that is what your files are in.
I think there is a misunderstanding, the WordExtractor extracts text
from MS Word (.doc) files.
Grant Ingersoll wrote:
Note: ENCODING is whatever encoding the file is in, as in "UTF-8", if
that is what your files are in.
I think there is a misunderstanding, the WordExtractor extracts text
from MS Word (.doc) files. Those files are binary and therefore do not
have an encoding.
I wou
Not sure about WordExtractor, does it also take a Reader? I would try:
Reader input = new InputStreamReader(new FileInputStream(file),
"ENCODING");
WordExtractor extractor = new WordExtractor(input);
content = extractor.getText();
Note: ENCODING is whatever encoding the file is in, as in "UT
No problem about the misunderstanding.
I am using
InputStream input = new URL("file:///" + file.getAbsolutePath()).openStream();
WordExtractor extractor = new WordExtractor(input);
content = extractor.getText();
where WordExtractor is org.apache.poi.hwpf.extractor.WordExtractor.
The word
How are you loading the document into the content variable below? My
guess is still that you have different locales on Windows and Ubuntu.
(Btw, sorry about the java-user comment. I should wake up before
sending responses. For some reason I thought the email was sent to
java-dev)
-Gran
Actually, what I figured out just now is that the problem is in the indexing
part. A 15 MB document produces a 23 MB index, which is not normal, since on
Windows the index for the same document is 3 MB. For the
indexing I use:
writer = new IndexWriter(index, new GreekAnalyzer(), !in
This question is best asked on java-user. However, my guess is that
it is related to your Locale and that you need to set the character
encoding to Greek on Ubuntu when reading in your files.
Something like: Reader reader = new InputStreamReader(new
FileInputStream(file), "GREEK Char Enco
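Assuming the files use a legacy single-byte Greek charset, ISO-8859-7 is the usual candidate (windows-1253 is another); the original message truncates before naming one, so the choice below is an assumption, not what Grant wrote. A minimal sketch of reading with an explicit Greek encoding:

```java
import java.nio.charset.Charset;

// Decode bytes using ISO-8859-7 (an assumed Greek charset) instead of
// the platform default. Without this, Ubuntu's default locale mangles
// the Greek characters before the GreekAnalyzer ever sees them.
public class GreekReaderDemo {
    public static String decode(byte[] fileBytes) {
        return new String(fileBytes, Charset.forName("ISO-8859-7"));
    }
}
```

The same charset name would be passed to InputStreamReader when feeding file contents to the IndexWriter.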
Hello!
I've written a sample application which indexes documents written in Greek
using the GreekAnalyzer and searches these documents with both Greek and
English words. Though on Windows the searching returns correct results, if I
try it on Ubuntu the searching does not return any results for any g