Hi all, I need advice on which version of the Apache Lucene search engine you recommend, i.e. which release is stable and recommended for production or mission-critical systems.
Thanks in advance.
Regards,
Sam
Hi,
Looking at your problem I can think of one solution for small and
*midsize* result sets. (And I have to say it may be similar to what
Aleksander proposes).
Write a workaround query of the following form:
select addfield from (
  select addfield, 1 as generated_counter from table where id = 2
  union all
  select addfield, 2 as generated_counter from table where id = 3
  -- ...one branch per Lucene hit, numbered in relevance order
) hits order by generated_counter
Dear List,
I am using Lucene to count the number of hits of queries in documents (i.e. taking raw frequencies as scores), which seems to work fairly well using a modified Similarity, returning freq for tf and 1.0 for everything else, and a HitCollector to collect the hits.
I also want to allow 'pref
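A minimal sketch of such a Similarity (assuming the Lucene 2.0-era API; the class name is made up):

import org.apache.lucene.search.DefaultSimilarity;

public class RawFreqSimilarity extends DefaultSimilarity {
    // score a term by its raw frequency...
    public float tf(float freq) { return freq; }
    // ...and neutralize every other scoring factor
    public float idf(int docFreq, int numDocs) { return 1.0f; }
    public float coord(int overlap, int maxOverlap) { return 1.0f; }
    public float queryNorm(float sumOfSquaredWeights) { return 1.0f; }
    public float lengthNorm(String fieldName, int numTokens) { return 1.0f; }
}

It would be installed with searcher.setSimilarity(new RawFreqSimilarity()) before searching.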
On 6/30/06, Dominik Bruhn <[EMAIL PROTECTED]> wrote:
SELECT id,addfield FROM table WHERE id IN ([LUCENERESULT]);
Where LUCENERESULT is like 2,3,19,3,5.
This works fine but has one problem: the search result from Lucene is ordered by relevance, and so the id list is also sorted by relevance. But the database does not preserve that order when it resolves the IN clause, so the relevance ordering is lost.
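For what it's worth, if the database happens to be MySQL, ORDER BY FIELD(id, 2, 3, 19, 3, 5) sorts rows by the position of id in the given list, which preserves the Lucene ordering without the union workaround described above.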
Hello, I'm a novice in Java.
I'm trying to understand how the query terms are matched against the index terms to calculate the Hits.
I thought the class "IndexSearcher" was responsible for this process, but apparently the classes "Scorer" and "HitCollector" are essential to determine the retrieved documents.
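Roughly how the pieces fit together (a sketch against the Lucene 2.0-era API, not the exact internals): IndexSearcher builds a Weight from the Query, the Weight supplies a Scorer that steps through matching documents, and the Scorer hands each match to a HitCollector. A custom collector makes the flow visible ('searcher' and 'query' are assumed to exist):

searcher.search(query, new HitCollector() {
    public void collect(int doc, float score) {
        // called once for every matching document, in doc id order
        System.out.println("doc=" + doc + " score=" + score);
    }
});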
Well, *assuming* that you're working in Java, you can't predict very much
about when the garbage collector actually goes about freeing memory.
Depending on how memory is measured, you may or may not be getting an
accurate count.
I wonder what would happen if you allowed the JVM only a *little* more memory.
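For example, one rough way to snapshot heap usage (System.gc() is only a hint, so the figure is still approximate):

Runtime rt = Runtime.getRuntime();
System.gc();                                    // a request, not a guarantee
long used = rt.totalMemory() - rt.freeMemory(); // approximate heap in use
System.out.println("approx heap used: " + used + " bytes");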
Hi everyone,
I am working on a project with around 35,000 documents (8 text fields with at most 256 chars each) in Lucene. Unfortunately the index is updated constantly, and I need the new items to appear in my search results as fast as possible.
I have an IndexSearcher,
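A common pattern for getting fresh results without reopening on every query (a sketch against the Lucene 2.0-era API; indexDir and lastVersion are assumed variables) is to reopen the searcher only when the index version has changed:

long version = IndexReader.getCurrentVersion(indexDir);
if (version != lastVersion) {             // the index changed; reopen
    searcher.close();
    searcher = new IndexSearcher(indexDir);
    lastVersion = version;
}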
Hi,
Thanks, Erick, for the answer.
The problem is that I am using Lucene through the Hibernate support, to transparently map Java domain entities to a file-system Lucene index (no support for a RAM index at the moment, as far as I saw).
So some of my unit tests (which collaborate at some level with
Don't know if this helps or hurts, but my approach for unit tests was to implement an index in a RAMDirectory for each test, index enough documents that I could strictly control, and just do searches.
True, the weakness was that the data sets are very small, and this is more of a "black box" kind of test.
Hi,
I just wanted to share my issues with unit testing a component that collaborates with a Hits object.
The scenario is: I have a web page pagination component (say, it shows N
results per page) over the Hits results found in the Lucene index.
I want to test the pagination itself, so I would like
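For reference, the pagination itself is just index arithmetic over the lazy Hits object (a sketch; page is zero-based, pageSize is the N results per page, and hits comes from a prior search):

int start = page * pageSize;
int end = Math.min(start + pageSize, hits.length());
for (int i = start; i < end; i++) {
    Document doc = hits.doc(i);  // Hits fetches and caches documents lazily
    // render one result row from doc...
}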
Hehe, that works.. it's now racing through 10,000 docs in a couple of seconds :)
Hi all!
I've used the classes "org.apache.poi.hslf.extractor.PowerPointExtractor"
and "org.apache.poi.hslf.extractor.QuickButCruddyTextExtractor" with
lucene2.0 to extract text but when I try to use the other classes such as
"org.apache.poi.hslf.HSLFSlideShow", "org.apache.poi.hslf.record.Record"
I select it in parts, in chunks of 5000 records, using the LIMIT keyword. The thing is, it starts very fast but then slows down, so I doubt it has to do with tokenizing.
Thanks, it is working fine now.
Ah, didn't see that. Yeah, you should have something like:
new IndexWriter...
for each document: writer.addDocument(...)
writer.optimize()
writer.close()
Batching it up will make it faster, yes.
On Mon, 03 Jul 2006 11:43:03 +0200, Volodymyr Bychkoviak
<[EMAIL PROTECTED]> wrote:
Problem is hidden in these lines:
> writer.optimize();
> writer.close();
You should keep one IndexWriter open for all document additions and close it only after adding the last document.
Optimize() merges all index segments into a single segment, and as the index grows it takes longer and longer.
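Putting both replies together, the intended pattern looks roughly like this (Lucene 2.0-era API; indexDir and the document iterator are assumed):

IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
while (docs.hasNext()) {                    // assumed source of Documents
    writer.addDocument((Document) docs.next());
}
writer.optimize();  // merge segments once, after the last add
writer.close();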
My guess is that if you actually do a complete select * from your db and manage all the objects at once, this will be a problem for your JVM; maybe running out of memory is the problem you encounter. Strings tend to be a bit of a memory issue in Java :(
My suggestion is that you do pagination, as sketched below.
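A paging sketch over JDBC (assuming MySQL-style LIMIT/OFFSET; the table, the columns, and the addToIndex helper are made up):

int pageSize = 5000;
for (int offset = 0; ; offset += pageSize) {
    ResultSet rs = stmt.executeQuery(
        "SELECT id, body FROM docs LIMIT " + pageSize + " OFFSET " + offset);
    int rows = 0;
    while (rs.next()) {
        addToIndex(rs.getString("id"), rs.getString("body")); // hypothetical helper
        rows++;
    }
    rs.close();
    if (rows < pageSize) break;  // last chunk reached
}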
When I start the program it's fast, about 10 docs per second, but after about 15,000 it slows down very much. Now it does 1 doc per second, and it is at nr. 40,000 after a whole night of indexing. These are VERY small docs with very little information. This is what and how I index it:
Document d