We'll see, the blind men said.
Otis
- Original Message
From: Chris Hostetter <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Saturday, January 27, 2007 6:28:16 AM
Subject: Re: Is the new version of the Lucene book available in any form?
: LIA2 will happen, but Lucene is under
A single index with an id field sounds like a fine approach here.
Otis
- Original Message
From: Joost Schouten <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Saturday, January 27, 2007 6:40:51 AM
Subject: lucene index/document architecture
Hi,
I'm setting up lucene to work
Hi,
I was trying to store document ids externally.
I have found that Lucene generates document ids linearly, starting
from 0, and they are not changed until a document is deleted.
But it did work for me.
Is the above correct? If not, how could I store document ids
externally?
To steal a phrase from Mr. Hatcher... it depends. I'd try keeping it all
in one index at the start, until you get some idea how big the index will
eventually grow and whether your search performance is acceptable. Do you have any
idea how big the raw data you're going to ask the index to hold is? 1M? 1G?,
Hi,
I'm setting up lucene to work with our webapp to index a database. My db
holds files which can belong to a user or a company or both. I want the
option for my users to search across all content, but also search within the
files for one user or company. What is the best architecture approach fo
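Otis's "single index with an id field" suggestion can be sketched without any Lucene dependency. This is only an illustration of the idea; the field names (`userId`, `companyId`) and the class below are assumptions, not something from the thread:

```java
// Illustrative sketch: one index where every document stores its owner ids,
// so scoping a search is just an extra required clause on the query string.
// All names here are assumptions, not from the original mail.
class OwnerScopedQuery {
    // ownerField is e.g. "userId" or "companyId"; a null ownerId means
    // "search across all content".
    static String build(String userQuery, String ownerField, String ownerId) {
        if (ownerId == null) {
            return userQuery;
        }
        return "+(" + userQuery + ") +" + ownerField + ":" + ownerId;
    }
}
```

The resulting string can be fed to Lucene's QueryParser; the `+` prefix marks both the text match and the owner restriction as required clauses.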
regular Lucene BooleanQueries should work fine for this ... but you may
want to customize your Similarity so that the idf and lengthNorms aren't a
factor .. you may want to take the tf out of the picture too (if you care
more about matching lots of terms and less about matching one term lots of
ti
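In Lucene you would get this behaviour by subclassing DefaultSimilarity and returning 1.0f from tf(), idf() and lengthNorm(). As a library-free sketch of the *resulting* scoring (the class and names below are illustrative assumptions, not the Lucene API):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// With tf, idf and length norms neutralized, a document's score reduces to
// how many distinct query terms it contains -- favouring "matching lots of
// terms" over "matching one term lots of times", as Hoss describes.
// This is a toy model, not Lucene's actual scorer.
class CoordOnlyScorer {
    static int score(Set<String> queryTerms, List<String> docTokens) {
        Set<String> matched = new HashSet<String>(docTokens);
        matched.retainAll(queryTerms);   // distinct terms present in both
        return matched.size();
    }
}
```

Note how a document repeating one query term many times scores no higher than a document containing it once.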
: LIA2 will happen, but Lucene is undergoing a lot of changes, so Erik and
: I are going to wait a little more for development to calm down
: (utopia?).
you're waiting for Lucene development to calm down? ... that could be a
long wait.
-Hoss
---
Thanks for that Erick, it was a great help in clearing up how the mechanism
works. I have it working now, here is the changed bits method (I would
appreciate any advice you/anyone might have particularly around efficiency -
thanks again):
public BitSet bits(IndexReader reader) throws IOExcept
This is a deficiency in the highlighter functionality that has been
discussed several times before. The summary is - not a trivial fix.
See here for background:
http://marc2.theaimsgroup.com/?l=lucene-user&m=114631181214303&w=1
http://www.gossamer-threads.com/lists/engine?do=post_view_printa
I am successfully using lucene in our application to index 12 different
types of objects located in a database, and their relationships to each
other to provide some nice search functionality for our website. We are
building lots of lucene queries programmatically to filter based upon
categori
Hi,
I'm wondering what the best way is to do highlighting of multiword phrases.
For example, if a search is for "president kennedy", how can I make sure
that "president" is only highlighted if it is next to "kennedy" and
"president" in "president clinton" is not.
I haven't figured out where in the
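As the reply above notes, the Lucene Highlighter does not handle phrases cleanly. A naive string-level workaround (purely an illustration, not the Highlighter API) is to match the whole phrase rather than its individual words:

```java
import java.util.regex.Pattern;

// Naive illustration: wrap the words only where they occur as the adjacent
// phrase, so "president" inside "president clinton" is left untouched.
// This ignores analysis/tokenization entirely and is not the Lucene
// Highlighter; it only shows the phrase-level matching idea.
class PhraseHighlight {
    static String highlight(String text, String phrase) {
        return Pattern.compile(Pattern.quote(phrase), Pattern.CASE_INSENSITIVE)
                      .matcher(text)
                      .replaceAll("<b>$0</b>");   // $0 = the matched phrase
    }
}
```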
Funny, I was looking to do the same thing the other day and gave up thinking it
wasn't possible, not being aware of setOmitNorms(). Yeah, a javadoc patch
would be welcome.
Otis
- Original Message
From: Nadav Har'El <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, Jan
It really all depends... right Erik?
On the hardware you are using, complexity of queries, query concurrency, query
latency you are willing to live with, the size of the index, etc. A few
million sounds small even for average/cheap hw. I have several multi-million
document indices that are con
Hi,
I believe CLucene (C++, not C) is getting a lot of exercise, but you should
really ask about production usage on its list.
LIA2 will happen, but Lucene is undergoing a lot of changes, so Erik and I are
going to wait a little more for development to calm down (utopia?).
Otis
- Original
: I used String because the timestamp is a Long and there wasn't any
: SortField.LONG (I guess I should have used SortField.CUSTOM). In this
: case, what should the indexing call look like? Currently, I have:
: doc.add(new
Field("timestamp",Long.toString(timestamp),Field.Store.NO,Field.Index
I think you're only setting one bit in your filter.
Your docs array is only one cell long, and your termDocs.read reads up to
the length of docs (exactly one in this case) entries. So, you're getting
only one doc ID. And setting it. Even if you made your array larger, you
would only set one bec
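The fix Erick describes is to size the buffer above one and keep calling read() until it returns 0. The sketch below uses a stand-in for Lucene's TermDocs (same `read(int[], int[])` contract returning a count; the FakeTermDocs class and sizes are assumptions for illustration):

```java
import java.util.BitSet;

// Stand-in for Lucene's TermDocs: fills the docs/freqs buffers and returns
// how many entries were written, 0 when exhausted.
class FakeTermDocs {
    private final int[] ids;
    private int pos = 0;
    FakeTermDocs(int[] ids) { this.ids = ids; }
    int read(int[] docs, int[] freqs) {
        int n = Math.min(docs.length, ids.length - pos);
        for (int i = 0; i < n; i++) { docs[i] = ids[pos + i]; freqs[i] = 1; }
        pos += n;
        return n;
    }
}

class FilterBits {
    static BitSet bits(FakeTermDocs termDocs, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        int[] docs = new int[32];    // a real buffer, not new int[1]
        int[] freqs = new int[32];
        int count;
        // Loop until read() reports no more entries; with a one-cell array
        // and no loop you would set exactly one bit, as described above.
        while ((count = termDocs.read(docs, freqs)) != 0) {
            for (int i = 0; i < count; i++) {
                bits.set(docs[i]);
            }
        }
        return bits;
    }
}
```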
The current LIA book, while written to the 1.4 code base, is a very good
place to start. There will be some incompatibilities with the 2.0 codebase,
but they're relatively minor.
I guess I'm really recommending that you go ahead and spend the bucks on the
current version, it'll be money well spent
Thanks for the insight Chris. You are right-- I was trying to avoid the
FieldCache hit. Because the index is updated frequently, we have to keep
discarding our IndexSearcher.
I used String because the timestamp is a Long and there wasn't any
SortField.LONG (I guess I should have used SortField.
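One common workaround for sorting a numeric timestamp as a String (an assumption on my part, not something stated in the thread) is to zero-pad the value at index time so lexicographic order equals numeric order:

```java
// Zero-pad a non-negative long to a fixed width so that String comparison
// orders values numerically (Long.MAX_VALUE has 19 digits). Negative values
// would break this scheme and need an offset; illustration only.
class TimestampField {
    static String pad(long timestamp) {
        return String.format("%019d", timestamp);
    }
}
```

Without the padding, "5" sorts after "40" lexicographically; with it, the String sort matches the numeric order.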
Hi,
I am going mad trying to find out what I am doing wrong with my custom filter
implementation (almost an exact copy of SpecialsFilter from LIA). I have put
together a quick sample to illustrate my problem, if some kind soul has 2
minutes to take a quick look and tell me where I am being so s
I notice that the Lucene book offered by Amazon was published in
2004. I saw some mail on the subject of a new edition.
Is the new edition available in any form?
I promise to buy the new edition as soon as it comes out even if I
get some of the material early. I wrote a book which was publ
Grant,
Is that on a single machine? If so, what kind of hardware specs does the
machine have? I guess you're using a 64-bit JVM?
A slightly unrelated question: if a query matches all the documents in the
index, does that cause the entire index to get loaded into RAM ?
- Original Message
in my application the JVM throws [java.lang.OutOfMemoryError: Java heap
space] when too many Java classes have been loaded and/or when I use some
bytecode manipulation libraries ... (hibernate, asm, cglib for example) -
the JVM has no more memory for compiled bytecode.
On Fri, 26 Jan 2007 19:46:06
oh thanks then:)
Mikhail Pustovalov <[EMAIL PROTECTED]> wrote: in your java
command line, of course :)
Example : java -Xms128m -Xmx1024m -server -Djava.awt.headless=true
-XX:MaxPermSize=128m protei.Starter
On Fri, 26 Jan 2007 19:39:13 +0300, maureen tanuwidjaja
wrote:
>
in your java command line, of course :)
Example : java -Xms128m -Xmx1024m -server -Djava.awt.headless=true
-XX:MaxPermSize=128m protei.Starter
On Fri, 26 Jan 2007 19:39:13 +0300, maureen tanuwidjaja
<[EMAIL PROTECTED]> wrote:
E...where shall I put that "-XX:MaxPermSize=128m"?
Th
E...where shall I put that "-XX:MaxPermSize=128m"?
Thanks Pustovalov
Regards,
Maureen
Mikhail Pustovalov <[EMAIL PROTECTED]> wrote: try this :
-XX:MaxPermSize=128m
On Fri, 26 Jan 2007 19:32:45 +0300, maureen tanuwidjaja
wrote:
> Hi Mike and Eric
try this : -XX:MaxPermSize=128m
On Fri, 26 Jan 2007 19:32:45 +0300, maureen tanuwidjaja
<[EMAIL PROTECTED]> wrote:
Hi Mike and Erick and all,
I have fixed my code and yes, indexing is much faster than previously
when I do such "hammering" with IndexWriter.
However, I am now encountering th
Hi Mike and Erick and all,
I have fixed my code and yes, indexing is much faster than previously when I
do such "hammering" with IndexWriter.
However, I am now encountering the error while indexing:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
This error n
I just indexed a collection w/ 15+ million docs in one index. Index
size is roughly 42 gb.
On Jan 26, 2007, at 12:45 AM, Bill Taylor wrote:
I have used Lucene to index a small collection - only a few hundred
documents. I have a potential client who wants to index a
collection which will
I don't believe there is any b-tree strategy in Lucene. I would say
that it is segment based, I guess, in that it indexes documents in
memory based on your merge factors and then flushes to disk; at the
end you can choose to merge the segments together via optimize(). I
find it to have a
I went through that document. It mentions that Lucene's indexing
algorithm is incremental. So, can I say that it uses a
combination of segment-based and B-tree-based strategies? If I am wrong,
please correct me.
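The segment behaviour described above (buffer documents in memory, flush a segment per merge-factor batch, merge everything with optimize()) can be modelled with a toy sketch. This is a deliberate simplification for illustration, not Lucene's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of segment-based indexing: docs accumulate in RAM, every
// `mergeFactor` docs are flushed as a new "segment", and optimize() merges
// all segments (plus any buffered docs) into one. Names are assumptions.
class ToyIndex {
    final int mergeFactor;
    final List<String> ram = new ArrayList<String>();
    final List<List<String>> segments = new ArrayList<List<String>>();

    ToyIndex(int mergeFactor) { this.mergeFactor = mergeFactor; }

    void addDocument(String doc) {
        ram.add(doc);
        if (ram.size() == mergeFactor) {         // flush buffered docs
            segments.add(new ArrayList<String>(ram));
            ram.clear();
        }
    }

    void optimize() {                             // merge everything into one segment
        List<String> merged = new ArrayList<String>();
        for (List<String> s : segments) {
            merged.addAll(s);
        }
        merged.addAll(ram);
        ram.clear();
        segments.clear();
        segments.add(merged);
    }
}
```

Nothing here resembles a B-tree: search cost comes from scanning each segment's sorted term data, which is why merging segments (optimize) helps query speed.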
On 1/26/07, Damien McCarthy <[EMAIL PROTECTED]> wrote:
This
I'm aware of a single index with the following characteristics:
Single index size = 33.2GB
Documents: 263 million
Searchable fields = 7
Query Response times: <1 second for a single term search
Anything from 5-20 seconds for more complex searches (e.g. fuzzy matching on
multiple fields)
This is
This document should contain the information you need :
http://lucene.sourceforge.net/talks/inktomi/
Damien.
-Original Message-
From: Sairaj Sunil [mailto:[EMAIL PROTECTED]
Sent: 26 January 2007 03:22
To: java-user@lucene.apache.org
Subject: Re: Lucene Indexing
Hi
I was asking what exac
Bill Taylor wrote:
I have used Lucene to index a small collection - only a few hundred
documents. I have a potential client who wants to index a collection
which will start at about a million documents and could easily grow to
two million.
Has anyone used Lucene with an index that large?
I
26 jan 2007 kl. 06.45 skrev Bill Taylor:
I have used Lucene to index a small collection - only a few hundred
documents. I have a potential client who wants to index a
collection which will start at about a million documents and could
easily grow to two million.
The maximum number of d