Re: java.lang.OutOfMemoryError: Java heap space when sorting the fields

2008-03-19 Thread sandyg
Hi, is it not possible to sort only the results we get (e.g. from Hits hits = searcher.search(query);) instead of all the values? It would be good to get sorting on the hits, i.e. on the results. Thanks for the reply. markrmiller wrote: > > To sort on 13mil docs will take like at least 400 mb fo

RE: Contrib Highlighter and Phrase search

2008-03-19 Thread Itamar Syn-Hershko
I'm not sure how the current Highlighter works - haven't had the time to look into it yet - but I thought about the following implementation. Judging by your question, this works in a slightly different way than the current Highlighter: 1. Build a Radix tree (PATRICIA) and populate it with all se
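The first two steps Itamar outlines (load every term and phrase into a tree, then walk the text ignoring spaces and punctuation) can be sketched in plain Java. This is an illustration under stated assumptions, not his code: it uses an uncompressed character trie rather than a true PATRICIA/radix tree, lowercases everything, only starts and ends matches on word boundaries, and the class name `TermTrie` is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: find search terms/phrases in text, ignoring spaces and punctuation. */
class TermTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a complete term or phrase ends at this node
    }

    private final Node root = new Node();

    /** Step 1: insert a term; a phrase is stored as one string with its spaces dropped. */
    void add(String termOrPhrase) {
        Node node = root;
        for (char c : termOrPhrase.toLowerCase().toCharArray()) {
            if (!Character.isLetterOrDigit(c)) continue; // drop spaces and punctuation
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.terminal = true;
    }

    /** Step 2: scan the text and return [start, end) spans of the longest matches. */
    List<int[]> findMatches(String text) {
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            boolean wordStart = Character.isLetterOrDigit(text.charAt(i))
                    && (i == 0 || !Character.isLetterOrDigit(text.charAt(i - 1)));
            if (!wordStart) { i++; continue; }
            Node node = root;
            int matchEnd = -1;
            for (int j = i; j < text.length(); j++) {
                char c = Character.toLowerCase(text.charAt(j));
                if (!Character.isLetterOrDigit(c)) continue; // skip separators inside a phrase
                node = node.children.get(c);
                if (node == null) break;
                boolean wordEnd = j + 1 == text.length()
                        || !Character.isLetterOrDigit(text.charAt(j + 1));
                if (node.terminal && wordEnd) matchEnd = j + 1; // remember longest match so far
            }
            if (matchEnd > 0) { spans.add(new int[] {i, matchEnd}); i = matchEnd; }
            else { i++; }
        }
        return spans;
    }
}
```

A side effect of "ignoring spaces" is that the phrase `new york` also matches `New-York` or `newyork` as a whole word; whether that is desirable depends on the application.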

Relevance

2008-03-19 Thread luceneuser
Hi All, I need help on retrieving results based on relevance + freshness. As of now, I get results based on only one of the fields, either relevance or freshness. How can I achieve this? Lucene retrieves results on relevance but also fetches old results too. I need more relevant results with freshne

Re: Relevance

2008-03-19 Thread Mathieu Lecarme
luceneuser wrote: Hi All, I need help on retrieving results based on relevance + freshness. As of now, I get results based on only one of the fields, either relevance or freshness. How can I achieve this? Lucene retrieves results on relevance but also fetches old results too. I need more relevan

Re: Relevance

2008-03-19 Thread Grant Ingersoll
Have a look at the FunctionQuery capabilities in Lucene in org.apache.lucene.search.function You can use this to have field values factor into the score. -Grant On Mar 19, 2008, at 3:43 AM, luceneuser wrote: Hi All, I need help on retrieving results based on relevance + freshness. As o
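Grant's suggestion is to let a field value (here, the document's age) factor into the score. The arithmetic behind that can be sketched independently of the `org.apache.lucene.search.function` classes: multiply the text-relevance score by a decay factor that shrinks as the document ages. The reciprocal decay `1 / (1 + ageDays / scaleDays)`, the 30-day scale in the usage, and the `FreshnessRanker` name are assumptions for illustration, not values or APIs from the thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: rank hits by relevance multiplied by a freshness factor. */
class FreshnessRanker {
    record Hit(String id, double relevance, double ageDays) {}

    /** Reciprocal decay: 1.0 for a brand-new document, 0.5 once it is scaleDays old. */
    static double freshness(double ageDays, double scaleDays) {
        return 1.0 / (1.0 + ageDays / scaleDays);
    }

    static double combined(Hit h, double scaleDays) {
        return h.relevance() * freshness(h.ageDays(), scaleDays);
    }

    /** Returns a copy of the hits, highest combined score first. */
    static List<Hit> rank(List<Hit> hits, double scaleDays) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> combined(h, scaleDays)).reversed());
        return sorted;
    }
}
```

With a 30-day scale, a very relevant but year-old document (relevance 1.0, about 0.09 combined) drops below a fresh, moderately relevant one (relevance 0.6, 0.6 combined). The scale constant is the knob that trades relevance against freshness.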

IndexReader getFieldNames()

2008-03-19 Thread varun sood
Hi All, Can someone please guide me on how to use IndexReader's getFieldNames() method properly? I want to get all the field names in the index. Currently I am getting them via the Document object, but that's not what I want. I am implementing the code below and what I get is a very long string of character

Re: IndexReader getFieldNames()

2008-03-19 Thread Shai Erera
Can you give an example of the output? What does out.print() do? Does it print spaces between records on new-lines? On Wed, Mar 19, 2008 at 3:17 PM, varun sood <[EMAIL PROTECTED]> wrote: > Hi All, > Can someone please guide me on how to use IndexReader's > getFieldNames() method properly? > I wa

Multi process writer access to an index

2008-03-19 Thread Eran Sevi
Hi, I'm trying to write to a specific index from several different processes and encounter problems with locked files (deletable for example). I don't perform any specific locking because as I understand it there should be a file-specific locking mechanism used by the Lucene API. This doesn't seem to

Re: Multi process writer access to an index

2008-03-19 Thread Erick Erickson
You'll get more meaningful answers if you provide some details: Things that come to mind: op system (windows? *nix?) file system (NFS? local? NTFS?) An example of the error you receive (a stack trace would be good). The code you're executing when you get the error. Imagine you're trying to adv

Multi process writer access to an index

2008-03-19 Thread Eran Sevi
Hi, I'm trying to write to a specific index from several different processes and encounter problems with locked files (deletable for example). I don't perform any specific locking because as I understand it there should be a file-specific locking mechanism used by the Lucene API. This doesn't seem to be

Re: Multi process writer access to an index

2008-03-19 Thread Michael McCandless
Are you using multiple computers? Probably what's happening is: because older versions of Lucene store the lock file in the /tmp directory by default, multiple computers sharing an index will be able to open multiple writers because they have their own /tmp directories. They don't see eac
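The key point in Michael's answer is where the lock file lives: if each machine keeps it in its own `/tmp`, writers on different machines never see each other's lock. A stdlib-only sketch of the alternative, an exclusive lock file kept inside the shared index directory itself, follows. It is an illustration, not Lucene's locking code; the `IndexDirLock` class and the `write.lock` file name are assumptions, and OS-level file locks are not guaranteed to work on every network file system.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: an exclusive writer lock stored in the shared index directory. */
class IndexDirLock implements AutoCloseable {
    // Lock files held by this JVM. Checked before opening a second channel, because on some
    // platforms closing any channel on a file drops OS locks held through other channels.
    private static final Set<Path> HELD = ConcurrentHashMap.newKeySet();

    private final Path lockFile;
    private final FileChannel channel;
    private final FileLock lock;

    private IndexDirLock(Path lockFile, FileChannel channel, FileLock lock) {
        this.lockFile = lockFile;
        this.channel = channel;
        this.lock = lock;
    }

    /** Returns a held lock, or null if another writer (in this JVM or another process) has it. */
    static IndexDirLock tryAcquire(Path indexDir) throws IOException {
        Files.createDirectories(indexDir);
        // The lock sits next to the index files, so every process that sees the index sees it.
        Path lockFile = indexDir.toAbsolutePath().normalize().resolve("write.lock");
        if (!HELD.add(lockFile)) return null; // another writer in this JVM
        FileChannel ch = null;
        try {
            ch = FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock l = ch.tryLock();
            if (l != null) return new IndexDirLock(lockFile, ch, l);
            ch.close(); // held by another process
        } catch (IOException | RuntimeException e) {
            if (ch != null) ch.close();
            HELD.remove(lockFile);
            throw e;
        }
        HELD.remove(lockFile);
        return null;
    }

    @Override
    public void close() throws IOException {
        try {
            lock.release();
            channel.close();
        } finally {
            HELD.remove(lockFile);
        }
    }
}
```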

Re: Multi process writer access to an index

2008-03-19 Thread Eran Sevi
Sorry for any duplicate posts. Actually I'm using the latest "final" Lucene.Net and I hope this problem is not unique to this version. The OS is windows, FS - NTFS. Here's an example of what I do in each process (which may reside on a different computer): writer = new IndexWriter(

Lucene on a cluster environment

2008-03-19 Thread Vinicius Carvalho
Hello there! I have just started with Lucene. I bought the Lucene in Action book [right now I'm at chapter 4, plus the 10th chapter, a great explanation by Terence from jGuru, really nice stuff], and I'm also reading as much as I can on the wiki :) Still a bit lost with some stuff, mostly with clusters :) Our

Re: Lucene on a cluster environment

2008-03-19 Thread Robert . Hastings
We went through this a couple of years ago. I couldn't find the thread in the archive but the gist of it is as follows: 1. We have a singleton thread that does all of the writing. New Documents and deletions are queued to the writer via a database table. 2. Since searchers are "point in time
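Point 1 of Robert's setup (one writer thread, with additions and deletions queued to it) can be sketched with the JDK's concurrency classes. Assumptions, stated plainly: an in-memory `BlockingQueue` stands in for his database table, a `HashMap` stands in for the index, and `SingleWriter` is a hypothetical name. The point of the pattern is that only one thread ever mutates the index, so writers never contend for the index lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: all index changes are funneled through one writer thread. */
class SingleWriter {
    /** A queued change: add/replace a document, or delete it when text is null. */
    record Op(String docId, String text) {}

    private static final Op STOP = new Op(null, null); // sentinel, compared by identity

    private final BlockingQueue<Op> queue = new LinkedBlockingQueue<>();
    // Stand-in for the real index; only the writer thread touches it until shutdown().
    private final Map<String, String> index = new HashMap<>();
    private final Thread writer = new Thread(this::drain, "index-writer");

    SingleWriter() { writer.start(); }

    void add(String docId, String text) { queue.add(new Op(docId, text)); }

    void delete(String docId) { queue.add(new Op(docId, null)); }

    private void drain() {
        try {
            while (true) {
                Op op = queue.take(); // blocks until a change is queued
                if (op == STOP) return;
                if (op.text() == null) index.remove(op.docId());
                else index.put(op.docId(), op.text());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Applies everything queued so far, stops the writer, and returns a snapshot. */
    Map<String, String> shutdown() throws InterruptedException {
        queue.add(STOP);
        writer.join(); // join() makes the writer's changes visible to this thread
        return Map.copyOf(index);
    }
}
```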

Fwd: Lucene on a cluster environment

2008-03-19 Thread Vinicius Carvalho
Thanks a lot for sharing this :) I'll try to follow your guidelines Regards -- Forwarded message -- From: <[EMAIL PROTECTED]> Date: Wed, Mar 19, 2008 at 12:16 PM Subject: Re: Lucene on a cluster environment To: java-user@lucene.apache.org We went through this a couple of years a

Adding new documents to index

2008-03-19 Thread Vinicius Carvalho
Hello there! Since I've just begun with Lucene, some concepts are kinda new to me :). One of them is the whole indexing process. Well, AFAIK, indexing should happen in a batch process, right, to maximize the time spent on this operation. One issue though is that our client wants "instant search res

Re: IndexReader getFieldNames()

2008-03-19 Thread varun sood
Hi Shai, The code I pasted is not working, sorry about that. The code which is working is: Collection c = ir.getFieldNames(IndexReader.FieldOption.ALL); int i = 0; while (c.iterator().hasNext()) { out.print(c.iterator().next()); i++; } This hangs my machine for minutes on

Re: IndexReader getFieldNames()

2008-03-19 Thread Constantin Radchenko
Try this: for (Iterator iter = reader.getFieldNames(FieldOption.ALL).iterator(); iter.hasNext();) { String fieldName = (String)iter.next(); } Your code creates a new iterator each time you call next(). Also, if your out.print() method takes a String as its parameter, casting is redundan
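The difference between the two loops can be seen with a plain `java.util.Collection` standing in for the value returned by `getFieldNames(...)`, so the sketch runs without Lucene. Calling `c.iterator()` inside the loop builds a fresh iterator on every pass, so `hasNext()` stays true for a non-empty collection and `next()` always returns the first element. The class name `FieldNameLoop` and the `maxSteps` cap (added only so the buggy version terminates) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

/** Sketch: iterating a raw Collection such as the one returned by getFieldNames(). */
class FieldNameLoop {
    /** Correct: one iterator, created once and advanced by next(). */
    static List<String> listAll(Collection<?> fieldNames) {
        List<String> out = new ArrayList<>();
        for (Iterator<?> iter = fieldNames.iterator(); iter.hasNext();) {
            out.add((String) iter.next());
        }
        return out;
    }

    /** The original bug, capped at maxSteps so it terminates: a new iterator on every pass. */
    static List<String> buggy(Collection<?> fieldNames, int maxSteps) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (fieldNames.iterator().hasNext() && i < maxSteps) {
            out.add((String) fieldNames.iterator().next()); // always the first element
            i++;
        }
        return out;
    }
}
```

On Java 5 and later, an enhanced `for (Object name : fieldNames)` loop avoids the mistake entirely, because it creates the iterator exactly once.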

RE: Lucene on a cluster environment

2008-03-19 Thread Dragon Fly
Hi Robert, Did you run into any performance issues (because multiple searchers accessed a single index on a shared directory)? Also, did you employ some redundancy scheme to ensure that the shared directory is always "available"? Thank you. > To: java-user@lucene.apache.org > Subject: Re: Lucen

Re: IndexReader getFieldNames()

2008-03-19 Thread mark harwood
You are asking for a new iterator each time around the loop - you'll just be printing the first field forever. - Original Message From: varun sood <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Wednesday, 19 March, 2008 4:26:39 PM Subject: Re: IndexReader getFieldNames()

Re: IndexReader getFieldNames()

2008-03-19 Thread varun sood
Hi Constantin and others, Thanks very much for the reply. The code fragment works. thanks Varun On Wed, Mar 19, 2008 at 12:42 PM, Constantin Radchenko <[EMAIL PROTECTED]> wrote: > Try this : > > for (Iterator iter = reader.getFieldNames(FieldOption.ALL).iterator(); > iter.hasNext();) { >

RE: Lucene on a cluster environment

2008-03-19 Thread Robert . Hastings
No noticeable performance hit; searches are not a bottleneck in our system. We don't have disk redundancy. Dragon Fly <[EMAIL PROTECTED]> wrote on 03/19/2008 11:47 AM (Subject: RE: Lucene on a cluster environment): Hi Robert, Did you run into

Re: java.lang.OutOfMemoryError: Java heap space when sorting the fields

2008-03-19 Thread Mark Miller
Here's what happens: in order to sort all of the hits you get back on a field, you need the value of that field for comparisons, right? Well, it turns out that reading a field value from the index is pretty slow (it's on the disk, after all)... so Lucene will read all of the terms in the field
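A stdlib-only sketch of the caching idea Mark describes: the field's value for every document is loaded once into an array indexed by document number, after which any set of hits can be sorted with cheap array lookups. The array has one slot per document in the index, not per hit, which is why memory grows with index size rather than result size. `FieldValueCache` and the `readFieldFromDisk` loader are hypothetical stand-ins, not Lucene's `FieldCache` API.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.IntFunction;

/** Hypothetical sketch of a per-field cache: one value per document in the index. */
class FieldValueCache {
    private final String[] valueByDoc; // slot i holds the field value of document i

    /** The expensive, memory-hungry step: read the field for every document up front. */
    FieldValueCache(int maxDoc, IntFunction<String> readFieldFromDisk) {
        valueByDoc = new String[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            valueByDoc[doc] = readFieldFromDisk.apply(doc);
        }
    }

    /** Sorts hit doc numbers by cached value; each comparison is just an array lookup. */
    int[] sortHits(int[] hitDocs) {
        return Arrays.stream(hitDocs).boxed()
                .sorted(Comparator.comparing((Integer doc) -> valueByDoc[doc],
                        Comparator.nullsLast(Comparator.<String>naturalOrder())))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    int cachedEntries() { return valueByDoc.length; }
}
```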

Re: Scoring a query with OR's

2008-03-19 Thread Ghinwa Choueiter
Hi, I emailed a question earlier about the difference between OR and AND in a Boolean query. So in what I am trying to do, I need AND to behave like an OR (or what I like to call a "soft AND"), and I need OR to behave like a logical OR, meaning that I don't want to reward documents that have more
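One way to see the distinction Ghinwa is after is to compare two ways of combining per-term scores for a single document. A Boolean OR in Lucene sums the matching clauses' scores and, with the default similarity, multiplies by a coordination factor `overlap / maxOverlap`, so documents matching more terms win. Taking only the best clause score, as `DisjunctionMaxQuery` does with a tie-breaker of 0, removes that reward. The per-term scores in the usage are made-up numbers, and `ClauseCombiner` mimics only this combination step, not Lucene's full scoring formula.

```java
/** Sketch of two ways to combine per-term scores for one document (0 = term absent). */
class ClauseCombiner {
    /** Sum of the matching clause scores times coord = matched / total clauses. */
    static double sumWithCoord(double[] termScores) {
        double sum = 0;
        int matched = 0;
        for (double s : termScores) {
            if (s > 0) { sum += s; matched++; }
        }
        return termScores.length == 0 ? 0 : sum * matched / termScores.length;
    }

    /** Best clause plus tieBreaker times the others; tieBreaker = 0 is a pure "best match" OR. */
    static double disjunctionMax(double[] termScores, double tieBreaker) {
        double max = 0, sum = 0;
        for (double s : termScores) {
            sum += s;
            max = Math.max(max, s);
        }
        return max + tieBreaker * (sum - max);
    }
}
```

For a document matching both terms at 0.5 each versus one matching a single term at 0.6, `sumWithCoord` prefers the first (1.0 vs 0.3) while `disjunctionMax(..., 0)` prefers the second (0.5 vs 0.6).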

[noobie question] Can't index :(

2008-03-19 Thread Vinicius Carvalho
Hello there! This is really a dumb question, but I just need to get things started :( I'm just trying to get things working here, and I'm not able to index :(. Here's my code: public abstract class AbstractLuceneIndexer implements LuceneIndexer{ protected String INDEX_DIR = ""; pu

Re: [noobie question] Can't index :(

2008-03-19 Thread Vinicius Carvalho
Doh! Sorry, never mind, I was returning different IndexWriter instances :P On Wed, Mar 19, 2008 at 7:21 PM, Vinicius Carvalho < [EMAIL PROTECTED]> wrote: > Hello there! This is really a dumb question, but I just need to get things > started :( I'm just trying to get things working here, and I'm not
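The bug Vinicius found (each call handing back a different writer instance) is commonly avoided by creating the writer once and reusing it. Below is a minimal lazy-initialization sketch. `ExpensiveWriter` is a hypothetical stand-in for `IndexWriter`, `WriterHolder` is likewise made up, and the abstract indexer class from his post is not reproduced because it is truncated.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an index writer; counts how many get constructed. */
class ExpensiveWriter {
    static final AtomicInteger created = new AtomicInteger();

    ExpensiveWriter() { created.incrementAndGet(); }
}

/** Hands out one shared writer instead of a new instance per call. */
class WriterHolder {
    private ExpensiveWriter writer;

    /** synchronized so two threads cannot both see null and create two writers. */
    synchronized ExpensiveWriter getWriter() {
        if (writer == null) {
            writer = new ExpensiveWriter(); // created on first use only
        }
        return writer;
    }
}
```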

Re: java.lang.OutOfMemoryError: Java heap space when sorting the fields

2008-03-19 Thread Daniel Noll
On Thursday 20 March 2008 07:22:27 Mark Miller wrote: > You might think, if I only ask for the top 10 docs, don't i only read 10 > field values? But of course you don't know what docs will be returned as > each search comes in...so you have to cache them all. If it lazily cached one field at a tim
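Daniel's message is cut off, so the following only illustrates the general idea of lazy caching raised here (and in sandyg's question about sorting just the hits), not his exact proposal. The sort value is fetched only for documents that actually match and is memoized, so memory grows with the number of distinct hits seen rather than with the index size; the trade-off is that each first lookup pays the disk cost that a full cache pays up front. `LazyHitSorter` and `readFieldFromDisk` are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical sketch: fetch sort values only for hits, caching each one lazily. */
class LazyHitSorter {
    private final IntFunction<String> readFieldFromDisk;
    private final Map<Integer, String> cache = new HashMap<>();
    private int diskReads = 0;

    LazyHitSorter(IntFunction<String> readFieldFromDisk) {
        this.readFieldFromDisk = readFieldFromDisk;
    }

    private String valueOf(Integer doc) {
        return cache.computeIfAbsent(doc, d -> {
            diskReads++; // only on the first request for this document
            return readFieldFromDisk.apply(d);
        });
    }

    int[] sortHits(int[] hitDocs) {
        return Arrays.stream(hitDocs).boxed()
                .sorted(Comparator.comparing(this::valueOf))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    int diskReads() { return diskReads; }
}
```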

Re: Adding new documents to index

2008-03-19 Thread Michael McCandless
Hi, Adding new documents to the index is not very costly even when the index is large. The newly added documents result in additional segment(s). If you then optimize, that is extremely costly for a large index. Test your search performance to see if you really need to optimize.
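Michael's point can be made concrete with a toy model: each batch of added documents becomes a new small segment and existing segments are left untouched, while optimizing rewrites every document into a single segment. `SegmentModel` is hypothetical; it only counts documents written, ignores Lucene's background merging, and does not reflect how Lucene is implemented internally.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of segment-based indexing: counts how many documents each operation writes. */
class SegmentModel {
    private final List<Integer> segmentSizes = new ArrayList<>();

    /** Adding a batch writes only the new documents, as one new segment. */
    int addBatch(int docs) {
        segmentSizes.add(docs);
        return docs;
    }

    /** Optimizing rewrites every document in the index into a single segment. */
    int optimize() {
        int total = segmentSizes.stream().mapToInt(Integer::intValue).sum();
        segmentSizes.clear();
        segmentSizes.add(total);
        return total;
    }

    int segmentCount() { return segmentSizes.size(); }
}
```

In this model, adding 10 documents to a million-document index writes 10 documents, while the optimize that follows writes 1,000,010.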

Re: Contrib Highlighter and Phrase search

2008-03-19 Thread Daniel Noll
On Wednesday 19 March 2008 18:28:15 Itamar Syn-Hershko wrote: > 1. Build a Radix tree (PATRICIA) and populate it with all search terms. > Phrase queries will be considered as one big string, regardless their > spaces. > > 2. Iterate through your text ignoring spaces and punctuation marks, and for >

RE: Contrib Highlighter and Phrase search

2008-03-19 Thread Itamar Syn-Hershko
I'm building this for my application, which uses (or at least will use) query inflation: no stemming, just basic tokenizing. I'm using Radix and letter-based lookup (which is how Radix works) since I want to execute the highlighting on large documents too, possibly with a lot of terms (since I'm inflatin

Re: java.lang.OutOfMemoryError: Java heap space when sorting the fields

2008-03-19 Thread Chris Hostetter
: You might think, if I only ask for the top 10 docs, don't i only read 10 field : values? But of course you don't know what docs will be returned as each search : comes in...so you have to cache them all. Arguments have been made in the past that when you have an index large enough that the Fi

Re: Scoring a query with OR's

2008-03-19 Thread Chris Hostetter
: I emailed a question earlier about the difference between OR and AND in a : Boolean query. So in what I am trying to do, I need AND to behave like an OR ( : or what I like to call "soft AND"), and I need OR to behave like a logic OR, : meaning that I don't want to reward documents that have more

Re: Relevance

2008-03-19 Thread Karl Wettin
luceneuser wrote: Hi All, I need help on retrieving results based on relevance + freshness. As of now, I get results based on only one of the fields, either relevance or freshness. How can I achieve this? Lucene retrieves results on relevance but also fetches old results too. I need more relevant r