Re: query: order of search

2010-04-02 Thread Erick Erickson
I'm pretty sure that order doesn't matter. Again, though, don't worry about this level of trick until you can demonstrate performance issues; your time is usually best spent in other places. Best, Erick On Thu, Apr 1, 2010 at 11:54 PM, wrote: > Hello Erick, > > I was trying to optimise the se

RE: IndexWriter and memory usage

2010-04-02 Thread Woolf, Ross
I have this and the heap dump is 63 MB zipped. The info stream is much smaller (31 KB zipped), but I don't know how to get them to you. We are not using the NRT readers. -----Original Message----- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Thursday, April 01, 2010 5:21 P

Re: custom scoring help

2010-04-02 Thread Christopher Tignor
This code is in fact working. I had an error in my test case. Things seem to work as advertised. sorry / thanks - C>T> On Fri, Apr 2, 2010 at 10:20 AM, Christopher Tignor wrote: > Hello, > > I'm having a hard time implementing / understanding a very simple custom > scoring situation. > > I ha

custom scoring help

2010-04-02 Thread Christopher Tignor
Hello, I'm having a hard time implementing / understanding a very simple custom scoring situation. I have created my Similarity class for testing which overrides all the relevant (I think) methods below, returning 1 for all but coord(int, int) which returns q / maxOverlap so scores are scaled bet

Re: Lucene Challenge - sum, count, avg, etc.

2010-04-02 Thread Grant Ingersoll
On Apr 1, 2010, at 11:13 PM, Michel Nadeau wrote: > My big question is how do you loop 1M records, sum up field(s), and then > sort on that field... all in memory (could use too much ram) ? In a > temporary index (could take a while to re-write a lot of documents in a new > index) ? > You're g

Re: Lucene Challenge - sum, count, avg, etc.

2010-04-02 Thread prasenjit mukherjee
Pig generally takes CSV-type flat files as input, and then you do join/group-by/sum/count etc. on the variables (aka relations). For Michael's example with the following data:

*Affiliate / SaleDate / SaleAmount*
* mike / 2010-03-01 / 10.00
* john / 2010-03-01 / 10.00

One can write following pig-scri
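For comparison, the same group-by/sum/count idea can be sketched in plain Java. This is not the Pig script from the message (which is cut off here) and not a Lucene API; the class and method names (`SalesAggregator`, `Stats`, `aggregate`) are hypothetical, and the row layout is assumed to be the "affiliate / date / amount" form shown above. The sketch keeps one small accumulator per affiliate rather than one object per record, so memory grows with the number of distinct affiliates, not with the number of rows.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SalesAggregator {
    /** Running count and sum for one affiliate (hypothetical helper type). */
    static final class Stats {
        long count;
        double sum;

        double average() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    /**
     * Groups rows by affiliate in a single pass.
     * Assumes each row looks like "affiliate / yyyy-MM-dd / amount".
     */
    static Map<String, Stats> aggregate(Iterable<String> rows) {
        Map<String, Stats> byAffiliate = new LinkedHashMap<>();
        for (String row : rows) {
            // Split on "/" with optional surrounding whitespace.
            String[] parts = row.split("\\s*/\\s*");
            String affiliate = parts[0].trim();
            double amount = Double.parseDouble(parts[2].trim());
            Stats stats = byAffiliate.computeIfAbsent(affiliate, k -> new Stats());
            stats.count++;
            stats.sum += amount;
        }
        return byAffiliate;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "mike / 2010-03-01 / 10.00",
                "john / 2010-03-01 / 10.00");
        for (Map.Entry<String, Stats> e : aggregate(rows).entrySet()) {
            Stats s = e.getValue();
            System.out.println(e.getKey() + " count=" + s.count
                    + " sum=" + s.sum + " avg=" + s.average());
        }
    }
}
```

Sorting on the summed field afterwards only has to sort the per-affiliate entries, which is usually far fewer than the raw records.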

Re: Memory use and Lucene

2010-04-02 Thread Michael McCandless
OS-level tools (top, ps, Activity Monitor, Task Manager) aren't great ways to measure Java's memory usage, since they only see how much heap Java has allocated from the OS. Within that heap, Java can have lots of free space that it knows about but the OS does not (this is Runtime.freeMemory()). Y
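A minimal sketch of measuring this from inside the JVM, using only the standard `java.lang.Runtime` methods `totalMemory()`, `freeMemory()`, and `maxMemory()`. The class name `HeapUsage` and its method names are hypothetical, chosen for illustration. The "used" figure includes garbage that has not yet been collected, so it is an upper bound on live data rather than an exact number.

```java
final class HeapUsage {
    /** Bytes of heap the JVM has currently reserved from the OS. */
    static long allocatedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    /** Bytes inside the reserved heap that are free; the OS cannot see this. */
    static long freeHeapBytes() {
        return Runtime.getRuntime().freeMemory();
    }

    /** Approximate bytes occupied by objects, including uncollected garbage. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("allocated MB: " + allocatedHeapBytes() / mb);
        System.out.println("free MB:      " + freeHeapBytes() / mb);
        System.out.println("used MB:      " + usedHeapBytes() / mb);
        // Upper limit the heap may grow to (e.g. as set by -Xmx).
        System.out.println("max MB:       " + Runtime.getRuntime().maxMemory() / mb);
    }
}
```

Logging `usedHeapBytes()` at a few points during indexing gives a picture closer to what the application holds than the process size reported by top or Task Manager.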

Re: Designing a multilingual index

2010-04-02 Thread henrib
I agree that if you don't know the "source" language - or can't determine it - there is a lot of uncertainty in trying to transmogrify the query from one language to another! Tika and Nutch do have language determination tools though (ngram profiles if I'm not mistaken). And you also can interact

Re: Designing a multilingual index

2010-04-02 Thread Paul Libbrecht
On 01 Apr 2010, at 16:29, henrib wrote: By issuing multiple queries, one against each localized index, results being clustered by locale. You can further refine by translating the end-user input query terms for each locale and issue "translated" queries against the respective indices. I've