Re: a proof that every word is indexing properly

2010-12-01 Thread Lance Norskog
This is what unit tests are for. On Wed, Dec 1, 2010 at 6:57 PM, David Fertig wrote: > Stop words are never indexed; you may need to empty your stop list. > > Luke (open-source w/code available) can browse and re-create documents > in indexes using their terms already.  Compare that to the origin

RE: a proof that every word is indexing properly

2010-12-01 Thread David Fertig
Stop words are never indexed; you may need to empty your stop list. Luke (open-source w/code available) can browse and re-create documents in indexes using their terms already. Compare that to the original to see if you are satisfied. -Original Message- From: David Linde [mailto:davidli
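
The check suggested above (empty the stop list, then compare what the index holds against the original text) can be sketched without Lucene. This is a minimal stand-in under stated assumptions, not Lucene's API: the class RoundTripCheck, its tokenizer, and its stop list are hypothetical, and the inverted index is a plain in-memory map. It reports every token of every source document that cannot be found in the index for that document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for an index round-trip check; it does not use Lucene.
class RoundTripCheck {
    // Assumed stop list, for illustration only.
    public static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "is"));

    // Lowercase the text and split it on anything that is not a letter or digit.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Build term -> document ids, skipping any term on the stop list.
    public static Map<String, Set<Integer>> index(List<String> docs, Set<String> stopWords) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : tokenize(docs.get(id))) {
                if (stopWords.contains(term)) {
                    continue;
                }
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return postings;
    }

    // Re-tokenize each original document and list every "docId:term" the index lacks.
    public static List<String> missingTerms(List<String> docs, Map<String, Set<Integer>> postings) {
        List<String> missing = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : tokenize(docs.get(id))) {
                Set<Integer> ids = postings.get(term);
                if (ids == null || !ids.contains(id)) {
                    missing.add(id + ":" + term);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("The index is a map", "Lucene builds the index");
        // With a stop list, the stop words show up as missing.
        System.out.println(missingTerms(docs, index(docs, STOP_WORDS)));
        // With an empty stop list, nothing should be missing.
        System.out.println(missingTerms(docs, index(docs, Collections.<String>emptySet())));
    }
}
```

A test of this shape is evidence for the documents it covers rather than a logical proof. Against a real index, the comparison would have to run the same analyzer that was used at indexing time.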

a proof that every word is indexing properly

2010-12-01 Thread David Linde
Has anyone figured out a way to logically prove that Lucene indexes every word properly? Our company has done a lot of research into Lucene, and all of our IT department is really impressed and excited about Lucene *except* one of the older search/indexing experts, who doesn't want to move to a new sear

Re: Hit score and confidence ratio in results

2010-12-01 Thread Erick Erickson
If I understand your problem space, you're trying to compare scores across two different queries. Don't do this, it is meaningless. Scores are only valid for comparing documents' relevance within a single query. How would one compare scores between two queries like text:nonsense name:terrible ? Dif
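
A toy illustration of why scores from different queries do not share a scale. This is a sketch under loud assumptions: the scoring function is a simplified idf weight, not Lucene's similarity formula, and the class ScoreScaleDemo and its four-document corpus are made up. The same document matches both single-term queries fully, yet gets a different score for each, because each score depends on how rare that query's term is in the collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy scorer; NOT Lucene's scoring formula.
class ScoreScaleDemo {
    // Made-up corpus: "nonsense" is rare, "terrible" is common.
    public static List<Set<String>> corpus() {
        List<Set<String>> docs = new ArrayList<>();
        docs.add(new HashSet<>(Arrays.asList("nonsense", "terrible")));
        docs.add(new HashSet<>(Arrays.asList("terrible")));
        docs.add(new HashSet<>(Arrays.asList("terrible")));
        docs.add(new HashSet<>(Arrays.asList("other")));
        return docs;
    }

    // Simplified inverse document frequency: rarer terms weigh more.
    public static double idf(String term, List<Set<String>> docs) {
        int df = 0;
        for (Set<String> d : docs) {
            if (d.contains(term)) {
                df++;
            }
        }
        return df == 0 ? 0.0 : Math.log((double) docs.size() / df) + 1.0;
    }

    // Score of one document for a single-term query.
    public static double score(String term, Set<String> doc, List<Set<String>> docs) {
        return doc.contains(term) ? idf(term, docs) : 0.0;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = corpus();
        Set<String> doc0 = docs.get(0);
        // Same document, full match on both queries, two different scores.
        System.out.println("nonsense: " + score("nonsense", doc0, docs));
        System.out.println("terrible: " + score("terrible", doc0, docs));
    }
}
```

Within either query, the ordering of documents by score is meaningful. Across the two queries, the numbers only reflect term statistics, which is why a fixed score cutoff carries no confidence meaning.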

RE: Wikileaks Iraq log

2010-12-01 Thread Uwe Schindler
The question is: What does this have to do with Lucene? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Seid Muhie [mailto:seidy...@gmail.com] > Sent: Wednesday, December 01, 2010 4:05 PM > To: java-user

Re: Wikileaks Iraq log

2010-12-01 Thread Petite Abeille
On Dec 1, 2010, at 7:29 AM, Seid Muhie wrote: > anybody who can give me a hint please http://lmgtfy.com/?q=WikiLeaks+War+Diary%3A+Iraq+War+Logs

Re: Wikileaks Iraq log

2010-12-01 Thread Seid Muhie
No Henok. I have searched the web and found nothing. Any link you might propose On Wed, Dec 1, 2010 at 3:07 PM, henok sahilu wrote: > channe > good luck > dont'you think that can be found on the internet. i heard about wikileaks > that > some classified information is on the internet > > good lu

Hit score and confidence ratio in results

2010-12-01 Thread Hassan Saneifar
Hello, I'm using Lucene to retrieve relevant segments of a corpus based on a given query. Every segment is represented as a Document in the index. Once the relevant segments are retrieved, I search for a regex in them to capture the requested information. It can happen that I find the regex in

Re: precision and recall in lucene

2010-12-01 Thread Yakob
On 12/1/10, Robert Muir wrote: > > you fill the topics files with list of queries, like the lia2 example > that has a single query for "apache source": > > > Number: 0 > apache source > Description: > Narrative: > > > then you populate the qrels file with the "answers" for your document > c

Re: Wikileaks Iraq log

2010-12-01 Thread henok sahilu
channe good luck dont'you think that can be found on the internet. i heard about wikileaks that some classified information is on the internet good luck man fero henok sahilu From: Seid Muhie To: java-user Sent: Wed, December 1, 2010 9:29:43 AM Subject:

Re: precision and recall in lucene

2010-12-01 Thread Robert Muir
On Wed, Dec 1, 2010 at 7:25 AM, Yakob wrote: > can you give me an example of how to populate the topics file and > qrels file other than those on the LIA2 sample code? I still don't > understand how these 2 text files work anyway. :-) > > let me get this straight. I need to fill topics file wit

Re: precision and recall in lucene

2010-12-01 Thread Yakob
On 12/1/10, Robert Muir wrote: > > well you can't use those files with your own document collection. > you need to populate the topics file with queries that you care about > measuring. > then you need to populate the qrels file with judgements for each > query, *for your collection*. you are sa

Re: PayloadSpanUtil and unstored fields.

2010-12-01 Thread Fabiano Nunes
Please, ignore this thread. It's *my misunderstanding* of query.getSpans(). Thanks! On Wed, Dec 1, 2010 at 10:15 AM, Fabiano Nunes wrote: > PayloadSpanUtil can't retrieve payloads from unstored fields ( > Field.Store.NO). Since the payloads are stored in terms, why do I need > to store the fields?

Re: PayloadSpanUtil and unstored fields.

2010-12-01 Thread Fabiano Nunes
Sorry. I'm opening it again. On Wed, Dec 1, 2010 at 10:18 AM, Fabiano Nunes wrote: > Please, ignore this thread. > It's *my misunderstanding* of query.getSpans(). > > Thanks! > > On Wed, Dec 1, 2010 at 10:15 AM, Fabiano Nunes wrote: > >> PayloadSpanUtil can't retrieve payloads from unstored fie

PayloadSpanUtil and unstored fields.

2010-12-01 Thread Fabiano Nunes
PayloadSpanUtil can't retrieve payloads from unstored fields (Field.Store.NO). Since the payloads are stored in terms, why do I need to store the fields? Example: PayloadSpanUtil psu = new PayloadSpanUtil(ireader); Collection tests = psu.getPayloadsForQuery(query); Assert.assertTrue((tests.size() > 0)

Re: Retrieving payload attribute in highlighter

2010-12-01 Thread Fabiano Nunes
Thanks Erick. This solved my problem. Now I can retrieve payloads using on-the-fly readers. On Tue, Nov 30, 2010 at 6:49 PM, Erick Erickson wrote: > Warning, ignorance alert. I'm not all that up on the guts of this one. > > But take a look at MemoryIndex, there's an example there. The gist > is

Re: precision and recall in lucene

2010-12-01 Thread Robert Muir
On Wed, Dec 1, 2010 at 5:53 AM, Yakob wrote: > > well yes your information is really helpful. I did find a topics and > qrels file that come in /src/lia/benchmark in the LIA2 sample code. > and the result did change slightly, but the precision and recall value > is still zero. I did also happen to u

Re: precision and recall in lucene

2010-12-01 Thread Yakob
On 11/30/10, Robert Muir wrote: > On Tue, Nov 30, 2010 at 10:46 AM, Yakob wrote: > >> can you tell me what went wrong? what is the difference between >> topicsFile and qrelsFile anyway? >> > > well it's hard to tell what you are supplying as topics and qrels. > have a look at /src/lia/benchmark in

Re: problem with incremental update in lucene

2010-12-01 Thread Yakob
On 12/1/10, Ian Lea wrote: > It's probably this line: > > Directory.copy(ramDir, FSDirectory.open(indexDir), false); // 3 > > the javadocs say > > Copy contents of a directory src to a directory dest. If a file in src > already exists in dest then the one in dest will be blindly > overwritten. > >

Re: Directory objects for index

2010-12-01 Thread Michael McCandless
The FSDirectory does not itself hold any open files. All that its close method actually does is disable future operations against it. Still, it is a good practice to close it (after you've closed all IRs and IWs) since this can help you catch bugs in your app (ie, if you hit AlreadyClosedExceptio

Re: problem with incremental update in lucene

2010-12-01 Thread Ian Lea
It's probably this line: Directory.copy(ramDir, FSDirectory.open(indexDir), false); // 3 the javadocs say Copy contents of a directory src to a directory dest. If a file in src already exists in dest then the one in dest will be blindly overwritten. I don't think you gain anything by using an i
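
The "blindly overwritten" behaviour quoted from the javadocs can be illustrated with plain files. This is a sketch using java.nio.file rather than Lucene's Directory API: the class BlindCopyDemo and the file name data.bin are hypothetical, and the two temporary directories merely stand in for ramDir and the on-disk index. Any file in the destination that shares a name with a source file loses its previous contents.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical illustration of overwrite-on-copy; it does not use Lucene.
class BlindCopyDemo {
    // Copy every regular file in src into dest, replacing same-named files.
    public static void copyAll(Path src, Path dest) throws IOException {
        try (Stream<Path> files = Files.list(src)) {
            for (Path file : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(file)) {
                    Files.copy(file, dest.resolve(file.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    // Returns what the destination file holds after the copy.
    public static String demo() {
        try {
            Path src = Files.createTempDirectory("src");   // stands in for the in-memory index
            Path dest = Files.createTempDirectory("dest"); // stands in for the on-disk index
            Files.write(dest.resolve("data.bin"), "old index".getBytes(StandardCharsets.UTF_8));
            Files.write(src.resolve("data.bin"), "new batch".getBytes(StandardCharsets.UTF_8));
            copyAll(src, dest);
            return new String(Files.readAllBytes(dest.resolve("data.bin")), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The earlier contents of data.bin in dest are gone after the copy.
        System.out.println(demo());
    }
}
```

If the files written by a fresh in-memory index share names with those already on disk, a copy like this replaces earlier data instead of adding to it, which would be consistent with the incremental update appearing to lose documents.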