Re: where is the proper place to report lucene bugs?

2007-01-11 Thread Chris Hostetter
: public String highlightTerm(String originalText, TokenGroup group) { if (group.getTotalScore() <= 0) { return originalText; } return "<em>" + originalText + "</em>"; } ... I'm getting '< em> some text '
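
A minimal sketch of the kind of custom Formatter being quoted, with the <em> tags that the archive stripped out restored (the class name here is made up, not from the thread):

    import org.apache.lucene.search.highlight.Formatter;
    import org.apache.lucene.search.highlight.TokenGroup;

    // Wraps scoring terms in <em> tags; non-matching text passes through untouched.
    public class EmFormatter implements Formatter {
        public String highlightTerm(String originalText, TokenGroup group) {
            if (group.getTotalScore() <= 0) {
                return originalText;
            }
            return "<em>" + originalText + "</em>";
        }
    }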

Re: where is the proper place to report lucene bugs?

2007-01-11 Thread Jason
Thanks Grant, I did make an initial posting on this list but got zero responses, so I'm guessing nobody else has seen the problem. Basically, from this: public String highlightTerm(String originalText, TokenGroup group) { if (group.getTotalScore() <= 0) {

Re: where is the proper place to report lucene bugs?

2007-01-11 Thread Grant Ingersoll
From the resources section of the website, the Issue Tracking link is: http://issues.apache.org/jira/browse/LUCENE Also, it is helpful if you have done a preliminary search on the topic and some reasonable investigation to confirm that it is in fact a bug. If you're not sure, please ask on t

where is the proper place to report lucene bugs?

2007-01-11 Thread Jason
can someone please tell me where the most appropriate place to report bugs might be - in this case for the hit-highlighter contribution Thanks Jason. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail:

Re: Technology Preview of new Lucene QueryParser

2007-01-11 Thread Mark Miller
Incidentally, is field1(foo bar) a shortcut for field1(foo) field1(bar) like in the regular QueryParser? I believe I just OR the queries together: example = "field1,field2((search & old) ~3 horse)"; expected = "(+spanNear([field1:search, field1:horse], 3, false) +spanNear([fie
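
For reference, roughly what the first spanNear clause printed above corresponds to if built by hand with the core span API; a sketch with values lifted from the example, not the parser's actual construction code:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    public class SpanNearExample {
        public static SpanQuery searchNearHorse() {
            // spanNear([field1:search, field1:horse], 3, false)
            return new SpanNearQuery(
                new SpanQuery[] {
                    new SpanTermQuery(new Term("field1", "search")),
                    new SpanTermQuery(new Term("field1", "horse"))
                },
                3,      // slop
                false); // inOrder
        }
    }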

Re: Modifying StandardAnalyzer

2007-01-11 Thread Erick Erickson
Would it be simpler just to modify the input with a regex rather than risk messing with StandardAnalyzer? Or wouldn't that do what you need? On 1/11/07, Van Nguyen <[EMAIL PROTECTED]> wrote: Hi, I need to modify the StandardAnalyzer so that it will tokenize zip codes that look like this:
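
A sketch of the regex pre-processing being suggested, assuming the goal is to keep a ZIP+4 code as one token by rewriting it before analysis; the exact rewrite is a guess, since the thread doesn't say how the codes should come out:

    import java.util.regex.Pattern;

    public class ZipPreprocessor {
        // 5 digits, hyphen, 4 digits, e.g. "92626-2646"
        private static final Pattern ZIP4 = Pattern.compile("\\b(\\d{5})-(\\d{4})\\b");

        // Drop the hyphen so StandardAnalyzer sees a single numeric token.
        public static String rewrite(String text) {
            return ZIP4.matcher(text).replaceAll("$1$2");
        }
    }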

Re: how can I filter my search to not include items containing a particular field and value?

2007-01-11 Thread Jason
Thanks Erick, this is what I ended up doing more or less but I'm not happy with it really, hparser.setDefaultOperator(QueryParser.Operator.AND); Query hideQuery = hparser.parse("properties@" + hideterm + ":" + hidevalue); cquery.add(hideQuery, Boole
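
The same exclusion can also be expressed without a second QueryParser pass by adding a prohibited clause directly; a sketch, with the field and value passed in as placeholders rather than anything taken from the thread:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class HideFilter {
        // Wraps the user's query and prohibits documents carrying the hide marker.
        public static Query excludeHidden(Query userQuery, String hideField, String hideValue) {
            BooleanQuery cquery = new BooleanQuery();
            cquery.add(userQuery, BooleanClause.Occur.MUST);
            cquery.add(new TermQuery(new Term(hideField, hideValue)),
                       BooleanClause.Occur.MUST_NOT);
            return cquery;
        }
    }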

Luke diffs

2007-01-11 Thread Benson Margulies
begin 666 luke_diffs.dat [uuencoded binary diff attachment; content not reproduced here]

Re: Use the Luke, Force

2007-01-11 Thread Mark Miller
Yes. The other option is to download the jar and run it with a command line argument pointing to the lucene2 jar. I would love to avoid that step. Hook me up. (responding to list to point out second option - I prefer this one) - Mark Benson Margulies wrote: My experience tonight is that the s

Re: Technology Preview of new Lucene QueryParser

2007-01-11 Thread Mark Miller
Chris Hostetter wrote: : I wasn't clear on this answer. The problem was not grammar ambiguity but : from a user standpoint...I wanted to differentiate the proximity binary : operator from the phrase distance operator...even though they are : similar. Perhaps the differentiation is more confusin

Re: Technology Preview of new Lucene QueryParser

2007-01-11 Thread Chris Hostetter
: This sounds troubling to me now :) I may need to clear up my : understanding of this and rework the parser: : "A | B | C ! D ! E" would get parsed as allFields:a allFields:b : (+allFields:c -allFields:d -allFields:e) : This is because ! binds tighter than |... : Sounds like I need to bone up on h

RE: Use the Luke, Force

2007-01-11 Thread Benson Margulies
I don't know how to build up that magic, signed, auto-launching device. I'll post diffs, however, momentarily. -Original Message- From: Joe Shaw [mailto:[EMAIL PROTECTED] Sent: Thursday, January 11, 2007 8:55 PM To: java-user@lucene.apache.org Subject: Re: Use the Luke, Force Hi, Benson

Re: Use the Luke, Force

2007-01-11 Thread Joe Shaw
Hi, Benson Margulies wrote: My experience tonight is that the stock 1.9-based Luke won't open my 2.0 indices. So I fixed up a version of the source. I've been seeing this too. Anyone else want it? That would be great, if you don't mind. A jar would be nice too. :) Joe --

Use the Luke, Force

2007-01-11 Thread Benson Margulies
My experience tonight is that the stock 1.9-based Luke won't open my 2.0 indices. So I fixed up a version of the source. Anyone else want it?

Re: Technology Preview of new Lucene QueryParser

2007-01-11 Thread Chris Hostetter
: I wasn't clear on this answer. The problem was not grammar ambiguity but : from a user standpoint...I wanted to differentiate the proximity binary : operator from the phrase distance operator...even though they are : similar. Perhaps the differentiation is more confusing than helpful. it's not

Re: Modifying StandardAnalyzer

2007-01-11 Thread Mark Miller
I would try adding this (or your regex) | (("-" )|()) between the EMAIL and HOST line or something, and change this: org.apache.lucene.analysis.Token next() throws IOException : { Token token = null; } { ( token = <ALPHANUM> | token = <APOSTROPHE> | token = <ACRONYM> | token = <COMPANY> | token = <EMAIL> | token = <HOST> |

Modifying StandardAnalyzer

2007-01-11 Thread Van Nguyen
Hi, I need to modify the StandardAnalyzer so that it will tokenize zip codes that look like this: 92626-2646 I think the part I need to modify is in here - specifically: // floating point, serial, model numbers, ip addresses, etc. // every other segment must have at least
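
Before editing the grammar it can help to see exactly what StandardAnalyzer does with such a value today; a quick throwaway check (the field name is arbitrary):

    import java.io.StringReader;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    public class ShowZipTokens {
        public static void main(String[] args) throws Exception {
            TokenStream ts = new StandardAnalyzer()
                    .tokenStream("zip", new StringReader("92626-2646"));
            // Print each token the analyzer produces for the ZIP+4 value.
            for (Token t = ts.next(); t != null; t = ts.next()) {
                System.out.println(t.termText());
            }
        }
    }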

Re: Excluding partial match for query result

2007-01-11 Thread Erick Erickson
What analyzers are you using for your queries and your indexing? StandardAnalyzer (I believe) will break "A.B" into two tokens, so your index could contain both tags. So what you really have in the index would be story1 A C E story2: A B P Q (note no '.'). searching for B.A would really searc
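
One way around the partial-match problem, if it fits the rest of the schema (an assumption, not something from the thread), is to index each tag as a single untokenized term so values like "D.P" are never split; searching then uses a TermQuery on the exact tag value:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class TagIndexing {
        // Each tag becomes one exact term in the "tags" field.
        public static void addTags(Document doc, String[] tags) {
            for (int i = 0; i < tags.length; i++) {
                doc.add(new Field("tags", tags[i], Field.Store.YES, Field.Index.UN_TOKENIZED));
            }
        }
    }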

Re: Technology Preview of new Lucene QueryParser

2007-01-11 Thread Mark Miller
you kind of lost me there ... i get that ~ is a binary operator, but in both cases the intent is to say "these words must appear near each other" ... so i'm wondering why you chose to use "hard knocks dude":3 instead of "hard knocks dude"~3 ... oh wait, i think i get it ... was it to eliminate amb

Re: Technology Preview of new Lucene QueryParser

2007-01-11 Thread Mark Miller
: > so do you convert A ! B ! C into a three clause boolean query, or a two : > clause BooleanQuery that contains another two clause BooleanQuery? : > : It becomes a three clause boolean query...would there be a difference in : scoring? I assumed not and it used to make a boolean that contained

Excluding partial match for query result

2007-01-11 Thread M A
Hi, We are having some trouble with the results that we get from certain queries. Basically .. we have documents that we index, each document has a bunch of tags, the tags could be of the sort tags: A, B, C, D.P, E.A etc .. Each story will contain only a subset of the tags .. For example Sto

Re: Technology Preview of new Lucene QueryParser

2007-01-11 Thread Chris Hostetter
: > so do you convert A ! B ! C into a three clause boolean query, or a two : > clause BooleanQuery that contains another two clause BooleanQuery? : > : It becomes a three clause boolean query...would there be a difference in : scoring? I assumed not and it used to make a boolean that contained :

Re: Text storing design and performance question

2007-01-11 Thread Chris Hostetter
In general, if you are having performance issues with highlighting, the first thing to do is double check what the bottleneck is: is it accessing the text to be highlighted, or is it running the highlighter? you suggested earlier in the thread that the problem was with accessing the text... : >>
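
A sketch of the check being suggested here: time the fetch of the stored text and the highlighting pass separately before deciding what to optimize. Field name and the surrounding objects are placeholders, not from the thread:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.highlight.Highlighter;
    import org.apache.lucene.search.highlight.QueryScorer;

    public class HighlightTiming {
        public static void time(Document doc, Query query) throws Exception {
            long t0 = System.currentTimeMillis();
            String text = doc.get("body");                // cost of getting the stored text
            long t1 = System.currentTimeMillis();
            Highlighter h = new Highlighter(new QueryScorer(query));
            String frag = h.getBestFragment(new StandardAnalyzer(), "body", text);
            long t2 = System.currentTimeMillis();
            System.out.println("fetch=" + (t1 - t0) + "ms, highlight=" + (t2 - t1)
                               + "ms, fragment=" + frag);
        }
    }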

Re: Text storing design and performance question

2007-01-11 Thread Jason Pump
Say my query is "apple banana orange". The word "apple" is near the start of the document, "banana" and "orange" at the end. Wouldn't your optimization stop at the word "apple" and just return this word highlighted? Yes. Or do you know of a way to quantify the match? I guess you could count how

Re: Speed of grouped queries

2007-01-11 Thread sdeck
So, to reply to myself with info learned (since I like reading this forum for what people have done with Lucene; kudos to CNET, by the way). In tracking down some of the speed issues, I was able to manage some speed improvements. Again, the movie examples are just metaphors for what I am really working on

RE: Text storing design and performance question

2007-01-11 Thread Renaud Waldura
Jason: Interesting idea, thanks. But how do you know whether the highlighting is any good? I thought highlighter implemented some kind of strategy to find the best fragment. Say my query is "apple banana orange". The word "apple" is near the start of the document, "banana" and "orange" at the en

Re: Huge Index

2007-01-11 Thread Otis Gospodnetic
Nah, see that "Don't worry about RAMDir, those guys who wrote Lucene in Action were smoking something when they wrote that section. ;)" line from my earlier email. Aha, another Basis Tech person! Good, good :) Otis - Original Message From: Benson Margulies <[EMAIL PROTECTED]> To: jav

RE: Huge Index

2007-01-11 Thread Benson Margulies
Given the ram directory inside of the IndexWriter in 2.0, is there still any reason to stage this manually? - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: Huge Index

2007-01-11 Thread Otis Gospodnetic
Increase maxBufferedDocs as much as you can (the more RAM the better). Increase max heap for the JVM (-Xmx). If you are on UNIX, increase the max open file descriptors limit and then increase mergeFactor somewhat, say 100. Use -server mode for the JVM. Don't worry about RAMDir, those guys who wr
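
In code, the two IndexWriter knobs from that list look like this; the maxBufferedDocs value is a placeholder to be sized to the available heap, not a number from the thread, and -Xmx / -server are JVM flags set on the command line rather than in code:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class TunedWriter {
        public static IndexWriter open(String path) throws Exception {
            IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), true);
            writer.setMaxBufferedDocs(10000);  // "as much as you can" -- placeholder value
            writer.setMergeFactor(100);        // the value suggested above for a raised ulimit
            return writer;
        }
    }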

RE: Huge Index

2007-01-11 Thread Alice
You're right. The computer doesn't crash but slows to death. I'm developing in my workstation and the application will run in a server, a better machine but not that great. The indexing will sure run at night no matter the performance but it is not desired to take many hours. -Original Messa

RE: Huge Index

2007-01-11 Thread Alice
I never got to index all the data but it is too slow. I got 3 million in 2.5 hours. As suggested in Lucene in Action, I use ramDir and after I write 5000 documents I merge them to the fsDir. The merge factor is now 100. I tried other variations but they didn't make much difference. -Original Mess
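
For reference, the Lucene in Action pattern being described looks roughly like this: fill a RAMDirectory and fold it into the disk index every 5,000 documents. This is a sketch; the path, the document source, and the helper class are placeholders:

    import java.util.Iterator;
    import java.util.List;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    public class BatchedIndexer {
        public static void indexAll(List docs, String indexPath) throws Exception {
            IndexWriter fsWriter = new IndexWriter(indexPath, new StandardAnalyzer(), true);
            RAMDirectory ramDir = new RAMDirectory();
            IndexWriter ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
            int buffered = 0;
            for (Iterator it = docs.iterator(); it.hasNext();) {
                ramWriter.addDocument((Document) it.next());
                if (++buffered == 5000) {                      // batch size from the mail above
                    ramWriter.close();
                    fsWriter.addIndexes(new Directory[] { ramDir });
                    ramDir = new RAMDirectory();
                    ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
                    buffered = 0;
                }
            }
            ramWriter.close();
            fsWriter.addIndexes(new Directory[] { ramDir });   // flush the tail
            fsWriter.optimize();
            fsWriter.close();
        }
    }

Note that in Lucene 2.0 IndexWriter already buffers added documents in RAM (controlled by setMaxBufferedDocs), which is the point of Benson's question elsewhere in this thread, so raising that setting can make the manual RAMDirectory staging unnecessary.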

Re: Huge Index

2007-01-11 Thread Ruslan Sivak
Alice, If you have a computer that crashes once you put a lot of load on it, I'd say you have bigger problems than the speed of the indexing. A computer should not crash, no matter how much load you put on it. If you have such a huge database, I can't believe that you don't have access to o

Re: Huge Index

2007-01-11 Thread Grant Ingersoll
Hi Alice, Can you define slow (hours, days, months and on what hardware)? Have you done any profiling, etc. to see where the bottlenecks are? What size documents are you talking about here? What are your merge factors, etc.? Thanks, Grant On Jan 11, 2007, at 10:47 AM, Alice wrote: H

Re: Huge Index

2007-01-11 Thread James Rhodes
I used the following settings for speeding up indexing on a similarly sized db table. If you have enough RAM it might help you. IndexWriter writer = new IndexWriter(fdDir, new StandardAnalyzer(), true); writer.setMergeFactor(100); writer.setMaxMergeDocs(99); writer.setMaxBufferedDocs(

Re: Huge Index

2007-01-11 Thread Rangarirayi Muvavarirwa
One option: https://cool-apps-distributedindex.dev.java.net/ caveat: you would have to setup an account (you get 10CPUhr & 10GB account upon signup) On 1/11/07, Alice <[EMAIL PROTECTED]> wrote: Unfortunately I can't use multiple machines. And I cannot start lots of threads because the server

RE: Huge Index

2007-01-11 Thread Alice
Unfortunately I can't use multiple machines. And I cannot start lots of threads because the server crashes. -Original Message- From: Russ [mailto:[EMAIL PROTECTED] Sent: quinta-feira, 11 de janeiro de 2007 14:33 To: java-user@lucene.apache.org Subject: Re: Huge Index Can you use multipl

Re: Huge Index

2007-01-11 Thread Russ
Can you use multiple threads/machines to index the data into separate indexes, and then combine them? Russ Sent wirelessly via BlackBerry from T-Mobile. -Original Message- From: "Alice" <[EMAIL PROTECTED]> Date: Thu, 11 Jan 2007 13:47:36 To: Subject: Huge Index Hello! I have to i

Re: Technology Preview of new Lucene QueryParser

2007-01-11 Thread Mark Miller
: It works like this: "A -B -C" would be expressed as "A ! B ! C" : By binary, I mean that each operator must connect two clauses...in that : case A is connected to B and C is connected to A ! B. : I avoid the single prohibit clause issue, -query, by not really allowing so do you convert A ! B
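
The two shapes being discussed for "A ! B ! C", written out with the core API to make the difference concrete; a sketch, with the allFields field name taken from the examples in this thread:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class ProhibitShapes {
        static Query tq(String text) {
            return new TermQuery(new Term("allFields", text));
        }

        // Flat: one BooleanQuery with three clauses, +a -b -c.
        public static Query flat() {
            BooleanQuery q = new BooleanQuery();
            q.add(tq("a"), BooleanClause.Occur.MUST);
            q.add(tq("b"), BooleanClause.Occur.MUST_NOT);
            q.add(tq("c"), BooleanClause.Occur.MUST_NOT);
            return q;
        }

        // Nested: (a ! b) wrapped in another two-clause query with -c.
        public static Query nested() {
            BooleanQuery inner = new BooleanQuery();
            inner.add(tq("a"), BooleanClause.Occur.MUST);
            inner.add(tq("b"), BooleanClause.Occur.MUST_NOT);
            BooleanQuery outer = new BooleanQuery();
            outer.add(inner, BooleanClause.Occur.MUST);
            outer.add(tq("c"), BooleanClause.Occur.MUST_NOT);
            return outer;
        }
    }

Scoring between the two can differ slightly, since the coordination factor is computed per BooleanQuery level.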

"docs out of order" IllegalStateException during merge (LUCENE-140)

2007-01-11 Thread Michael McCandless
Hi all, I would like to draw your attention to an open and rather devious long-standing index corruption issue that we've only now finally gotten to the bottom of: https://issues.apache.org/jira/browse/LUCENE-140 If you hit this, you will typically see a "docs out of order" IllegalStateExc

Re: SpellChecker Index - remove words?

2007-01-11 Thread Otis Gospodnetic
The value of the word - the word itself - should be your unique identifier. Otis - Original Message From: Josh Joy <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Thursday, January 11, 2007 5:39:24 AM Subject: Re: SpellChecker Index - remove words? Thanks for the reply... I gue

Re: SpellChecker Index - remove words?

2007-01-11 Thread Josh Joy
Thanks for the reply... I guess my concern is that I want to ensure that I don't accidentally delete other words than the intended word. For example, if I build a custom index, I can delete a word not based on the term itself, though in the term I can include a "unique" identifier as well. Can the

Re: SpellChecker Index - remove words?

2007-01-11 Thread Otis Gospodnetic
Josh, The spellchecker index is just another Lucene index, so you can delete documents/words from it the same way you delete documents from any Lucene index - using IndexReader's delete(...) methods. You can pass that delete method a Term where the field name is "word" and the value is the miz
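
Put concretely, the deletion being described is a sketch along these lines; the path and the word are placeholders, while "word" is the field name named in the mail:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;

    public class RemoveSpellWord {
        public static void remove(String spellIndexPath, String word) throws Exception {
            IndexReader reader = IndexReader.open(spellIndexPath);
            reader.deleteDocuments(new Term("word", word));  // removes exactly that entry
            reader.close();
        }
    }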