Using lucene in NFS

2010-04-29 Thread Vijay Veeraraghavan
Dear all, I have a problem using Lucene over NFS. A scheduler runs periodically, generating reports in PDF format and saving them to a file server. The file server's drive is mounted on the scheduler server (NFS). After generating the reports, the scheduler finally indexes the names of the report and i

Re: Lucene QueryParser and Analyzer

2010-04-29 Thread Wei Ho
I think I've figured out what the problem is. Given the inputs, Input1: C1C2,C3C4,C5C6,C7,C8C9C10 Input2: C1C2 C3C4 C5C6 C7 C8C9C10 Input1 gets parsed as Query1: (text: "C1C2 C3C4 C5C6 C7 C8C9C10") whereas Input2 gets parsed as Query2: (text: "C1C2") (text: "C3C4") (text: "C5C6") (text:

Re: Highlighter usage

2010-04-29 Thread Justin
Yeah, I understood the difference between StandardAnalyzer and StandardTokenizer. I wasn't sure that I want to use the StopFilter for highlighting, particularly in conjunction with phrase queries (e.g. "Alexander The Great"). StandardAnalyzer and StandardTokenizer are final, but I found my so

Re: Highlighter usage

2010-04-29 Thread Erick Erickson
That's the *StandardTokenizer*, which is not at all identical to StandardAnalyzer. From the Javadoc for StandardAnalyzer: Filters StandardTokenizer with StandardFilter

Re: Highlighter usage

2010-04-29 Thread Justin
I'm using my own analyzer so I can interject HTMLStripCharFilter as described in a previous thread. private static Analyzer htmlStripAnalyzer = new ReusableAnalyzerBase() { @Override protected TokenStreamComponents createComponents( final String fieldName, final Reader

RE: Lucene QueryParser and Analyzer

2010-04-29 Thread Sudarsan, Sithu D.
---sample code- >> Analyzer analyzer = new LingPipeAnalyzer(); >> Searcher searcher = new IndexSearcher(directory); >> QueryParser qParser = new MultiFieldQueryParser(Version.LUCENE_30, >> SEARCH_FIELDS, analyzer); >> Query query = qParser.parse(queryLine[1]); >> ScoreDoc[] result

Re: Lucene QueryParser and Analyzer

2010-04-29 Thread Wei Ho
Sorry, I guess "discarding the punctuation" was a bit misleading. I meant that given the two input strings, Input1: C1C2,C3C4,C5C6,C7,C8C9C10 Input2: C1C2 C3C4 C5C6 C7 C8C9C10 The analyzer I implemented tokenizes both Input1 and Input2 as "C1C2", "C3C4", "C5C6", "C7", "C8C9C10" - that is, i
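A minimal sketch of how identical analyzer output can still lead to different queries, as described in this thread. It assumes a query parser that splits the raw input on whitespace before analysis and turns any chunk that yields more than one token into a phrase. `analyze` and `parse` are hypothetical stand-ins written in plain Java; they are not Lucene's QueryParser or the poster's analyzer.

```java
import java.util.ArrayList;
import java.util.List;

public class QueryChunkDemo {

    // Hypothetical stand-in for the custom analyzer:
    // splits on commas and whitespace, so both inputs give the same tokens.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[,\\s]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Simplified model of the parser (an assumption, not Lucene code):
    // split on whitespace first, then analyze each chunk separately.
    // A chunk that produces several tokens becomes one quoted phrase.
    public static List<String> parse(String input) {
        List<String> clauses = new ArrayList<>();
        for (String chunk : input.trim().split("\\s+")) {
            List<String> tokens = analyze(chunk);
            if (tokens.size() > 1) {
                clauses.add("\"" + String.join(" ", tokens) + "\"");
            } else if (tokens.size() == 1) {
                clauses.add(tokens.get(0));
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Input1: no whitespace, so the parser sees a single chunk.
        System.out.println(parse("C1C2,C3C4,C5C6,C7,C8C9C10"));
        // Input2: five whitespace-separated chunks.
        System.out.println(parse("C1C2 C3C4 C5C6 C7 C8C9C10"));
    }
}
```

Under these assumptions Input1 produces a single phrase clause and Input2 produces five separate term clauses, even though `analyze` returns the same five tokens for both strings.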

Re: Highlighter usage

2010-04-29 Thread Erick Erickson
What analyzer are you using at index time? My guess is something like WhitespaceAnalyzer, which doesn't stem or change case. Try a different analyzer; SimpleAnalyzer comes to mind. HTH, Erick On Thu, Apr 29, 2010 at 4:21 PM, Justin wrote: > I'm trying to use Highlighter with QueryScorer aft

Highlighter usage

2010-04-29 Thread Justin
I'm trying to use Highlighter with QueryScorer after reading: https://issues.apache.org/jira/browse/LUCENE-1685 The problem is: I'm not getting a result unless the query term is an exact match. Am I missing filters? Is there a more complete example of how this should work? String con

RE: Lucene QueryParser and Analyzer

2010-04-29 Thread Sudarsan, Sithu D.
If so, Input1: c1c2c3c4c5c6c7 Input2: c1c2 c3c4 ... I guess they are different! Add whitespace after the commas and see if that works... Sincerely, Sithu D Sudarsan -----Original Message----- From: Wei Ho [mailto:we...@princeton.edu] Sent: Thursday, April 29, 2010 4:04 PM To: java-user@

Re: IndexWriter and memory usage

2010-04-29 Thread Michael McCandless
OK, I think you may be hitting this: https://issues.apache.org/jira/browse/LUCENE-2422 Since you have very large docs, the reuse that's done by IndexInput/Output is tying up a lot of memory. Ross, can you try the patch I just attached on that issue (merge it w/ the other issues) and see if that

Re: Lucene QueryParser and Analyzer

2010-04-29 Thread Wei Ho
No, there is no whitespace after the comma in Input1 Input1: C1C2,C3C4,C5C6,C7,C8C9C10 Input2: C1C2 C3C4 C5C6 C7 C8C9C10 Input1 is basically one big long word with commas and Chinese characters one after the other. Input2 is where I manually separated the string into the component terms by

RE: Lucene QueryParser and Analyzer

2010-04-29 Thread Sudarsan, Sithu D.
Hi, is there whitespace after the comma? Sincerely, Sithu D Sudarsan -----Original Message----- From: Wei Ho [mailto:we...@princeton.edu] Sent: Thursday, April 29, 2010 3:51 PM To: java-user@lucene.apache.org Subject: Lucene QueryParser and Analyzer Hello, I'm using Lucene to index and s

Lucene QueryParser and Analyzer

2010-04-29 Thread Wei Ho
Hello, I'm using Lucene to index and search through a collection of Chinese documents. However, I'm noticing an odd behavior in query parsing/searching. Given the two queries below: (Ci refers to Chinese character i) Input1: C1C2,C3C4,C5C6,C7,C8C9C10 Input2: C1C2 C3C4 C5C6 C7 C8C9C10 Inp

RE: Using Sort

2010-04-29 Thread Uwe Schindler
Look at the other constructors of Sort, the ones taking SortFields. The constructors you are using are deprecated and were removed in 3.0; they are no longer supported. When constructing SortFields you can specify the type of sort and also the order. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaph

Using Sort

2010-04-29 Thread Sirish Vadala
I have a requirement wherein the results have to be sorted in ascending order for a few fields, and in descending order for one field. Currently I am using: String[] sortOrder = { IFIELD_YEAR, IFIELD_TYPE, IFIELD_NUM, IFIELD_SESSION }; Sort sort = new Sort(sortOrder); hits = indexSearcher.search(boo

Re: Relevancy Practices

2010-04-29 Thread Mark Bennett
Hi Grant, You're welcome to use any of my slides (Dave's got them), with attribution of course. BUT Have you considered a section something like "why the hell do you think Relevancy tweaking is gonna save you!?!?" Basically that, as a corpus grows exponentially, so do results list sizes, so

RE: Relevancy Practices

2010-04-29 Thread Fornoville, Tom
We've only been using Lucene for a couple of weeks and we're still in the evaluation and R&D phase, but there's one single thing that has helped us out enormously with the relevance testing: a set of reference documents and queries. We basically sat together with the business people and created a list
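One way a set of reference documents and queries could be turned into a repeatable check is to score each ranked result list against the documents judged relevant for that query. The sketch below is only an illustration of that idea using precision at k; the class name, document ids, and judgments are hypothetical and do not come from the thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RelevanceCheck {

    // Precision at k: the fraction of the top-k results that appear in the
    // set of documents judged relevant for the query. Divides by k, so a
    // result list shorter than k is penalised.
    public static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        int hits = 0;
        int limit = Math.min(k, ranked.size());
        for (int i = 0; i < limit; i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
            }
        }
        return (double) hits / k;
    }

    public static void main(String[] args) {
        // Hypothetical reference judgment for one query.
        Set<String> relevant = new HashSet<>(Arrays.asList("doc1", "doc3"));
        // Hypothetical ranked ids returned by the search engine under test.
        List<String> ranked = Arrays.asList("doc3", "doc7", "doc1", "doc9");
        System.out.println(precisionAtK(ranked, relevant, 3));
    }
}
```

Re-running such a check after each analyzer or scoring change gives a number to compare against the previous run, rather than relying on impressions from a few ad hoc queries.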

Relevancy Practices

2010-04-29 Thread Grant Ingersoll
I'm putting on a talk at Lucene Eurocon (http://lucene-eurocon.org/sessions-track1-day2.html#1) on "Practical Relevance" and I'm curious as to what people put in practice for testing and improving relevance. I have my own inclinations, but I don't want to muddy the water just yet. So, if you

Re: IOExceptions when optimising the index

2010-04-29 Thread Ian Lea
Hi, it is not necessary to run optimize. At a guess, there is some job such as a backup or virus check that is running overnight and locking files and parts of the file system. If that is the case, and you do want to run optimize, perhaps you could schedule around it. Or switch to a unix based s

IOExceptions when optimising the index

2010-04-29 Thread Anna Hunecke
Hi! We are using Lucene 2.4.1 in our app. It has worked great so far, but now a customer has run into a strange problem. During the day, the search index is updated regularly with the newest changes in the application. At night, when nothing much is happening in the application, the index is optimised.

Re: Right memory for search application

2010-04-29 Thread Toke Eskildsen
On Wed, 2010-04-28 at 16:57 +0200, Erick Erickson wrote: > And you can extend this ad nauseam. For instance, you could use 6 > fields, yy, mm, dd, HH, MM, SS and have a very small number of > unique values in each using really tiny amounts of memory to sort down > to the second in this case. A ref
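The six-field idea quoted above can be illustrated by counting distinct values. This is a rough sketch in plain Java, assuming synthetic timestamps spaced 3601 seconds apart; the class name and sample data are hypothetical, and it measures only the number of unique values per field, not the memory Lucene would actually use for sorting.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class DateFieldCardinality {

    // Counts the distinct values one extractor produces over all timestamps.
    private static <T> int distinct(List<LocalDateTime> stamps, Function<LocalDateTime, T> f) {
        Set<T> seen = new HashSet<>();
        for (LocalDateTime s : stamps) {
            seen.add(f.apply(s));
        }
        return seen.size();
    }

    // Unique-value counts for the full timestamp and for each of the six
    // component fields named in the quoted message.
    public static Map<String, Integer> uniqueCounts(List<LocalDateTime> stamps) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("full", distinct(stamps, s -> s));
        counts.put("yy", distinct(stamps, LocalDateTime::getYear));
        counts.put("mm", distinct(stamps, LocalDateTime::getMonthValue));
        counts.put("dd", distinct(stamps, LocalDateTime::getDayOfMonth));
        counts.put("HH", distinct(stamps, LocalDateTime::getHour));
        counts.put("MM", distinct(stamps, LocalDateTime::getMinute));
        counts.put("SS", distinct(stamps, LocalDateTime::getSecond));
        return counts;
    }

    // Synthetic data (an assumption): n timestamps, 3601 seconds apart.
    public static List<LocalDateTime> sample(int n) {
        List<LocalDateTime> out = new ArrayList<>();
        LocalDateTime start = LocalDateTime.of(2010, 1, 1, 0, 0, 0);
        for (int i = 0; i < n; i++) {
            out.add(start.plusSeconds(3601L * i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(uniqueCounts(sample(10000)));
    }
}
```

However many documents are indexed, the month, day, hour, minute, and second fields can never exceed 12, 31, 24, 60, and 60 distinct values respectively, whereas the number of distinct full timestamps can grow with the number of documents.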