RE: KStemFilter

2013-06-14 Thread Sirish Vadala
e/org/apache/lucene/analysis/Analyzer.html > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: > uwe@ > > >> -Original Message- >> From: Sirish Vadala [mailto: > sirishreddy@ > ] >> S

KStemFilter

2013-06-14 Thread Sirish Vadala
Hello All, I have a new requirement within my text search implementation to perform stemming. I did some research and implemented Snowball, but the customers found it too aggressive, and eventually I got them to agree to compromise on the k-stem algorithm. Currently my existing code is o

Re: Which is the +best +fast HTML parser/tokenizer that I can use with Lucene for indexing HTML content today ?

2011-03-14 Thread Sirish Vadala
I had exactly the same requirement to parse and index offline HTML files. I wrote my own HTML scanner using javax.swing.text.html.HTMLEditorKit.Parser. It sounds difficult, but it is pretty simple and straightforward to implement; a simple 40-line Java class did the job for me. shrinath.m wrote:
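The scanner described above is not shown in the archive. What follows is only a minimal sketch of the general approach, assuming the goal is to pull the text content out of an HTML string before indexing it. The class name HtmlTextExtractor and its extract method are hypothetical; ParserDelegator is the JDK's concrete HTMLEditorKit.Parser, and the callback receives each run of character data through handleText.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical helper (not the poster's class): collects the text of an HTML document. */
final class HtmlTextExtractor extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void handleText(char[] data, int pos) {
        // Invoked by the parser for each run of character data between tags.
        if (text.length() > 0) {
            text.append(' ');
        }
        text.append(data);
    }

    /** Parses the given HTML and returns the collected text, with no markup. */
    static String extract(String html) {
        HtmlTextExtractor callback = new HtmlTextExtractor();
        try {
            // The third argument tells the parser to ignore charset declarations in the markup.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(extract("<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"));
    }
}
```

The returned string could then be handed to whatever field holds the indexed text. Exact whitespace between text runs depends on the parser, so normalize it if that matters.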

Issue with disk space on UNIX

2011-03-14 Thread Sirish Vadala
Hello All: Background: I have a text based search engine implemented in Java using Lucene 3.0. Indexing and re-indexing happens every night at 1 am as a scheduled process. The index size is around 1 gig and is recreated every night. Issues 1. Now I have a peculiar problem that happens only on my

RE: Issue with sentence specific search

2010-10-07 Thread Sirish Vadala
Hi Steven, I have implemented sentence specific proximity search as suggested below. However, unfortunately it still doesn't identify the sentence boundaries for my search. I am using # as a delimiter between my sentences while indexing the content: ArrayList sentencesList = senten

RE: Issue with sentence specific search

2010-10-06 Thread Sirish Vadala
Awesome! Thanks a lot Steven! This is exactly what I wanted. Hi Sirish, Have you looked at SpanQuery's yet?: http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/spans/package-summary.html See also this Lucid Imagination blog post by Mark Miller:

RE: Issue with sentence specific search

2010-10-06 Thread Sirish Vadala
Hmmm... My mistake. In fact it is not a phrase search, but it's a proximity search. My screen gives four options to the user: -All words, -Exact phrase, -At least one word, -Within proximity of xx words. In case of -All words and -At least one word, this is irrelevant and everything works fine.

Issue with sentence specific search

2010-10-06 Thread Sirish Vadala
Hello All: Can anyone suggest the best way to implement both sentence-specific and non-sentence-specific phrase search? The user is going to have a check box for phrase search on the screen that says 'within sentence'. If s/he selects 'within sentence', then I should perform sentence specifi

RE: Problem searching in the same sentence

2010-09-30 Thread Sirish Vadala
I have tried the below code: Field field = new Field(fieldName, validFieldValue, (store) ? Field.Store.YES : Field.Store.NO, (tokenize) ? Field.Index.ANALYZED : Field.Index.NOT_ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS); However, I still have the same problem. It

Re: Problem searching in the same sentence

2010-09-29 Thread Sirish Vadala
Hello All: I am performing the sentence specific phrase search, by adding sentence by sentence to the same field as suggested below. Everything works fine, but when I display my results, highlighter is not able to find the search text phrase. The following is my code: SentenceScanner sentenceSc

Problem searching in the same sentence

2010-09-16 Thread Sirish Vadala
Hello All: Can anyone suggest the best way to perform a sentence-specific phrase search? Eg: Let the indexed text be: If you are posting a question, please try search first. Your question may have already been answered. Don't post repeatedly. Wait for a few days. People will

Re: Problem using TopFieldCollector

2010-06-15 Thread Sirish Vadala
I was able to get this whole thing to work using the delegation pattern. In my custom collector object, I internally delegate to the TopFieldCollector after doing my custom processing. Thanks.
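The poster's collector is not included in the archive. As a rough illustration of the delegation pattern mentioned above, here is a plain-Java sketch. DocCollector is a hypothetical stand-in interface, not Lucene's Collector API, and FilteringCollector and ListCollector are likewise made-up names. The wrapper does its own processing first (here, a filter) and then forwards accepted hits to the wrapped collector, which would play the role of TopFieldCollector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical stand-in for a search-hit collector; NOT Lucene's Collector API. */
interface DocCollector {
    void collect(int docId);
}

/** Does custom processing (a filter), then delegates accepted hits to another collector. */
final class FilteringCollector implements DocCollector {
    private final DocCollector delegate;
    private final IntPredicate accept;

    FilteringCollector(DocCollector delegate, IntPredicate accept) {
        this.delegate = delegate;
        this.accept = accept;
    }

    @Override
    public void collect(int docId) {
        // Custom step first; only hits that pass are handed to the delegate.
        if (accept.test(docId)) {
            delegate.collect(docId);
        }
    }
}

/** Simple delegate that records hits in arrival order (plays the role of the sorting collector). */
final class ListCollector implements DocCollector {
    final List<Integer> docs = new ArrayList<>();

    @Override
    public void collect(int docId) {
        docs.add(docId);
    }
}
```

Because the wrapper holds the delegate rather than extending it, this works even when the wrapped collector class cannot be subclassed.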

Re: Problem using TopFieldCollector

2010-06-14 Thread Sirish Vadala
Thanks for the response. Yeah, eventually I chose to extend the Collector class, since none of the other collectors, viz. TopFieldCollector and TopDocsCollector, allow me to extend them and override. I could not grasp what exactly the below means: Rebecca Watson wrote: > > i keep a copy of

Problem using TopFieldCollector

2010-06-11 Thread Sirish Vadala
Currently I am on Lucene 2.2, migrating to 2.9 before eventually moving to 3.1. In Lucene 2.2, I have a custom hit collector that both filters and sorts my search results. Let me describe the functionality achieved. When a user includes advanced search criteria with text search, I execu

Problem fetching number of occurrences

2010-06-01 Thread Sirish Vadala
Hello All: Can anyone suggest the best way to get the no. of occurrences of each word per document in Lucene? Eg: Let the indexed text be: If you are posting a question, please try search first. Your question may have already been answered. Now if I search for the word 'question', then I w

Using Sort

2010-04-29 Thread Sirish Vadala
I have a requirement wherein the results have to be sorted in ascending order for a few fields, and descending order for one field. Currently I am using: String[] sortOrder = { IFIELD_YEAR, IFIELD_TYPE, IFIELD_NUM, IFIELD_SESSION }; Sort sort = new Sort(sortOrder); hits = indexSearcher.search(boo
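To make the requested ordering concrete, here is a plain-Java sketch of a multi-key sort in which one key is descending and the rest are ascending. It does not use Lucene; in Lucene the per-field direction is usually expressed per SortField rather than with a comparator. BillKey and MixedSort are hypothetical names, and treating the year as the descending field is an assumption, since the message does not say which field it is.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical holder for the four sort keys named in the message. */
final class BillKey {
    final int year;
    final String type;
    final int num;
    final int session;

    BillKey(int year, String type, int num, int session) {
        this.year = year;
        this.type = type;
        this.num = num;
        this.session = session;
    }

    @Override
    public String toString() {
        return year + "/" + type + "/" + num + "/" + session;
    }
}

final class MixedSort {
    // Assumption: year is the single descending key; type, num and session are ascending.
    static final Comparator<BillKey> ORDER =
        Comparator.comparingInt((BillKey b) -> b.year).reversed()
            .thenComparing((BillKey b) -> b.type)
            .thenComparingInt(b -> b.num)
            .thenComparingInt(b -> b.session);

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<BillKey> sorted(List<BillKey> input) {
        List<BillKey> copy = new ArrayList<>(input);
        copy.sort(ORDER);
        return copy;
    }
}
```

Calling reversed() immediately after the first key flips only that key, because the later keys are chained on afterwards.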

Re: Problem with search

2010-04-14 Thread Sirish Vadala
Hmmm... Seems like a lot of work to be done. I will try these options and update. Thanks a lot. Best.

Problem with search

2010-04-13 Thread Sirish Vadala
Hello All, I am kind of new to Lucene, and am having a problem filtering search results. Background: My indexed documents have multiple bills and each bill has multiple versions. Each version of the same bill has a different bill Version Id, but the same bill Id. In the most likely case, the text in d

Out Of Memory during Indexing

2008-02-06 Thread Sirish Vadala
r loop ends ... ... ... ... ... Things work well, but not sure if there is any other better way to solve this problem. Thanks. Sirish

RE: Phrase Query Problem

2007-12-18 Thread Sirish Vadala
Hmmm... I had come up with a temporary solution for the time being. This is how I am initializing the StandardAnalyzer to fix my problem. String[] STOP_WORDS = {}; this.analyzer = new StandardAnalyzer(STOP_WORDS); This now indexes all my stop words, and gladly it didn't increase my indexing time

RE: Phrase Query Problem

2007-12-18 Thread Sirish Vadala
"positionIncrement" property for the next valid Token after each > omitted stop word. > > This would retain the benefit of removing stopwords from your index and > yet > prevent your example phrases matching (because the remaining words are not > recorded as being directl

RE: Phrase Query Problem

2007-12-18 Thread Sirish Vadala
ers out "and" (also "or", "in" and others) as stop > words during indexing, and the QueryParser filters those > words out also. > > Best regards, Lisheng > > -Original Message- > From: Sirish Vadala [mailto:[EMAIL PROTECTED] > Sent: Monday,

Phrase Query Problem

2007-12-17 Thread Sirish Vadala
sing standard analyzer while indexing my records. Any help on this is greatly appreciated. Sirish Vadala

Re: Indexing Problem

2007-11-15 Thread Sirish
On Nov 15, 2007 1:42 PM, Sirish <[EMAIL PROTECTED]> wrote: > >> >> The following is my code snippet for indexing the text: >> >> document.add(Field.Text(IFIELD_TEXT, billMeasureDoc.getText())); >> >> Whenever the text is short, it works perfectly.

Indexing Problem

2007-11-15 Thread Sirish
The following is my code snippet for indexing the text: document.add(Field.Text(IFIELD_TEXT, billMeasureDoc.getText())); Whenever the text is short, it works perfectly. But in a few cases, if the text is too long (around 1000 lines or more), it causes a problem. The prob