e/org/apache/lucene/analysis/Analyzer.html
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail:
> uwe@
>
>
>> -Original Message-
>> From: Sirish Vadala [mailto:
> sirishreddy@
> ]
>> S
Hello All,
I have a new requirement within my text search implementation to perform
stemming. I have done some research and implemented Snowball, but the
customers found it too aggressive, and I eventually got them to agree to
compromise on the KStem algorithm.
Currently my existing code is o
I had exactly the same requirement to parse and index offline html files. I
had written my own HTML scanner using
javax.swing.text.html.HTMLEditorKit.Parser. It sounds difficult, but it is
pretty simple and straightforward to implement; a simple 40-line Java class
did the job for me.
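For anyone looking for a starting point, here is a minimal sketch of such a scanner. It is not the original 40-line class; the class name HtmlTextExtractor and the extract method are made up for illustration. It only uses the JDK's Swing HTML parser (ParserDelegator with an HTMLEditorKit.ParserCallback) to collect the text nodes, which can then be handed to the indexer.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text content of an HTML document.
public class HtmlTextExtractor extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();

    // Called by the parser for each run of character data between tags.
    @Override
    public void handleText(char[] data, int pos) {
        text.append(data).append(' ');
    }

    // Parses the given HTML string and returns the collected text.
    public static String extract(String html) {
        HtmlTextExtractor callback = new HtmlTextExtractor();
        try {
            // "true" tells the parser to ignore any charset declared in the document.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.text.toString().trim();
    }
}
```

The returned string can be added to a document field in the usual way; reading from files instead of strings only needs a different Reader.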
shrinath.m wrote:
Hello All:
Background:
I have a text based search engine implemented in Java using Lucene 3.0.
Indexing and re-indexing happens every night at 1 am as a scheduled process.
The index size is around 1 gig and is recreated every night.
Issues
1. Now I have a peculiar problem that happens only on my
Hi Steven,
I have implemented sentence specific proximity search as suggested below.
However, unfortunately it still doesn't identify the sentence boundaries for
my search.
I am using # as a delimiter between my sentences while indexing the content:
ArrayList sentencesList = senten
Awesome! Thanks a lot Steven! This is exactly what I wanted.
Hi Sirish,
Have you looked at SpanQuery's yet?:
http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/spans/package-summary.html
See also this Lucid Imagination blog post by Mark Miller:
Hmmm... My mistake.
In fact, it is not a phrase search; it's a proximity search.
My screen gives four options to the user: -All words, -Exact phrase, -At
least one word, -Within proximity of xx words.
In the case of -All words and -At least one word, this is irrelevant and
everything works fine.
Hello All:
Can anyone suggest the best way to implement both sentence-specific and
non-sentence-specific phrase search? The user is going to have a check box
for phrase search on the screen that says 'within sentence'. If s/he selects
'within sentence', then I should perform sentence specifi
I have tried the below code:
Field field = new Field(fieldName, validFieldValue,
(store) ? Field.Store.YES : Field.Store.NO,
(tokenize) ? Field.Index.ANALYZED : Field.Index.NOT_ANALYZED,
Field.TermVector.WITH_POSITIONS_OFFSETS);
However, I still have the same problem. It
Hello All:
I am performing the sentence-specific phrase search by adding the text
sentence by sentence to the same field, as suggested below. Everything works
fine, but when I display my results, the highlighter is not able to find the
search phrase.
The following is my code:
SentenceScanner sentenceSc
Hello All:
Can anyone suggest the best way to perform a sentence-specific phrase
search?
Eg: Let the indexed text be:
If you are posting a question, please try search first. Your question may
have already been answered. Don't post repeatedly. Wait for a few days.
People will
I was able to get this whole thing to work using the delegation pattern. In
my custom collector object, internally delegate to the TopFieldCollector
after doing my custom processing.
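A rough sketch of the delegation idea, in plain Java so it runs without Lucene on the classpath. DocCollector, ListCollector and FilteringCollector are made-up stand-ins, not Lucene's Collector API; ListCollector plays the role that TopFieldCollector plays above, and the real Collector class has more methods that would be forwarded to the delegate in the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class DelegatingCollectorSketch {
    // Hypothetical stand-in for a collector callback (not Lucene's actual API).
    public interface DocCollector {
        void collect(int docId);
    }

    // Stand-in for the built-in collector that cannot be subclassed.
    public static final class ListCollector implements DocCollector {
        public final List<Integer> collected = new ArrayList<>();

        @Override
        public void collect(int docId) {
            collected.add(docId);
        }
    }

    // Custom collector: applies its own processing, then delegates.
    public static final class FilteringCollector implements DocCollector {
        private final DocCollector delegate;
        private final IntPredicate accept;

        public FilteringCollector(DocCollector delegate, IntPredicate accept) {
            this.delegate = delegate;
            this.accept = accept;
        }

        @Override
        public void collect(int docId) {
            // Custom processing happens here; only accepted documents are passed on.
            if (accept.test(docId)) {
                delegate.collect(docId);
            }
        }
    }
}
```

The wrapped collector keeps doing the sorting and top-N bookkeeping, while the wrapper only decides which documents reach it.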
Thanks.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Problem-using-TopFieldCollector-tp
Thanks for the response.
Yeah, eventually I chose to extend the Collector class, since none of the
other collectors, viz. TopFieldCollector and TopDocsCollector, allow me to
extend them and override their methods.
I could not grasp what exactly the below means:
Rebecca Watson wrote:
>
> i keep a copy of
Currently I am on Lucene 2.2, migrating to 2.9 before eventually moving to
3.1.
In Lucene 2.2, I have a custom hit collector that both filters and sorts my
search results.
Let me describe the functionality. When a user includes advanced search
criteria with text search, I execu
Hello All:
Can anyone suggest the best way to get the number of occurrences of each
word per document in Lucene?
Eg: Let the indexed text be:
If you are posting a question, please try search first. Your question may
have already been answered.
Now if I search for the word 'question', then I w
I have a requirement in which the results have to be sorted in ascending
order for a few fields, and in descending order for one field.
Currently I am using:
String[] sortOrder = { IFIELD_YEAR, IFIELD_TYPE, IFIELD_NUM, IFIELD_SESSION
};
Sort sort = new Sort(sortOrder);
hits = indexSearcher.search(boo
Hmmm... Seems like a lot of work to be done. I will try these options and
update.
Thanks a lot.
Best.
--
View this message in context:
http://n3.nabble.com/Problem-with-search-tp717137p719604.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
--
Hello All,
I am kind of new to Lucene, and am having a problem filtering search results.
Background:
My Indexed documents have multiple bills and each bill has multiple
versions.
Each version of the same bill has a different bill Version Id, but the same
bill Id. In most likely case, the text in d
r loop ends
... ... ... ... ...
Things work well, but I am not sure whether there is a better way to solve
this problem. Thanks.
Sirish
--
View this message in context:
http://www.nabble.com/Out-Of-Memory-during-Indexing-tp15312692p15312692.html
Sent from the Lucene - Java Users mailing list archive at
Hmmm... I have come up with a temporary solution for the time being. This is
how I am initializing the StandardAnalyzer to fix my problem.
String[] STOP_WORDS = {};
this.analyzer = new StandardAnalyzer(STOP_WORDS);
This now indexes all my stop words, and thankfully it didn't increase my
indexing time
"positionIncrement" property for the next valid Token after each
> omitted stop word.
>
> This would retain the benefit of removing stopwords from your index and
> yet
> prevent your example phrases matching (because the remaining words are not
> recorded as being directl
ers out "and" (also "or", "in" and others) as stop
> words during indexing, and the QueryParser filters those
> words out also.
>
> Best regards, Lisheng
>
> -Original Message-
> From: Sirish Vadala [mailto:[EMAIL PROTECTED]
> Sent: Monday,
sing standard
analyzer while indexing my records.
Any help on this is greatly appreciated.
Sirish Vadala
--
View this message in context:
http://www.nabble.com/Phrase-Query-Problem-tp14373945p14373945.html
Sent from the Lucene - Java Users mailing list archive
On Nov 15, 2007 1:42 PM, Sirish <[EMAIL PROTECTED]> wrote:
>
>>
>> The following is my code snippet for indexing the text:
>>
>> document.add(Field.Text(IFIELD_TEXT, billMeasureDoc.getText()));
>>
>> Whenever the text is short, it works perfectly.
The following is my code snippet for indexing the text:
document.add(Field.Text(IFIELD_TEXT, billMeasureDoc.getText()));
Whenever the text is short, it works perfectly. But in a few cases where
the text is too long, i.e. around 1000 lines or more, it causes a
problem.
The prob
25 matches