JIRA updated. It includes a new test case which shows the highlighter not working as expected.
On Thu, Mar 12, 2009 at 5:56 PM, Amin Mohammed-Coleman <ami...@gmail.com> wrote:

> Hi
>
> I have found that it is not an issue with POI. I extracted the text using POI
> but differently, and the term is extracted properly. When I store the text and
> retrieve it, the term exists. However, running the text through the highlighter
> doesn't work.
>
> I will post a test case with a plain text file on JIRA. Currently on a cramped
> train!
>
> Cheers
>
> On 11 Mar 2009, at 18:11, markharw00d <markharw...@yahoo.co.uk> wrote:
>
>> If you can supply a JUnit test that recreates the problem I think we can
>> start to make progress on this.
>>
>> Amin Mohammed-Coleman wrote:
>>
>>> Hi
>>>
>>> Apologies for resending this mail. Just wondering if anyone has
>>> experienced the below. I'm not sure if this could happen due to the nature
>>> of the document. It does seem strange that one term search returns a summary
>>> while another does not, even though the same document is being returned.
>>>
>>> I'm asking this so I can code around it if this is normal.
>>>
>>> Apologies again for resending this mail
>>>
>>> Cheers
>>>
>>> Amin
>>>
>>> Sent from my iPhone
>>>
>>> On 9 Mar 2009, at 07:50, Amin Mohammed-Coleman <ami...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> I am seeing some strange behaviour with the highlighter and I'm
>>>> wondering if anyone else is experiencing this. In certain instances I don't
>>>> get a summary being generated. I perform the search and the search returns
>>>> the correct document. I can see that the Lucene document contains the text
>>>> in the field. However, after doing:
>>>>
>>>>     SimpleHTMLFormatter simpleHTMLFormatter = new
>>>>         SimpleHTMLFormatter("<span class=\"highlight\"><b>", "</b></span>");
>>>>     // required for highlighting
>>>>     Query query2 = multiSearcher.rewrite(query);
>>>>     Highlighter highlighter = new Highlighter(simpleHTMLFormatter,
>>>>         new QueryScorer(query2));
>>>>     ...
>>>>
>>>>     String text = doc.get(FieldNameEnum.BODY.getDescription());
>>>>     TokenStream tokenStream =
>>>>         analyzer.tokenStream(FieldNameEnum.BODY.getDescription(),
>>>>             new StringReader(text));
>>>>     String result = highlighter.getBestFragments(tokenStream,
>>>>         text, 3, "...");
>>>>
>>>> the string result is empty. This is very strange: if I try a different
>>>> term that exists in the document then I get a summary. For example, I have a
>>>> Word document that contains the terms "document" and "aspectj". If I search
>>>> for "document" I get the correct document but no highlighted summary.
>>>> However, if I search using "aspectj" I get the same document with a
>>>> highlighted summary.
>>>>
>>>> Just to mention, I do rewrite the original query before performing the
>>>> highlighting.
>>>>
>>>> I'm not sure what I'm missing here. Any help would be appreciated.
>>>>
>>>> Cheers
>>>> Amin
>>>>
>>>> On Sat, Mar 7, 2009 at 4:32 PM, Amin Mohammed-Coleman <ami...@gmail.com>
>>>> wrote:
>>>> Hi
>>>>
>>>> Got it working! Thanks again for your help!
>>>>
>>>> Amin
>>>>
>>>> On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman <
>>>> ami...@gmail.com> wrote:
>>>> Thanks! The final piece that I needed to do for the project!
>>>>
>>>> Cheers
>>>>
>>>> Amin
>>>>
>>>> On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>>>> > cool. i will use compression and store in index. is there anything
>>>> > special i need to do for decompressing the text? i presume i can just do
>>>> > doc.get("content")?
>>>> > thanks for your advice all!
>>>>
>>>> No, just use Field.Store.COMPRESS when adding to the index and
>>>> Document.get() when fetching. The decompression is done automatically.
>>>>
>>>> You may think, why not enable compression for all fields? The reason is
>>>> that this is an overhead for very small and short fields.
>>>> So you should only use it for large contents (it's the same as
>>>> compressing very small files as ZIP/GZIP: these files mostly get larger
>>>> than without compression).
>>>>
>>>> Uwe
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
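
For anyone who wants to check the point quoted above about compression being an overhead for very small fields, here is a minimal sketch. It is not Lucene code: it uses java.util.zip.Deflater from the JDK directly as a stand-in for whatever a compressed stored field does internally, and the class name, method name, and sample strings are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical helper, not part of Lucene: compares raw and deflated sizes.
class CompressionOverhead {

    /** Returns the number of bytes produced by deflating the given input. */
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        // Drain the compressor; only the byte count matters here.
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        // A short, ID-like field value (illustrative only).
        byte[] shortField = "doc-42".getBytes(StandardCharsets.UTF_8);

        // A long, repetitive body, standing in for extracted document text.
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            body.append("Lucene highlighter returns an empty summary for some terms. ");
        }
        byte[] longField = body.toString().getBytes(StandardCharsets.UTF_8);

        System.out.println("short: raw=" + shortField.length
                + " deflated=" + deflatedSize(shortField));
        System.out.println("long:  raw=" + longField.length
                + " deflated=" + deflatedSize(longField));
    }
}
```

The short value comes out larger than its raw bytes because the zlib header, block framing, and checksum outweigh any saving, while the long repetitive body shrinks considerably. Real document text will compress less dramatically than this artificial repetition.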