Highlighter in Lucene doesn't return any fragment

2010-05-21 Thread Diego Campoy
I'm trying to implement highlighting in my Lucene application, but I can't
get any fragments: getBestFragment always returns null.

My code:

// Parse the user's query against the "text" field.
QueryParser parser = new QueryParser(Version.LUCENE_30, "text", myAnalyzer);
Query realQuery = parser.parse(query);

// Score fragments by how well they match the parsed query on the same field.
Highlighter highlighter = new Highlighter(new QueryScorer(realQuery, "text"));
for (ScoreDoc scoredoc : luceneTopDocs.scoreDocs) {
    Document doc = searcher.doc(scoredoc.doc);
    // Re-analyze the stored field value and pick the best-matching fragment.
    String bestFragment = highlighter.getBestFragment(myAnalyzer, "text", doc.get("text"));
    if (bestFragment != null) doSomething();
}

Thank you,
Diego


Re: Stemming and Wildcard Queries

2010-05-21 Thread Ivan Provalov
Thanks, everyone!

--- On Thu, 5/20/10, Herbert Roitblat  wrote:

> From: Herbert Roitblat 
> Subject: Re: Stemming and Wildcard Queries
> To: java-user@lucene.apache.org
> Date: Thursday, May 20, 2010, 4:48 PM
> At a general level, we have found that stemming during indexing is not
> advisable. Sometimes users want the exact form and if you have removed the
> exact form during indexing, obviously, you cannot provide that. Rather, we
> have found that stemming during search is more useful, or maybe it should be
> called anti-stemming. For any given input for which the user wants to stem,
> we could derive the variations during the query processing. E.g., plan can
> be expanded to include plans, planning, planned, etc.
>
> In our application we provide a feature that is sometimes called a word
> wheel. When someone enters plan in this tool, we show all of the words in
> the index that start with plan. Here are some of the related words:
> plan
> plane
> planes
> planet
> planificaci
> planned
> plannedoutages.xls
> planner
> planners
> 
> Just a thought.
> Herb
> 
> - Original Message - From: "Ivan Provalov" 
> To: 
> Sent: Thursday, May 20, 2010 1:16 PM
> Subject: Stemming and Wildcard Queries
> 
> 
> > Is there a good way to combine the wildcard queries and stemming?
> >
> > As is, the field which is stemmed at index time, won't work with some
> > wildcard queries.
> >
> > We were thinking to create two separate index fields - one stemmed, one
> > non-stemmed, but we are having issues with our SpanNear queries (they
> > require the same field).
> >
> > We thought to try combining the stemmed and non-stemmed terms in the same
> > field, but we are concerned about the stats being skewed as a result of
> > this (especially for the TermVector stats). Can overloading the
> > non-stemmed field with stemmed terms cause any issues with the TermVector?
> >
> > Any suggestions?
> > 
> > Ivan Provalov
> > 
> > 
> > 
> > 
> >
> -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org

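For reference, the query-time expansion ("word wheel") that Herb describes in the quoted message above could be sketched roughly as follows. This is plain Java, not Lucene code: the WordWheel class and its expand method are hypothetical names, and an in-memory sorted set stands in for the index's term dictionary. The returned terms could then be OR'ed together into the actual query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper (not a Lucene class): lists known terms sharing a prefix.
final class WordWheel {
    // Assumption: a sorted in-memory set stands in for the index's term dictionary.
    private final TreeSet<String> terms = new TreeSet<>();

    WordWheel(Iterable<String> indexedTerms) {
        for (String t : indexedTerms) {
            terms.add(t);
        }
    }

    /** Returns every known term that starts with the prefix, in sorted order. */
    List<String> expand(String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet starts at the first term >= prefix; in a sorted set, all
        // terms carrying that prefix are contiguous from that point on.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        WordWheel wheel = new WordWheel(Arrays.asList(
                "plan", "plane", "planes", "planet", "planned",
                "planner", "planners", "place", "play"));
        // Prints the terms a user could pick from after typing "plan".
        System.out.println(wheel.expand("plan"));
    }
}
```

Because the expansion happens at query time, the index keeps the exact surface forms, which is the property Herb's message argues for.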


Re: Stemming and Wildcard Queries

2010-05-21 Thread Erick Erickson
Another approach to stemming at index time while still providing exact matches
when requested is to index the stemmed version AND the original version at
the same position (think synonyms). But here's the trick: index the original
token with a special character appended. For instance, indexing "running" would
look like indexing "run" and "running$". Now, whenever you want the exact match,
just add the "$" to the end of the query term.

With this approach, you have to watch that your analyzers don't strip the
'$'...
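
A rough sketch of the token stream this produces, in plain Java rather than as a real analyzer. Everything here is illustrative: ExactMarkerTokens, Token and toyStem are made-up names, the stemmer is a crude stand-in, and in Lucene this logic would presumably live in a custom TokenFilter that sets a position increment of 0 on the second token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy illustration only; these are hypothetical names, not Lucene classes.
final class ExactMarkerTokens {

    /** A term plus its position increment (0 = same position as the previous token). */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    // Assumption: a crude suffix stripper standing in for a real stemmer.
    static String toyStem(String word) {
        if (word.endsWith("nning") && word.length() > 5) {
            return word.substring(0, word.length() - 4); // running -> run
        }
        if (word.endsWith("ing") && word.length() > 3) {
            return word.substring(0, word.length() - 3);
        }
        if (word.endsWith("s") && word.length() > 1) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    /** Emits, for each word, the stem and then the original form + "$" at the same position. */
    static List<Token> analyze(String text) {
        List<Token> out = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty()) {
                continue; // split() yields an empty first element on leading whitespace
            }
            out.add(new Token(toyStem(word), 1)); // stem advances the position
            out.add(new Token(word + "$", 0));    // exact form stacked on the same position
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Running fast"));
    }
}
```

Since both tokens share a position, phrase and span queries see the same positions whether they use the stemmed term or the "$"-marked one.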

Of course, each approach has its trade-offs, and the characteristics of your
particular problem may determine which is preferable...

FWIW
Erick



Re: spatial searches

2010-05-21 Thread Julian Atkinson
Hi Klaus,

I suggest you take a look at the code in TestCartesian.java for
working examples of the search and as a starting point to trace
through.

In more depth, if you look at DistanceQueryBuilder.java you'll see that
two filters are being set up.

The first pass filter is created by CartesianPolyFilterBuilder and
this makes sure you only consider documents near to the area you are
searching by looking in the right tier and pulling out the relevant
grid cells.

The second filter depends on which method you are using, Lat/Lng
or Geohash. This is where the more precise filtering is done, based on
the calculated distance.

The use of the second pass filter is optional and driven by a boolean.
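
To make the two passes concrete, here is a simplified sketch in plain Java. It is not the Lucene spatial code: TwoPassGeoFilter, Point, cell and the one-degree grid are all assumptions made for illustration, and it ignores searches that cover a pole or cross the antimeridian. The real first pass looks up grid-cell IDs in the index instead of scanning every document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for the two filters; none of these names are Lucene's.
final class TwoPassGeoFilter {
    static final double EARTH_RADIUS_KM = 6371.0;
    static final double CELL_DEGREES = 1.0; // assumed grid resolution

    static final class Point {
        final String id;
        final double lat;
        final double lng;

        Point(String id, double lat, double lng) {
            this.id = id;
            this.lat = lat;
            this.lng = lng;
        }
    }

    /** Great-circle distance in kilometres (haversine formula). */
    static double distanceKm(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    static long cell(double degrees) {
        return (long) Math.floor(degrees / CELL_DEGREES);
    }

    static List<Point> search(List<Point> docs, double lat, double lng,
                              double radiusKm, boolean precise) {
        // Bounding box of the search circle, expressed as a range of grid cells.
        double r = radiusKm / EARTH_RADIUS_KM; // angular radius in radians
        double latDelta = Math.toDegrees(r);
        double lngDelta = Math.toDegrees(
                Math.asin(Math.sin(r) / Math.cos(Math.toRadians(lat))));
        long minLatCell = cell(lat - latDelta), maxLatCell = cell(lat + latDelta);
        long minLngCell = cell(lng - lngDelta), maxLngCell = cell(lng + lngDelta);

        List<Point> hits = new ArrayList<>();
        for (Point p : docs) {
            long latCell = cell(p.lat), lngCell = cell(p.lng);
            // First pass: cheap check that the document sits in a relevant cell.
            if (latCell < minLatCell || latCell > maxLatCell
                    || lngCell < minLngCell || lngCell > maxLngCell) {
                continue;
            }
            // Second pass (optional, driven by the boolean): exact distance check.
            if (precise && distanceKm(lat, lng, p.lat, p.lng) > radiusKm) {
                continue;
            }
            hits.add(p);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Point> docs = Arrays.asList(
                new Point("near", 48.1, 11.5),
                new Point("edge", 47.1, 9.1),
                new Point("far", 52.5, 13.4));
        for (Point p : search(docs, 48.0, 11.0, 100.0, true)) {
            System.out.println(p.id);
        }
    }
}
```

With the second pass switched off, documents that share a cell with the search area but lie outside the radius (like "edge" here) still come back, which is the trade-off behind making that pass optional.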

If you want custom scoring, there is an example in the
TestCartesian class, using CustomScoreQuery.

Hope this helps,
Julian



On 11 May 2010 15:18, Klaus Malorny  wrote:
>
> Hi all,
>
> I hope someone can enlighten me. I am trying to figure out how spatial
> searches are to be implemented with Lucene. From walking through mailing
> lists and various web pages, looking at the JavaDoc and source code, I
> understand how the tiers work and how the search is limited by a special
> term query containing the ID(s) of the relevant grid cells.
>
> However, it still puzzles me how, where and when the final distance
> filtering takes place. I see three possibilities: the "Filter" class, the
> "ValueSourceQuery" or the use of a subclass of "Collector". With my limited
> understanding of the inner workings of Lucene, it seems to me that the first
> two ways more or less operate on the whole document set, i.e. prior to the
> moment where the term query for the tiers comes into effect, rendering it
> useless. The "Collector" approach seems much more appropriate, but in
> addition to deciding whether a document meets the distance condition, I
> would like to have different scores depending on the distance (lower
> score for larger distances). Originally I thought that the
> solution would be some kind of subclass of "Query", but haven't seen any
> hints pointing in this direction and I don't know whether I am able to
> implement that on my own. I fear that I completely misunderstand something.
> Thanks in advance for any hints.
>
> Regards,
>
> Klaus