Crossposting to the user list, as I think this issue belongs there. See
my comments inline.
On Fri, Feb 5, 2010 at 10:27 AM, lionel duboeuf wrote:
> Hi,
>
> Sorry for asking again, but I still have not found a scalable solution for getting
> the document frequency of a term t within a given set of documents.
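To make the question concrete, here is a toy sketch of what "document frequency of t within a set of documents" means: walk t's postings list once and count only the document ids that belong to the subset. It does not use the Lucene API. The `SubsetDocFreq` class, the map-of-postings index and the `BitSet` subset are illustrative assumptions. In Lucene 3.x the postings would presumably come from `IndexReader.termDocs(Term)` and the subset from a filter's matching documents, but treat that mapping as unverified.

```java
import java.util.BitSet;
import java.util.Map;

// Toy model, NOT Lucene's API: an inverted index that maps each term to a
// list of the document ids containing it.
final class SubsetDocFreq {

    // Counts how many documents in `subset` contain `term`.
    // It walks the term's postings once and tests each id against the
    // subset bitset, so the cost grows with the postings length rather
    // than with the size of the whole index.
    static int docFreq(Map<String, int[]> postings, String term, BitSet subset) {
        int[] docs = postings.get(term);
        if (docs == null) {
            return 0; // the term was never indexed
        }
        int count = 0;
        for (int doc : docs) {
            if (subset.get(doc)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = Map.of(
            "lucene", new int[] {0, 2, 5, 7},
            "solr", new int[] {1, 2});
        BitSet subset = new BitSet();
        subset.set(2);
        subset.set(5);
        subset.set(6);
        System.out.println(docFreq(postings, "lucene", subset)); // 2
        System.out.println(docFreq(postings, "solr", subset));   // 1
    }
}
```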
On Feb 3, 2010, at 8:57 PM, Max Lynch wrote:
> Hi,
> I would like to do a search for "Microsoft Windows" as a span, but not match
> if words before or after "Microsoft Windows" are upper cased.
>
> For example, I want this to match: another crash for Microsoft Windows today
> But not this: anoth
Niclas,
I looked at your initial post: you are creating a document whose field value is
the literal "abc*", which has nothing to do with a wildcard query!
Of course, the query [useragents:abcdefghijklm] will return no results, and
[q=useragents:abc] will return none either, but [q=useragents:abc*] will return something.
text_nav is specific SO
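A toy model of the point above, assuming the field is indexed as a single untokenized term, so the literal value "abc*" is the only term in the index. An exact term query never matches it, while a trailing-`*` query behaves like a prefix query and does. `WildcardDemo` and its methods are hypothetical names; this is not Lucene or Solr code.

```java
import java.util.List;

// Toy model, NOT Lucene/Solr code. Assumption: the field was indexed with
// the literal value "abc*", so the index holds the single term "abc*".
final class WildcardDemo {

    // A plain term query matches only an identical indexed term.
    static boolean termQueryMatches(List<String> indexedTerms, String queryText) {
        return indexedTerms.contains(queryText);
    }

    // A trailing-'*' query acts as a prefix query: it matches any indexed
    // term that starts with the text before the '*'.
    static boolean prefixQueryMatches(List<String> indexedTerms, String queryText) {
        if (!queryText.endsWith("*")) {
            throw new IllegalArgumentException("expected a trailing '*': " + queryText);
        }
        String prefix = queryText.substring(0, queryText.length() - 1);
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = List.of("abc*");
        System.out.println(termQueryMatches(indexed, "abcdefghijklm")); // false
        System.out.println(termQueryMatches(indexed, "abc"));           // false
        System.out.println(prefixQueryMatches(indexed, "abc*"));        // true
    }
}
```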
Hi Max,
On 02/05/2010 at 10:18 AM, Grant Ingersoll wrote:
> On Feb 3, 2010, at 8:57 PM, Max Lynch wrote:
> > Hi, I would like to do a search for "Microsoft Windows" as a span, but
> > not match if words before or after "Microsoft Windows" are upper cased.
> >
> > For example, I want this to match
http://en.wikipedia.org/wiki/Crossposting
-----Original Message-----
From: Niclas Rothman [mailto:n...@lechill.com]
Sent: Saturday, February 06, 2010 12:12 AM
To: gene...@lucene.apache.org
Cc: java-user@lucene.apache.org
Subject: RE: Wildcard searches
Hi Fuad and thanks for your reply!
The
>
>
> I *think* you can get what you want using SpanNotQuery - something like the
> following, using your "Microsoft Windows" example:
>
> SpanNot:
>   include:
>     SpanNear(in-order=true, slop=0):
>       SpanTerm: "Microsoft"
>       SpanTerm: "Windows"
>   exclude:
>     Span
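The message is cut off in the middle of the exclude clause, so the following is only a sketch of the SpanNotQuery idea, not the original suggestion. Include spans come from an in-order, slop-0 near match of "Microsoft" and "Windows", and any include span that overlaps an exclude span is dropped. The exclude clause used here (the phrase widened by one adjacent capitalized token) is a hypothetical stand-in, and everything runs on a plain token array rather than a Lucene index; `SpanNotSketch` and its helpers are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of span matching over one tokenized document, NOT Lucene code.
// A span is a half-open token range [start, end).
final class SpanNotSketch {

    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Analogue of SpanNear(in-order=true, slop=0) over two SpanTerms:
    // `first` immediately followed by `second`.
    static List<Span> nearExact(String[] tokens, String first, String second) {
        List<Span> spans = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.length; i++) {
            if (tokens[i].equals(first) && tokens[i + 1].equals(second)) {
                spans.add(new Span(i, i + 2));
            }
        }
        return spans;
    }

    static boolean isCapitalized(String token) {
        return !token.isEmpty() && Character.isUpperCase(token.charAt(0));
    }

    // HYPOTHETICAL exclude clause (the original is truncated): each phrase
    // span widened by one token on any side that has a capitalized neighbour.
    static List<Span> capitalizedNeighbours(String[] tokens, List<Span> phrases) {
        List<Span> spans = new ArrayList<>();
        for (Span s : phrases) {
            if (s.start > 0 && isCapitalized(tokens[s.start - 1])) {
                spans.add(new Span(s.start - 1, s.end));
            }
            if (s.end < tokens.length && isCapitalized(tokens[s.end])) {
                spans.add(new Span(s.start, s.end + 1));
            }
        }
        return spans;
    }

    // SpanNot semantics: keep every include span that overlaps no exclude span.
    static List<Span> spanNot(List<Span> include, List<Span> exclude) {
        List<Span> kept = new ArrayList<>();
        for (Span in : include) {
            boolean overlaps = false;
            for (Span ex : exclude) {
                if (in.start < ex.end && ex.start < in.end) {
                    overlaps = true;
                    break;
                }
            }
            if (!overlaps) {
                kept.add(in);
            }
        }
        return kept;
    }

    static boolean matches(String text) {
        String[] tokens = text.trim().split("\\s+");
        List<Span> include = nearExact(tokens, "Microsoft", "Windows");
        List<Span> exclude = capitalizedNeighbours(tokens, include);
        return !spanNot(include, exclude).isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(matches("another crash for Microsoft Windows today")); // true
        System.out.println(matches("another crash for Microsoft Windows Vista")); // false
    }
}
```

Because the capitalization test only looks at the first character, a sentence-initial word such as "The" right before the phrase also blocks a match.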
Hi Niclas,
A "generalization" of the user agent "without including the version numbers"...
How will you separate Mozilla/5.0 (Browser) from Mozilla/5.0 (Googlebot)?
And, going to the root of the problem... why are you using SOLR this way? Is it a
search service that shows different content depending on
I understand this:
> So what I need is to have a "generalization" of the user agent in my
> index
So you would end up with 5-10 different tokens. They have to be hard-coded,
for instance via a synonym dictionary or something similar (this is very easy in
SOLR): WAP, HTML, etc. Most importan
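A minimal sketch of the hard-coded generalization, done in plain Java before indexing rather than with a Solr synonym file: an ordered list of substring rules maps each raw user agent to one of a few tokens. WAP and HTML come from the thread; the BOT and OTHER tokens, the substring patterns and the `UserAgentGeneralizer` class are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch, NOT a Solr API: hard-coded rules that collapse raw
// User-Agent strings into a handful of category tokens before indexing.
// Rules are checked in insertion order; the first matching substring wins.
final class UserAgentGeneralizer {

    // Hypothetical rules: WAP and HTML come from the thread, while BOT,
    // OTHER and every substring pattern are illustrative assumptions.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("Googlebot", "BOT"); // must precede the generic Mozilla/ rule
        RULES.put("WAP", "WAP");
        RULES.put("Mozilla/", "HTML");
    }

    static String generalize(String userAgent) {
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            if (userAgent.contains(rule.getKey())) {
                return rule.getValue();
            }
        }
        return "OTHER"; // hypothetical fallback token
    }

    public static void main(String[] args) {
        System.out.println(generalize("Mozilla/5.0 (Browser)"));   // HTML
        System.out.println(generalize("Mozilla/5.0 (Googlebot)")); // BOT
    }
}
```

Putting the Googlebot rule ahead of the generic `Mozilla/` rule is one way to keep Mozilla/5.0 (Browser) and Mozilla/5.0 (Googlebot) apart; with a plain unordered lookup, both would collapse into the same token.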