Checkstyle has a OneTopLevelClass rule that would enforce this.
On October 17, 2017 3:45:01 AM EDT, Uwe Schindler wrote:
>Hi,
>
>this has nothing to do with the Java version. I generally ignore this
>Eclipse failure, as I only develop in Eclipse but run from the command
>line. The reason for this beha
Oh thanks Alan, that's a good suggestion, but I already wrote max and sum
DoubleValuesSource implementations since it was easy enough. If you think
that's a good approach I could post a patch.
On October 13, 2017 3:57:30 AM EDT, Alan Woodward wrote:
>Hi,
>
>Yes, moving stuff over to DoubleValuesSource is onl
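For context, here is a minimal sketch of what a "max" DoubleValuesSource over
two inputs might look like. This is not the patch mentioned above, and the
exact set of abstract methods you must override varies between Lucene
versions:

import java.io.IOException;
import java.util.Objects;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.DoubleValues;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.IndexSearcher;

/** Hypothetical per-document max of two DoubleValuesSources. */
public class MaxDoubleValuesSource extends DoubleValuesSource {
  private final DoubleValuesSource a, b;

  public MaxDoubleValuesSource(DoubleValuesSource a, DoubleValuesSource b) {
    this.a = a;
    this.b = b;
  }

  @Override
  public DoubleValues getValues(LeafReaderContext ctx, DoubleValues scores) throws IOException {
    DoubleValues va = a.getValues(ctx, scores);
    DoubleValues vb = b.getValues(ctx, scores);
    return new DoubleValues() {
      private boolean hasA, hasB;

      @Override
      public boolean advanceExact(int doc) throws IOException {
        hasA = va.advanceExact(doc);
        hasB = vb.advanceExact(doc);
        return hasA || hasB;
      }

      @Override
      public double doubleValue() throws IOException {
        if (hasA && hasB) return Math.max(va.doubleValue(), vb.doubleValue());
        return hasA ? va.doubleValue() : vb.doubleValue();
      }
    };
  }

  @Override
  public boolean needsScores() {
    return a.needsScores() || b.needsScores();
  }

  @Override
  public DoubleValuesSource rewrite(IndexSearcher searcher) throws IOException {
    return new MaxDoubleValuesSource(a.rewrite(searcher), b.rewrite(searcher));
  }

  @Override
  public boolean isCacheable(LeafReaderContext ctx) {
    return a.isCacheable(ctx) && b.isCacheable(ctx);
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof MaxDoubleValuesSource
        && a.equals(((MaxDoubleValuesSource) o).a)
        && b.equals(((MaxDoubleValuesSource) o).b);
  }

  @Override
  public int hashCode() { return Objects.hash(a, b); }

  @Override
  public String toString() { return "max(" + a + ", " + b + ")"; }
}

A "sum" variant would be identical apart from doubleValue().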
These are only used in classical Greek, I think, which probably explains why
they are not covered by the simpler filter.
On September 27, 2017 9:48:37 AM EDT, Ahmet Arslan
wrote:
>I may be wrong about ASCIIFoldingFilter. Please go with the
>ICUFoldingFilter.
>Ahmet
>On Wednesday, September 27, 2017,
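For anyone following along, wiring ICUFoldingFilter into an analyzer looks
roughly like this (a sketch; ICUFoldingFilter lives in the
lucene-analyzers-icu module, and the class name here is illustrative):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.icu.ICUFoldingFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// Unlike ASCIIFoldingFilter, ICUFoldingFilter applies the full Unicode
// folding tables, which should cover the classical-Greek characters too.
public class FoldingAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer source = new StandardTokenizer();
    TokenStream result = new ICUFoldingFilter(source);
    return new TokenStreamComponents(source, result);
  }
}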
There was some interesting work done on optimizing queries including
very common words (stop words) that I think overlaps with your problem.
See this blog post
http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
from the Hathi Trust.
The upshot in a nutshel
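If memory serves, the technique described there is "common grams": at index
time, very common words are glued to their neighbors as bigrams, so phrase
queries containing stop words touch far fewer postings. A hedged sketch of the
Lucene side (the stop list is abbreviated and the class name is illustrative):

import java.util.Arrays;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.commongrams.CommonGramsFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// Emits tokens like "the_rain" alongside "the" and "rain", so a phrase
// query for "the rain" can run against the much rarer bigram.
public class CommonGramsAnalyzer extends Analyzer {
  private static final CharArraySet COMMON =
      new CharArraySet(Arrays.asList("the", "of", "and", "a", "in"), true);

  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer source = new StandardTokenizer();
    TokenStream result = new CommonGramsFilter(source, COMMON);
    return new TokenStreamComponents(source, result);
  }
}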
Maybe high-frequency terms that are not evenly distributed throughout
the corpus would be a better definition. Discriminative terms. I'm
sure there is something in the machine learning literature about
unsupervised clustering that would help here. But I don't know what it
is :)
-Mike
On 0
l_text" field and only read _the_start_ of it?
Otherwise, I'm thinking I'll go with an extra 1st page field for the too-huge
documents.
-Paul
-----Original Message-----
From: Mike Sokolov [mailto:soko...@ifactory.com]
Sent: Saturday, June 23, 2012 7:16 PM
To: java-user@lucene.ap
e the decision about
whether to highlight.
-Mike Sokolov
On 6/23/2012 6:17 PM, Jack Krupansky wrote:
Simply have two fields, "full_body" and "limited_body". The former
would index but not store the full document text from Tika (the
"content" metadata). The latter would
'memory')
See:
http://wiki.apache.org/solr/FunctionQuery#tf
Lucene does have "FunctionQuery", "ValueSource", and
"TermFreqValueSource".
See:
http://lucene.apache.org/solr/api/org/apache/solr/search/function/FunctionQuery.html
-- Jack Krupansky
-Orig
I imagine this is a question that comes up from time to time, but I
haven't been able to find a definitive answer anywhere, so...
I'm wondering whether there is some type of Lucene query that filters by
term frequency. For example, suppose I want to find all documents that
have exactly 2 occ
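Besides the function-query route above, one direct way to answer "exactly 2
occurrences" with the current Lucene API is to walk the postings yourself;
PostingsEnum.freq() is the within-document term frequency. A sketch, not from
the thread, with placeholder field and term:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DocIdSetIterator;

public class ExactTfScan {
  static void collect(IndexReader reader) throws IOException {
    for (LeafReaderContext leaf : reader.leaves()) {
      PostingsEnum pe =
          leaf.reader().postings(new Term("contents", "foo"), PostingsEnum.FREQS);
      if (pe == null) continue; // term absent from this segment
      while (pe.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
        if (pe.freq() == 2) {
          int globalDocId = leaf.docBase + pe.docID();
          // ... handle the match ...
        }
      }
    }
  }
}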
It sounds to me as if there could be a market for a new kind of query that
would implement:
A w/5 (B and C)
in the way that people understand it to mean - the same A near both B
and C, not just any A.
Maybe it's too hard to implement using rewrites into existing SpanQueries?
In terms of the Pos
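For illustration, here is the naive rewrite into existing queries, which shows
the trap: each proximity clause can be satisfied by a different occurrence of
A. (Sketch with placeholder field and terms.)

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class NaiveNearRewrite {
  // A w/5 B AND A w/5 C -- but nothing ties the two A's together.
  static Query build() {
    SpanQuery a = new SpanTermQuery(new Term("body", "a"));
    SpanQuery b = new SpanTermQuery(new Term("body", "b"));
    SpanQuery c = new SpanTermQuery(new Term("body", "c"));
    return new BooleanQuery.Builder()
        .add(new SpanNearQuery(new SpanQuery[] {a, b}, 5, false), Occur.MUST)
        .add(new SpanNearQuery(new SpanQuery[] {a, c}, 5, false), Occur.MUST)
        .build();
  }
}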
r,
but I don't know if it would be worth the trouble.
It turns out in my very specific case I have a term that appears in
every document in a particular field, so I am just using a search for
that at the moment.
-Mike
On 5/6/2012 8:04 PM, Mike Sokolov wrote:
I think what I have in min
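Generalizing that workaround: if no term naturally occurs in every document,
you can add one at index time, and any parser that can express a plain term
query then gets a match-all for free. (A sketch; the field and value names are
made up.)

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class MarkerTerm {
  // At index time, on every document:
  static void addMarker(Document doc) {
    doc.add(new StringField("all", "y", Field.Store.NO));
  }

  // At query time, behaves like MatchAllDocsQuery:
  static Query matchAll() {
    return new TermQuery(new Term("all", "y"));
  }
}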
itions for the whole document? Maybe it could be a "fake span" for
each document of 0 ... Integer.MAX_VALUE?
I think it would be nice to have as long as it's not going to be too
inefficient...
On Sun, May 6, 2012 at 5:26 PM, Mike Sokolov wrote:
does anybody know how to express a MatchAllDocsQuery
No, that doesn't work either - it works for the lucene query parser, but
not for the *surround* query parser, which I'm using because it has a
syntax for span queries.
On 5/6/2012 6:10 PM, Vladimir Gubarkov wrote:
Do you mean
*:*
?
On Mon, May 7, 2012 at 1:26 AM, Mike Sokolov wr
does anybody know how to express a MatchAllDocsQuery in surround query
parser language? I've tried
*
and()
but those don't parse. I looked at the grammar and I don't think there
is a way. Please let us all know if you know otherwise!
Thanks
I think you have hit on all the best solutions.
The Jira issues you mentioned do indeed hold out some promising
solutions here, but they are a ways away, requiring some significant
re-plumbing and I'm not sure there is a lot of attention being paid to
that at the moment. You should vote for t
My personal view, as a bystander with no more information than you, is
that one has to assume there will be further index format changes before
a 4.0 release. This is based on the number of changes in the last 9
months, and the amount of activity on the dev list.
For us the implication is we
oint me in the right direction?
Jeroen
-----Original Message-----
From: Mike Sokolov [mailto:soko...@ifactory.com]
Sent: Wednesday, 13 July 2011 15:23
To: java-user@lucene.apache.org
Cc: Jeroen Lauwers
Subject: Re: Advanced NearSpanQuery
Can you wrap a SpanNearQuery around a DisjunctionSumQuery with
Can you wrap a SpanNearQuery around a DisjunctionSumQuery with
minNrShouldMatch=8?
-Mike
On 07/13/2011 08:53 AM, Jeroen Lauwers wrote:
Hi,
I was wondering if anyone could help me on this:
I want to search for:
1. a set of words (e.g., 10)
2. only a couple of words may come in be
Our apps use highlighting, and I expect that highlighting is an
expensive operation since it requires processing the text of the
documents, but I ran a test and was surprised just how expensive it is.
I made a test index with three fields: path, modified, and contents. I
made the index using
Down to basics, Lucene searches work by locating terms and resolving
documents from them. For standard term queries, a term is located by a
process akin to binary search. That means that it uses log(n) seeks to
get the term. Let's say you have 10M terms in your corpus. If you stored
that in a si
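(Concretely: log2(10,000,000) is about 23, so locating a term in a single
sorted dictionary of 10M entries costs on the order of 23 probes in the worst
case, before any caching helps.)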
l the documents that contain foo, but I want them
sorted by frequency.
Then, I would have doc1, doc2.
Now, I want to search for all the documents that contain foo, but I want them
sorted by weight1.
Then, I would have doc2, doc1
Does that clarify?
On May 5, 2011, at 3:01 PM, Mike Sokolov
Are the tokens unique within a document? If so, why not store a document
for every doc/token pair with fields:
id (doc#/token#)
doc-id (doc#)
token
weight1
weight2
frequency
Then search for token, sort by weight1, weight2 or frequency.
If the token matches are unique within a document you will
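A sketch of that doc-per-token layout with the current API (field names taken
from the list above; the weights are assumed to be long-valued so they can be
sorted via doc values):

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public class TokenPairIndex {
  static Document tokenDoc(String docId, String token, long w1, long w2, long freq) {
    Document d = new Document();
    d.add(new StringField("id", docId + "/" + token, Field.Store.YES));
    d.add(new StringField("doc-id", docId, Field.Store.YES));
    d.add(new StringField("token", token, Field.Store.NO));
    d.add(new NumericDocValuesField("weight1", w1));
    d.add(new NumericDocValuesField("weight2", w2));
    d.add(new NumericDocValuesField("frequency", freq));
    return d;
  }

  // Search for a token, highest weight1 first; swap the field name to sort
  // by weight2 or frequency instead.
  static TopDocs byWeight1(IndexSearcher searcher, String token) throws IOException {
    Sort sort = new Sort(new SortField("weight1", SortField.Type.LONG, true));
    return searcher.search(new TermQuery(new Term("token", token)), 10, sort);
  }
}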
It's an idea - sorry I don't have an implementation I can share easily;
it's embedded in our application code and not easy to refactor. I'm not
sure where this would fit in the Solr architecture; maybe some subclass
of SearchHandler? I guess the query rewriter would need to be aware of
which
Background: I've been trying to enable hit highlighting of XML documents
in such a way that the highlighting preserves the well-formedness of the
XML.
I thought I could get this to work by implementing a CharFilter that
extracts text from XML (somewhat like HTMLStripCharFilter, except I am
us
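The hook for this is Analyzer.initReader(). The sketch below uses
HTMLStripCharFilter as a stand-in for the XML-extracting CharFilter described
above (the class name is illustrative); the important property is that
CharFilters maintain an offset map back into the original markup, which is
what keeps highlight offsets valid:

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class StripMarkupAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    return new TokenStreamComponents(new StandardTokenizer());
  }

  @Override
  protected Reader initReader(String fieldName, Reader reader) {
    // Offsets reported by the tokenizer are corrected back to positions in
    // the original (markup-included) input.
    return new HTMLStripCharFilter(reader);
  }
}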