Hey Mike,
My only concern is that I am replacing a large number of fields inside a
Document with a very large (~50e6) number of Documents. Won't I run into
the same memory issues? Or do I create only one doc object and reuse it? With
so many Doc/Token pairs, won't searching the index take much longer?
I think the solution I gave you will work. The only problem is if a
token appears twice in the same doc:
doc1 has foo with two different sets of weights and frequencies...
but I think you're saying that doesn't happen
On 05/05/2011 06:09 PM, Chris Schilling wrote:
Hey Mike,
Let me clarify:
Oh, yes, they are unique within a document. I was also thinking about
something like this. But I would be replacing a large number of fields within
a document by a large number of documents. Let me see if I can work that out.
On May 5, 2011, at 3:01 PM, Mike Sokolov wrote:
> Are the tokens unique within a document?
Hey Mike,
Let me clarify:
The tokens are not unique. Let's say doc1 contains the token
foo and has the properties weight1 = 0.75, weight2 = 0.90, frequency = 10
Now, let's say doc2 also contains the token
foo with properties: weight1 = 0.8, weight2 = 0.75, frequency = 5
Now, I want to search for foo and sort the results by weight1, weight2, or frequency.
Are the tokens unique within a document? If so, why not store a document
for every doc/token pair with fields:
id (doc#/token#)
doc-id (doc#)
token
weight1
weight2
frequency
Then search for token, sort by weight1, weight2 or frequency.
If the token matches are unique within a document you will get a single hit
per doc/token pair.
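Roughly, the search side could look like this (a sketch against the Lucene
3.x API; field and variable names are placeholders):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import java.io.IOException;

    // Find the doc/token-pair documents for one token, highest weight1 first.
    static void findToken(IndexSearcher searcher, String token) throws IOException {
        Query q = new TermQuery(new Term("token", token));
        // true = reverse order, i.e. largest weight1 first
        Sort byWeight = new Sort(new SortField("weight1", SortField.DOUBLE, true));
        TopDocs hits = searcher.search(q, null, 10, byWeight);
        for (ScoreDoc sd : hits.scoreDocs) {
            Document d = searcher.doc(sd.doc);
            System.out.println(d.get("doc-id") + "  weight1=" + d.get("weight1"));
        }
    }

Sorting by weight2 or frequency is the same thing with a different SortField.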
Hi,
I am trying to figure out how to solve this problem:
I have about 500,000 files that I would like to index, but the files are
structured. So, each file has the following layout:
doc1
token1, weight11, frequency1, weight21
token2, weight12, frequency2, weight22
.
.
.
etc for 500,000 docs.
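In case a sketch helps: indexing one Lucene document per doc/token pair, as
Mike suggests above, might look something like this (Lucene 3.x field API;
variable names are made up):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.NumericField;
    import org.apache.lucene.index.IndexWriter;
    import java.io.IOException;

    // One Lucene document per "token, weight1, frequency, weight2" line.
    static void addPair(IndexWriter writer, String docId, String token,
                        double w1, double w2, int freq) throws IOException {
        Document d = new Document();
        d.add(new Field("id", docId + "/" + token,
                        Field.Store.YES, Field.Index.NOT_ANALYZED));
        d.add(new Field("doc-id", docId,
                        Field.Store.YES, Field.Index.NOT_ANALYZED));
        d.add(new Field("token", token,
                        Field.Store.NO, Field.Index.NOT_ANALYZED));
        // NumericField indexes the values in trie form so they can be
        // sorted on and range-queried later.
        d.add(new NumericField("weight1", Field.Store.YES, true).setDoubleValue(w1));
        d.add(new NumericField("weight2", Field.Store.YES, true).setDoubleValue(w2));
        d.add(new NumericField("frequency", Field.Store.YES, true).setIntValue(freq));
        writer.addDocument(d);
    }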
Hi,
I am new to Lucene, so I apologize if this has been answered, but I've had
no success finding the answer after googling around. I am using Compass as
a Lucene front end and have run into an issue in querying Lucene docs. I am
looking for a way to search a property based on its complete and
It's an idea - sorry I don't have an implementation I can share easily;
it's embedded in our application code and not easy to refactor. I'm not
sure where this would fit in the Solr architecture; maybe some subclass
of SearchHandler? I guess the query rewriter would need to be aware of
which
Also, have a look at the patch on this issue:
https://issues.apache.org/jira/browse/LUCENE-2995
That issue factors out spell checking / auto suggest from Lucene &
Solr into a shared module.
Mike
http://blog.mikemccandless.com
On Thu, May 5, 2011 at 8:54 AM, Clemens Wyss wrote:
> I have implemented my index (in fact it's a pluggable indexing API) in "plain Lucene".
Hi Michael,
Sounds excellent to me.
Is it a QParserPlugin or what is it?
Regards
Bernd
On 05.05.2011 14:05, Michael Sokolov wrote:
In our applications, we catch ParseException and then take one of the following
actions:
1) report an error to the user
2) rewrite the query, stripping all punctuation, and try again
If you check out the source code of Solr/Lucene, look at the FSTLookup class
and FSTLookupTest -- you can populate FSTLookup manually with terms/phrases
from your index and then use the resulting automaton for suggestions.
Dawid
On Thu, May 5, 2011 at 2:54 PM, Clemens Wyss wrote:
> I have implemented my index (in fact it's a pluggable indexing API) in "plain Lucene".
I have implemented my index (in fact it's a pluggable indexing API) in "plain
Lucene". I tried to implement a term suggestion mechanism on my own, not being
too happy with it so far.
At
At
http://search-lucene.com/m/0QBv41ssGlh/suggestion&subj=Auto+Suggest
I have seen Solr's auto suggestion for search terms.
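If you do keep rolling your own against plain Lucene, the usual starting
point is prefix enumeration over the index terms. A rough sketch using the
3.x TermEnum API (the field name is just an example):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    // Collect up to max index terms in "field" that start with prefix.
    static List<String> suggest(IndexReader reader, String field,
                                String prefix, int max) throws IOException {
        List<String> out = new ArrayList<String>();
        // terms() positions the enum at the first term >= (field, prefix)
        TermEnum terms = reader.terms(new Term(field, prefix));
        try {
            do {
                Term t = terms.term();
                if (t == null || !t.field().equals(field)
                        || !t.text().startsWith(prefix)) {
                    break;
                }
                out.add(t.text());
            } while (out.size() < max && terms.next());
        } finally {
            terms.close();
        }
        return out;
    }

The FSTLookup route mentioned above should be faster at suggest time, since
the automaton is far more compact than a term scan.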
On 05/05/2011 11:59, Ian Lea wrote:
See
http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
for an excellent article and solution to the problem with common
words.
Would this work when the user doesn't actually use a phrase query?
You could also consider using, and caching and reusing, filters for
the tnum and tracks fields.
In our applications, we catch ParseException and then take one of the
following actions:
1) report an error to the user
2) rewrite the query, stripping all punctuation, and try again
3) rewrite the query, quoting all punctuation, and try again
would that work for you?
On 5/5/2011 3:26 AM, Bernd wrote:
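A minimal sketch of that fallback chain with the stock QueryParser (3.x
package names; the punctuation regex is only an example, and escape() is
the standard quoting helper):

    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;

    // Try the raw input first; on ParseException fall back to cleaned-up forms.
    static Query parseLeniently(QueryParser parser, String input)
            throws ParseException {
        try {
            return parser.parse(input);                      // as typed
        } catch (ParseException e) {
            try {
                // 2) strip all punctuation and try again
                return parser.parse(input.replaceAll("\\p{Punct}+", " "));
            } catch (ParseException e2) {
                // 3) quote (escape) all special characters and try again
                return parser.parse(QueryParser.escape(input));
            }
        }
    }

If even the escaped parse fails, the ParseException propagates, which
covers option 1 (report an error to the user).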
See
http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
for an excellent article and solution to the problem with common
words.
You could also consider using, and caching and reusing, filters for
the tnum and tracks fields.
--
Ian.
On Thu, May 5, 2011 at 11
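Building such a filter once and holding on to it can be as simple as this
(3.x classes; the tnum value is made up):

    import java.io.IOException;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.CachingWrapperFilter;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.QueryWrapperFilter;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;

    // Build once and keep: CachingWrapperFilter caches the filter's
    // bitset per reader, so the common term is evaluated only once.
    static final Filter TNUM_FILTER = new CachingWrapperFilter(
            new QueryWrapperFilter(new TermQuery(new Term("tnum", "42"))));

    static TopDocs search(IndexSearcher searcher, Query q) throws IOException {
        return searcher.search(q, TNUM_FILTER, 10);
    }

Unlike a query clause, the filter doesn't affect scoring; it just restricts
the matching docs.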
On 05/05/2011 11:13, Ahmet Arslan wrote:
Yes correct, but I have looked at the list of
optimizations before. What was clear from profiling was that
it wasn't the searching part that was slow (a query run on
the same index with only a few matching docs ran super fast);
the slowness only occurs when there are loads of matching docs.
> Yes correct, but I have looked at the list of
> optimizations before. What was clear from profiling was that
> it wasn't the searching part that was slow (a query run on
> the same index with only a few matching docs ran super fast);
> the slowness only occurs when there are loads of matching
> docs.
Dear list,
I need a QueryValidator and don't mind writing one but don't want
to reinvent the wheel in case there is already something.
Is this the right list for talking about a QueryValidator or
should it belong to SOLR?
What do I mean by a QueryValidator?
I think about something like valida
On 05/05/2011 00:24, Ahmet Arslan wrote:
Thanks again, now done that but still not having much
effect on total time.
So your main concern is improving the running time, not decreasing the
number of returned results?
Additionally http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
Yes correct, but I have looked at the list of optimizations before.
On 05/05/2011 00:24, Chris Hostetter wrote:
: Well I did extend QueryParser, and the method is being called but rather
: disappointingly it had no noticeable effect on how long queries took. I
: really thought by reducing the number of matches the corresponding scoring
: phase would be quicker.