It depends on what you call a server:
- 4 dual Xeons, 64 GB RAM, 1 TB of 15,000 rpm RAID 10 hard disks is one thing
- 1 P4, 512 MB RAM, a 40 GB 5400 rpm hard disk, Win2K is something completely different
It depends on the index structure and the size of the documents you index/store.
It depends on the way you query.
Hello all,
I am using the default English Snowball analyzer to index and search English
documents. We may also need to index European or Chinese documents. What
would be the impact of using the English analyzer for European or Chinese
language documents? Would indexing and searching still work as expected?
I want to index document contents in two ways: one as simple
content, and the other as named entities.
The scenario is like this.
If I have the document "the source of Nile is Ethiopia",
then I want to index "source" as normal content, "Nile" as a river
name, and "Ethiopia" as a country name, so that
Hi:
What is the limit on Lucene index size per server? I mean the size up to
which search stays very fast and is not affected by the huge
index.
What is the per-server limit beyond which, if the index grows bigger, search
speed will drop?
any expert have a experience to te
Thanks for your suggestion Michael and thanks to Uwe for clarifying.
Payload is currently used to store only the start positions.
What I gathered from your suggestion is that we could possibly
store the end position, or span, or some other complex
encoding in order to store the extra informati
On Feb 25, 2009, at 2:52 PM, Tim Williams wrote:
Is there a syntax to set the term position in a query built with
queryparser? For example, I would like something like:
PhraseQuery q = new PhraseQuery();
q.add(t1, 0);
q.add(t2, 0);
q.setSlop(0);
As I understand it, the slop defaults to 0, bu
On 3/2/09 4:23 PM, "Ken Williams" wrote:
> On 3/2/09 1:58 PM, "Erik Hatcher" wrote:
>
>> On Mar 2, 2009, at 2:47 PM, Ken Williams wrote:
>>> In the output, I get explanations like "0.88922405 = (MATCH) product
>>> of:"
>>> with no details. Perhaps I need to do something different in
>>> ind
On 3/2/09 1:58 PM, "Erik Hatcher" wrote:
>
> On Mar 2, 2009, at 2:47 PM, Ken Williams wrote:
>> In the output, I get explanations like "0.88922405 = (MATCH) product
>> of:"
>> with no details. Perhaps I need to do something different in
>> indexing?
>
> Explanation.toString() only returns t
On 3/2/09 4:19 PM, "Steven A Rowe" wrote:
> On 3/2/2009 at 4:22 PM, Grant Ingersoll wrote:
>> On Mar 2, 2009, at 2:47 PM, Ken Williams wrote:
>>> Also, while perusing the threads you refer to below, I saw a
>>> reference to the following link, which seems to have gone dead:
>>>
>>> https://i
On 3/2/2009 at 4:22 PM, Grant Ingersoll wrote:
> On Mar 2, 2009, at 2:47 PM, Ken Williams wrote:
> > Also, while perusing the threads you refer to below, I saw a
> > reference to the following link, which seems to have gone dead:
> >
> > https://issues.apache.org/bugzilla/show_bug.cgi?id=31841
>
So then all is good.
We were only pursuing this to explain it. Now that we know your
directories are empty, that explains it.
So you should call maybeReopen() inside get(), as long as it does not
slow queries down.
Mike
Amin Mohammed-Coleman wrote:
I think that is the case. When my
I think that is the case. When my SearchManager is initialised the
directories are empty so when I do a get() nothing is present. Subsequent
calls seem to work. Is there something I can do? Or do I accept this, or
just do a maybeReopen() and then a get()? As you mentioned it depends on
timing but I
On Mar 2, 2009, at 2:47 PM, Ken Williams wrote:
Hi Grant,
It's true, I may have an X-Y problem here. =)
My basic need is to sacrifice recall to achieve greater precision. Rather
than always presenting the user with the top N documents, I need to return
*only* the documents that seem rele
Have a look at the MoreLikeThis contrib module in the contrib section
of Lucene. You can start with that, and then do the additions and
subtractions from there.
On Mar 2, 2009, at 9:35 AM, Gregory Gay wrote:
Hi,
I'm a complete novice at Lucene, and I'm looking for a little bit of help
Well the code looks fine.
I can't explain why you see no search results if you don't call
maybeReopen() in get, unless at the time you first create
SearcherManager the Directories each have an empty index in them.
Mike
Amin Mohammed-Coleman wrote:
Hi
Here is the code that I am using, I'
You mean on calling IndexWriter.close, with a deletion policy that's
functionally equivalent to KeepOnlyLastCommitDeletionPolicy, you
somehow see the last 2 commits remaining in the Directory once
IndexWriter is done closing? That's odd. Are you sure "onCommit()"
is really calling dele
Hi
Here is the code that I am using, I've modified the get() method to include
the maybeReopen() call. Again I'm not sure if this is a good idea.
public Summary[] search(final SearchRequest searchRequest)
throws SearchExecutionException {
final String searchTerm = searchRequest.getSearchTerm();
Hello,
In Solr, when a user calls commit, the IndexWriter is closed (causing a
commit). It is opened again only when another document is added or a delete
is performed. In order to support replication, Solr trunk now uses a
deletion policy. The default policy is (should be?) equivalent to
KeepOnl
On Mar 2, 2009, at 2:47 PM, Ken Williams wrote:
Finally, I seem unable to get Searcher.explain() to do much useful - my code
looks like:
Searcher searcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser(LuceneIndex.CONTENT, analyzer);
Query query = pa
Hi Grant,
It's true, I may have an X-Y problem here. =)
My basic need is to sacrifice recall to achieve greater precision. Rather
than always presenting the user with the top N documents, I need to return
*only* the documents that seem relevant. For some searches this may be 3
documents, for so
There are two ways to handle this:
1) During indexing time, expand the group tree and store them to the
documents, like "groups:1 2 3"
2) When indexing, storing only the exact group the document belongs to. Then
during search time, expand group tree to search all the groups the user
belongs to, inc
It makes perfect sense to call maybeReopen() followed by get(), as
long as maybeReopen() is never slow enough to be noticeable to an end
user (because you are making random queries pay the reopen/warming
cost).
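
The ordering question can be illustrated with a small stand-in. This is only a sketch: `ReopenSketch`, `commit()` and the version counters are hypothetical names invented for illustration, not the SearcherManager discussed in this thread and not a Lucene API. The "searcher" is reduced to the index version it was opened on.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in: a "searcher" is represented only by the index version it sees.
class ReopenSketch {
    private final AtomicLong indexVersion = new AtomicLong(0); // what is committed
    private volatile long searcherVersion = 0;                 // what searches see
    private int reopenCount = 0;

    // Simulates a writer committing new documents.
    void commit() {
        indexVersion.incrementAndGet();
    }

    // Swaps in a fresh view only if the index changed since the last reopen.
    synchronized void maybeReopen() {
        long latest = indexVersion.get();
        if (latest != searcherVersion) {
            searcherVersion = latest;
            reopenCount++;
        }
    }

    // Returns the current view; never triggers a reopen by itself.
    long get() {
        return searcherVersion;
    }

    synchronized int reopenCount() {
        return reopenCount;
    }

    public static void main(String[] args) {
        ReopenSketch manager = new ReopenSketch();
        manager.commit();
        System.out.println("get() before maybeReopen(): version " + manager.get());
        manager.maybeReopen();
        System.out.println("get() after maybeReopen(): version " + manager.get());
    }
}
```

In this sketch a get() issued before maybeReopen() still returns the older view, while maybeReopen() followed by get() returns the fresh one at the price of doing the reopen on the query path.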
If you call maybeReopen() after get(), then that search will not see
the new
Hi Markus,
I need to restrict the resultset to the appropriate rights of the user
who is searching the index.
A document may belong to several groups.
A user must belong to all groups of the document to find it. There's one
additional problem: The groups are a tree. A user is automatically
in e
See page 88 in Lucene In Action for a fuller explanation, including
ordering considerations.
But basically, phrase query slop is the maximum number of
"moves" required to get all the words next to each other
in the proper order. If you can get all the words next to each
other within slop moves,
Hi All,
I had posted the below mentioned query a week back and I have not
received any response from the group so far.
I was wondering if this is a trivial question to the group or it has been
answered previously.
I would appreciate your answers; any pointers to existing answers are also welcome.
If you have a reasonable way of getting the doc IDs that
your user is allowed to see (and it appears you do), you
probably want a Filter. At root a Filter is just a BitSet
where you turn on the bit for each document that *could*
be allowed in the results and pass that filter to the appropriate
sear
Dear list
I need to restrict the resultset to the appropriate rights of the user
who is searching the index.
A document may belong to several groups.
A user must belong to all groups of the document to find it. There's one
additional problem: The groups are a tree. A user is automatically
in ever
Hi
Just out of curiosity, does it not make sense to call maybeReopen() and
then call get()? If I call get() then I have a new multi searcher, so a
call to maybeReopen() won't reinitialise the multi searcher. Unless I
pass the multi searcher into the maybeReopen() method. But somehow that
doesn't m
Yes, I don't need a ShingleFilter I understand it by now.
Yes I will have many of these phrases in the documents... this is why I
thought I shouldn't use Lucene fields.
I will investigate further; your keyword approach sounds possible, thanks
for the tip.
However I presume I may need to normaliz
Since Lucene doesn't represent/store end position for a token, I don't
think the index can properly represent SYN spanning two positions?
I suppose you could encode this into payloads, and create a custom
query that would look at the payload to enforce the constraint.
Or, if you switch to
Perfect Thanks.
Was also looking at org.apache.lucene.search.ScoreDocComparator
Uwe Schindler wrote:
>
> How about java.util.Arrays.sort() on the array using a simple
> Comparator with a compare() that returns -Float.compare(a.score,
> b.score)? This is just about 7 lines of Java code.
>
>
If I set the boost=0 at query time and the query contains only terms with
boost=0, the scores are NaN (because weight.queryNorm = 1/0 = infinity),
instead of 0.
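
The arithmetic behind that NaN can be reproduced with plain floats. This is a minimal sketch that assumes the simplified relation queryNorm = 1 / sqrt(sumOfSquaredWeights); the class and method names are made up for illustration and no Lucene classes are used. The guard at the end is one hypothetical workaround, not something Lucene does.

```java
// Illustration only: plain float arithmetic, no Lucene classes involved.
class ZeroBoostNorm {
    // Assumed simplification: queryNorm = 1 / sqrt(sumOfSquaredWeights).
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    // A term weight scaled by the norm; 0 * Infinity is NaN in IEEE 754.
    static float normalized(float boost, float norm) {
        return boost * norm;
    }

    // Hypothetical guard: treat a non-finite norm as contributing a score of 0.
    static float safeNormalized(float boost, float norm) {
        if (Float.isInfinite(norm) || Float.isNaN(norm)) {
            return 0f;
        }
        return boost * norm;
    }

    public static void main(String[] args) {
        float norm = queryNorm(0f); // all boosts are 0, so the sum of squares is 0
        System.out.println("queryNorm = " + norm);
        System.out.println("unguarded = " + normalized(0f, norm));
        System.out.println("guarded   = " + safeNormalized(0f, norm));
    }
}
```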
Peter
On Sun, Mar 1, 2009 at 9:27 PM, Erick Erickson wrote:
> FWIW, Hossman pointed out that the difference between index and
> query
Hi Raymond,
On 3/2/2009 at 10:09 AM, Raymond Balmès wrote:
> suppose I have a tri-gram, what I want to do is index the tri-gram
> "string digit1 digit2" as one indexing phrase, and not index each token
> separately.
As long as you don't want any transformation performed on the phrase or its
comp
How about java.util.Arrays.sort() on the array using a simple
Comparator with a compare() that returns -Float.compare(a.score,
b.score)? This is just about 7 lines of Java code.
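
A sketch of that suggestion follows. To keep it runnable without Lucene on the classpath, `Hit` is a stand-in invented here that carries the same two public fields as ScoreDoc (`doc` and `score`); with the real class, the same comparator body should apply to ScoreDoc directly.

```java
import java.util.Arrays;
import java.util.Comparator;

class ScoreSortSketch {
    // Stand-in for Lucene's ScoreDoc (assumption: only doc and score matter here).
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Highest score first; swapping the arguments has the same effect as
    // negating Float.compare(a.score, b.score).
    static final Comparator<Hit> BY_SCORE_DESC = new Comparator<Hit>() {
        public int compare(Hit a, Hit b) {
            return Float.compare(b.score, a.score);
        }
    };

    // Sorts the stored result set in place, without running the query again.
    static void sortByScore(Hit[] hits) {
        Arrays.sort(hits, BY_SCORE_DESC);
    }

    public static void main(String[] args) {
        Hit[] hits = { new Hit(1, 0.2f), new Hit(2, 0.9f), new Hit(3, 0.5f) };
        sortByScore(hits);
        for (Hit h : hits) {
            System.out.println("doc=" + h.doc + " score=" + h.score);
        }
    }
}
```

Arrays.sort on an object array is stable, so hits with equal scores keep the order they had in the stored result set.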
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original
Dear list
I need to restrict the resultlist to the appropriate rights of the user
who is searching the index.
A document may belong to several groups.
A user must belong to all groups of the document to find it. There's one
additional problem: The groups are a tree. A user is automatically
in eve
Is there an existing Utility class which will sort a collection of ScoreDocs
? I have a result set (array of ScoreDocs) stored in JVM and want to sort
them by relevanceScore. I do not want to execute the query again. The stored
result set is sorted by another term and hence the need.
Would highly
I think his problem is, that "SYN" is a synonym for the phrase "WORD1
WORD2". Using these positions, a phrase like "SYN WORD2" would also match
(or other problems in queries that depend on order of words).
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail:
>
> Hi,
>
> I'm fairly new to Lucene. I'd like to know how we can index synonyms for
> multiple words.
>
> This is the scenario:
>
> Consider a sentence: AAA BBB WORD1 WORD2 EEE FFF GGG.
>
> Now assume the two words combined WORD1 WORD2 can be replaced by another
> word SYN.
>
> If I place SYN afte
Well,
In the mean time I've looked at the details of the implementation and it
gave me an idea for what I'm looking for :
suppose I have a tri-gram, what I want to do is index the tri-gram "string
digit1 digit2" as one indexing phrase, and not index each token separately.
In the shingler filter,
Shouldn't WORD2's position be 1 more than your SYN?
Ie, don't you want these positions?:
WORD1 2
WORD2 3
SYN 2
The position is the starting position of the token; Lucene doesn't
store an ending position
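
How those positions come out of position increments can be sketched without any Lucene classes. The assumption here is the usual convention that each token's position is the previous position plus its increment, starting from -1, so that an increment of 0 stacks a token on the previous one. The token order and increments below are illustrative.

```java
class PositionSketch {
    // Turns per-token position increments into absolute start positions.
    static int[] positions(int[] increments) {
        int[] result = new int[increments.length];
        int current = -1; // so a first increment of 1 yields position 0
        for (int i = 0; i < increments.length; i++) {
            current += increments[i];
            result[i] = current;
        }
        return result;
    }

    public static void main(String[] args) {
        // SYN is emitted right after WORD1 with increment 0, so both start at 2.
        String[] tokens = { "AAA", "BBB", "WORD1", "SYN", "WORD2" };
        int[] increments = { 1, 1, 1, 0, 1 };
        int[] pos = positions(increments);
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + " " + pos[i]);
        }
    }
}
```

Because only the start position is recorded, nothing in this representation says that SYN also covers position 3.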
Mike
Sumukh wrote:
Hi,
I'm fairly new to Lucene. I'd like to know how we
This has been discussed in the user list, so searching there
might get you an answer quicker.
See: http://wiki.apache.org/lucene-java/MailingListArchives
I don't remember the results, but...
Best
Erick
On Mon, Mar 2, 2009 at 9:13 AM, Sumukh wrote:
> Hi,
>
> I'm fairly new to Lucene. I'd like to
Hi,
I'm a complete novice at Lucene, and I'm looking for a little bit of help
with something.
How can I extract the TF*IDF vector for each document in the indexed
collection? Also for the query?
I need to build a user-feedback system which manipulates the query based on
the liked and disliked do
Hi,
I'm fairly new to Lucene. I'd like to know how we can index synonyms for
multiple words.
This is the scenario:
Consider a sentence: AAA BBB WORD1 WORD2 EEE FFF GGG.
Now assume the two words combined WORD1 WORD2 can be replaced by another
word SYN.
If I place SYN after WORD1 with positionIn
In my test case I have a set up method that should populate the indexes
before I start using the document searcher. I will start adding some more
debug statements. So basically I should be able to do: get() followed by
maybeReopen.
I will let you know what the outcome is.
Cheers
Amin
On Mon,
Is it possible that when you first create the SearcherManager, there
is no index in each Directory?
If not... you better start adding diagnostics. EG inside your get(),
print out the numDocs() of each IndexReader you get from the
SearcherManager?
Something is wrong and it's best to exp
Hi Raymond,
On 3/1/2009, Raymond Balmès wrote:
> I'm trying to index (& search later) documents that contain tri-grams
> however they have the following form:
>
> <2 digit> <2 digit>
>
> Does the ShingleFilter work with numbers in the match ?
Yes, though it is the tokenizer and previous filter
Nope. If I remove the maybeReopen() the search doesn't work. It only works
when I call maybeReopen() followed by get().
Cheers
Amin
On Mon, Mar 2, 2009 at 12:56 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
>
> That's not right; something must be wrong.
>
> get() before maybeReopen() sh
That's not right; something must be wrong.
get() before maybeReopen() should simply let you search based on the
searcher before reopening.
If you just do get() and don't call maybeReopen() does it work?
Mike
Amin Mohammed-Coleman wrote:
I noticed that if i do the get() before the maybeRe
I noticed that if I do the get() before the maybeReopen() then I get no
results. But otherwise I can change it further.
On Mon, Mar 2, 2009 at 11:46 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
>
> There is no such thing as final code -- code is alive and is always
> changing ;)
>
>
Hi:
The key to avoiding bad performance when merging a database
result is to reduce the number of rows visited by your first query.
As an example, take a look at these two queries using Lucene Domain
Index; the two are equivalent:
Option A:
select * from (select rownum as ntop_pos,q.* fro
There is no such thing as final code -- code is alive and is always
changing ;)
It looks good to me.
Though one trivial thing is: I would move the code in the try clause
up to and including the multiSearcher=get() out above the try. I
always attempt to "shrink wrap" what's inside a try
Hi
Document.setBoost(float boost) where boost is either your score as is,
or a value based on that score, might do the trick for you.
Other boosting and custom score options include BoostingQuery,
BoostingTermQuery and CustomScoreQuery.
A google search for "lucene boosting" throws up lots of h
Not sure what you are asking about, but you might want to take a look at
http://lucene.apache.org/java/2_4_0/api/contrib-surround/index.html
The Surround parser offers many features around the span query (which I
suspect is what you are looking for)
Shashi
On Mon, Mar 2, 2009 at 4:57 AM, shb w
Hi,
I would like to add another factor to Lucene's score: a score between
words.
I have an index that holds pairs of words with their scores.
How can I take it into account when using Lucene search?
Many thanks,
Liat
Hi, I need help.
I need to search by word in sentences with Lucene. For example, for the word
"bbb" I get the right results, all of these sentences:
"text ok ok ok bbb" , "text 2 bbb text " , "bbb text 4...".
But I need the results by the word's offset in the sentence, like this:
"bbb text 4...".
Hi there
Good morning! Here is the final search code:
public Summary[] search(final SearchRequest searchRequest)
throws SearchExecutionException {
final String searchTerm = searchRequest.getSearchTerm();
if (StringUtils.isBlank(searchTerm)) {
throw new SearchExecutionException("Search string ca