Hello Lucene members.
I tried to do reindexing using the Lifecycle
interface of Hibernate,
but I'm stuck on the implementation part of this interface.
I wrote the code for it, but now I'm stuck on the concepts of Hibernate.
It uses methods like Onsa
I have thought about that. I couldn't figure out a way to make it work.
Fortunately, I have managed to solve the problem (excepting prefix or
wildcard searches) which is very close to what Rajesh suggested (also
see my response to his response).
Thanks for taking a look.
Colin
-Original Mes
Actually, I arrived at a very similar solution for indexing as you did,
but I've been away from a connection, so I haven't been able to post it
here.
Essentially I'm adding the items as you suggest, but I've built a
synonym injector (actually I'm just using the one from "Lucene in
Action") to prod
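The synonym injector mentioned above stacks extra terms at the same position as the original token, so a query for either word matches. Below is a plain-Java sketch of that idea, assuming the injector behaves like the Lucene in Action filter and gives each synonym a position increment of 0. `Token` and the "st" → "saint" entry are hypothetical stand-ins, not Lucene classes or data from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of synonym injection; Token is a hypothetical stand-in.
class SynonymInjectionSketch {

    static final class Token {
        final String text;
        final int positionIncrement; // 0 means "same position as the previous token"

        Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "/" + positionIncrement;
        }
    }

    // Emits each input term at a new position, followed by its synonyms
    // stacked on the same position (position increment 0).
    static List<Token> inject(List<String> terms, Map<String, List<String>> synonyms) {
        List<Token> out = new ArrayList<>();
        for (String term : terms) {
            out.add(new Token(term, 1));
            for (String syn : synonyms.getOrDefault(term, Collections.<String>emptyList())) {
                out.add(new Token(syn, 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = new HashMap<>();
        synonyms.put("st", Arrays.asList("saint")); // hypothetical synonym entry
        System.out.println(inject(Arrays.asList("st", "louis"), synonyms));
        // prints [st/1, saint/0, louis/1]
    }
}
```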
Have you considered evaluating doc-score thresholds for limiting your
results? Since the perfect answers to these situations lie in the constant
tweaking and twiddling of analysis and tokenization, one way I've found to
help is to evaluate result scores. In your "Ontario CA" example, limiting
res
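One way to apply a doc-score threshold is a cutoff relative to the top hit. The sketch below assumes hits arrive sorted by descending score, as search results normally do, and uses a hypothetical `ScoredDoc` holder instead of Lucene's result objects; the 0.5 ratio is an arbitrary illustration, not a tuned value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: ScoredDoc stands in for (doc id, score) pairs from a search.
class ScoreThresholdSketch {

    static final class ScoredDoc {
        final int doc;
        final float score;

        ScoredDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Keeps hits whose score is at least `ratio` times the top score.
    // Assumes `hits` is sorted by descending score.
    static List<ScoredDoc> aboveRelativeThreshold(List<ScoredDoc> hits, float ratio) {
        List<ScoredDoc> kept = new ArrayList<>();
        if (hits.isEmpty()) {
            return kept;
        }
        float cutoff = hits.get(0).score * ratio;
        for (ScoredDoc hit : hits) {
            if (hit.score < cutoff) {
                break; // sorted input: every later hit scores lower too
            }
            kept.add(hit);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ScoredDoc> hits = new ArrayList<>();
        hits.add(new ScoredDoc(4, 0.92f));
        hits.add(new ScoredDoc(1, 0.85f));
        hits.add(new ScoredDoc(2, 0.31f));
        System.out.println(aboveRelativeThreshold(hits, 0.5f).size()); // 2
    }
}
```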
Hey Chris,
I was using the hits.doc method while iterating...
You've given me some hope! I will look into the FieldCache.
Chris Hostetter <[EMAIL PROTECTED]> wrote:
: Currently, I am iterating through about 200-300 of the top docs and
: creating the groups (so, as of now, the groups
PrefixQuery is implemented as a BooleanQuery using term expansion. What
that means is that a prefix query on a common prefix is much more
expensive than a prefix query on a less common prefix, not just because of
the number of documents that match, but because of the number of terms
that match
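To see why the number of matching terms matters, the sketch below models the term dictionary as a sorted `TreeSet` (an assumption standing in for Lucene's real term index) and lists the terms a prefix would expand into. With many hypothetical numeric ids sharing the short prefix "12", the expansion is far larger than for "de", which is one plausible reason 12* is slower than de* in the question that follows.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Plain-Java model of prefix expansion over a sorted term list (not Lucene's TermEnum).
class PrefixExpansionSketch {

    // Returns every term that starts with `prefix`. In a sorted set, those terms
    // form one contiguous range from prefix (inclusive) to prefix + Character.MAX_VALUE.
    static SortedSet<String> expand(TreeSet<String> terms, String prefix) {
        return terms.subSet(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>();
        // Hypothetical data: 1,000 numeric ids that all begin with "12".
        for (int i = 0; i < 1000; i++) {
            terms.add(String.valueOf(120000 + i));
        }
        terms.add("catalog");
        terms.add("description");
        terms.add("design");
        terms.add("my");

        // Each matching term would become one clause of the expanded query.
        System.out.println("12* -> " + expand(terms, "12").size() + " terms"); // 1000
        System.out.println("de* -> " + expand(terms, "de").size() + " terms"); // 2
    }
}
```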
: Currently, I am iterating through about 200-300 of the top docs and
: creating the groups (so, as of now, the groups are partial). My
: response time HAS to be at most 500-600 ms (query + groupings), or my
: company will probably go with a commercial search engine such as FAST or
: somethin
Thanks for the advice, guys!
Currently, I am iterating through about 200-300 of the top docs and creating
the groups (so, as of now, the groups are partial). My response time HAS to be
at most 500-600 ms (query + groupings), or my company will probably go with a
commercial search engine
I am curious what the difference would be between searching for a number
versus a character.
I have a large index consisting of a few fields, so the index would look
something like: "123123123 my description my catalog".
Searching for 12* is much slower than searching for de*.
I don't have a
Hello,
Does anyone know of a Java port of Sean M. Burke's Unidecode?
http://interglacial.com/~sburke/tpj/as_html/tpj22.html
http://search.cpan.org/~sburke/Text-Unidecode-0.04/lib/Text/Unidecode.pm
TIA.
Cheers
--
PA, Onnay Equitursay
http://alt.textdrive.com/
For now, the best I could come up with is the following scheme:
SAMPLE DOCUMENTS:
Let's say there are four documents:
Doc1: st louis, missouri, usa
Doc2: st louis du ha ha, quebec, canada
Doc3: new york, NY, united states of america
Doc4: ny, usa
INDEX PHASE:
-
An approach like the one Mark is describing should be a lot more space-efficient
than the BitSet intersection approach I described before, but
depending on how many groupings you want, I can imagine that it might be
slower in some cases.
Unfortunately, it also only works if the groupings you want are
> A simple solution if you only have 20,000 docs is
> just to iterate
> through the hits and count them up against each
> color etc,
The one thing to avoid is reader.document() calls in
such a tight loop. This is always a killer.
The best way I've found is to create one bitset for
all the matchin
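The advice above is cut off, but it points toward counting with bit sets instead of loading documents. A minimal sketch, assuming one cached `java.util.BitSet` per facet value (the colours and doc ids are hypothetical): intersect it with the bit set of matching docs and read the count from `cardinality()`.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

class BitSetFacetSketch {

    // For each facet value (e.g. a colour), counts the matching docs that carry it.
    // `facetBits` maps each value to the docs having it; it is assumed to be built
    // once and cached, so no stored documents are loaded per query.
    static Map<String, Integer> countFacets(BitSet matching, Map<String, BitSet> facetBits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : facetBits.entrySet()) {
            BitSet both = (BitSet) matching.clone(); // keep the query's bits intact
            both.and(entry.getValue());
            counts.put(entry.getKey(), both.cardinality());
        }
        return counts;
    }

    // Helper: builds a BitSet with the given doc ids set.
    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) {
            b.set(d);
        }
        return b;
    }

    public static void main(String[] args) {
        BitSet matching = bits(0, 2, 3); // docs the query matched
        Map<String, BitSet> facetBits = new LinkedHashMap<>();
        facetBits.put("red", bits(0, 1, 2));
        facetBits.put("blue", bits(3, 4, 5));
        System.out.println(countFacets(matching, facetBits)); // {red=2, blue=1}
    }
}
```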
: 1] Why do we use the BitSet class?
: 2] Is it required in filtering / sorting of results, or to index?
BitSet is a useful class for lots of things. The only time (that I know
of) where it is part of the public Lucene API is in the interaction
between a Filter and the IndexSearcher ... so unless you
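In the Lucene releases of this period, a `Filter` hands the searcher a `java.util.BitSet` through `Filter.bits(IndexReader)`, where a set bit means the document may appear in the results. The plain-Java sketch below only models that contract with made-up doc ids; it does not call Lucene.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class FilterMaskSketch {

    // Keeps only the candidate docs whose bit is set, mirroring how a
    // filter's BitSet restricts which hits a search may return.
    static List<Integer> applyFilter(int[] candidateDocs, BitSet allowed) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : candidateDocs) {
            if (allowed.get(doc)) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        BitSet allowed = new BitSet();
        allowed.set(1);
        allowed.set(3);
        allowed.set(4);
        int[] candidates = {0, 1, 2, 3, 4, 5}; // hypothetical docs that matched the query
        System.out.println(applyFilter(candidates, allowed)); // [1, 3, 4]
    }
}
```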
A simple solution, if you only have 20,000 docs, is just to iterate
through the hits and count them up against each color, etc.; this could be
done in a HitCollector. The trade-off here is performance vs. memory usage: if
you have a lot of users, I would go for a solution that was less
efficient but used a lot
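A rough sketch of that counting approach, shaped like a `HitCollector.collect(int doc, float score)` callback but written as a standalone class. It assumes a per-document colour ordinal array loaded once up front (the kind of lookup `FieldCache` provides), so the loop never calls `reader.document()`; the array contents and colour names are hypothetical.

```java
import java.util.Arrays;

class ColorCountCollectorSketch {
    private final int[] colorOrdByDoc; // colour ordinal for each doc id, loaded once
    private final int[] counts;        // one counter per colour ordinal

    ColorCountCollectorSketch(int[] colorOrdByDoc, int numColors) {
        this.colorOrdByDoc = colorOrdByDoc;
        this.counts = new int[numColors];
    }

    // Called once per matching doc; the score is not needed for counting.
    void collect(int doc, float score) {
        counts[colorOrdByDoc[doc]]++;
    }

    int[] counts() {
        return counts.clone();
    }

    public static void main(String[] args) {
        String[] colors = {"red", "green", "blue"};
        int[] colorOrdByDoc = {0, 1, 0, 2, 1}; // doc 0 is red, doc 1 is green, ...
        ColorCountCollectorSketch collector =
                new ColorCountCollectorSketch(colorOrdByDoc, colors.length);
        for (int doc : new int[] {0, 2, 3}) { // pretend these docs matched
            collector.collect(doc, 1.0f);
        }
        System.out.println(Arrays.toString(collector.counts())); // [2, 0, 1]
    }
}
```

The memory cost is one int per document for the ordinal array, which is where the performance vs. memory trade-off shows up.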
Hey Jim,
thanks a lot for the quick reply! Much appreciated.
I will look a little closer into what is done in C|Net; it seems more
cost-efficient than what I'm currently doing ;)
However, I am not sure how scalable the solution is
if, for example, I received 20,000 results and I ha
I have a process that runs as a timer task and periodically
optimizes my search index. However, this process keeps
failing:
java.io.IOException: Cannot overwrite: C:\04950_04959\deleteable.new
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory
Dmitry Goldenberg wrote:
a) if I index "function()" as "function()" rather than "function", does that mean that if
I search for "function", then it won't be found? -- the problem is that in some cases, the user will want to
find function(), and in some cases just function -- can I accommodate
Michael,
Yes, you're describing pretty much what I was thinking of but --
a) if I index "function()" as "function()" rather than "function", does that
mean that if I search for "function", then it won't be found? -- the problem is
that in some cases, the user will want to find function(), and
Please do not cross-post your questions. Your questions are best
asked solely to java-user.
Erik
On Jan 30, 2006, at 9:23 AM, Vikas Khengare wrote:
Hi Friends,
I am very new to the Lucene world! As this world is interesting
to me,
I want to go deeper into it to
Hi Friends,
I am very new to the Lucene world! As this world is interesting to me,
I want to go deeper into it to realize its beauty.
Can you help me realize that beauty?
I have a question:
1] Why do we use the BitSet class?
2] Is it required in filtering / sorting o
There are a number of ways of doing this. One way I would suggest is simply to
store the CONTENTS fields and prefix them with the field name. So instead of
storing a single CONTENTS field for a document, store a CONTENTS field for each
other field, with the field name prefixing each field value. E.
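A minimal sketch of the per-field layout described above: one CONTENTS value per source field, each starting with the field name. The space separator and the sample field names are assumptions (the original example is cut off), and the method only builds the strings; adding them to a Lucene document is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PrefixedContentsSketch {

    // One CONTENTS value per source field, each prefixed with that field's name.
    // The single-space separator is an assumption.
    static List<String> prefixedContents(Map<String, String> fields) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            values.add(field.getKey() + " " + field.getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // hypothetical fields
        fields.put("city", "st louis");
        fields.put("region", "missouri");
        System.out.println(prefixedContents(fields)); // [city st louis, region missouri]
    }
}
```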
Use BitSets to intersect the two queries. First knock up a HitCollector
that generates a bit set for the document set you want to search
(A,B,C,X,Y,Z). Then do another query generating a bit set for the
criteria on (C,X,Y). Then just intersect the two bit sets using the
"and" method.
Mike
www.ard
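The intersection step can be sketched with `java.util.BitSet` alone: one bit set filled from the query's hits, one marking the allowed docs (C, X, Y), combined with `and()`. The mapping of A..Z to doc ids and the set of query hits are made up for illustration; in Lucene the bits would be filled by a HitCollector as described above.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class BitSetIntersectionSketch {

    // Returns a new BitSet holding only the docs present in both inputs.
    static BitSet intersect(BitSet queryHits, BitSet allowed) {
        BitSet result = (BitSet) queryHits.clone(); // leave the inputs untouched
        result.and(allowed);
        return result;
    }

    // Lists the set bits (doc ids) in ascending order.
    static List<Integer> docIds(BitSet bits) {
        List<Integer> ids = new ArrayList<>();
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            ids.add(i);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical doc ids: A=0, B=1, C=2, X=3, Y=4, Z=5.
        BitSet queryHits = new BitSet();
        queryHits.set(0); // A
        queryHits.set(2); // C
        queryHits.set(4); // Y
        queryHits.set(5); // Z
        BitSet allowed = new BitSet();
        allowed.set(2, 5); // C, X, Y (bits 2..4; the upper bound is exclusive)
        System.out.println(docIds(intersect(queryHits, allowed))); // [2, 4]
    }
}
```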
How do you search only certain documents? In the app I am writing,
before I start searching with Lucene, I know all the documents that I
want to search. For example, I have documents A,B,C,X,Y,Z, so before I
start the search I know that I only want to search docs C,X,Y due to
other non-Lucene criteria.
I cranked up the dial on my query tester and was able to get the rate up to
325 qps. Unfortunately, the machine died shortly thereafter (memory errors
:-( ) Hopefully, it was just a coincidence. I haven't measured 64-bit
indexing speed, yet.
Peter
On 1/29/06, Daniel Noll <[EMAIL PROTECTED]> wrote
Hi,
Does anyone know if it is possible to show related searches with Lucene? For
example, if someone searched for "car insurance", you could bring back the
results and related searches like these:
Automobile Insurance
Car Insurance Quote
Car Insurance Quotes
Auto Insurance
Cheap Car Insurance
Car
Hi, that's exactly what I did :) Works perfectly.
Thanks
_gk
- Original Message -
From: "Chris Hostetter" <[EMAIL PROTECTED]>
To:
Sent: Monday, January 30, 2006 5:56 AM
Subject: Re: deleting duplicate documents from my index
: Hi, I'm trying to delete duplicate documents from my inde
I was happy to take the hit of storing the text twice.
I have created an aggregate field called "CONTENTS" that has all the other
fields concatenated together.
I also created a list of the other fields (because they can vary from doc to
doc) in another field "FIELDLIST"
I search this field and fo
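A minimal sketch of the aggregate layout described above: CONTENTS joins every field value, and FIELDLIST records which fields the document had, since they vary from doc to doc. The space separator and sample fields are assumptions; the sketch only builds the two strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AggregateFieldSketch {

    // Builds the two extra fields from a document's own fields:
    // CONTENTS joins every value, FIELDLIST records which fields were present.
    static Map<String, String> aggregate(Map<String, String> fields) {
        Map<String, String> extra = new LinkedHashMap<>();
        extra.put("CONTENTS", String.join(" ", fields.values()));
        extra.put("FIELDLIST", String.join(" ", fields.keySet()));
        return extra;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>(); // hypothetical document
        doc.put("city", "st louis");
        doc.put("region", "missouri");
        doc.put("country", "usa");
        System.out.println(aggregate(doc));
        // {CONTENTS=st louis missouri usa, FIELDLIST=city region country}
    }
}
```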