What part of Grant's and Karl's answers to you the last time you asked this
question wasn't clear? Have you tried it?
http://www.nabble.com/Re%3A-Common-Words-ignoring-problem-p9550886.html
http://www.nabble.com/Re%3A-Common-Words-ignoring-problem-p9567881.html
: I want to make sure if this s
Hi,
I've looked at the uses of MergeFactor and MaxBufferedDocs.
If I set MergeFactor=100 and MaxBufferedDocs=250, then the first 100
segments will be merged in the RAMDir when 100 docs have arrived. By the time the 350th
doc is added to the writer, the RAMDir will have 2 merged segment files + 50 separate
segment files.
Thanks for your reply and these useful links.
On 3/23/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
see also the FAQ "Why am I getting no hits / incorrect hits?" which points
to...
http://wiki.apache.org/lucene-java/BooleanQuerySyntax
...I've just added some more words of wisdom there from p
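The MergeFactor/MaxBufferedDocs arithmetic asked about above can be sketched in plain Java. This is a simplified model, not Lucene's actual code: it assumes a segment is flushed every maxBufferedDocs added docs, and that whenever mergeFactor segments accumulate at one level they merge into a single larger segment.

```java
// Simplified model of flush-and-merge behaviour (an assumption for
// illustration, not Lucene's real implementation).
public class MergeMath {
    // Number of on-disk segments after docsAdded documents.
    public static int onDiskSegments(int docsAdded, int maxBufferedDocs, int mergeFactor) {
        int flushes = docsAdded / maxBufferedDocs;
        int segments = 0;
        // Cascaded merging behaves like writing `flushes` in base mergeFactor:
        // each level keeps its remainder as unmerged segments.
        while (flushes > 0) {
            segments += flushes % mergeFactor;
            flushes /= mergeFactor;
        }
        return segments;
    }

    // Documents still buffered in RAM, not yet flushed to any segment.
    public static int bufferedDocs(int docsAdded, int maxBufferedDocs) {
        return docsAdded % maxBufferedDocs;
    }

    public static void main(String[] args) {
        System.out.println(onDiskSegments(350, 250, 100)); // 1
        System.out.println(bufferedDocs(350, 250));        // 100
    }
}
```

Under this model, 350 docs with MaxBufferedDocs=250 give one flushed segment plus 100 docs still buffered in RAM, rather than the "2 merged + 50 separate" described above; the exact behaviour depends on the Lucene version in use.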
I want to make sure whether this statement is right or not:
"I am using StandardAnalyzer for indexing documents. By default it ignores
some words when doing indexing. But when we search something, Lucene again
includes the ignored words in the search."?
My problem is that:
I indexed a word documen
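The statement quoted above is only half right: StandardAnalyzer drops stop words at index time, and if the *same* analyzer is handed to the query parser, the same words are dropped from queries too. A minimal pure-Java sketch of the filtering step itself (the stop list here is a small assumed subset, not StandardAnalyzer's actual list):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopWordDemo {
    // A small assumed subset of English stop words, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "is", "of", "to"));

    // Lowercase, split on whitespace, drop stop words: the same shape of
    // pipeline an analyzer applies at both index time and query time.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String tok : text.toLowerCase().split("\\s+")) {
            if (!tok.isEmpty() && !STOP_WORDS.contains(tok)) {
                tokens.add(tok);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Both a document and a query lose "the"/"of", so they still match.
        System.out.println(analyze("The history of Lucene")); // [history, lucene]
    }
}
```

Because both sides of the match see the same filtered token stream, stop words neither exist in the index nor survive in the parsed query.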
Maryam wrote:
Hi,
I have three questions about indexing:
1) I am indexing HTML documents; how can I do "stop
removal" before indexing? I don't want to index stop
words.
The same way you would do it for indexing text documents: StopFilter.
2) I can have access to the terms in one documen
Hi Grant, I think you resolved the question already, but just to
make sure...
Grant Ingersoll <[EMAIL PROTECTED]> wrote on 22/03/2007 20:41:27:
>
> On Mar 22, 2007, at 11:21 PM, Grant Ingersoll wrote:
>
> > I think I see in the ReadTask that it is the res var that is being
> > incremented and wou
: Thanks Erick, I've been using TopDocs, but am playing with my own HitCollector
: variant of TopDocHitCollector. The problem is not adjusting the score, it's
: what to adjust it by, i.e. is it possible to re-evaluate the scores of H1 and H2
: knowing that the original query resulted in hits on H
23 mar 2007 kl. 03.07 skrev Melanie Langlois:
Thanks Karl, the performance graph is really amazing!
I have to say I would not have thought this way around would be
faster, but it sounds nice if I can use this; it makes everything easier
to manage. I'm just wondering what you considered when
23 mar 2007 kl. 04.25 skrev Ryan McKinley:
Is there any way to find frequent phrases without knowing what you are
looking for?
I think you are looking for association rules. Try searching for
Levelwise-Scan.
Weka contains GPLed Java code.
CiteSeer is your best friend for whitepapers. htt
On Mar 22, 2007, at 11:21 PM, Grant Ingersoll wrote:
I think I see in the ReadTask that it is the res var that is being
incremented and would have to be altered. I guess I can go by
elapsed time, but even that seems slightly askew. I think this is
due to the withRetrieve() function overh
Is there any way to find frequent phrases without knowing what you are
looking for?
I could index "A B C D E" as "A B C", "B C D", "C D E" etc, but that
seems kind of clunky particularly if the phrase length is large. Is
there any position offset magic that will surface frequent phrases
automati
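The "A B C", "B C D", "C D E" scheme described above is the sliding-window ("shingle") approach, and the counting side of it can be sketched in plain Java without any position-offset magic (window size and input are arbitrary choices for illustration):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PhraseCounter {
    // Count every n-token window ("shingle") across a token stream.
    public static Map<String, Integer> countNgrams(List<String> tokens, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            String gram = String.join(" ", tokens.subList(i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "information", "retrieval", "is", "fun",
                "information", "retrieval", "works");
        // "information retrieval" occurs twice; every other bigram once.
        System.out.println(countNgrams(tokens, 2).get("information retrieval")); // 2
    }
}
```

For large phrase lengths this does get clunky, as noted above; the window count grows with every n you want to track, which is why association-rule mining (the Levelwise-Scan suggestion) prunes infrequent prefixes first.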
OK, Doron (and other benchmarkers!), on to search:
Here's my alg file:
#Indexing declaration up here
OpenReader
{ "SrchSameRdr" Search > : 5000
{ "SrchTrvSameRdr" SearchTrav > : 5000
{ "SrchTrvSameRdrTopTen" SearchTrav(10) > : 5000
{ "SrchTrvRetLoadAllSameRdr" SearchTravRet > :
Hi,
I have three questions about indexing:
1) I am indexing HTML documents; how can I do "stop
removal" before indexing? I don't want to index stop
words.
2) I can have access to the terms in one document,
but how can I have access to the name of the document
in which these terms have appeared?
3)
Thanks Karl, the performance graph is really amazing!
I have to say I would not have thought this way around would be faster, but
it sounds nice if I can use this; it makes everything easier to manage. I'm just
wondering what you considered when you built your graph, only the time to run
the que
23 mar 2007 kl. 02.09 skrev Daniel Noll:
Maryam wrote:
Hi, I have written this piece of code to read the index,
mainly to see what terms are in each document and what
the frequency of each term in the document is. This
piece of code correctly calculates the number of docs
in the index, but I d
23 mar 2007 kl. 02.12 skrev Melanie Langlois:
I want to manage user subscriptions to specific documents. So I
would like to store the subscription (query) into the lucene
directory, and whenever I receive a new document, I will search all
the matching subscriptions to send the documents to
Hello,
I want to manage user subscriptions to specific documents. So I would like to
store the subscription (query) into the lucene directory, and whenever I
receive a new document, I will search all the matching subscriptions to send
the documents to all subscribers. For instance, if a user s
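One common approach to this "prospective search" problem is MemoryIndex from Lucene's contrib area (assuming it is available in your version): index each incoming document alone in RAM and run every stored subscription query against it. A sketch under that assumption; the field name and query text are made up for illustration:

```java
// Sketch assuming Lucene 2.x contrib MemoryIndex
// (org.apache.lucene.index.memory); exact signatures depend on your version.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class SubscriptionMatcher {
    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer();
        // One stored subscription, parsed back into a Query.
        Query subscription =
                new QueryParser("body", analyzer).parse("lucene AND indexing");

        // Index the single incoming document entirely in RAM.
        MemoryIndex doc = new MemoryIndex();
        doc.addField("body", "notes on lucene indexing performance", analyzer);

        // A score greater than zero means the subscription matches,
        // so this subscriber should receive the document.
        if (doc.search(subscription) > 0.0f) {
            System.out.println("notify subscriber");
        }
    }
}
```

The appeal is that no persistent index of documents is needed at all: the stored queries are the long-lived data, and each document is matched once and discarded.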
Maryam wrote:
Hi,
I have written this piece of code to read the index,
mainly to see what terms are in each document and what
the frequency of each term in the document is. This
piece of code correctly calculates the number of docs
in the index, but I don’t know why variable
myTermFreq[] is nul
Hi,
I have written this piece of code to read the index,
mainly to see what terms are in each document and what
the frequency of each term in the document is. This
piece of code correctly calculates the number of docs
in the index, but I don't know why variable
myTermFreq[] is null. Would you ple
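A likely cause of the null array described above: IndexReader.getTermFreqVector returns null unless the field was indexed with term vectors enabled. A sketch against the Lucene 2.x API (the field name and index path are assumptions):

```java
// Sketch assuming the Lucene 2.x API; the "body" field and index path
// are placeholders, not values from the original message.
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;

public class TermFreqDemo {
    public static void main(String[] args) throws Exception {
        // At index time the field must request term vectors, e.g.:
        // new Field("body", text, Field.Store.NO, Field.Index.TOKENIZED,
        //           Field.TermVector.YES)
        IndexReader reader = IndexReader.open("/path/to/index");
        for (int docId = 0; docId < reader.maxDoc(); docId++) {
            if (reader.isDeleted(docId)) continue;
            TermFreqVector tfv = reader.getTermFreqVector(docId, "body");
            if (tfv == null) continue; // null when term vectors were not stored
            String[] terms = tfv.getTerms();
            int[] freqs = tfv.getTermFrequencies();
            for (int i = 0; i < terms.length; i++) {
                System.out.println(terms[i] + " appears " + freqs[i] + " times");
            }
        }
        reader.close();
    }
}
```

If the index was built without Field.TermVector.YES, the documents must be re-indexed before this works; there is no way to recover term vectors afterward.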
Mike O'Leary wrote:
Please forgive the laziness inherent in this question, as I haven't looked
through the PDFBox code yet. I am wondering if that code supports extracting
text from PDF files while preserving such things as sequences of whitespace
between characters and other layout and formattin
Oh yeah.. By only loading the relevant fields, my query times
reduced by over 90%. I actually wrote that up on the mailing list if
you wanted to try to find it, but it took Andreas' message to
remind me...
Erick
On 3/22/07, Santa Clause <[EMAIL PROTECTED]> wrote:
Another thing you may want to
Official job description & info to submit a resume:
http://www.systemsalliance.com/careers/internal-jobs/baltimore/Software_Engineer_MD.html
Located 15 minutes North of Baltimore in Sparks, MD
Position is on a team, working with myself and others, maintaining and
developing an existing co
Another thing you may want to look at is the newer version 2.1.0 and
getFieldable. I think that will lazy load the data, that way you are only
reading the parts of the document that you need at that moment rather than the
whole thing. Someone please correct me if I am wrong or point to what
Erick Erickson wrote:
Don't know if it's useful or not, but if you used TopDocs instead,
you have access to an array of ScoreDoc which you could modify
freely. In my app, I used a FieldSortedHitQueue to re-sort things
when I needed to.
Thanks Erick, I've been using TopDocs, but am playing with
Mike O'Leary wrote:
Please forgive the laziness inherent in this question, as I haven't looked
through the PDFBox code yet. I am wondering if that code supports extracting
text from PDF files while preserving such things as sequences of whitespace
between characters and other layout and formattin
see also the FAQ "Why am I getting no hits / incorrect hits?" which points
to...
http://wiki.apache.org/lucene-java/BooleanQuerySyntax
...I've just added some more words of wisdom there from past emails.
: Date: Thu, 22 Mar 2007 09:51:15 -0400
: From: Erick Erickson <[EMAIL PROTECTED]>
: Reply
Well, you don't index phrases, it's done for you. You should try
something like the following
Create a SpanNearQuery with your terms. Specify an appropriate
slop (probably 0 assuming you want them all next to each other).
Now call getSpans and count ... You may have to do
something with
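Erick's suggestion above might look like the following against the Lucene 2.x span API (the field name and phrase are assumptions for illustration):

```java
// Sketch assuming the Lucene 2.x span API; "body" and the phrase terms
// are placeholders.
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.search.spans.Spans;

public class PhraseFreq {
    // Count occurrences of "information retrieval" as adjacent, in-order terms.
    public static int countPhrase(IndexReader reader) throws Exception {
        SpanQuery[] clauses = {
                new SpanTermQuery(new Term("body", "information")),
                new SpanTermQuery(new Term("body", "retrieval"))
        };
        // slop=0, inOrder=true: the terms must be directly adjacent.
        SpanNearQuery phrase = new SpanNearQuery(clauses, 0, true);
        Spans spans = phrase.getSpans(reader);
        int count = 0;
        while (spans.next()) {
            count++; // one match per occurrence, across all documents
        }
        return count;
    }
}
```

Unlike a hit count, this counts every occurrence of the phrase, including repeats within a single document, which is what the original question asks for.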
Hi,
I know how to index terms in Lucene; now I want to see
how I can index phrases like "information retrieval"
in lucene and calculate the number of times that
phrase has appeared in the document. Is there any way
to do it in Lucene?
Thanks
rubdabadub wrote:
On 3/22/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
Nice idea and I can see the benefit of it to you and I don't mean to
be a wet blanket on it, I just wonder about the legality of it.
So long as it meets the Apache license conditions regarding the
distribution it's not f
Your timing differences are probably because of caching. But this
has been mentioned many times in the archive, that a Hits object
is intended to allow fast, simple retrieval of the first few documents
in a result set (100 if memory serves). Each 100 or so calls to
next() causes the search to be r
Hi,
While looking into performance enhancement for our search feature I
noticed a significant difference in Documents access time while looping
over Hits.
I wrote a test application search for a list of search terms and then
for each returned Hits object loops twice over every single hits.doc(i).
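One way to avoid the re-execution behaviour described below (Hits re-running the search roughly every 100 next() calls) is to request all needed results up front via TopDocs. A sketch against the Lucene 2.x API; the "title" field is an assumption:

```java
// Sketch assuming the Lucene 2.x search API; the "title" field is a
// placeholder.
import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class TopDocsLoop {
    // Fetch up to maxHits results in one pass instead of iterating Hits.
    public static void dumpResults(IndexSearcher searcher, Query query, int maxHits)
            throws Exception {
        TopDocs topDocs = searcher.search(query, null, maxHits);
        for (ScoreDoc sd : topDocs.scoreDocs) {
            Document doc = searcher.doc(sd.doc);
            System.out.println(sd.score + "\t" + doc.get("title"));
        }
    }
}
```

Because the search runs exactly once, looping over the results twice (as in the test described below) should show no timing difference between the passes beyond document-loading cost.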
This is a pretty common issue that I've been grappling with by chance
recently. The main point is that the parser is NOT a boolean logic
parser.
Search the mail archive for the thread "bad query parser bug" and
you'll find a good discussion.
I tried using PrecedenceQueryParser, but that didn
Good to hear :-)
I am curious, how many custom changes are you making to the code that
this is even an issue? Perhaps submitting patches and working to get
them committed would be a more efficient strategy.
Well, there are 3 problems I see.
1. There are very good patches on all of the lucene
Don't know if it's useful or not, but if you used TopDocs instead,
you have access to an array of ScoreDoc which you could modify
freely. In my app, I used a FieldSortedHitQueue to re-sort things
when I needed to.
Erick
On 3/22/07, Antony Bowesman <[EMAIL PROTECTED]> wrote:
I have indexed obj
On Mar 22, 2007, at 8:16 AM, rubdabadub wrote:
On 3/22/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
Nice idea and I can see the benefit of it to you and I don't mean to
be a wet blanket on it, I just wonder about the legality of it.
People may find it and think it is the official Apache Luce
Otis,
I hadn't really thought about this, but it would be easy to build a
dictionary from an existing Lucene index. The main caveat is that it would
only work with "stored" fields. That's because this spellchecker boosts
accuracy using pair frequencies in addition to term frequencies, and Lucene
On 3/22/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
Nice idea and I can see the benefit of it to you and I don't mean to
be a wet blanket on it, I just wonder about the legality of it.
People may find it and think it is the official Apache Lucene, since
it is branded that way. I'm not a lawye
Nice idea and I can see the benefit of it to you and I don't mean to
be a wet blanket on it, I just wonder about the legality of it.
People may find it and think it is the official Apache Lucene, since
it is branded that way. I'm not a lawyer, so I don't know for sure.
I think you have t
On 3/22/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
Is the point of this that you can make "commits" to Lucene so that
you don't lose your changes on trunk?
Not only that. But I can make as many local branches as I like... for example
customer X, customer Y. This way I can support X and Y as th
Is the point of this that you can make "commits" to Lucene so that
you don't lose your changes on trunk?
On Mar 22, 2007, at 7:14 AM, rubdabadub wrote:
Hi:
First of all, apologies to those friends who follow all the lists.
Often times I work offline and I do not have any commit rights to any
of
Hi:
First of all, apologies to those friends who follow all the lists.
Often times I work offline and I do not have any commit rights to any
of the projects. All the modifications I make for various clients and
trying to keep up to date with latest trunk somehow makes it difficult
for me to just sti
Hi,
Can anyone explain how Lucene handles the query below?
My query is *field1:source AND (field2:name OR field3:dest)*. I've
given this string to the query parser and then searched using a searcher. It
returns correct results. Its query.toString() output is: +field1:source
+(field2:name f
Hi Erick,
excellent insight, thanks a lot. As you would expect, this method works a treat.
thanks a lot for your time!
Emanuel
- Original Message -
From: "Erick Erickson" <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, March 21, 2007 2:12:49 PM (GMT+0100) Europe/Berl
Melanie Langlois wrote:
Well, thanks, sounds like the best option to me. Does anybody use the
PerFieldAnalyzerWrapper? I'm just curious to know if there is any impact on
performance when using different analyzers.
I've not done any specific comparisons between using a single Analyzer and
m
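For reference, wiring up PerFieldAnalyzerWrapper is cheap: it simply dispatches to a per-field analyzer and falls back to a default. A sketch against the Lucene 2.x API (the field names are assumptions):

```java
// Sketch assuming the Lucene 2.x analysis API; "id" and "tags" are
// placeholder field names.
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class AnalyzerSetup {
    public static PerFieldAnalyzerWrapper build() {
        // StandardAnalyzer handles every field unless overridden below.
        PerFieldAnalyzerWrapper wrapper =
                new PerFieldAnalyzerWrapper(new StandardAnalyzer());
        // Keep "id" as a single untokenized term; simple lowercasing for "tags".
        wrapper.addAnalyzer("id", new KeywordAnalyzer());
        wrapper.addAnalyzer("tags", new SimpleAnalyzer());
        return wrapper;
    }
}
```

The wrapper itself adds only a map lookup per field, so any performance difference comes from the underlying analyzers, not from the dispatch.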