Hi Maryam,
You can index the content of a specific field as UN_TOKENIZED and then
do a phrase search on that field.
It will match only whole phrases, not individual tokens.
To index HTML pages you can use any HTML parser.
This may be useful to you:
http://lucene.apache.org/java/docs/api/org/apache
- Original Message -
From: "Maryam" <[EMAIL PROTECTED]>
To:
Sent: Thursday, March 15, 2007 7:55 AM
Subject: Indexing HTML pages and phrases
Hi,
I am wondering if we can index a phrase (not term) in
Lucene? Also, I am not sure if it can index HTML
pages? I need to have access to the
On 15 Mar 2007, at 04:09, Otis Gospodnetic wrote:
eks dev and others - have you tried using the code from
LUCENE-584? Noticed any performance increase when you disabled
scoring? I'd like to look at that patch soon and commit it if
everything is in place and makes sense, so I'm curious if you
On 14 Mar 2007, at 21:47, Ryan O'Hara wrote:
Is there a SpellChecker.jar compatible with Lucene 2.1? After
updating to Lucene 2.1, I seem to have lost the ability to create a
spell index using spellchecker-2.0-rc1-dev.jar. Any help would be
greatly appreciated.
Can you explain the problem
Thanks for the detailed response, Hoss. That's the sort of in-depth golden nugget
I'd like to see in a copy of LIA 2 when it becomes available...
I've wanted to use Filter to cache certain of my Term Queries, as it looked
faster for straight Term Query searches, but Solr's DocSet interface abstr
eks dev and others - have you tried using the code from LUCENE-584? Noticed
any performance increase when you disabled scoring? I'd like to look at that
patch soon and commit it if everything is in place and makes sense, so I'm
curious if you or anyone else already tried this patch...
Thanks,
Hello Dear Lucene Users!
Back in the old days (well, last year) the lucene/java/trunk subversion
path was always stable enough for everyone to use in production code.
Now, with the 2.0/2.1/2.2 branches, is it still the case?
In December, I 'ported' my app to use the lucene 2.0 release.
Hi,
I am wondering if we can index a phrase (not term) in
Lucene? Also, I am not sure if it can index HTML
pages? I need to have access to the text of some of
the tags, and I am not sure if this can be done in Lucene. I
would be so glad if you could help me with this.
Thanks
If I remember correctly, I once searched over 40G of indexes using
multi-searcher with 512M max heap size, how much memory did you give the JVM?
Thanks,
Xiaocheng
senthil kumaran <[EMAIL PROTECTED]> wrote: Hi.
I have more index directories (>6), all in GB, and searching my query with
single Ind
Hey, thanks for the quick reply.
I've considered using a secondary index just for this data but
thought I would look at storing the data in lucene first, since
ultimately this data gets transported to an outside system, and it's
a lot easier if there's only one "thing" to transfer. The
d
If you search the mail archive for "update in place" (no quotes),
you'll find extensive discussions of this idea. Although you're
raising an interesting variant because you're talking about a non-
indexed field, so now I'm not sure those discussions are relevant.
I don't know of anyone who has do
Hi there,
I'm using lucene to index and store entries from a database table for
ultimate retrieval as search results. This works fine. But I find
myself in the position of wanting to occasionally (daily-ish) bulk-
update a single, stored, non-indexed field in every document in the
index,
: > the only real reason you should really need 2 searchers at a time is if
: > you are searching other queries in parallel threads at the same time ...
: > or if you are warming up one new searcher that's "ondeck" while still
: > serving queries with an older searcher.
:
: Hoss, I hope I misunder
Is there a SpellChecker.jar compatible with Lucene 2.1? After
updating to Lucene 2.1, I seem to have lost the ability to create a
spell index using spellchecker-2.0-rc1-dev.jar. Any help would be
greatly appreciated.
Thanks,
Ryan
--
Chris Hostetter wrote:
the only real reason you should really need 2 searchers at a time is if
you are searching other queries in parallel threads at the same time ...
or if you are warming up one new searcher that's "ondeck" while still
serving queries with an older searcher.
Hoss, I hope I mi
just to complete this fine answer,
there is also Matcher patch (https://issues.apache.org/jira/browse/LUCENE-584)
that could bring the best of both worlds via e.g. ConstantScoringQuery or
another abstraction that enables disabling Scoring (where appropriate)
- Original Message
From: Ch
: I'm thinking something like +pizza^0 garlic^1 "goat cheese"^-1
that does in fact work.
: 2) Once I have this list of results, can I change their rank order without
: having to do a full scale search again?
the frequency of "pizza" won't affect the score at all, so you should need
to do much
Most search engine technologies return result sets based on some weighted
frequency of the search terms found. I've got a new problem: I want to rank
by different criteria than I searched for.
For example, I might want to return as my result set all documents that
contain the word pizza, but rank t
it's kind of an Apples/Oranges comparison .. in the examples you gave
below, one is executing an arbitrary query (which could be anything) the
other is doing a simple TermEnumeration.
Assuming that Query is a TermQuery, the Filter is theoretically going to be
faster because it doesn't have to comput
: I just have two IndexSearchers opened now most of the time, which is
: deprecated,
: But I think that's my only choice !
2 searchers is fine ... it's "N" where N is not bounded that you want to
avoid.
from what i understand of your requirements, you don't *really* need two
searchers open ... ope
OK, I caused more confusion than rendered help by my stemming
statement. The only reason I mentioned it was to illustrate that
performance is not linearly related to size.
It took some effort to put stemming into the index, see
PorterStemmer etc. This is NOT the default. So I took it out
to see w
Thanks Steven and Antony.
I read the FAQ not very long ago, but that slipped my attention. Or
perhaps it's a recent change.
- Øystein -
--
Øystein Reigem, The department of culture, language and information technology (Aksis), Allegt
27, N-5007 Bergen, Norway. Tel: +47 55 58 32 42. Fax: +47
I'm searching a 20GB index and my searching JVM is allocated 1Gig.
However, my indexing app only had 384MB available to it, which means you
can get away with far less. I believe certain index tables will need to
be swapped in and out of memory though so it may not search as quickly.
With a 1.
On 14 Mar 2007, at 14:51, Bhavin Pandya wrote:
What I am looking for is a dictionary for a spell checker.
I am trying to customise the Lucene spell checker for phrases,
so I am thinking that if I can somehow fetch phrases from the index
itself, then I can train my spell checker.
I tried with query logs but
When your app gets a java.lang.OutOfMemoryError.
--
Ian.
On 3/14/07, Dennis Berger <[EMAIL PROTECTED]> wrote:
Ian Lea wrote:
> No, you don't need 1.8Gb of memory. Start with default and raise if
> you need to?
how do I know when I need it?
> Or jump straight in at about 512Mb.
>
>
> -
Ian Lea wrote:
No, you don't need 1.8Gb of memory. Start with default and raise if
you need to?
how do I know when I need it?
Or jump straight in at about 512Mb.
--
Ian.
On 3/14/07, Dennis Berger <[EMAIL PROTECTED]> wrote:
Do I have to keep something in mind to do searching on large ind
hi Erick,
Well, typically my application will start with some hundreds of
indexes...and then grow at a rate of several per day, for ever. At
some point I know I can do some merging etc if needed.
Size is dependent on the customer, could be up to 1G per index. That
is why I would like to minim
No, you don't need 1.8Gb of memory. Start with default and raise if
you need to?
Or jump straight in at about 512Mb.
--
Ian.
On 3/14/07, Dennis Berger <[EMAIL PROTECTED]> wrote:
Do I have to keep something in mind to do searching on large indices?
I actually have an index with a size of 1.8g
Do I have to keep something in mind to do searching on large indices?
I actually have an index with a size of 1.8gb. I have indexed 1.5
million items from Amazon.
How much memory do I have to give to the jvm?
As a sidenote I have to tell you that I optimized the index so it's one
segment file.
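A minimal sketch of how one might act on the "start with the default and raise if you need to" advice in this thread: before touching any settings, print how much heap the JVM was actually given and how much is in use. The class and method names (HeapCheck, maxHeapMegabytes, usedHeapMegabytes) are hypothetical and not part of Lucene; only the standard java.lang.Runtime calls are used.

```java
public class HeapCheck {
    private static final long MB = 1024L * 1024L;

    // Upper bound the JVM will try to use for the heap (set with -Xmx).
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    // Heap currently occupied: allocated heap minus the free part of it.
    static long usedHeapMegabytes() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB):  " + maxHeapMegabytes());
        System.out.println("used heap (MB): " + usedHeapMegabytes());
    }
}
```

Running the search application with, for example, java -Xmx512m and logging these two numbers around a few typical searches gives a rough idea of the headroom before an OutOfMemoryError becomes likely.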
Hi Erick,
What I am looking for is a dictionary for a spell checker.
I am trying to customise the Lucene spell checker for phrases,
so I am thinking that if I can somehow fetch phrases from the index itself, then
I can train my spell checker.
I tried with query logs, but they have a lot of spelling mistakes...
Any
I found that reducing my index from 8G to 4G (through not stemming) gave me
about a 10% performance improvement.
How did you do this? I don't see this as an option.
Jeff
How much memory are you allocating for your JVM? Because you're
paying a huge search time penalty by opening and closing your
searcher sequentially, it would be a good thing to not do this.
But, as you say, if you're getting OOM errors, that's a problem.
What is the total size of all your indexes
Store as little as possible, index as little as possible.
How big is your index, and how much do you expect it to grow?
I ask this because it's probably not worth your time to try to
reduce the index size below some threshold... I found that
reducing my index from 8G to 4G (through not stemm
Hi.
I have more index directories (>6), all in GB, and I am searching my query with
a single IndexSearcher over all indexes one after another, i.e. I create one
IndexSearcher for index1 and search over that. Finally I close that and
create a new IndexSearcher for index2 and so on. If I get 200 total results
Your problem statement lends itself to flippant answers like "just
use a PhraseQuery". So I clearly don't understand what you're trying
to accomplish. Are you trying to find all of the occurrences of a
particular phrase? All the phrases (however that's defined) for
all the documents? What problem
From
http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.TermVector.html:
"A term vector is a list of the document's terms and their number of
occurences in that document."
--
Ian.
On 3/14/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
Yes but what is a term vector?
---
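To make the quoted definition concrete, here is a small plain-Java sketch that builds "a list of the document's terms and their number of occurrences" for one piece of text. It does not use the Lucene API: the class TermVectorSketch, the method termFrequencies, and the simple split-on-non-alphanumerics tokenization are illustrative assumptions, not what a Lucene Analyzer does.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TermVectorSketch {
    // Maps each term of a single document to how often it occurs there.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new TreeMap<>();
        // Assumed tokenization: lowercase, split on anything that is not a letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        Map<String, Integer> vector =
                termFrequencies("Goat cheese pizza with garlic and more cheese");
        for (Map.Entry<String, Integer> entry : vector.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

In Lucene itself the vector is stored per field at index time, and only for fields that were indexed with term vectors enabled, which is why the earlier reply speaks of "all the fields on that document".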
Yes but what is a term vector?
-Original Message-
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: 13 March 2007 19:28
To: java-user@lucene.apache.org
Subject: Re: IndexReader.GetTermFreqVectors
It means it returns the term vectors for all the fields on that document
where you have
Hi,
I want to make my index as small as possible. I noticed
field.setOmitNorms(true), and I read on the list that the difference is 1 byte per
field per doc, not huge but hey... Is the only effect that the score is
different? I hardly care about the score, so that would be OK.
And can I add to an index witho
Hello guys,
I am using Lucene 1.9 and I have a 3GB index.
I know we can extract tokens from the index easily, but can we extract phrases?
Regards.
Bhavin Pandya
>> Is that the same reader that is used in IndexSearcher?
I opened an IndexSearcher on the path (String) to the index.
Now I tried to open on the clone IndexReader and use the constructor
that has an IndexReader as param,
and I got everything working now
I just have two IndexSearchers opene