This is somewhat related to a question sent to this list a while ago: Is
there an efficient way to count the number of occurrences of a phrase (not
term) in an index?
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional c
You need to make sure you are indexing with Term Vectors in order for
IndexReader.getTermFreqVector to return anything meaningful. You do not
need to implement it.
QueryTermVector is meant to provide, for Queries, information similar to
what term vectors provide on the Document side.
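As a rough illustration of what a term vector holds, a document's distinct terms paired with their in-document frequencies, here is a minimal sketch in plain Java. It does not use the Lucene API; the class and method names are made up for the example, and the input is assumed to be already tokenized.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: models a per-document term vector
// as a sorted map from term text to its frequency in that document.
class TermFrequencySketch {

    // Count how often each token occurs; the input is assumed to be the
    // output of whatever analyzer was used for the field.
    static SortedMap<String, Integer> termFrequencies(List<String> tokens) {
        SortedMap<String, Integer> freqs = new TreeMap<>();
        for (String token : tokens) {
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("lucene", "index", "lucene");
        System.out.println(termFrequencies(tokens)); // {index=1, lucene=2}
    }
}
```

Lucene only has this information for a document if the field was indexed with term vectors, which is why the indexing step matters.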
For an example demo of indexing and using t
Hi Otis,
The Lucene server is actually CPU and network bound, as the index gets
memory mapped pretty quickly. There is little disk activity observed.
I was also able to run the server on a Sun box last night with 4 dual-core
Opterons (same Linux and JVM), and I'm observing query rates of 400 qps!
Can Nutch be made to use the Lucene query parser?
Rgds
Prabhu
On 2/23/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
>
> Hi Otis,
>
> The Lucene server is actually CPU and network bound, as the index gets
> memory mapped pretty quickly. There is little disk activity observed.
>
> I was also able to run
I would give the IBM or Blackdown JVM a try on Linux - I've seen pretty
wide variance in their speed on different operations.
Sometimes better than Sun, sometimes worse - it depended on the task (I
did some ad hoc tests at one point that showed Sun was faster for
indexing, but IBM was faster fo
Hi,
I need to find all distinct values for a keyword field in a Lucene index.
Is this easily done? If so how?
Many thanks,
Hugh
Hi,
We have a custom built document repository which is searchable / indexed via
lucene.
I want to put together some kind of navigation framework based on the
repository metadata (which is also indexed with lucene).
Is there a best-practice way to do this?
Thanks,
Hugh
Hugh Ross wrote:
I need to find all distinct values for a keyword field in a Lucene index.
I think the IndexReader.terms() method will do what you want. Good luck!
--MDC
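
The enumeration pattern behind this suggestion can be sketched as follows. This is a toy model in plain Java, not the Lucene API: it assumes a term dictionary sorted by field and then by term text, so that all values of one field sit next to each other. The class name, method names, and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy term dictionary (hypothetical, not Lucene): each entry is stored as
// "field" + '\0' + "text", so the natural string order groups entries by field.
class DistinctFieldValues {
    private static final char SEP = '\u0000';
    private final TreeSet<String> dictionary = new TreeSet<>();

    // Adding the same (field, text) pair twice keeps a single entry,
    // which is what makes the enumeration below "distinct".
    void add(String field, String text) {
        dictionary.add(field + SEP + text);
    }

    // Seek to the first entry of the field, then walk forward until the
    // field changes.
    List<String> distinctValues(String field) {
        String prefix = field + SEP;
        List<String> values = new ArrayList<>();
        for (String entry : dictionary.tailSet(prefix)) {
            if (!entry.startsWith(prefix)) {
                break; // moved past the last term of this field
            }
            values.add(entry.substring(prefix.length()));
        }
        return values;
    }

    public static void main(String[] args) {
        DistinctFieldValues dict = new DistinctFieldValues();
        dict.add("category", "lenses");
        dict.add("category", "cameras");
        dict.add("category", "cameras");
        dict.add("title", "eos");
        System.out.println(dict.distinctValues("category")); // [cameras, lenses]
    }
}
```

Because the walk stops as soon as the field changes, the cost depends on the number of distinct values in that field rather than on the number of documents.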
Many Thanks.
Hugh
-Original Message-
From: Michael D. Curtin [mailto:[EMAIL PROTECTED]
Sent: 23 February 2006 17:39
To: java-user@lucene.apache.org
Subject: Re: SQL DISTINCT functionality in Lucene
Hugh Ross wrote:
> I need to find all distinct values for a keyword field in a Lucene i
I reindexed with the path as a keyword field and now the PrefixQuery filter
does exactly what I need. Thanks!
I'm going to hold off on the paragraph-level indexing for now, but that does
sound interesting.
many thanks,
John
-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECT
Hi,
Please ask on the Nutch mailing list (I answered your question in general@
already).
Also, please don't steal other people's threads - it's considered impolite for
obvious reasons.
Otis
- Original Message
From: Raghavendra Prabhu <[EMAIL PROTECTED]>
To: java-user@lucene.apache.or
Hi
Sorry for the trouble
I was sending my first mail to the group
and replied to this thread and then later on sent a direct mail.
I would like to apologise for the inconvenience caused.
Rgds
Prabhu
On 2/23/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> Please ask on the Nutch m
Hi luceners,
I have a problem and I don't know what to do.
I want to use the ISOLatin1AccentFilter that I found in the Lucene trunk.
The code in my analyzer is:
public final TokenStream tokenStream(String fieldName, Reader reader) {
if (fieldName == null) throw new IllegalArgumentException("fiel
We discovered that the kernel was only using 8 CPUs. After recompiling for
16 (8+hyperthreads), it looks like the query rate will settle in around
280-300 qps. Much better, although still quite a bit slower than the
Opteron.
Peter
On 2/22/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> Hmmm, n
Peter,
Have you given the JRockit JVM a try? I've seen it help throughput
compared to Sun's JVM on a dual Xeon/Linux machine, especially with
concurrency (up to 6 concurrent searches happening). I'm curious to
see if it makes a difference for you.
-chris
On 2/23/06, Peter Keegan <[EMAIL PROTECTED]>
On Feb 23, 2006, at 1:22 PM, Daniel Cortes wrote:
Hi luceners,
I have a problem and I don't know what to do.
I want to use the ISOLatin1AccentFilter that I found in the Lucene trunk.
The code in my analyzer is:
public final TokenStream tokenStream(String fieldName, Reader
reader) {
if (fie
On Feb 23, 2006, at 12:37 PM, Hugh Ross wrote:
Hi,
We have a custom built document repository which is searchable /
indexed via
lucene.
I want to put together some kind of navigation framework based on the
repository metadata (which is also indexed with lucene).
Is there a best-practice
On Feb 22, 2006, at 9:01 PM, David Pratt wrote:
Hi Erik. Many thanks for your reply. I'll likely see if I can find
a list to pose a couple of questions their way. I am having fun
with Lucene since it is new to me and I am impressed with the speed
I am getting. I am reading anything I can ge
Chris,
I tried JRockit a while back on 8-cpu/windows and it was slower than Sun's.
Since I seem to be cpu-bound right now, I'll be trying a 16-cpu system next
(32 with hyperthreading), on LinTel. I may give JRockit another go around
then.
Thanks,
Peter
On 2/23/06, Chris Lamprecht <[EMAIL PROTECT
I have been trying to figure out why my query below would not return any
hits.
I use two custom analyzers for indexing and searching. The one I use for
indexing uses this:
public TokenStream tokenStream(String fieldName, Reader reader)
{
TokenStream result = new StandardTokenizer
Wow, some resources!
Would it be cheaper / more scalable to copy the index to multiple
boxes and loadbalance requests across them?
-Yonik
On 2/23/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> Since I seem to be cpu-bound right now, I'll be trying a 16-cpu system next
> (32 with hyperthreading), o
Yonik,
We're investigating both approaches.
Yes, the resources (and permutations) are dizzying!
Peter
On 2/23/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> Wow, some resources!
> Would it be cheaper / more scalable to copy the index to multiple
> boxes and loadbalance requests across them?
>
>
1) Have you looked at what tokens your indexing analyzer produces when you
tokenize "ES-20D" ?
2) Have you looked at what tokens your query analyzer produces when you
tokenize "ES-20D" ?
3) Have you tried a simpler query (i.e., just "content:es\-20d" ) ?
4) When giving QueryParser a (quoted) p
In my earlier email I put in the wrong query. The query I am actually
searching on is: EOS-20D
And this is the query in question, which still produces no hits:
+(+content:eos\-20d) +entity:product +(title:"eos\-20d"~2^40.0
((title:eos\-20d)^10.0) content:"eos\-20d"~2^20.0 (content:eos
Hi everyone,
Sorry for not replying to original post (from Muffadal Khumri, 22/2) - I'm
new to the list.
I also had this problem, but it seems not to be in the source - downloading
and building the 1.9-rc1 source fixed the problem for me.
Steve
Stephen Gray
Archive Research Officer
Austral
Follow up on my previous email ...
When I execute this query in Luke with the standard analyzer on the
same index, I get 8 hits.
+(+content:eos\-20d) +entity:product +(title:"eos\-20d"~2^40.0
((title:eos\-20d)^10.0) content:"eos\-20d"~2^20.0 (content:eos\-20d)
categoryName:"eos\-20d"^80.0)
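
One possible explanation, not confirmed in this thread, is that the indexing analyzer and the query analyzer tokenize "EOS-20D" differently. The sketch below is plain Java with two made-up analyzers (neither is the poster's code nor a Lucene class). It only shows how a term query can miss when the two sides disagree about the hyphen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical analyzers for illustration only; the real analyzers in the
// thread are not shown in full, so their behavior here is an assumption.
class AnalyzerMismatch {

    // Lowercases and splits on anything that is not a letter or digit,
    // so "EOS-20D" becomes two terms: "eos" and "20d".
    static List<String> splitOnPunctuation(String text) {
        return tokens(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"));
    }

    // Lowercases and splits on whitespace only,
    // so "EOS-20D" stays one term: "eos-20d".
    static List<String> splitOnWhitespace(String text) {
        return tokens(text.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    private static List<String> tokens(String[] parts) {
        List<String> out = new ArrayList<>();
        for (String part : parts) {
            if (!part.isEmpty()) {
                out.add(part);
            }
        }
        return out;
    }

    // A single-term query matches only if that exact term was indexed.
    static boolean termQueryMatches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        List<String> indexed = splitOnPunctuation("Canon EOS-20D body");
        String queryTerm = splitOnWhitespace("EOS-20D").get(0);
        System.out.println(indexed);                              // [canon, eos, 20d, body]
        System.out.println(termQueryMatches(indexed, queryTerm)); // false
    }
}
```

Printing the tokens from both analyzers for the same input, as in `main` above, is a quick way to check whether this kind of mismatch is involved.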
I searched for my question in the mail archive and found that what I really
want is a phrase frequency. It is an old question that was never resolved well.
I traced the Lucene source code and discovered that I can get a phrase's IDF
from the Hits object
weight= PhraseQuery$PhraseWeight (id=62)
idf= 8.
Thanks Erik. I am continuing to experiment and making good progress. I
have got my basic functionality established and am now looking at
sorting and ranking. I guess the good thing is I can adjust and modify
things as I learn more. I am reading some archived material from the
list as well to g
Not sure if this is what you want, but what I have done is issue
exact phrase queries to Lucene and count the number of hits found.
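
Counting hits gives the number of documents that contain the phrase, which is not necessarily the number of times the phrase occurs. The difference can be sketched in plain Java over already-tokenized documents; this is an illustration with hypothetical names, not Lucene code and not an efficient index-based method.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper contrasting "documents matching a phrase" with
// "total phrase occurrences" over in-memory token lists.
class PhraseCounts {

    // Number of positions in one document where the phrase starts.
    // Overlapping matches are counted separately.
    static int occurrences(List<String> tokens, List<String> phrase) {
        if (phrase.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            if (tokens.subList(i, i + phrase.size()).equals(phrase)) {
                count++;
            }
        }
        return count;
    }

    // What counting hits of an exact phrase query corresponds to.
    static int matchingDocs(List<List<String>> docs, List<String> phrase) {
        int hits = 0;
        for (List<String> doc : docs) {
            if (occurrences(doc, phrase) > 0) {
                hits++;
            }
        }
        return hits;
    }

    // Total occurrences across all documents.
    static int totalOccurrences(List<List<String>> docs, List<String> phrase) {
        int total = 0;
        for (List<String> doc : docs) {
            total += occurrences(doc, phrase);
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("new", "york", "new", "york"),
                Arrays.asList("new", "jersey"),
                Arrays.asList("york", "new", "york"));
        List<String> phrase = Arrays.asList("new", "york");
        System.out.println(matchingDocs(docs, phrase));     // 2
        System.out.println(totalOccurrences(docs, phrase)); // 3
    }
}
```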
On 2/23/06, Eric Jain <[EMAIL PROTECTED]> wrote:
> This is somewhat related to a question sent to this list a while ago: Is
> there an efficient way to count the