Dave,
Thanks for the pointer. The Wrapper worked marvellously! This was exactly the
situation - wanting to treat the standard fields and keyword fields differently
as far as stemming is concerned (no stemming for the latter).
- Dmitry
From: Dave Kor [mailt
: for this site, but would you cache all manufacturers and intersect all with
: the initial query in one page load? Seems like that would be a lot.
Yep it is a lot, but if you've got the RAM, it's not that time intensive.
At CNET, depending on what page you are looking at, I'm doing
anywhere from 1
Chris.. thanks for your quick response.
:doing a few thousand BitSet intersections doesn't take as much time as you
think
Even if the BitSet is around 4-5 million? and I would have to quickly go
through about a thousand of these?
I guess I would have to decide what sub-cats to cache the bitsets fo
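
A minimal sketch of the cached-BitSet idea discussed here, using only java.util.BitSet rather than any Lucene class. The class name FacetCounts, the method topCategories, and the map of per-category bit sets are assumptions made for illustration; the thread does not show the actual CNET code.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not part of Lucene and not taken from the thread.
final class FacetCounts {

    /**
     * Returns up to n category names, ordered by how many of the query's
     * hits fall inside each cached category bit set (highest count first).
     */
    public static List<String> topCategories(BitSet queryHits,
                                             Map<String, BitSet> cachedCategories,
                                             int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, BitSet> entry : cachedCategories.entrySet()) {
            // Clone first so the query's own bit set is left untouched.
            BitSet both = (BitSet) queryHits.clone();
            both.and(entry.getValue());
            int count = both.cardinality();
            if (count > 0) {
                counts.put(entry.getKey(), count);
            }
        }
        List<String> names = new ArrayList<>(counts.keySet());
        // Sort by count descending, then by name so ties are deterministic.
        names.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(names.subList(0, Math.min(n, names.size())));
    }
}
```

Each intersection is a word-by-word AND over the underlying long array, which is why the cost per category stays small; the memory cost is roughly one bit per document per cached category.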
Yes, it's hyperthreaded (16 cpus show up in task manager - the box is
running 2003). I plan to turn off hyperthreading to see if it has any
effect.
Peter
On 1/25/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> On 1/25/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> > It's a 3GHz Intel box with Xeo
You will likely find this thread interesting...
http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-Product-Category-Listings-t266441.html
: 1) Do queries for each sub-category using the results of the first initial
: query and use the hits count to select the sub-categories to displa
On 1/25/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> It's a 3GHz Intel box with Xeon processors, 64GB ram :)
Nice!
Xeon processors are normally hyperthreaded. On a linux box, if you
cat /proc/cpuinfo, you will see 8 processors for a 4 physical CPU
system. Are you positive you have 8 physical X
The index is non-compound format and optimized. Yes, I did try
MMapDirectory, but the index is too big - 3.5 GB (1.3GB is term vectors)
Peter
On 1/25/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> Peter Keegan wrote:
> > This is just fyi - in my stress tests on an 8-cpu box (that's 8 real cpus
: I want a query of the form:
:
: x AND ( a OR b OR c OR d)
what your code is currently doing is adding 5 term queries to a single
boolean query.
The structure you want is not a single boolean query, it's a boolean query
containing two mandatory clauses: the first being a term query, and the
sec
It's a 3GHz Intel box with Xeon processors, 64GB ram :)
Peter
On 1/25/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> Thanks Peter, that's useful info.
>
> Just out of curiosity, what kind of box is this? what CPUs?
>
> -Yonik
>
> On 1/25/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> > This is
Hi,
Quick reactions:
- Do use -server option, it makes a difference, and I don't think there is much
to test there (I've never run a daemon-like service without the -server option,
and have seen the improvement in performance due to HotSpot with my own eyes)
- Optimizing every hour sounds like a
Peter Keegan wrote:
This is just fyi - in my stress tests on an 8-cpu box (that's 8 real cpus),
the maximum throughput occurred with just 4 query threads. The query
throughput decreased with fewer than 4 or greater than 4 query threads. The
entire index was most likely in the file system cache, t
Thanks Peter, that's useful info.
Just out of curiosity, what kind of box is this? what CPUs?
-Yonik
On 1/25/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> This is just fyi - in my stress tests on an 8-cpu box (that's 8 real cpus),
> the maximum throughput occurred with just 4 query threads. The
I have ~5 million documents that are in categories and subcategories. Let us
say that my query is for search terms in one top-level category and it
returns a large number of documents and I want to list the top 5
sub-categories by highest count total. I know I can't go one by one counting
through t
This is just fyi - in my stress tests on an 8-cpu box (that's 8 real cpus),
the maximum throughput occurred with just 4 query threads. The query
throughput decreased with fewer than 4 or greater than 4 query threads. The
entire index was most likely in the file system cache, too. Periodic
snapshots
On Jan 25, 2006, at 6:39 AM, Gwyn Carwardine wrote:
Yes I think you're right. On reading the "lucene in action" chapter on
highlighting I found it squirreled in the middle of the text. I get the
feeling that whilst I have so far found query parser to be the primary
method of building queri
Hi,
I have looked at MoreLikeThis functionality. I would like to
add moreDisLikeThis functionality as well. It is important for me to
learn from similarity as well as dissimilarity with other documents. I
have done the basic ground work of forming two queries (one with
MoreLikeThis c
Michael Pickard wrote:
Can anyone help me with the formation of a BooleanQuery ?
I want a query of the form:
x AND ( a OR b OR c OR d)
You're going to need 2 BooleanQuery objects, one for the OR'd expression
in parentheses, and another for the AND expression. Something like
this:
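
A self-contained sketch of that nesting, written against hypothetical stand-in classes rather than Lucene itself so that it runs without the library. The names NestedBooleanSketch, Bool, term, and xAndAnyOf are assumptions; only the add(query, required, prohibited) argument order is borrowed from the call shown in the question. The stand-in treats a query with no required clauses as matching when at least one optional clause matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of nested boolean queries; these are NOT Lucene classes.
final class NestedBooleanSketch {

    interface Query {
        boolean matches(Set<String> docTerms);
    }

    /** A single-term query: matches when the document contains the term. */
    static Query term(String text) {
        return docTerms -> docTerms.contains(text);
    }

    static final class Bool implements Query {
        private final List<Query> required = new ArrayList<>();
        private final List<Query> optional = new ArrayList<>();
        private final List<Query> prohibited = new ArrayList<>();

        /** Same argument order as the add(query, required, prohibited) call in the question. */
        Bool add(Query query, boolean isRequired, boolean isProhibited) {
            if (isProhibited) {
                prohibited.add(query);
            } else if (isRequired) {
                required.add(query);
            } else {
                optional.add(query);
            }
            return this;
        }

        @Override
        public boolean matches(Set<String> docTerms) {
            for (Query q : prohibited) {
                if (q.matches(docTerms)) return false;
            }
            for (Query q : required) {
                if (!q.matches(docTerms)) return false;
            }
            if (!required.isEmpty()) return true;
            // With no required clauses, at least one optional clause must match.
            for (Query q : optional) {
                if (q.matches(docTerms)) return true;
            }
            return false;
        }
    }

    /** Builds x AND (a OR b OR ...): an inner OR group nested as a required clause. */
    static Query xAndAnyOf(String x, String... alternatives) {
        Bool anyOf = new Bool();
        for (String alternative : alternatives) {
            anyOf.add(term(alternative), false, false); // optional clause = OR
        }
        return new Bool()
                .add(term(x), true, false)   // x is mandatory
                .add(anyOf, true, false);    // the whole OR group is mandatory
    }
}
```

Adding all five terms to one flat query would instead make a, b, c, and d optional extras next to a required x, so a document containing only x would still match.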
Can anyone help me with the formation of a BooleanQuery ?
I want a query of the form:
x AND ( a OR b OR c OR d)
The nearest I've managed to get is
query.add(new TermQuery(new Term(2, "x")),true,false);
Term term = null;
for (int i=1; i
Sorry, forgot to mention: what you do for floats is take everything to the
left of the decimal point, encode this to 16-digit hex (via long), then append
the decimal point and everything following it. The only problem we tend
to find is searching across large ranges either produces an exception
about too ma
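
One way to read the float encoding described above, as a small sketch. DecimalHexCodec and its encode method are hypothetical names, and the sketch assumes non-negative values given as plain decimal strings such as "12.5"; the message does not say how negative numbers or exponent notation are handled.

```java
// Hypothetical helper illustrating the description; assumes non-negative input.
final class DecimalHexCodec {

    /**
     * Encodes the whole-number part as 16 zero-padded hex digits and
     * appends the decimal point and fraction digits unchanged.
     */
    public static String encode(String decimal) {
        int dot = decimal.indexOf('.');
        String whole = dot < 0 ? decimal : decimal.substring(0, dot);
        String rest = dot < 0 ? "" : decimal.substring(dot);
        return String.format("%016x", Long.parseLong(whole)) + rest;
    }
}
```

Because the whole-number part has a fixed width, comparing two encoded strings character by character gives the same order as comparing the numbers, which is what a term-based range search relies on.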
>> Yes I think you're right. On reading the "lucene in action" chapter on
>> highlighting I found it squirreled in the middle of the text. I get the
>> feeling that whilst I have so far found query parser to be the primary
>> method of building queries that this is not the primary method used
On Jan 24, 2006, at 5:43 PM, Gwyn Carwardine wrote:
Yes I think you're right. On reading the "lucene in action" chapter on
highlighting I found it squirreled in the middle of the text. I get the
feeling that whilst I have so far found query parser to be the primary
method of building queries
I can recommend this method; this is how we do it, but what we store in
the index is the long converted to a 16-digit hex number. The extended
parser converts entered queries on long fields to hex. We
obviously also convert back before we display the value. Floating
point numbers
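
A minimal sketch of the long-to-hex conversion described in this message. HexLongCodec, encode, and decode are hypothetical names, not the poster's code, and the sketch assumes non-negative values: a negative long would format as a hex string starting with f-digits and sort after every positive value.

```java
// Hypothetical helper; assumes non-negative longs so that string order equals numeric order.
final class HexLongCodec {

    /** Formats the value as exactly 16 lower-case hex digits, zero-padded on the left. */
    public static String encode(long value) {
        return String.format("%016x", value);
    }

    /** Reverses encode(), e.g. before displaying a stored value. */
    public static long decode(String hex) {
        return Long.parseUnsignedLong(hex, 16);
    }
}
```

The fixed width matters more than the base: without the zero padding, "a" (10) would sort after "64" (100) in a lexicographic term comparison.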
On Jan 25, 2006, at 12:50 AM, Ravi wrote:
I am also having a problem with the highlighter: when I want to highlight
a specific field in Lucene, it is not working
Improvements were made to the Highlighter in December to add field-
specific highlighting capability. Here's the svn log:
-