Otis,
If I'm not mistaken, block size (especially on ext3)
becomes an issue when you hit a peak amount of total
blocks, and you lose performance on inode lookup compared with
Reiserfs. For example, you may gain performance by
going to 4k rather than 1k blocks on ext3; however, Reiserfs at that
block size should be xx
The reason we don't use the Google appliance is that our company doesn't give
recommendations on which OS or hardware to run. It would look a little weird if we
said, "Oh, you have to buy this hardware for our search engine, but for our core
technology, feel free to deploy it anywhere you want." It just
Otis Gospodnetic wrote:
Michael,
Actually, one more thing - you said you changed the
store/BufferedIndexOutput.BUFFER_SIZE from 1024 to 4096 and that turned out to
yield the fastest indexing. Does your FS block size also happen to be 4k
(dumpe2fs output) on that FC3 box? If so, I wonder if
Michael,
Actually, one more thing - you said you changed the
store/BufferedIndexOutput.BUFFER_SIZE from 1024 to 4096 and that turned out to
yield the fastest indexing. Does your FS block size also happen to be 4k
(dumpe2fs output) on that FC3 box? If so, I wonder if this is more than just a
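To make the buffer-size vs. block-size question concrete: if a file is written sequentially from offset 0 and the write buffer is a whole multiple of the filesystem block size, every flush covers whole blocks; a buffer smaller than the block (1k buffer on 4k blocks) writes only part of a block per flush. The sketch below is plain JDK arithmetic, not Lucene code; the class and method names are hypothetical, and it ignores the page cache, readahead, and ext3 journaling, so it cannot by itself explain a measured indexing speedup.

```java
// Hypothetical helper: pure arithmetic relating an output buffer size to the
// filesystem block size. It models nothing about caching or journaling.
final class BufferBlockMath {
    private BufferBlockMath() {}

    /** Filesystem blocks spanned by one full buffer flush (ceiling division). */
    static long blocksPerFlush(long bufferSize, long blockSize) {
        if (bufferSize <= 0 || blockSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (bufferSize + blockSize - 1) / blockSize;
    }

    /** True when each flush fills whole blocks (assuming sequential writes from offset 0). */
    static boolean alignsWithBlocks(long bufferSize, long blockSize) {
        blocksPerFlush(bufferSize, blockSize); // reuse the argument validation
        return bufferSize % blockSize == 0;
    }

    /** Smallest multiple of blockSize that is >= requested, i.e. a block-aligned buffer size. */
    static long roundUpToBlock(long requested, long blockSize) {
        return blocksPerFlush(requested, blockSize) * blockSize;
    }

    public static void main(String[] args) {
        long[] buffers = {1024, 3000, 4096};
        long[] blocks = {1024, 4096};
        for (long buf : buffers) {
            for (long blk : blocks) {
                System.out.printf("buffer=%d block=%d blocks/flush=%d aligned=%b%n",
                        buf, blk, blocksPerFlush(buf, blk), alignsWithBlocks(buf, blk));
            }
        }
    }
}
```

With 1k blocks, a 4096-byte buffer spans 4 blocks per flush; with 4k blocks it spans exactly 1, which is one way to read the "is it more than a coincidence" question.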
Hi,
Thanks for the speedy answer, this is good to know.
However, I was wondering about the FS block size. Consider a Linux box:
$ dumpe2fs /dev/sda1 | grep "Block size"
dumpe2fs 1.36 (05-Feb-2005)
Block size: 1024
That shows /dev/sda1 has 1k blocks. I don't think these
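If you would rather check the block size from the JVM that runs the indexer than with dumpe2fs, the JDK can report what the OS says about the file store holding a path. A minimal sketch, assuming Java 10 or later (`FileStore.getBlockSize()` was added then); on Linux the value comes from the OS and normally, but not necessarily, matches dumpe2fs, and some file systems may throw `UnsupportedOperationException`. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Reports the block size of the file store holding a given path.
// Requires Java 10+ (FileStore.getBlockSize); may throw
// UnsupportedOperationException where the file system does not report it.
final class BlockSizeProbe {
    private BlockSizeProbe() {}

    static long blockSizeOf(Path path) throws IOException {
        FileStore store = Files.getFileStore(path);
        return store.getBlockSize();
    }

    public static void main(String[] args) throws IOException {
        // Default to the current directory, e.g. run it from the index directory.
        Path path = Paths.get(args.length > 0 ? args[0] : ".");
        FileStore store = Files.getFileStore(path);
        System.out.println(store.name() + " (" + store.type() + "): block size "
                + store.getBlockSize() + " bytes");
    }
}
```

Running it with the index directory as the argument shows the block size of the device the index actually lives on, which may differ from /dev/sda1.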
Otis Gospodnetic wrote:
Hi,
I'm wondering if anyone has tested Lucene indexing/search performance with
different file system block sizes?
I just realized one of the servers where I run a lot of Lucene indexing and
searching has an FS with blocks of only 1K in size (typically they are 4k or
I answered it yesterday; please check the archives...
Otis
- Original Message
From: "Aigner, Thomas" <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Fri 10 Feb 2006 09:25:14 AM EST
Subject: RE: 1.9 lucene version
Anyone have a comment on the below message?
-Original Me
Hi,
I'm wondering if anyone has tested Lucene indexing/search performance with
different file system block sizes?
I just realized one of the servers where I run a lot of Lucene indexing and
searching has an FS with blocks of only 1K in size (typically they are 4k or
8k, I believe), so I starte
For reading Word documents as text, you can try AntiWord.
I have written a simplified Lucene that does max-words matching.
For example, if you are searching for aa, bb, and cc, then the document that
contains all the words (aa, bb, cc) will definitely be ranked higher than
documents containing either aa, bb
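The "max words match" ordering described above can be modelled as: count how many distinct query words each document contains and sort by that count, highest first. This is a hypothetical sketch of the idea in plain Java, not the poster's implementation and not Lucene's scoring (Lucene's coord factor rewards matching more clauses, but it also weighs term frequency and other factors).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "max words match" ranking: documents are ordered by
// how many distinct query words they contain; documents matching none are dropped.
final class MaxWordsRanker {
    private MaxWordsRanker() {}

    /** Counts how many distinct query words appear among the document's words. */
    static int matchedWords(Set<String> queryWords, Collection<String> docWords) {
        Set<String> present = new HashSet<>(docWords);
        int count = 0;
        for (String w : queryWords) {
            if (present.contains(w)) count++;
        }
        return count;
    }

    /** Returns document ids sorted by matched-word count, highest first. */
    static List<String> rank(Set<String> queryWords, Map<String, List<String>> docs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : docs.entrySet()) {
            int c = matchedWords(queryWords, e.getValue());
            if (c > 0) counts.put(e.getKey(), c);
        }
        List<String> ids = new ArrayList<>(counts.keySet());
        // List.sort is stable: equal counts keep the input map's iteration order.
        ids.sort((a, b) -> Integer.compare(counts.get(b), counts.get(a)));
        return ids;
    }
}
```

For the query aa, bb, cc, a document containing all three sorts ahead of one containing two, which sorts ahead of one containing only one.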
PDFBox can handle multi-byte encodings. There are a couple of recent fixes
for CJK languages that are not part of 0.7.2 but are part of the nightly
build.
Ben
On Fri, 10 Feb 2006, Zhang, Lisheng wrote:
> Hi,
>
> Currently we are using PDFBox to process PDF files and
> POI to process DOC/XLS fil
: I built a wrong query string "word1,word2,word3" instead of "word1
: word2 word3"
: therefore I got a wrong query: field:"word1 word2 word3" instead of
: field:word1 field:word2 field:word3.
:
: Is this expected behaviour?
: I used Standard analyzer, probably therefore, the commas were re
Hi,
Currently we are using PDFBox to process PDF files and
POI to process DOC/XLS files before sending the strings to Lucene
for indexing.
Does anyone know if PDFBox or POI can process multi-byte
characters like Japanese with various encodings (whatever is
specified in the PDF or DOC)?
Thanks very much for
On 2/10/06, Rajesh Munavalli <[EMAIL PROTECTED]> wrote:
> However I also want to retrieve those documents (in order) where one or more
> of the terms is missing from either of the fields. i.e,
BooleanQuery.setMinimumNumberShouldMatch() in the development version
(1.9) of Lucene may help out in tha
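The rule that `BooleanQuery.setMinimumNumberShouldMatch()` expresses is: a document matches only if at least N of the optional (SHOULD) clauses match. The plain-Java sketch below models just that matching rule so it can be tried without Lucene; the class name is hypothetical, and real Lucene would additionally score the documents that pass.

```java
import java.util.Set;

// Hypothetical model of "minimum number should match" semantics in plain Java.
// It only decides whether a document matches; it does not score it.
final class MinShouldMatch {
    private MinShouldMatch() {}

    /** True if at least minShouldMatch of the optional terms occur in the document. */
    static boolean matches(Set<String> shouldTerms, Set<String> docTerms, int minShouldMatch) {
        int hits = 0;
        for (String t : shouldTerms) {
            if (docTerms.contains(t) && ++hits >= minShouldMatch) {
                return true; // stop as soon as the threshold is reached
            }
        }
        return hits >= minShouldMatch; // also covers minShouldMatch <= 0
    }
}
```

With t1..t4 as optional terms and a threshold of 2, a document containing t1 and t3 still matches even though t2 and t4 are missing, which is the "one or more terms missing" case asked about.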
Does anyone have a good way to formulate the query in terms of performance
as well as ordering of retrieved documents for the following query?
Query: "field1:t1 t2 t3 t4 AND field2:t5 t6 t7"
I want to achieve the following
* The document which matches the query exactly in both the fields gets ran
Dmitry Goldenberg wrote:
Awesome stuff. A few questions: is your Excel extractor somehow
better than POI's? And what do you see as the timeframe for adding
WordPerfect support? Are you considering supporting any other sources
such as MS Project, Framemaker, etc?
I just committed a WordPerfectE
Anyone have a comment on the below message?
-Original Message-
From: Aigner, Thomas
Sent: Wednesday, February 08, 2006 11:50 AM
To: java-user@lucene.apache.org
Subject: 1.9 lucene version
Hello all,
I have a couple of questions for the community about the 1.9
Lucene version.
On Feb 10, 2006, at 4:37 AM, <[EMAIL PROTECTED]>
<[EMAIL PROTECTED]> wrote:
If QueryParser gets a phrase with a number of words (i.e., "here are
words"), it uses the implicit operator OR: "here OR are OR words". LIA
on p94 says the operator "by default is OR", implying that there may be
some
Hi all,
I built a wrong query string "word1,word2,word3" instead of "word1
word2 word3"
therefore I got a wrong query: field:"word1 word2 word3" instead of
field:word1 field:word2 field:word3.
Is this expected behaviour?
I used Standard analyzer, probably therefore, the commas were repl
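The observed result is consistent with how QueryParser generally treats analyzed text: the comma-joined string contains no whitespace, so it reaches the analyzer as one chunk, and when the analyzer splits that chunk into several tokens the parser builds a phrase query from them. One workaround is to normalise the input before parsing. The helper below is a hypothetical sketch in plain Java; note that it would also split commas inside quoted phrases, so it only suits simple term lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-processing step: treat commas as term separators before the
// string reaches QueryParser, so "word1,word2,word3" becomes "word1 word2 word3".
final class QueryStringNormalizer {
    private QueryStringNormalizer() {}

    /** Splits on commas and/or whitespace and rejoins the non-empty pieces with single spaces. */
    static String commasToSpaces(String raw) {
        List<String> terms = new ArrayList<>();
        for (String piece : raw.trim().split("[,\\s]+")) {
            if (!piece.isEmpty()) terms.add(piece); // split can yield a leading empty piece
        }
        return String.join(" ", terms);
    }
}
```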
Hi,
Instead of using the static parse() method of QueryParser, you will need
to create a new instance, and then call
setOperator(DEFAULT_OPERATOR_AND);
Iain
www.ardentia.com the home of NetSearch
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: 10 February 20
Hi guys,
If QueryParser gets a phrase with a number of words (i.e., "here are
words"), it uses the implicit operator OR: "here OR are OR words". LIA
on p94 says the operator "by default is OR", implying that there may be
some way to change this.
We'd really like the default to be AND. Is that pos
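Besides configuring a QueryParser instance as Iain describes, another option is to rewrite the user's input so every term carries Lucene's "+" (required) prefix, which makes all terms mandatory regardless of the parser's default operator. A hypothetical, deliberately naive helper in plain Java: it does not understand quotes, parentheses, field prefixes, or AND/OR keywords, and it leaves terms that already start with "+" or "-" alone.

```java
import java.util.StringJoiner;

// Hypothetical helper: marks every whitespace-separated term as required using
// Lucene's "+" prefix, so "here are words" becomes "+here +are +words".
// Naive: no handling of quotes, parentheses, fields, or boolean keywords.
final class RequireAllTerms {
    private RequireAllTerms() {}

    static String requireAll(String userQuery) {
        StringJoiner joined = new StringJoiner(" ");
        for (String term : userQuery.trim().split("\\s+")) {
            if (term.isEmpty()) continue; // blank input yields one empty piece
            boolean hasOperator = term.startsWith("+") || term.startsWith("-");
            joined.add(hasOperator ? term : "+" + term);
        }
        return joined.toString();
    }
}
```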
Hi,
I am doing the same.
My design contains:
Index repository: responsible for keeping the index up to date. :) Index configurator.
Index manager: the real CRUD indexing.
And of course the index searcher: I want results. :)
As you know very well, the searcher and the indexer are separate functionalities.
So the reason wh
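The split described above (a repository that owns the index, a manager that does CRUD indexing, and a searcher that only reads) can be sketched with an in-memory inverted index. Everything here is hypothetical and for illustration only; a real system would delegate to Lucene's own classes (IndexWriter, IndexReader, IndexSearcher) rather than these maps.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: repository (shared data), manager (write side),
// searcher (read side). Not Lucene code.
final class IndexDesignSketch {
    /** Shared store: term -> ids of documents containing it, plus id -> stored text. */
    static final class IndexRepository {
        final Map<String, Set<String>> postings = new HashMap<>();
        final Map<String, String> documents = new HashMap<>();
    }

    /** Write side: create, update and delete documents. */
    static final class IndexManager {
        private final IndexRepository repo;
        IndexManager(IndexRepository repo) { this.repo = repo; }

        void upsert(String id, String text) {
            delete(id); // an update is modelled as delete + add
            repo.documents.put(id, text);
            for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    repo.postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
                }
            }
        }

        void delete(String id) {
            if (repo.documents.remove(id) == null) return;
            for (Set<String> ids : repo.postings.values()) ids.remove(id);
        }
    }

    /** Read side: never modifies the repository. */
    static final class IndexSearcherComponent {
        private final IndexRepository repo;
        IndexSearcherComponent(IndexRepository repo) { this.repo = repo; }

        Set<String> search(String term) {
            Set<String> ids = repo.postings.get(term.toLowerCase(Locale.ROOT));
            return ids == null ? Collections.emptySet() : Collections.unmodifiableSet(ids);
        }
    }
}
```

Keeping the searcher read-only means it can be handed out freely, while all writes funnel through one manager.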
> 2. If I choose to sort the results by date, then recent documents with
> very very low relevancy (say the words searched appears only in
> content, and not in title/bylines/summary fields that are boosted
> higher) are still shown relatively high in the list, and I wish to
> omit them in general.
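One way to get "newest first, but without the barely relevant hits" is to drop hits below a relevance cutoff before sorting by date. A minimal sketch in plain Java, with a hypothetical Hit type standing in for whatever the application pulls out of Lucene's Hits; Lucene scores are relative to the query, so the cutoff value is a tuning assumption, not a universal constant.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical post-processing of search hits: drop hits whose relevance score is
// below a cutoff, then order the survivors newest first.
final class DateSortedHits {
    /** Minimal hit record (hypothetical); real code would fill it from the search results. */
    static final class Hit {
        final String id;
        final float score;
        final LocalDate date;
        Hit(String id, float score, LocalDate date) { this.id = id; this.score = score; this.date = date; }
    }

    static List<Hit> filterThenSortByDate(List<Hit> hits, float minScore) {
        List<Hit> kept = new ArrayList<>();
        for (Hit h : hits) {
            if (h.score >= minScore) kept.add(h); // relevance cutoff first
        }
        kept.sort(Comparator.comparing((Hit h) -> h.date).reversed()); // then newest first
        return kept;
    }
}
```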