for (String key : title.getTitel().split("\\n")) {
    titleDocument.add(new Field("TI", key,
            Field.Store.NO,
            Field.Index.TOKENIZED));
}
that adds each new title one after the ot
Hi Patrik
Thanks for the thoughtful responses. I am not a pro with Searchers yet, but it
seems like closing + opening searchers would still result in a small period of
unserviceability. I would also like to stick to the Directory API so that I
can keep the option to use FS or RAM based index
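
One way to shrink the window of unserviceability is to open the replacement searcher completely before retiring the old one. The sketch below is plain Java and does not use any Lucene class: `SearcherHolder` and its methods are hypothetical names, and the type parameter stands in for whatever searcher object the application uses. It closes the old instance immediately after the swap, which assumes no search is still running against it; a real application would need to wait for in-flight searches first.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder that swaps the active searcher-like resource atomically. */
final class SearcherHolder<T extends Closeable> {
    private final AtomicReference<T> current = new AtomicReference<>();

    /** Returns the active instance, or null before the first swap. */
    T get() {
        return current.get();
    }

    /**
     * Publishes an already opened replacement, then closes the previous one.
     * Callers of get() see either the old or the new instance, never a gap.
     * ASSUMPTION: nothing is still using the old instance when it is closed.
     */
    void swap(T fresh) throws IOException {
        T old = current.getAndSet(fresh);
        if (old != null) {
            old.close();
        }
    }
}
```

Because the holder only depends on `Closeable`, the same pattern works whether the index behind the searcher lives on disk or in RAM.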
Hi,
I'm running load tests with Lucene 2.0, SUN's JDK 6 on Windows XP2, dual core
CPU. I have 8 worker threads adding a few hundred K documents, split between
two Lucene indexes. I've started getting
java.io.IOException: The handle is invalid in places like
java.io.RandomAccessFile.writeByt
Hi Erick,
Thanks for the suggestion of using 2 indexes. The number of documents is small
- about 2000, and it builds quickly - about 3s from a database. I am currently
trying to rebuild every 2 minutes, but could probably reduce that to 5. That
could be as long as 10 minutes, but that's ab
Since you say you are sorting on a field the bulk of the time will be
doing the sort and caching it (FieldCache). Subsequent searches use that
cache to avoid paying the full sort cost again. If you were doing
relevancy sorting you would not experience such a big delay.
- Mark
Bryan Dotzour w
On 12/21/06, Michael McCandless <[EMAIL PROTECTED]> wrote:
I *think* it's really max 2X even with compound file (if no readers)?
Because, in IndexWriter.mergeSegments we:
1. Create the newly merged segment in non-compound format (brings us
up to 2X, when it's the last merge).
2. Co
Yonik Seeley wrote:
On 12/21/06, Michael McCandless <[EMAIL PROTECTED]> wrote:
Harini Raghavan wrote:
> I am using lucene 1.9.1 for search functionality in my j2ee application
> using JBoss as app server. The lucene index directory size is almost 20G
> right now. There is a Quartz job that is
> Something like dd if=/path/to/index/foo.cfs of=/dev/null
Be careful not to make a mistake with the 'of' argument of 'dd' - see
http://en.wikipedia.org/wiki/Dd_(Unix)
Otis thanks for your suggestion, it seems to be working pretty well!
I'm just curious if you (or anyone else) could describe what is actually
happening during that initial query that ends up taking so much time.
We have several different indexes for different types of objects and
it's only this one
On 12/21/06, Michael McCandless <[EMAIL PROTECTED]> wrote:
Harini Raghavan wrote:
> I am using lucene 1.9.1 for search functionality in my j2ee application
> using JBoss as app server. The lucene index directory size is almost 20G
> right now. There is a Quartz job that is adding data to the inde
Hi,
On Thu, 2006-12-21 at 10:21 -0800, Otis Gospodnetic wrote:
> Something like dd if=/path/to/index/foo.cfs of=/dev/null
> Basically, force the data through the kernel preemptively, so FS caches it.
> Run vmstat while doing it, and if the index hasn't been cached by the FS,
> you should see a spi
Harini Raghavan wrote:
I am using lucene 1.9.1 for search functionality in my j2ee application
using JBoss as app server. The lucene index directory size is almost 20G
right now. There is a Quartz job that is adding data to the index every
min and around 2 documents get added to the index e
Bogan,
Something like dd if=/path/to/index/foo.cfs of=/dev/null
Basically, force the data through the kernel preemptively, so FS caches it.
Run vmstat while doing it, and if the index hasn't been cached by the FS, you
should see a spike in IO activity while dd is running.
Otis
- Original Me
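
For readers without `dd` at hand, the same warm-up idea can be sketched in plain Java: read the index file from start to end and throw the bytes away, so the operating system has a chance to keep it in its file cache. `PageCacheWarmer` and `readThrough` are hypothetical names, and whether the data actually stays cached depends on the OS and on available memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: rough equivalent of dd if=FILE of=/dev/null. */
final class PageCacheWarmer {

    /** Reads the whole file, discards the data, and returns the byte count. */
    static long readThrough(Path file) throws IOException {
        byte[] buffer = new byte[1 << 16]; // 64 KiB read buffer
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n; // the bytes themselves are intentionally ignored
            }
        }
        return total;
    }
}
```

Calling `PageCacheWarmer.readThrough(path)` once per index file before the first query mirrors the command in the message; the returned count is only useful as a sanity check that the whole file was read.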
This is a different OOM error. This one is due to Java heap space.
I've tried following Otis' suggestion and using the latest nightly build (I've
actually tried using 12/19/2006 and 12/20/2006)... but I am still
getting this OOM: Java heap space error. I will try to profile this app
to see if I can get
I've found that merging a 20G directory into another 20G directory on
another disk required the target disk to have > 50G available during the
merge. I ran out of space on my ~70G disk for the merge and had to do it on
another system with ~170G available, but I'm not sure how much was used
transien
When Lucene optimizes the index (which it partly does naturally as the
index grows) it creates a copy of the index, so you can expect the space
requirements for an index to be double the index at an absolute minimum.
If you are adding 20,000 docs a day and working with an index that is
already 20
hello,
with a SpanFirstQuery I want to implement a "starts with" search -
that seems to work fine. But I have the problem that I have documents
with multiple titles, and I thought I could do an sfq-search for each title
by adding multiple instances of the specific field:
fo
Hi All,
I am using lucene 1.9.1 for search functionality in my j2ee application
using JBoss as app server. The lucene index directory size is almost 20G
right now. There is a Quartz job that is adding data to the index every
minute, and around 2 documents get added to the index every day. When t
On Dec 21, 2006, at 10:49 AM, wawa wrote:
Thanks.. but how do I know whether the field is tokenized or not?
Look at how you indexed "operatingName".
The operatingName field contains the names of stores. The other fields
contain a single word or numbers. Those are ok. But this field contains
several words.
Thanks.. but how do I know whether the field is tokenized or not?
I just used the code below to sort:
Query query = QueryParser.parse(contents, title, new StandardAnalyzer());
booleanQuery.add(query, true, false);
hits = searcher.search(booleanQuery, new Sort("operatingName"));
..
operat
This is probably due to your sorting by a tokenized field. Be sure
you are sorting on an untokenized field!
Erik
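
To illustrate why a tokenized field gives a strange order, here is a small plain-Java simulation; it does not call Lucene. It ASSUMES that sorting on a tokenized field effectively compares a single term per document, simulated here as the lexicographically last token, whereas an untokenized field compares the whole value. `SortKeyDemo` and its methods are hypothetical names, and the real behaviour depends on the analyzer and the Lucene version.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical demo comparing whole-value sort keys with per-token sort keys. */
final class SortKeyDemo {

    /** Key for an untokenized field: the whole value, lowercased. */
    static String wholeKey(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    /**
     * ASSUMPTION: a tokenized field contributes only one term as the sort key.
     * This simulation keeps the lexicographically last token.
     */
    static String tokenizedKey(String name) {
        String last = "";
        for (String token : name.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.compareTo(last) > 0) {
                last = token;
            }
        }
        return last;
    }

    /** Returns a copy of the names sorted by the whole-value key. */
    static List<String> sortByWholeKey(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparing(SortKeyDemo::wholeKey));
        return sorted;
    }
}
```

Under this assumption most of the store names in the thread collapse to the same key, "cafe", so their relative order is no longer determined by the full name. Sorting by the whole value restores the expected alphabetical order.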
On Dec 21, 2006, at 10:00 AM, wawa wrote:
I have a problem sorting words. Somehow it sorts in a strange
way. The sort result is below:
...
BILLIARD & CAFE
BIZIM CAFE
BO
I have a problem sorting words. Somehow it sorts in a strange way. The sort
result is below:
...
BILLIARD & CAFE
BIZIM CAFE
BOLSA CAFE BIDA
BONAMICO CAFE
BONESSIMO CAFE
CAFE BAR AZZURRI
A BICA CAFE
ATRIUM CAFE
CAFE 668
THE APPLE CAFE
.
Is there any way to sort properly?
On Dec 13, 2006, at 7:24 PM, Andrew Hughes wrote:
I realize that I'm posting LOTS of complicated questions and I
am probably just looking at the equivalent of a HTML indexing/
search implementation.
(sorry for the delay)
I'm doing something sorta relational in my Collex project - http:/
Hi:
Yesterday, I uploaded a new version of the Oracle/Lucene integration
using BLOB as storage for the inverted index and the Oracle JVM for
running the Lucene framework inside the Oracle Database, see it at the
Jira:
http://issues.apache.org/jira/browse/LUCENE-724
This new version includes a fu
On Thursday 21 December 2006 10:55, Martin Braun wrote:
> and in my case I have some documents
> which have same values in many fields (=>same score) and the only
> difference is the year.
Andrzej's response sounds like a good solution, so just for completeness:
you can sort by more than one cri
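
The idea of sorting by more than one criterion can be sketched in plain Java, independent of Lucene's own sort API: compare by score first and fall back to the year only when scores tie. `RelevanceThenYear` and its nested `Doc` class are hypothetical names, and putting newer years first is an assumption based on the boosts discussed in this thread.

```java
import java.util.Comparator;

/** Hypothetical demo: order by score (descending), then by year (descending). */
final class RelevanceThenYear {

    /** Minimal stand-in for a search hit. */
    static final class Doc {
        final float score;
        final int year;

        Doc(float score, int year) {
            this.score = score;
            this.year = year;
        }
    }

    /** Higher score first; among equal scores, the newer year first. */
    static final Comparator<Doc> ORDER =
            Comparator.comparingDouble((Doc d) -> d.score).reversed()
                    .thenComparing(Comparator.comparingInt((Doc d) -> d.year).reversed());
}
```

With this ordering the year never overrides a real difference in score; it only breaks ties, which is the case described for documents with the same values in many fields.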
Martin Braun wrote:
Hi Daniel,
so a doc from 1973 should get a boost of 1.1973 and a doc of 1975 should
get a boost of 1.1975.
The boost is stored with a limited resolution. Try boosting one doc by 10,
the other one by 20 or something like that.
You're right. I thought that w
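
The limited resolution can be illustrated with a sketch of a one-byte float encoding of the kind used for stored norms. The bit layout below is an assumption for illustration and is not taken from this thread; `NormByteDemo`, `encode` and `decode` are hypothetical names. The point is only that, once a float is squeezed into a single byte, 1.1973 and 1.1975 become indistinguishable while 10 and 20 remain distinct.

```java
/** Hypothetical demo of storing a positive float in a single byte. */
final class NormByteDemo {
    // ASSUMPTION: keep the top 11 bits of the IEEE float and shift the range
    // so that values around 1.0 fall in the middle of the byte range.
    private static final int OFFSET = (63 - 15) << 3;

    /** Encodes a float into one byte, losing most of the mantissa. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;                 // drop the low 21 bits
        if (small <= OFFSET) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero, negative or underflow
        }
        if (small >= OFFSET + 0x100) {
            return (byte) -1;                   // overflow: largest code
        }
        return (byte) (small - OFFSET);
    }

    /** Decodes a byte produced by encode back into an approximate float. */
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = ((b & 0xff) << 21) + (OFFSET << 21);
        return Float.intBitsToFloat(bits);
    }
}
```

In this sketch both 1.1973f and 1.1975f round down to the same byte, which decodes to 1.0, so the two boosts cannot influence the ranking differently. Boosts of 10 and 20 land on different bytes and survive the round trip.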
On Thursday 21 December 2006 10:56, Deepan wrote:
> I am bothered about security problems with Lucene. Is it vulnerable to
> any kind of injection like MySQL injection? Many times the query from
> the user is passed to Lucene for search without validating.
This is only an issue if your index has perm
On Thu, 2006-12-21 at 05:04 -0500, Erik Hatcher wrote:
> On Dec 21, 2006, at 4:56 AM, Deepan wrote:
> > I am bothered about security problems with Lucene. Is it vulnerable to
> > any kind of injection like MySQL injection? Many times the query from
> > user is passed to lucene for search without va
On Dec 21, 2006, at 4:56 AM, Deepan wrote:
I am bothered about security problems with Lucene. Is it vulnerable to
any kind of injection like MySQL injection? Many times the query from
the user is passed to Lucene for search without validating.
Rest easy. There are no known security issues with Lu
I am bothered about security problems with Lucene. Is it vulnerable to
any kind of injection like MySQL injection? Many times the query from
the user is passed to Lucene for search without validating.
--
---
Regards
Deepan Chakravarthy N
http://www.codeshe
Hello!
I'm programming a small search engine using apache lucene.
While indexing I've noticed that the menu has to be removed from the
index, because it influences the search result (searching for terms that are in the
menu gives all pages of the web directory as result).
Now, I want to put the men
Hi Daniel,
>> so a doc from 1973 should get a boost of 1.1973 and a doc of 1975 should
>> get a boost of 1.1975.
>
> The boost is stored with a limited resolution. Try boosting one doc by 10,
> the other one by 20 or something like that.
You're right. I thought that with the float values the r
Otis,
I am not familiar with the 'dd trick' to warm up the index. Can you please
explain it?
Bogdan
On 12/20/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
To populate FieldCache, the number of matches doesn't matter. There is no
need to be scrimpy there - you don't really save anything by