Re: spnafirstquery and multiple field instances

2006-12-21 Thread Chris Hostetter
: for (String key : title.getTitel().split("\\n") ) {
:   titleDocument.add(new Field("TI", key, Field.Store.NO,
:     Field.Index.TOKENIZED));
: }
that adds each new title one after the ot
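A minimal sketch, in plain Java with no Lucene dependency, of why a "starts with" check might only ever see the first title. It assumes that each added field instance continues the token positions of the previous one, with simple whitespace tokenization and no gap between instances; `MultiValuedPositions`, `flatten` and `startsWithin` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class MultiValuedPositions {
    /** Token stream for several values added under one field name,
     *  assuming each value continues where the previous one ended. */
    static List<String> flatten(List<String> values) {
        List<String> tokens = new ArrayList<>();
        for (String value : values) {
            for (String token : value.trim().toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
        }
        return tokens;
    }

    /** Toy "span first" check: does term occur at a position below end? */
    static boolean startsWithin(List<String> tokens, String term, int end) {
        int limit = Math.min(end, tokens.size());
        for (int i = 0; i < limit; i++) {
            if (tokens.get(i).equals(term)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions the first word of the second title sits at position 2 rather than 0, so a check limited to the first position only matches the first title.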

RE: Rebuilding index on a regular basis

2006-12-21 Thread Adam Fleming
Hi Patrik Thanks for the thoughtful responses. I am not a pro with Searchers yet, but it seems like closing + opening searchers would still result in a small period of unserviceability. I would also like to stick to the Directory API so that I can keep the option to use FS or RAM based index

IOException - The handle is invalid

2006-12-21 Thread Antony Bowesman
Hi, I'm running load tests with Lucene 2.0, SUN's JDK 6 on Windows XP2, dual core CPU. I have 8 worker threads adding a few hundred K documents, split between two Lucene indexes, I've started getting java.io.IOException: The handle is invalid in places like java.io.RandomAccessFile.writeByt

RE: Rebuilding index on a regular basis

2006-12-21 Thread Adam Fleming
Hi Erick, Thanks for the suggestion of using 2 indexes. The number of documents is small - about 2000, and it builds quickly - about 3s from a database. I am currently trying to rebuild every 2 minutes, but could probably reduce that to 5. That could be as long as 10 minutes, but that's ab
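One way to picture the two-index idea is to build the new index off to the side and then switch a single reference, so readers never see a half-built index. The sketch below is plain Java and deliberately generic: `SwappableIndex` is a hypothetical holder, and `T` stands in for whatever searcher or directory object the application uses.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

final class SwappableIndex<T> {
    private final AtomicReference<T> current;

    SwappableIndex(T initial) {
        current = new AtomicReference<>(initial);
    }

    /** Searches always go through the currently published index. */
    T get() {
        return current.get();
    }

    /** Builds a fresh index first, then publishes it in one atomic step.
     *  Returns the previous index so the caller can release it once
     *  in-flight searches have finished. */
    T rebuild(Supplier<T> builder) {
        T fresh = builder.get();
        return current.getAndSet(fresh);
    }
}
```

The rebuild cost (about 3s in the case described) is paid before the swap, so the window in which searches cannot be served should shrink to the reference update itself.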

Re: First search is slow after updating index .. subsequent searches very fast

2006-12-21 Thread Mark Miller
Since you say you are sorting on a field, the bulk of the time will be spent doing the sort and caching it (FieldCache). Subsequent searches use that cache to avoid paying the full sort cost again. If you were doing relevancy sorting you would not experience such a big delay. - Mark Bryan Dotzour w
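The pattern described here, an expensive first load followed by cheap lookups, can be sketched as a lazily filled per-field cache. This is plain Java, not Lucene's FieldCache; `SortKeyCache` and its `loader` are hypothetical stand-ins for reading every document's sort value from the index.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class SortKeyCache {
    private final Map<String, String[]> byField = new ConcurrentHashMap<>();
    private final Function<String, String[]> loader;
    /** Counts how often the expensive load actually ran. */
    final AtomicInteger loads = new AtomicInteger();

    SortKeyCache(Function<String, String[]> loader) {
        this.loader = loader;
    }

    /** The first call for a field pays the full load; later calls reuse it. */
    String[] keysFor(String field) {
        return byField.computeIfAbsent(field, f -> {
            loads.incrementAndGet();
            return loader.apply(f);
        });
    }
}
```

A cache like this is empty again whenever it is tied to a freshly opened reader, which would match the "slow after updating the index" symptom in the subject line.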

Re: Merge Index Filling up Disk Space

2006-12-21 Thread Yonik Seeley
On 12/21/06, Michael McCandless <[EMAIL PROTECTED]> wrote: I *think* it's really max 2X even with compound file (if no readers)? Because, in IndexWriter.mergeSegments we: 1. Create the newly merged segment in non-compound format (brings us up to 2X, when it's the last merge). 2. Co

Re: Merge Index Filling up Disk Space

2006-12-21 Thread Michael McCandless
Yonik Seeley wrote: On 12/21/06, Michael McCandless <[EMAIL PROTECTED]> wrote: Harini Raghavan wrote: > I am using lucene 1.9.1 for search functionality in my j2ee application > using JBoss as app server. The lucene index directory size is almost 20G > right now. There is a Quartz job that is

Re: First search is slow after updating index .. subsequent searches very fast

2006-12-21 Thread Doron Cohen
> Something like dd if=/path/to/index/foo.cfs of=/dev/null Be careful not to make a mistake with the 'of' argument of 'dd' - see http://en.wikipedia.org/wiki/Dd_(Unix)

RE: First search is slow after updating index .. subsequent searches very fast

2006-12-21 Thread Bryan Dotzour
Otis thanks for your suggestion, it seems to be working pretty well! I'm just curious if you (or anyone else) could describe what is actually happening during that initial query that ends up taking so much time. We have several different indexes for different types of objects and it's only this one

Re: Merge Index Filling up Disk Space

2006-12-21 Thread Yonik Seeley
On 12/21/06, Michael McCandless <[EMAIL PROTECTED]> wrote: Harini Raghavan wrote: > I am using lucene 1.9.1 for search functionality in my j2ee application > using JBoss as app server. The lucene index directory size is almost 20G > right now. There is a Quartz job that is adding data to the inde

Re: First search is slow after updating index .. subsequent searches very fast

2006-12-21 Thread Joe Shaw
Hi, On Thu, 2006-12-21 at 10:21 -0800, Otis Gospodnetic wrote: > Something like dd if=/path/to/index/foo.cfs of=/dev/null > Basically, force the data through the kernel preemptively, so FS caches it. > Run vmstat while doing it, and if the index hasn't been cached by the FS, > you should see a spi

Re: Merge Index Filling up Disk Space

2006-12-21 Thread Michael McCandless
Harini Raghavan wrote: I am using lucene 1.9.1 for search functionality in my j2ee application using JBoss as app server. The lucene index directory size is almost 20G right now. There is a Quartz job that is adding data to the index every min and around 2 documents get added to the index e

Re: First search is slow after updating index .. subsequent searches very fast

2006-12-21 Thread Otis Gospodnetic
Bogdan, Something like dd if=/path/to/index/foo.cfs of=/dev/null Basically, force the data through the kernel preemptively, so FS caches it. Run vmstat while doing it, and if the index hasn't been cached by the FS, you should see a spike in IO activity while dd is running. Otis - Original Me
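For readers who would rather do the same warm-up from Java, a rough equivalent of the dd command is to read the file end to end and throw the bytes away. This is only a sketch: `IndexWarmer` is a hypothetical helper, and whether the operating system keeps the data cached afterwards depends on available memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class IndexWarmer {
    /** Reads the whole file and discards the data, like dd ... of=/dev/null.
     *  Returns the number of bytes read. */
    static long readAndDiscard(Path file) throws IOException {
        byte[] buffer = new byte[1 << 16];
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }
}
```

It only ever opens the file for reading, so it avoids the risk of pointing dd's 'of' argument at the wrong target.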

RE: JAVA JVM Question

2006-12-21 Thread Van Nguyen
This is a different OOM error. This one is due to Java heap space. I've tried Otis' suggestion of using the latest nightly build (I've actually tried using 12/19/2006 and 12/20/2006)... but I am still getting this OOM: Java heap space error. I will try to profile this app to see if I can get

RE: Merge Index Filling up Disk Space

2006-12-21 Thread Rob Staveley (Tom)
I've found that merging a 20G directory into another 20G directory on another disk required the target disk to have > 50G available during the merge. I ran out of space on my ~70G disk for the merge and had to do it on another system with ~170G available, but I'm not sure how much was used transien

Re: Merge Index Filling up Disk Space

2006-12-21 Thread Mark Miller
When Lucene optimizes the index (which it partly does on its own as the index grows), it creates a copy of the index, so you can expect the space requirement for an index to be at least double the index size. If you are adding 20,000 docs a day and working with an index that is already 20

spnafirstquery and multiple field instances

2006-12-21 Thread Martin Braun
hello, with a SpanFirstQuery I want to implement a "starts with" search - that seems to work fine. But I have the problem that I have documents with multiple titles, and I thought I could do a sfq-search for each title by adding multiple instances for the specific field: fo

Merge Index Filling up Disk Space

2006-12-21 Thread Harini Raghavan
Hi All, I am using lucene 1.9.1 for search functionality in my j2ee application using JBoss as app server. The lucene index directory size is almost 20G right now. There is a Quartz job that is adding data to the index every min and around 2 documents get added to the index every day. When t

Re: Sorting words

2006-12-21 Thread Erik Hatcher
On Dec 21, 2006, at 10:49 AM, wawa wrote: Thanks.. but how do I know whether the field is tokenized or not? Look at how you indexed "operatingName". operatingName field contains name of stores. other fields contains a single word or numbers. Those are ok. But this field contains words.

Re: Sorting words

2006-12-21 Thread wawa
Thanks.. but how do I know whether the field is tokenized or not? I just used code to sort below: Query query = QueryParser.parse(contents, title, new StandardAnalyzer()); booleanQuery.add(query, true, false); hits = searcher.search(booleanQuery, new Sort("operatingName")); .. operat

Re: Sorting words

2006-12-21 Thread Erik Hatcher
This is probably due to you sorting by a tokenized field. Be sure you are sorting on an untokenized field! Erik On Dec 21, 2006, at 10:00 AM, wawa wrote: I have some problem to sort words. Somehow it sorts in strange way. sort result is below: ... BILLIARD & CAFE BIZIM CAFE BO
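To see why a tokenized field can produce an odd-looking order, here is a plain-Java sketch under a loudly stated assumption: that a tokenized field ends up contributing a single token, rather than the whole store name, as its sort key. The model used here (the alphabetically greatest token wins) is a guess for illustration, not confirmed by this thread, and `TokenizedSortDemo` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TokenizedSortDemo {
    /** Assumed model: one token (the greatest) stands in for the whole value. */
    static String tokenKey(String value) {
        String best = "";
        for (String token : value.toLowerCase().split("[^a-z0-9]+")) {
            if (token.compareTo(best) > 0) {
                best = token;
            }
        }
        return best;
    }

    /** Sort as if the field were tokenized (stable, so ties keep input order). */
    static List<String> sortByTokenKey(List<String> names) {
        List<String> out = new ArrayList<>(names);
        out.sort(Comparator.comparing(TokenizedSortDemo::tokenKey));
        return out;
    }

    /** Sort on the whole, untokenized value. */
    static List<String> sortByWholeValue(List<String> names) {
        List<String> out = new ArrayList<>(names);
        out.sort(Comparator.naturalOrder());
        return out;
    }
}
```

With names that all contain "CAFE", every key collapses to the same token under this model, so the result simply keeps the original document order, while sorting on the whole value gives the alphabetical list one would expect.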

Sorting words

2006-12-21 Thread wawa
I have a problem sorting words. Somehow they sort in a strange way. The sort result is below: ... BILLIARD & CAFE BIZIM CAFE BOLSA CAFE BIDA BONAMICO CAFE BONESSIMO CAFE CAFE BAR AZZURRI A BICA CAFE ATRIUM CAFE CAFE 668 THE APPLE CAFE . Is there any way to sort properly?

Re: Advice on 3NF Data Structures and Lucene Please

2006-12-21 Thread Erik Hatcher
On Dec 13, 2006, at 7:24 PM, Andrew Hughes wrote: I realize that I'm posting LOTS of complicated questions and I am probably just looking at the equivalent of a HTML indexing/ search implementation. (sorry for the delay) I'm doing something sorta relational in my Collex project - http:/

Oracle/Lucene integration -status-

2006-12-21 Thread Marcelo Ochoa
Hi: Yesterday, I uploaded a new version of the Oracle/Lucene integration using BLOB as storage for the inverted index and the Oracle JVM for running the Lucene framework inside the Oracle Database, see it at the Jira: http://issues.apache.org/jira/browse/LUCENE-724 This new version includes a fu

Re: boosting instead of sorting WAS: to boost or not to boost

2006-12-21 Thread Daniel Naber
On Thursday 21 December 2006 10:55, Martin Braun wrote: > and in my case I have some documents > which have same values in many fields (=>same score) and the only > difference is the year. Andrzej's response sounds like a good solution, so just for completeness: you can sort by more than one cri
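Sorting by more than one criterion can be pictured as a chained comparator: relevance first, with the year only breaking ties. The sketch below is plain Java rather than Lucene's sort API; `ScoreThenYear` and `Hit` are hypothetical, and putting newer years first is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ScoreThenYear {
    static final class Hit {
        final String id;
        final float score;
        final int year;

        Hit(String id, float score, int year) {
            this.id = id;
            this.score = score;
            this.year = year;
        }
    }

    /** Highest score first; among equal scores, newest year first. */
    static final Comparator<Hit> ORDER =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparing(Comparator.comparingInt((Hit h) -> h.year).reversed());

    /** Returns the ids of the hits in ranked order. */
    static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(ORDER);
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }
}
```

Documents with identical field values, and therefore identical scores, are then separated by year without touching the boost at all.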

Re: boosting instead of sorting WAS: to boost or not to boost

2006-12-21 Thread Andrzej Bialecki
Martin Braun wrote: Hi Daniel, so a doc from 1973 should get a boost of 1.1973 and a doc of 1975 should get a boost of 1.1975 . The boost is stored with a limited resolution. Try boosting one doc by 10, the other one by 20 or something like that. You're right. I thought that w
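The effect of a limited resolution can be illustrated by truncating a float to a few mantissa bits. The sketch assumes three mantissa bits are kept, which is an assumption about the stored form rather than something stated in this thread; the real encoding may also differ in range and rounding. `BoostResolution` is a hypothetical name.

```java
final class BoostResolution {
    /** Keeps the sign, the exponent and only the top 3 mantissa bits of f
     *  (assumed precision; an IEEE float has 23 mantissa bits). */
    static float quantize(float f) {
        int bits = Float.floatToIntBits(f);
        return Float.intBitsToFloat(bits & 0xFFF00000);
    }

    public static void main(String[] args) {
        System.out.println(quantize(1.1973f)); // 1.125
        System.out.println(quantize(1.1975f)); // 1.125
        System.out.println(quantize(10f));     // 10.0
        System.out.println(quantize(20f));     // 20.0
    }
}
```

Under that assumption 1.1973 and 1.1975 collapse to the same stored value, while 10 and 20 stay distinct, which fits the advice to use widely spaced boosts.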

Re: lucene injection

2006-12-21 Thread Daniel Naber
On Thursday 21 December 2006 10:56, Deepan wrote: > I am bothered about security problems with lucene. Is it vulnerable to > any kind of injection like mysql injection? many times the query from > user is passed to lucene for search without validating. This is only an issue if your index has perm

Re: lucene injection

2006-12-21 Thread Deepan
On Thu, 2006-12-21 at 05:04 -0500, Erik Hatcher wrote: > On Dec 21, 2006, at 4:56 AM, Deepan wrote: > > I am bothered about security problems with lucene. Is it vulnerable to > > any kind of injection like mysql injection? many times the query from > > user is passed to lucene for search without va

Re: lucene injection

2006-12-21 Thread Erik Hatcher
On Dec 21, 2006, at 4:56 AM, Deepan wrote: I am bothered about security problems with lucene. Is it vulnerable to any kind of injection like mysql injection? many times the query from user is passed to lucene for search without validating. Rest easy. There are no known security issues with Lu

lucene injection

2006-12-21 Thread Deepan
I am bothered about security problems with lucene. Is it vulnerable to any kind of injection like mysql injection? many times the query from user is passed to lucene for search without validating. -- --- Regards Deepan Chakravarthy N http://www.codeshe

How to skip menu structure while parsing HTML sites?

2006-12-21 Thread Jan Francsi
Hello! I'm programming a small search engine using apache lucene. While indexing I've noticed that the menu has to be removed from the index, because it influences the search result (searching terms that are in the menu gives all pages of the web directory as result). Now, I want to put the men

boosting instead of sorting WAS: to boost or not to boost

2006-12-21 Thread Martin Braun
Hi Daniel, >> so a doc from 1973 should get a boost of 1.1973 and a doc of 1975 should >> get a boost of 1.1975 . > > The boost is stored with a limited resolution. Try boosting one doc by 10, > the other one by 20 or something like that. You're right. I thought that with the float values the r

Re: First search is slow after updating index .. subsequent searches very fast

2006-12-21 Thread Bogdan Ghidireac
Otis, I am not familiar with the 'dd trick' to warm up the index. Can you please explain it? Bogdan On 12/20/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: To populate FieldCache, the number of matches doesn't matter. There is no need to be scrimy there - you don't really save anything by