Re: how to find out if two fields are identical?

2006-07-12 Thread Chris Hostetter
: Based on these three documents, I want the query to return the third : document where childID=parentID. To the best of my knowledge there is no easy way to do this using the existing Lucene query types -- but it would be fairly easy to implement. Since there are no "scoring" issues involved,
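A minimal sketch of the idea, in plain Java rather than against the Lucene API: walk every document once and set a bit for each one whose two fields hold the same value. Since no scoring is involved, a bit set is all the result needs to be. The class name SameFieldValueSketch, the helper doc(), and the map-based stand-in for a document are assumptions made for illustration; they are not Lucene types and not necessarily what the reply goes on to propose.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in: marks documents whose two named fields are equal. */
final class SameFieldValueSketch {

    /** Returns a bit set where bit i is set when docs.get(i) has equal, non-null values in both fields. */
    static BitSet matching(List<Map<String, String>> docs, String fieldA, String fieldB) {
        BitSet bits = new BitSet(docs.size());
        for (int i = 0; i < docs.size(); i++) {
            String a = docs.get(i).get(fieldA);
            String b = docs.get(i).get(fieldB);
            // A document missing either field never matches.
            if (a != null && a.equals(b)) {
                bits.set(i);
            }
        }
        return bits;
    }

    /** Builds a stand-in document using the field names from the question. */
    static Map<String, String> doc(String id, String childId, String parentId) {
        Map<String, String> d = new HashMap<>();
        d.put("ID", id);
        d.put("childID", childId);
        d.put("parentID", parentId);
        return d;
    }
}
```

In a real index the same loop would read the two field values per document number and the resulting bits would be used to restrict a search; how those values are read is left open here.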

kbforge 2.10 released

2006-07-12 Thread kbforge1
kbforge.com is pleased to announce Release 2.10 of kbforge, a desktop search application of particular interest to people on the move, including software developers. kbforge is different from other desktop search applications because it creates a database you can carry with you practically anywher

how to find out if two fields are identical?

2006-07-12 Thread Van Nguyen
Is there a way to compare the values of two fields to see if they are the same? Let's say we have an index with these fields: ID:2 childID: 7 parentID: 0 ID:3 childID: 6 parentID: 5

RE: RangeQuery question?

2006-07-12 Thread Van Nguyen
Exactly what I was looking for. Thanks! -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 12, 2006 12:47 AM To: java-user@lucene.apache.org Subject: Re: RangeQuery question? 1) RangeQuery is the devil, don't use it. If I weren't so lazy I would c

Re: Lucene index database

2006-07-12 Thread Chris Lu
What Erick and Michael said are all correct, or the same. :) What Lucene can do is search data that is stored in Document objects. Lucene is said to be able to search HTML, PDF, etc., but that's because those formats are relatively fixed. You can easily tell title, content, etc. With database, whi

Re: Reuse of IndexReader

2006-07-12 Thread Mark Miller
I have not seen an expert's comment on the previous code I linked to. It seems (to my young inexperienced eyes) to do an optimal job of providing realtime access to an index. Anyone else have some experience with this code? On 7/12/06, Mark Miller <[EMAIL PROTECTED]> wrote: http://www.nabble.c

Re: modify existing non-indexed field

2006-07-12 Thread Doron Cohen
> I did clean everything but still getting the same problem. I'm using lucene > 2.0. Do you get the same problem on your machine? Please try with this code - http://cdoronc.20m.com/tmp/indexingThreads.zip Regards, Doron

Re: Sort Cache

2006-07-12 Thread Chris Hostetter
: close to realtime as possible). Would it make any sense in trying to save : the sort cache, insert the new doc in that (whatever that entails, I don't : know), and then pass the sort cache to a new searcher? Or something along : those lines...? as crazy as this sounds -- it's even harder than y

Re: Reuse of IndexReader

2006-07-12 Thread Mark Miller
http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049 A good implementation for what you need. - Mark On 7/12/06, Dominik Bruhn <[EMAIL PROTECTED]> wrote: Hi, thanks for your answers. Upon creation of the Reader, does Lucene copy the whole index into RAM? Or is

Re: Reuse of IndexReader

2006-07-12 Thread Dominik Bruhn
Hi, thanks for your answers. Upon creation of the Reader, does Lucene copy the whole index into RAM? Or is this cache filled while searching? How can I find out how long it takes to create the IndexReader? Just time the create call? Thanks -- Dominik Bruhn mailto: [EMAIL PROTECTED] http://www.d

Re: Lucene index database

2006-07-12 Thread Erick Erickson
What Michael said :).

Re: Reuse of IndexReader

2006-07-12 Thread Erick Erickson
This is normal behavior. When you open a reader, it takes a snapshot of the index and uses that snapshot until it is closed; any updates to the index in the meantime are invisible to that reader. You could periodically close and reopen the reader to get the latest data; it's not necessary to
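The snapshot behaviour described above can be sketched in plain Java. This is only an illustration, not Lucene code: Index, Reader, and ReaderHolder are hypothetical stand-ins, and copying a list stands in for whatever a real reader does when it opens an index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of "a reader sees the index as it was when opened". */
final class SnapshotReaderSketch {

    /** Stand-in for an index that a writer appends to. */
    static final class Index {
        private final List<String> docs = new ArrayList<>();

        synchronized void add(String doc) {
            docs.add(doc);
        }

        /** Opening a reader copies the current state; later adds are not visible to it. */
        synchronized Reader openReader() {
            return new Reader(new ArrayList<>(docs));
        }
    }

    /** Stand-in for a reader bound to one snapshot. */
    static final class Reader {
        private final List<String> snapshot;

        Reader(List<String> snapshot) {
            this.snapshot = Collections.unmodifiableList(snapshot);
        }

        int numDocs() {
            return snapshot.size();
        }
    }

    /** Shared holder, like the static variable in the servlet: searches call current(). */
    static final class ReaderHolder {
        private final Index index;
        private volatile Reader current;

        ReaderHolder(Index index) {
            this.index = index;
            this.current = index.openReader();
        }

        Reader current() {
            return current;
        }

        /** Swap in a fresh snapshot, for example on a timer or after a write. */
        void reopen() {
            current = index.openReader();
        }
    }
}
```

With a real reader the old instance would also have to be closed once no search is still using it; the sketch leaves that out.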

Re: Lucene index database

2006-07-12 Thread Michael J. Prichard
Ha Erick, we must have sent our responses at the same time :) What Erick said :) Erick Erickson wrote: This has been extensively discussed in the mail archive, I think a search of the archive would help you a lot. The short form is no. There's nothing built into Lucene to help you index a

Re: Reuse of IndexReader

2006-07-12 Thread Erik Hatcher
On Jul 12, 2006, at 12:48 PM, Dominik Bruhn wrote: Hi, I have the following situation: a servlet running in Tomcat5. On startup the servlet automatically creates an IndexReader and stores it in a static variable. For searching this variable is used. When adding a document to the inde

Re: Lucene index database

2006-07-12 Thread Michael J. Prichard
Hey there Teresa. Short answer: Not directly. Long answer: Lucene is a set of libraries built for indexing text and then searching those indexes. Not sure what you mean by indexing a database per se. You could write some code to get the records you want from the database and then index tho

Re: Lucene index database

2006-07-12 Thread Erick Erickson
This has been extensively discussed in the mail archive; I think a search of the archive would help you a lot. The short form is no. There's nothing built into Lucene to help you index a database. How would you define that anyway? That said, you can write a program to extract data from the data

Re: Storing Part of Speech information in Lucene Indices

2006-07-12 Thread Grant Ingersoll
I think Mark's idea is better for this. Although I seem to recall there being some caveats w/ multiple tokens at the same position, but I don't remember the details. I _think_ term vectors don't like it, so if you need them, you might have troubles. Perhaps a search of the mailing lists

Reuse of IndexReader

2006-07-12 Thread Dominik Bruhn
Hi, I have the following situation: a servlet running in Tomcat5. On startup the servlet automatically creates an IndexReader and stores it in a static variable. For searching this variable is used. When adding a document to the index, I create an IndexWriter, write the Document, and close

Sort Cache

2006-07-12 Thread Mark Miller
I am going to be working with a medium index of 200k to 1m documents. Occasionally, there will be single document corrections applied to this index. I am worried about this action clearing my sort buffers. I saw the method of priming another searcher, but if you have a bunch of fields that may be s

Re: Storing Part of Speech information in Lucene Indices

2006-07-12 Thread Amit Kumar
You are right. I saw your email after pressing send. Let me experiment. Thanks for the tip. Best, Amit On Jul 12, 2006, at 10:55 AM, mark harwood wrote: Appending POS to the terms will create post processing nightmare I think you may have missed the subtle distinction between Grant's su

Re: Storing Part of Speech information in Lucene Indices

2006-07-12 Thread mark harwood
>>Appending POS to the terms will create post processing nightmare I think you may have missed the subtle distinction between Grant's suggestion and mine. His suggestion was to append your POS info to the source token - creating a single token which combined both the original content and your P

Lucene index database

2006-07-12 Thread mcarcelen
Hi, Can Lucene index a database? PostgreSQL, Mysql, Access ? Thanks Cheers Teresa

Re: Searching for a phrase which spans on 2 pages

2006-07-12 Thread Erick Erickson
Sweet!

Re: combined filesystem and web search

2006-07-12 Thread Erick Erickson
I haven't used the multisearcher personally, so I'll let others chime in. And I know nothing about the IndexMergeTool, I've only seen the interface in the Lucene Javadoc. And I must say the documentation isn't real helpful :(. To add to an existing index, just instantiate the IndexWriter with the

Re: Storing Part of Speech information in Lucene Indices

2006-07-12 Thread Amit Kumar
We need to be able to search by word and POS and also have POS available for each occurrence. Appending POS to the terms will create a post-processing nightmare when retrieving term frequencies, right? (I would have to add up all the foo_NN and foo_ADJ etc.) I can store the POS in a parallel field
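To make the post-processing concern concrete, here is a small plain-Java sketch of what recovering per-word frequencies from suffixed terms might look like. It assumes the tag is everything after the last underscore, as in foo_NN; the class and method names are made up for the example and the frequencies would come from wherever the index exposes them.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: folds counts for foo_NN, foo_ADJ, ... back into one count for foo. */
final class TaggedTermFrequencySketch {

    static Map<String, Integer> sumByWord(Map<String, Integer> taggedFreqs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> e : taggedFreqs.entrySet()) {
            String term = e.getKey();
            // Assumption: the POS tag follows the last underscore; untagged terms pass through.
            int cut = term.lastIndexOf('_');
            String word = cut > 0 ? term.substring(0, cut) : term;
            totals.merge(word, e.getValue(), Integer::sum);
        }
        return totals;
    }
}
```

The extra pass is cheap for one word but has to enumerate every tag variant of it, which is the cost being weighed against the parallel-field approach.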

Re: query for search through lucene for BLOB

2006-07-12 Thread Steven Rowe
Hi Sudarshan, When your question is Java usage related, you will almost certainly get better responses by asking just on the Java User list. Oddly enough, hitting all of the mailing lists for the project at once with the same question is likely to *reduce* your chances of getting polite/on-to

Re: modify existing non-indexed field

2006-07-12 Thread dan2000
I did clean everything but still getting the same problem. I'm using lucene 2.0. Do you get the same problem on your machine?

PhraseQuery - retrieving the fieldname

2006-07-12 Thread Mile Rosu
Hello, A small problem this time: I would like to retrieve the field name of a PhraseQuery. Could you please tell me the best way to do this? Thank you, Mile Rosu

Re: Storing Part of Speech information in Lucene Indices

2006-07-12 Thread mark harwood
Could you not use a custom analyzer to inject "metadata" tokens into the index at the same position as the source tokens? For example, given the text: The cat jumped over the dog your analyzer could emit tokens: [the] [cat,_posNoun] [jumped,_posVerb] [over]
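A rough sketch of the token stream this describes, written in plain Java so it stands alone. The idea is that each metadata token carries a position increment of 0, which places it at the same position as the word before it (Lucene tokens have a position increment for this purpose). The Tok class, inject(), and describe() are illustrative names only, and the tags are passed in by hand rather than produced by a real tagger.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of injecting POS tokens at the same position as their source tokens. */
final class SamePositionTokenSketch {

    /** Minimal token: a term plus its offset in positions from the previous token. */
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Emits each word with increment 1 and, when a tag is present, the tag with increment 0. */
    static List<Tok> inject(String[] words, String[] tags) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(new Tok(words[i], 1));
            if (tags[i] != null) {
                out.add(new Tok(tags[i], 0)); // stacked on the word just emitted
            }
        }
        return out;
    }

    /** Renders tokens grouped by position, e.g. [cat,_posNoun]. */
    static String describe(List<Tok> toks) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : toks) {
            if (t.positionIncrement == 0) {
                sb.append(',').append(t.term);
            } else {
                if (sb.length() > 0) {
                    sb.append("] ");
                }
                sb.append('[').append(t.term);
            }
        }
        if (sb.length() > 0) {
            sb.append(']');
        }
        return sb.toString();
    }
}
```

For the words the, cat, jumped, over with tags on cat and jumped, describe() renders [the] [cat,_posNoun] [jumped,_posVerb] [over]. Because the word itself stays an ordinary term, its frequency is unaffected by the tag.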

RE: Searching for a phrase which spans on 2 pages

2006-07-12 Thread Mike Streeton
The simplest solution is always the best - when storing the page, do not break up sentences. So a page will be all the sentences that occur on it. If a sentence starts on one page and finishes on the next it will be included in both pages in the index. Hope this helps Mike www.ardentia.com the h
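As a sketch of that approach in plain Java: each sentence records the first and last page it appears on, and its full text is added to every page in that range, so a phrase that straddles a page break is searchable from both pages. Sentence and pageTexts() are hypothetical names, and how sentences and page numbers are detected is assumed to happen elsewhere.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical builder of per-page text that never splits a sentence. */
final class PageSentenceSketch {

    /** A sentence and the range of pages it touches. */
    static final class Sentence {
        final String text;
        final int firstPage;
        final int lastPage;

        Sentence(String text, int firstPage, int lastPage) {
            this.text = text;
            this.firstPage = firstPage;
            this.lastPage = lastPage;
        }
    }

    /** Maps each page number to the text to index for it; spanning sentences appear on every page they touch. */
    static SortedMap<Integer, String> pageTexts(List<Sentence> sentences) {
        SortedMap<Integer, String> pages = new TreeMap<>();
        for (Sentence s : sentences) {
            for (int p = s.firstPage; p <= s.lastPage; p++) {
                pages.merge(p, s.text, (soFar, next) -> soFar + " " + next);
            }
        }
        return pages;
    }
}
```

The trade-off is that a spanning sentence is indexed twice, so a phrase inside it matches both pages.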

Re: Storing Part of Speech information in Lucene Indices

2006-07-12 Thread Grant Ingersoll
Hi Amit, This is definitely something you can do. What are your goals for it? Do you want to search by word and POS or do you just want POS available for post processing? You could just append the POS tag onto the end of your token as it gets indexed, something like foo_NN or foo_ADJ.

query for search through lucene for BLOB

2006-07-12 Thread sudarshan angirash
Hi all, I have some PDF files stored in Oracle 9i as BLOBs. I want to search for a string in those PDF files using Lucene, then show the PDF files that contain the string. Any pointers on how to do this would be a great help. regards sudar

Re: Searching for a phrase which spans on 2 pages

2006-07-12 Thread Mile Rosu
Hello Erick, I have been trying on Google Books some scenarios and apparently found a Google bug ... It looks like they use number 2 approach, as this query illustrates it. http://books.google.com/books?vid=ISBN1564968316&id=14Xx2T8tmMYC&pg=PA8&lpg=PA8&dq=%2B%22the+site+is+unburdened%22&sig=QR

Re: RangeQuery question?

2006-07-12 Thread Chris Hostetter
1) RangeQuery is the devil, don't use it. If I weren't so lazy I would change the javadocs for RangeQuery so that sentence was the class summary. Take a look at RangeFilter or ConstantScoreRangeQuery. 2) it's not clear what exactly you want your example to mean ... perhaps you mean you want to

Re: Missing fields used for a sort

2006-07-12 Thread Chris Hostetter
: > I can't thank you enough, Yonik :-) : > : : send money . Bah! ... there's lots of money in the world, they print more and more of it every day. Quality Patches ... now there's something I bet Yonik would *really* appreciate! :) -Hoss