indexing going wrong

2007-08-10 Thread nachi
All, not sure if my earlier mail went through... so resending. I'm new to Lucene and I'm trying to develop a textual search module. I have written the following code (this is research code) - File dir = new File("c:/test"); IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true); Doc
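A minimal sketch of the kind of indexing code being attempted here, assuming the Lucene 2.x API of the time and the same hypothetical c:/test directory (the original snippet is cut off after "Doc"):

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class SimpleIndexer {
    public static void main(String[] args) throws Exception {
        File dir = new File("c:/test");                       // hypothetical index directory
        // 'true' recreates the index from scratch on every run
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);

        Document doc = new Document();
        // store and tokenize the body text so it is both searchable and retrievable
        doc.add(new Field("contents", "hello lucene world",
                          Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);

        writer.optimize();   // optional for a small test index
        writer.close();      // flushes buffered documents and releases the write lock
    }
}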

Reading Existing index

2007-08-10 Thread Aleesh
Hi All, I need your help regarding reading an existing index. Actually I am trying to read an existing index and just wanted to know, is there a way to identify the type of 'Analyzer' which was used at index creation time? Thanks in advance, Neeraj -- View this message in context: http://www.nabb
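Lucene does not record the analyzer anywhere in the index, so the analyzer type cannot be read back; only the fields and terms it produced can be inspected. A minimal sketch of opening an existing index for inspection, assuming Lucene 2.x and a made-up index path:

import java.io.File;
import org.apache.lucene.index.IndexReader;

public class InspectIndex {
    public static void main(String[] args) throws Exception {
        // open the existing index; the analyzer is NOT stored in the index,
        // so the original analyzer class cannot be recovered from here
        IndexReader reader = IndexReader.open(new File("c:/existing-index"));
        System.out.println("documents: " + reader.numDocs());
        System.out.println("fields: "
                + reader.getFieldNames(IndexReader.FieldOption.ALL));
        reader.close();
    }
}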

Re: StandardAnalyzer vs KeywordAnalyzer in Luke

2007-08-10 Thread Kai_testing Middleton
The Nutch analyzer is NutchDocumentAnalyzer. Does anyone know how to add this to the Luke classpath? I tried this kind of thing, but it didn't work (note that the last line is java -jar lukeall-0.7.1.jar): export CLASSPATH=$NUTCH_HOME/lib/jetty-ext/ant.jar export CLASSPATH=$CLASSPATH:$NUTCH_H

Re: scorers and filters

2007-08-10 Thread Paul Elschot
On Friday 10 August 2007 20:27, Yonik Seeley wrote: > On 8/10/07, John Wang <[EMAIL PROTECTED]> wrote: > > Hi Lucene Gurus: > > > > More of a performance question: > > > > When you pass a Filter to a searcher to do a search, the searcher is > > basically doing the full search and then intersect

Re: scorers and filters

2007-08-10 Thread Yonik Seeley
On 8/10/07, John Wang <[EMAIL PROTECTED]> wrote: > Hi Lucene Gurus: > > More of a performance question: > > When you pass a Filter to a searcher to do a search, the searcher is > basically doing the full search and then intersect against the bitset given > by the filter. This seems wasteful whe

scorers and filters

2007-08-10 Thread John Wang
Hi Lucene Gurus: More of a performance question: When you pass a Filter to a searcher to do a search, the searcher is basically doing the full search and then intersecting against the bitset given by the filter. This seems wasteful when there are lots of hits returned by the scorer and the filter only
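For reference, a minimal sketch of passing a Filter to the searcher as discussed in this thread, assuming the Lucene 2.x API; the index path and field names are made up:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryFilter;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public class FilteredSearch {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("c:/test");   // hypothetical index path

        Query query = new QueryParser("contents", new StandardAnalyzer()).parse("lucene");
        // restrict results to documents whose "status" field is "published";
        // the filter's bitset is intersected with the query's hits
        QueryFilter filter = new QueryFilter(new TermQuery(new Term("status", "published")));

        TopDocs top = searcher.search(query, filter, 10);
        System.out.println("total hits: " + top.totalHits);
        searcher.close();
    }
}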

Re: Nested Fields

2007-08-10 Thread Spencer Tickner
Hi everyone, Thanks so much for the responses. So far I use an XSLT identity transform, with templates to match the fields I want to capture, that turns the XML document into a flat text file with duplicated text for nested fields. The Compass framework looks really interesting though, I'll spend so

Re: frequent phrases

2007-08-10 Thread Mathieu Lecarme
Some tools exist for finding duplicated parts in documents. You split the document into phrases and build n-grams from the words. If you want complete phrases, work with all the words; for partial matches, work with 5-word n-grams, for example. The n-gram list is converted to hashes, and each hash is used as an indexed Field for t
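A rough sketch of the word-n-gram-to-hash idea described above; the field name and the hash function are made up, and a real implementation might prefer MD5 or another stable hash:

import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class PhraseHasher {
    /** Build n-word shingles from a phrase and return one hash string per shingle. */
    static List<String> hashNGrams(String phrase, int n) {
        String[] words = phrase.toLowerCase().split("\\s+");
        List<String> hashes = new ArrayList<String>();
        for (int i = 0; i + n <= words.length; i++) {
            StringBuilder gram = new StringBuilder();
            for (int j = 0; j < n; j++) {
                if (j > 0) gram.append(' ');
                gram.append(words[i + j]);
            }
            // simple stand-in hash for illustration
            hashes.add(Integer.toHexString(gram.toString().hashCode()));
        }
        return hashes;
    }

    /** Index each hash as an untokenized field so duplicated passages can be matched. */
    static void addHashes(Document doc, String phrase) {
        for (String hash : hashNGrams(phrase, 5)) {
            doc.add(new Field("phraseHash", hash, Field.Store.NO, Field.Index.UN_TOKENIZED));
        }
    }
}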

Re: Lucene in large database contexts

2007-08-10 Thread Chris Lu
Hi Antonello, I think you should try DBSight. Although it's a Java implementation, you don't need to worry about Java coding at all. Just point the connection string at a database and specify your SQL, and then you will have scheduling, incremental indexing, recreating indexes, syncing with deleted r

Re: Lucene in large database contexts

2007-08-10 Thread Steven Rowe
Hi Antonello, Antonello Provenzano wrote: > I've been working for a while on the implementation of a website > oriented to contents that would contain millions of entries, most of > them indexable (such as descriptions, texts, names, etc.). > The ideal solution to make them searchable would be to

Re: Re: Re: Lucene in large database contexts

2007-08-10 Thread Askar Zaidi
Hey guys, I am trying to do something similar: make the content searchable as soon as it is added to the website. The way it can work in my scenario is that I create an index for every new user account created. Then, whenever a new document is uploaded, its contents are added to the user's I

Re: Re: Re: Lucene in large database contexts

2007-08-10 Thread Erick Erickson
Well, closing/opening an index is MUCH less expensive than rebuilding the whole thing, so I don't understand part of your statements. It *may* (but I haven't tried it) be possible to flush the writer rather than close/open it. But you MUST close/reopen the reader you search with even if flush
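A minimal sketch of the close/reopen cycle described here, assuming the Lucene 2.x API and a made-up index path; searchers opened before the writer is closed keep seeing the old index:

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;

public class NearRealTimeUpdate {
    public static void main(String[] args) throws Exception {
        File dir = new File("c:/test");                           // hypothetical index path

        // append new documents without rebuilding the index ('false' = do not recreate)
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), false);
        Document doc = new Document();
        doc.add(new Field("contents", "a freshly uploaded document",
                          Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.close();                                           // make the additions durable

        // a new IndexSearcher must be opened to see the added documents
        IndexSearcher searcher = new IndexSearcher(dir.getPath());
        System.out.println("docs visible to the new searcher: " + searcher.maxDoc());
        searcher.close();
    }
}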

Re: formalizing a query

2007-08-10 Thread Erick Erickson
I *strongly* suggest you get a copy of Luke. It'll allow you to form queries and see the results and you can then answer this kind of question as well as many others. Meanwhile, please see http://lucene.apache.org/java/docs/queryparsersyntax.html Erick On 8/10/07, Abu Abdulla alhanbali <[EMAIL P

Re: How to keep user search history and how to turn it into information?

2007-08-10 Thread Lukas Vlcek
Enis, thanks for your time. I gave Pig a quick glance and it seems good (it seems to be directly based on Hadoop, which I am starting to play with :-). It is obvious that a huge amount of data (like user queries or access logs) should be stored in flat files, which makes it convenient for further analy
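A rough sketch of appending each user query as one record to a flat log file, as suggested above; the file name and record layout are made up:

import java.io.FileWriter;
import java.io.PrintWriter;

public class QueryLogger {
    /** Append one tab-separated record per search: timestamp, user id, raw query, hit count. */
    static synchronized void log(String userId, String query, int hits) throws Exception {
        PrintWriter out = new PrintWriter(new FileWriter("search-history.log", true)); // append mode
        try {
            out.println(System.currentTimeMillis() + "\t" + userId + "\t" + query + "\t" + hits);
        } finally {
            out.close();
        }
    }
}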

Re: How to keep user search history and how to turn it into information?

2007-08-10 Thread Enis Soztutar
Lukas Vlcek wrote: Hi Enis, Hi again, On 8/10/07, Enis Soztutar <[EMAIL PROTECTED]> wrote: Hi, Lukas Vlcek wrote: Hi, I would like to keep user search history data and I am looking for some ideas/advices/recommendations. In general I would like to talk about methods

Re: Update boost factor for indexed document using setBoost()

2007-08-10 Thread rohit saini
Thanks Koji, but can you give me an example? It would be so nice of you... Thanks & regards, Rohit On 8/10/07, Koji Sekiguchi <[EMAIL PROTECTED]> wrote: > > Or, you can use FieldNormModifier class to modify existing fieldNorm: > > $ java org.apache.lucene.index.FieldNormModifier path-to-index > yo

Re: Update boost factor for indexed document using setBoost()

2007-08-10 Thread Koji Sekiguchi
Or, you can use the FieldNormModifier class to modify existing fieldNorms: $ java org.apache.lucene.index.FieldNormModifier path-to-index your.Similarity field1 [field2 ...] To do this, you have to write your own Similarity class to adjust the boost via lengthNorm(). Thank you, Koji rohit saini wrote
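A sketch of the kind of custom Similarity Koji describes, extending DefaultSimilarity and overriding lengthNorm(); the per-field boost factor and field name here are made up:

import org.apache.lucene.search.DefaultSimilarity;

public class BoostedLengthSimilarity extends DefaultSimilarity {
    public float lengthNorm(String fieldName, int numTerms) {
        // multiply the default length normalization by a fixed per-field boost
        float boost = "title".equals(fieldName) ? 2.0f : 1.0f;
        return boost * super.lengthNorm(fieldName, numTerms);
    }
}

It would then be applied with the FieldNormModifier command above, e.g. java org.apache.lucene.index.FieldNormModifier path-to-index BoostedLengthSimilarity title.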

Re: How to keep user search history and how to turn it into information?

2007-08-10 Thread Lukas Vlcek
Hi Enis, On 8/10/07, Enis Soztutar <[EMAIL PROTECTED]> wrote: > > Hi, > > Lukas Vlcek wrote: > > Hi, > > > > I would like to keep user search history data and I am looking for some > > ideas/advices/recommendations. In general I would like to talk about > methods > > of storing such data, its stru

Re: How to keep user search history and how to turn it into information?

2007-08-10 Thread Enis Soztutar
Hi, Lukas Vlcek wrote: Hi, I would like to keep user search history data and I am looking for some ideas/advices/recommendations. In general I would like to talk about methods of storing such data, its structure and how to turn it into valuable information. As for the structure: ==

Re: Update boost factor for indexed document using setBoost()

2007-08-10 Thread rohit saini
Thanks a lot Grant. I have been trying to do so; could you please send me an example of doing it the way you are describing? Again, thanks so much... Regards, Rohit On 8/10/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > > You can't. You have to delete and reindex the document with the new > boost. > >

Re: Update boost factor for indexed document using setBoost()

2007-08-10 Thread Grant Ingersoll
You can't. You have to delete and reindex the document with the new boost. On Aug 9, 2007, at 11:59 PM, rohit saini wrote: Hi, could you please tell me how to update the boost factor of an already indexed document using setBoost? Thanks & regards, Rohit -- VANDE - MATRAM -
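A minimal sketch of the delete-and-reindex approach Grant describes, assuming Lucene 2.1+ where IndexWriter.updateDocument(Term, Document) is available (otherwise deleteDocuments(Term) followed by addDocument works the same way); the index path, id field, and boost value are made up:

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class ReindexWithNewBoost {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(new File("c:/test"), new StandardAnalyzer(), false);

        // rebuild the document with the new boost; boosts cannot be changed in place
        Document doc = new Document();
        doc.add(new Field("id", "42", Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("contents", "the reindexed text",
                          Field.Store.YES, Field.Index.TOKENIZED));
        doc.setBoost(2.5f);                                   // the new boost factor

        // updateDocument deletes any document matching the term, then adds the new one
        writer.updateDocument(new Term("id", "42"), doc);
        writer.close();
    }
}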

Re: Re: Re: Lucene in large database contexts

2007-08-10 Thread Antonello Provenzano
Kai, The context I'm going to work with requires a continuous addition of documents to the indexes, since it's user-driven content, and this would require the content to be always up-to-date. This is the problem I'm facing, since I cannot rebuild a 1Gb (at least) index every time a user inserts a

Re: Re: Lucene in large database contexts

2007-08-10 Thread Kai Hu
Antonello, you are right. I think the Lucene IndexSearcher will keep seeing the old information if the IndexWriter has not been closed (I think Lucene releases the lock there), so I only add a few documents at a time from a buffer to implement "real-time" indexing. Kai From: [EMAIL PROTECTED] [mailto:[EMAIL PROT

Re: Re: Lucene in large database contexts

2007-08-10 Thread Antonello Provenzano
Kai, thanks. The problem I see is that although I can add a Document through IndexWriter or IndexModifier, it won't be searchable until the index is closed and, possibly, optimized, since the score of the document in the index context must be recalculated on the basis of the whole context. I

Re: Lucene in large database contexts

2007-08-10 Thread Antonello Provenzano
Lukas, thanks for the fast and clarifying response. Something I haven't specified is that I'm working on Mono, which means I use the Lucene.Net version. The reason I'm posting to the Java User list is that it is a more active one, with people who may have had the same requirements as me. Since Lucene and L

Re: Lucene in large database contexts

2007-08-10 Thread Kai Hu
Hi Antonello, you can use IndexWriter.addDocument(Document document) to add a single document; the same goes for update and delete operations. Kai -Original Message- From: Antonello Provenzano [mailto:[EMAIL PROTECTED] Sent: Friday, August 10, 2007 17:09 To: java-user@lucene.apache.org Subject: Lucene in large database co

Re: Lucene in large database contexts

2007-08-10 Thread Lukas Vlcek
You can also look at Hibernate Search. BR Lukas On 8/10/07, Lukas Vlcek <[EMAIL PROTECTED]> wrote: > > Hi, > did you have a chance to look at > Compass? > It can do exactly what you want. > Lukas > > On 8/10/07, Antonello Proven

Re: Lucene in large database contexts

2007-08-10 Thread Lukas Vlcek
Hi, did you have a chance to look at Compass? It can do exactly what you want. Lukas On 8/10/07, Antonello Provenzano <[EMAIL PROTECTED]> wrote: > > Hi There! > > I've been working for a while on the implementation of a website > oriented to contents that woul

Lucene in large database contexts

2007-08-10 Thread Antonello Provenzano
Hi There! I've been working for a while on the implementation of a website oriented to contents that would contain millions of entries, most of them indexable (such as descriptions, texts, names, etc.). The ideal solution to make them searchable would be to use Lucene as index and search engine.

Re: How to keep user search history and how to turn it into information?

2007-08-10 Thread Lukas Vlcek
Dmitry, my middle tier is Compass in this case (it is easily extensible, so catching all events should be easy). It also allows the index to be stored in a DB. The main question is *what* is important to catch and *how* to get maximum information out of it later. Lukas On 8/10/07, Dmitry <[E

Re: How to keep user search history and how to turn it into information?

2007-08-10 Thread Dmitry
Lukas, probably one solution would be to use a database - like MySQL - and set up Lucene against MySQL; in this case you don't need to worry as much about the implementation of the content storage. Also you need to create a middle tier to catch all events concerning users' search / history

How to keep user search history and how to turn it into information?

2007-08-10 Thread Lukas Vlcek
Hi, I would like to keep user search history data and I am looking for some ideas/advice/recommendations. In general I would like to talk about methods of storing such data, its structure, and how to turn it into valuable information. As for the structure: == For now I don't have exac