Re: lucene and maven2

2006-06-29 Thread Otis Gospodnetic
I put a new Jar there. Please send email if things still don't work with Maven. -bash-2.05b$ ls -al total 434 drwxrwxr-x 2 martinc apcvs 512 Jun 29 22:43 . drwxrwxr-x 4 martinc apcvs 512 Jun 11 09:01 .. -rw-rw-r-- 1 martinc apcvs 2336 Jun 11 09:01 lucene-core-2.0.0.jar -rw-r--r--

Re: Searching is taking a lot...

2006-06-29 Thread heritrix . lucene
Yes, you are correct. My idea will not work when there are lots of documents in the index and lots of hits for that page. I am going with your approach :-) Thanks... On 6/29/06, James Pine <[EMAIL PROTECTED]> wrote: Hey, I'm not a performance guru, but it seems to me that if you've got

Re: RE: HTML text extraction

2006-06-29 Thread 田春峰
Hi, your attachment is empty; there is no Java source code in it. Liao Xuefeng <[EMAIL PROTECTED]> wrote: hi, all, I wrote my own html parser because it just meets my requirements and does not depend on third-party libs. and i'd like to share it (in attachment). This class provides some static methods to

HitCollector and Sort Objects

2006-06-29 Thread James Pine
Hey, I've looked at the documentation for: org.apache.lucene.search.Searchable org.apache.lucene.search.Searcher org.apache.lucene.search.IndexSearcher and it struck me that there are no search methods with these signatures: void search(Query query, Filter filter, HitCollector results, Sort sor

RE: Lock File

2006-06-29 Thread Wang, Jeff
I have a clustered environment, with a load-balancer in the front assigning connections. Is it better to have one machine in the cluster run a searcher as a web service (to be accessed by the other machines in the cluster) or to have an IndexReader/Searcher for each machine in the cluster? Jeff -O

Re: Lucene Dynamic http Web Page Search

2006-06-29 Thread Clive.
Thanks for the prompt reply. I will look for something similar for the .NET version; I posted in this group as it is more active. -- View this message in context: http://www.nabble.com/Lucene-Dynamic-http-Web-Page-Search-tf1867987.html#a5111083 Sent from the Lucene - Java Users forum at Nabb

Re: Lock File

2006-06-29 Thread Michael McCandless
What are the conditions that cause corruption? If there is just one writer and multiple readers, is that safe? The cases are well spelled out in Lucene in Action, section 2.9. Generally, one writer and multiple readers is not safe for disabling locking. For example, the IndexReader, when

Re: Limiting Result-Count

2006-06-29 Thread Andrzej Bialecki
Otis Gospodnetic wrote: Try using HitCollector and break out of it when you collect enough documents. My guess is that if you are not doing anything crazy with Hits (like looping through them all) this won't be that much faster than using Hits. Well, in practice it does help - see the way

Re: Lock File

2006-06-29 Thread joe kim
Lucene uses this lock to ensure the index does not become corrupt when IndexReaders and IndexWriters are working on the same index. What are the conditions that cause corruption? If there is just one writer and multiple readers, is that safe? ---

Re: Limiting Result-Count

2006-06-29 Thread Otis Gospodnetic
Try using HitCollector and break out of it when you collect enough documents. My guess is that if you are not doing anything crazy with Hits (like looping through them all) this won't be that much faster than using Hits. Otis - Original Message From: Dominik Bruhn <[EMAIL PROTECTED]> T
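
A minimal sketch of the "break out when you have enough" idea, not Otis's actual code. To stay self-contained it does not use Lucene: DocCollector is a hypothetical stand-in for the HitCollector callback, LimitDemo.search is a fake searcher, and EnoughHitsException is a made-up name. The only way out of a per-hit callback is to throw an unchecked exception and catch it around the search call.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Lucene's HitCollector callback; not the real class. */
interface DocCollector {
    void collect(int doc, float score);
}

/** Unchecked exception used purely as control flow to unwind out of the search loop. */
class EnoughHitsException extends RuntimeException {
    EnoughHitsException() {
        // No message, cause, suppression, or stack trace: this is not an error.
        super(null, null, false, false);
    }
}

/** Keeps at most `limit` document numbers, then aborts the search. */
class LimitedCollector implements DocCollector {
    private final int limit;
    private final List<Integer> docs = new ArrayList<>();

    LimitedCollector(int limit) {
        this.limit = limit;
    }

    @Override
    public void collect(int doc, float score) {
        if (docs.size() < limit) {
            docs.add(doc);
        }
        if (docs.size() >= limit) {
            throw new EnoughHitsException(); // stop as soon as the quota is filled
        }
    }

    List<Integer> docs() {
        return docs;
    }
}

class LimitDemo {
    /** Fake searcher (assumption): every even document number below maxDoc matches. */
    static void search(int maxDoc, DocCollector collector) {
        for (int doc = 0; doc < maxDoc; doc += 2) {
            collector.collect(doc, 1.0f);
        }
    }

    /** Runs the fake search and returns at most `limit` matching document numbers. */
    static List<Integer> firstMatches(int maxDoc, int limit) {
        LimitedCollector collector = new LimitedCollector(limit);
        try {
            search(maxDoc, collector);
        } catch (EnoughHitsException expected) {
            // Quota reached; the partial result is what we wanted.
        }
        return collector.docs();
    }
}
```

One caveat with this approach: a collector sees hits in document-number order, not ranked by score, so stopping early returns the first N matches rather than the best N.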

Re: Lock File

2006-06-29 Thread Michael McCandless
> When I create an index with the class IndexModifier in Lucene 1.9.1 there is a lock file created in a temp folder. > My question is: Is it possible to disable this option? > If yes, how to proceed? Yes, there is. You can call the static FSDirectory.setDisableLocks() to disable locking enti

Re: HTML text extraction

2006-06-29 Thread MALCOLM CLARK
Hi, Would you please send me your parser too? Thanks! Malcolm - Original Message From: Liao Xuefeng <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Friday, June 23, 2006 12:54:29 AM Subject: RE: HTML text extraction hi, all, I wrote my own html parser because it just meets

Re: Lucene Dynamic http Web Page Search

2006-06-29 Thread joe kim
Hi Clive, Lucene is a general purpose search engine. If you need crawling capabilities on top of Lucene take a look at Nutch: http://lucene.apache.org/nutch/ On 6/29/06, Clive. <[EMAIL PROTECTED]> wrote: Hi, I am working on adding a search feature to a web site that uses single database dri

Lock File

2006-06-29 Thread WATHELET Thomas
When I create an index with the class IndexModifier in Lucene 1.9.1 there is a lock file created in a temp folder. My question is: Is it possible to disable this option? If yes, how to proceed?

Limiting Result-Count

2006-06-29 Thread Dominik Bruhn
Hi, how can I limit the result count of a query in order to save time? I searched the web but didn't find a solution. Thanks Dominik

Carrot 2 with lucene prototype!!!

2006-06-29 Thread arun sharma (rinku)
Hello gentlemen, I am a novice to Lucene and Carrot2 but I have an urgent requirement to build a prototype using Lucene and Carrot2. Please help me with a working web application demo along with code. Thanks Arun

Re: Searching is taking a lot...

2006-06-29 Thread James Pine
Hey, I'm not a performance guru, but it seems to me that if you've got millions of results coming back then you probably don't want to call ArrayList.add() each time, as it will have to grow itself a bunch of times. Also, even ints take up space in memory, so if you only need 20 of them, then stor

Lucene Dynamic http Web Page Search

2006-06-29 Thread Clive.
Hi, I am working on adding a search feature to a web site that uses single database-driven aspx pages and would like to know if Lucene can index from the http url address or the database. Currently I can only see Lucene being able to search physical files in a Windows folder. Any

Re: How to Integrate the WordNet Synonym Index with my Index

2006-06-29 Thread Aleksander M. Stensby
What about the scoring worries you? I would say this is the best approach, and also the suggested approach over at the Lucene WordNet page: http://www.tropo.com/techno/java/lucene/wordnet.html Of course, you could say that matches on the original search should return with a higher score. You

Re: How to Integrate the WordNet Synonym Index with my Index

2006-06-29 Thread Ramesh Salla
Yes, that is a good idea and thanks for the suggestion. But isn't that painful? Then the scoring really worries us. Hence, will we have to prefer boosting the original content? Can you find or suggest a better solution? Thanks... Ramesh.S On Thu, 2006-06-29 at 16:10 +0200, Aleksander M. Stensby w

Re: How to Integrate the WordNet Synonym Index with my Index

2006-06-29 Thread Aleksander M. Stensby
No... Don't think that's the idea. I think that you would make use of the WordNet index after a user has entered a search. You take each term of the search, look up those terms in the WordNet index, then use the results you get to search your index for all those aggregated terms along with the
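
The query-time expansion described here could be sketched roughly as follows. This is an illustration under assumptions, not the code from the WordNet contrib package: SynonymExpander is a hypothetical name, the in-memory Map stands in for lookups against the WordNet synonym index, and the output relies only on the query parser's `term^boost` syntax so that the user's original terms score higher than their synonyms. Escaping of query-syntax special characters is left out.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: expands a user query with synonyms, boosting the original terms. */
class SynonymExpander {
    // Stand-in for the WordNet synonym index (assumption: term -> list of synonyms).
    private final Map<String, List<String>> synonyms;
    private final float originalBoost;

    SynonymExpander(Map<String, List<String>> synonyms, float originalBoost) {
        this.synonyms = synonyms;
        this.originalBoost = originalBoost;
    }

    /** Returns a query string: each original term boosted, followed by its unboosted synonyms. */
    String expand(String userQuery) {
        StringBuilder query = new StringBuilder();
        for (String term : userQuery.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank input yields an empty query
            }
            if (query.length() > 0) {
                query.append(' ');
            }
            query.append(term).append('^').append(originalBoost);
            for (String synonym : synonyms.getOrDefault(term, Collections.<String>emptyList())) {
                query.append(' ').append(synonym);
            }
        }
        return query.toString();
    }
}
```

The resulting string would then be handed to the query parser and run against the main index; the two indexes are never merged, which is the point being made in this reply.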

Test collection for digital libraries with Lucene

2006-06-29 Thread Trung
Hi everybody, I'm searching for a test collection for an academic digital library (with a relevance judgment file like the TREC collections). Requirement: documents are scientific articles, with full references. I've heard about the INEX collection with scientific articles from IEEE journals. Are there an

How to Integrate the WordNet Synonym Index with my Index

2006-06-29 Thread Ramesh Salla
Hi, it seems I am stuck. My index is working fine. Now I have got the WordNet synonym index. How do I make use of this index to get synonym support in search results? Do I have to merge these 2 indexes using the Merge class? Will that work? Or do I have to inject the field "word" values

Re: question

2006-06-29 Thread Aleksander M. Stensby
If your database table looks like this: ID - Content - Subject - Author, you get the fields from your db and presumably store them in some bean, or directly in strings like this: String id, content, subject, author. You can create a Lucene document in this fashion: final Document doc = new Doc

Re: question

2006-06-29 Thread amit_kkumar
hi martin, the thing is that I am new to Lucene and I am not sure how to use it. The connection through JDBC and the select stmt. are all done; I just want to know how I can create a Lucene document per row. Could you provide some pseudo code kind of thing? In the demo the indexing is done on files. amit ku

Re: question

2006-06-29 Thread Martin Braun
[EMAIL PROTECTED] wrote: > hi, > > my problem is that i am using mysql db in which one table is > present and i want index each row in the table and then search > > plz reply > > how this can be done? http://wiki.apache.org/jakarta-lucene/LuceneFAQ How can I use Lucene to index a database? Co

Re: Lucene indexing RDF

2006-06-29 Thread adasal
Hi Chris, I find this incredibly interesting! Thank you for your full explanation. I was aware of the components, but not the implementation. ... to provide a means to query both document full-text and metadata using an RDF model Is there anything I can read about how you have come to this ap

question

2006-06-29 Thread amit_kkumar
Hi, my problem is that I am using a MySQL db with one table, and I want to index each row in the table and then search. Please reply: how can this be done? amit kumar

Re: MemoryUsage of sorting

2006-06-29 Thread Kroehling, Thomas
That is exactly what I did when I started to realize the effects of using Lucene sorting with millions of documents in the index. I used STORED fields and sorted the results with a generic Comparator, which is configured for a field and a search order. I only do this if the query did not return
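
A rough sketch of a generic comparator "configured for a field and a search order", as described above. This is not Thomas's code: FieldComparator is a hypothetical name, and each hit is modelled as a plain Map of stored field names to values rather than a Lucene Document, so the example runs on the standard library alone. Hits missing the field are placed last in either direction, which is a design choice made here, not something the message states.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical comparator over stored field values, configured by field name and direction. */
class FieldComparator implements Comparator<Map<String, String>> {
    private final String field;
    private final boolean ascending;

    FieldComparator(String field, boolean ascending) {
        this.field = field;
        this.ascending = ascending;
    }

    @Override
    public int compare(Map<String, String> a, Map<String, String> b) {
        String x = a.get(field);
        String y = b.get(field);
        if (x == null || y == null) {
            // Hits without the field sort last, regardless of direction.
            return (x == null ? 1 : 0) - (y == null ? 1 : 0);
        }
        int result = x.compareTo(y);
        return ascending ? result : -result;
    }

    /** Returns a sorted copy of the hits, leaving the input list untouched. */
    static List<Map<String, String>> sorted(List<Map<String, String>> hits,
                                            String field, boolean ascending) {
        List<Map<String, String>> copy = new ArrayList<>(hits);
        copy.sort(new FieldComparator(field, ascending));
        return copy;
    }
}
```

Sorting this way costs memory proportional to the result set being sorted rather than to the whole index, which is the trade-off the message is describing.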

Re: Searching is taking a lot...

2006-06-29 Thread heritrix . lucene
perhaps that's not what you meant, perhaps you aren't iterating over any results, in which case using a HitCollector instead isn't necessarily going to bring that 17sec down. As I said earlier, for the same query the minimum time is 2-3 sec, and this time is after several attempts (so I think upto th

Re: Searching is taking a lot...

2006-06-29 Thread heritrix . lucene
This will break performance. It is better to first collect all the document numbers (code without the proper declarations): public void collect(int id, float score) { if(docCount >= startDoc && docCount < endDoc) { docNrs.add(id); // or use int[] docNrs when possible. Why
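
A self-contained sketch of the quoted collect method, filled out under assumptions: PageCollector is a hypothetical class that mirrors the collect(int, float) callback without extending Lucene's HitCollector, pages are taken to be zero-based, and the `int[] docNrs` variant suggested in the comment is used instead of a list. It only records document numbers for the requested page; loading the documents themselves would happen afterwards, outside the callback.

```java
import java.util.Arrays;

/** Hypothetical paging collector: remembers only the document numbers of one result page. */
class PageCollector {
    private final int startDoc;   // index of the first hit on the page (inclusive)
    private final int endDoc;     // index one past the last hit on the page (exclusive)
    private final int[] docNrs;   // fixed-size buffer, so nothing grows while collecting
    private int docCount = 0;     // hits seen so far
    private int stored = 0;       // hits kept for this page

    /** @param page zero-based page number (assumption); @param perPage results per page */
    PageCollector(int page, int perPage) {
        this.startDoc = page * perPage;
        this.endDoc = startDoc + perPage;
        this.docNrs = new int[perPage];
    }

    /** Called once per matching document, in the order the searcher reports them. */
    public void collect(int id, float score) {
        if (docCount >= startDoc && docCount < endDoc) {
            docNrs[stored++] = id;
        }
        docCount++;
    }

    /** Total number of matching documents seen. */
    int totalHits() {
        return docCount;
    }

    /** Document numbers on the requested page; shorter than perPage on the last page. */
    int[] pageDocs() {
        return Arrays.copyOf(docNrs, stored);
    }
}
```

Because the buffer is sized to the page, collecting millions of hits costs one counter increment per hit plus at most perPage stores, which also addresses the ArrayList growth concern raised elsewhere in this thread.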

Re: Searching is taking a lot...

2006-06-29 Thread Paul Elschot
On Thursday 29 June 2006 06:17, James Pine wrote: > A HitCollector object invokes its collect method on > every document which matches the query/filter > submitted to the Searcher.search method. I think all > you would need to do is pass in the page number and > results per page to your HitCollecto