Similarity
Hi all,
I'm new to Lucene and have some questions about the overall system.
I) What exactly is written to the index? Is the index just an inverted list? Are term weights stored as well?
II) How does the retrieval process work? My guess is:
1) Get all matching documents from the index via the inverted list.
2) Compute the score of every document against the query with the Similarity class.
As far as I can see, the similarity is based only on tf-idf weighting. Is there no cosine measure or similar used to compare the document vector and the query vector?
Thanks a lot,
Klaus
Re: Lucene parsing for PDF
Hi, I think the easiest way is to exclude the pages while you are parsing the PDF document, so that you provide just the necessary pages to Lucene. Another solution is to create a separate document for each page; it should have a field "pagenumber" or similar, and you can then delete that document from the index. Peace - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Finding similar documents
Hi, is there a built-in method for finding documents similar to a given document? Thx, Klaus
TF and IDF
Hi all, do you know how the tf and idf values are computed by the default similarity? I mean the exact mathematical equation. Thx, Klaus
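For orientation, here is a minimal sketch of the two per-term factors as the classic DefaultSimilarity defines them, to the best of my knowledge: tf is the square root of the term's frequency in the document, and idf is 1 + ln(numDocs / (docFreq + 1)). The class and method names below are illustrative plain Java, not Lucene's API, and the complete score also involves norms, boosts and the coord factor, which are not shown.

```java
public class TfIdfSketch {
    /** Term-frequency factor: square root of the raw count within one document. */
    public static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** Inverse document frequency: terms that occur in fewer documents get a larger factor. */
    public static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        // A term that occurs 4 times in a document and in 9 of 1000 documents.
        System.out.println("tf  = " + tf(4));
        System.out.println("idf = " + idf(9, 1000));
    }
}
```

The explain output of a real search remains the authoritative source for how a given Lucene version combines these factors.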
Re: TF and IDF
Thx, but where can I find these classes?
> If you really want to understand how scoring works, I'd suggest also looking at TermWeight/TermScorer.
Boolean Query
Hi,
I have another question... How do I construct a BooleanQuery in which the terms of the query are connected with OR? I have a list of terms representing the highest-scored terms in a document. Here is my code:
    BooleanQuery query = new BooleanQuery();
    for (Term t : terms) {
        query.add(new TermQuery(t), false, false); // is this wrong?
    }
If I construct the query as a string like "A a OR B b OR C", I get many more results. I assume that the BooleanQuery uses an AND operator. How can I change that?
I'm also wondering what happens if I boost a TermQuery with a value smaller than one. I'm asking because I would like to boost each TermQuery with the tf*idf value of the term in the original document. From my point of view, this should lead to better precision, but at first glance the results are worse.
THX, Klaus
Re: Boolean Query
Hi, I have tried to study the Lucene scoring in the default similarity. Can anyone explain to me how this similarity was designed? I have read a lot of IR literature, but I have never seen an equation like the one used in Lucene. Why is it better than the normal cosine measure? Thanks, Klaus
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Hostetter
Sent: Wednesday, 11 January 2006 20:55
To: java-user@lucene.apache.org
Subject: Re: Boolean Query
: BooleanQuery query = new BooleanQuery();
: for (Term t : terms) {
:     query.add(new TermQuery(t), false, false); // is this wrong?
: }
:
: If I construct the query as a string like "A a OR B b OR C" I get much more
: results. I assume that the Boolean query uses an AND operator. How can I
: change that.
The "false, false" when you add the subclauses should be giving you the "OR" behavior, but more than likely the problem you are running into has to do with the analyzer being used by your QueryParser when it parses your string -- when you build the query up by hand, no analyzer is used, so if the analyzer used at indexing time did any lowercasing or stemming you'll miss a lot of matches. A quick thing you should try is comparing the toString from each of the queries you are comparing (the one QueryParser built, and the one you built by hand). You should also look at this wiki entry, and pick up a copy of Lucene in Action and read chapter 4.
: And I'm wondering what happens if I boost a TermQuery with a value smaller
: than one. I'm asking because I would like to boost each TermQuery with the
: tf*idf value of the term in the original document. From my point of view,
: this should lead to a better precision, but on the first look the results
: are worse.
Before you try this, make sure you understand the existing score calculation ... look at the explain info for each document against your query and see what it's already doing.
-Hoss
Re: Use the lucene for searching in the Semantic Web.
Hi Jiang, I'm currently facing a similar problem. So far I have to use a graph matching algorithm for the semantic query, but the full-text search in the semantic web is performed by Lucene. At first I wrote the whole text into one index. Each document contains one field for the unique id and one for the whole text. For the semantic markup I use an extra index. Every RDF triple results in a document with the following fields: id, predicate + subject + object. Every query is executed on both indexes. I use an extra index for the RDF data because this results in a higher score for the documents. You might argue that this distorts the result, but from my point of view explicit metadata should be scored higher than terms in the document body. Cheers, Klaus
-----Original Message-----
From: jason [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 17 January 2006 15:35
To: java-user@lucene.apache.org
Subject: Use the lucene for searching in the Semantic Web.
Hi friends, what do you think about using Lucene for searching in the Semantic Web? I am trying to use Lucene for searching documents with ontological annotation, but I have not found a good model to combine the keyword information and the ontological information. regards jiang xing
Re: Use the lucene for searching in the Semantic Web.
Hi,
> Actually, my problem is that, for instance, for a document d, its feature vector may be keywords and concepts.
What exactly do you mean by feature vector? You are referring to the predicate-object pairs connected to one subject node, aren't you?
> I don't know how to weight the two items. Right now, I use a stupid method: given a document d, I can obtain a rank D based on the keyword method. Also, it is annotated with a concept c (the most simple example). People can have a rank C of these concepts in the domain ontology, where the most relevant concepts should be at the top of this concept list. Finally, the document's rank is decided by the sum (C + D).
I'm going to implement something like a PageRank algorithm for my search engine. In contrast to the Google approach, rather than just counting the edges of a node, I can weight them, because their semantics are known. Of course this implies knowledge of the domain ontology. For instance, if there is a predicate "cited_in_document", I could rank a document higher if it is often cited. But I'm not sure about the results... Klaus
Analyzer
Hi, is there a way to get the unstemmed term out of the Lucene index, or do I have to change the analyzer to save both the original term and the stemmed one? Thanks, Klaus
Re: Use the lucene for searching in the Semantic Web.
> The feature vector may be bigger than the object-predicate pairs. In my application, each document may be annotated with several concepts to say this document contains an instance of a class.
How do you do that? I have to re-engineer the ontology in my application, but I'm not sure how to express that a document belongs to one or more concepts. Would you mind sending me your ontology?
> I am very interested in your approach. You can see the PageRank-like method used in SWOOGLE. But the relations they use are only some simple relations, such as "import" (used in OWL files). If we can use the semantic-level relations, it should be better. But I am not sure it can succeed, as it requires knowing how to weight the relations.
Yes. I will have to provide some meta information about the ontology. You can store this information as an OWL annotation or in an extra file. I will start implementing this during the weekend. I think it will be hard to find the right weights for the predicates; I will keep you informed. Cheers, Klaus
Re: Document similarity
> In my case, I need to filter similar documents in search results and therefore determine document similarity during the indexing process using term vectors. Obviously, I can't compare the currently indexed document with all documents in my collection.
Yes you can. Right after indexing the new document, fetch its term vector from the index. Compute some kind of weight for each term, and construct a Boolean query from all terms. You can use the term weights to boost the term queries. The hits will be scored; this score is a measure of the similarity between the documents. peace
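The weighting step above could be sketched as follows. This is only an illustration under assumptions: the term and document frequencies are passed in as plain maps (in practice they would come from the term vector and the index), the weight is a simple sqrt(tf) * idf, and `SimilarityQuerySketch` and `topTermBoosts` are hypothetical names, not Lucene API. The returned values would then serve as boosts for the individual term queries.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SimilarityQuerySketch {
    /**
     * Weights every term of one document with sqrt(tf) * (1 + ln(numDocs / (df + 1)))
     * and returns the maxTerms heaviest terms, ordered from highest to lowest weight.
     */
    public static Map<String, Double> topTermBoosts(Map<String, Integer> termFreqs,
                                                    Map<String, Integer> docFreqs,
                                                    int numDocs, int maxTerms) {
        List<Map.Entry<String, Double>> weighted = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            double idf = Math.log((double) numDocs / (df + 1)) + 1.0;
            weighted.add(new AbstractMap.SimpleEntry<>(e.getKey(), Math.sqrt(e.getValue()) * idf));
        }
        // Heaviest terms first.
        weighted.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));

        Map<String, Double> boosts = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : weighted.subList(0, Math.min(maxTerms, weighted.size()))) {
            boosts.put(e.getKey(), e.getValue());
        }
        return boosts;
    }
}
```

Limiting the query to the heaviest terms keeps it small; very common words end up with low weights and drop out on their own.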
Re: Related searches
Hi Leon, have you tried the WordNet add-on? You can easily expand the query with synonyms.
-----Original Message-----
From: xing jiang [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 31 January 2006 19:03
To: java-user@lucene.apache.org
Subject: Re: Related searches
I think you should build a type of domain-specific dictionary first. You should say, for instance, "automobile = car". This approach can satisfy your requirement.
On 1/30/06, Leon Chaddock <[EMAIL PROTECTED]> wrote:
> Hi,
> Does anyone know if it is possible to show related searches with Lucene? For example, if someone searched for "car insurance", you could bring back the results and related searches like these:
>
> Automobile Insurance
> Car Insurance Quote
> Car Insurance Quotes
> Auto Insurance
> Cheap Car Insurance
> Car Insurance Company
> Car Insurance Companies
> Health Insurance
> Car Insurance Rates
> Car Insurance Rate
> Car Insurance Rental
> Insurance Quote
> Online Car Insurance Quote
> Home Insurance
>
> Thanks
> Leon
--
Regards
Jiang Xing
Re: two problems of using the lucene.
Hi, you have to write your own Similarity implementation and set it on your IndexWriter and IndexSearcher. http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html Cheers, Klaus
-----Original Message-----
From: xing jiang [mailto:[EMAIL PROTECTED]
Sent: Sunday, 5 February 2006 04:27
To: java-user@lucene.apache.org
Subject: two problems of using the lucene.
Hi, I have two problems with using Lucene and may need your help.
1. For each word, how does Lucene calculate its weight? I only know that each word in the document is weighted by its tf/idf value.
2. Can I modify Lucene so that it uses the term frequency instead of the tf/idf value to calculate the similarity between documents and queries?
--
Regards
Jiang Xing
Re: Reindexing
Hi, do you have to index all objects that are already contained in the database? Then there is no other way than fetching all objects from the database and indexing them.
On Feb 8, 2006, at 1:18 AM, Raul Raja Martinez wrote:
> Hi Erik, I'm in the same situation. I wouldn't normally ask something related to Hibernate here, but I posted something similar in the Hibernate forums on Jan 16th and still haven't got any response.
>
> http://forum.hibernate.org/viewtopic.php?t=954137&highlight=lucene
>
> It is really obvious that if they offer Lucene indexing out of the box with the Hibernate release, people would have to index all their persistent objects that were already in the database before.
>
> Any hint is highly appreciated.
>
> Erik Hatcher wrote:
>> You may likely get better response by posting in the Hibernate list.
>> Erik
>> On Feb 7, 2006, at 7:58 AM, revati joshi wrote:
>>> Hello lucene members,
>>> i'm the silent member of this group. last week i had sent some query regarding reindexing, but i dn't received any reply from any one. Still i'm stuck up with the same problem of reindexing.
>>> i hve completed with the reindexing code using hibernate Lifecycle class but i don't know where and when to call this class for reindexing purpose during updation or new creation of any file in ur system.
>>> I just want to know the precise procedure or method for this.
>>> So plz do suggest some solution to this as early as possible.
>>> Thanks for ur cooperation.
>>> Byee for now.
Re: Suggesting refine searches with Lucene
A simple approach is to count the most common words in the result set and present them in combination with the original query. If you have any meta information, you could use it to refine the query.
-----Original Message-----
From: Chun Wei Ho [mailto:[EMAIL PROTECTED]
Sent: Monday, 13 February 2006 10:35
To: java-user@lucene.apache.org
Subject: Suggesting refine searches with Lucene
Hi, I am trying to suggest refined searches for my Lucene search. For example, if a search turned up too many results, it would list a number of document title subsequences that occurred frequently in the results of the previous search, as possible candidates for refining the search. Does anyone know the right/any approach to implementing this in a Lucene-based search app? Thanks. CW
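A minimal sketch of the word-counting idea, assuming the texts (for example the titles) of the current hits are already available as strings. `RefineSuggestionSketch` and `suggest` are hypothetical names, the tokenization is a naive split on non-word characters, and no stop-word handling is included:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class RefineSuggestionSketch {
    /** Appends the most frequent non-query words of the result texts to the original query. */
    public static List<String> suggest(String query, List<String> resultTexts, int maxSuggestions) {
        Set<String> queryWords = new HashSet<>(Arrays.asList(query.toLowerCase().split("\\s+")));

        // Count every word of the result texts that is not already part of the query.
        Map<String, Integer> counts = new HashMap<>();
        for (String text : resultTexts) {
            for (String word : text.toLowerCase().split("\\W+")) {
                if (!word.isEmpty() && !queryWords.contains(word)) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }

        // Most frequent words first; ties are broken alphabetically for a stable order.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(maxSuggestions)
                .map(e -> query + " " + e.getKey())
                .collect(Collectors.toList());
    }
}
```

In a real application the counted words should probably be filtered against a stop-word list, otherwise words like "the" will dominate the suggestions.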
Re: Suggesting refine searches with Lucene
> And next time if it is a refined search I will merge current query with
How do you recognize a refined query? And how are the queries refined? Cheers, klaus
Lucene in multithreaded environment
Hi, I'm using Lucene in a web application. Every time a new object is added to the system, the index is updated. Can there be any problems if two objects are created at the same moment? I know Lucene has some locking mechanism. Thx, klaus
-----Original Message-----
From: Amany Moussa [mailto:[EMAIL PROTECTED]
Sent: Monday, 20 February 2006 21:22
To: java-user@lucene.apache.org
Subject: Re: Lucene CPU Utilization
Thank you so much for your reply. I know that you answered this question before. I just wanted to post the question to receive more feedback and share the information. Thanks again. Amany M.
--- Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> I think I answered that question just the other day privately...
> No, there is nothing in Lucene to help you with CPU utilization.
> However, if you are running this on a UNIX box of some kind, you can (re)nice the process and thus lower its priority, giving other processes more time with the CPU. Windows may have something similar.
>
> Otis
>
> ----- Original Message -----
> From: Amany Moussa <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Monday, February 20, 2006 9:50:57 AM
> Subject: Lucene CPU Utilization
>
> Hello,
>
> I am building a Lucene index with over a million documents retrieved from a database. I am running the application on Unix, and I am getting 100% CPU utilization the moment the application starts.
> The application creates a list of small indices in a temp directory and then merges them all into the main index file.
>
> Is there any way I can tune the indexing process and reduce the CPU utilization?
> Thanks much.
>
> Amany M.
Re: RE: Stemming and Wildcard - or fire and water
I've encountered the same problem and tried to use your workaround, but overriding the parser hasn't done the job. I do not understand why the stemming is done anyway.
Uwe wrote:
> This is a well-known problem: Wildcards cannot be analyzed by the query parser, because the analysis would destroy the wildcard characters; also stemming of parts of terms will never work.
> ...
The current behavior doesn't work either. The English word "families" will not be found if the user types the query familie*. So why solve the problem by postulating one opinion as right and another as wrong? A simple flag that allows or suppresses the stemming would solve everyone's problem. All who have no need for a change can use the old form; everyone else can set the appropriate flag. If this problem is so well known, there seems to be a need for a clean solution.
> A possible workaround could be to modify search terms with wildcard tokens by stemming them manually and creating a new search string.
> Searches for hersen* would be modified to hers* and return what you expect.
> Con is of course that you search for more than you specified.
>
> Lars-Erik
>
> -----Original Message-----
> From: Bayer Dennis [mailto:dennis.ba...@cursor.de]
> Sent: Tuesday, December 11, 2012 10:50 AM
> To: java-user@lucene.apache.org
> Subject: Stemming and Wildcard - or fire and water
>
> Hello there,
> my colleague and I ran into an example which didn't return the result size we were expecting. We discovered that there is a mismatch in how terms are handled during indexing and searching. As we found out later, this issue has already been discussed several times on the internet, but from our point of view it is buggy behavior, at least when using a German stemmer.
>
> Tl;dr: a JUnit test case is available (http://pastebin.com/AdeFdW1k)
>
> Setup:
> * Lucene 4.0.0
> * Use the GermanAnalyzer, which internally uses a GermanStemmer
>
> Issue:
> * Create an index for "Hersener", which has a common ending in German -> the string is shortened to "hers"
> * Search for "Hers" -> a result is found
> * Search for "Hersen" -> a result is found because the input token is also stemmed to "hers"
> * Search for "Hers*" -> a result is found
> * Search for "Hersen*" -> nothing is found because the analyzer does not run
>
> Similar examples can be constructed easily if umlauts are involved.
>
> Conclusion:
> A search query which contains a wildcard should also be run through the analyzer, because there are a lot of queries which would otherwise return nothing. The Lucene FAQ already has a topic related to this issue:
> http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F
>
> The example with "dog" and "dogs" works as long as only one character is stemmed, which could be true in English for the majority. But if more characters are involved, Lucene returns nothing instead of returning a few additional items. Just consider "families", which is stemmed to "famili".
> Searching for "familie*" wouldn't return any item.
>
> To find an ending for this initial post ;) :
> Could this behavior be made configurable in the standard? If not:
> a) Why are the stemmers used by default if they can lead to wrong results?
> b) What can be done manually to stem queries containing wildcards, e.g. overriding some parser?
>
> Best regards
> Dennis
spatial searches
Hi all, I hope someone can enlighten me. I am trying to figure out how spatial searches are to be implemented with Lucene. From walking through mailing lists and various web pages, and from looking at the JavaDoc and source code, I understand how the tiers work and how the search is limited by a special term query containing the ID(s) of the relevant grid cells. However, it still puzzles me how, where and when the final distance filtering takes place. I see three possibilities: the "Filter" class, the "ValueSourceQuery" or the use of a subclass of "Collector". With my limited understanding of the inner workings of Lucene, it seems to me that the first two ways more or less operate on the whole document set, i.e. before the term query for the tiers comes into effect, rendering it useless. The "Collector" approach seems much more appropriate, but in addition to deciding whether a document meets the distance condition, I would like to have different scores depending on the distance (a lower score for larger distances). Originally I thought that the solution would be some kind of subclass of "Query", but I haven't seen any hints pointing in this direction, and I don't know whether I am able to implement that on my own. I fear that I am completely misunderstanding something. Thanks in advance for any hints. Regards, Klaus
Re: spatial searches
On 22/05/10 08:45, Julian Atkinson wrote:
> Hi Klaus,
> I suggest you take a look at the code in TestCartesian.java for working examples of the search and as a starting point to trace through.
> In more depth, if you look at DistanceQueryBuilder.java you'll see 2 filters are being set up. The first-pass filter is created by CartesianPolyFilterBuilder, and this makes sure you only consider documents near the area you are searching by looking in the right tier and pulling out the relevant grid cells. The second filter depends on which method you are using, Lat/Lng or Geohash; this is where the more precise filtering is done based on the calculated distance. The use of the second-pass filter is optional and driven by a boolean.
> If you want to custom score, then there is an example in the TestCartesian class with CustomScoreQuery.
> Hope this helps,
> Julian
Hi Julian, sorry for not thanking you earlier -- unfortunately, I had a family tragedy. I missed that CustomScoreQuery can be used without ValueSourceQuery instances. So I will try to use a term query as a subquery to preselect the documents in the geographic vicinity, and then calculate the right distances using my own implementation of CustomScoreProvider. Greetings, Klaus
Slow Index Writes
Hi,
I am trying to use Lucene as a kind of key-value store, but I have run into some bad performance issues. When I add my data as documents to the index, I get an average write rate of 3 documents / second!! This seems ridiculously slow to me, and I guess I must have an error somewhere. Please have a look at my code:
    Directory dir = new NIOFSDirectory(file);
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_45);
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_45, analyzer);
    IndexWriter writer = new IndexWriter(dir, config);

    int eventCount = 1000;
    for (int i = 0; i < eventCount; i++) {
        Document doc = new Document();
        doc.add(new StringField("id", i + "id", Store.YES));
        doc.add(new StoredField("b", buildVector()));
        writer.addDocument(doc);
        writer.commit(); // one commit per document
    }
    writer.close(); // close the writer before its directory
    dir.close();
Not calling the commit function seems to fix the issue, but I guess this would then cause problems if I want to read values in the meantime. My normal use case would be to read something from the index, maybe alter it and then write it back. So I would have roughly 50% reads.
I also tried an embedded version of Elasticsearch, and it manages to reach 2000 documents per second. As it is based on Lucene as well, I guess I am doing something wrong in my code.
THX for the help,
Klaus
--
Klaus Schaefers
Senior Optimization Manager
Ligatus GmbH
Hohenstaufenring 30-32
D-50674 Köln
Tel.: +49 (0) 221 / 56939 -784
Fax: +49 (0) 221 / 56 939 - 599
E-Mail: klaus.schaef...@ligatus.com
Web: www.ligatus.de
HRB Köln 56003
Managing directors: Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann, Dipl.-Wirtschaftsingenieur Arne Wolter
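One possible compromise between committing after every document and never committing is to commit in batches. The following is a plain-Java sketch of that bookkeeping only; `CommitBatcher` is a hypothetical helper, not part of Lucene, and the commit action is passed in as a Runnable so that the sketch stays independent of any Lucene version.

```java
public class CommitBatcher {
    private final int batchSize;
    private final Runnable commit;
    private int pending = 0;

    public CommitBatcher(int batchSize, Runnable commit) {
        this.batchSize = batchSize;
        this.commit = commit;
    }

    /** Call once after each added document; triggers a commit when the batch is full. */
    public void documentAdded() {
        pending++;
        if (pending >= batchSize) {
            flush();
        }
    }

    /** Commits whatever is still pending, e.g. before closing the writer. */
    public void flush() {
        if (pending > 0) {
            commit.run();
            pending = 0;
        }
    }
}
```

In the loop above one would call documentAdded() in place of writer.commit() and flush() once before closing the writer. Since IndexWriter.commit() throws IOException, the Runnable has to wrap that call and handle or rethrow the exception.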
Re: Slow Index Writes
Hi, I was looking for some examples, but I only found some that use an NRTManager class. In Lucene 4.5 I cannot find this class (am I missing a Maven dependency?). Can anyone point me to a working example? Cheers, Klaus
On Fri, Jan 3, 2014 at 11:49 AM, Ian Lea wrote:
> You will indeed get poor performance if you commit for every doc. Can you compromise and commit every, say, 1000 docs, or once every few minutes, or whatever makes sense for your app?
>
> Or look at Lucene's near-real-time search features. Google "Lucene NRT" for info.
>
> Or use Elastic Search.
>
> --
> Ian.
Re: Slow Index Writes
THX!
On Wed, Jan 8, 2014 at 10:10 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> NRTManager was renamed to ControlledRealTimeReopenThread at some point.
>
> But likely simple NRT readers (as Ian described, using .openIfChanged()) will fit your usage.
>
> ControlledRealTimeReopenThread is only necessary if you require certain searches to be real-time, e.g. you just indexed a document and then want to run a search that you know reflects that document.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
Alternative scoring of BooleanQuery
Hi all, sorry if this is a FAQ or has been answered on the list earlier, but unfortunately I did not find a decent way to search the archive (maybe a job for Lucene ;-) ). For some reason, I had to split my document into multiple fields. For the search, I create a query with two subqueries for the same term, one per field, and combine them via a BooleanQuery with Occur.SHOULD. If a term happens to appear in both fields, the scores are added (and scaled, if disableCoord is false). In my context this is not really what I want. I would prefer a simple "maximum" function over the scores of the subqueries. Since I do not consider myself an expert in the internal workings of Lucene: is there an easy way to achieve this, or do I have to reimplement the whole BooleanQuery class? Thanks for any advice. Regards, Klaus
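A note on the question above: Lucene does ship a query class aimed at this case, DisjunctionMaxQuery, which scores a document by its best-scoring subquery plus a tie-breaker fraction of the remaining subquery scores. The sketch below is not Lucene code. It is plain Java arithmetic, under the assumption that coord and query normalization are ignored, showing how the summed score and the maximum-style score differ for one document that matches in two fields. The class name ScoreCombination and the sample scores are made up for illustration.

```java
// Toy illustration only: plain arithmetic, no Lucene classes involved.
final class ScoreCombination {

    // Roughly what a BooleanQuery of SHOULD clauses does when coord and
    // query norm are ignored: the sub-scores are added up.
    static float sum(float... subScores) {
        float total = 0f;
        for (float s : subScores) {
            total += s;
        }
        return total;
    }

    // Disjunction-max style combination: the best sub-score, plus
    // tieBreaker times the sum of all the other sub-scores.
    // Assumes non-negative scores; returns 0 for an empty input.
    static float disMax(float tieBreaker, float... subScores) {
        float max = 0f;
        float total = 0f;
        for (float s : subScores) {
            max = Math.max(max, s);
            total += s;
        }
        return max + tieBreaker * (total - max);
    }

    public static void main(String[] args) {
        float titleScore = 0.5f;  // hypothetical score of the term in field 1
        float bodyScore = 0.25f;  // hypothetical score of the term in field 2
        System.out.println("sum            = " + sum(titleScore, bodyScore));
        System.out.println("max (tie 0.0)  = " + disMax(0.0f, titleScore, bodyScore));
        System.out.println("max (tie 0.5)  = " + disMax(0.5f, titleScore, bodyScore));
    }
}
```

With a tie-breaker of 0 this is a pure maximum. In Lucene itself the equivalent would presumably be to add both per-field subqueries to a DisjunctionMaxQuery instead of a BooleanQuery; the exact constructor and add methods should be checked against the javadoc of the version in use.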
Concurrent Indexing and Searching
Hi, I've read that it is possible to update the index while another thread has a reader open. Now let's say the reader is trying to reopen the index (using its reopen method) and, at the very same time, the writer is committing its 500MB of changes to the index. My question is, what happens in this situation? Which index does the reader end up with if it tries to open the index while the writer is modifying it? Any feedback will be much appreciated, Klaus.
fast Result Count
Hi Guys, Is there a way to speed up counting the documents that satisfy a search query other than by using TopDocCollector.getTotalHits()? For instance, if there are 100 documents satisfying my search query, how can I count them without loading them all into memory? Thanks, Klaus.
boosting results with a field from the index
Hi and a Happy New Year! I created a Lucene index with 2 fields (text and importance). The text field contains the real text, and importance is a field where I manually assign a number between 1 and 5 to the related document. When I search the index, I find the documents ranked by the relevancy that Lucene computes automatically. I'm just wondering if I can boost the results with the importance field I already have stored in the index. As a result I expect the same search results, just weighted differently. Something like relevancy multiplied by importance. Thank you so much, Klaus
RE: boosting results with a field from the index
Wow, that was fast :-) Right, why didn't I come up with the idea of just sorting the results by importance... Lol... OK, I will test both solutions and see which I like best. Such a great piece of software...

-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 03, 2006 5:26 PM
To: java-user@lucene.apache.org
Subject: Re: boosting results with a field from the index

Hi Klaus, You might want to just set the boost value of the Document using your importance number; then Lucene will factor that in automatically when scoring. See the Document#setBoost javadoc for info. You could also sort on the field, I think, so that the more important docs come to the top.

-Grant

--
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
337 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886
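The two suggestions in this reply rank documents differently. Below is a minimal sketch in plain Java (no Lucene classes; the Hit type, the document ids and the scores are invented for illustration) of "relevancy multiplied by importance" versus sorting by importance first. Note that Document#setBoost applies its factor at index time, so real Lucene scores will likely not be exactly relevancy times importance; treat the multiplication here as an approximation of the idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ImportanceRanking {

    // Hypothetical search hit: id, relevancy score, manual importance (1..5).
    static final class Hit {
        final String id;
        final float relevancy;
        final int importance;

        Hit(String id, float relevancy, int importance) {
            this.id = id;
            this.relevancy = relevancy;
            this.importance = importance;
        }

        // "Relevancy multiplied by importance", as asked for in the thread.
        float combined() {
            return relevancy * importance;
        }
    }

    // Made-up sample data.
    static List<Hit> sample() {
        return Arrays.asList(
                new Hit("a", 0.9f, 2),
                new Hit("b", 0.2f, 5),
                new Hit("c", 0.5f, 3));
    }

    // Option 1: rank by relevancy * importance, highest first.
    static List<Hit> byCombinedScore(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.combined()).reversed());
        return sorted;
    }

    // Option 2: rank by importance, highest first; relevancy only breaks ties.
    static List<Hit> byImportance(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.importance).reversed()
                .thenComparing(Comparator.comparingDouble((Hit h) -> h.relevancy).reversed()));
        return sorted;
    }

    // Joins the ids of a ranking into a comma-separated string.
    static String ids(List<Hit> hits) {
        StringBuilder sb = new StringBuilder();
        for (Hit h : hits) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(h.id);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("combined:   " + ids(byCombinedScore(sample())));
        System.out.println("importance: " + ids(byImportance(sample())));
    }
}
```

With this sample data the two orderings are exact opposites, which is the practical difference between boosting and sorting: sorting lets a barely relevant but "important" document come out on top.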
AW: indexReader close method
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On behalf of Chris Hostetter
Sent: Monday, 6 December 2004 21:32
To: Lucene Users List
Subject: Re: indexReader close method

: Do you know why I can't close the IndexReader explicitly under some
: circumstances and why, when I do manage to close it I can still call
: methods on the reader?

1) I tried to create a test case that demonstrated your bug based on the code outline you provided, and I couldn't (see below). That implies to me that something else is going on. If you can create a completely self-contained program that demonstrates your bug and mail it to the list, that would help us help you.

2) The documentation for IndexReader.close() says...

    Closes files associated with this index. Also saves any new deletions to disk. No other methods should be called after this has been called.

...note the word "should". It doesn't say what the other methods will do if you try to call them, just that you shouldn't try. In some cases they may generate exceptions; in other cases they may just be able to return you data based on state internal to the object which is unaffected by the fact that the files have all been closed.

-Hoss

    public static void main(String argv[]) throws IOException {
        /* create a directory */
        String d = System.getProperty("java.io.tmpdir", "tmp")
            + System.getProperty("file.separator")
            + "index-dir-" + (new Random()).nextInt(1000);
        Directory trash = FSDirectory.getDirectory(d, true);

        /* build index */
        Document doc;
        IndexWriter w = new IndexWriter(d, new SimpleAnalyzer(), true);
        doc = new Document();
        doc.add(Field.Text("words", "apple emu"));
        w.addDocument(doc);
        w.optimize();
        w.close();

        /* search index */
        IndexReader r = IndexReader.open(d);
        IndexSearcher s = new IndexSearcher(r);
        Hits h = s.search(new TermQuery(new Term("words", "apple")));
        s.close();
        r.close();
        System.out.println("Reader? - " + r.maxDoc());
    }
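Hoss's point about the word "should" can be illustrated without Lucene. The toy class below (ToyReader is a hypothetical name, and a StringReader stands in for the index files) caches one value in memory when it is opened. After close(), the cached value is still returned, while anything that has to touch the closed "file" fails. This is only a model of the behaviour described in the message, not a description of how IndexReader is implemented.

```java
import java.io.IOException;
import java.io.StringReader;

// Toy model of "some methods still work after close()". Not Lucene code.
final class ToyReader {
    private final StringReader file; // stands in for the index files on disk
    private final int maxDoc;        // state cached in memory at open time

    ToyReader(String contents, int maxDoc) {
        this.file = new StringReader(contents);
        this.maxDoc = maxDoc;
    }

    // Answers from in-memory state only; never touches the "file".
    int maxDoc() {
        return maxDoc;
    }

    // Has to read from the "file", so it fails once the file is closed.
    int readNext() throws IOException {
        return file.read();
    }

    void close() {
        file.close();
    }

    public static void main(String[] args) {
        ToyReader r = new ToyReader("apple emu", 1);
        r.close();
        // Still works: the value lives in the object, not in the file.
        System.out.println("maxDoc after close: " + r.maxDoc());
        try {
            r.readNext();
            System.out.println("read succeeded");
        } catch (IOException e) {
            // StringReader refuses reads after close().
            System.out.println("read after close failed: " + e.getMessage());
        }
    }
}
```

The practical consequence is the one the message draws: a call that happens to succeed on a closed reader says nothing about whether the close itself worked.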
SIMPLE Lucene / MySQL Indexer
Hi, I played with several search engines to replace the MySQL FULLTEXT index and hope that Lucene is the best solution for that. I am reading Manning's book Lucene in Action, and Lucene seems to be the most powerful search engine I have found so far. I'm stuck on a problem and need help from you experts. I managed to create an index as described in the examples. I also managed to read a MySQL database in Java. My question is whether anybody here has a SIMPLE example which does this in one step. I am good at PHP and Visual Basic, but very new to Java. Maybe I'm using the wrong tools (NetBeans IDE and JCreator), but I haven't managed to create a Lucene index on 3 database fields. I appreciate any help. Thank you so much, Klaus
RE: SIMPLE Lucene / MySQL Indexer
Hi Chris, this is indeed a cool application, but I just need to create the index. I will definitely look into your file and see if it makes my life easier. Can you give any details on how long it took to create such a huge index? What is your experience with the slowest searches? Do they go over 1 second? (I know, it depends on the hardware, but I'm just wondering.) Thanks, Klaus

-----Original Message-----
From: Chris Lu [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 13, 2005 5:04 AM
To: java-user@lucene.apache.org
Subject: Re: SIMPLE Lucene / MySQL Indexer

Please allow me to introduce DBSight. It's based on Lucene and oriented toward search on any database. Most things are done through the web UI. No coding is needed to create your search. Check out this demo: http://search.dbsight.com It's free to download and test. Free for developer edition, non-profit usage.

Chris Lu
---
Full-Text Search on Any Database
http://www.dbsight.net
RE: SIMPLE Lucene / MySQL Indexer
Hi Nader, I downloaded Eclipse and also the Hibernate plugin, and I really like this IDE. It seems to have lots of power. What I haven't found so far is a debugger where I can go line by line through the code to spot possible errors. The program runs and I get error messages at the line where the problem arises, but I cannot go step by step as I was used to when programming in Visual Basic, PHP or Perl. Thanks, Klaus

-----Original Message-----
From: Nader Henein [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 13, 2005 10:42 AM
To: java-user@lucene.apache.org
Subject: Re: SIMPLE Lucene / MySQL Indexer

Also Hibernate: you can use Eclipse as an IDE, with the Hibernator plugin, to create objects cleanly from your MySQL database, and then a few lines will fetch an object which can then be passed to Lucene for indexing.

Nader Henein

--
Nader S. Henein
Senior Applications Architect
Bayt.com
RE: SIMPLE Lucene / MySQL Indexer
Hi Ian, That's what I'm looking for. Right, a simple piece of source code which reads a database and adds the fields to the index. What I've also found so far is another solution at http://www-128.ibm.com/developerworks/java/library/j-lucene/. Its first step is to export my MySQL database to simple XML and go from there. That is just an additional step, and I would stick with it if I don't find another method to do it all at once. Thanks, Klaus

-----Original Message-----
From: Ian Lea [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 13, 2005 10:19 AM
To: java-user@lucene.apache.org
Subject: Re: SIMPLE Lucene / MySQL Indexer

Something like this?

    IndexWriter iw = whatever
    ResultSet rs = whatever

    while (rs.next()) {
        Document ldoc = new Document();
        ldoc.add(Field.Text("f1", rs.getString("f1")));
        ldoc.add(Field.UnStored("f2", rs.getString("f2")));
        ldoc.add(Field.Keyword("f3", rs.getString("f3")));
        ...
        iw.addDocument(ldoc);
    }

    rs.close();
    iw.close();

On the IDE front, most people seem to use Eclipse nowadays.

--
Ian.
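For readers unfamiliar with the three Field factory methods used in Ian's snippet: in the Lucene 1.4-era API they differ in whether the value is stored, indexed and tokenized. The sketch below records those flags in plain Java. The enum and its names are illustrative only and not part of Lucene, and the flags are worth double-checking against the Field javadoc for the version in use.

```java
// Illustrative lookup table only; this enum is not a Lucene class.
final class LegacyFieldKinds {

    enum Kind {
        // Field.Text(name, String): full text you search and also show.
        TEXT(true, true, true),
        // Field.UnStored(name, String): searchable text not kept in the index.
        UNSTORED(false, true, true),
        // Field.Keyword(name, String): exact-match values such as ids.
        KEYWORD(true, true, false);

        final boolean stored;     // can the original value be read back from a hit?
        final boolean indexed;    // can the field be searched?
        final boolean tokenized;  // is the value run through the analyzer?

        Kind(boolean stored, boolean indexed, boolean tokenized) {
            this.stored = stored;
            this.indexed = indexed;
            this.tokenized = tokenized;
        }
    }

    // One human-readable line per kind.
    static String describe(Kind k) {
        return k + ": stored=" + k.stored
                + ", indexed=" + k.indexed
                + ", tokenized=" + k.tokenized;
    }

    public static void main(String[] args) {
        for (Kind k : Kind.values()) {
            System.out.println(describe(k));
        }
    }
}
```

For a three-column MySQL table this suggests a rule of thumb: a primary key fits Keyword, a large body column that is only searched fits UnStored, and a short title that is both searched and displayed fits Text.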
RE: SIMPLE Lucene / MySQL Indexer
Hi Xing, I have the book, and as I wrote in my initial message, I managed to create the sample index as well as to read MySQL. But I seem to be unable to combine those programs :-( I'm very new to Java and I haven't found a nice debugger so far to go step by step through my code. I will try all day today to get this fixed. I know, it shouldn't be too difficult. Thank you, Klaus

-----Original Message-----
From: Xing Li [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 13, 2005 2:15 PM
To: java-user@lucene.apache.org
Subject: RE: SIMPLE Lucene / MySQL Indexer

Don't make the mistake of complicating the task. Just read straight from MySQL into Lucene via Java. There is no benefit in exporting data to XML just to pull the data back into Lucene. Get the Lucene in Action book if you haven't, because all the samples there are real-world practical. All you need to add is 10 lines of MySQL-type Java/JDBC code and you are ready to create your first index. Download Luke for Lucene, a GUI testing tool, so you can browse the index, perform searches, validate/test search performance bottlenecks, dissect queries, etc.
RE: SIMPLE Lucene / MySQL Indexer
Hi, Thank you all so much for the crash course in Java for beginners. Indeed, the last time I used Java was 1996... Lol. But I'm getting very close now. It is all about the right declarations of classes and includes at the correct location. I have almost done it. I will publish my code to the community if somebody is interested. Bye, Klaus

-----Original Message-----
From: Xing Li [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 13, 2005 2:38 PM
To: java-user@lucene.apache.org
Subject: RE: SIMPLE Lucene / MySQL Indexer

Klaus, Just a few days ago I couldn't even remember how to compile Java code. The last time I touched Java was around 2001. Don't worry, Lucene is extremely easy once you know a bit of fundamental Java. It's no different from any other language. Just syntax. I recommend the Java book from Deitel & Deitel. I fell in love with their practical writing style back in college. Below is what I whipped up quickly to test MySQL connections... Just add the following to a Lucene book sample. You need to download the Connector/J JDBC driver from the MySQL site and put the jar file on your classpath.

    my_db db = new my_db();
    db.connect();
    ResultSet rs = db.query("select * from mytable limit 100");
    while (rs.next()) {
        ... = rs.getString("mysqltablefieldname"); // string value of that column in the current row
        ...copy code from lucene...
    }

    import java.sql.*;

    public class my_db {
        public Connection conn = null;
        public Statement stmt = null;
        public boolean loaded = false;

        // Loads the MySQL driver class; returns false if it is not on the classpath.
        public boolean load() {
            try {
                // Loading the driver class registers it with DriverManager.
                Class.forName("com.mysql.jdbc.Driver");
                return true;
            } catch (Exception ex) {
                System.out.println("Cannot load mysql driver.");
                return false;
            }
        }

        // Opens the connection and creates the statement used by query().
        public boolean connect() {
            if (!loaded) {
                loaded = load();
            }
            if (!loaded) {
                System.out.println("Can't load driver.");
                return false;
            }
            try {
                conn = DriverManager.getConnection("jdbc:mysql://ip:port/dbname?user=user&password=pass");
                stmt = conn.createStatement();
                // SET returns no result set, so use execute() rather than executeQuery().
                stmt.execute("SET NAMES 'utf8'");
                return true;
            } catch (SQLException ex) {
                System.out.println("SQLException: " + ex.getMessage());
                System.out.println("SQLState: " + ex.getSQLState());
                System.out.println("VendorError: " + ex.getErrorCode());
                return false;
            }
        }

        // Runs a SELECT and returns its result set, or null on error.
        public ResultSet query(String sql) {
            try {
                return stmt.executeQuery(sql);
            } catch (SQLException ex) {
                System.out.println("SQLException: " + ex.getMessage());
                System.out.println("SQLState: " + ex.getSQLState());
                System.out.println("VendorError: " + ex.getErrorCode());
                return null;
            }
        }
    }
RE: SIMPLE Lucene / MySQL Indexer
Yes, it works with breakpoints and so on, but the current line is never highlighted. All I see is the line number in the debug window. But you are right, this is not a Java forum, and I apologize for the beginner's questions.

-----Original Message-----
From: Karthik N S [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 13, 2005 2:41 PM
To: java-user@lucene.apache.org
Subject: RE: SIMPLE Lucene / MySQL Indexer

Hi, apologies, but this is not the forum to discuss how to debug with Eclipse, so I suggest you use the Help tab in the Eclipse IDE. Hint: first set a breakpoint in the code and then use the Debug tab under Run. This is a Lucene forum, guys.

Karthik
RE: SIMPLE Lucene / MySQL Indexer
Hi Chris, I had not thought about that. I'm almost done with my program, and I will also give yours a try as suggested. I have the latest (recommended) JDBC driver, 3.1.10. But I still have to download and install Tomcat or similar to run your .war file. I think 5-24h is not that bad, since you can update the Lucene index in the future and not go through this long build time again. Your demo looks really nice and it's fast. Congratulations! Bye, Klaus

-----Original Message-----
From: Chris Lu [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 13, 2005 5:47 PM
To: java-user@lucene.apache.org
Subject: Re: SIMPLE Lucene / MySQL Indexer

Hi, Klaus, thanks. You can simply use DBSight to create the index. It's in Lucene's standard format. And you can control index field types, analyzers, how to select data from the database, the number of Java threads, etc., just through the web UI. No coding is needed. We have a user who didn't know Lucene at all and had 3 database searches up and running in one week. Building a huge index, say 1 million records, may take 5 ~ 24 hours depending on the record size, computer size, etc. Actually, most of the time is spent on JDBC pulling the data. Special warning: MySQL's JDBC driver has a bug leading to OutOfMemory if you do a select with lots of rows. You must download the latest JDBC driver (dev version) and use setFetchSize().

Chris
---
Full-Text Search on Any Database
http://www.dbsight.net