Re: Re: wild card with keyword field
What does query.toString() show in each case? I still think you should try lowercasing everything, if only to see if it helps. If it does you could either keep it or figure out what you need to do. -- Ian. On 20 Jul 2005 05:22:29 -, Rahul D Thakare <[EMAIL PROTECTED]> wrote: > > > > Hi Ian, > >Yes, I did implement Eric's suggestion last week, but couldn't help. > I am using a demo program from Lucene.jar to test this, let me put a code > here. > >doc.add(Field.Keyword("keywords", "MAIN BOARD")); >while indexing > > and for retrieving > > PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper( new > StandardAnalyzer() ); > analyzer.addAnalyzer( "keywords", new KeywordAnalyzer() ); > > /* QueryParser qp = new QueryParser(line,analyzer); >qp.setLowercaseWildcardTerms(false); >Query query = qp.parse(line, "keywords", analyzer); > */ > Query query = QueryParser.parse(line, "keywords", analyzer); > >you can see Eric's suggestion implemented in commented line. > > am I doing something wrong here ? please let me know. > >thanks and regards > >Rahul Thakare.. > > > On Tue, 19 Jul 2005 Ian Lea wrote : > > >Have you tried Erik's suggestion from last week? > >http://mail-archives.apache.org/mod_mbox/lucene-java-user/200507.mbox/[EMAIL > >PROTECTED] > > > >There is certainly some case confusion in your examples there. > >Personally, I tend to just lowercase all text on indexing and > >searching. > > > >-- > >Ian. > > > >On 19 Jul 2005 05:31:08 -, Rahul D Thakare > ><[EMAIL PROTECTED]> wrote: > > > > > > Hi, > > > > > > I am using Field.Keyword for indexing multi-word keyword (eg: MAIN > LOGIG). Also used keywordAnalyzer, but wild card search is not coming up. Is > there anything which I need to do in addition or, wild card search is not > possible with keyword field. > > > > > > thanks and regards, > > > > > > Rahul Thakare.. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: wild card with keyword field
On Jul 20, 2005, at 1:22 AM, Rahul D Thakare wrote:

> /* QueryParser qp = new QueryParser(line,analyzer);
>    qp.setLowercaseWildcardTerms(false);
>    Query query = qp.parse(line, "keywords", analyzer);
> */
>    Query query = QueryParser.parse(line, "keywords", analyzer);

You've been bitten, as many others have, by not using the proper parse method. parse(String, String, Analyzer) is a _static_ method and completely ignores your set* calls. Use parse(String). I have deprecated the static method for the 1.9 release and will remove it in the 2.0 release (coming in the near unknown future).

	Erik
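A toy illustration of the pitfall Erik describes (plain Java, not Lucene code — the class and field names here are invented): a static method has no access to instance state, so configuration made through setters simply never reaches it.

```java
public class StaticParsePitfall {
    private boolean lowercaseWildcards = true;

    void setLowercaseWildcardTerms(boolean b) { lowercaseWildcards = b; }

    // Instance method: honors the setting made on this object.
    String parse(String s) { return lowercaseWildcards ? s.toLowerCase() : s; }

    // Static method: cannot see lowercaseWildcards at all.
    static String parse(String s, String field) { return s.toLowerCase(); }

    public static void main(String[] args) {
        StaticParsePitfall qp = new StaticParsePitfall();
        qp.setLowercaseWildcardTerms(false);
        System.out.println(qp.parse("MAIN*"));                       // MAIN*  (setting honored)
        System.out.println(StaticParsePitfall.parse("MAIN*", "f"));  // main*  (setting ignored)
    }
}
```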
Re: wild card with keyword field
On Jul 20, 2005, at 1:22 AM, Rahul D Thakare wrote: Hi Ian, Yes, I did implement Eric's suggestion last week, but couldn't help. Also, just to note it I did mention the parse(String) method in the e-mail referenced below! :) Erik I am using a demo program from Lucene.jar to test this, let me put a code here. doc.add(Field.Keyword("keywords", "MAIN BOARD")); while indexing and for retrieving PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper ( new StandardAnalyzer() ); analyzer.addAnalyzer( "keywords", new KeywordAnalyzer() ); /* QueryParser qp = new QueryParser(line,analyzer); qp.setLowercaseWildcardTerms(false); Query query = qp.parse(line, "keywords", analyzer); */ Query query = QueryParser.parse(line, "keywords", analyzer); you can see Eric's suggestion implemented in commented line. am I doing something wrong here ? please let me know. thanks and regards Rahul Thakare.. On Tue, 19 Jul 2005 Ian Lea wrote : Have you tried Erik's suggestion from last week? http://mail-archives.apache.org/mod_mbox/lucene-java-user/ 200507.mbox/% [EMAIL PROTECTED] There is certainly some case confusion in your examples there. Personally, I tend to just lowercase all text on indexing and searching. -- Ian. On 19 Jul 2005 05:31:08 -, Rahul D Thakare <[EMAIL PROTECTED]> wrote: Hi, I am using Field.Keyword for indexing multi-word keyword (eg: MAIN LOGIG). Also used keywordAnalyzer, but wild card search is not coming up. Is there anything which I need to do in addition or, wild card search is not possible with keyword field. thanks and regards, Rahul Thakare.. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Re: wild card with keyword field
Erik/Ian I tried using query.parse(String) did't return any result also my query.toString() returns mainboard:keywords if i give the keyword as mainboard. pls see the changed code again. PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper( new StandardAnalyzer() ); analyzer.addAnalyzer( "keywords", new KeywordAnalyzer() ); QueryParser qp = new QueryParser(line,analyzer); qp.setLowercaseWildcardTerms(false); Query query = qp.parse("keywords"); and doc.add(Field.Keyword("keywords", "mainboard")); please advice if I am doing someting wrong regards rahul... On Wed, 20 Jul 2005 Erik Hatcher wrote : > >On Jul 20, 2005, at 1:22 AM, Rahul D Thakare wrote: > >> >>Hi Ian, >> >> Yes, I did implement Eric's suggestion last week, but couldn't help. > >Also, just to note it I did mention the parse(String) method in the >e-mail referenced below! :) > > Erik > >> I am using a demo program from Lucene.jar to test this, let me put a code >> here. >> >> doc.add(Field.Keyword("keywords", "MAIN BOARD")); >> while indexing >> >>and for retrieving >> >> PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper ( new >> StandardAnalyzer() ); >> analyzer.addAnalyzer( "keywords", new KeywordAnalyzer() ); >> >>/* QueryParser qp = new QueryParser(line,analyzer); >> qp.setLowercaseWildcardTerms(false); >> Query query = qp.parse(line, "keywords", analyzer); >>*/ >> Query query = QueryParser.parse(line, "keywords", analyzer); >> >> you can see Eric's suggestion implemented in commented line. >> >> am I doing something wrong here ? please let me know. >> >> thanks and regards >> >> Rahul Thakare.. >> >> >>On Tue, 19 Jul 2005 Ian Lea wrote : >> >>>Have you tried Erik's suggestion from last week? >>>http://mail-archives.apache.org/mod_mbox/lucene-java-user/ 200507.mbox/% >>>[EMAIL PROTECTED] >>> >>>There is certainly some case confusion in your examples there. >>>Personally, I tend to just lowercase all text on indexing and >>>searching. >>> >>>-- >>>Ian. 
>>> >>>On 19 Jul 2005 05:31:08 -, Rahul D Thakare >>><[EMAIL PROTECTED]> wrote: >>> Hi, I am using Field.Keyword for indexing multi-word keyword (eg: MAIN LOGIG). Also used keywordAnalyzer, but wild card search is not coming up. Is there anything which I need to do in addition or, wild card search is not possible with keyword field. thanks and regards, Rahul Thakare.. >>> >>>- >>>To unsubscribe, e-mail: [EMAIL PROTECTED] >>>For additional commands, e-mail: [EMAIL PROTECTED] >>> >> > > >- >To unsubscribe, e-mail: [EMAIL PROTECTED] >For additional commands, e-mail: [EMAIL PROTECTED] >
Re: Re: wild card with keyword field
Rahul Looks like you've got the args mixed up in your qp calls. I think it should be: QueryParser qp = new QueryParser("keywords",analyzer); qp.setLowercaseWildcardTerms(false); Query query = qp.parse(line); -- Ian. On 20 Jul 2005 14:06:32 -, Rahul D Thakare <[EMAIL PROTECTED]> wrote: > > Erik/Ian > > I tried using query.parse(String) did't return any result > also my query.toString() returns mainboard:keywords if i give the keyword > as mainboard. pls see the changed code again. > >PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper( new > StandardAnalyzer() ); > analyzer.addAnalyzer( "keywords", new KeywordAnalyzer() ); > > QueryParser qp = new QueryParser(line,analyzer); > qp.setLowercaseWildcardTerms(false); > Query query = qp.parse("keywords"); > > and > doc.add(Field.Keyword("keywords", "mainboard")); > > please advice if I am doing someting wrong > > regards > rahul... > > > On Wed, 20 Jul 2005 Erik Hatcher wrote : > > > >On Jul 20, 2005, at 1:22 AM, Rahul D Thakare wrote: > > > >> > >>Hi Ian, > >> > >> Yes, I did implement Eric's suggestion last week, but couldn't help. > > > >Also, just to note it I did mention the parse(String) method in the > >e-mail referenced below! :) > > > > Erik > > > >> I am using a demo program from Lucene.jar to test this, let me put a > >> code here. > >> > >> doc.add(Field.Keyword("keywords", "MAIN BOARD")); > >> while indexing > >> > >>and for retrieving > >> > >> PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper ( new > >> StandardAnalyzer() ); > >> analyzer.addAnalyzer( "keywords", new KeywordAnalyzer() ); > >> > >>/* QueryParser qp = new QueryParser(line,analyzer); > >> qp.setLowercaseWildcardTerms(false); > >> Query query = qp.parse(line, "keywords", analyzer); > >>*/ > >> Query query = QueryParser.parse(line, "keywords", analyzer); > >> > >> you can see Eric's suggestion implemented in commented line. > >> > >> am I doing something wrong here ? please let me know. 
> >> > >> thanks and regards > >> > >> Rahul Thakare.. > >> > >> > >>On Tue, 19 Jul 2005 Ian Lea wrote : > >> > >>>Have you tried Erik's suggestion from last week? > >>>http://mail-archives.apache.org/mod_mbox/lucene-java-user/ 200507.mbox/% > >>>[EMAIL PROTECTED] > >>> > >>>There is certainly some case confusion in your examples there. > >>>Personally, I tend to just lowercase all text on indexing and > >>>searching. > >>> > >>>-- > >>>Ian. > >>> > >>>On 19 Jul 2005 05:31:08 -, Rahul D Thakare > >>><[EMAIL PROTECTED]> wrote: > >>> > > Hi, > > I am using Field.Keyword for indexing multi-word keyword (eg: MAIN > LOGIG). Also used keywordAnalyzer, but wild card search is not coming > up. Is there anything which I need to do in addition or, wild card > search is not possible with keyword field. > > thanks and regards, > > Rahul Thakare.. > > > >>> > >>>- > >>>To unsubscribe, e-mail: [EMAIL PROTECTED] > >>>For additional commands, e-mail: [EMAIL PROTECTED] > >>> > >> > > > > > >- > >To unsubscribe, e-mail: [EMAIL PROTECTED] > >For additional commands, e-mail: [EMAIL PROTECTED] > > > > - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: Searching for similar documents
I hope you will forgive the newbie question but do I have to add the MoreLikeThis.class file to the Lucene-1.4.3.JAR for it to work? I put the .class file in my \wwwroot\web-inf\classes folder and I am getting an error I don't understand when trying to instantiate the object from Cold Fusion. I also added the .class to a .jar and put it in \lib to no avail. I don't know if this is a CF problem or a Java problem. CF Error: Object Instantiation Exception. An exception occurred when instantiating a java object. The cause of this exception was that: MoreLikeThis (wrong name: org/apache/lucene/search/similar/MoreLikeThis). index="\\www\lucene\myindex"; // get an IndexReader object to use in the constructor to the searcher var indexReader = CreateObject("java", "org.apache.lucene.index.IndexReader"); // get an IndexSearcher object searcher = CreateObject("java", "org.apache.lucene.search.IndexSearcher"); searcher = searcher.init(indexReader.open(index)); // get an Analyzer object analyzer = CreateObject("java", "org.apache.lucene.analysis.standard.StandardAnalyzer"); analyzer.init(); mlt = CreateObject("java", "MoreLikeThis"); // < this is the line that causes the error mlt=mlt.init(indexReader); // [ I have also tried mlt = CreateObject("java", "org.apache.lucene.search.similar.MoreLikeThis");] target = "test of the similarity feature"; query = mlt.like( target); hits = CreateObject("java", "org.apache.lucene.search.Hits"); hits = searcher.search(query); -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Tuesday, July 19, 2005 10:59 AM To: java-user@lucene.apache.org Subject: Re: Searching for similar documents On Jul 19, 2005, at 12:42 PM, Kadlabalu, Hareesh wrote: > If someone could someone please extract a version of this file from > source control that corresponds to lucene 1.4.3 or if this can file > can be back-ported, it would be greatly helpful. 
The old Jakarta Lucene Sandbox is still available via CVS: cvs -d:pserver:[EMAIL PROTECTED]:/home/cvspublic co jakarta- lucene-sandbox > 1. > IndexReader.getFieldNames( IndexReader.FieldOption.INDEXED ) does not > compile on 1.4.3, replace with IndexReader.getIndexedFieldNames ( true > )? I think you want false, not true. The boolean flag refers to term vector data. > 2. > query.add(tq, BooleanClause.Occur.SHOULD) does not compile on 1.4.3, > is this the same as query.add( tq, true, true )? No. It's the same as add(tq, false, false) > I have one small request, is it possible to make the archive of > 'Contribution' section that corresponds to Lucene > 1.4.3 release available online? At this point we're probably too far removed from it to accomplish that cleanly. MoreLikeThis may not have ever been 1.4.3 compatible - I don't recall - it certainly wasn't added until well after 1.4.3 was released. The CVS repository should be sufficient for folks to build it themselves if necessary. For most of the old Sandbox contributions, you can find binary releases of those in the Lucene in Action code distribution at www.lucenebook.com Erik - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: New line
When my text file is being searched it seems every line is blending. So I need the index searcher to see a newline character or field separator in the text file. What can be used in the text file to separate my lines ? From: Otis Gospodnetic <[EMAIL PROTECTED]> Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: New line Date: Tue, 19 Jul 2005 10:15:15 -0700 (PDT) I may be misunderstanding you, but \n is the "newline" character. http://www.google.com/search?q=newline%20character%20java Otis --- christopher may <[EMAIL PROTECTED]> wrote: > > I am using text files in my index. What can be used as the new line > character ? Say I have > A batch of apples Apples . So the doc is returned as Apples > and the > summary is A batch of apples. If I want to then on the next line of > the file > put A state out west Arizona. This all blends together. What > is my > default line separator ? Or new line character. Thanks all > > > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: QueryParser handling of backslash characters
On Jul 19, 2005, at 11:19 AM, Jeff Davis wrote: Hi, I'm seeing some strange behavior in the way the QueryParser handles consecutive backslash characters. I know that backslash is the escape character in Lucene, and so I would expect "" to match fields that have two consecutive backslashes, but this does not seem to be the case. The fields I'm searching are UNC paths, e.g. "\\192.168.0.15\public". The only way I can get my query to find the record containing that value is to type "FieldName:\\\192.168.0.15\\public" (three slashes). Why is the third backslash character not treated as an escape? Is it just that any backslash that is preceded by a backslash is interpreted as a literal backslash character, regardless of whether the "escape" backslash was itself escaped? I can code around this, but it seems inconsistent with the way that escape characters usually work. Is this a bug, or is it intentional, or am I missing something? I've waited until I had a chance to experiment with this before replying. I say that this is a bug. There is a private method in QueryParser called discardEscapeChar (shown below). I copied it to a JUnit test case and gave it this assert: assertEquals("192.168.0.15public", discardEscapeChar ("192.168.0.15public")); This test fails with: Expected:192.168.0.15\\public Actual :\192.168.0.15\public Which is wrong in my opinion. (though my head hurts thinking about metaescaping backslashes in Java code to make this a proper test) The bug is isolated to the discardEscapeChar() method where it eats too many backslashes. Could you have a shot at tweaking that method to do the right thing and submit a patch? 
private String discardEscapeChar(String input) { char[] caSource = input.toCharArray(); char[] caDest = new char[caSource.length]; int j = 0; for (int i = 0; i < caSource.length; i++) { if ((caSource[i] != '\\') || (i > 0 && caSource[i-1] == '\\')) { caDest[j++]=caSource[i]; } } return new String(caDest, 0, j); } Erik - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Searching for similar documents
On Jul 20, 2005, at 1:47 PM, Derek Westfall wrote: I hope you will forgive the newbie question but do I have to add the MoreLikeThis.class file to the Lucene-1.4.3.JAR for it to work? I put the .class file in my \wwwroot\web-inf\classes folder If you put it in the right package directory under WEB-INF/classes then it should work (provided all the dependencies it has are in WEB- INF/lib, which may just be the Lucene JAR file). The package is org.apache.lucene.search.similar, so it should go in WEB-INF/classes/ org/apache/lucene/search/similar. I recommend you put this under your webapps WEB-INF/classes directory, not in a common directory to your container. mlt = CreateObject("java", "MoreLikeThis"); // < this is the line that causes the error You should use org.apache.lucene.search.similar.MoreLikeThis Erik - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
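The "wrong name" part of the ColdFusion error means the .class file's compiled-in package (org.apache.lucene.search.similar) doesn't match the directory it was loaded from. The required directory under WEB-INF/classes follows mechanically from the package name, as this small sketch (mine, for illustration) shows:

```java
public class PackagePathDemo {
    // Map a Java package name to the directory the servlet container
    // expects the .class file to live in under WEB-INF/classes.
    static String classDir(String pkg) {
        return "WEB-INF/classes/" + pkg.replace('.', '/');
    }

    public static void main(String[] args) {
        System.out.println(classDir("org.apache.lucene.search.similar"));
        // WEB-INF/classes/org/apache/lucene/search/similar
    }
}
```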
Too many open files error using tomcat and lucene
We are getting the following error in our tomcat error log:

java.io.FileNotFoundException: /dsk1/db/lucene/journals/_clr.f7 (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)

We are using the following:
lucene-1.3-final
SunOS thor 5.8 Generic_117350-21 sun4u sparc SUNW,Ultra-250
tomcat 4.1.34
Java 1.4.2

Does anyone have any idea how to resolve this? Is it an OS, Java or Tomcat problem?

thanks,
Dan P.
RE: QueryParser handling of backslash characters
I think this should work: (Written in C# originally - so someone please check if it compiles - I don't have a java compiler here) private String discardEscapeChar(String input) { char[] caSource = input.toCharArray(); char[] caDest = new char[caSource.length]; int j = 0; for (int i = 0; i < caSource.length; i++) { if (caSource[i] == '\\') { if (caSource.length == ++i) break; } caDest[j++]=caSource[i]; } return new String(caDest, 0, j); } Regarding your UnitTest - It think it's wrong: > assertEquals("192.168.0.15public", > discardEscapeChar ("192.168.0.15public")); It should be: assertEquals("192.168.0.15public", discardEscapeChar ("192.168.0.15public")); I would also suggest to add the following: String s="some.host.name\\dir+:+-!():^[]\{}~*?"; assertEquals(s,discardEscapeChar(escape(s))); Eyal > -Original Message- > From: Erik Hatcher [mailto:[EMAIL PROTECTED] > Sent: Wednesday, July 20, 2005 22:38 PM > To: java-user@lucene.apache.org > Subject: Re: QueryParser handling of backslash characters > > > On Jul 19, 2005, at 11:19 AM, Jeff Davis wrote: > > > Hi, > > > > I'm seeing some strange behavior in the way the QueryParser handles > > consecutive backslash characters. I know that backslash is > the escape > > character in Lucene, and so I would expect "" to match > fields that > > have two consecutive backslashes, but this does not seem to be the > > case. > > > > The fields I'm searching are UNC paths, e.g. > "\\192.168.0.15\public". > > The only way I can get my query to find the record containing that > > value is to type "FieldName:\\\192.168.0.15\\public" (three > slashes). > > Why is the third backslash character not treated as an > escape? Is it > > just that any backslash that is preceded by a backslash is > interpreted > > as a literal backslash character, regardless of whether the "escape" > > backslash was itself escaped? > > > > I can code around this, but it seems inconsistent with the way that > > escape characters usually work. 
Is this a bug, or is it > intentional, > > or am I missing something? > > I've waited until I had a chance to experiment with this > before replying. I say that this is a bug. There is a > private method in QueryParser called discardEscapeChar (shown > below). I copied it to a JUnit test case and gave it this assert: > > assertEquals("192.168.0.15public", > discardEscapeChar ("192.168.0.15public")); > > This test fails with: > > Expected:192.168.0.15\\public > Actual :\192.168.0.15\public > > Which is wrong in my opinion. (though my head hurts thinking > about metaescaping backslashes in Java code to make this a > proper test) > > The bug is isolated to the discardEscapeChar() method where > it eats too many backslashes. Could you have a shot at > tweaking that method to do the right thing and submit a patch? > >private String discardEscapeChar(String input) { > char[] caSource = input.toCharArray(); > char[] caDest = new char[caSource.length]; > int j = 0; > for (int i = 0; i < caSource.length; i++) { >if ((caSource[i] != '\\') || (i > 0 && caSource[i-1] > == '\\')) { > caDest[j++]=caSource[i]; >} > } > return new String(caDest, 0, j); >} > > Erik > > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
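For reference, Eyal's proposed method dropped into a runnable class. The logic is his, verbatim; the checks in main are illustrative inputs of mine, written with Java string escapes (so "\\\\" in source is two literal backslashes).

```java
public class DiscardEscapeCharCheck {
    // Eyal's fix: each backslash escapes exactly the character after it;
    // a lone trailing backslash is dropped.
    static String discardEscapeChar(String input) {
        char[] caSource = input.toCharArray();
        char[] caDest = new char[caSource.length];
        int j = 0;
        for (int i = 0; i < caSource.length; i++) {
            if (caSource[i] == '\\') {
                if (caSource.length == ++i) break;
            }
            caDest[j++] = caSource[i];
        }
        return new String(caDest, 0, j);
    }

    public static void main(String[] args) {
        // An escaped backslash collapses to a single backslash:
        System.out.println(discardEscapeChar("\\\\"));   // prints: \
        // The UNC-path case from the thread, fully escaped on input
        // (\\\\192.168.0.15\\public in query syntax):
        System.out.println(discardEscapeChar("\\\\\\\\192.168.0.15\\\\public"));
        // prints: \\192.168.0.15\public
    }
}
```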
Re: BOOLEAN OPERATOR HOWTO
On Jul 19, 2005, at 8:31 AM, Karthik N S wrote: Given a Search word = 'erik OR hatcher AND otis OR gospodnetic' , Is it possible to RETURN COUNT occurances for each of the word with in the Searched documents. This would give me the Each word's Term Frequency. How to achieve this Wow - I really missed my guess on your question! :) It is possible, but not directly (though you could spelunk the Explanation to get this information per document). Do you want term frequency across an individual document or the entire index? Erik Thx in advance karthik -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Monday, July 18, 2005 6:39 PM To: java-user@lucene.apache.org Subject: Re: BOOLEAN OPERATOR HOWTO On Jul 18, 2005, at 8:12 AM, Karthik N S wrote: I have 2 Questions. But there were no question marks! I don't understand your questions at all, sorry, but I'll see if I can decipher it somewhat 1) The Search Criteria src requires to automatically fill " " between Search words with a Boolean Operator " AND ". You mean to achieve AND'd clauses? By default, OR is the operator, and AND must be explicit. You can construct a QueryParser instance and set the default operator to AND, though, and then OR must be explicit. 2) The Search Criteria src requires to automatically recognise the existing Boolean Query ' AND , + ' present and append the same with out any manupulations. Ex : - Search Word = 'Lucene in Action Erik hatcher and Otis + Gospodnetic ' = lucene AND action AND Eric AND hatcher AND otis + gospodnetic . How to Achieve this , Is there any mechanism built into Lucene to handle such situations. Yes, this sounds like the default operator is what you're looking for. 
Since you use "Lucene in Action" as an example, flip to page 94 for more discussion on this, and then flip to the other pages mentioned here: http://www.lucenebook.com/search?query=default+operator Erik - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
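For the per-document flavor of the question, counting raw occurrences of each query word needs no Lucene API at all; a toy sketch (method and names are mine, not a Lucene facility):

```java
import java.util.HashMap;
import java.util.Map;

public class TermFrequencyToy {
    // Count how often each query word occurs in one document's text --
    // per-document term frequency, computed by hand on lowercased tokens.
    static Map<String, Integer> termCounts(String docText, String[] queryWords) {
        Map<String, Integer> counts = new HashMap<>();
        String[] tokens = docText.toLowerCase().split("\\W+");
        for (String word : queryWords) {
            int n = 0;
            for (String token : tokens) {
                if (token.equals(word)) n++;
            }
            counts.put(word, n);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = termCounts(
            "Erik Hatcher and Otis Gospodnetic, by Erik",
            new String[]{"erik", "otis"});
        System.out.println(c.get("erik") + " " + c.get("otis"));  // 2 1
    }
}
```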
Re: New line
Chris, If I understand your question correctly, you are saying why is the the search output of lucene not returning the two lines as distinct two lines? If you are returning the lucene search output to the web and want the new line \n to be dispalyed as such, you need to replace the character with [br] tags. To lucene, the new line is likely used as part of the tokenizer to distinguish words/tokens for the index but it does not do anything special with it is stored or displayed. However, depending on your lucene client/app, you might need to tweak the client output to display the 2 lines separately. I think that is your question. Xing christopher may wrote: When my text file is being searched it seems every line is blending. So I need the index searcher to see a newline character or field separator in the text file. What can be used in the text file to separate my lines ? From: Otis Gospodnetic <[EMAIL PROTECTED]> Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: New line Date: Tue, 19 Jul 2005 10:15:15 -0700 (PDT) I may be misunderstanding you, but \n is the "newline" character. http://www.google.com/search?q=newline%20character%20java Otis --- christopher may <[EMAIL PROTECTED]> wrote: > > I am using text files in my index. What can be used as the new line > character ? Say I have > A batch of apples Apples . So the doc is returned as Apples > and the > summary is A batch of apples. If I want to then on the next line of > the file > put A state out west Arizona. This all blends together. What > is my > default line separator ? Or new line character. 
Thanks all > > > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Using QueryParser with a single field
On Jul 19, 2005, at 8:10 AM, Eyal wrote: Hi, In my client application I allow the user to build a query by selecting a field from a combobox and entering a value to search by. I want the user to enter free text queries for each field, but I don't want to parse it myself so I thought I'd use QueryParser for that. My problem is that if the user will (for example) select a field called author and enter the following text: 'John content:MyContent' QueryParser will build a query for author:John OR content:MyContent. I want QueryParser to ignore other fields. Any method in QueryParser to allow that? If not - any other suggestions? There is no such switch in QueryParser to disable fielded queries. A custom QueryParser would be needed to make this happen. If you only need TermQuery and PhraseQuery you could do without QueryParser altogether in this situation and process (not quite "parse") the text fields by building up the appropriate query. Erik - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
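One way to "process (not quite parse)" the user's text, as Erik puts it, is to escape ':' before handing the string to QueryParser, so it never sees a field clause. The method name and approach below are my suggestion, not a QueryParser feature:

```java
public class FieldNeutralizer {
    // Escape every ':' so user-typed "field:value" syntax is treated as
    // literal text; only the field chosen in the combobox is searched.
    static String neutralizeFields(String userText) {
        return userText.replace(":", "\\:");
    }

    public static void main(String[] args) {
        System.out.println(neutralizeFields("John content:MyContent"));
        // prints: John content\:MyContent
    }
}
```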
Re: Too many open files error using tomcat and lucene
On Wednesday 20 July 2005 22:49, Dan Pelton wrote: > We are getting the following error in our tomcat error log. > /dsk1/db/lucene/journals/_clr.f7 (Too many open files) > java.io.FileNotFoundException: /dsk1/db/lucene/journals/_clr.f7 (Too > many open files) See http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-48921635adf2c968f7936dc07d51dfb40d638b82 -- http://www.danielnaber.de - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Too many open files error using tomcat and lucene
Hi, Dan,

I think the problem you mentioned is one that has been discussed a lot of times on this mailing list. The bottom line is that you'd better use the compound file format to store indexes. I am not sure Lucene 1.3 has that available, but, if possible, can you upgrade to Lucene 1.4.3?

Cheers,

Jian

On 7/20/05, Dan Pelton <[EMAIL PROTECTED]> wrote:
> We are getting the following error in our tomcat error log.
> /dsk1/db/lucene/journals/_clr.f7 (Too many open files)
> java.io.FileNotFoundException: /dsk1/db/lucene/journals/_clr.f7 (Too many
> open files)
> at java.io.RandomAccessFile.open(Native Method)
>
> We are using the following
> lucene-1.3-final
> SunOS thor 5.8 Generic_117350-21 sun4u sparc SUNW,Ultra-250
> tomcat 4.1.34
> Java 1.4.2
>
> Does any one have any idea how to resolve this. Is it an OS, java or tomcat
> problem.
>
> thanks,
> Dan P.
Re: New line
How you tokenize your input is up to you. It sounds like you want a custom Analyzer that has a tokenizer that knows about newline characters and does whatever you need it to do when a newline character is encountered (e.g. stop reading or whatever). The search part of Lucene has no notion of newline characters and such. It only knows about documents and words/tokens in them. Otis --- christopher may <[EMAIL PROTECTED]> wrote: > > When my text file is being searched it seems every line is blending. > So I > need the index searcher to see a newline character or field separator > in the > text file. What can be used in the text file to separate my lines ? > > >From: Otis Gospodnetic <[EMAIL PROTECTED]> > >Reply-To: java-user@lucene.apache.org > >To: java-user@lucene.apache.org > >Subject: Re: New line > >Date: Tue, 19 Jul 2005 10:15:15 -0700 (PDT) > > > >I may be misunderstanding you, but \n is the "newline" character. > >http://www.google.com/search?q=newline%20character%20java > > > >Otis > > > > > >--- christopher may <[EMAIL PROTECTED]> wrote: > > > > > > > > I am using text files in my index. What can be used as the new > line > > > character ? Say I have > > > A batch of apples Apples . So the doc is returned as > Apples > > > and the > > > summary is A batch of apples. If I want to then on the next line > of > > > the file > > > put A state out west Arizona. This all blends together. > What > > > is my > > > default line separator ? Or new line character. Thanks all > > > > > > > > > > > > > - > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > >- > >To unsubscribe, e-mail: [EMAIL PROTECTED] > >For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
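A sketch of the simplest variant of what Otis describes, done before analysis rather than inside a custom Analyzer: split the file's text on newlines and treat each line as its own unit (for example, its own Document). The example text is from Christopher's earlier message:

```java
public class LineSplitDemo {
    public static void main(String[] args) {
        String fileText = "A batch of apples Apples\nA state out west Arizona";
        // Split on \n (optionally preceded by \r for Windows files):
        String[] lines = fileText.split("\\r?\\n");
        for (String line : lines) {
            // Each line would be indexed separately, so they no longer blend.
            System.out.println(line);
        }
    }
}
```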
Re: QueryParser handling of backslash characters
That fix works perfectly, as far as I can tell.

As for the unit test, it should actually be:

    assertEquals("192.168.0.15\\public", discardEscapeChar("192.168.0.15\\\\public"));

Jeff

On 7/20/05, Eyal <[EMAIL PROTECTED]> wrote:
> I think this should work (written in C# originally, so someone please
> check that it compiles - I don't have a Java compiler here):
>
> private String discardEscapeChar(String input)
> {
>     char[] caSource = input.toCharArray();
>     char[] caDest = new char[caSource.length];
>     int j = 0;
>
>     for (int i = 0; i < caSource.length; i++)
>     {
>         if (caSource[i] == '\\')
>         {
>             if (caSource.length == ++i)
>                 break;
>         }
>         caDest[j++] = caSource[i];
>     }
>     return new String(caDest, 0, j);
> }
>
> Regarding your unit test - I think it's wrong:
>
> > assertEquals("192.168.0.15public",
> >     discardEscapeChar("192.168.0.15public"));
>
> It should be:
>
>     assertEquals("192.168.0.15public", discardEscapeChar("192.168.0.15public"));
>
> I would also suggest adding the following:
>
>     String s = "some.host.name\\dir+:+-!():^[]\{}~*?";
>     assertEquals(s, discardEscapeChar(escape(s)));
>
> Eyal
>
> > -----Original Message-----
> > From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, July 20, 2005 22:38
> > To: java-user@lucene.apache.org
> > Subject: Re: QueryParser handling of backslash characters
> >
> > On Jul 19, 2005, at 11:19 AM, Jeff Davis wrote:
> >
> > > Hi,
> > >
> > > I'm seeing some strange behavior in the way the QueryParser handles
> > > consecutive backslash characters. I know that backslash is the escape
> > > character in Lucene, and so I would expect "\\\\" to match fields that
> > > have two consecutive backslashes, but this does not seem to be the
> > > case.
> > >
> > > The fields I'm searching are UNC paths, e.g. "\\192.168.0.15\public".
> > > The only way I can get my query to find the record containing that
> > > value is to type "FieldName:\\\192.168.0.15\\public" (three slashes).
> > > Why is the third backslash character not treated as an escape? Is it
> > > just that any backslash that is preceded by a backslash is interpreted
> > > as a literal backslash character, regardless of whether the "escape"
> > > backslash was itself escaped?
> > >
> > > I can code around this, but it seems inconsistent with the way that
> > > escape characters usually work. Is this a bug, or is it intentional,
> > > or am I missing something?
> >
> > I've waited until I had a chance to experiment with this before
> > replying. I say that this is a bug. There is a private method in
> > QueryParser called discardEscapeChar (shown below). I copied it to a
> > JUnit test case and gave it this assert:
> >
> >     assertEquals("192.168.0.15public",
> >         discardEscapeChar("192.168.0.15public"));
> >
> > This test fails with:
> >
> >     Expected: 192.168.0.15\\public
> >     Actual:   \192.168.0.15\public
> >
> > Which is wrong in my opinion (though my head hurts thinking about
> > meta-escaping backslashes in Java code to make this a proper test).
> >
> > The bug is isolated to the discardEscapeChar() method, where it eats
> > too many backslashes. Could you have a shot at tweaking that method to
> > do the right thing and submit a patch?
> >
> > private String discardEscapeChar(String input) {
> >     char[] caSource = input.toCharArray();
> >     char[] caDest = new char[caSource.length];
> >     int j = 0;
> >     for (int i = 0; i < caSource.length; i++) {
> >         if ((caSource[i] != '\\') || (i > 0 && caSource[i-1] == '\\')) {
> >             caDest[j++] = caSource[i];
> >         }
> >     }
> >     return new String(caDest, 0, j);
> > }
> >
> > Erik

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
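For reference, the corrected loop Eyal proposes can be exercised standalone. The sketch below is an illustrative reconstruction for this thread (the class name `EscapeDemo` is invented for the example; this is not the exact Lucene QueryParser source): each escape backslash is dropped while the character it escapes is kept, so a trailing lone backslash is discarded rather than eating the following character.

```java
// Illustrative sketch of the corrected escape-discarding logic discussed
// above. EscapeDemo is an invented name; this is not Lucene source code.
public class EscapeDemo {

    // Drops each escape backslash but keeps the character it escapes,
    // so the two-char sequence \\ collapses to a single literal backslash.
    static String discardEscapeChar(String input) {
        char[] caSource = input.toCharArray();
        char[] caDest = new char[caSource.length];
        int j = 0;
        for (int i = 0; i < caSource.length; i++) {
            if (caSource[i] == '\\') {
                if (caSource.length == ++i) {
                    break; // trailing escape char with nothing to escape
                }
            }
            caDest[j++] = caSource[i];
        }
        return new String(caDest, 0, j);
    }

    public static void main(String[] args) {
        // Two escaped backslashes collapse to one literal backslash:
        System.out.println(discardEscapeChar("192.168.0.15\\\\public")); // prints 192.168.0.15\public
        // A single escape before an ordinary character is simply dropped:
        System.out.println(discardEscapeChar("a\\:b")); // prints a:b
    }
}
```

Unlike the original method, this version never looks backward at `caSource[i-1]`, so an escaped backslash cannot be mistaken for an escape character itself.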
StackOverflowError when indexing PDF files
Hi,

I get a "java.lang.StackOverflowError" when I try to index text files that I
produced from PDF files via the PDFBox API. When I index a normal text
repository, I don't get this error. Can someone help me?

Thanks,

Gayo

-----------------------------------------
sent via Webmail/IMAG!
Re: StackOverflowError when indexing PDF files
It sounds like the problem may stem from your PDF parser.

Otis

--- [EMAIL PROTECTED] wrote:
> Hi,
>
> I get a "java.lang.StackOverflowError" when I try to index text files
> that I produced from PDF files via the PDFBox API. When I index a
> normal text repository, I don't get this error. Can someone help me?
>
> Thanks,
>
> Gayo
RE: Searching for similar documents
Okay, I figured out how to use the jar tool, extracted all the files from
lucene-1.4.3.jar, added the MoreLikeThis classes in the appropriate folder,
and recreated and replaced the JAR. Since Lucene is my first exposure to
Java, I am pretty proud of myself at this point.

The only thing that still wasn't working was the setFieldNames function, so I
just set the field names to null in the .java code, recompiled, recreated the
.jar, and now it is working! And doing a good job, too!

Thanks!

Derek

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 20, 2005 1:31 PM
To: java-user@lucene.apache.org
Subject: Re: Searching for similar documents

On Jul 20, 2005, at 1:47 PM, Derek Westfall wrote:
> I hope you will forgive the newbie question, but do I have to add the
> MoreLikeThis.class file to the Lucene-1.4.3.JAR for it to work?
>
> I put the .class file in my \wwwroot\web-inf\classes folder

If you put it in the right package directory under WEB-INF/classes then it
should work (provided all the dependencies it has are in WEB-INF/lib, which
may just be the Lucene JAR file). The package is
org.apache.lucene.search.similar, so it should go in
WEB-INF/classes/org/apache/lucene/search/similar. I recommend you put this
under your webapp's WEB-INF/classes directory, not in a directory common to
your container.

> mlt = CreateObject("java", "MoreLikeThis"); // <-- this is the line
> that causes the error

You should use org.apache.lucene.search.similar.MoreLikeThis

Erik
Re: Searching for similar documents
You'll want to re-think that re-JARing approach for the long term, as you'll
want to upgrade Lucene at some point, I suspect. But congrats on hacking it!

Erik

On Jul 20, 2005, at 5:44 PM, Derek Westfall wrote:
> Okay, I figured out how to use the jar tool, extracted all the files
> from lucene-1.4.3.jar, added the MoreLikeThis classes in the appropriate
> folder, and recreated and replaced the JAR. Since Lucene is my first
> exposure to Java, I am pretty proud of myself at this point.
>
> The only thing that still wasn't working was the setFieldNames function,
> so I just set the field names to null in the .java code, recompiled,
> recreated the .jar, and now it is working! And doing a good job, too!
>
> Thanks!
>
> Derek
Re: StackOverflowError when indexing PDF files
Yes, this sounds like an issue with PDFBox. Can you determine whether a
single PDF document triggers it, and post an issue on the PDFBox SourceForge
site?

Thanks,
Ben Litchfield

On Wed, 20 Jul 2005, Otis Gospodnetic wrote:
> It sounds like the problem may stem from your PDF parser.
>
> Otis
>
> --- [EMAIL PROTECTED] wrote:
> > Hi,
> >
> > I get a "java.lang.StackOverflowError" when I try to index text files
> > that I produced from PDF files via the PDFBox API. When I index a
> > normal text repository, I don't get this error. Can someone help me?
> >
> > Thanks,
> >
> > Gayo
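One way to answer Ben's question (is a single document responsible?) is to run the extraction file by file and trap the StackOverflowError so one bad file cannot kill the whole indexing run. The sketch below uses an invented `TextExtractor` interface as a stand-in for whatever PDFBox call you actually make; it is not PDFBox API, just a hedged illustration of the triage loop.

```java
// Sketch: find which documents make text extraction blow the stack.
// TextExtractor and ExtractionTriage are invented names for illustration;
// plug your real PDFBox extraction call into the interface.
import java.util.ArrayList;
import java.util.List;

public class ExtractionTriage {

    // Placeholder for the real extraction call (e.g. a PDFBox wrapper).
    interface TextExtractor {
        String extract(String path) throws Exception;
    }

    // Runs the extractor over every path and records which ones fail.
    // StackOverflowError is an Error, not an Exception, so it needs its
    // own catch clause to be trapped per file.
    static List<String> findFailingDocs(List<String> paths, TextExtractor extractor) {
        List<String> failing = new ArrayList<String>();
        for (String path : paths) {
            try {
                extractor.extract(path);
            } catch (StackOverflowError e) {
                failing.add(path); // the parser recursed too deeply on this file
            } catch (Exception e) {
                failing.add(path); // any other per-file parse failure
            }
        }
        return failing;
    }

    public static void main(String[] args) {
        // Fake extractor simulating one pathological document.
        TextExtractor fake = new TextExtractor() {
            public String extract(String path) {
                if (path.endsWith("bad.pdf")) {
                    throw new StackOverflowError();
                }
                return "some text";
            }
        };
        List<String> paths = new ArrayList<String>();
        paths.add("ok.pdf");
        paths.add("bad.pdf");
        System.out.println(findFailingDocs(paths, fake)); // prints [bad.pdf]
    }
}
```

Once the failing file is identified, it can be attached to a PDFBox bug report as Ben asks.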
RE: Searching for similar documents
Your suggestion below is undoubtedly the answer to my problem. I didn't even
consider the need to create all those directory levels. I'm sure that will
solve it!

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 20, 2005 1:31 PM
To: java-user@lucene.apache.org
Subject: Re: Searching for similar documents

If you put it in the right package directory under WEB-INF/classes then it
should work (provided all the dependencies it has are in WEB-INF/lib, which
may just be the Lucene JAR file). The package is
org.apache.lucene.search.similar, so it should go in
WEB-INF/classes/org/apache/lucene/search/similar. I recommend you put this
under your webapp's WEB-INF/classes directory, not in a directory common to
your container.
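A quick way to confirm the placement Erik describes is a classpath probe from inside the same webapp: if the class file sits in the right package directory under WEB-INF/classes, `Class.forName` will resolve its fully qualified name. The sketch below is illustrative (the class name `ClasspathCheck` is invented); it makes no Lucene calls itself, so it compiles even without the Lucene JAR present.

```java
// Illustrative classpath probe: checks whether a fully qualified class
// name resolves, e.g. after dropping MoreLikeThis.class under
// WEB-INF/classes/org/apache/lucene/search/similar/.
public class ClasspathCheck {

    // Returns true if the named class can be loaded from the classpath.
    static boolean isOnClasspath(String fqcn) {
        try {
            Class.forName(fqcn);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Prints true only if MoreLikeThis.class is in the right package
        // directory (or inside a JAR on the classpath).
        System.out.println(isOnClasspath("org.apache.lucene.search.similar.MoreLikeThis"));
    }
}
```

Running this inside the webapp (or the CreateObject environment) distinguishes a misplaced class file from a problem in the calling code.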