Re: Re: wild card with keyword field

2005-07-20 Thread Ian Lea
What does query.toString() show in each case?  I still think you
should try lowercasing everything, if only to see if it helps.  If it
does you could either keep it or figure out what you need to do.


--
Ian.


On 20 Jul 2005 05:22:29 -, Rahul D Thakare
<[EMAIL PROTECTED]> wrote:
> 
> 
>
>  Hi Ian,
>  
>    Yes, I did implement Erik's suggestion last week, but it didn't help.
>  I am using a demo program from Lucene.jar to test this, let me put a code
> here.
>  
>doc.add(Field.Keyword("keywords", "MAIN BOARD"));
>while indexing
>  
>  and for retrieving
>  
>  PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper( new
> StandardAnalyzer() );
>  analyzer.addAnalyzer( "keywords", new KeywordAnalyzer() );
>  
>  /* QueryParser qp = new QueryParser(line,analyzer);
>qp.setLowercaseWildcardTerms(false);
>Query query = qp.parse(line, "keywords", analyzer);
>  */
>  Query query = QueryParser.parse(line, "keywords", analyzer);
>  
>    you can see Erik's suggestion implemented in the commented lines.
>  
>  am I doing something wrong here ? please let me know.
>  
>thanks and regards
>  
>Rahul Thakare..
>  
>  
>  On Tue, 19 Jul 2005 Ian Lea wrote :
> 
>  >Have you tried Erik's suggestion from last week?
> >http://mail-archives.apache.org/mod_mbox/lucene-java-user/200507.mbox/[EMAIL 
> >PROTECTED]
>  >
>  >There is certainly some case confusion in your examples there.
>  >Personally, I tend to just lowercase all text on indexing and
>  >searching.
>  >
>  >--
>  >Ian.
>  >
>  >On 19 Jul 2005 05:31:08 -, Rahul D Thakare
>  ><[EMAIL PROTECTED]> wrote:
>  > >
>  > > Hi,
>  > >
>  > >  I am using Field.Keyword for indexing a multi-word keyword (eg: MAIN
> LOGIG). I also used KeywordAnalyzer, but wildcard search is not working. Is
> there anything I need to do in addition, or is wildcard search not
> possible with a keyword field?
>  > >
>  > > thanks and regards,
>  > >
>  > > Rahul Thakare..

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: wild card with keyword field

2005-07-20 Thread Erik Hatcher


On Jul 20, 2005, at 1:22 AM, Rahul D Thakare wrote:

/* QueryParser qp = new QueryParser(line,analyzer);
  qp.setLowercaseWildcardTerms(false);
  Query query = qp.parse(line, "keywords", analyzer);
*/
 Query query = QueryParser.parse(line, "keywords", analyzer);


You've been bitten, as many others have, by not using the proper
parse method.  parse(String, String, Analyzer) is a _static_ method
and completely ignores your set* calls.  Use parse(String).


I have deprecated the static method for the 1.9 release and will  
remove it in the 2.0 release (coming in the near unknown future).


Erik





Re: wild card with keyword field

2005-07-20 Thread Erik Hatcher


On Jul 20, 2005, at 1:22 AM, Rahul D Thakare wrote:



Hi Ian,

  Yes, I did implement Erik's suggestion last week, but it didn't help.


Also, just to note: I did mention the parse(String) method in the
e-mail I referenced last week!  :)


Erik




Re: Re: wild card with keyword field

2005-07-20 Thread Rahul D Thakare
   
Erik/Ian

 I tried using parse(String), but it didn't return any result.
 Also, my query.toString() returns mainboard:keywords if I give the keyword as
mainboard. Please see the changed code again.

   PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper( new 
StandardAnalyzer() );
 analyzer.addAnalyzer( "keywords", new KeywordAnalyzer() );
 
  QueryParser qp = new QueryParser(line,analyzer);
  qp.setLowercaseWildcardTerms(false);
  Query query = qp.parse("keywords");

  and 
  doc.add(Field.Keyword("keywords", "mainboard"));

  please advise if I am doing something wrong

  regards
  rahul...




Re: Re: wild card with keyword field

2005-07-20 Thread Ian Lea
Rahul

Looks like you've got the args mixed up in your qp calls.  I think it should be:

 QueryParser qp = new QueryParser("keywords",analyzer);
 qp.setLowercaseWildcardTerms(false); 
 Query query = qp.parse(line);


--
Ian.
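
For completeness, the whole pattern under discussion could look like the
following untested sketch (Lucene 1.4.x API assumed; the index path, the
sample keyword value, and the wildcard text are made up for illustration):

```java
// Indexing: Field.Keyword stores "mainboard" as a single, unanalyzed term.
IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
Document doc = new Document();
doc.add(Field.Keyword("keywords", "mainboard"));
writer.addDocument(doc);
writer.close();

// Searching: the first constructor argument is the DEFAULT FIELD, and the
// user's query text goes to the instance parse(String) method.
PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
analyzer.addAnalyzer("keywords", new KeywordAnalyzer());
QueryParser qp = new QueryParser("keywords", analyzer);
qp.setLowercaseWildcardTerms(false);
Query query = qp.parse("main*");  // wildcard query against the keyword field
```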






RE: Searching for similar documents

2005-07-20 Thread Derek Westfall
I hope you will forgive the newbie question but do I have to add the
MoreLikeThis.class file to the Lucene-1.4.3.JAR for it to work?

I put the .class file in my \wwwroot\web-inf\classes folder and I am
getting an error I don't understand when trying to instantiate the
object from Cold Fusion. I also added the .class to a .jar and put it in
\lib to no avail. I don't know if this is a CF problem or a Java
problem.

CF Error: Object Instantiation Exception.  
An exception occurred when instantiating a java object. The cause of
this exception was that: MoreLikeThis (wrong name:
org/apache/lucene/search/similar/MoreLikeThis).  



index="\\www\lucene\myindex";
// get an IndexReader object to use in the constructor to the
searcher var
indexReader = CreateObject("java",
"org.apache.lucene.index.IndexReader");

// get an IndexSearcher object
searcher = CreateObject("java",
"org.apache.lucene.search.IndexSearcher");
searcher = searcher.init(indexReader.open(index));

// get an Analyzer object 
analyzer = CreateObject("java",
"org.apache.lucene.analysis.standard.StandardAnalyzer");
analyzer.init();

mlt = CreateObject("java", "MoreLikeThis"); // <-- this is the
line that causes the error
mlt=mlt.init(indexReader);

// [ I have also tried  mlt = CreateObject("java",
"org.apache.lucene.search.similar.MoreLikeThis");]


target = "test of the similarity feature";

query = mlt.like( target);
hits = CreateObject("java", "org.apache.lucene.search.Hits");
hits = searcher.search(query);
 



-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 19, 2005 10:59 AM
To: java-user@lucene.apache.org
Subject: Re: Searching for similar documents

On Jul 19, 2005, at 12:42 PM, Kadlabalu, Hareesh wrote:
> If someone could please extract a version of this file from
> source control that corresponds to Lucene 1.4.3, or if this file
> can be back-ported, it would be greatly helpful.

The old Jakarta Lucene Sandbox is still available via CVS:

 cvs -d:pserver:[EMAIL PROTECTED]:/home/cvspublic co jakarta-
lucene-sandbox

> 1.
> IndexReader.getFieldNames( IndexReader.FieldOption.INDEXED ) does not 
> compile on 1.4.3, replace with IndexReader.getIndexedFieldNames ( true

> )?

I think you want false, not true.  The boolean flag refers to term
vector data.

> 2.
> query.add(tq, BooleanClause.Occur.SHOULD) does not compile on 1.4.3, 
> is this the same as query.add( tq, true, true )?

No.  It's the same as add(tq, false, false)

> I have one small request, is it possible to make the archive of 
> 'Contribution' section that corresponds to Lucene
> 1.4.3 release available online?

At this point we're probably too far removed from it to accomplish that
cleanly.  MoreLikeThis may not have ever been 1.4.3 compatible - I don't
recall - it certainly wasn't added until well after 1.4.3 was released.
The CVS repository should be sufficient for folks to build it themselves
if necessary.

For most of the old Sandbox contributions, you can find binary releases
of those in the Lucene in Action code distribution at www.lucenebook.com

 Erik





Re: New line

2005-07-20 Thread christopher may


When my text file is being searched, it seems every line is blending
together. So I need the index searcher to see a newline character or field
separator in the text file. What can be used in the text file to separate my lines?



From: Otis Gospodnetic <[EMAIL PROTECTED]>
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: New line
Date: Tue, 19 Jul 2005 10:15:15 -0700 (PDT)

I may be misunderstanding you, but \n is the "newline" character.
http://www.google.com/search?q=newline%20character%20java

Otis


--- christopher may <[EMAIL PROTECTED]> wrote:

>
> I am using text files in my index. What can be used as the new line
> character ? Say I have
> A batch of apples  Apples . So the doc is returned as Apples
> and the
> summary is A batch of apples. If I want to then on the next line of
> the file
> put A state out west Arizona. This all blends together. What
> is my
> default line separator ? Or new line character. Thanks all
>
>
>










Re: QueryParser handling of backslash characters

2005-07-20 Thread Erik Hatcher


On Jul 19, 2005, at 11:19 AM, Jeff Davis wrote:


Hi,

I'm seeing some strange behavior in the way the QueryParser handles
consecutive backslash characters.  I know that backslash is the escape
character in Lucene, and so I would expect "\\\\" to match fields that
have two consecutive backslashes, but this does not seem to be the
case.

The fields I'm searching are UNC paths, e.g. "\\192.168.0.15\public".
The only way I can get my query to find the record containing that
value is to type "FieldName:\\\192.168.0.15\\public" (three slashes).
Why is the third backslash character not treated as an escape?  Is it
just that any backslash that is preceded by a backslash is interpreted
as a literal backslash character, regardless of whether the "escape"
backslash was itself escaped?

I can code around this, but it seems inconsistent with the way that
escape characters usually work.  Is this a bug, or is it intentional,
or am I missing something?


I've waited until I had a chance to experiment with this before  
replying.  I say that this is a bug.  There is a private method in  
QueryParser called discardEscapeChar (shown below).  I copied it to a  
JUnit test case and gave it this assert:


assertEquals("192.168.0.15public", discardEscapeChar 
("192.168.0.15public"));


This test fails with:

Expected:192.168.0.15\\public
Actual  :\192.168.0.15\public

Which is wrong in my opinion.  (though my head hurts thinking about  
metaescaping backslashes in Java code to make this a proper test)


The bug is isolated to the discardEscapeChar() method where it eats  
too many backslashes.  Could you have a shot at tweaking that method  
to do the right thing and submit a patch?


  private String discardEscapeChar(String input) {
char[] caSource = input.toCharArray();
char[] caDest = new char[caSource.length];
int j = 0;
for (int i = 0; i < caSource.length; i++) {
  if ((caSource[i] != '\\') || (i > 0 && caSource[i-1] == '\\')) {
caDest[j++]=caSource[i];
  }
}
return new String(caDest, 0, j);
  }

Erik





Re: Searching for similar documents

2005-07-20 Thread Erik Hatcher

On Jul 20, 2005, at 1:47 PM, Derek Westfall wrote:

I hope you will forgive the newbie question but do I have to add the
MoreLikeThis.class file to the Lucene-1.4.3.JAR for it to work?

I put the .class file in my \wwwroot\web-inf\classes folder


If you put it in the right package directory under WEB-INF/classes
then it should work (provided all the dependencies it has are in
WEB-INF/lib, which may just be the Lucene JAR file).  The package is
org.apache.lucene.search.similar, so the class should go in
WEB-INF/classes/org/apache/lucene/search/similar.  I recommend you put
this under your webapp's WEB-INF/classes directory, not in a directory
common to your container.



mlt = CreateObject("java", "MoreLikeThis"); // <-- this is the
line that causes the error


You should use org.apache.lucene.search.similar.MoreLikeThis

Erik





Too many open files error using tomcat and lucene

2005-07-20 Thread Dan Pelton

We are getting the following error in our tomcat error log.
/dsk1/db/lucene/journals/_clr.f7 (Too many open files)
java.io.FileNotFoundException: /dsk1/db/lucene/journals/_clr.f7 (Too many open 
files)
at java.io.RandomAccessFile.open(Native Method)

We are using the following
lucene-1.3-final
SunOS thor 5.8 Generic_117350-21 sun4u sparc SUNW,Ultra-250
tomcat 4.1.34
Java 1.4.2


Does anyone have any idea how to resolve this? Is it an OS, Java, or Tomcat
problem?

thanks,
Dan P.




RE: QueryParser handling of backslash characters

2005-07-20 Thread Eyal
I think this should work:

(Written in C# originally - so someone please check if it compiles - I don't
have a java compiler here)

private String discardEscapeChar(String input) 
{
  char[] caSource = input.toCharArray();
  char[] caDest = new char[caSource.length];
  int j = 0;

  for (int i = 0; i < caSource.length; i++) 
  {
if (caSource[i] == '\\')
{
  if (caSource.length == ++i)
break;
}
caDest[j++]=caSource[i];
  }
  return new String(caDest, 0, j);
}
 

Regarding your UnitTest - I think it's wrong:

>  assertEquals("192.168.0.15public", 
> discardEscapeChar ("192.168.0.15public"));

It should be: assertEquals("192.168.0.15public", discardEscapeChar
("192.168.0.15public"));

I would also suggest to add the following:
String s="some.host.name\\dir+:+-!():^[]\{}~*?";
assertEquals(s,discardEscapeChar(escape(s)));

Eyal
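
To double-check that this logic compiles and behaves as Java, here is a
standalone harness (the class name and test strings are mine, purely for
illustration; the method body mirrors the proposed fix):

```java
// Standalone check of the proposed discardEscapeChar fix.
// The class name is hypothetical; the method body mirrors Eyal's post.
public class DiscardEscapeCharDemo {

    static String discardEscapeChar(String input) {
        char[] caSource = input.toCharArray();
        char[] caDest = new char[caSource.length];
        int j = 0;
        for (int i = 0; i < caSource.length; i++) {
            if (caSource[i] == '\\') {
                // skip the escape character; an escaped backslash no
                // longer acts as an escape for the character after it
                if (caSource.length == ++i)
                    break; // trailing lone backslash: nothing to escape
            }
            caDest[j++] = caSource[i];
        }
        return new String(caDest, 0, j);
    }

    public static void main(String[] args) {
        // four raw backslashes unescape to two
        System.out.println(discardEscapeChar("\\\\\\\\").length()); // 2
        System.out.println(discardEscapeChar("a\\-b"));             // a-b
    }
}
```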






Re: BOOLEAN OPERATOR HOWTO

2005-07-20 Thread Erik Hatcher


On Jul 19, 2005, at 8:31 AM, Karthik N S wrote:
Given a Search word = 'erik OR hatcher AND otis OR gospodnetic', is it
possible to return the count of occurrences for each of the words
within the searched documents?

This would give me each word's term frequency.

How can I achieve this?


Wow - I really missed my guess on your question! :)

It is possible, but not directly (though you could spelunk the  
Explanation to get this information per document).  Do you want term  
frequency across an individual document or the entire index?


Erik
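
For per-document term frequency, one hedged possibility (Lucene 1.4 API
assumed; the field name and index path below are made up) is to walk
TermDocs for each term of interest:

```java
IndexReader reader = IndexReader.open("/path/to/index");
TermDocs termDocs = reader.termDocs(new Term("contents", "erik"));
while (termDocs.next()) {
    // freq() is the number of times the term occurs in this document
    System.out.println("doc=" + termDocs.doc() + " freq=" + termDocs.freq());
}
reader.close();
```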




Thx in advance
karthik




-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Monday, July 18, 2005 6:39 PM
To: java-user@lucene.apache.org
Subject: Re: BOOLEAN OPERATOR HOWTO



On Jul 18, 2005, at 8:12 AM, Karthik N S wrote:


I have 2 Questions.


But there were no question marks!   I don't understand your questions
at all, sorry, but I'll see if I can decipher it somewhat



1) The Search Criteria  src  requires to automatically fill   "  "
between  Search words with a Boolean Operator   "  AND ".



You mean to achieve AND'd clauses?   By default, OR is the operator,
and AND must be explicit.  You can construct a QueryParser instance
and set the default operator to AND, though, and then OR must be
explicit.



 2) The Search Criteria  src  requires to automatically recognise
the existing  Boolean Query  ' AND , + '  present and append the same
 with  out any manupulations.

Ex : -
Search Word  =

'Lucene in Action Erik hatcher and  Otis  + Gospodnetic '   =
lucene AND action AND Eric  AND hatcher AND otis + gospodnetic .


How to Achieve this , Is there any mechanism built into Lucene to
handle such situations.



Yes, this sounds like the default operator is what you're looking
for.  Since you use "Lucene in Action" as an example, flip to page 94
for more discussion on this, and then flip to the other pages
mentioned here:

 http://www.lucenebook.com/search?query=default+operator

Erik









Re: New line

2005-07-20 Thread [EMAIL PROTECTED]

Chris,

If I understand your question correctly, you are asking why the search
output of Lucene is not returning the two lines as two distinct lines.


If you are returning the Lucene search output to the web and want the
newline \n to be displayed as such, you need to replace the character
with <br> tags.


To Lucene, the newline is likely used as part of the tokenizer to
distinguish words/tokens for the index, but it does not do anything
special with it when it is stored or displayed. However, depending on your
Lucene client/app, you might need to tweak the client output to display
the two lines separately.


I think that is your question.

Xing






Re: Using QueryParser with a single field

2005-07-20 Thread Erik Hatcher


On Jul 19, 2005, at 8:10 AM, Eyal wrote:


Hi,

In my client application I allow the user to build a query by selecting a
field from a combobox and entering a value to search by.
I want the user to enter free text queries for each field, but I don't want
to parse it myself, so I thought I'd use QueryParser for that. My problem is
that if the user will (for example) select a field called author and enter
the following text: 'John content:MyContent',
QueryParser will build a query for author:John OR content:MyContent. I want
QueryParser to ignore other fields.
Is there any method in QueryParser to allow that? If not - any other
suggestions?


There is no such switch in QueryParser to disable fielded queries.  A
custom QueryParser would be needed to make this happen.

If you only need TermQuery and PhraseQuery you could do without
QueryParser altogether in this situation and process (not quite
"parse") the text fields by building up the appropriate query.


Erik





Re: Too many open files error using tomcat and lucene

2005-07-20 Thread Daniel Naber
On Wednesday 20 July 2005 22:49, Dan Pelton wrote:

> We are getting the following error in our tomcat error log.
> /dsk1/db/lucene/journals/_clr.f7 (Too many open files)
> java.io.FileNotFoundException: /dsk1/db/lucene/journals/_clr.f7 (Too
> many open files)

See 
http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-48921635adf2c968f7936dc07d51dfb40d638b82

-- 
http://www.danielnaber.de




Re: Too many open files error using tomcat and lucene

2005-07-20 Thread jian chen
Hi, Dan,

I think the problem you mentioned is one that has been discussed a
lot of times on this mailing list.

Bottom line: you'd better use the compound file format to store your
indexes. I am not sure Lucene 1.3 has that available, but, if
possible, can you upgrade to Lucene 1.4.3?

Cheers,

Jian
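
The compound-format suggestion, as an untested sketch (Lucene 1.4.x API
assumed; the directory path is taken from the error message above):

```java
IndexWriter writer = new IndexWriter("/dsk1/db/lucene/journals",
                                     new StandardAnalyzer(), false);
writer.setUseCompoundFile(true); // one .cfs per segment instead of many files
writer.optimize();               // fewer segments also means fewer open files
writer.close();
```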




Re: New line

2005-07-20 Thread Otis Gospodnetic
How you tokenize your input is up to you.  It sounds like you want a
custom Analyzer that has a tokenizer that knows about newline
characters and does whatever you need it to do when a newline character
is encountered (e.g. stop reading or whatever).  The search part of
Lucene has no notion of newline characters and such.  It only knows
about documents and words/tokens in them.

Otis
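
One possible interpretation of such an analyzer, as an untested sketch
(Lucene 1.4 API assumed; the class name is made up): a CharTokenizer
subclass that breaks tokens at newlines, here emitting each whole line as a
single token:

```java
class LineAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new CharTokenizer(reader) {
            // every character except CR/LF belongs to a token,
            // so token boundaries fall exactly on line breaks
            protected boolean isTokenChar(char c) {
                return c != '\n' && c != '\r';
            }
        };
    }
}
```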


--- christopher may <[EMAIL PROTECTED]> wrote:

> 
> When my text file is being searched it seems every line is  blending.
> So I 
> need the index searcher to see a newline character or field separator
> in the 
> text file. What can be used in the text file to separate my lines ?
> 
> >From: Otis Gospodnetic <[EMAIL PROTECTED]>
> >Reply-To: java-user@lucene.apache.org
> >To: java-user@lucene.apache.org
> >Subject: Re: New line
> >Date: Tue, 19 Jul 2005 10:15:15 -0700 (PDT)
> >
> >I may be misunderstanding you, but \n is the "newline" character.
> >http://www.google.com/search?q=newline%20character%20java
> >
> >Otis
> >
> >
> >--- christopher may <[EMAIL PROTECTED]> wrote:
> >
> > >
> > > I am using text files in my index. What can be used as the new
> line
> > > character ? Say I have
> > > A batch of apples  Apples . So the doc is returned as
> Apples
> > > and the
> > > summary is A batch of apples. If I want to then on the next line
> of
> > > the file
> > > put A state out west Arizona. This all blends together.
> What
> > > is my
> > > default line separator ? Or new line character. Thanks all
> > >
> > >
> > >
> > >
> -
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> > >
> >
> >
>
> >
> 
> 
> 
> 
> 





Re: QueryParser handling of backslash characters

2005-07-20 Thread Jeff Davis
That fix works perfectly, as far as I can tell.

As for the unit test, it should actually be:
assertEquals("192.168.0.15\\public", discardEscapeChar
("192.168.0.15public"));

Jeff
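For reference, here is a self-contained Java sketch of the corrected method from the quoted exchange below, with the backslash escaping written out explicitly in Java string literals (the wrapper class name is mine):

```java
public class EscapeDemo {
    // Corrected discardEscapeChar: a backslash escapes exactly the next
    // character, so an escaped backslash pair collapses to one literal
    // backslash instead of eating too many characters.
    public static String discardEscapeChar(String input) {
        char[] caSource = input.toCharArray();
        char[] caDest = new char[caSource.length];
        int j = 0;
        for (int i = 0; i < caSource.length; i++) {
            if (caSource[i] == '\\') {
                if (caSource.length == ++i) {
                    break;            // trailing escape char: drop it
                }
            }
            caDest[j++] = caSource[i];
        }
        return new String(caDest, 0, j);
    }

    public static void main(String[] args) {
        // Escaped query text \\\\192.168.0.15\\public unescapes to the
        // UNC path \\192.168.0.15\public
        System.out.println(discardEscapeChar("\\\\\\\\192.168.0.15\\\\public"));
    }
}
```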


On 7/20/05, Eyal <[EMAIL PROTECTED]> wrote:
> I think this should work:
> 
> (Written in C# originally - so someone please check if it compiles - I don't
> have a java compiler here)
> 
> private String discardEscapeChar(String input)
> {
>   char[] caSource = input.toCharArray();
>   char[] caDest = new char[caSource.length];
>   int j = 0;
> 
>   for (int i = 0; i < caSource.length; i++)
>   {
> if (caSource[i] == '\\')
> {
>   if (caSource.length == ++i)
> break;
> }
> caDest[j++]=caSource[i];
>   }
>   return new String(caDest, 0, j);
> }
> 
> 
> Regarding your UnitTest - It think it's wrong:
> 
> >  assertEquals("192.168.0.15public",
> > discardEscapeChar ("192.168.0.15public"));
> 
> It should be: assertEquals("192.168.0.15public", discardEscapeChar
> ("192.168.0.15public"));
> 
> I would also suggest to add the following:
> String s="some.host.name\\dir+:+-!():^[]\{}~*?";
> assertEquals(s,discardEscapeChar(escape(s)));
> 
> Eyal
> 
> > -Original Message-
> > From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, July 20, 2005 22:38 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: QueryParser handling of backslash characters
> >
> >
> > On Jul 19, 2005, at 11:19 AM, Jeff Davis wrote:
> >
> > > Hi,
> > >
> > > I'm seeing some strange behavior in the way the QueryParser handles
> > > consecutive backslash characters.  I know that backslash is
> > the escape
> > > character in Lucene, and so I would expect "" to match
> > fields that
> > > have two consecutive backslashes, but this does not seem to be the
> > > case.
> > >
> > > The fields I'm searching are UNC paths, e.g.
> > "\\192.168.0.15\public".
> > > The only way I can get my query to find the record containing that
> > > value is to type "FieldName:\\\192.168.0.15\\public" (three
> > slashes).
> > > Why is the third backslash character not treated as an
> > escape?  Is it
> > > just that any backslash that is preceded by a backslash is
> > interpreted
> > > as a literal backslash character, regardless of whether the "escape"
> > > backslash was itself escaped?
> > >
> > > I can code around this, but it seems inconsistent with the way that
> > > escape characters usually work.  Is this a bug, or is it
> > intentional,
> > > or am I missing something?
> >
> > I've waited until I had a chance to experiment with this
> > before replying.  I say that this is a bug.  There is a
> > private method in QueryParser called discardEscapeChar (shown
> > below).  I copied it to a JUnit test case and gave it this assert:
> >
> >  assertEquals("192.168.0.15public",
> > discardEscapeChar ("192.168.0.15public"));
> >
> > This test fails with:
> >
> >  Expected:192.168.0.15\\public
> >  Actual  :\192.168.0.15\public
> >
> > Which is wrong in my opinion.  (though my head hurts thinking
> > about metaescaping backslashes in Java code to make this a
> > proper test)
> >
> > The bug is isolated to the discardEscapeChar() method where
> > it eats too many backslashes.  Could you have a shot at
> > tweaking that method to do the right thing and submit a patch?
> >
> >private String discardEscapeChar(String input) {
> >  char[] caSource = input.toCharArray();
> >  char[] caDest = new char[caSource.length];
> >  int j = 0;
> >  for (int i = 0; i < caSource.length; i++) {
> >if ((caSource[i] != '\\') || (i > 0 && caSource[i-1]
> > == '\\')) {
> >  caDest[j++]=caSource[i];
> >}
> >  }
> >  return new String(caDest, 0, j);
> >}
> >
> > Erik
> >
> >
> >
> >
> >
> 
> 
> 
>




StackOverflowError when index pdf files

2005-07-20 Thread Gayo . Diallo

Hi,
I get the error "java.lang.StackOverflowError" when I try to index text files
that I produced from PDF files using the PDFBox API.
When I index a normal text repository, I don't get this error.
Can someone help me?

Thanks,

Gayo

-
sent via Webmail/IMAG!





Re: StackOverflowError when index pdf files

2005-07-20 Thread Otis Gospodnetic
It sounds like the problem may stem from your PDF parser.

Otis

--- [EMAIL PROTECTED] wrote:

> 
> Hi,
> I've the error "java.lang.StackOverflowError" when I try to index
> text files
> that I got from transforming pdf files through pdfbox API.
> When I index normal text repository, I havn't this error.
> may some one help me ?
> 
> Thanks,
> 
> Gayo
> 
> -
> envoyé via Webmail/IMAG !
> 
> 
> 
> 





RE: Searching for similar documents

2005-07-20 Thread Derek Westfall
Okay, I figured out how to use the jar tool: I extracted all the files from
lucene-1.4.3.jar, added the MoreLikeThis classes in the appropriate
folder, and recreated and replaced the JAR. Since Lucene is my first
exposure to Java, I am pretty proud of myself at this point.

The only thing that still wasn't working was the setFieldNames function,
so I just set it to NULL in the .java code, recompiled and recreated the
.jar and now it is working! And doing a good job, too!

Thanks!

Derek



 

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 20, 2005 1:31 PM
To: java-user@lucene.apache.org
Subject: Re: Searching for similar documents

On Jul 20, 2005, at 1:47 PM, Derek Westfall wrote:
> I hope you will forgive the newbie question but do I have to add the 
> MoreLikeThis.class file to the Lucene-1.4.3.JAR for it to work?
>
> I put the .class file in my \wwwroot\web-inf\classes folder

If you put it in the right package directory under WEB-INF/classes then
it should work (provided all the dependencies it has are in WEB-
INF/lib, which may just be the Lucene JAR file).  The package is
org.apache.lucene.search.similar, so it should go in WEB-INF/classes/
org/apache/lucene/search/similar.  I recommend you put this under your
webapps WEB-INF/classes directory, not in a common directory to your
container.
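A quick shell sketch of that layout (paths are illustrative, and `touch`
stands in here for the class file you would get from javac):

```shell
# Create the directory chain matching the package
# org.apache.lucene.search.similar under the webapp's WEB-INF/classes.
mkdir -p WEB-INF/classes/org/apache/lucene/search/similar

# Stand-in for the compiled class file produced by javac.
touch MoreLikeThis.class

# Drop the class into its package directory so the webapp classloader
# can resolve org.apache.lucene.search.similar.MoreLikeThis.
cp MoreLikeThis.class WEB-INF/classes/org/apache/lucene/search/similar/
```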

> mlt = CreateObject("java", "MoreLikeThis"); // < this is the 
> line that causes the error

You should use org.apache.lucene.search.similar.MoreLikeThis

 Erik







Re: Searching for similar documents

2005-07-20 Thread Erik Hatcher
You'll want to re-think that re-JARing approach for the long term, as
you'll want to upgrade Lucene at some point, I suspect. But congrats
on hacking it!


Erik


On Jul 20, 2005, at 5:44 PM, Derek Westfall wrote:


Okay, I figured out how to use JAR, extracted all the files from
lucene-1.4.3.jar, added the MoreLikeThis classes in the appropriate
folder, recreated and replaced the JAR. Since Lucene is my first
exposure to Java I am pretty proud of myself at this point.

The only thing that still wasn't working was the setFieldNames function,
so I just set it to NULL in the .java code, recompiled and recreated the
.jar and now it is working! And doing a good job, too!

Thanks!

Derek





-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 20, 2005 1:31 PM
To: java-user@lucene.apache.org
Subject: Re: Searching for similar documents

On Jul 20, 2005, at 1:47 PM, Derek Westfall wrote:


I hope you will forgive the newbie question but do I have to add the
MoreLikeThis.class file to the Lucene-1.4.3.JAR for it to work?

I put the .class file in my \wwwroot\web-inf\classes folder



If you put it in the right package directory under WEB-INF/classes  
then

it should work (provided all the dependencies it has are in WEB-
INF/lib, which may just be the Lucene JAR file).  The package is
org.apache.lucene.search.similar, so it should go in WEB-INF/classes/
org/apache/lucene/search/similar.  I recommend you put this under your
webapps WEB-INF/classes directory, not in a common directory to your
container.



mlt = CreateObject("java", "MoreLikeThis"); // < this is the
line that causes the error



You should use org.apache.lucene.search.similar.MoreLikeThis

 Erik









Re: StackOverflowError when index pdf files

2005-07-20 Thread Ben Litchfield

Yes, this sounds like an issue with PDFBox. Can you determine whether a
single PDF document triggers it, and post an issue on the PDFBox
SourceForge site?

Thanks,
Ben Litchfield


On Wed, 20 Jul 2005, Otis Gospodnetic wrote:

> It sounds like the problem may stem from your PDF parser
>
> Otis
>
> --- [EMAIL PROTECTED] wrote:
>
> >
> > Hi,
> > I've the error "java.lang.StackOverflowError" when I try to index
> > text files
> > that I got from transforming pdf files through pdfbox API.
> > When I index normal text repository, I havn't this error.
> > may some one help me ?
> >
> > Thanks,
> >
> > Gayo
> >
> > -
> > envoyé via Webmail/IMAG !
> >
> >
>




RE: Searching for similar documents

2005-07-20 Thread Derek Westfall
Your explanation below undoubtedly identifies my problem. I didn't even
consider the need to create all those directory levels. I'm sure that
will solve it!

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 20, 2005 1:31 PM
To: java-user@lucene.apache.org
Subject: Re: Searching for similar documents

If you put it in the right package directory under WEB-INF/classes then
it should work (provided all the dependencies it has are in WEB-
INF/lib, which may just be the Lucene JAR file).  The package is
org.apache.lucene.search.similar, so it should go in WEB-INF/classes/
org/apache/lucene/search/similar.  I recommend you put this under your
webapps WEB-INF/classes directory, not in a common directory to your
container.
