Re: parsing Java log file with Lucene 3.0.3

2011-01-04 Thread Benzion G
OK, I succeeded in writing the Analyzer I need. I can't say that I understood all of the Lucene Analyzer-Tokenizer-Filter logic, but MyAnalyzer is attached. I hope it will help somebody else. import java.io.Reader; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.CharTokeni

Re: parsing Java log file with Lucene 3.0.3

2011-01-04 Thread Erick Erickson
Lucene In Action has an example of creating a SynonymAnalyzer that you can adapt. The general idea is to subclass Analyzer and implement the required functions, perhaps wrapping a Tokenizer in a chain of Filters. You might be able to crib some ideas from solr.analysis.WordDelimiterFilter Best

Re: parsing Java log file with Lucene 3.0.3

2011-01-04 Thread Benzion G
Problem with SimpleAnalyzer! It ignores digits. For the text "customer 123 found" it will take only "customer" and "found", but will ignore "123". StandardAnalyzer handles the digits OK but has the dots problem I mentioned before. Is there an understandable guide on how to write my own Analyzer - a h
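For anyone following along, the behaviour being asked for can be sketched in plain Java without any Lucene dependency. This is only an illustration of the tokenization rule (keep runs of letters and digits, treat everything else, dots included, as a separator), not the MyAnalyzer posted in this thread. The class name `LogTokenSketch` and the lower-casing step are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: shows the splitting rule only.
class LogTokenSketch {

    // Split text into runs of letters and digits. Any other character
    // (dot, comma, space, bracket) ends the current token.
    // Lower-casing is an assumption, added so searches are case-insensitive.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        // Flush the last token if the text did not end with a separator.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("customer 123 found"));
        System.out.println(tokenize("java.lang.NullPointerException"));
    }
}
```

With this rule "123" survives as a token and "java.lang.NullPointerException" becomes three separate tokens, so a search for "NullPointerException" has something to match.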

Re: parsing Java log file with Lucene 3.0.3

2011-01-03 Thread Benzion G
Thank you guys! Looks like SimpleAnalyzer is OK for my application. I'm still testing but meanwhile it looks good. -- View this message in context: http://lucene.472066.n3.nabble.com/parsing-Java-log-file-with-Lucene-3-0-3-tp2173046p2190354.html Sent from the Lucene - Java Users mailing list ar

Re: parsing Java log file with Lucene 3.0.3

2011-01-02 Thread Erick Erickson
Some days I just can't read... First question: Why do you require StandardAnalyzer? Are you really making use of the special processing? Take a look at other analyzer options: PatternAnalyzer, SimpleAnalyzer, etc. If you really require StandardAnalyzer, consider using two fields. field_original a

Re: parsing Java log file with Lucene 3.0.3

2011-01-01 Thread Benzion G
Of course I want to store the original message and then show it to the user. That's why I can't change it, and the place to handle the dots is the Analyzer. So how can I make the StandardAnalyzer handle dots the way it handles commas?

Re: parsing Java log file with Lucene 3.0.3

2011-01-01 Thread Erick Erickson
<<>> No, that is not the case. Storing a field stores an exact copy of the input, without any analysis. The intent of storing a field is to return something to display in the results list that reflects the original document. What use would it be to store something that had gone through the analysi

Re: parsing Java log file with Lucene 3.0.3

2011-01-01 Thread Benzion G
I'm testing it with ~50M log files. But in the production environment the log files will be ~10G.

Re: parsing Java log file with Lucene 3.0.3

2011-01-01 Thread Benzion G
I tried to understand where the StandardAnalyzer and the other Standard* classes handle these dots and commas, and how I can change their behaviour. I debugged it as well, but I failed to understand it.

Re: parsing Java log file with Lucene 3.0.3

2011-01-01 Thread Hasan Diwan
On 1 January 2011 21:47, Benzion G wrote: > But I'm afraid it will make my index files much bigger. Since I'm indexing > log files the index will be anyway too big so I can't make it even bigger. Have you tried it out? How large are your log files and how large do you expect them to get? -- Sent

Re: parsing Java log file with Lucene 3.0.3

2011-01-01 Thread Benzion G
Hi, Of course I thought about replacing dots with commas or blanks. But I add this field with Field.Store.YES. If I replace the dots with commas, the text will appear with commas in the search results. I also considered adding it as 2 fields: 1. With dots replaced by commas for index and Field.Store.NO 2. The
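The two-field idea can be sketched as follows, again in plain Java so it runs without Lucene on the classpath. It only prepares the two string values; in Lucene they would presumably be added as one field with Field.Store.NO for searching and one with Field.Store.YES for display. The class name `TwoFieldSketch` and the field names `message_indexed` and `message_stored` are hypothetical, chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: prepares the values for the two fields, nothing more.
class TwoFieldSketch {

    // Returns the untouched original (for display in results) and a copy
    // with dots replaced by commas (for indexing only).
    static Map<String, String> fieldValues(String message) {
        Map<String, String> values = new LinkedHashMap<String, String>();
        values.put("message_stored", message);
        values.put("message_indexed", message.replace('.', ','));
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> v = fieldValues("java.lang.NullPointerException at Foo.bar");
        System.out.println(v.get("message_stored"));
        System.out.println(v.get("message_indexed"));
    }
}
```

Only the stored copy is ever shown to the user, so the replaced dots never show up in search results.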

Re: parsing Java log file with Lucene 3.0.3

2010-12-31 Thread Erick Erickson
Have you looked at: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Best Erick On Fri, Dec 31, 2010 at 6:12 AM, Benzion G wrote: > Hi, > > I need to parse the Java log files with Lucene 3.0.3. The StandardAnalyzer is OK, except its handling of dots. > > E.g. it handles "java.la

Re: parsing Java log file with Lucene 3.0.3

2010-12-31 Thread Hasan Diwan
On 31 December 2010 11:12, Benzion G wrote: > I need to parse the Java log files with Lucene 3.0.3. The StandardAnalyzer is OK, except its handling of dots. > > E.g. it handles "java.lang.NullPointerException" as one word and searching for "NullPointerException" will bring nothing. > > I need