Re: how to implement a TokenFilter?

2012-12-23 Thread Lance Norskog
You need to use an IDE. Find the Attribute type and show all subclasses. This shows many rarely used ones and a few that are used heavily. Now, look at the source code for various TokenFilters and search for other uses of the Attributes you find. This is generally how I figured it out. Also, after the

Re: how to implement a TokenFilter?

2012-12-23 Thread Xi Shen
Thanks a lot :) On Mon, Dec 24, 2012 at 10:22 AM, feng lu wrote: > Hi Shen > > Maybe you can look at some of the source code in the org.apache.lucene.analysis package, > such as LowerCaseFilter.java, StopFilter.java, and so on. > > Some common attributes include: > > offsetAtt = addAttribute(OffsetAttribute.c

Re: how to implement a TokenFilter?

2012-12-23 Thread feng lu
Hi Shen, Maybe you can look at some of the source code in the org.apache.lucene.analysis package, such as LowerCaseFilter.java, StopFilter.java, and so on. Some common attributes include: offsetAtt = addAttribute(OffsetAttribute.class); termAtt = addAttribute(CharTermAttribute.class); typeAtt = addAttribute(Typ

Re: WordDelimiterFilter Question (lucene 4.0)

2012-12-23 Thread Jeremy Long
Have you ever wished you could retract your question to a mailing list? And for anyone that read my question - yes, I do know the difference between a bitwise "and" and a bitwise "or" and how they should be used when combining flags... Sorry for the spam. --Jeremy On Sun, Dec 23, 2012 at 11:56 AM
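For readers following the flag remark above, here is a minimal sketch in plain Java of why flags are combined with a bitwise "or" and not a bitwise "and". The constant names and values are hypothetical, defined locally for illustration; they are not taken from the thread, and the real filter's constants should be used in actual code.

```java
// Hypothetical flag constants for illustration only (assumed names and values).
final class FlagDemo {
    static final int GENERATE_WORD_PARTS = 1;      // bit 0
    static final int CATENATE_WORDS = 1 << 2;      // bit 2
    static final int PRESERVE_ORIGINAL = 1 << 5;   // bit 5

    // Bitwise OR sets every requested bit in the result.
    static int combineWithOr() {
        return GENERATE_WORD_PARTS | CATENATE_WORDS | PRESERVE_ORIGINAL;
    }

    // Bitwise AND keeps only bits shared by all operands; distinct
    // single-bit flags share none, so the result is always zero.
    static int combineWithAnd() {
        return GENERATE_WORD_PARTS & CATENATE_WORDS & PRESERVE_ORIGINAL;
    }

    // AND is the right operator for testing whether a flag is set.
    static boolean isSet(int flags, int flag) {
        return (flags & flag) != 0;
    }

    public static void main(String[] args) {
        System.out.println("OR  -> " + combineWithOr());
        System.out.println("AND -> " + combineWithAnd());
        System.out.println("CATENATE_WORDS set? " + isSet(combineWithOr(), CATENATE_WORDS));
    }
}
```

In short, "or" builds the flag set and "and" queries it; passing an "and" of distinct flags hands the consumer a value with no options enabled.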

Re: Lucene 4.0 scalability and performance.

2012-12-23 Thread Steve Rowe
Hi Vitaly, Anything by Tom Burton-West should interest you - he works on the HathiTrust digital library project, which currently indexes 7TB of full-length books, e.g.: "Practical Relevance Ranking for 10 Million Books" (paper) INEX 2012, September 2012, Rome, Italy

WordDelimiterFilter Question (lucene 4.0)

2012-12-23 Thread Jeremy Long
Hello, I'm having an issue creating a custom analyzer utilizing the WordDelimiterFilter. I'm attempting to create an index of information gleaned from JAR manifest files. So if I have "spring-framework" I need the following tokens indexed: "spring" "springframework" "framework" "spring-framework".
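To make the desired output in the question above concrete, here is a small plain-Java sketch that computes the token set described for a hyphenated term. It deliberately does not use Lucene or WordDelimiterFilter; the class and method names are hypothetical, and it assumes the hyphen is the only delimiter. It could serve as a reference when checking what a custom analyzer actually emits.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: models the expected tokens, not the filter itself.
final class ExpectedTokens {
    static List<String> expand(String term) {
        // LinkedHashSet keeps insertion order and drops duplicates.
        Set<String> tokens = new LinkedHashSet<>();
        StringBuilder joined = new StringBuilder();
        for (String part : term.split("-")) {
            if (!part.isEmpty()) {
                tokens.add(part);        // each word part
                joined.append(part);
            }
        }
        if (joined.length() > 0) {
            tokens.add(joined.toString()); // parts concatenated
        }
        tokens.add(term);                  // original term preserved
        return new ArrayList<>(tokens);
    }

    public static void main(String[] args) {
        System.out.println(expand("spring-framework"));
    }
}
```

The order of the returned tokens is an artifact of this sketch and says nothing about the positions a real token stream would assign.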

Lucene 4.0 scalability and performance.

2012-12-23 Thread Vitaly_Artemov
Hi all, We are starting to evaluate Lucene 4.0 for use in a production environment. This means that we need to index millions of documents with terabytes of content and search them. For now we want to define only one indexed field, containing the content of the documents, with the possibility to search t

Re: how to implement a TokenFilter?

2012-12-23 Thread Rafał Kuć
Hello! The simplest way is to look at the Lucene javadoc and see which implementations of the Attribute interface exist - http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/util/Attribute.html -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > thanks, i read