RE: Can I omit ShingleFilter's filler tokens

2011-05-12 Thread Uwe Schindler
> we already did this in 3.1 by making a base FilteringTokenFilter class? > a regex filter is trivial if you subclass this (we could add something like > this > untested code to the .pattern package or whatever) > > public class PatternRemoveFilter extends FilteringTokenFilter { > private final

RE: Can I omit ShingleFilter's filler tokens

2011-05-12 Thread Elmo Bleek
Sure, I'd be willing to do that. I'll create an issue and then get working on code and tests.

RE: Sharding Techniques

2011-05-12 Thread Burton-West, Tom
Hi Samar, Have you looked at top or iostat or other monitoring utilities to see if you are cpu bound vs I/O bound? With 225 term queries, it's possible that you are I/O bound. I suspect you need to think about seek time and caching. For each unique field:term combination lucene has to look up

RE: Can I omit ShingleFilter's filler tokens

2011-05-12 Thread Steven A Rowe
Cool! I had forgotten about FilteringTokenFilter. Elmo, would you care to make a JIRA issue and a patch (based on Robert's code, and adding some tests) to create this? If so, this may be useful: http://wiki.apache.org/lucene-java/HowToContribute Steve > -Original Message- > F

Re: Can I omit ShingleFilter's filler tokens

2011-05-12 Thread Robert Muir
On Thu, May 12, 2011 at 1:03 PM, Steven A Rowe wrote: > A thought: one way to do #1 without modifying ShingleFilter: if there were a > StopFilter variant that accepted regular expressions instead of a stopword > list, you could configure it with a regex like /_ .*|.* _| _ / (assuming a > full m

RE: Can I omit ShingleFilter's filler tokens

2011-05-12 Thread Steven A Rowe
A thought: one way to do #1 without modifying ShingleFilter: if there were a StopFilter variant that accepted regular expressions instead of a stopword list, you could configure it with a regex like /_ .*|.* _| _ / (assuming a full match is required, i.e. implicit beginning and end anchors), and
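The regex idea in the message above can be tried outside Lucene. The following is a minimal plain-Java sketch, not the StopFilter variant being proposed: it assumes the filler token is "_" and that shingle tokens are joined by single spaces. The class name, the method name, and the third alternative of the pattern (a filler in the middle of a shingle) are assumptions made for illustration; only the first two alternatives are taken directly from the message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java sketch; no Lucene classes are used here.
final class FillerShingleRegexSketch {

    // Hypothetical pattern: the first two alternatives follow the message above;
    // the third one (interior filler) is an assumption of this sketch.
    static final Pattern FILLER_SHINGLE = Pattern.compile("_ .*|.* _|.* _ .*");

    // Keeps only the shingles that the pattern does not fully match.
    static List<String> dropFillerShingles(List<String> shingles) {
        List<String> kept = new ArrayList<>();
        for (String shingle : shingles) {
            // matches() requires the whole string to match (implicit anchors).
            if (!FILLER_SHINGLE.matcher(shingle).matches()) {
                kept.add(shingle);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> shingles =
                List.of("quick brown", "_ fox", "jumps _", "over _ lazy", "lazy dog");
        System.out.println(dropFillerShingles(shingles));
    }
}
```

Using matches() rather than find() mirrors the "full match is required" assumption in the message, so a token that merely contains an underscore inside a word is not dropped.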

Re: Can I omit ShingleFilter's filler tokens

2011-05-12 Thread Elmo Bleek
I have found that simply having StopFilter before ShingleFilter does the trick for #2. However, I have also been working on trying to accomplish #1, don't create shingles across stop words. I am currently under the impression that this will take modifying ShingleFilter. Does anyone have any suggest

Re: PDF Highlighting using PDF Highlight File

2011-05-12 Thread Dawn Zoë Raison
On 12/05/2011 15:47, Wulf Berschin wrote: I think support for highlighting documents would be a very welcome feature. Highlighting HTML documents is already possible with the org.apache.solr.analysis.HTMLStripCharFilter and a NullFragmenter, but there seems to be nothing for highlighting PDF fi

Re: PDF Highlighting using PDF Highlight File

2011-05-12 Thread Wulf Berschin
Well, AFAIS the Lucene Highlighters do not offer this functionality via their API, but could easily do so. I think support for highlighting documents would be a very welcome feature. Highlighting HTML documents is already possible with the org.apache.solr.analysis.HTMLStripCharFilter and a NullFr

Re: Help needed on Ant build script for creating Lucene index

2011-05-12 Thread Erik Hatcher
There's an example build file, see It's pretty outdated stuff there though. It has some flexibility for a custom document handler in order to allow full control over how a File gets turned into a Lucene Document

Re: Bug in BrazilianAnalyzer?

2011-05-12 Thread paulocsc
Thanks. Paulo On Wed 11/05/11 11:18 , Adriano Crestani adrianocrest...@gmail.com sent: Hi, I think you forgot to attach the JUnit. On Wed, May 11, 2011 at 10:04 AM, wrote: > Hi, > I did a test to understand the use of '*' and '?'. > If I use StandardAnalyzer I have expected results by i