Re: Lucene V8 Support

2022-09-15 Thread Mike Drob
Hi Fergal, You should not expect much support on version 8 going forward. It will probably get critical security releases and not much else. Mike On Thu, Sep 15, 2022 at 8:31 AM Fergal Gavin wrote: > Hi there, > > We are a user of the Lucene core library in our product. > > W

Re: Using Lucene 8.5.1 vs 8.5.2

2022-07-26 Thread Mike Drob
those. Mike Drob On Tue, Jul 26, 2022 at 1:03 PM Baris Kazar wrote: > Dear Folks,- > May I please ask if using 8.5.1 is ok wrt 8.5.2? > The only change was the following where fuzzy query was fixed for a major > bug (?). > How much does this affect the fuzzy query performanc

Re: Fuzzy Query Similarity

2022-07-12 Thread Mike Drob
On Mon, Jul 11, 2022 at 3:36 PM Mike Drob wrote: > Hi Uwe, thanks for all the pointers! > > I tried using BooleanSimilarity and the resulting scores were even more > divergent! 1.0 for the exact match vs 1.55 (= 0.8 + 0.75) for the multiple > terms that were close. Which m

Re: Fuzzy Query Similarity

2022-07-11 Thread Mike Drob
Hi Uwe, thanks for all the pointers! I tried using BooleanSimilarity and the resulting scores were even more divergent! 1.0 for the exact match vs 1.55 (= 0.8 + 0.75) for the multiple terms that were close. Which makes sense with ignoring TF but still doesn't help me down-boost the other terms.
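
The arithmetic behind those two scores can be sketched in plain Java. This is only an illustration, assuming (as the message describes) that each matching fuzzy term contributes its own boost and that the contributions of all matching terms are summed; the class and method names are hypothetical and not part of the Lucene API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the scores quoted in the message; not Lucene code.
final class FuzzyScoreSum {

    // Sum the per-term contributions, as a disjunction over matching terms would.
    static double sum(List<Double> termBoosts) {
        double total = 0.0;
        for (double boost : termBoosts) {
            total += boost;
        }
        return total;
    }

    public static void main(String[] args) {
        double exact = sum(Arrays.asList(1.0));        // document containing the exact term
        double close = sum(Arrays.asList(0.8, 0.75));  // document containing two near matches
        // The document with two near matches outscores the exact match.
        System.out.println("exact=" + exact + ", close=" + close + ", closeWins=" + (close > exact));
    }
}
```

Two partial matches outscore one exact match simply because their boosts add up, which is why ignoring term frequency alone does not down-boost the other terms.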

Fuzzy Query Similarity

2022-07-08 Thread Mike Drob
g for me again? I don't see an option to tweak the internal boost provided by FuzzyQuery, that's one idea I had. Or is this a different change that needs to be fixed at the lucene level rather than application level? Thanks, Mike More detail: The first document with the field "

[ANNOUNCE] Apache Lucene 8.11.2 released

2022-06-21 Thread Mike Drob
The Lucene PMC is pleased to announce the release of Apache Lucene 8.11.2. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. Th

FacetsCollector ScoreMode

2022-03-21 Thread Mike Drob
without scoring. I've tested it locally and it seems to work, but I'm wondering what nuance I am missing. The default behaviour is keepScores == false, so I feel like we should be able to adjust the score mode used based on that. Thanks, Mike

[ANNOUNCE] Apache Lucene 8.8.2 released

2021-04-12 Thread Mike Drob
The Lucene PMC is pleased to announce the release of Apache Lucene 8.8.2. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. This

Re: [VOTE] Lucene logo contest

2020-06-16 Thread Mike Drob
C. The current Lucene logo Committer, not PMC On Tue, Jun 16, 2020 at 9:31 AM Gus Heck wrote: > From the comments, I sense some confusion, (or perhaps I was confused)... > at least as I read the vote mail, there are 3 options and 4 links, the > first link doesn't appear to be presented as an op

[ANNOUNCE] Apache Lucene 8.5.2 released

2020-05-26 Thread Mike Drob
26 May 2020, Apache Lucene™ 8.5.2 available The Lucene PMC is pleased to announce the release of Apache Lucene 8.5.2. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-t

Lucene API to retrieve matched words

2018-09-05 Thread Mike Grishaber
s 'skier' and 'skiis'. Is there an API call that provides this? Thanks Mike

Thanks to Vincenzo D'Amore

2018-05-12 Thread Mike Lynott
older doesn't have a message either. Thanks very much Vincenzo. Grazie! This will help a great deal. Mike

Sample code?

2018-05-02 Thread Mike Lynott
The sample code for SynonymGraphFilterFactory is written (I assume) in Solr. Could someone provide a Java translation? Thanks. This is what I see: Mike L

Re: Query in a doc context

2017-12-15 Thread Mike Dinescu (DNQ)
Got it. I misunderstood the question (actually I'm still not convinced I fully understand what you're looking for). It might be good to give an example in case others on the mailing list are confused. *Mike* On Thu, Dec 14, 2017 at 8:54 AM, Vadim Gindin wrote: > Mike, > >

Re: Query in a doc context

2017-12-14 Thread Mike Dinescu (DNQ)
fields matching of one document > and > > > analyze it. Particularly - to count whether all query terms are matched > > (to > > > one field or to different fields). I need to be able to fetch > > corresponding > > > information: what terms are matched to what fields and

Re: Replacement for RangeAccumulator

2017-12-06 Thread Mike Dinescu (DNQ)
Ping - Still trying to understand the migration path for using the RangeAccumulator. I would really appreciate it if anybody with insights could comment on what the replacement for RangeAccumulator is in Lucene after 4.7.0. *Thanks, Mike* On Sat, Dec 2, 2017 at 12:25 AM, Mike Dinescu (DNQ

Replacement for RangeAccumulator

2017-12-02 Thread Mike Dinescu (DNQ)
e that used RangeAccumulator for date range faceted searches.* Please feel free to respond here, or on the S/O question. *Thanks, Mike*

Lucene config issue cannot run demo

2017-11-10 Thread Mike Lynott
or load main class org.apache.lucene.demo.IndexFiles Suggestions? Mike Lynott

RE: run in eclipse error

2017-10-17 Thread Mike Sokolov
Checkstyle has a onetoplevelclass rule that would enforce this On October 17, 2017 3:45:01 AM EDT, Uwe Schindler wrote: >Hi, > >this has nothing to do with the Java version. I generally ignore this >Eclipse-failure as I only develop in Eclipse, but run from command >line. The reason for this beha

Re: FunctionValues vs DoubleValuesSource

2017-10-13 Thread Mike Sokolov
ons. > >Alan Woodward >www.flax.co.uk > > >> On 12 Oct 2017, at 23:25, Michael McCandless > wrote: >> >> Hi Mike, >> >> It looks like FunctionValues is a very old API used by many function >> queries, while DoubleValuesSource is relatively ne

Re: Accent insensitive search for greek characters

2017-09-27 Thread Mike Sokolov
These are only used in classical Greek I think, explaining probably why they are not covered by the simpler filter. On September 27, 2017 9:48:37 AM EDT, Ahmet Arslan wrote: >I may be wrong about ASCIIFoldingFilter. Please go with the >ICUFoldingFilter. >Ahmet >On Wednesday, September 27, 2017,

Re: Spatial Search with Nested Polygons

2015-03-26 Thread Mike Hansen
ction > with the order from biggest to smallest, you can know the index of which > shape matches, and then pull the i-th numeric value you need from a list of > numbers in BinaryDocValues. The largest shape could be kept out of here > since you don’t need it.

Spatial Search with Nested Polygons

2015-03-26 Thread Mike Hansen
largest polygon contains the point (that part is easy). Additionally, I'd like to have access to the numerical value of the smallest polygon which contains the point (something like makeDistanceValueSource). Thanks, --Mike ---

Re: An incorrect sentence in Javadoc at o.a.l.queryparser.surround.parser?

2014-12-04 Thread Mike Drob
I believe this is already filed as https://issues.apache.org/jira/browse/SOLR-4572 Getting the wiki page fixed would be great as well, though! On Wed, Dec 3, 2014 at 7:44 PM, Shinichiro Abe wrote: > Hi, > > That Javadoc says "N is ordered, and W is unordered." > > https://github.com/apache/luce

RE: Lucene 4.0 PerFieldAnalyzerWrapper question

2012-09-26 Thread Mike O'Leary
", new KeywordAnalyzer()); ... ... return new PerFieldAnalyzerWrapper(new CustomAnalyzer(), analyzerMap); } } Which is much simpler than all of the things I was thinking I would need to do. Thanks very much, Mike -Original Message- From: Chris Male [mailto:gento...@gmail.

RE: Lucene 4.0 PerFieldAnalyzerWrapper question

2012-09-25 Thread Mike O'Leary
now how to define it. This is my modified analyzer code with ??? in the places I don't know how to define. Thanks, Mike public class MyPerFieldAnalyzer extends AnalyzerWrapper { Map _analyzerMap = new HashMap(); Analyzer _defaultAnalyzer; public MyPerFieldAnalyzer() { _analyzerMa

RE: Lucene 4.0 PerFieldAnalyzerWrapper question

2012-09-25 Thread Mike O'Leary
eam stream = _analyzer.tokenStream(fieldname, reader); return new TokenStreamComponents(source, stream); } } I must be missing something obvious. Can you tell me what it is? Thanks, Mike -Original Message- From: Chris Male [mailto:gento...@gmail.com] Sent: Tuesday, September 25, 2012 5:18 PM To:

Lucene 4.0 PerFieldAnalyzerWrapper question

2012-09-25 Thread Mike O'Leary
the fieldname parameter. I would appreciate any help in finding out the best way to update this analyzer and to write the required function(s). Thanks, Mike

Uses for IndexWriter.commit(commitUserData)/IndexCommit.getUserData()

2012-09-21 Thread Mike O'Leary
useful though, and I would like to ask if anyone could describe use cases where it works well to save data in a commitUserData map while indexing for later use in the application. Thanks, Mike

RE: Problem with TermVector offsets and positions not being preserved

2012-08-24 Thread Mike O'Leary
So for Lucene 3.6, is the right way to do this to create a new Document and add new Fields based on the old Fields (with the settings you want them to have for term vector offsets and positions, etc.) and then call updateDocument on that new Document? Thanks, Mike -Original Message

RE: Problem with TermVector offsets and positions not being preserved

2012-08-22 Thread Mike O'Leary
term vectors in the affected fields? Is there a way to add a field to the documents in an index in which this doesn't occur? Thanks, Mike -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Friday, July 20, 2012 5:59 PM To: java-user@lucene.apache.org Subje

Supporting advanced search methods in a user interface

2012-08-16 Thread Mike O'Leary
I would like to know if anyone has ideas (or pointers to discussions) about good ways to support advanced search options, such as the various kinds of SpanQuery, in a search application user interface that is understandable to non-expert users. Thanks, Mike

Re: Small Vocabulary

2012-08-06 Thread Mike Sokolov
clude only a single token. -Mike On 07/30/2012 09:07 AM, Carsten Schnober wrote: Dear list, I'm considering to use Lucene for indexing sequences of part-of-speech (POS) tags instead of words; for those who don't know, POS tags are linguistically motivated labels that are assigned to to

RE: Problem with TermVector offsets and positions not being preserved

2012-07-26 Thread Mike O'Leary
Hi Robert, Thanks for your help. This cleared up all of the things I was having trouble understanding about offsets and positions in term vectors. Mike -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Friday, July 20, 2012 5:59 PM To: java-user@lucene.apache.org

RE: Problem with TermVector offsets and positions not being preserved

2012-07-20 Thread Mike O'Leary
I need to produce indexes that a co-worker can use with a UI that uses fast vector term highlighting, and I'd like to be sure I have created indexes that work for him. Thanks, Mike -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Friday, July 20, 2012 4:05 PM T

RE: Problem with TermVector offsets and positions not being preserved

2012-07-20 Thread Mike O'Leary
I neglected to mention that CreateTestIndex uses a collection of data files with .properties extensions that are included in the Lucene In Action source code download. Mike -Original Message- From: Mike O'Leary [mailto:tmole...@uw.edu] Sent: Friday, July 20, 2012 2:10 PM To: java

RE: Problem with TermVector offsets and positions not being preserved

2012-07-20 Thread Mike O'Leary
) and field.isStorePositionWithTermVector() are true. When I run DumpIndex on the index that was created, those fields print out true for field.isTermVectorStored() and false for the other two functions. Thanks, Mike This is the source code for CreateText

Problem with TermVector offsets and positions not being preserved

2012-07-19 Thread Mike O'Leary
being saved in some older indexes I created that I unfortunately no longer have around for comparison. I'm sure that I am just overlooking something or have made some kind of mistake, but I can't see what it is at the moment. Thanks for any help or advice you can give me. Mike

Re: find meaningful words through Lucene

2012-06-27 Thread Mike Sokolov
Maybe high frequency terms that are not evenly distributed throughout the corpus would be a better definition. Discriminative terms. I'm sure there is something in the machine learning literature about unsupervised clustering that would help here. But I don't know what it is :)

Re: Fast way to get the start of document

2012-06-25 Thread Mike Sokolov
event blowups for very large docs (hl.phraseLimit; see LUCENE-3234) -Mike On 06/25/2012 01:03 PM, Paul Hill wrote: Mike and Jack, Thanks for the suggestions. As Mike suggested, I already have the pre-stored length field. I DO NOT read in the whole doc just to make the decision on "too huge"

Re: Fast way to get the start of document

2012-06-23 Thread Mike Sokolov
e the decision about whether to highlight. -Mike Sokolov On 6/23/2012 6:17 PM, Jack Krupansky wrote: Simply have two fields, "full_body" and "limited_body". The former would index but not store the full document text from Tika (the "content" metadata.) The latter would

Re: filter by term frequency

2012-06-17 Thread Mike Sokolov
'memory') See: http://wiki.apache.org/solr/FunctionQuery#tf Lucene does have "FunctionQuery", "ValueSource", and "TermFreqValueSource". See: http://lucene.apache.org/solr/api/org/apache/solr/search/function/FunctionQuery.html -- Jack Krupansky -Orig

filter by term frequency

2012-06-16 Thread Mike Sokolov
for filtering, just sorting. Perhaps a Collector could then impose a score threshold later? Any suggestions here? -Mike - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Approaches/semantics for arbitrarily combining boolean and proximity search operators?

2012-05-16 Thread Mike Sokolov
I recall, there was a way to implement extensions to it that were fairly natural. -Mike On 5/16/2012 7:15 PM, Trejkaz wrote: On Thu, May 17, 2012 at 7:11 AM, Chris Harris wrote: but also crazier ones, perhaps like agreement w/5 (medical and companion) (dog or dragon) w/5 (cat and cow) (daisy and

Re: How to extract highest TF-IDF terms from Lucene index?

2012-05-09 Thread Mike McCandless
There is a tool named HighFreqTerms in contrib/misc that does this... Mike On May 9, 2012, at 4:18 PM, Michael Berkovsky wrote: > Hi, > > Assuming that there is a large lucene collection, and I want to extract top > N terms with highest TF/IDF scores fro

Re: surround parser match-all query

2012-05-06 Thread Mike Sokolov
r, but I don't know if it would be worth the trouble. It turns out in my very specific case I have a term that appears in every document in a particular field, so I am just using a search for that at the moment. -Mike On 5/6/2012 8:04 PM, Mike Sokolov wrote: I think what I have in min

Re: surround parser match-all query

2012-05-06 Thread Mike Sokolov
venient to have this as an option. -Mike On 5/6/2012 7:28 PM, Robert Muir wrote: Hi Mike: whereas for the normal queryparser this Query doesn't consult the positions file and is trivial, how would such a query be implemented for the surround parser? As a single span that matches all pos

Re: surround parser match-all query

2012-05-06 Thread Mike Sokolov
No, that doesn't work either - it works for the lucene query parser, but not for the *surround* query parser, which I'm using because it has a syntax for span queries. On 5/6/2012 6:10 PM, Vladimir Gubarkov wrote: Do you mean *:* ? On Mon, May 7, 2012 at 1:26 AM, Mike Sokolov wr

surround parser match-all query

2012-05-06 Thread Mike Sokolov
does anybody know how to express a MatchAllDocsQuery in surround query parser language? I've tried * and() but those don't parse. I looked at the grammar and I don't think there is a way. Please let us all know if you know otherwise! Thanks

Highlighting in Luke?

2012-03-13 Thread Mike O'Leary
look to me like there is a way to change strings that are displayed as search results so that some words are displayed with bold, italic or some other highlighting feature and others are not. Is this true, or did I overlook something? Thanks, Mike

Lucene's use of vectors

2012-03-01 Thread Mike O'Leary
ent way of searching using a term vector of search terms - other than using its terms in a Boolean search that is? I am asking because my boss asked me what all of the ways that Lucene uses vectors in indexing and search were, and my answer revealed a lot of gaps in my understanding of it. Thanks, Mike

Re: Concurrency and multiple merge threads

2012-02-19 Thread Mike McCandless
out sync. That said the ops inside the sync are tiny so it's strange if this really is the cause of the contention... It could just be a profiling ghost and something else is the real bottleneck... Mike On Feb 18, 2012, at 9:21 PM, Benson Margulies wrote: > Using Lucene 3.5.0, on a

Searching by similarity using term vectors

2012-02-14 Thread Mike O'Leary
t API functions are provided to do this kind of search? It looks like the standard method of search treats a list of query terms as a Boolean query. Is there an alternative search function that doesn't do this? Thanks, Mike

Re: Retrieving offsets

2012-01-19 Thread Mike Sokolov
rt of thing, and not worry about spans? -Mike On 1/19/2012 9:46 PM, Nishad Prakash wrote: I'm going to cry. There is no way to retrieve offsets for position, rather than for term? On 1/13/2012 6:33 PM, Nishad Prakash wrote: I'm having a set of issues in trying to use Luce

Tamper resistant index

2012-01-09 Thread Mike C
ersion of Directory (and IndexInput/IndexOutput); however, before I go down that rabbit hole, I decided to check in here. Any advice or suggestions appreciated. Kind Regards, Mike C.

RE: Obtaining IDF values for the terms in a document set

2011-12-15 Thread Mike O'Leary
all of the terms that occur in the document set and obtain their IDF values. Thanks, Mike -Original Message- From: Simon Willnauer [mailto:simon.willna...@googlemail.com] Sent: Thursday, December 15, 2011 11:44 AM To: java-user@lucene.apache.org Subject: Re: Obtaining IDF values for the

Obtaining IDF values for the terms in a document set

2011-12-15 Thread Mike O'Leary
et of stopwords for the larger document set by selecting the terms that have the lowest IDF values. First of all, is this the best way to create a stopword list? Second, is there a straightforward way to generate a list of terms and their IDF values from a Lucene index? Thanks, Mike
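
The idea in this question can be sketched without touching a Lucene index. The following is a minimal sketch, assuming the textbook definition idf = log(N / df) (Lucene's similarity classes use smoothed variants of this) and assuming the documents are already available as lists of terms; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: pick stopword candidates as the terms with the lowest IDF.
final class IdfStopwords {

    // Document frequency: the number of documents each term occurs in.
    static Map<String, Integer> docFreqs(List<List<String>> docs) {
        Map<String, Integer> df = new TreeMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) { // count each term once per document
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Textbook IDF, log(N / df); an assumption, not Lucene's exact formula.
    static double idf(int numDocs, int docFreq) {
        return Math.log((double) numDocs / docFreq);
    }

    // The k terms with the lowest IDF, ties broken alphabetically.
    static List<String> lowestIdf(List<List<String>> docs, int k) {
        Map<String, Integer> df = docFreqs(docs);
        List<String> terms = new ArrayList<>(df.keySet());
        terms.sort((a, b) -> {
            int byIdf = Double.compare(idf(docs.size(), df.get(a)), idf(docs.size(), df.get(b)));
            return byIdf != 0 ? byIdf : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(k, terms.size())));
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("the", "cat"),
                Arrays.asList("the", "dog"),
                Arrays.asList("the", "bird", "cat"));
        System.out.println(lowestIdf(docs, 2));
    }
}
```

A term that occurs in every document gets an IDF of zero under this definition, so it sorts first among the candidates.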

Re: Lucene 4.0 Index Format Finalization Timetable

2011-12-07 Thread Mike Sokolov
need to stick w/3.x for now. You might be in a different situation if you really need the 4.0 changes. Maybe you can just stick w/the current trunk and take responsibility for patching critical bugfixes, hoping you won't have to recreate your index too many times... -Mike On 12/06/20

Re: Advanced NearSpanQuery

2011-07-13 Thread Mike Sokolov
panOR query with a minShouldMatch functionality though. simon On Wed, Jul 13, 2011 at 5:09 PM, Jeroen Lauwers wrote: Hi Mike, Thanks for your quick reply, but do not seem to find any documentation on "DisjunctionSumQuery" and I'm not familiar with that concept. Could you p

Re: Advanced NearSpanQuery

2011-07-13 Thread Mike Sokolov
Can you wrap a SpanNearQuery around a DisjunctionSumQuery with minNrShouldMatch=8? -Mike On 07/13/2011 08:53 AM, Jeroen Lauwers wrote: Hi, I was wondering if anyone could help me on this: I want to search for: 1. a set of words (eg. 10) 2. only a couple of words may come in

highlighting performance

2011-06-20 Thread Mike Sokolov
Our apps use highlighting, and I expect that highlighting is an expensive operation since it requires processing the text of the documents, but I ran a test and was surprised just how expensive it is. I made a test index with three fields: path, modified, and contents. I made the index using

Re: Sharding Techniques

2011-05-10 Thread Mike Sokolov
for very large indexes: indexes too big to store on disk and cache in memory on one commodity box. -Mike

Re: new to lucene, non standard index

2011-05-05 Thread Mike Sokolov
I think the solution I gave you will work. The only problem is if a token appears twice in the same doc: doc1 has foo with two different sets of weights and frequencies... but I think you're saying that doesn't happen On 05/05/2011 06:09 PM, Chris Schilling wrote: Hey Mike, Let

Re: new to lucene, non standard index

2011-05-05 Thread Mike Sokolov
will only get each document listed once. If they aren't unique, it's not clear what you want to sort by anyway.... -Mike On 05/05/2011 04:12 PM, Chris Schilling wrote: Hi, I am trying to figure out how to solve this problem: I have about 500,000 files that I would like to index, but

Re: QueryValidator

2011-05-05 Thread Mike Sokolov
o be aware of which parser it's trying to avoid errors in. In our case, we have a limited case where we always use a single parser, but I think solr exposes a pluggable extensible architecture with a lot of different parsers, so a more general solution will be more complex, and I don'

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-21 Thread mike anderson
[x] ASF Mirrors (linked in our release announcements or via the Lucene website) [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [x] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors them internally or via a downstream project) On

Scoring problem with MultiPhraseQuery?

2010-12-15 Thread Mike Cawson
nk of a reason why this should be intentional behaviour so I assume there's a bug. I'm using Lucene 3.0. Thanks, Mike Cawson

Where does Lucene recognise it has encountered a new term for the first time?

2010-12-15 Thread Mike Cawson
optimised. Is this correct? Can anyone suggest how to maintain a secondary index of terms? Perhaps only when the main index is optimised? Thanks, Mike

proposed change to CharTokenizer

2010-10-14 Thread Mike Sokolov
rform character deletion, and AFAICT the only existing CharFilter performs replacements and expansions (of ligatures and the like). But my knowledge of Lucene is far from comprehensive. Does this seem like a reasonable patch? -Mike Michael Sokolov Engineering Director www.ifactory.com @iFactoryBos

Re: How to calculate payloads in queries too

2010-04-12 Thread Mike Schultz
I see the payload in the token now.

How to calculate payloads in queries too

2010-04-11 Thread Mike Schultz
I am interested in using payloads in the following way. I store Func(index-term) as a payload at index-term when indexing. When querying I want to compute Func(query-term) as well. Then my similarity returns some other function, Gunc(Func(index-term1),Func(query-term)). As an example, maybe I'

Limiting search result for web search engine

2010-02-02 Thread Mike Polzin
amatically remove the results which, although they meet the search criteria, are not as relevant? Is there a way to do this through queries? Thanks in advance! Mike

RE: Read past EOF

2009-04-28 Thread Mike Streeton
An update: I have managed to get it to not fail by using the debugger to set org.apache.lucene.store.IndexInput.preUTF8Strings to true. The value is always false when it fails. Mike -Original Message- From: Mike Streeton [mailto:mike.stree...@connexica.com] Sent: 28 April

Read past EOF

2009-04-28 Thread Mike Streeton
I have an index that works fine on Lucene 2.3.2 but fails to open in 2.4.1; it always fails with a "Read past EOF" error. The index does contain some field names with German umlaut characters in them. Any ideas? Many Thanks Mike CheckIndex v2.3.2 NOTE: testing will be more thorough if you run java with

Re: document boost

2008-01-31 Thread Mike Grafton
So we upgraded to SOLR 1.2, which uses Lucene 2.1 or so, and the problem went away. Thanks for all the help, folks! Mike On 1/30/08, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > Hi Mike, I think this issue probably belongs in the Solr lists since > it looks like you're indexing t

Re: document boost

2008-01-30 Thread Mike Grafton
to? We looked through the Lucene source for a while, but it was kind of hard to track this down. One note: we're on an old version of Lucene - a nightly build between 2.0.0 and 2.1.0. Mike On 1/30/08, Mark Miller <[EMAIL PROTECTED]> wrote: > > I would say you def misconfigured so

document boost

2008-01-30 Thread Mike Grafton
y, we're using SOLR to access Lucene. We can give more information if necessary, such as our SOLR schema.xml, if folks think that would help explain things. Let us know what other information we can provide. Thanks, Mike

Re: SimpleFragmenter docs

2008-01-23 Thread Mike Klaas
Indeed--this is why the associated parameter is called maxAnalyzedChars in Solr. -Mike On 14-Jan-08, at 2:33 PM, Mark Miller wrote: I think you're right, and that's not the only place...the whole handling of maxDocBytesToAnalyze in the main Highlighter class shares this issue. I guess the

Re: Wikia search goes live today

2008-01-08 Thread Mike Klaas
links). If they are incorporating the "star" ratings yet, it is probably folded in to the global doc boost. -Mike Regards, Lukas On Jan 7, 2008 11:14 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: See my comment (around #45-50) on Techcrunch about that from late last nigh

Re: Pagination ...

2007-12-26 Thread Mike Richmond
You might want to take a look at Solr (http://lucene.apache.org/solr/). You could either use Solr directly, or see how they implement paging. --Mike On Dec 26, 2007 12:12 PM, Zhou Qi <[EMAIL PROTECTED]> wrote: > Using the search function for pagination will carry out unnecess

Re: thoughts/suggestions for analyzing/tokenizing class names

2007-12-17 Thread Mike Klaas
On 17-Dec-07, at 11:39 AM, Beyer,Nathan wrote: Would using Field.Index.UN_TOKENIZED be the same as tokenizing a field into one token? Indeed. -Mike -Original Message- From: Mike Klaas [mailto:[EMAIL PROTECTED] Sent: Monday, December 17, 2007 12:53 PM To: java-user

Re: thoughts/suggestions for analyzing/tokenizing class names

2007-12-17 Thread Mike Klaas
"classname:org.apache*" would probably be wrong--you might not want to match org.apache-fake.lucene.document regards, -Mike On 17-Dec-07, at 9:39 AM, Beyer,Nathan wrote: Good point. I don't want the sub-package names on their own to match. T

Re: thoughts/suggestions for analyzing/tokenizing class names

2007-12-17 Thread Mike Klaas
w to approach tokenizing these types of texts? Perhaps it would help to include some examples of queries you _don't_ want to match. For all the examples above, simply tokenizing alphanumeric components would suffice. -Mike
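
As a rough illustration of "tokenizing alphanumeric components", here is a plain Java sketch that splits a class name into its runs of letters and digits. It stands in for an analyzer and does not use any Lucene API; the class name `AlnumTokens` and the sample inputs are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for an analyzer: emits each run of ASCII letters/digits as a token.
final class AlnumTokens {

    private static final Pattern ALNUM = Pattern.compile("[A-Za-z0-9]+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = ALNUM.matcher(text);
        while (m.find()) {
            tokens.add(m.group()); // separators such as '.', '-', and '$' are dropped
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("org.apache.lucene.document.Document"));
        System.out.println(tokenize("org.apache-fake.lucene.document"));
    }
}
```

Whether the tokens should also be lowercased, or whether sub-package names should be prevented from matching on their own, depends on the queries that must not match, which is the question the reply raises.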

Re: index and access to lines of a CSV file

2007-12-13 Thread Mike Klaas
isn't really needed for that. You need some kind of on-disk key->value mapper. Something like a Berkeley DB hashtable or B-tree should work (store each line as a key/value pair). -Mike

Re: Custom query parser

2007-11-22 Thread Mike Klaas
hen if the resulting query is a PhraseQuery, extract the components and replace the query with a span query of prefix queries, or somesuch. -Mike

RE: TermDocs.skipTo error

2007-11-14 Thread Mike Streeton
have run the test again after it fails, changing the loop iterators so that it repeats the failing iteration first, and it works okay. Many Thanks Mike import java.io.File; import java.io.IOException; import java.util.Random; import org.apache.lucene.analysis.standard.StandardAnalyzer; import

RE: TermDocs.skipTo error

2007-11-12 Thread Mike Streeton
Thanks Mike -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: 10 November 2007 22:49 To: java-user@lucene.apache.org Subject: Re: TermDocs.skipTo error On Nov 9, 2007 11:40 AM, Mike Streeton <[EMAIL PROTECTED]> wrote: > I have just t

RE: TermDocs.skipTo error

2007-11-09 Thread Mike Streeton
I have just tried this again using the index I built with lucene 2.1 but running the test using lucene 2.2 and it works okay, so it seems to be something related to an index built using lucene 2.2. Mike -Original Message- From: Mike Streeton [mailto:[EMAIL PROTECTED] Sent: 09 November

RE: TermDocs.skipTo error

2007-11-09 Thread Mike Streeton
I have tried this again using Lucene 2.1 and, as Erick found, it works okay. I have tried it on JDK 1.6 u1 and u3: both work with Lucene 2.1, but both fail when using Lucene 2.2. Mike -Original Message- From: Mike Streeton [mailto:[EMAIL PROTECTED] Sent: 09 November 2007 16:05 To: java-user

RE: TermDocs.skipTo error

2007-11-09 Thread Mike Streeton
Erick, Sorry the numbers are just printed out for debugging when it is building the index. I will try it with lucene 2.1 and see what happens Thanks Mike -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: 09 November 2007 15:59 To: java-user@lucene.apache.org

TermDocs.skipTo error

2007-11-09 Thread Mike Streeton
) at org.apache.lucene.index.MultiTermDocs.skipTo(MultiReader.java:413) at Test4.test(Test4.java:88) at main(Test4.java:69) The program creates a test index, if you run it a second time it will not create the index. Change the directory name on line 33. Many Thanks Mike Ps I am using Lucene 2.2 and java 1.6 u1

Re: Search performance using BooleanQueries in BooleanQueries

2007-11-06 Thread Mike Klaas
On 6-Nov-07, at 3:02 PM, Paul Elschot wrote: On Tuesday 06 November 2007 23:14:01 Mike Klaas wrote: Wait--shouldn't the outer-most BooleanQuery provide most of this speedup already (since it should be skipTo'ing between the nested BooleanQueries and the outermost). Is it the indir

Re: Search performance using BooleanQueries in BooleanQueries

2007-11-06 Thread Mike Klaas
this speedup already (since it should be skipTo'ing between the nested BooleanQueries and the outermost). Is it the indirection and sub-query management that is causing the performance difference, or differences in skipTo behaviou

Reuse TermDocs

2007-11-05 Thread Mike Streeton
reuse the same one over a period of time. Many Thanks Mike

TermDocs.skipTo

2007-10-29 Thread Mike Streeton
TermDocs.skipTo(51) returns false indicating that no doc id > 50 exists. I will try and create a sample index to show this. Many Thanks Mike

Re: Generalized proximity query performance

2007-10-05 Thread Mike Klaas
what that exactly is being suggested here. I'm thinking of the dismax model: you still want each keyword to match (though possibly in different fields). I don't really think that is appropriate to throw into a single query class.

Re: Generalized proximity query performance

2007-10-05 Thread Mike Klaas
writing a variant of PhraseQuery that has the desired functionality would be _too_ hard, but I haven't looked into it in depth. -Mike

Re: BoostingTermQuery performance

2007-10-02 Thread Mike Klaas
ure in Lucene, and performance will be key for that. -Mike

Re: Tokenization question

2007-09-13 Thread Mike Klaas
the nature of an inverted index). You can (quite time-consumingly) reconstruct by iterating over the whole index. I think Luke can do this. -Mike

Re: Storing Host and IP Information in Lucene

2007-09-11 Thread Mike Klaas
hen you can query docs from example.com via: (com.example com.example.*) If you want 'example' to be searchable as a term, then additionally store the host in a different, tokenized field. -Mike Ankit Daniel Noll-3 wrote: On Monday 10 September 2007 23:53:06 AnkitSinghal wrote:
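
The reversed-host idea in this reply can be sketched in plain Java. This is a minimal sketch under the assumption that the host is stored with its labels reversed (`www.example.com` becomes `com.example.www`), so that the query quoted above, an exact match on `com.example` or a prefix match on `com.example.`, selects everything under the domain. The class and method names are hypothetical and no Lucene API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for building and checking reversed-host keys.
final class ReversedHost {

    // "www.example.com" -> "com.example.www"
    static String reverse(String host) {
        List<String> labels =
                new ArrayList<>(Arrays.asList(host.toLowerCase(Locale.ROOT).split("\\.")));
        Collections.reverse(labels);
        return String.join(".", labels);
    }

    // Mirrors the query (com.example com.example.*): exact match, or prefix followed by a dot.
    static boolean inDomain(String reversedHost, String reversedDomain) {
        return reversedHost.equals(reversedDomain)
                || reversedHost.startsWith(reversedDomain + ".");
    }

    public static void main(String[] args) {
        String domain = reverse("example.com");
        for (String host : Arrays.asList("example.com", "www.example.com", "www.example-fake.com")) {
            System.out.println(host + " -> " + reverse(host) + " inDomain=" + inDomain(reverse(host), domain));
        }
    }
}
```

Requiring the dot after the prefix is what keeps a host such as `example-fake.com` out of the results for `example.com`.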
