Re: Max Field Length

2022-09-23 Thread Scott Guthery
g maximums. Cheers, Scott > >

Max Field Length

2022-09-22 Thread Scott Guthery
Lucene 9.3 seems to have a (post-Analyzer) maximum field length of 32767. Is there a way of increasing this without resorting to the source code? Thanks for any guidance. Cheers, Scott
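A quick way to see whether a value would run into a limit like this before it reaches the IndexWriter is to measure its encoded size. The sketch below is plain Java with no Lucene dependency. It assumes the limit is counted in UTF-8 bytes of a single post-analysis term, and it takes the limit as a parameter. The class and method names are hypothetical, and 32767 is simply the figure quoted in the post, not a verified constant.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TermLengthCheck {
    /** Returns the number of bytes the term occupies when encoded as UTF-8. */
    public static int utf8Length(String term) {
        return term.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if the term's UTF-8 size is within maxBytes (limit supplied by the caller). */
    public static boolean fits(String term, int maxBytes) {
        return utf8Length(term) <= maxBytes;
    }

    public static void main(String[] args) {
        int limit = 32767; // figure quoted in the post; treated as an assumption here

        // Build a 40,000-character ASCII string to stand in for an oversized term.
        char[] big = new char[40000];
        Arrays.fill(big, 'a');

        System.out.println(fits("short", limit));
        System.out.println(fits(new String(big), limit));
    }
}
```

Measuring bytes rather than characters matters for non-ASCII text, where one character can take two or more bytes.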

Re: [EXTERNAL] XML Enternal Entity vulnerability

2018-10-22 Thread Bauer, Herbert S. (Scott)
Nevermind I see my question is answered here: https://issues.apache.org/jira/browse/SOLR-11477 On 10/22/18, 11:21 AM, "Bauer, Herbert S. (Scott)" wrote: Can someone verify that this is a vulnerability in regular lucene implementations outside of solr for versions 5.

XML Enternal Entity vulnerability

2018-10-22 Thread Bauer, Herbert S. (Scott)
Can someone verify that this is a vulnerability in regular lucene implementations outside of solr for versions 5.1 to 7.0.1?

Re: Call for MODERATORs on the dev and java-user mailing lists

2017-02-03 Thread scott cote
Thanks Steve. SCott > On Feb 3, 2017, at 3:57 PM, Steve Rowe wrote: > > Hmm, can’t do math today: the average per list is more like 1 message every 3 > days on a per list basis, assuming matches for subject:MODERATE on the > gmail.com web UI is accurate. It's burs

Re: Call for MODERATORs on the dev and java-user mailing lists

2017-02-03 Thread scott cote
Let me ask if I can get some cycles to do this. I’m interested but I have to check first. SCott scott.c...@lucidworks.com > On Feb 3, 2017, at 3:14 PM, Steve Rowe wrote: > > FYI I’m holding off on creating the INFRA JIRA until Aurelian has > acknowledged subscribing to dev@luc

Re: Newbie Questions

2016-08-09 Thread Bauer, Herbert S. (Scott)
You might start here: http://lucene.apache.org/core/6_1_0/core/org/apache/lucene/index/IndexWriter.html On 8/8/16, 8:17 PM, "lukes" wrote: >Can anyone please reply ? . > >Regards. > > > >-- >View this message in context: >http://lucene.472066.n3.nabble.com/Newbie-Questions-tp4290817p4290854.

Re: upgrading lucene 4 to 6

2016-04-26 Thread scott cote
Jamie, I just went through an upgrade from 3 to 5. We used faceting, highlighting, search, explanation, etc ….It took us 3 months and that was a hard push (2 to 3 people dedicated to the effort). Don’t put off the upgrade. The performance is worth the pain. SCott > On Apr 26, 2016,

update of a numerical field in a lucene document

2016-04-08 Thread scott cote
he if statement that ensures that the document has the field), but then is not gated into “processEvents(true,false);” step as the line 1515 docWriter.updateDocValues(….) returns false. can’t seem to pin down why that is happening. What am I missing? SCott

Re: Serializing Queries

2016-03-22 Thread Bauer, Herbert S. (Scott)
ut = new Input(bais); >kryo = pool.borrowObject(); >deserializedQueryObject = (Query) kryo.readClassAndObject(input); >pool.returnObject(kryo); >input.close(); > >Hope that might help. > >Jim > > >From: Bauer, Herbert S. (Scot

Re: Serializing Queries

2016-03-19 Thread Bauer, Herbert S. (Scott)
Thanks James: This looks promising. I’ll repost when I’ve had a chance to implement this. -scott On 3/18/16, 10:44 AM, "McKinley, James T" wrote: >We use Kryo to pass query objects between hosts: > >https://github.com/EsotericSoftware/kryo > >We initially had some

Re: Serializing Queries

2016-03-19 Thread Bauer, Herbert S. (Scott)
Thanks Ahmet: I’ve seen these, but I don’t find any mechanism here helping to get me from the query object to the xml or a DOM object. -scott On 3/18/16, 11:35 AM, "Ahmet Arslan" wrote: >Hi, > >I think, xml query parser examples [1] are the safest way to persist >Lucene

Serializing Queries

2016-03-18 Thread Bauer, Herbert S. (Scott)
simple query representations. I’m looking at the CoreParser and its supporting XML parsing capabilities with an eye toward marshalling the boolean query into a DOM object and unmarshalling it on the server side using some of the support implied by the CoreParser and related classes. -scott

Managing Searchers and Readers vs Creating new ones

2016-03-15 Thread Bauer, Herbert S. (Scott)
what might work well for our use case? Thanks, Scott

multivalued index search

2016-01-29 Thread scott cote
What is the best approach to implementing a multivalued index field search in the current version of Lucene? SCott - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h

Re: Highlighting deprecation?

2015-12-01 Thread scott cote
Check out the highlight package … https://lucene.apache.org/core/5_3_0/highlighter/org/apache/lucene/search/highlight/package-summary.html SCott > On Dec 1, 2015, at 4:16 PM

problem using faceting in 5.3

2015-11-02 Thread scott cote
lucene-queryparser 3.6.0 Here is the code to retrieve facet data from the version 3.6 index (which does work against version 3.6 lucene): public class FacetRunner { public static void main(final String[] args) throws Exception { File indexDirFile = new File("/Users/

Re: Scoring over Multiple Indexes

2015-10-22 Thread Bauer, Herbert S. (Scott)
ng >is not going to be satisfactory. Boosting will influence the final >score and thus the >position of the document, but not absolutely order them unless you put >in insane boosts. >Tests based on boosting and doc ordering will be very fragile I'd guess. > >Best, >Erick

Scoring over Multiple Indexes

2015-10-22 Thread Bauer, Herbert S. (Scott)
normalize scoring over diverse indexes? If not is there a strategy for rolling your own normalizing solution? I’m assuming this has to be a common problem.-scott

Re: Entity level queries on parent child groups in block joins

2015-10-13 Thread Bauer, Herbert S. (Scott)
As a follow-up, is it correct that the parent child grouping is on its own segment? And would this segment have its own LeafReader that I could identify and access? I think this might solve my problem. -scott From: Scott Bauer <bauer.sc...@mayo.edu> Date: Tuesday, October 13

Entity level queries on parent child groups in block joins

2015-10-13 Thread Bauer, Herbert S. (Scott)
that contained these two values while at the same time returning no other parents that had one of the other values? (Which would be the case with a boolean query with two or more Occur.SHOULD designations) Performance is a consideration here. Thanks, Scott

Re: PerFieldAnalyzerWrapper does not seem to allow use of a custom analyzer

2015-08-10 Thread Bauer, Herbert S. (Scott)
I found the problem here. I had changed some method params and was inadvertently creating the fields I was having issues with as StringFields, which the analyzer fails silently against. From: Scott Bauer <bauer.sc...@mayo.edu> Date: Friday, August 7, 2015 at 1:56 PM To: "

PerFieldAnalyzerWrapper does not seem to allow use of a custom analyzer

2015-08-07 Thread Bauer, Herbert S. (Scott)
I can’t seem to detect any issues with the final custom analyzer declared in this code snippet (The one that attempts to use a PatternMatchingTokenizer and is initialized as sa), but it doesn’t seem to be hit when I run my indexing code despite being in the map. It is indexed finally but I assu

Re: Exception when attempting to query using ToParentBlockJoinQuery in Lucene 5.1

2015-06-29 Thread Bauer, Herbert S. (Scott)
rcher.search(termJoinQuery, collector); TopGroups getTopGroupsResults = collector.getTopGroups(termJoinQuery, null, 0, 10, 0, true); String ecode = null; for (GroupDocs result : getTopGroupsResults.groups) { Document parent = searcher.doc(result.groupValue);

Re: Exception when attempting to query using ToParentBlockJoinQuery in Lucene 5.1

2015-06-23 Thread Bauer, Herbert S. (Scott)
implementation just not used or supported that much? On 6/22/15, 2:21 PM, "Bauer, Herbert S. (Scott)" wrote: >Well it’s clear that this is just giving a return value of >Integer.MAX_VALUE for the parentDoc. Given the recent changes noted here: > https://issues.apache.org/jira/brows

Re: Exception when attempting to query using ToParentBlockJoinQuery in Lucene 5.1

2015-06-22 Thread Bauer, Herbert S. (Scott)
here? On 6/5/15, 12:05 PM, "Bauer, Herbert S. (Scott)" wrote: >One correction, it looks like the parentBits call has 4823680 passed to it >to generate the erroneous docId. > >On 6/5/15, 10:34 AM, "Bauer, Herbert S. (Scott)" >wrote: > >>I should men

Re: Exception when attempting to query using ToParentBlockJoinQuery in Lucene 5.1

2015-06-05 Thread Bauer, Herbert S. (Scott)
One correction, it looks like the parentBits call has 4823680 passed to it to generate the erroneous docId. On 6/5/15, 10:34 AM, "Bauer, Herbert S. (Scott)" wrote: >I should mention that this worked in 4.10.4 using a very similar code >base. -scott > >On 6/4/15, 4:51

Re: Exception when attempting to query using ToParentBlockJoinQuery in Lucene 5.1

2015-06-05 Thread Bauer, Herbert S. (Scott)
I should mention that this worked in 4.10.4 using a very similar code base. -scott On 6/4/15, 4:51 PM, "Bauer, Herbert S. (Scott)" wrote: >I’m working with Lucene 5.1 to try to make use of the relational >structure of the block join index and query mechanisms. I’m qu

Exception when attempting to query using ToParentBlockJoinQuery in Lucene 5.1

2015-06-04 Thread Bauer, Herbert S. (Scott)
n someone shed some light on this exception? Thanks, Scott Bauer

Searching with String that Represents a Signature

2014-08-14 Thread Scott Selvia
We have OCR'd a document with a signature; you can select the signature and copy its text representation to search a Lucene 4.7 index. We have surrounded the search text with double quotes, since without them it contains invalid search characters. Search Text: ":J!/z&”

Re: Exact Phrase Search returning in correct results

2014-06-11 Thread Scott Selvia
o be able to search stop words consider adding > CharArraySet.EMPTY_SET to the StandardAnalyzer's initializer. > > > > -Original Message- > From: Scott Selvia [mailto:ssel...@gmail.com] > Sent: Wednesday, June 11, 2014 12:48 PM > To: java-user@lucene.apache

Exact Phrase Search returning in correct results

2014-06-11 Thread Scott Selvia
I’m having an issue searching for an exact phrase with Lucene 4.7. My use case loaded the Declaration of Independence into a Lucene search database. I search for “it becomes” and I get two hits; one for “it, becomes” and another for a line that just has “becomes” at the end of the line. Expec
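The two hits described above are what one would expect if the analyzer drops "it" as a stop word and discards punctuation. The sketch below is a toy analyzer in plain Java, not Lucene's StandardAnalyzer. The class name, the tokenizing rule, and the three-word stop list are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordDemo {
    /** Lowercases, splits on anything that is not a letter or digit, and drops stop words. */
    public static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !stopWords.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Illustrative stop list only; a real analyzer's default list is longer.
        Set<String> stops = new HashSet<>(Arrays.asList("it", "the", "a"));

        System.out.println(analyze("it becomes", stops));           // [becomes]
        System.out.println(analyze("it, becomes", stops));          // [becomes]
        System.out.println(analyze("it becomes", new HashSet<>())); // [it, becomes]
    }
}
```

With the stop list in place, the phrase and both candidate lines reduce to the same single token, so they all match. With an empty stop list the phrase keeps both words, which is the effect of passing an empty stop set to the analyzer.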

RE: Performance testing Lucene

2014-01-27 Thread Scott Schneider
many unit tests! Scott > -Original Message- > From: Uwe Schindler [mailto:u...@thetaphi.de] > Sent: Friday, January 24, 2014 3:03 AM > To: java-user@lucene.apache.org > Subject: RE: Performance testing Lucene > > Hi Scott, > > the unit tests are also a good

RE: Performance testing Lucene

2014-01-23 Thread Scott Schneider
Thanks! I ran this Directory subclass through the Lucene unit tests (and found 3 race conditions). Unit tests are wonderful. Scott > -Original Message- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Wednesday, January 22, 2014 7:05 AM > To:

Performance testing Lucene

2014-01-20 Thread Scott Schneider
gle, general query test. It's not hard to come up with a decent set of queries, but I'd really like something representative of real world queries. If there some standard set of commonly used queries, that would be ideal. Thanks! Scott

RE: Unit test help

2013-12-22 Thread Scott Schneider
set -Dlucene.version=3.3-SNAPSHOT. When running the test through ant, I think common-build.xml sets that property. My other problem running the tests on my own Directory subclass was a noob mistake. I had to specify -Dtests.directory=foo in VM arguments, rather than program arguments. Scott

Unit test help

2013-12-20 Thread Scott Schneider
at shouldn't be blocking the deletion. And I don't see how any other code could open a handle to this file, since it's created in a temp directory created by Lucene Transform. I can't think of any reason for the difference between ant and eclipse! Please help! Thanks, Scott

Debugging unit tests with Eclipse

2013-12-18 Thread Scott Schneider
"ant test -Dblahblah" works, but this doesn't use that argument and gets the same 54 test failures, like normal. Please help! Thanks, Scott

RE: Running Lucene tests on a custom Directory subclass

2013-12-18 Thread Scott Schneider
Never mind... the problem was that I compiled my jar against Lucene 3.3, but tried running against Lucene 4.4. It works when I also run against 3.3. (Or, at least, I get test failures that make sense!) Scott > -Original Message- > From: Scott Schneider [mailto:scott_

Running Lucene tests on a custom Directory subclass

2013-12-17 Thread Scott Schneider
ectory) and gave it a 0-argument constructor. Apologies if this was addressed elsewhere. In googling for an answer, the term "Directory" is basically invisible. I found a page on running Lucene's tests on a custom codec and approximated those steps. Scott

RE: Analyzers aren't reusable?? (lucene 4.2.1)

2013-12-05 Thread Scott Smith
phi.de eMail: u...@thetaphi.de > -Original Message- > From: Scott Smith [mailto:ssm...@mainstreamdata.com] > Sent: Thursday, December 05, 2013 9:36 PM > To: java-user@lucene.apache.org > Subject: Analyzers aren't reusable?? (lucene 4.2.1) > > I wrote the following

RE: Analyzers aren't reusable?? (lucene 4.2.1)

2013-12-05 Thread Scott Smith
Thanks for the quick response. I'll read through the references. Thanks again Scott -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Thursday, December 05, 2013 1:46 PM To: java-user@lucene.apache.org Subject: RE: Analyzers aren't reusable?? (lucene 4

Analyzers aren't reusable?? (lucene 4.2.1)

2013-12-05 Thread Scott Smith
" and "/>". Is this expected behavior? I thought analyzers were thread-safe and reusable. Am I wrong on that point? I would expect the output of all three to be the same. Can someone explain to me what's going on? What am I missing? Scott

RE: Highlighting phrases

2013-11-27 Thread Scott Smith
Never mind. I figured it out. Thanks anyway. -Original Message- From: Scott Smith [mailto:ssm...@mainstreamdata.com] Sent: Wednesday, November 27, 2013 9:27 AM To: java-user@lucene.apache.org Subject: Highlighting phrases I'm doing some highlighting with the following code fra

Highlighting phrases

2013-11-27 Thread Scott Smith
I'm doing some highlighting with the following code fragment: formatter = new SimpleHTMLFormatter(, ); Scorer score = new QueryScorer(myQuery); ht = new Highlighter(formatter, score); ht.setTextFragmenter(new NullFragmenter());

Phrase highlight

2013-11-26 Thread Scott Smith
I'm doing some highlighting with the following code fragment: formatter = new SimpleHTMLFormatter(, ); Scorer score = new QueryScorer(myQuery); ht = new Highlighter(formatter, score); ht.setTextFragmenter(new NullFragmenter());

RE: Can you escape characters you don't want the analyzer to modify

2013-09-18 Thread Scott Smith
ounds like you either need to have a custom analyzer or a field-aware analyzer. -- Jack Krupansky -Original Message----- From: Scott Smith Sent: Tuesday, September 17, 2013 4:26 PM To: java-user@lucene.apache.org Subject: Can you escape characters you don't want the analyzer to modify S

Can you escape characters you don't want the analyzer to modify

2013-09-17 Thread Scott Smith
Suppose I have a string like "ab@cd%d". My analyzer will turn this into "ab cd d". Can I pass it "ab\@cd\%d" and force it to treat it as a single word? I want to use the Query parser, but I don't want it messing with fields that have not been analyzed.

Lucene Query Syntax with analyzed and unanalyzed text

2013-09-16 Thread Scott Smith
I want to be sure I understand this correctly. Suppose I have a search that I'm going to run through the query parser that looks like: body:"some phrase" AND keyword:"my-keyword" clearly "body" and "keyword" are field names. However, the additional information is that the "body" field is anal

RE: classic.QueryParser - bug or new behavior?

2013-05-19 Thread Scott Smith
help. Scott -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Sunday, May 19, 2013 1:26 PM To: java-user@lucene.apache.org Subject: Re: classic.QueryParser - bug or new behavior? Yeah, just go ahead and escape the slash, either with a backslash or by enclosin
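The advice quoted above, escaping the slash with a backslash, can be sketched as a small helper. This is plain Java with no Lucene dependency. The class name and the exact list of special characters are assumptions for illustration. Lucene's classic QueryParser has its own static escape method, which is the better choice in real code.

```java
public class QueryEscape {
    // Characters treated as query syntax in this sketch (an assumed list that includes '/').
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Prefixes every character found in SPECIAL with a backslash. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("status/EXPIRED"));
        System.out.println(escape("plain"));
    }
}
```

Enclosing the term in double quotes, the other option mentioned in the reply, avoids the need to escape the slash at all.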

classic.QueryParser - bug or new behavior?

2013-05-19 Thread Scott Smith
a forward slash, I'm confused why it would need escaping of any of the characters in the string with the "/EXPIRED". Has anyone seen this? Scott

RE: Lucene slow performance -- still broke

2013-03-20 Thread Scott Smith
Duh...it's supposed to be setMergeFactor(). Thanks Scott -Original Message- From: Simon Willnauer [mailto:simon.willna...@gmail.com] Sent: Wednesday, March 20, 2013 3:53 PM To: java-user@lucene.apache.org Subject: Re: Lucene slow performance -- still broke quick question, w

RE: Lucene slow performance -- still broke

2013-03-20 Thread Scott Smith
tRAMBufferSizeMB(50.0); Any help in figuring out what is causing this problem would be appreciated. I do now have an offline system that I can play with so I can do some intrusive things if need be. Scott -Original Message- From: Scott Smith [mailto:ssm...@mainstreamdata.com] Sent:

RE: Lucene slow performance

2013-03-16 Thread Scott Smith
Thanks for the help. The reindex was done this morning and searches now take less than a second. I will make the change to the code. Cheers Scott -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Friday, March 15, 2013 11:17 PM To: java-user@lucene.apache.org

RE: Lucene slow performance

2013-03-15 Thread Scott Smith
nauer [mailto:simon.willna...@gmail.com] Sent: Friday, March 15, 2013 5:08 PM To: java-user@lucene.apache.org Subject: Re: Lucene slow performance On Sat, Mar 16, 2013 at 12:02 AM, Scott Smith wrote: > " Do you always close IndexWriter after adding few documents and when > closing, d

RE: Lucene slow performance

2013-03-15 Thread Scott Smith
March 16, 2013 12:08 AM > To: java-user@lucene.apache.org > Subject: Re: Lucene slow performance > > On Sat, Mar 16, 2013 at 12:02 AM, Scott Smith > wrote: > > " Do you always close IndexWriter after adding few documents and > > when > closing, disable "

RE: Lucene slow performance

2013-03-15 Thread Scott Smith
ling all merges)?" Frankly I don't quite understand what this means. When I "close" the indexwriter, I simply call close(). Is that the wrong thing? Thanks Scott -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Friday, March 15, 2013 4:49 PM To:

RE: Lucene slow performance

2013-03-15 Thread Scott Smith
a custom merge policy or something like this, any special IndexWriter settings? On Fri, Mar 15, 2013 at 11:15 PM, Scott Smith wrote: > We have a system that is using lucene and the searches are very slow. The > number of documents is fairly small (less than 30,000) and each document is

RE: Lucene slow performance

2013-03-15 Thread Scott Smith
n has changed since 1.4, but does it not merge all of the various files into a few files? -Original Message- From: Scott Smith [mailto:ssm...@mainstreamdata.com] Sent: Friday, March 15, 2013 4:15 PM To: java-user@lucene.apache.org Subject: Lucene slow performance We have a system th

Lucene slow performance

2013-03-15 Thread Scott Smith
several thousand (3000+) .cfs files. We do optimize the index once per day. This is a system that probably gets several thousand document deletes and additions per day (spread out across the day). Any thoughts. We didn't really notice this until we went to 4.x. Scott

Handling a closed IndexWriter in Solr

2013-03-13 Thread Danzig, Scott
Hey all, We're using a Solr 4 core to handle our article data. When someone in our CMS publishes an article, we have a listener that indexes it straight to solr. We use the previously instantiated HttpSolrServer, build the solr document, add it with server.add(doc) .. then do a server.commit(

RE: Which stemmer?

2012-11-15 Thread Scott Smith
dog dog dog dog's dog'dog's dog' dogs dog dogs dog dogs' dog dogs dog Now, if someone would answer my question on the Solr list ("Custom Solr Indexer/Search"

RE: Which stemmer?

2012-11-14 Thread Scott Smith
Perhaps the kstemmer is "just right" :-) Cheers Scott -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Wednesday, November 14, 2012 4:14 PM To: java-user@lucene.apache.org Subject: Re: Which stemmer? What is your use case? If you don't have a spec

RE: CJKWidthFilter vs ICUFoldingFilter

2012-11-14 Thread Scott Smith
Thanks -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, November 14, 2012 12:17 PM To: java-user@lucene.apache.org Subject: Re: CJKWidthFilter vs ICUFoldingFilter On Wed, Nov 14, 2012 at 9:47 AM, Scott Smith wrote: > Reading the documentation for these

Which stemmer?

2012-11-14 Thread Scott Smith
Does anyone have any experience with the stemmers? I know that Porter is what "everyone" uses. Am I better off with KStemFilter (better performance) or ?? Does anyone understand the differences between the various stemmers and how to choose one over another?

CJKWidthFilter vs ICUFoldingFilter

2012-11-14 Thread Scott Smith
, etc. Can I just use the ICUFoldingFilter? Cheers Scott

RE: Near Real Time for multiple applications

2012-11-07 Thread Scott Smith
ccandless.com] Sent: Tuesday, November 06, 2012 5:32 AM To: java-user@lucene.apache.org Subject: Re: Near Real Time for multiple applications On Mon, Nov 5, 2012 at 6:33 PM, Scott Smith wrote: > I've been reading about NRT thinking it might be good to integrate it into my > code. However,

RE: Highlighting html pages

2012-11-05 Thread Scott Smith
tags being properly nested. Cheers Scott -----Original Message- From: Scott Smith [mailto:ssm...@mainstreamdata.com] Sent: Thursday, November 01, 2012 7:16 PM To: Michael Sokolov; java-user@lucene.apache.org Subject: RE: Highlighting html pages I was trying to play with this. Am I correct in

Near Real Time for multiple applications

2012-11-05 Thread Scott Smith
I've been reading about NRT thinking it might be good to integrate it into my code. However, I have a question. Suppose that the index writer and the index reader run in totally different JVMs (i.e., they are different applications and only communicate via the disk). Am I correct in thinking

RE: Highlighting html pages

2012-11-01 Thread Scott Smith
id of punctuation (commas, periods, semicolons, etc.) after the HTML stripping, is there a filter? Essentially, I want to get it back to what StandardTokenizer would give me after I've stripped the HTML. Suggestions? Scott -Original Message- From: Michael Sokolov [mailto:soko...@if

4.0 tokenStream or SimpleAnalyzer bug?

2012-11-01 Thread Scott Smith
I was doing some tokenizer/filter analysis attempting to fix a bug I have in highlighting under 4.0. I was running the displayTokensWithFullDetails code from LIA2. I would get an exception with a bad index value of -1. I fixed the problem by doing a reset() immediately after creating my Token

Highlighting and InvalidTokenOffsetsException in Lucene 4.0

2012-10-31 Thread Scott Smith
here (even though it took a couple of "minor" changes to get it to compile in 4.0). This code used to work in 3.5. Anyone have any ideas? Scott Code fragment: try { ctf = new CachingTokenFilter(myCustomAnalyzer .tokenStream(M

RE: Norms and Term Vectors in Lucene 4.0

2012-10-30 Thread Scott Smith
Thanks Simon. Appears I had it mostly figured out correctly--except for the last question :-) Thanks for the suggestion on caching the fieldtype. Cheers Scott -Original Message- From: Simon Willnauer [mailto:simon.willna...@gmail.com] Sent: Tuesday, October 30, 2012 2:10 AM To: java

Norms and Term Vectors in Lucene 4.0

2012-10-29 Thread Scott Smith
orms on a field, then do I need to use the new streamlined Field() and set the appropriate FieldType object parameters? Is that my only option? I assume I also have to go through the new Field() if I need to control TermVectors? Where's LIA3 when you need it :) Scott

RE: lucene 4.0 indexReader is changed

2012-10-29 Thread Scott Smith
OK. I'll take a look at that. Thanks for the help. Scott -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Friday, October 26, 2012 6:07 PM To: java-user@lucene.apache.org Subject: Re: lucene 4.0 indexReader is changed How about DirectoryReader

RE: Lucene 4.0 delete by ID

2012-10-29 Thread Scott Smith
I understand the issue of the lucene doc id changing. I'll probably look to see if I can delete stuff just based on some field that I have that I know won't change. I've used the doc id for a long time, but maybe it's time for a change. Thanks for all of the input.

RE: Lucene 4.0 delete by ID

2012-10-29 Thread Scott Smith
The lucene integer doc id. -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Sunday, October 28, 2012 5:09 PM To: java-user@lucene.apache.org Subject: Re: Lucene 4.0 delete by ID Scott, did you mean the Lucene integer id, or the unique id field? - Original

lucene 4.0 indexReader is changed

2012-10-26 Thread Scott Smith
How do I determine if the index has been modified in 4.0? The ifchanged() and isChanged() appear to have been removed.

Lucene 4.0 delete by ID

2012-10-26 Thread Scott Smith
I'm currently converting some lucene code to 4.0. It appears that you are no longer allowed to delete a document by its ID. Is that correct? Is my only option to figure some kind of query (which obviously isn't based on ID) and do the delete from there?

Highlighting html pages

2012-10-23 Thread Scott Smith
I need to take an html page that I retrieve from my lucene search and highlight all of the terms that are part of the search. I need to skip over any html tags since I don't want any words in tags which happen to match the search to be highlighted. Note that I don't want sections of the docum

RE: Lucene reorganizing indexes

2012-07-17 Thread Scott Smith
armed up after a commit and the never ending full GCs. Greets Ralf -Original Message- From: Scott Smith [mailto:ssm...@mainstreamdata.com] Sent: Monday, July 16, 2012 22:29 To: java-user@lucene.apache.org Subject: Lucene reorganizing indexes We have an application that has

Lucene reorganizing indexes

2012-07-16 Thread Scott Smith
before going to 3.5) severely increased the disk activity which is interfering with other things running on the boxes. Does any of this make sense to anyone? Is there an explanation? Thoughts about what we might do about it? Thanks in advance. Scott

Bizarre Search order request

2012-05-25 Thread Scott Smith
blog" and "website". If there aren't 10 of one of these, then I'm allowed to exceed the maximum of 10 so that I get 20 results. What I don't want is 20 "mail" documents if there are "blog" and/or "website" documents to display. Is something like this even possible? Any thoughts would be appreciated. Scott

RE: MoreLikeThis Interface changes

2011-09-26 Thread Scott Smith
OK. Thanks -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Monday, September 26, 2011 12:15 PM To: java-user@lucene.apache.org Subject: Re: MoreLikeThis Interface changes On Mon, Sep 26, 2011 at 2:06 PM, Scott Smith wrote: > "is" is the input stream

RE: MoreLikeThis Interface changes

2011-09-26 Thread Scott Smith
test fails. If I include it, it passes. I'm using MLT as follows: _query = new BooleanClause(mlt.like(new InputStreamReader(is), "EVERYTHING"), BooleanClause.Occur.MUST); "is" is the input stream. Did I miss something in your response? Scott -O

RE: MoreLikeThis Interface changes

2011-09-22 Thread Scott Smith
Understand. Thanks for the information. -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, September 21, 2011 6:59 PM To: java-user@lucene.apache.org Subject: Re: MoreLikeThis Interface changes On Wed, Sep 21, 2011 at 5:17 PM, Scott Smith wrote: >

MoreLikeThis Interface changes

2011-09-21 Thread Scott Smith
something where you boost the MLT words from the subject and as opposed to the body of the document you are looking for similar items on? Thanks Scott

RE: [ANNOUNCEMENT] NLP-based Analyzer library for Lucene

2011-02-14 Thread Scott Smith
One thing to note is that the Stanford POS Tagger is licensed using GPL v2. A commercial license is available, but it doesn't appear to be free ($3k min if I read correctly). I wonder what it would take to make this available using OpenNLP which has a friendlier license. -Original Message

Fuzzy Phrase Search

2010-10-27 Thread Andrew Scott
Hi Guys, I am wondering how I can go about doing a Fuzzy Phrase search using Lucene.NET 2.9.2 - I've tried looking around everywhere but there doesn't really seem to be any resources related to this anywhere. I found this stackoverflow link

RE: QueryParser in 3.x

2010-09-17 Thread Scott Smith
the reusableTokenStream and I now get the result I wanted. The above code snippet generates the word without the umlaut in both cases. So, problem solved. Thanks to Simon for putting on the right track. Scott -Original Message- From: Simon Willnauer [mailto:simon.willna...@google

QueryParser in 3.x

2010-09-16 Thread Scott Smith
I recently upgraded to Lucene 3.0 and am seeing some new behavior that I don't understand. Perhaps someone can explain why. I have a custom analyzer. Part of the analyzer uses the AsciiFoldingFilter. If I run a word with an umlaut through that analyzer using the AnalyzerDemo code in LIA2,

RE: Numeric Range Filter - bug or documentation oversight

2010-03-09 Thread Scott Smith
Thanks for looking at this Uwe. I'll check my code again, but I tried changing it several times and it did seem to make a difference. Scott -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Saturday, March 06, 2010 3:11 AM To: java-user@lucene.apache.org Su

Numeric Range Filter - bug or documentation oversight

2010-03-05 Thread Scott Smith
I've been updating from 2.4.2 to 3.0.1. I had a number of issues (The Version object in the analyzers was an "interesting" addition-I guess I don't understand the use case for them. I understand what it says; I was just surprised and it caused me some problems since I create analyzers with reflect

Re: Old Lucene src archive corrupt?

2010-03-03 Thread Scott Ribe
Or perhaps your download process is treating the archive file as text and translating "line endings" for you? -- Scott Ribe scott_r...@killerbytes.com http://www.killerbytes.com/ (303) 722-0567 voice

Re: Why Lucene takes longer time for the first query and less for subsequent ones

2009-11-17 Thread Scott Ribe
T of the query code. Then first queries are fast. -- Scott Ribe scott_r...@killerbytes.com http://www.killerbytes.com/ (303) 722-0567 voice

Polishing up my Lucene integration, customizing analyzer

2009-11-15 Thread Scott Ribe
hem are simple settings to StandardAnalyzer, but not all, particularly those first two items... Any hints or directions appreciated. -- Scott Ribe scott_r...@killerbytes.com http://www.killerbytes.com/ (303) 722-0567 voice

Re: Question about how to speed up custom scoring

2009-10-11 Thread scott w
vsq1, vsq2, vsq3 }; > > Query textQuery = QueryParser.parse("company:Microsoft"); > > Query q = new QueryTermBoostingQuery(textQuery, vsq, bias); > --- > > Does this work for you? > Yes I think this should work! Thanks for taking the time to clearly write up a solution. Will report back after testing it out. best, Scott

Re: Question about how to speed up custom scoring

2009-10-10 Thread scott w
at 5:32 PM, Jake Mannix wrote: > Great Scott (hah!) - please do report back, even if it just works fine and > you have no more questions, I'd like to know whether this really is > what you were after and actually works for you. > > Note that the FieldCache is kinda "mag

Re: Question about how to speed up custom scoring

2009-10-09 Thread scott w
Thanks Jake! I will test this out and report back soon in case it's helpful to others. Definitely appreciate the help. Scott On Fri, Oct 9, 2009 at 3:33 PM, Jake Mannix wrote: > On Fri, Oct 9, 2009 at 3:07 PM, scott w wrote: > > > Example Document: > > model_1_score

Re: Question about how to speed up custom scoring

2009-10-09 Thread scott w
score. Hopefully that makes more sense. The other use case I had in mind is one where it doesn't care about the indexed value and only looks at whether the field is present or not and then uses the query supplied weight to measure the relative importance of that field. thanks, Scott O
