Re: Lucene in action

2023-06-10 Thread Mark Miller
Nature abhors being anything but an author by name on a second tech book. The ruse is up after one when you have the inputs crystalized and the hourly wage in hand. Hard to find anything but executive producers after that. I’d shoot for a persuasive crowdfunding attempt.

Re: How to get the index for a document after a search over multiple indexes

2016-06-15 Thread Mark Shapiro
Thanks, I appreciate the useful info. I can go with option 1. Mark

Re: How to get the index for a document after a search over multiple indexes

2016-06-14 Thread Mark Shapiro
private static IndexSearcher getSearcher( String[] indexDirs ) throws Exception { IndexReader[] readers = new IndexReader[indexDirs.length]; FSDirectory[] directorys = new FSDirectory[indexDirs.length]; for ( int i = 0; i < indexDirs.length; ++i ) { File file = new File( indexDirs[i] );

How to get the index for a document after a search over multiple indexes

2016-06-13 Thread Mark Shapiro
. Thanks, Mark

Search through versioned data

2015-12-09 Thread Mark Bakker
? 4. In case 2 will this solve our requirements? Kind regards, Mark Bakker

Re: Indexing a binary field

2015-09-01 Thread Mark Hanfland
You are correct that Lucene only works with text (no binary or primitives); Base64 would be the way I would suggest. On Monday, August 31, 2015 11:19 AM, Dan Smith wrote: What's the best way to index binary data in Lucene? I'm adding a Lucene index to a key value store, and I want t
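The Base64 suggestion above can be sketched with the JDK alone. This is only an illustration under assumptions: the class and method names (BinaryKeyCodec, toIndexableText, fromIndexableText) are hypothetical and do not come from the thread, and the encoded string is simply what one might put into a text field.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper (not from the thread): turns a binary key into text and back.
final class BinaryKeyCodec {

    // Encode raw bytes as Base64 text so they can be handled as an ordinary string.
    static String toIndexableText(byte[] key) {
        return Base64.getEncoder().encodeToString(key);
    }

    // Decode the Base64 text back into the original bytes.
    static byte[] fromIndexableText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] key = {0x00, (byte) 0xFF, 0x10, 0x7F};
        String text = toIndexableText(key);
        System.out.println(text);
        // The round trip should give back exactly the same bytes.
        System.out.println(Arrays.equals(key, fromIndexableText(text)));
    }
}
```

One caveat worth keeping in mind: Base64 text does not sort in the same order as the raw bytes, so this suits exact-match lookups rather than range queries.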

RE: Solr throws 400 from proxy but returns fine from browser.

2015-07-31 Thread Mark Horninger
utput(true); con.connect(); int responseCode = con.getResponseCode(); /*--SNIP!--*/ -Original Message- From: Mark Horninger Sent: Thursday, July 30, 2015 8:54 AM To: java-user@lucene.apache.org Subject: Solr throws 400 from proxy but returns fine from browser. Hi, I am

Solr throws 400 from proxy but returns fine from browser.

2015-07-30 Thread Mark Horninger
con.setRequestProperty("Accept", "application/json"); int responseCode = con.getResponseCode(); I'm not sure why exactly this call is failing with a 400. Any help that could be provided would be welcomed. Thanks in advance! -Mark [GrayHair] GHS Confidentia

RE: Does Lucene 4.6.1 compatible with Java 8?

2015-07-23 Thread Mark Horninger
ml#A999387. There are a few incompatibilities in there, particularly around number formatting. It's not entirely impossible that there were unintended challenges for their project around moving to JDK8. --Mark -Original Message- From: Xiaolong Zheng [mailto:xiaolong.zh...@mathworks.com]

[ANNOUNCE] Apache Lucene 4.10.3 released

2014-12-29 Thread Mark Miller
case, please try another mirror. This also goes for Maven access. Happy Holidays, Mark Miller http://www.about.me/markrmiller - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java

[ANNOUNCE] Apache Lucene 4.5.1 released.

2013-10-24 Thread Mark Miller
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 October 2013, Apache Lucene™ 4.5.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.5.1 Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable

Re: Regarding Compression Tool

2013-09-16 Thread Mark Miller
> wrote: > Hi, > I am trying to store all the Field values using CompressionTool, but when I search for any content, it is not finding any results. > Can you help me with how to create the Field with CompressionTool to add to the Document, and how to decompress it when searching for any content in it? > -- > Thanks & Regards, > Jebarlin Robertson.R > GSM: 91-9538106181. -- Mark J. Miller Blog: http://www.developmentalmadness.com LinkedIn: http://www.linkedin.com/in/developmentalmadness

Re: Wildcard in PhraseQuery

2013-08-27 Thread mark harwood
See  http://lucene.apache.org/core/4_3_1/queryparser/org/apache/lucene/queryparser/complexPhrase/ComplexPhraseQueryParser.html From: Ian Lea To: java-user@lucene.apache.org Sent: Tuesday, 27 August 2013, 10:16 Subject: Re: Wildcard in PhraseQuery See the FAQ

Assistance for Unified Index Proces

2013-08-14 Thread Mark Jason B. Nacional
ified Index". In this implementation, we have only one index file to manage. I just want to get information as to how I should implement it in an optimal way. Any suggestion would be perfect! :) Thanks! Mark Jason Nacional Junior Software Engineer

[ANNOUNCE] Apache Lucene 4.2.1 released

2013-04-03 Thread Mark Miller
April 2013, Apache Lucene™ 4.2.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.2.1. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-te

Re: Luke?

2013-03-15 Thread Mark Miller
If anyone is able to donate some effort, a nice future scenario could be that Luke comes fully up to date with every Lucene release: https://issues.apache.org/jira/browse/LUCENE-2562 - Mark On Mar 15, 2013, at 5:58 AM, Eric Charles wrote: > For the record, I happily use Luke (with Lucene

Is there a limit for a field size in Lucene 3.0.2

2013-02-21 Thread Mark Wilson
So my question is, Is there a size limit for a field in Lucene 3.0.2 Regards Mark On 21/02/2013 15:14, "Dyer, James" wrote: > Samuel, > > Do you think you could write a failing unit test and open a JIRA issue? Or at > the least open a JIRA issue with all the details with

Re: 3.6 - querying a no-norms field and getting document boost

2013-01-25 Thread mark harwood
Answering my own question - add optional new MatchAllDocsQuery("text") clause to factor in the encoded norms from the "text" field. From: mark harwood To: "java-user@lucene.apache.org" Sent: Friday, 25 January 2013, 16:11 Subj

3.6 - querying a no-norms field and getting document boost

2013-01-25 Thread mark harwood
I have a 3.6 index with many no-norms fields and a single text field with norms (a fairly common configuration). There is a document boost I have set at index-time that will have been encoded into the text field's norms. If I query solely on a non-text field then the ranking does not apply the

Re: Lucene 4.1 tentative release

2012-12-12 Thread Mark Miller
ly in the works! - Mark On Dec 12, 2012, at 6:50 AM, Ramprakash Ramamoorthy wrote: > Hello, > > Any 'tentative' release date for 4.1 would help. I know it is > difficult pointing a date, but still couldn't resist asking, for we could > plan accordingly. Thanks

Re: Lucene 4.0, Serialization

2012-12-04 Thread mark harwood
 implementations that execute the query. Cheers Mark From: Trejkaz To: Lucene Users Mailing List Sent: Tuesday, 4 December 2012, 9:43 Subject: Re: Lucene 4.0, Serialization On Tue, Dec 4, 2012 at 8:33 PM, BIAGINI Nathan wrote: > I need to send a cl

Re: "read past EOF" when merge

2012-11-03 Thread Mark Miller
Can you file a JIRA Markus? This is probably related to the new code that uses Directory for replication. - Mark On Nov 2, 2012, at 6:53 AM, Markus Jelsma wrote: > Hi, > > For what it's worth, we have seen similar issues with Lucene/Solr from this > week's trunk. The

Re: ComplexPhraseQueryParser and stop words

2012-11-02 Thread Mark Harwood
should parse ok as the query contains no markup that makes it a complex phrase. Cheers Mark On 1 Nov 2012, at 19:54, Brandon Mintern wrote: > We are still having the issue where ComplexPhraseQueryParser fails on > quoted expressions that include stop words. Does the original >

Re: Issue with documentation for org.apache.lucene.analysis.synonym.SynonymMap.Builder.add() method

2012-09-06 Thread Mark Parker
> apparently it is... >> >> https://mail-archives.apache.org/mod_mbox/harmony-dev/200802.mbox/%3c47b2f7ae.2000...@gmail.com%3E >> >> > > It's definitely javadoc. For now I used U+: > http://svn.apache.org/vi

Issue with documentation for org.apache.lucene.analysis.synonym.SynonymMap.Builder.add() method

2012-09-06 Thread Mark Parker
ly) and converting the javadoc comments to XML documentation for good IntelliSense in Visual Studio. It works wonderfully, and we use it in very successful commercial software! Note that I'm not subscribed to the list, so please CC me if there are questions. Mark

Re: DuplicateFilter filters not only duplicates

2012-08-30 Thread mark harwood
DuplicateFilter has been mostly broken  since Lucene's switch over to segment-level filtering. Since v2.9 the calls to Filter.getDocIdSet no longer pass a "top-level" reader for accessing the whole index and instead pass a reader restricted to only accessing a single segment's contents. Becaus

Re: Creating Span Queries from Boolean Queries

2012-08-22 Thread mark harwood
>>> Ideally I'd like to take any ANDed clauses and require them to occur> >>> withing $SPAN of the other ANDs. See ComplexPhraseQueryParser? Like the standard QueryParser it uses quotes to define a phrase but also interprets any special characters between the quotes e.g. ( ) * ~ The syntax and

Re: Mapping Lucene search results with a relational database

2012-07-03 Thread mark harwood
o but it is one that I've encountered many times. Cheers Mark - Original Message - From: Jochen Hebbrecht To: java-user@lucene.apache.org Cc: Sent: Tuesday, 3 July 2012, 8:56 Subject: Mapping Lucene search results with a relational database Hi all, I have an application w

Sequence diagrams for Lucene 4.0 classes

2012-05-23 Thread mark harwood
these links to be fantastic for knocking this sort of stuff up. Cheers Mark

Re: Searching across 2 fields

2012-05-22 Thread mark harwood
e flattening of this structure will muddle the data and prevent you from knowing which age value is related to which height value - i.e. the "cross-matching" problem (see here for overview: http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene ) Cheer

Re: Searching across 2 fields

2012-05-21 Thread Mark Harwood
field's contents. Of course you can only go so far with this sort of flattening approach before cross-matching becomes an issue. Cheers Mark On 21 May 2012, at 19:36, Mohit Anchlia wrote: > I am new to search and just went through some concepts of "Lucene in > Action". I hav

Lucene 4 - POS and Syntactic Tagging

2012-03-14 Thread Mark McGuire
ging, would data like document length or sumTotalTermFreq be different from a document indexed without these tags? How would I counteract these differences if any occur? Thanks, Mark McGuire

Re: Nested BlockJoinQuery

2012-02-11 Thread Mark Harwood
o the set. Check out Solr faceting for your requirement Cheers, Mark On 10 Feb 2012, at 22:31, hasghari wrote: > I'm trying to learn more about using BlockJoinQuery in our search application > and I came across this blog post by Mike McCandless: > http://blog.mikemccandless.c

Re: Lucene 4.0 Index Format Finalization Timetable

2011-12-08 Thread Mark Miller
>> it will be finalized when Lucene 4.0 is released. >> -- >> lucidimagination.com -- - Mark http://www.lucidimagination.com

Re: ElasticSearch

2011-11-17 Thread Mark Miller
ery Parser Support -- - Mark http://www.lucidimagination.com On Thu, Nov 17, 2011 at 6:11 PM, Peter Karich wrote: > > > > I don't think it's possible. > > Eh, of course its possible (if I would understand it I would do it. no, > no, just joking ;)) > > and ye

Re: ElasticSearch

2011-11-17 Thread Mark Harwood
> > Other parameters such as filters, faceting, highlighting, sorting, > etc, don't normally have any hierarchy. I regularly mix filters and queries inside Boolean logic. Attempts to structure data (e.g. geocoding) don't always achieve 100% coverage and so for better recall you must also resor

Re: ElasticSearch

2011-11-17 Thread Mark Harwood
ay. The XMLQueryParser has a set of DTDs that currently serve to generate HTML documentation but also could conceivably be used by tooling to drive query construction. "Runnable documentation" always feels like a useful combo. Cheers Mark On 17 Nov 2011, at 20:21, Yonik Seeley w

Re: Bet you didn't know Lucene can...

2011-10-26 Thread mark harwood
st load a random sample of 100k keys from a flat file *then* start the timer on the look-ups. I'm also using public domain Wikipedia data so can release the code and data somewhere if that's of interest. Cheers Mark - Original Message - From: Dawid Weiss To: java-user@lucene.a

Re: Bet you didn't know Lucene can...

2011-10-25 Thread Mark Harwood
e but the data wouldn't be possible. The English Wikipedia page titles are probably an equivalent size and shape so I could try and package something up around that as a benchmarking tool for others to play with. Cheers Mark On 25 Oct 2011, at 22:47, Dawid Weiss wrote: > Avg lookup time s

Re: Bet you didn't know Lucene can...

2011-10-25 Thread mark harwood
cene call I get significantly better performance (avg 0.5ms lookup vs MySQL 10ms)   Cheers Mark - Original Message - From: Grant Ingersoll To: java-user@lucene.apache.org Cc: Sent: Saturday, 22 October 2011, 10:11 Subject: Bet you didn't know Lucene can... Hi All, I'm giv

Re: optimize with num segments > 1 index keeps growing

2011-09-12 Thread Mark Miller
> we should correct the javadocs for expungeDeletes here I think: so > that its more consistent with the javadocs for optimize? > > "Requests an expunge operation..." ? > +1 - it's a documentation bug now. - Mark Miller lu

Re: What kind of System Resources are required to index 625 million row table...???

2011-08-16 Thread Mark Harwood
Check that "norms" are disabled on your fields, because they'll cost you 1 byte x numberOfDocs x numberOfFieldsWithNormsEnabled. On 16 Aug 2011, at 15:11, Bennett, Tony wrote: > Thank you for your response. > > You are correct, we are sorting on timestamp. > Timestamp has microsecond granularity, a
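The rule of thumb above (1 byte per document per field with norms enabled) is easy to turn into a quick estimate. The sketch below is only arithmetic: the document count is taken from the thread subject, while the field count of 4 and the names NormsMemoryEstimate and normsBytes are assumptions made up for illustration.

```java
// Back-of-the-envelope estimate of norms memory, following the rule of thumb
// quoted above: 1 byte x number of docs x number of fields with norms enabled.
final class NormsMemoryEstimate {

    // One byte per document for every field that has norms enabled.
    static long normsBytes(long numDocs, int fieldsWithNorms) {
        return numDocs * fieldsWithNorms;
    }

    public static void main(String[] args) {
        long docs = 625_000_000L; // row count mentioned in the thread subject
        int fields = 4;           // assumed number of fields with norms, for illustration only
        long bytes = normsBytes(docs, fields);
        System.out.printf("%d bytes (about %.2f GiB)%n", bytes, bytes / (1024.0 * 1024 * 1024));
    }
}
```

With these assumed numbers the estimate is 2,500,000,000 bytes, roughly 2.3 GiB, which is why disabling norms on fields that do not need them matters at this scale.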

Re: implicit closing of an IndexWriter

2011-07-26 Thread Mark Miller
On Jul 26, 2011, at 9:52 AM, Clemens Wyss wrote: > Side note: I am using threads when writing and theses threads are (by design) > interrupted (from time to time) Perhaps you are seeing this: https://issues.apache.org/jira/browse/LUCENE-2239 - Mark Miller lucidimaginati

Re: Search within a sentence (revisited)

2011-07-26 Thread Mark Miller
case tests like I likely should try if I was going to commit this thing. - Mark Miller lucidimagination.com On Jul 26, 2011, at 8:56 AM, Peter Keegan wrote: > Thanks Mark! The new patch is working fine with the tests and a few more. If > you have particular test cases in mind, I'd

Re: Search within a sentence (revisited)

2011-07-25 Thread Mark Miller
y use even more tests before feeling too confident here… I've attached a patch for 3X with the new test and fix (changed that include back to exclude). - Mark Miller lucidimagination.com On Jul 25, 2011, at 10:29 AM, Mark Miller wrote: > Thanks Peter - if you supply the unit tests, I'

Re: Search within a sentence (revisited)

2011-07-25 Thread Mark Miller
Thanks Peter - if you supply the unit tests, I'm happy to work on the fixes. I can likely look at this later today. - Mark Miller lucidimagination.com On Jul 25, 2011, at 10:14 AM, Peter Keegan wrote: > Hi Mark, > > Sorry to bug you again, but there's another case that

Re: Search within a sentence (revisited)

2011-07-21 Thread Mark Miller
I just uploaded a patch for 3X that will work for 3.2. On Jul 21, 2011, at 4:25 PM, Mark Miller wrote: > Yeah, it's off trunk - I'll submit a 3X patch in a bit - just have to change > that to an IndexReader I believe. > > - Mark > > On Jul 21, 2011, at 4:01 PM, Pe

Re: Search within a sentence (revisited)

2011-07-21 Thread Mark Miller
Yeah, it's off trunk - I'll submit a 3X patch in a bit - just have to change that to an IndexReader I believe. - Mark On Jul 21, 2011, at 4:01 PM, Peter Keegan wrote: > Does this patch require the trunk version? I'm using 3.2 and > 'AtomicReaderContext' isn'

Re: Search within a sentence (revisited)

2011-07-21 Thread Mark Miller
Hey Peter, Getting sucked back into Spans... That test should pass now - I uploaded a new patch to https://issues.apache.org/jira/browse/LUCENE-777 Further tests may be needed though. - Mark On Jul 21, 2011, at 9:28 AM, Peter Keegan wrote: > Hi Mark, > > Here is a unit tes

Re: Search within a sentence (revisited)

2011-07-20 Thread Mark Miller
On Jul 20, 2011, at 7:44 PM, Mark Miller wrote: > > On Jul 20, 2011, at 11:27 AM, Peter Keegan wrote: > >> Mark Miller's 'SpanWithinQuery' patch >> seems to have the same issue. > > If I remember right (it's been more than a couple of years),

Re: Search within a sentence (revisited)

2011-07-20 Thread Mark Miller
On Jul 20, 2011, at 11:27 AM, Peter Keegan wrote: > Mark Miller's 'SpanWithinQuery' patch > seems to have the same issue. If I remember right (it's been more than a couple of years), I did index the sentence markers at the same position as the last word in the sentence. A

Re: Questions on index Writer

2011-07-16 Thread Mark Miller
My advice: Don't close the IndexWriter - just call commit. Don't worry about forcing merges - let them happen as they do when you call commit. If you are going to use the IndexWriter again, you generally do not want to close it. Calling commit is the preferred option. - M

[Announce] Lucene-Eurocon Call for Participation Closes Friday, JULY 15

2011-07-12 Thread Mark Miller
e Lucene EuroCon 2011 is presented by Lucid Imagination, the commercial entity for Apache Solr/Lucene Open Source Search; proceeds of the conference benefit The Apache Software Foundation. "Lucene" and "Apache Solr" are trademarks of the Apache Software Foundation. - Mark

Re: Extracting span terms using WeightedSpanTermExtractor

2011-07-08 Thread Mark Miller
On Jul 8, 2011, at 5:43 AM, Jahangir Anwari wrote: > I don't think this is the best > solution, am open to other alternatives. Could also make it static public where it is? Either way. - Mark Miller lucidimag

Re: Extracting span terms using WeightedSpanTermExtractor

2011-07-07 Thread Mark Miller
sp); } else if (query instanceof TermQuery) { - extractWeightedTerms(terms, query); + extractWeightedSpanTerms(terms, new SpanTermQuery(((TermQuery)query).getTerm())); } else if (query instanceof SpanQuery) { extractWeightedSpanTerms(terms, (SpanQuery) query);

Re: Extracting span terms using WeightedSpanTermExtractor

2011-07-06 Thread Mark Miller
icitly set it higher than 0 for now. Feel free to create a JIRA issue and we can give it's own default greater than 0. - Mark Miller lucidimagination.com On Jul 6, 2011, at 5:34 PM, Jahangir Anwari wrote: > I have a CustomHighlighter that extends the SolrHighlighter and overrides &

Re: Corrupt segments file full of zeros

2011-06-28 Thread mark harwood
From: Michael McCandless To: java-user@lucene.apache.org Sent: Tue, 28 June, 2011 14:59:48 Subject: Re: Corrupt segments file full of zeros On Tue, Jun 28, 2011 at 9:29 AM, mark harwood wrote: > Hi Mike. >>>Hmmm -- what code are you running here, to pr

Re: Corrupt segments file full of zeros

2011-06-28 Thread mark harwood
; will retry: retry=false; gen = 3 SIS [main]: fallback to prior segment file 'segments_2' SIS [main]: success on fallback segments_2 Lucene does the right thing going back to _2. I can't yet see why in Greg's environment (NFS based) it fails to see _4vc as corrupt in the sam

Re: Corrupt segments file full of zeros

2011-06-28 Thread mark harwood
According to the spec there should at least be an Int32 of -9 to declare the Format - http://lucene.apache.org/java/2_9_3/fileformats.html#Segments File - Original Message From: Uwe Schindler To: java-user@lucene.apache.org Sent: Tue, 28 June, 2011 12:32:34 Subject: RE: Corrupt segme
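The check described above (the file should start with an Int32 of -9) can be sketched without Lucene. This is a hedged illustration, not Lucene's own reader: it assumes the Int32 is stored high-order byte first, and the names SegmentsHeaderCheck, readFormat and looksValid are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical sanity check (not Lucene code): does a segments file start with
// the format marker quoted above, or is it just zeros?
final class SegmentsHeaderCheck {

    // Format value quoted in the message above for the 2.9.3 file format.
    static final int EXPECTED_FORMAT = -9;

    // Read the first four bytes as a big-endian Int32 (assumed byte order).
    static int readFormat(byte[] fileStart) {
        if (fileStart.length < 4) {
            throw new IllegalArgumentException("need at least 4 bytes");
        }
        return ByteBuffer.wrap(fileStart, 0, 4).getInt(); // ByteBuffer is big-endian by default
    }

    static boolean looksValid(byte[] fileStart) {
        return readFormat(fileStart) == EXPECTED_FORMAT;
    }

    public static void main(String[] args) {
        byte[] zeros = new byte[8]; // a file "full of zeros"
        byte[] good = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xF7}; // -9 in two's complement
        System.out.println(looksValid(zeros));
        System.out.println(looksValid(good));
    }
}
```

A file of zeros reads as format 0, so a check like this would reject it before any further parsing is attempted.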

Re: Coloring search results based on score?

2011-06-16 Thread Mark Harwood
See Highlighter's GradientFormatter Cheers Mark On 16 Jun 2011, at 22:01, Itamar Syn-Hershko wrote: > Hi all, > > > Interesting question: is it possible to color search results in a web-page > based on their score? e.g. most relevant results in green, and then differe

Re: Index size and performance degradation

2011-06-14 Thread mark harwood
ations for that design choice. Cheers Mark - Original Message From: Itamar Syn-Hershko To: java-user@lucene.apache.org Sent: Tue, 14 June, 2011 9:03:15 Subject: Re: Index size and performance degradation Thanks. Our product is pretty generic and we can't assume much on the hardware,

Re: When nested indexing and search will be available?

2011-06-06 Thread Mark Harwood
As of 3.2 the necessary changes were put in to safely support indexing nested docs. See http://lucene.apache.org/java/3_2_0/changes/Changes.html#3.2.0.new_features On 6 Jun 2011, at 17:18, 周诚 wrote: > I just saw this: > https://issues.apache.org/jira/secure/attachment/12480123/LUCENE-2454.patc

Re: Ranking docs with all terms higher

2011-05-19 Thread mark harwood
Of course IDF is a factor too meaning a match on a single rare (to the overall index) term may be worth more than a match on 2 different common (to the index) terms. As Ian suggests a custom Similarity implementation can be used to tune this out. - Original Message From: Ian Lea To: j
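To make the IDF point concrete, here is a small numeric sketch. It assumes the classic TF-IDF formula idf = 1 + ln(numDocs / (docFreq + 1)) and compares squared idf values only; real scores also involve term frequency, norms, coord and query normalization, which are ignored here. The index size and document frequencies are invented for illustration, and IdfSketch is a hypothetical name.

```java
// Simplified illustration of why one rare term can outweigh two common terms.
final class IdfSketch {

    // Classic TF-IDF inverse document frequency (assumed formula):
    // 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000L;             // invented index size
        double rare = idf(9, numDocs);         // term found in 9 documents
        double common = idf(99_999, numDocs);  // term found in about 10% of documents

        // idf enters the classic score squared (query weight times field weight).
        System.out.println("one rare term:    " + rare * rare);
        System.out.println("two common terms: " + 2 * common * common);
    }
}
```

Under these assumptions the single rare term contributes far more than the two common terms together, which is the effect a custom Similarity with a flattened idf would tune out.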

Re: NRT consistency

2011-04-11 Thread Mark Miller
- Amazon Dynamo uses vector clocks for this. > > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > > > > - Original Message >> From: Mark Miller >> To: java-user@lucene.

Re: NRT consistency

2011-04-11 Thread Mark Miller
On Apr 11, 2011, at 1:05 PM, Em wrote: > Thank you both! > > Mark, could you explain what you mean? I never heard from such an > index-splitter. BTW: The idea of having a segment per document sounds a lot > like an exception for too many FileDescriptors :) This is just an idea

Re: NRT consistency

2011-04-11 Thread Mark Miller
T-consistency-tp2801878p2801878.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java

Re: Difference between regular Highlighter and Fast Vector Highlighter ?

2011-04-11 Thread Mark Miller
er if you do. FVH: works with fewer query types and requires that you store term vectors - but scales better than the std Highlighter to very large documents - Mark Miller lucidimagination.com Lucene/Solr User Conference May 25-26, San Francisco www.lucenerevolution.org On Apr 1, 2011, at 8:

Re: Help with delimited text

2011-04-07 Thread Mark Wiltshire
Thu, Apr 7, 2011 at 8:18 AM, Mark Wiltshire <m...@redalertconsultants.co.uk> wrote: Hi Thanks Ian for your help on this, it's driving me nuts :-) The StandardAnalyser is only used on the search query term being passed also. But in this case I am just adding a filter to the search. The actual category

Re: Help with delimited text

2011-04-07 Thread Mark Wiltshire
filter on search, so only those living in /Top/Books/* categories (with *) are returned. I then want to use the actual category in the index to correctly display the item. Hope that makes sense. Many thanks Mark On 6 Apr 2011, at 12:05, Ian Lea wrote

Re: Help with delimited text

2011-04-06 Thread Mark Wiltshire
tting on the /. Sometimes it can be easier to replace a character e.g. / to _. I think there is a Lucene class that can do that, maybe MappingCharFilter, if you don't want to do it yourself. You will of course need to be consistent and do the same processing at index and search time. -- Ian. On Wed, Apr
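The "do it yourself" route mentioned above could look roughly like the following. It is a plain-Java sketch, not a Lucene analyzer: the class CategoryPathNormalizer and its methods are hypothetical, and the point is only that the same normalize call is applied to the indexed values and to the search prefix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: one normalization routine shared by indexing and searching.
final class CategoryPathNormalizer {

    // Replace the path separator so a tokenizer will not split on it.
    static String normalize(String path) {
        return path.trim().replace('/', '_');
    }

    // Split a comma-delimited list of category paths and normalize each one.
    static List<String> splitAndNormalize(String raw) {
        List<String> out = new ArrayList<>();
        for (String part : raw.split(",")) {
            if (!part.trim().isEmpty()) {
                out.add(normalize(part));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Index time: normalize every category path of the document.
        List<String> indexed = splitAndNormalize(
                "/Top/Books,/Top/My Prods/Book Prods/Text Books, /Maths/Books/TextBooks");
        // Search time: normalize the filter prefix in exactly the same way.
        String prefix = normalize("/Top/Books");
        for (String token : indexed) {
            System.out.println(token + " matches prefix: " + token.startsWith(prefix));
        }
    }
}
```

The original path would still need to be kept in a separate stored value if it is to be shown to users unchanged.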

Re: Help with delimited text

2011-04-05 Thread Mark Wiltshire
e string my self, but how do I pass this to Lucene, do I have to setup different fields ? I need to keep the full path in the index, as I want to use this when redirecting users, when clicking on the results. Any help would be great. Many thanks Regards Mark

Help with delimited text

2011-04-05 Thread Mark Wiltshire
Hi java-users I need some help. I am indexing categories into a single field category_path Which may contain items such as /Top/Books,/Top/My Prods/Book Prods/Text Books, /Maths/Books/TextBooks i.e. category paths delimited by , I want to store this field, so the Analyser tokenizes the document

Re: Early Termination

2011-03-16 Thread mark harwood
See https://issues.apache.org/jira/browse/LUCENE-1720 - Original Message From: Alex vB To: java-user@lucene.apache.org Sent: Wed, 16 March, 2011 0:12:41 Subject: Early Termination Hi, is Lucene capable of any early termination techniques during query processing? On the forum I only fo

Re: Detecting duplicates

2011-03-10 Thread mark harwood
test abc This is my test abc Another test def This is my test abc test 1 3 - Original Message From: Mark To: java

Re: Detecting duplicates

2011-03-10 Thread Mark
My understanding is that it can mark documents with the same signature, indicating that they are similar; however, there is no way at query time to return only 1 "unique" document per signature. Am I missing something? Doc 1) This is my test Doc 2) This is my test Doc 3) Another test Doc 4)

Re: Detecting duplicates

2011-03-05 Thread Mark
I'm familiar with Deduplication however I do not wish to remove my duplicates and my needs are slightly different. I would like to mark the first document with signature 'xyz' as unique but the next one as a duplicate. This way I can filter out "duplicates" during

Detecting duplicates

2011-03-04 Thread Mark
Is there a way one could detect duplicates (say by using some unique hash of certain fields) and mark a document as a duplicate without removing it? Here is an example: Doc 1) This is my test Doc 2) This is my test Doc 3) Another test Doc 4) This is my test Doc 1 and 3 should be considered u
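The idea in the question can be sketched outside Lucene: compute a signature per document and flag every document whose signature has already been seen. This is an assumption-laden illustration, not an answer from the thread; the class DuplicateMarker, the choice of SHA-256 and the use of the whole text as the hashed field are all made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-indexing step: flag repeats instead of dropping them.
final class DuplicateMarker {

    // Signature of the fields that define "sameness" (here: the whole text).
    static String signature(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // result[i] is true when document i repeats an earlier signature.
    static boolean[] markDuplicates(List<String> docs) {
        Set<String> seen = new HashSet<>();
        boolean[] duplicate = new boolean[docs.size()];
        for (int i = 0; i < docs.size(); i++) {
            // Set.add returns false when the signature was already present.
            duplicate[i] = !seen.add(signature(docs.get(i)));
        }
        return duplicate;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "This is my test", "This is my test", "Another test", "This is my test");
        System.out.println(Arrays.toString(markDuplicates(docs)));
    }
}
```

The resulting flag could then be written to its own field at index time, so that a filter on that field hides the repeats at query time while the documents themselves stay in the index.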

Re: IndexWriter.close() performance issue

2011-02-22 Thread Mark Kristensson
eld names might cause a problem opening an index reader. However, I'm not sure why closing (committing changes to) an index writer would have such a problem. Why is that? Thank you! Mark On Tue, Nov 23, 2010 at 2:22 PM, Mark Kristensson < mark.kristens...@smartsheet.com> wrote: > I&

Re: termInfosIndexDivisor vs termIndexInterval

2011-02-07 Thread mark harwood
(giving you the luxury of not having to decide on a sensible interval setting up front) Cheers Mark - Original Message From: Anuj Shah To: java-user@lucene.apache.org Sent: Mon, 7 February, 2011 10:58:02 Subject: termInfosIndexDivisor vs termIndexInterval Hi, Is someone able to

Re: Maintaining index for "flattened" database tables

2011-01-13 Thread mark harwood
deleted" column adding to help synch things up whereas option 2 can automatically record create/update/delete changes cleanly in a separate table. Either of these options help you to "remember to reindex" just the changed items. Cheers Mark - Original Message From:

Directory objects for index

2010-11-29 Thread Mark Kristensson
er handing to IndexReader.open() or new IndexWriter()? Or should I close them at the same time I close the IndexReader and IndexWriter objects? 3) Has anyone seen behavior that differs radically across OSes regarding references to open (and unclo

Re: How to Cache Filter Results between Servers

2010-11-29 Thread mark harwood
ntents of an index. They are only used to determine the same instance of a Java object e.g. so that cached items created from one IndexReader can recognise that the same IndexReader instance is still in use in the JVM when the cache is queried with a given reader as context. Cheers Mark

Re: Not query help.

2010-11-23 Thread Mark Kristensson
NOT portion). For my application, I have a user permission field that I can use to select everything that a user has access to and then I can "NOT out" the stuff specified by the != portion of the query. These, of course, are two queries that I AND together with a BooleanQuery. -Mark O

Re: IndexWriter.close() performance issue

2010-11-23 Thread Mark Kristensson
that be instead of monkeying with StringHelper or in addition to it? Thanks, Mark On Nov 20, 2010, at 5:44 AM, Yonik Seeley wrote: > On Fri, Nov 19, 2010 at 5:41 PM, Mark Kristensson > wrote: >> Here's the changes I made to org.apache.lucene.util.StringHelper: >> >> //

Re: IndexWriter.close() performance issue

2010-11-19 Thread Mark Kristensson
lly somewhere. Maybe you can try >> profiling just the init of your IndexReader? (Eg, run java with >> -agentlib:hprof=cpu=samples,depth=16,interval=1). >> >> Yes, both Index.NOT_ANALYZED_NO_NORMS and Index.NO will disable norms >> as long as no document in the inde

Re: IndexWriter.close() performance issue

2010-11-18 Thread Mark Kristensson
I finally bucked up and made the change to CheckIndex to verify that I do not, in fact, have any fields with norms in this index. The result is below - the largest segment currently is #3, which has 300,000+ fields but no norms. -Mark Segments file=segments_acew numSegments=9 version

Re: IndexWriter.close() performance issue

2010-11-17 Thread Mark Kristensson
) org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) org.apache.lucene.index.IndexReader.open(IndexReader.java:316) org.apache.lucene.index.IndexReader.open(IndexReader.java:188) com.smartsheet.Main.main(:Unknown line) -Mark On Nov 17, 2010, at 1:51 PM, Michael McCandless wrote

Re: IndexWriter.close() performance issue

2010-11-17 Thread Mark Kristensson
huge amount of the CPU work is spent on String function(s). Below is the summary from the end of the java.hprof.txt. I'm happy to attach the whole file, but I wasn't sure whether that was appropriate for this mailing list. Thanks, Mark CPU SAMPLES BEGIN (total = 5295) Wed Nov 17 11:

Re: IndexWriter.close() performance issue

2010-11-05 Thread Mark Kristensson
ts Index.NOT_ANALYZED_NO_NORMS or Index.NO will not have norms on it, regardless of whether or not the field is stored. Is that not correct? Thanks, Mark On Nov 4, 2010, at 2:56 AM, Michael McCandless wrote: > Likely what happened is you had a bunch of smaller segments, and then >

Re: IndexWriter.close() performance issue

2010-11-03 Thread Mark Kristensson
unique fields names in an index or segment? Any suggestions for potentially mitigating the problem? Thanks, Mark On Nov 3, 2010, at 2:02 PM, Michael McCandless wrote: > On Wed, Nov 3, 2010 at 4:27 PM, Mark Kristensson > wrote: >> >> I've run checkIndex against the in

Re: IndexWriter.close() performance issue

2010-11-03 Thread Mark Kristensson
o I had to add some additional logging. What I found surprised me, opening a search against this index takes the same 6 to 8 seconds that closing the indexWriter takes. Thanks for your help, Mark --- Segments file=segments_9hir numSegments=4 version=FORMAT_DIAG

Re: IndexWriter.close() performance issue

2010-11-03 Thread Mark Kristensson
point and am thinking I may have to rebuild the index, though I would definitely prefer to avoid doing that and would like to know why this is happening. Thanks for your help, Mark On Nov 2, 2010, at 9:26 AM, Mark Kristensson wrote: > > Wonderful information on what happens during indexW

Re: IndexWriter.close() performance issue

2010-11-02 Thread Mark Kristensson
led with the experimental checkIndex object in the past (before we upgraded to 3.0), but have found it to be incredibly slow and of marginal value. Does anyone have any experience using CheckIndex to track down an issue with a production index? Thanks again! Mark On Nov 2, 2010, at 2:20 AM, Shai

IndexWriter.close() performance issue

2010-11-01 Thread Mark Kristensson
n any need to optimize based upon the performance of the search queries. Thanks, Mark

Re: Next Word - Any Suggestions?

2010-10-26 Thread mark harwood
See the Collocation stuff here https://issues.apache.org/jira/browse/LUCENE-474 - Original Message From: Lucene To: java-user@lucene.apache.org Sent: Tue, 26 October, 2010 13:27:06 Subject: Next Word - Any Suggestions? Am about to implement a custom query that is sort of mash-up of Fac

Re: Consider only documents of a category for IDF

2010-10-18 Thread mark harwood
Can you not just call reader.docFreq(categoryTerm) ? The returned figure includes deleted docs but then the search term uses this method too so should suffer from the same inaccuracy. Cheers Mark - Original Message From: Max Jakob To: java-user@lucene.apache.org Sent: Mon, 18

Re: Merge and commit behaviour - changed between 2.4 and 2.9?

2010-10-06 Thread mark harwood
ooks like this blocking logic is not new between 2.4 and 2.9 so maybe this has just appeared now simply as a result of the index reaching a particular size/state. Cheers Mark - Original Message From: Michael McCandless To: java-user@lucene.apache.org Sent: Wed, 6 October, 2010 0:18:20

Re: Merge and commit behaviour - changed between 2.4 and 2.9?

2010-10-05 Thread Mark Harwood
> > In both 2.4 and 2.9.x (and all later versions), neither .prepareCommit > nor .commit wait for merges. > > That said, if a merge happens to complete before you call those > methods, then it is in fact committed. > > Mike > > On Tue, Oct 5, 2010 at 1:13 PM, Mar

Merge and commit behaviour - changed between 2.4 and 2.9?

2010-10-05 Thread Mark Harwood
ran in the background after IW.commit calls. This seems to make sense to me but I couldn't see any direct reference to this change in behaviour in changes.txt. Can anyone confirm this change between versions? Cheers, Mark ---

Re: Federated search with opensearch or proprietary APIs for Atlassian

2010-09-02 Thread mark harwood
ing any security around source material (authentication, authorisation and auditing) * Tackling concerns around overheads in "duplicating data" for large installations Cheers, Mark - Original Message From: Lukáš Vlček To: java-user@lucene.apache.org Sent: Fri, 3 September
