A full-text tokenizer for the NGramTokenFilter

2010-07-17 Thread Martin
son I can't use something like the StandardTokenizer is that ngrams should really include spaces and pretty much every tokenizer gets rid of them. Thank you very much in advance for any suggestions. Regards, Martin - To u

Re: A full-text tokenizer for the NGramTokenFilter

2010-07-17 Thread Martin
Ahh, I knew I saw it somewhere, then I lost it again... :) I guess the name is not quite intuitive, but anyway thanks a lot! and I'm just wondering if there is a tokenizer that would return me the whole text. KeywordTokenizer does this. -
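
For readers following this thread, here is a minimal plain-Java sketch of what the question is after: character n-grams taken over the whole, untokenized text, so that spaces are kept inside the grams. It does not use Lucene at all; the class name `CharNGramSketch` and the method `charNGrams` are hypothetical names chosen for illustration, and the sketch only approximates what feeding a whole-text token into an n-gram filter would produce.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical helper, not part of Lucene.
class CharNGramSketch {

    // Returns every substring of length n, in order, treating the
    // whole input as one token so that spaces stay inside the grams.
    static List<String> charNGrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Each gram is printed in brackets so the spaces are visible.
        for (String gram : charNGrams("to be", 3)) {
            System.out.println("[" + gram + "]");
        }
    }
}
```

A tokenizer that splits on whitespace first would never yield grams such as "o b", which is why the thread asks for a tokenizer that returns the whole text.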

JTRES 2012 Call for Paper

2012-02-21 Thread Martin Schoeberl
, 2012 * Camera Ready Paper Due: August 20, 2012 * Workshop: October 24-26, 2012 Program Chair: -- Andy Wellings, University of York Workshop Chair: -- Martin Schoeberl, Technical University of Denmark Program Committee

Using a Lucene ShingleFilter to extract frequencies of bigrams in Lucene

2012-09-04 Thread Martin O'Shea
If a Lucene ShingleFilter can be used to tokenize a string into shingles, or ngrams, of different sizes, e.g.: "please divide this sentence into shingles" Becomes: shingles "please divide", "divide this", "this sentence", "sentence into", and "into shingles" Does anyone know
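
The bigram example in this message can be sketched without Lucene as follows. This is only an illustration of word shingles of size two and their frequencies, assuming simple whitespace splitting; `BigramCountSketch` and `bigramCounts` are hypothetical names, and a real ShingleFilter pipeline would apply its own analysis chain.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: hypothetical helper, not part of Lucene.
class BigramCountSketch {

    // Splits on whitespace and counts each adjacent word pair,
    // keeping the pairs in the order they first appear.
    static Map<String, Integer> bigramCounts(String sentence) {
        String[] words = sentence.trim().split("\\s+");
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + 1 < words.length; i++) {
            counts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(bigramCounts("please divide this sentence into shingles"));
    }
}
```

For the sentence quoted in the message this yields the five bigrams listed there, each with a count of one.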

RE: Using a Lucene ShingleFilter to extract frequencies of bigrams in Lucene

2012-09-06 Thread Martin O'Shea
: 05 Sep 2012 01 53 To: java-user@lucene.apache.org Subject: Re: Using a Lucene ShingleFilter to extract frequencies of bigrams in Lucene On Tue, Sep 4, 2012 at 12:37 PM, Martin O'Shea wrote: > > Does anyone know if this can be used in conjunction with other > analyzers to return the

RE: Using stop words with snowball analyzer and shingle filter

2012-09-20 Thread Martin O'Shea
Thanks for the responses. They've given me much food for thought. -Original Message- From: Steven A Rowe [mailto:sar...@syr.edu] Sent: 20 Sep 2012 02 19 To: java-user@lucene.apache.org Subject: RE: Using stop words with snowball analyzer and shingle filter Hi Martin, SnowballAna

NPE while decrement ref count

2012-11-12 Thread Martin Sachs
at org.apache.lucene.index.SegmentReader.doClose(SegmentReader.java:394) at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:222) at org.apache.lucene.index.DirectoryReader.doClose(DirectoryReader.java:904) at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:222) Martin -- ** Dipl. I

Java HotSpot problem with search and 64-bit JVM

2012-11-12 Thread Martin Sachs
with no results yet. martin -- ** Dipl. Inform. Martin Sachs ** Senior Software-Developer / Software-Architect T +49 (30) 443 50 99 - 33 F +49 (30) 443 50 99 - 99 E martin.sa...@artnology.com Google+: martin.sachs.artnol...@gmail.com skype: ms ** artnology GmbH A Milastraße 4 / D

Re: NPE while decrement ref count

2012-11-12 Thread Martin Sachs
oh yes i missed the version: I'm using lucene 3.6.1 Martin On 12.11.2012 09:40, Uwe Schindler wrote: > Which Lucene version? > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > >>

Re: NPE while decrement ref count

2012-11-12 Thread Martin Sachs
I write this, I download the newest oracle version and try it. Also I just enabled assertions in JVM. I have to wait for occurrence. martin On 12.11.2012 09:56, Uwe Schindler wrote: > Hi, > > I opened the code, the NPE occurs here: > > if (bytes != null) { >

on-the-fly "filters" from docID lists

2010-07-21 Thread Martin J
; to from a keyvalue store and then we'd query the main index with content:"cars" but only allow the docIDs that came back to be part of the response. The list of docIDs can near the hundreds of thousands. What should I be looking at to implement such a feature? Thank you Martin

RE: Use of Lucene to store data from RSS feeds

2010-10-15 Thread Martin O'Shea
ed upon the length of time required. > > This can be done as a database table and hashmaps used to calculate word > frequencies. But can I do this in Lucene to > this degree of granularity at all? If so, would each feed form a Lucene > doc

Using a TermFreqVector to get counts of all words in a document

2010-10-20 Thread Martin O'Shea
"Lucene for Dummies"); And the queryString being used is simply "dummies". Thanks Martin O'Shea.

RE: Using a TermFreqVector to get counts of all words in a document

2010-10-20 Thread Martin O'Shea
-- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Martin O'Shea [mailto:app...@dsl.pipex.com] > Sent: Wednesday, October 20, 2010 8:23 PM > To: java-user@lucene.apache.org > Subject: U

RE: Using a TermFreqVector to get counts of all words in a document

2010-10-20 Thread Martin O'Shea
0, at 2:53 PM, Martin O'Shea wrote: > Uwe > > Thanks - I figured that bit out. I'm a Lucene 'newbie'. > > What I would like to know though is if it is practical to search a single > document of one field simply by doing this: > > IndexReader trd

Use of hyphens in StandardAnalyzer

2010-10-24 Thread Martin O'Shea
I've tried combinations of: addDoc(w, "lucene \"Lawton-Browne\" Lucene"); And single quotes but without success. Thanks Martin O'Shea.

RE: Use of hyphens in StandardAnalyzer

2010-10-24 Thread Martin O'Shea
ns in StandardAnalyzer Hi Martin, StandardTokenizer and -Analyzer have been changed, as of future version 3.1 (the next release) to support the Unicode segmentation rules in UAX#29. My (untested) guess is that your hyphenated word will be kept as a single token if you set the version to 3.1 or higher i

FW: Use of hyphens in StandardAnalyzer

2010-10-24 Thread Martin O'Shea
t: RE: Use of hyphens in StandardAnalyzer Hi Martin, StandardTokenizer and -Analyzer have been changed, as of future version 3.1 (the next release) to support the Unicode segmentation rules in UAX#29. My (untested) guess is that your hyphenated word will be kept as a single token if you se

Combining analyzers in Lucene

2011-03-05 Thread Martin O'Shea
Hello I have a situation where I'm using two methods in a Java class to implement a StandardAnalyzer in Lucene to index text strings and return their word frequencies as follows: public void indexText(String suffix, boolean includeStopWords) { StandardAnalyzer analyzer = null;

JTRES 2011 Call for Papers

2011-04-25 Thread Martin Schoeberl
. Leavens, University of Central Florida Doug Locke, LC Systems Services Kelvin Nilsen, Aonix Marek Prochazka, European Space Agency Anders Ravn, Aalborg University Corrado Santoro, University of Catania Martin Schoeberl, Technical University of Denmark Fridtjof Siebert, Aicas

SpanTermQuery getSpans

2014-04-01 Thread Martin Líška
Dear all, I'm experiencing troubles with SpanTermQuery.getSpans(AtomicReaderContext context, Bits acceptDocs, Map termContexts) method in version 4.6. I want to use it to retrieve payloads of matched spans. First, I search the index with IndexSearcher.search(query, limit) and I get TopDocs. In th

Re: SpanTermQuery getSpans

2014-04-02 Thread Martin Líška
Gregory, that was indeed my problem. Thank you very much for your support. Martin This is a reply to http://mail-archives.apache.org/mod_mbox/lucene-java-user/201404.mbox/%3CCAASL1-8jRbEG%3DLi96eDLY-Pr_zwev6vk4vk4BW_ryKF1Dnb4KA%40mail.gmail.com%3E On 1 April 2014 23:52, Martin Líška wrote

RE: A really hairy token graph case

2014-10-24 Thread Will Martin
HI Benson: This is the case with n-gramming (though you have a more complicated start chooser than most I imagine). Does that help get your ideas unblocked? Will -Original Message- From: Benson Margulies [mailto:bimargul...@gmail.com] Sent: Friday, October 24, 2014 4:43 PM To: java-us

RE: A really hairy token graph case

2014-10-24 Thread Will Martin
lemma2 PI 0 lemmaN PI 0 comp0-1 PI 0 comp1-1 PI 0 comp0-N compM-N That is, group all the first-components, and all the second-components. But now the bits and pieces of the compounds are interspersed. Maybe that's OK. On Fri, Oct 2

How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2

2014-11-10 Thread Martin O'Shea
I realise that 3.0.2 is an old version of Lucene but if I have Java code as follows: int nGramLength = 3; Set<String> stopWords = new HashSet<String>(); stopWords.add("the"); stopWords.add("and"); ... SnowballAnalyzer snowballAnalyzer = new SnowballAnalyzer(Version.LUCENE_30, "English", stopWords);

RE: How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2

2014-11-10 Thread Martin O'Shea
dTokenizer,...); stopFilter = new StopFilter(standardFilter,...); snowballFilter = new SnowballFilter(stopFilter,...); But ignore LowerCaseFilter. Does this make sense? Thanks Martin O'Shea. -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: 10 Nov 2014 14 0

RE: How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2

2014-11-10 Thread Martin O'Shea
ilter(boolean enablePositionIncrements, TokenStream input, Set stopWords, boolean ignoreCase) Uwe > Martin O'Shea. > -Original Message- > From: Uwe Schindler [mailto:u...@thetaphi.de] > Sent: 10 Nov 2014 14 06 > To: java-user@lucene.apache.org > Subject: RE: How to disa

RE: How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2

2014-11-11 Thread Martin O'Shea
t this one allows to handle that: You should make stop-filter case insensitive (there is a boolean to do this): StopFilter(boolean enablePositionIncrements, TokenStream input, Set stopWords, boolean ignoreCase) Uwe > Martin O'Shea. > -Original Message- > From: Uwe Schindler [mail

RE: How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2

2014-11-11 Thread Martin O'Shea
Ahmet, Yes that is quite true. But as this is only a proof of concept application, I'm prepared for things to be 'imperfect'. Martin O'Shea. -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] Sent: 11 Nov 2014 18 26 To: java-user@lucene.ap

CfP: ISORC 2015 - IEEE International Symposium On Real-Time Computing

2014-11-19 Thread Martin Schoeberl
, University of Leeds, United Kingdom Program Co-Chairs: -- Martin Schoeberl, Technical University of Denmark, Denmark Chunming Hu, Beihang University, China Workshop Chair: --- Marco Aurelio Wehrmeister, Federal Univ. Technology - Parana, Brazil Program Committee

CfP: ISORC 2015 - IEEE International Symposium On Real-Time Computing

2014-12-05 Thread Martin Schoeberl
, University of Leeds, United Kingdom Program Co-Chairs: -- Martin Schoeberl, Technical University of Denmark, Denmark Chunming Hu, Beihang University, China Workshop Chair: --- Marco Aurelio Wehrmeister, Federal Univ. Technology - Parana, Brazil Program Committee

ISORC 2015 - Deadline Extension: 28/12/2014

2014-12-12 Thread Martin Schoeberl
: -- Anirudda Gokhale, Vanderbilt University, USA Parthasarathi Roop, University of Auckland, New Zealand Paul Townend, University of Leeds, United Kingdom Program Co-Chairs: -- Martin Schoeberl, Technical University of Denmark, Denmark Chunming Hu, Beihang

Upgrading Lucene from 3.5 to 4.10 - how to handle Java API changes

2015-01-11 Thread Martin Wunderlich
r the type IndexReader." Both problems occur here, for instance: TermPositionVector termVector = (TermPositionVector) reader.getTermFreqVector(...); ("reader" is of Type IndexReader) I would appreciate any help with these issues. Thanks a lot in advance. Cheers, Mar

Re: Upgrading Lucene from 3.5 to 4.10 - how to handle Java API changes

2015-01-11 Thread Martin Wunderlich
work, I guess. Cheers, Martin On 11.01.2015 at 11:05, Uwe Schindler wrote: > Hi, > > > > First, there is also a migrate guide next to the changes log: > http://lucene.apache.org/core/4_10_3/MIGRATE.html > > > > 1. If you implement analyzer, you have t

RE: hello,I have a problem about lucene,please help me to explain ,thank you

2015-09-22 Thread will martin
Hi: Would you mind doing websearch and cataloging the relevant pages into a primer? Thx, Will -Original Message- From: 王建军 [mailto:jianjun200...@163.com] Sent: Tuesday, September 22, 2015 4:02 AM To: java-user@lucene.apache.org Subject: hello,I have a problem about lucene,please help me t

RE: Solr java.lang.OutOfMemoryError: Java heap space

2015-09-28 Thread will martin
http://opensourceconnections.com/blog/2014/07/13/reindexing-collections-with-solrs-cursor-support/ -Original Message- From: Ajinkya Kale [mailto:kaleajin...@gmail.com] Sent: Monday, September 28, 2015 2:46 PM To: solr-u...@lucene.apache.org; java-user@lucene.apache.org Subject: Solr jav

RE: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-29 Thread will martin
So, if its new, it adds to pre-existing time? So it is a cost that needs to be understood I think. And, I'm really curious, what happens to the result of the post merge checkIntegrity IFF (if and only if) there was corruption pre-merge: I mean if you let it merge anyway could you get a false

RE: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-29 Thread will martin
o we implemented a check step once the index is in its final state to ensure that it is OK. So, since we want to do the check post-merge, is there a way to disable the check during merge so we don't have to do two checks? Thanks! Jim ____ From: will mar

RE: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-29 Thread will martin
rom the runtime system. The file system is EMC Isilon via NFS. Jim ____ From: will martin Sent: 29 September 2015 14:29 To: java-user@lucene.apache.org Subject: RE: Lucene 5 : any merge performance metrics compared to 4.x? This sounds robust. Is the index

RE: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-30 Thread will martin
call IndexReader.checkIntegrity. Mike McCandless http://blog.mikemccandless.com On Tue, Sep 29, 2015 at 9:00 PM, will martin wrote: > Ok So I'm a little confused: > > The 4.10 JavaDoc for LiveIndexWriterConfig supports volatile access on > a flag to setCheckIntegrityAtMerge ..

Re: debugging growing index size

2015-11-13 Thread will martin
Hi Rob: Doesn’t this look like known SE issue JDK-4724038 and discussed by Peter Levart and Uwe Schindler on a lucene-dev thread 9/9/2015? MappedByteBuffer …. what OS are you on Rob? What JVM? http://bugs.java.com/view_bug.do?bug_id=4724038 http://mail-archives.apache.org/mod_mbox/lucene-dev/

Re: Jensen–Shannon divergence

2015-12-13 Thread will martin
expand your due diligence beyond wikipedia: i.e. http://ciir.cs.umass.edu/pubfiles/ir-464.pdf > On Dec 13, 2015, at 8:30 AM, Shay Hummel wrote: > > LMDiricletbut its feasibilit

Re: Jensen–Shannon divergence

2015-12-13 Thread will martin
g'luck > On Dec 13, 2015, at 10:55 AM, Shay Hummel wrote: > > Hi > > I am sorry but I didn't understand your answer. Can you please elaborate? > > Shay > > On Sun, Dec 13, 2015 at 3:41 PM will martin wrote: > >> expand your due d

Re: Jensen–Shannon divergence

2015-12-14 Thread will martin
cool list. Thanks Uwe. Opportunities to gain competitive advantage in selected domains. > On Dec 14, 2015, at 6:02 PM, Uwe Schindler wrote: > > Hi, > > Next to BM25 and TF-IDF, Lucene also privides many more similarity > implementations: > > https://lucene.apache.org/core/5_4_0/core/org/apac

Re: Any lucene query sorts docs by Hamming distance?

2015-12-22 Thread will martin
Yonghui: Do you mean sort, rank or score? Thanks, Will > On Dec 22, 2015, at 4:02 AM, Yonghui Zhao wrote: > > Hi, > > Is there any query can sort docs by hamming distance if field values are > same length, > > Seems fuzzy query only works on edit distance. ---
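
For context on the question quoted here, Hamming distance between two equal-length values is simply the number of positions at which they differ. The following plain-Java sketch shows that computation only; it says nothing about how Lucene would sort or score by it, and `HammingSketch` and `hamming` are hypothetical names.

```java
// Illustration only: hypothetical helper, not part of Lucene.
class HammingSketch {

    // Counts the positions at which two equal-length strings differ.
    static int hamming(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("inputs must have the same length");
        }
        int distance = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    public static void main(String[] args) {
        System.out.println(hamming("1011101", "1001001"));
    }
}
```

Unlike edit distance, this measure never considers insertions or deletions, which is why a fuzzy query is not a direct substitute.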

Re: range query highlighting

2015-12-23 Thread will martin
Todd: "This trick just converts the multi term queries like PrefixQuery or RangeQuery to boolean query by expanding the terms using index reader." http://stackoverflow.com/questions/7662829/lucene-net-range-queries-highlighting beware cost. (my comment) g’luck will > On Dec 23, 2015, at 4:49

Re: Any lucene query sorts docs by Hamming distance?

2015-12-24 Thread will martin
m distance 0 to 3. > > 2015-12-22 21:42 GMT+08:00 will martin : > >> Yonghui: >> >> Do you mean sort, rank or score? >> >> Thanks, >> Will >> >> >> >>> On Dec 22, 2015, at 4:02 AM, Yonghui Zhao wrote: >>> >&

Re: SolrIndexSearcher throws Misleading Error Message When timeAllowed is Specified.

2016-01-08 Thread will martin
Please read the javadoc for System.nanoTime(). I won’t bore you with the details about how computer clocks work. > On Jan 8, 2016, at 4:14 AM, Vishnu Mishra wrote: > > I am using Solr 5.3.1 and we are facing OutOfMemory exception while doing > some complex wildcard and proximity query (even fo
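
As a small aside on the javadoc this reply points to: `System.nanoTime()` values are only meaningful as differences between two readings, and the javadoc recommends comparing with `t1 - t0` rather than `t1 < t0` because the counter may overflow. The sketch below shows that pattern for a time budget; `ElapsedTimeSketch` and `timedOut` are hypothetical names, and this is not how Solr implements `timeAllowed`.

```java
// Illustration only: hypothetical helper, not Solr or Lucene code.
class ElapsedTimeSketch {

    // True when more than budgetNanos have elapsed between the two
    // readings. Subtracting first keeps the check correct even if the
    // nanoTime counter wrapped around between the readings.
    static boolean timedOut(long startNanos, long nowNanos, long budgetNanos) {
        return nowNanos - startNanos > budgetNanos;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long budget = 1_000_000L; // 1 ms, an arbitrary example value
        System.out.println(timedOut(start, System.nanoTime(), budget));
    }
}
```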

Re: how to backup index files with Replicator

2016-01-23 Thread will martin
Hi Dancer: Found this thread with good info that may be irrelevant to your scenario but, this in particular struck me writer.waitForMerges(); writer.commit(); replicator.replicate(new IndexRevision(writer)); writer.close(); - even though writer.close() can

Lucene 5.4 - scoring divided by number of search terms?

2016-03-13 Thread Martin Krämer
I have a simple setup with IndexSearcher, QueryParser, SimpleAnalyzer. Running some queries I recognised that a query with more than one term returns a different ScoreDoc[i].score than shown in explain query statement. Apparently it is the score shown in explain divided by the number of search term

Re: Searching in a bitMask

2016-08-27 Thread will martin
hi aren’t we waltzing terribly close to the use of a bit vector in your field caches? there’s no reason to not filter longword operations on a cache if alignment is consistent across multiple caches just be sure to abstract your operations away from individual bits….imo -will > On Aug 27, 2

Re: Multi-field IDF

2016-11-17 Thread Will Martin
are you familiar with pivoted normalized document length practice or theory? or croft's recent work on relevance algorithms accounting for structured field presence? On 11/17/2016 5:20 PM, Nicolás Lichtmaier wrote: That depends on what you want. In this case I want to use a discrimination po

Re: Multi-field IDF

2016-11-18 Thread Will Martin
In this work, we aim to improve the field weighting for structured document retrieval. We first introduce the notion of field relevance as the generalization of field weights, and discuss how it can be estimated using relevant documents, which effectively implements relevance feedback for f

Re: Explain Scoring function in LMJelinekMercerSimilarity Class

2016-12-20 Thread Will Martin
https://doi.org/10.3115/981574.981579 On 12/20/2016 12:21 PM, Dwaipayan Roy wrote: Hello, Can anyone help me understand the scoring function in the LMJelinekMercerSimilarity class? The scoring function in LMJelinekMercerSimilarity is shown below: -

Re: Format of Wikipedia Index

2018-01-22 Thread Will Martin
From the javadoc for DocMaker: * *doc.stored* - specifies whether fields should be stored (default *false*). * *doc.body.stored* - specifies whether the body field should be stored (default = *doc.stored*). So ootb you won't get content stored. Does this help? regards -will On 1/22/2

Re: How groupingSearch specifies SortedNumericDocValuesField

2019-05-14 Thread Martin Grigorov
Hi, On Tue, May 14, 2019 at 8:28 PM 顿顿 wrote: > When I use groupingSearch specified as SortedNumericDocValuesField, > I got an "unexpected docvalues type NUMERIC for field 'id' > (expected=SORTED)" Exception. > > My code is as follows: > String indexPath = "tmp/grouping"; > Analyzer sta

Re: AlphaNumeric analyzer/tokenizer

2019-08-19 Thread Martin Grigorov
Hi, On Mon, Aug 19, 2019 at 9:31 AM Uwe Schindler wrote: > You already got many responses. Check you inbox. > "many" made me think that I've also missed something. https://markmail.org/message/ohv5qcvxilj3n3fb > > Uwe > > Am August 19, 2019 6:23:20 AM UTC schrieb Abhishek Chauhan < > abhishe

Re: Limitations of StempelStemmer

2019-09-24 Thread Martin Grigorov
Hi, On Tue, Sep 10, 2019, 22:31 Maciej Gawinecki wrote: > Hi, > > I have just checked out the latest version of Lucene from Git master > branch. > > I have tried to stem a few words using StempelStemmer for Polish. > However, it looks it cannot handle some words properly, e.g. > > joyce -> ąć >

Translating Lucene Query Syntax to Traditional Boolean Syntax

2007-09-24 Thread Martin Bayly
s of the Lucene representation differently. But that's probably not an issue provided the Boolean representation is semantically equivalent to the first Lucene representation. Anyone ever tried this before or have any comments on whether my 'logic' is flawed! Thanks Martin -- V

Weird operator precedence with default operator AND

2007-10-09 Thread Martin Dietze
the parser handles the default case well, but what I get with the default operator set to AND is completely incorrect. I've seen this behaviour with both version 2.1.0 and 2.2.0. Any hints? Cheers, Martin -- --- / http://herbert.the-little-red-haired-girl.org / - =+=

Re: Weird operator precedence with default operator AND

2007-10-10 Thread Martin Dietze
Lucene QueryParser, and I found it produces the same output, however the search queries are still handled correctly, i.e. the results I get indicate that deep down inside it seems to get it right in the end. Cheers, Martin -- --- / http://herbert.the-little-red-haired-gi

Re: Weird operator precedence with default operator AND

2007-10-10 Thread Martin Dietze
sounds promising, I will check this out right now! Thank you! Martin -- --- / http://herbert.the-little-red-haired-girl.org / - =+= Die Freiheit ist uns ein schoenes Weib. Sie hat einen Ober- und Unterleib. ---

Re: Weird operator precedence with default operator AND

2007-10-10 Thread Martin Dietze
Mark, On Wed, October 10, 2007, Martin Dietze wrote: > > Qsol: myhardshadow.com/qsol (A query parser I wrote that has fully > > customizable precedence support - don't be fooled by the stale website...I > > am actually working on version 2 as i have time) > >

Re: Weird operator precedence with default operator AND

2007-10-11 Thread Martin Dietze
filter out blacklisted facettes and then parse them on to solr using solrj. Maybe I am missing out on something obvious, and there's an entirely simple way to accomplish this? Cheers, Martin -- --- / http://herbert.the-little-red-haired-girl.org / - =+= Yoda o

Re: Weird operator precedence with default operator AND

2007-10-11 Thread Martin Dietze
' results in a SpanQuery `+spanNear([foo, bar], 0, true)' (I may not understand the concept here). Cheers, Martin -- --- / http://herbert.the-little-red-haired-girl.org / - =+= Who the fsck is "General Failure"

Re: Weird operator precedence with default operator AND

2007-10-12 Thread Martin Dietze
are part of an AND query (actually AND is our default operator, so that a query "body:boy secret_field\:xyzyq" would always fail. It seems obvious that in any case you end up parsing the query in some way... Cheers, Martin -- --- / http://herbert.the-little-red-hair

Most efficient way to find related terms

2008-02-29 Thread Martin Bayly
") - don't particularly care about the frequencies as it will always be 1 for a particular doc. Other approaches? I'm going to perf test to see how (b) and (c) compare but would be glad if anyone has any insights. Thanks Martin

Get BestFrequentKeywords

2008-08-04 Thread Martin vWysiecki
terms in result set. This would be in my example: tyres, dealer tyres 3x dealer 2x How can i do that? THX -- Kind regards Martin von Wysiecki software development aspedia GmbH Roßlauer Weg 5 D-68309 Mannheim Phone +49 (0) 621 - 71600 33 Fax +49 (0) 621 - 71600 10 [EMAIL PROTE

Term Based Meta Data

2008-08-05 Thread Martin Owens
CR program. So, is it possible to store the data alongside the terms in lucene and then recall them when doing certain searches? and how much custom code needs to be written to do it? Best Regards, Martin Owens - To unsubscribe,

Re: Term Based Meta Data

2008-08-05 Thread Martin Owens
Thank you very much, I'm using Solr so it's very relevant to me. The indexing is being done by a smaller RMI method (since Solr doesn't support streaming of very large files and has term limits), but all the searching is done through Solr. Thanks again, Best Regards, M

Unique list of keywords

2008-08-08 Thread Martin vWysiecki
Hello, I have a lot of data, about 20GB of text, and need a unique list of keywords based on my text in all docs from the whole index. Any ideas? THX Martin -- Kind regards Martin von Wysiecki software development aspedia GmbH Roßlauer Weg 5 D-68309 Mannheim Phone +49

Re: Term Based Meta Data

2008-08-08 Thread Martin Owens
ad of TermPositions because of that data is available without storing the text in the index. Is it possible to translate code which uses TermPositions to using TermPositionsVector with regards to payloads? Best Regards, Martin Owens On Tue, 2008-08-05 at 11:14 -0600, Tricia Williams wrote: > H

Re: Term Based Meta Data

2008-08-11 Thread Martin Owens
We're not storing the text context because a) there is rather a lot of it, b) we have the text files stored on special storage boxes mounted to the webservers and they're using directly and c) It didn't seem worth it. Thoughts? So

Results by unique id's

2008-08-12 Thread Martin vWysiecki
the same company, same company_id, same for doc 4 Is this possible? Thank you -- Kind regards Martin von Wysiecki software development aspedia GmbH Roßlauer Weg 5 D-68309 Mannheim Phone +49 (0) 621 - 71600 33 Fax +49 (0) 621 - 71600 10 [EMAIL PROTECTED] Management:

Re: Results by unique id's

2008-08-12 Thread Martin vWysiecki
-- > Instant Scalable Full-Text Search On Any Database/Application > site: http://www.dbsight.net > demo: http://search.dbsight.com > Lucene Database Search in 3 minutes: > http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes > DBSight custo

[Fwd: Spam filter for lucene project]

2006-10-05 Thread Martin Braun
Hello Rajiv, perhaps CAPTCHAs will solve your problem: http://en.wikipedia.org/wiki/CAPTCHA many open-source PHP products are using this like phpmyfaq and phpBB. So you can take a look at this code. hth, martin Original Message From: Rajiv Roopan <[EMAIL P

experiences with lingpipe

2006-10-23 Thread Martin Braun
better than lucenes spell-check contribution? What about performance? What about the quality of suggestions? Does anybody have a good idea how to find typos in the index. tia, martin - To unsubscribe, e-mail: [EMAIL PROTECTED

Re: experiences with lingpipe

2006-10-25 Thread Martin Braun
't understand what you mean with "beams"). Did I understand the license terms correctly, that I could use LingPipe for free when I am building a search engine for an academic website (for free use)? thanks, martin > Tuning is a big deal and I need to write a tuning tutorial. I am doing

Re: experiences with lingpipe

2006-11-02 Thread Martin Braun
", though I started java with -Xms1024m -Xmx1024m. How much RAM will I need for the model (I only have 2 GB of physical RAM, and Lucene is also using some memory)? Is there a "rule of thumb" to calculate the amount of memory the model needs? thanks in advance, martin >>

Re: Update an existing index

2006-11-08 Thread Martin Braun
WATHELET Thomas wrote: > how to update a field in lucene? > I think you'll have to delete the whole doc and add the doc with the new field to the index... hth, martin - To unsubscribe, e-mail: [EMAIL PRO

Best approach for exact Prefix Field Query

2006-11-14 Thread Martin Braun
Using the regex contribution 3) a super-fast Lucene function I have overlooked :) With 2) I am worried about performance; does anybody have experience with regex queries? .. but same for 1): has anybody implemented this already and could gi

Re: Best approach for exact Prefix Field Query

2006-11-14 Thread Martin Braun
" in the title) but i get (correct) results for "action", What am I doing wrong here? tia, martin > > Erik > > > On Nov 14, 2006, at 8:32 AM, Martin Braun wrote: > >> hi, >> >> i would like to provide a exact "PrefixField Search"

Re: Best approach for exact Prefix Field Query

2006-11-16 Thread Martin Braun
the QueryParser. Is there a way to merge these two query-classes? tia, martin SpanFirstQuery = org.apache.lucene.search.spans.SpanFirstQuery SpanTermQuery = org.apache.lucene.search.spans.SpanTermQuery SpanQuery = org.apache.lucene.search.spans.SpanQuery SpanNearQuery = org.apache.l

Search "C++" with Solrs WordDelimiterFilter

2006-11-17 Thread Martin Braun
lucene-src) this feature? Should I override the WhitespaceTokenizer and using java-cc ( are there any docs on doing this?). tia, martin - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: how to search string with words

2006-11-21 Thread Martin Braun
'll better do a slop of "0" SpanFirstQuery sfq = new SpanFirstQuery( new SpanNearQuery(spanq_ar,1,true), spanq_ar.length); hth, martin >Below r the codes that I wrote, please point me out where

Re: Index XML file

2006-12-14 Thread Martin Braun
tp://www-128.ibm.com/developerworks/java/library/j-lucene/ regards, martin > >Thanks > > regards, > Wooi Meng - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

to boost or not to boost

2006-12-20 Thread Martin Braun
(docFreq=2) 1.0 = fieldNorm(field=AU, doc=1) so the "older" doc is better rated or with the same rank as the newer? any ideas? tia, martin - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional comm

boosting instead of sorting WAS: to boost or not to boost

2006-12-21 Thread Martin Braun
h I have a list of book titles and I want a sort by score AND by year of publication. But for performance reasons I want to avoid this sorting at query-time by boosting at index time. Is that possible? thanks, Martin > -- Universitaetsbibliothek Heidelberg Tel: +49 6221 54-258

spnafirstquery and multiple field instances

2006-12-21 Thread Martin Braun
n't work. The query finds only matches for the first token in that field of a document. Is there a way to do a SpanFirstQuery for each token? tia, martin - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

autocomplete with multiple terms

2007-02-22 Thread Martin Braun
w a better way? I am not sure if we get enough queries for a search over an index base on the user-queries. the only thing I have found in the list before concerning this subject is http://issues.apache.org/jira/browse/LUCENE-625, but I'm not sure if it does the things I wan

recovering an index from RAM disk.

2007-02-27 Thread Martin Spamer
I generate my index to the file system and load that index into a RAMDirectory for speed. If my indexer fails the directory based index can be left in an inadequate state for my needs. I therefore wish to flush the current index from the RAMDirectory back to the File system. The RAMDirectory cla

Re: similar contrib in lucene 2.1.0

2007-03-02 Thread Martin Braun
hy the code moved... hth, martin > > Cheers > Hans Lund > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] --

Spelt, for better spelling correction

2007-03-20 Thread Martin Haye
the existing engine. - Other bells and whistles... There is already a standalone test program that people can try out, and we're interested in feedback. If you're interested in discussing, testing, or previewing, consider joining the Google group: http://groups.google.com/group/spelt/ --Martin

Re: Spelt, for better spelling correction

2007-03-21 Thread Martin Haye
applications that are continuously adding things to an index. Happily, it's not as important to keep the spelling dictionary absolutely up to date, so it would be fine to queue words over several index runs, and refresh the dictionary less often. --Martin On 3/20/07, Yonik Seeley <[EMAIL P

Re: Spelt, for better spelling correction

2007-03-22 Thread Martin Haye
nse for a lot of people. I'll make sure the contribution includes an index-to-dictionary API, and thank you very much for the input. --Martin On 3/21/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: Martin, This sounds like the spellchecker dictionary needs to be built in parallel wi

phrases containing escaped quotes

2007-05-15 Thread Martin Kobele
Hi, I tried to parse the following phrase: "foo \"bar\"" I get the following exception: org.apache.lucene.queryParser.ParseException: Lexical error at line 1, column 18. Encountered: after : "\") " Am I mistaken that "foo \"bar\""

Re: phrases containing escaped quotes

2007-05-15 Thread Martin Kobele
thank you! I was indeed using lucene 2.0 and it works very nicely with 2.1 thanks! Martin On Tuesday 15 May 2007 09:59:42 Michael Busch wrote: > Martin Kobele wrote: > > Hi, > > > > I tried to parse the following phrase: "foo \"bar\"

Obtain Lock file timeout during deleteDocument()

2007-05-30 Thread Martin Kobele
. After I deleted a document, the write.lock file is still there, and directoryOwner is still true. Maybe knowing more about this will help me to find out why I get the exception "Lock obtain timed out" after a while and after several successful document deletions. Thank you! Regar

Re: Obtain Lock file timeout during deleteDocument()

2007-05-30 Thread Martin Kobele
On Wednesday 30 May 2007 11:49:41 Michael McCandless wrote: > "Martin Kobele" <[EMAIL PROTECTED]> wrote: > > I was trying to find an answer to this. > > I call IndexReader.deleteDocument() for the _first_ time. > > If my index has several segments, my Inde

Re: Obtain Lock file timeout during deleteDocument()

2007-05-30 Thread Martin Kobele
On Wednesday 30 May 2007 11:53:09 Martin Kobele wrote: > On Wednesday 30 May 2007 11:49:41 Michael McCandless wrote: > > You are only using a single instance of IndexReader, right? If for > > example you try to make a new instance of IndexReader and then call > > deleteDoc
