Re: Lucene search in attachments

2015-02-10 Thread David Pilato
-mapper-attachments/blob/master/src/main/java/org/elasticsearch/index/mapper/attachment/AttachmentMapper.java#L456 <https://github.com/elasticsearch/elasticsearch-mapper-attachments/blob/master/src/main/java/org/elasticsearch/index/mapper/attachment/AttachmentMapper.java#L456> -- David

Re: Lucene search in attachments

2015-02-10 Thread David Pilato
I don’t understand. If you don’t raise this restriction to a higher value (or to -1), not all of the text will be extracted, so only a subset of it will be indexed. Parts of the text that aren’t indexed won’t be searchable. Did I misunderstand your question? -- David Pilato | Technical Advocate

Part of speech search with lucene

2015-03-03 Thread David Villarejo
e queries will work. (correct me if I'm wrong) The second thing I thought was to index extra info as synonyms of the term but, this way, the second query won't work since I can't ask if the first term is an adj and the specific word "brown" simultaneously. Any way to address this problem, suggestions, etc. will be appreciated. David.

Re: Part of speech search with lucene

2015-03-03 Thread David Villarejo
} { fox | > noun:fox } > > with punctuation to suggest the token graph > > -Mike > > > On 03/03/2015 01:21 PM, David Villarejo wrote: > >> After many Google searches I decided to post my problem here hoping that >> someone can help me. What I want to achieve is to pe

Re: Part of speech search with lucene

2015-03-04 Thread David Villarejo
Hi Mike, your solution works! I've been trying it with PhraseQuery and it works pretty well. Thank you so much. David. 2015-03-03 23:00 GMT+01:00 Michael Sokolov : > I believe you can accomplish what you are talking about using PhraseQuery, > say: note that it has > > public v

Classpath issue

2015-07-12 Thread David Yanay
thing wrong? I would appreciate help on this issue. Many Thanks! David. -- David Yanay CTO SmartMedia Marketing S.M.M. Derech HaYam 11, Haifa, 3463106, Israel http://www.smartmediamarketing.com Mobile: +972-50-6856644 Tel: +972-4-8583435 Fax: +972-4-8583436 LinkedIn: https://www.linkedin.com/in/yanay

Re: Best way to plug in alternative range query support

2016-05-25 Thread David Smiley
Ken, See BooleanQuery.Builder. p.s. nice to see you at Apache Big Data in Vancouver. ~ David On Thu, May 19, 2016 at 4:28 PM Ken Krugler wrote: > Hi all, > > I’ve got an alternative representation in the index for numeric fields, > and I need to construct an alternative approa

Re: highlighter with query over more than one word

2016-06-03 Thread David Smiley
It would help tremendously if you can give a specific code example showing the problem. On Thu, Jun 2, 2016 at 6:41 AM Sascha Janz wrote: > > we use highlighter to get textfragments for our hit list. > > the code is straight forward like this > >Analyzer analyzer = new StandardAnalyzer(; >

Re: "Point in polygon" search with Lucene / Spatial4j / JTS

2016-06-05 Thread David Smiley
n that example in the 4x branch but are unaware it exists in 5x & 6x or whether you deliberately referenced 4x because you must use that version. Good luck, ~ David On Sat, Jun 4, 2016 at 12:34 PM Randall Tidd wrote: > Hello, > > I have what I think is a relatively simple use case t

Lucene paid development for "SpanAndQuery" / "SpanAllNearQuery" support

2016-09-14 Thread David Sitsky
s and are interested, please send me an email to get the ball rolling. Many thanks. Cheers, David

java.lang.IndexOutOfBoundsException: Index: 9634, Size: 97 opening an index

2016-11-24 Thread David Sitsky
ex? It is 120 GB in size and there are no backups.. :-/ Cheers, David

TimeLimitingCollector accuracy

2016-12-21 Thread David Causse
Hi, This subject has been discussed in the past but I don't think that any real solution was implemented yet. Here is a small test case to illustrate the problem: https://github.com/nomoa/lucene-solr/commit/2f025b18899038c8606da64c2cf9f4e1f643607f#diff-65ae49ceb38e45a3fc05115be5e61a2dR387 T

Re: TimeLimitingCollector accuracy

2016-12-22 Thread David Causse
On 21/12/2016 at 13:27, David Causse wrote: But given that some effort has been made to separate sub scorers from "top-level" scorers (see https://issues.apache.org/jira/browse/LUCENE-5487), would it make sense now to make BulkScorers aware of some time constraints? Looking a

Highlighting and delineating Passages (fragmenting)

2017-05-26 Thread David Smiley
terface for UH-aware BreakIterators. The former (a new abstraction) would be cleaner, and might also remove a wart in the API due to the statefulness of BreakIterators. It's also kinda hard to write a BI correctly. I've implemented a few already and I know. It's an old API. ~

Re: Highlighting and delineating Passages (fragmenting)

2017-05-30 Thread David Smiley
Looks like you should use the original Highlighter until requirement #2,3 can be done with the UnifiedHighlighter. Other than #2,3, the UH can handle all these requirements, and the OH can do all. On Sat, May 27, 2017 at 6:08 AM Dawid Weiss wrote: > Thanks for your explanation, David. &g

Re: Highlighting and delineating Passages (fragmenting)

2017-05-30 Thread David Smiley
proach: https://issues.apache.org/jira/browse/LUCENE-5455 Or are the overlaps coming from passage offset ranges from separate queries to the same content? That I could understand better based on everything you said. I'm not sure how your code could be contributed in a way that fits in

Re: Term Dictionary taking up lots of heap memory, looking for solutions, lucene 5.3.1

2017-06-06 Thread David Smiley
m super pleased with the performance. ~ David On Wed, May 17, 2017 at 10:59 PM Tom Hirschfeld wrote: > Hey! > > I am working on a lucene based service for reverse geocoding. We have a > large index with lots of unique terms (550 million) and it appears that > we're running in

Re: Term Dictionary taking up lots of memory, looking for solutions, lucene 5.3.1

2017-06-14 Thread David Smiley
Nice! On Tue, Jun 13, 2017 at 11:12 PM Tom Hirschfeld wrote: > Hey All, > > I was able to solve my problem a few weeks ago and wanted to update you > all. The root issue was with the caching mechanism in > "makedistancevaluesource" method in the lucene spatial module, it appears > that documents

Re: Term Dictionary taking up lots of memory, looking for solutions, lucene 5.3.1

2017-07-02 Thread David Smiley
If there are no filters, then LatLonDocValuesField is going to be asked to sort all of your docs, which is obviously going to take a while. Can you simply add a filter? Like a distance filter using LatLonPoint? On Thu, Jun 29, 2017 at 11:49 AM sc wrote: > Hi, > >I have similar requirement o

Re: Lucene GeoNear Search and Sort Performance

2017-07-14 Thread David Smiley
pose. For that strategy, you only need it to do just that, so you can enable docValues and disable the "index". That strategy accepts a FieldType in the constructor. PointVectorStrategy is limited to one point per document per field, and always uses double precision on both "x"

Re: Lucene GeoNear Search and Sort Performance

2017-07-16 Thread David Smiley
As I mentioned that PointVectorStrategy has an argument that accepts a Lucene FieldType that you can add docValues to. On Sun, Jul 16, 2017 at 2:07 PM sc wrote: > Thanks for the suggestion. > > I changed the strategy to > > this.strategy = new PointVectorStrategy(ctx, "pointVector"); > > And the

Re: Lucene GeoNear Search and Sort Performance

2017-07-17 Thread David Smiley
port it to Lucene 5x. It shouldn't be too hard. Since you only need this for distance sorting, you could only port what's needed; have makeQuery(...) throw an exception. createIndexableFields need only output a 2-element array, one for each DoubleDocValuesField. ~ David On Mon, Jul 1

Re: Lucene GeoNear Search and Sort Performance

2017-07-19 Thread David Smiley
here is an issue in your approach to measuring this. On Tue, Jul 18, 2017 at 9:26 PM sc wrote: > David, > > I was able to get it working with minor changes in my codebase. I didn't > have back port PointVectorStrategy class from 6.6.0 to 5.5.4 > > Code: > fina

Re: Spatial Indexing of Polygons

2017-08-15 Thread David Smiley
ng it or trying to document it. BTW I'm in FOSS4G Boston the next few days. Perhaps you might be there? ~ David On Mon, Aug 14, 2017 at 2:27 PM Tom Hirschfeld wrote: > Hey, > Is there a way to spatially index polygons that takes advantage of the new > BKD tree functionality? I was

Re: More Spatial Relations

2018-06-01 Thread David Smiley
lStrategy and ShapeValuesPredicate. You could cast the value, a Shape, to a JtsGeometry (a Spatial4j shape) and then call getGeom() to get the underlying JTS Geometry instance. If you find you need to fork entire classes then feel free to suggest improvements to the extensibility. ~ David On Tue, May 29, 201

Re: More Spatial Relations

2018-06-01 Thread David Smiley
For predicates other than "intersects", that is true :-/ Any help you might be interested in offering here is most welcome. On Fri, Jun 1, 2018 at 8:38 PM Bingtao Yin wrote: > Hi David, > > Thanks for your reply. > > Compared to the prefix tree, implementation through d

Re: How can I decode geo point postings?

2019-03-31 Thread David Smiley
Yup. And if you have the original lat/lon then you can forgo the complexity of reverse-engineering it from postings. ~ David Smiley Apache Lucene/Solr Search Developer http://www.linkedin.com/in/davidwsmiley On Thu, Mar 28, 2019 at 2:49 PM Adrien Grand wrote: > Hi Trejkaz, > > My b

Live index upgrading

2019-06-17 Thread David Allouche
Hello, I use Lucene with PyLucene on a public-facing web application. We have a moderately large index (~24M documents, ~11GB index data), with a constant stream of new documents. I recently upgraded to PyLucene 7. When trying to test the new release of PyLucene 8, I encountered an IndexForma

Re: Live index upgrading

2019-06-21 Thread David Allouche
om scratch on whatever version of Lucene > you want to use. > > Best, > Erick > > > >> On Jun 17, 2019, at 8:41 AM, David Allouche wrote: >> >> Hello, >> >> I use Lucene with PyLucene on a public-facing web application. We have a >> moder

Re: Live index upgrading

2019-06-21 Thread David Allouche
omputed index from them. Yes, Solr/ES can add database-like > behavior where they hold the true original source of the document and use > that to rebuild Lucene indices over time. But Lucene really is just a > "search index" and we need to be free to make important improvem

Re: Live index upgrading

2019-06-21 Thread David Allouche
The bottom line for me, is that I am not going to upgrade to Lucene8 for a while. The index migration would either cause a service interruption, or would require a little while to implement. I have more urgent technical debt to deal with. > On 21 Jun 2019, at 19:11, David Allouche wr

Re: ComplexPhraseQueryParser performance question

2020-02-12 Thread David Smiley
duates" to Lucene core some day. Its placement in sandbox is why it can't be added to any of Lucene's query parsers like complex phrase. ~ David Smiley On Wed, Feb 12, 2020 at 11:07 AM wrote: > H

Re: [VOTE] Lucene logo contest

2020-06-15 Thread David Smiley
C. The current Lucene logo [4] ~ David Smiley On Mon, Jun 15, 2020 at 6:08 PM Ryan Ernst wrote: > Dear Lucene and Solr developers! > > In February a contest was started to design a new logo for Lucene [1

Re: [VOTE] Lucene logo contest, third time's a charm

2020-09-03 Thread David Smiley
(binding) vote: D, A1 (thanks Ryan for your thorough vote instructions & preparation)

JCC build fails with Python>=3.8

2021-11-18 Thread David Allouche
Hello, https://issues.apache.org/jira/projects/PYLUCENE/issues/PYLUCENE-52 While porting a code base to Python 3, I found out about this issue. If I understand correctly, that means that pylucene cannot be built on Python

Re: JCC build fails with Python>=3.8

2021-11-18 Thread David Allouche
python%s.%s' % (sys.version_info[0:2])] kwds["force_shared"] = True# requires jcc/patches/patch.43 elif platform in IMPLIB_LFLAGS: jcclib = 'jcc%s%s.lib' %(py_version_suffix, debug and '_d' or '') > On 18 Nov 2021

Re: Info required on licensing of Lucene component

2023-03-22 Thread David Smiley
I suppose this begs the question, why are we including NOTICE.txt in our distribution for *anything* we don't distribute? ~ David Smiley On Tue, Mar 21, 2023 at 7:57 PM Michael Sokolov wrote: > Lucene is

Lucene Index Structure

2008-08-21 Thread David Lee
Clarification question: If I don't store term vectors, then I: -- won't have information on the position of matching terms -- I don't have the term frequency vector -- but I should still have the frequency of terms per document in the .frq file, right? So what's the difference between the term f

Re: Lucene Index Structure

2008-08-21 Thread David Lee
ley wrote: > > On Thu, Aug 21, 2008 at 7:20 PM, David Lee <[EMAIL PROTECTED]> wrote: >> >>> Clarification question: >>> >>> If I don't store term vectors, then I: >>> -- won't have information on the position of matching terms >>&

Clarification about segments

2008-08-22 Thread David Lee
So from what I understand, is it true that if mergeFactor is 10, then when I index my first 9 documents, I have 9 separate segments, each containing 1 document? And when searching, it will search through every segment? Thanks! David

Re: Clarification about segments

2008-08-25 Thread David Lee
the documents in memory to the disk, it will merge all the documents in that flush to one segment? Thanks! David On Sat, Aug 23, 2008 at 2:40 AM, Karsten F. <[EMAIL PROTECTED]>wrote: > > Hi David, > > this is not true, please take a look to > IndexWriter#setRAMBufferSizeM

10Gb of .nfsXXX files about a week old in NFS based index directory

2008-09-09 Thread David Loeng
Hi, We have a customer using lucene on an NFS directory, which contains ~10Gb of .nfs files. These files are the means by which NFS implements delete-on-close semantics (that is, if the index writer commits a delete of a file that is still held open by an index reader, the file is ren

Lucene AND queries

2008-09-25 Thread David Lee
hem together? Or can it limit the amount of things it needs to retrieve from the index for 'apache' based on what it has already retrieved for 'lucene'? Is there documentation on how queries work in lucene in regards to how it deals with the actual index files? David

Re: Lucene AND queries

2008-09-25 Thread David Lee
ROTECTED]> wrote: > On Thu, Sep 25, 2008 at 1:39 PM, David Lee <[EMAIL PROTECTED]> wrote: > > I was wondering when lucene queries two or more terms, does that mean the > > time it takes will be twice as long? For example if I search +lucene > > +apache, then does lucene ge
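A conjunction does not need to read either postings list in full. Lucene's conjunction scorer advances each term's iterator to the current candidate document (`skipTo` in the 2.x API, `advance` in later releases), using skip data stored with the postings, so the rarer term effectively drives the search. The plain-Java sketch below shows that leapfrog idea on sorted doc-ID arrays. `LeapfrogIntersect` is a hypothetical name, and binary search stands in for skip lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: leapfrog intersection of two sorted, duplicate-free doc-ID arrays.
class LeapfrogIntersect {

    // Smallest index i >= from with docs[i] >= target, or docs.length if there is none.
    // Binary search plays the role of Lucene's skip data in this sketch.
    static int advance(int[] docs, int from, int target) {
        int lo = from;
        int hi = docs.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (docs[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Documents that contain both terms, found without scanning either list in full.
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i = advance(a, i, b[j]); // jump list a to b's current candidate
            } else {
                j = advance(b, j, a[i]); // jump list b to a's current candidate
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] lucene = {3, 17, 42, 99};
        int[] apache = {1, 2, 3, 5, 8, 13, 21, 42, 50, 77, 99, 120};
        System.out.println(intersect(lucene, apache)); // prints [3, 42, 99]
    }
}
```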

searching for string with a blank

2008-09-30 Thread David Massart
" doesn't return any result. I've also tried QueryParser.escape("case study") but it doesn't seem to affect blank characters? Thanks for your help. David

Re: searching for string with a blank

2008-09-30 Thread David Massart
same operation at query time). Or are there more efficient ways to deal with the problem? Cheers, David On Tue, Sep 30, 2008 at 3:51 PM, Erick Erickson <[EMAIL PROTECTED]>wrote: > What *analyzer* are you using for your queries? Have Luke explain > your queries and I suspect you

Extracting Dates

2008-10-02 Thread David Lee
t of these projects are associated to lucene, someone might know. David Lee

querying without hits

2008-10-13 Thread David Massart
Dear all, Could one of you point me to an example of code for querying without using the deprecated class Hits ? Thank you, David

Re: querying without hits

2008-10-15 Thread David Massart
? Cheers, David On Wed, Oct 15, 2008 at 5:16 AM, Chris Hostetter <[EMAIL PROTECTED]>wrote: > > : Could one of you point me to an example of code for querying without > using > : the deprecated class Hits ? > > The demo code included with Lucene releases was updated in Luce

InstantiatedIndex questions

2008-11-19 Thread David Causse
cannot on an InstantiatedIndex because of: java.io.NotSerializableException: org.apache.lucene.index.TermVectorOffsetInfo Do you consider these problems or normal features? Thank you. David.

Re: InstantiatedIndex questions

2008-11-19 Thread David Causse
resulting byte[] and as InstantiatedIndex is Serializable I was hoping to use the perf gain of your implementation in our context. I will fix my working copy as you suggested. Thank you. David. karl wettin wrote: Hi David, thanks for the report! I suppose you speak of IndexWriter vs

Re: [ot] a reverse lucene

2008-11-23 Thread David Sheldon
ster (depending on your incoming document rate, though you can batch them up and do the queries every 15 minutes or something if you don't mind the lag and you're getting lots of incoming documents). Just an idea. David -- About the use of language: it is impossible to sharpen a penci

[OT] About stopwords

2008-11-27 Thread David Causse
Hi, Look at this Google query: http://www.google.fr/search?q=%22HOW+at+at+of+a+A+a%22 What do you think about that concerning stop words? Does Google have no stop words? David.

Re: [OT] About stopwords

2008-11-27 Thread David Causse
Thanks for the tip, but I can't imagine the number of documents Google has to join in order to process such results... There must be a trick. Maybe stopwords are not indexed alone but twice, with the previous and next token, some sort of 2-gram index? David. Aleksander M. Stensby wrote:
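The 2-gram guess above is close to the idea behind the common-grams technique (Solr, and later Lucene's analyzers module, ship a `CommonGramsFilter`): a common word is indexed on its own and also joined with its neighbour, so a phrase made of stop words can be answered from much rarer bigram terms. The following is a minimal plain-Java sketch of the resulting token sequence, not the filter's code. It assumes already-lowercased, whitespace-split input, and the class name and `_` joiner are choices made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the common-grams idea: emit every token, plus a "left_right" bigram
// whenever either word of an adjacent pair is in the common-word set.
class CommonGramsSketch {

    static List<String> tokens(List<String> words, Set<String> common) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String word = words.get(i);
            out.add(word);
            if (i + 1 < words.size()) {
                String next = words.get(i + 1);
                if (common.contains(word) || common.contains(next)) {
                    out.add(word + "_" + next);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("how", "at", "of", "a"));
        System.out.println(tokens(Arrays.asList("how", "at", "lucene"), common));
        // prints [how, how_at, at, at_lucene, lucene]
    }
}
```

In this scheme, a phrase query for "how at" could be answered from the single term `how_at`, whose postings list is typically far shorter than that of either word alone.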

Re: Payload Question

2008-12-15 Thread David Causse
TokenStream and TokenFilters because we have pre-analyzed Tokens. It's very simple to use, the old API is self-explanatory, but this API seems to be changing a lot, methods are now deprecated, and I couldn't understand the new API. David. Todd Benge wrote: Hi, I've been reading about pay

Testing Precision and Recall on Lucene

2009-01-14 Thread david muchangi
Dear All, I wish to run a quick test of how Lucene performs in terms of precision and recall. Does anyone have a small application that I can use quickly without having to program against the APIs? Thanks. David

Re: Testing Precision and Recall on Lucene

2009-01-15 Thread david muchangi
am doing some simulation using MATLAB. Thank you David --- On Thu, 15/1/09, Murat Yakici wrote: From: Murat Yakici Subject: Re: Testing Precision and Recall on Lucene To: java-user@lucene.apache.org Date: Thursday, 15 January, 2009, 12:17 PM Let's please don't forget the scorin

Words that need protection from stemming, i.e., protwords.txt

2009-01-16 Thread David Woodward
Hi. Any good protwords.txt out there? In a fairly standard solr analyzer chain, we use the English Porter analyzer like so: For most purposes the porter does just fine, but occasionally words come along that really don't work out too well, e.g., "maine" is stemmed to "main" - clearly goofing
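A protected-words list simply lets listed terms bypass the stemmer. The sketch below shows that control flow in plain Java. `toyStem` is a hypothetical stand-in that only drops a trailing "e" from longer words and is not Porter. It was chosen so that "maine" to "main" reproduces the problem described above. In later Lucene versions the same effect comes from marking tokens as keywords ahead of the stemmer, for example with `KeywordMarkerFilter`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ProtectedStemmer {

    // Hypothetical toy stemmer (NOT Porter): drops a trailing 'e' from words longer than four letters.
    static String toyStem(String word) {
        if (word.length() > 4 && word.endsWith("e")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Protected words are passed through untouched; everything else is stemmed.
    static String stem(String word, Set<String> protectedWords) {
        return protectedWords.contains(word) ? word : toyStem(word);
    }

    public static void main(String[] args) {
        Set<String> protectedWords = new HashSet<>(Arrays.asList("maine"));
        System.out.println(stem("maine", protectedWords));  // prints maine
        System.out.println(stem("engine", protectedWords)); // prints engin
    }
}
```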

MergePolicy$MergeException during IndexWriter.addIndexesNoOptimize

2009-02-02 Thread David Fertig
Hello. Hopefully this is the correct forum. I am currently using release 2.3.2 as my stable release, but have tried 2.4 as well. I have 4 threads indexing documents into separate indexes and then merging them into a larger master index. If the master index is previously corrupted (suc

Re: First request for search is taking longer time and subequent requests are very fast

2009-03-23 Thread David Causse
thing like that. David. thiruvee wrote: Hi I am using Lucene 2.4 in our project. I am using FSDirectory to store the index. Whenever the index is updated the first search is very slow. I am using the combination of CustomScoreQuery and DisjunctionMaxQuery for searching. This slowness I observed

Using SpanNearQuery.getSpans() in a Search Result

2009-04-02 Thread David Seltzer
Hi all, I'm trying to figure out how to use SpanNearQuery.getSpans(IndexReader) when working with a result set from a query. Maybe I have a fundamental misunderstanding of what an IndexReader is - I'm under the impression that it's a mechanism for sequentially accessing the documents in an

Retrieving TokenStream from Tokenized Non-Stored Field

2009-04-02 Thread David Seltzer
Hi All, I have a document with a field called "TextTranscript". Its created using the following command: myDoc.add(new Field("TextTranscript", sTranscriptBody, Field.Store.NO, Field.Index.TOKENIZED)); I'm then trying to retrieve the TokenStream by pulling the field. Field fTextTranscript = lucDo

RE: Retrieving TokenStream from Tokenized Non-Stored Field

2009-04-03 Thread David Seltzer
have to mark the field as Field.Store.YES in order to see that field when you retrieve the doc at search time. You'll then be able to retrieve the string value. Mike On Thu, Apr 2, 2009 at 10:45 AM, David Seltzer wrote: > Hi All, > I have a document with a field called "Text

Faceting, Sort and DocIDSet

2009-04-17 Thread David Seltzer
itCollector and sort by a field? 2) Is using BitSets the wrong way to quickly generate facet counts? I've read about DocIDSets, but I'm not sure how to use them in the same way. (I'm basing my faceting technique on Sujit Pal's article http://sujitpal.blogspot.com/2007/04/lucene-
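The BitSet approach the question refers to can be sketched with `java.util.BitSet`. Keep one precomputed bit set of document numbers per facet value, and compute each count as the cardinality of its intersection with the query's hit set. This is an illustrative sketch with hypothetical names and toy data. In a real index the bit sets would be keyed by Lucene doc IDs and rebuilt whenever the reader changes. Memory grows with (number of facet values) × maxDoc bits, which is the main cost to weigh for a large index.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: one precomputed BitSet of doc numbers per facet value; a facet count is
// the cardinality of (query hits AND facet bits).
class BitSetFacets {

    static Map<String, Integer> counts(BitSet hits, Map<String, BitSet> docsByFacet) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : docsByFacet.entrySet()) {
            BitSet both = (BitSet) hits.clone(); // clone so the caller's hit set is not modified
            both.and(e.getValue());
            out.put(e.getKey(), both.cardinality());
        }
        return out;
    }

    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) {
            b.set(d);
        }
        return b;
    }

    public static void main(String[] args) {
        Map<String, BitSet> facets = new LinkedHashMap<>();
        facets.put("sports", bits(1, 5, 6));
        facets.put("politics", bits(2, 3, 5, 8, 9));
        BitSet hits = bits(1, 2, 5, 8);
        System.out.println(counts(hits, facets)); // prints {sports=2, politics=3}
    }
}
```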

RE: Faceting, Sort and DocIDSet

2009-04-20 Thread David Seltzer
er document/how many in average? Possible http://www.nabble.com/Taxonomy-in-Lucene-td20929487.html is also interesting for you. Best regards Karsten David Seltzer wrote: > > I have a set of indexes, each index contains a month's worth of > Articles. I need to be able to sear

RE: Faceting, Sort and DocIDSet

2009-04-20 Thread David Seltzer
w how this behaves? Thanks, -Dave -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Monday, April 20, 2009 10:26 AM To: java-user@lucene.apache.org Subject: Re: Faceting, Sort and DocIDSet David, One suggestion I have for your large index. Is it possible to index

RE: Faceting, Sort and DocIDSet

2009-04-21 Thread David Seltzer
From: Karsten F. [mailto:karsten-luc...@fiz-technik.de] Sent: Monday, April 20, 2009 4:00 PM To: java-user@lucene.apache.org Subject: RE: Faceting, Sort and DocIDSet Hi David, correct: you should avoid reading the content of a document inside a hitcollector. Normally that means to cache all you need

Servlets Sharing Resources

2009-04-21 Thread David Seltzer
Hi All, Sorry for the slightly off-topic question, but I've just run into a gap in my understanding of Servlet programming. The question: Is it possible for two servlets to share access to an instance of IndexSearcher or an IndexReader? I'm thinking about setting up a Search servlet to provide XM

RE: Servlets Sharing Resources

2009-04-21 Thread David Seltzer
bably unrealistic. The stuff you want to achieve normally works by either placing objects into the HTTP session (user-bound) or attaching them to your application context (application-bound). Regards, Mindaugas On Tue, Apr 21, 2009 at 5:01 PM, David Seltzer wrote: > Hi All, > > Sorry fo

RE: Servlets Sharing Resources

2009-04-21 Thread David Seltzer
f the hassle of dealing with jndi / contexts / spring or singletons On Tue, Apr 21, 2009 at 12:01 PM, David Seltzer wrote: > Hi All, > > Sorry for the slightly off-topic question, but I've just run into a gap > in my understanding of Servlet programming. > > The question: Is

RE: Servlets Sharing Resources

2009-04-21 Thread David Seltzer
urse it, and do it this way in the end.. On Tue, Apr 21, 2009 at 12:56 PM, David Seltzer wrote: > That certainly seems like the simple way to solve the problem. I was > just wondering if I was overlooking a simple way to do this via web.xml > servlet-mapping. I was trying to

Boolean Logic inside a QueryWrapperFilter

2009-04-22 Thread David Seltzer
Hi Everyone, I have some code that dynamically creates a Boolean query designed to work as a filter. After the query runs I end up with this filter. Filter: QueryWrapperFilter(+(-SourceID:100) +spanNear([ArticleContent:nuclear, ArticleContent:proliferation], 30, false)) My expectation is that

Yet another NFS Question...

2009-04-27 Thread David Seltzer
Hi everyone, There has been a lot of discussion regarding Lucene+NFS pitfalls. I'm not sure how to proceed with a more distributed operation. I'm trying to take the indexing load off of our search server. I can do this either by building a new server which hosts the Indexer and the Index, or a se

Re: IndexReader.Terms - internals

2009-05-11 Thread David Causse
from outside) 3. you reached non-matching Terms by checking a prefix. If there is a better way to do this, I'd be glad to hear of it. David. Ian Vink wrote: IndexReader rdr = IndexReader.Open(myFolder); TermEnum terms = rdr.Terms((new Term(myTermName, ""))); (

Use of tika for parsing, offsets questions

2009-09-02 Thread David Causse
tive array of tika parsed string offsets vs actual offsets and use a sort of token filter to rectify OffsetAttribute? -- David Causse Spotter http://www.spotter.com/
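The offset-table idea described above might look roughly like this in plain Java. A naive tag stripper stands in for Tika here. This is an assumption: it ignores entities, comments and scripts. For every character it keeps, it records that character's position in the original markup. A token's `[start, end)` offsets in the extracted text are then mapped back through that table. `OffsetMappedText` and its methods are hypothetical names. In Lucene the remapping would belong wherever the `OffsetAttribute` is set, which is the kind of job the `CharFilter` / `correctOffset` mechanism was designed for.

```java
import java.util.Arrays;

// Hypothetical sketch: strip tags while remembering where each kept character came from.
class OffsetMappedText {
    final String text;        // extracted text handed to the analyzer
    final int[] origOffset;   // origOffset[i] = position of text.charAt(i) in the original markup

    private OffsetMappedText(String text, int[] origOffset) {
        this.text = text;
        this.origOffset = origOffset;
    }

    // Naive stand-in for Tika: drops everything between '<' and '>' (no entities, comments, scripts).
    static OffsetMappedText stripTags(String html) {
        StringBuilder sb = new StringBuilder();
        int[] map = new int[html.length()];
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>') {
                inTag = false;
            } else if (!inTag) {
                map[sb.length()] = i; // record the original position before appending
                sb.append(c);
            }
        }
        return new OffsetMappedText(sb.toString(), Arrays.copyOf(map, sb.length()));
    }

    // Maps a non-empty [start, end) token span in the extracted text back to the original markup.
    int[] toOriginal(int start, int end) {
        return new int[] {origOffset[start], origOffset[end - 1] + 1};
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>world</b></p>";
        OffsetMappedText t = stripTags(html);
        int start = t.text.indexOf("world");
        int[] span = t.toOriginal(start, start + "world".length());
        System.out.println(t.text + " -> " + html.substring(span[0], span[1]));
        // prints Hello world -> world
    }
}
```

Mapping the end through the last character (plus one) rather than through `end` itself keeps a token's end offset from running into a closing tag that follows it.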

Re: Use of tika for parsing, offsets questions

2009-09-04 Thread David Causse
On Thu, Sep 03, 2009 at 03:07:18PM +0200, Jukka Zitting wrote: > Hi, > > On Wed, Sep 2, 2009 at 2:40 PM, David Causse wrote: > > If I use tika for parsing HTML code and inject parsed String to a lucene > > analyzer. What about the offset information for KWIC and return

InstantiatedIndex feedback

2009-10-05 Thread David Causse
- Optimize duration : 0ms 4009 [main] DEBUG spotter - next/exportForSort/export (MATCHES_WITH_OFFSET) average : 139/62 011/287 332 ns, total 6 125 691, nb (tot/exp) 14/14 4010 [main] DEBUG spotter - Total time spent (14 result(s)) : 7ms

Reverse stemmer?

2009-10-06 Thread David Leangen
a "reverse stemmer"? In other words, given the stem of a word, is there any algorithm to find the original word? Or is this just fantasy? ;-) Now, I understand that there is a 1:n mapping of stems:words. I can deal with tha
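One common workaround, sketched here in plain Java under stated assumptions: since stems map 1:n to words, record the surface forms seen at indexing time under their stem, with counts. Then pick the most frequent form, or show all of them, when a stem has to be displayed. `ReverseStemTable` and `toyStem` are hypothetical. `toyStem` stands in for whatever stemmer the analyzer really uses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: remember which surface forms produced each stem, with counts.
class ReverseStemTable {
    private final Map<String, Map<String, Integer>> formsByStem = new HashMap<>();

    // Toy stand-in for the real stemmer: strips "ing" or a final "s". Not Porter.
    static String toyStem(String word) {
        if (word.length() > 5 && word.endsWith("ing")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.length() > 3 && word.endsWith("s")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Call for every token at indexing time, alongside writing its stem to the index.
    void observe(String word) {
        formsByStem.computeIfAbsent(toyStem(word), k -> new TreeMap<>()).merge(word, 1, Integer::sum);
    }

    // All surface forms seen for a stem (the 1:n mapping), in alphabetical order.
    Map<String, Integer> surfaceForms(String stem) {
        return formsByStem.getOrDefault(stem, new TreeMap<>());
    }

    // The most frequently seen surface form, or null if the stem was never observed.
    String mostFrequent(String stem) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : surfaceForms(stem).entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ReverseStemTable table = new ReverseStemTable();
        for (String w : new String[] {"jump", "jumps", "jumping", "jumping"}) {
            table.observe(w);
        }
        System.out.println(table.surfaceForms("jump")); // prints {jump=1, jumping=2, jumps=1}
        System.out.println(table.mostFrequent("jump")); // prints jumping
    }
}
```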

Forwarded: InstantiatedIndex questions

2009-10-06 Thread David Causse
Hi, Karl prefers to answer on the mailing list, so here is some information he asked for on how we use InstantiatedIndex. - Forwarded message from David Causse - Date: Tue, 6 Oct 2009 15:45:57 +0200 From: David Causse To: Karl Wettin Subject: Re: InstatiatedIndex questions Hi, sorry for the delay

Re: InstantiatedIndex questions

2009-10-08 Thread David Causse
On Tue, Oct 06, 2009 at 07:51:44PM +0200, Karl Wettin wrote: > > On 6 Oct 2009, at 18:54, David Causse wrote: > > David, your timing couldn't be better. Just the other day I proposed > that we deprecate InstantiatedIndexWriter. The sum of the reasons to > this is that I&

Re: Getting left and right offsets of term search results

2009-10-09 Thread David Causse

localToken contains a termBuffer with 10 empty chars ('')

2009-10-17 Thread David Ginzburg
en; import org.apache.lucene.analysis.TokenFilter; import org.apache.lucene.analysis.TokenStream; import org.apache.lucene.analysis.payloads.PayloadHelper; import org.apache.lucene.index.Payload; /** * * @author david */ public class DTSynonymFilter extends TokenFilter { public DTSyn

Re: localToken contains a termBuffer with 10 empty chars ('')

2009-10-18 Thread David Ginzburg
Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: David Ginzburg [mailto:davidginzb...@gmail.com] > > Sent: Sunday, October 18, 2009 2:28 AM > > To: java-user@lucene.apache.org > > Subject: localToken

synonym payload boosting

2009-11-08 Thread David Ginzburg
Hi, I have a field and a weighted synonym map. I have indexed the synonyms with the weight as payload. My code snippet from my filter: public Token next(final Token reusableToken) throws IOException ... Payload boostPayload; for (Synonym sy

Re: index reader for multiple indexes

2009-12-09 Thread David Causse

much memory overhead does Tika generally require

2010-01-04 Thread Baldwin, David
I need to get a handle on how much memory Tika needs to token-ize different file types. In other words, I need to find information on required overhead (including copies of buffers made if applicable) so that I can produce some kind of guidelines for memory possibly needed by users of the

Re: surrogate pairs

2010-03-12 Thread David Leangen
there a link somewhere to your project? I am very interested. Thank you! David

Designing a multilingual index

2010-03-31 Thread David Vergnaud
m the start, so I was wondering whether some Lucene gurus might give me some insights as to what in their eyes would be the better approach -- or whether there might be a different, much better technique I haven't thought of. Thanks a lot in advance for your support and ideas! David

Re: Designing a multilingual index

2010-04-01 Thread David Vergnaud
rience. Does anyone have any technical arguments why the one (several indices) or the other (localized fields in a single index) method might be better? Cheers, David - Original Message From: Paul Libbrecht To: java-user@lucene.apache.org Sent: Wed, March 31, 2010 10:00:

Re: Designing a multilingual index

2010-04-01 Thread David Vergnaud
al documents). If, say, my search term matches a document in the English index and the same document in the French index (which would quite often be the case for e.g. proper names), then how do I go about mixing the two rankings? (as I don't want to display the same result twice) I think
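One way to merge the two result lists without showing a document twice is to key hits by a shared document identifier and keep the best score. The sketch below assumes each language index stores the same external id for a document. That id field is hypothetical here. The sketch is naive on purpose: raw Lucene scores from two indexes with different term statistics are not strictly comparable, so some normalization may be needed in practice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: merge per-language result lists, keeping one entry per document id.
class MergeByMaxScore {

    static final class Hit {
        final String id;     // assumed: an external id stored identically in every language index
        final double score;

        Hit(String id, double score) {
            this.id = id;
            this.score = score;
        }

        @Override
        public String toString() {
            return id + "=" + score;
        }
    }

    static List<Hit> merge(List<Hit> first, List<Hit> second) {
        Map<String, Double> best = new HashMap<>();
        for (List<Hit> hits : Arrays.asList(first, second)) {
            for (Hit h : hits) {
                best.merge(h.id, h.score, (x, y) -> Math.max(x, y)); // keep the higher score
            }
        }
        List<Hit> merged = new ArrayList<>();
        for (Map.Entry<String, Double> e : best.entrySet()) {
            merged.add(new Hit(e.getKey(), e.getValue()));
        }
        // Highest score first; ties broken by id so the order is deterministic.
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparing((Hit h) -> h.id));
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> english = Arrays.asList(new Hit("doc1", 0.9), new Hit("doc2", 0.4));
        List<Hit> french = Arrays.asList(new Hit("doc2", 0.7), new Hit("doc3", 0.5));
        System.out.println(merge(english, french)); // prints [doc1=0.9, doc2=0.7, doc3=0.5]
    }
}
```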

Too many open files

2010-04-12 Thread David Causse
398bde30b9/indexes/FR/main/_27.cfs (deleted)

Re: Too many open files

2010-04-12 Thread David Causse
with IW.getReader() overriding the old NRT reader reference with no care... So I'll take extra care of my NRT reader instances and pool it myself. Sorry for the noise. On Mon, Apr 12, 2010 at 12:46:02PM +0200, David Causse wrote: > Hi, > > I found a bug in my application, there was
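The leak described above comes from replacing the reader reference without closing the previous reader. It is usually avoided with reference counting: searches acquire and release the current reader, and a swapped-out reader is closed only when its last user releases it. Lucene's `IndexReader` has `incRef()` / `decRef()`, and later releases wrap the pattern in `SearcherManager`. The class below is a hypothetical plain-Java sketch of the same idea over any `Closeable`, not Lucene's implementation.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of reference-counted sharing for any Closeable (e.g. an NRT reader).
final class ReaderPool<T extends Closeable> {

    private final class Ref {
        final T resource;
        int count = 1; // starts at 1: the pool's own reference. Guarded by ReaderPool.this.

        Ref(T resource) {
            this.resource = resource;
        }
    }

    // A borrowed resource; close() hands the reference back (use with try-with-resources).
    final class Lease implements AutoCloseable {
        private final Ref ref;
        private boolean released; // a Lease is meant to be used by one thread

        private Lease(Ref ref) {
            this.ref = ref;
        }

        T get() {
            return ref.resource;
        }

        @Override
        public void close() {
            if (!released) {
                released = true;
                decRef(ref);
            }
        }
    }

    private Ref current;

    ReaderPool(T initial) {
        current = new Ref(initial);
    }

    synchronized Lease acquire() {
        current.count++;
        return new Lease(current);
    }

    // Install a fresh resource (e.g. a newly reopened reader). The old one is closed
    // once its last outstanding Lease is released, never while a search still holds it.
    void swap(T fresh) {
        Ref old;
        synchronized (this) {
            old = current;
            current = new Ref(fresh);
        }
        decRef(old); // drop the pool's own reference to the old resource
    }

    private void decRef(Ref ref) {
        boolean last;
        synchronized (this) {
            last = --ref.count == 0;
        }
        if (last) {
            try {
                ref.resource.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        ReaderPool<Closeable> pool = new ReaderPool<>(() -> System.out.println("old reader closed"));
        try (ReaderPool<Closeable>.Lease lease = pool.acquire()) {
            pool.swap(() -> System.out.println("new reader closed"));
            System.out.println("search still running on the old reader");
        } // lease released here, so the old reader is closed now
    }
}
```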

Re: How to get the tokens for a given document

2010-04-12 Thread David Causse
'm looking for alternative ways to skin this cat. > > Herb

Analyzer for WikipediaTokenizer

2008-04-16 Thread David Etter
Is there an Analyzer for the WikipediaTokenizer?

Question: Can lucene do parallel indexing?

2008-06-27 Thread David Lee
If I'm using a computer that has multiple cores, or if I want to use several computers to speed up the indexing process, how should I do that? Is there some kind of support for that in the API? David Lee

Nested Proximity searches

2008-06-30 Thread David Lee
Is it possible to do nested proximity searches with lucene? i.e. can I say I want a to be within 1 word of b and then that group to be within 4 words of c? The syntax ""a b"~1" c"~4 doesn't seem to work (since it treats the first two quotes as a pair and the later 2 as another pair).

Do Lucene Deletes delete the physical file? If yes, is there a way not to?

2008-07-02 Thread David Lee
iling list for simple questions like this? I tried googling, but didn't seem to get the information I wanted. Thanks! David Lee
