from:"Jose Carlos Canova"

Index size for Same DataSet.

2014-03-24 Thread Jose Carlos Canova

Hello, I have a question about index size. I am testing a program that uses Lucene to index a dataset. In the end, the resulting index size varies a little. I haven't finished all the tests yet, so I'm unsure whether it is normal for the index size to vary between test runs. Regards.

Re: Index size for Same DataSet.

2014-03-25 Thread Jose Carlos Canova

> dataset. Please don't compare file MD5/SHA1, the files will *not* be identical, because order of documents may still vary.
>
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> -Original
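File hashes are not a reliable check here, so one option is to compare what was indexed rather than the index files themselves. The Java sketch below assumes you can export each run's documents as strings (for example one stored field per document). The export step is not shown, and `IndexFingerprint` is a hypothetical helper, not a Lucene class. It sorts the documents before hashing, so two runs that indexed the same documents in a different order get the same SHA-256 fingerprint.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: an order-independent fingerprint of a run's exported documents. */
final class IndexFingerprint {
    private IndexFingerprint() {}

    static String fingerprint(Collection<String> docs) {
        List<String> sorted = new ArrayList<>(docs);
        Collections.sort(sorted); // indexing order no longer affects the result
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256"); // required on every Java platform
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        for (String doc : sorted) {
            md.update(doc.getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0); // separator, so ["ab", "c"] and ["a", "bc"] hash differently
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
        }
        return hex.toString();
    }
}
```

Sorting is what makes the comparison insensitive to document order. The index size itself can still differ between runs (for example with a different segment layout) even when the fingerprints match.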

Re: Apache Lucene 4.x word counting

2014-03-28 Thread Jose Carlos Canova

There is a small problem in how your question is framed with respect to Lucene: Lucene doesn't count words. You count terms, based on the Analyzer you defined for indexing (the one used by the IndexWriter). That analyzer tokenizes (which does not necessarily mean splitting on the whitespace between words) a sequence of strings
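The difference between "words" and "terms" can be shown without Lucene at all. In this plain-Java sketch, a hypothetical `SimpleTermCounter` stands in for an Analyzer. It lower-cases the text, splits on any run of characters that are not letters or digits, and counts the resulting terms. This tokenization rule is an assumption chosen for illustration. It is not what StandardAnalyzer or any other Lucene analyzer does exactly.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for an Analyzer; NOT Lucene's API, just an illustration. */
final class SimpleTermCounter {
    private SimpleTermCounter() {}

    /** Lower-cases the text, splits on runs of non-alphanumeric chars, counts each term. */
    static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token for leading separators
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }
}
```

With this rule, "Lucene-based" produces two terms, "lucene" and "based". Whitespace splitting would keep it as one token, so the "word count" depends entirely on the tokenizer you choose.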

Re: Lucene Suggest Phrase

2014-04-01 Thread Jose Carlos Canova

Hi, I haven't reached this point myself, but it seems that Lucene has a "suggester" module that works over Lucene's index itself, which simplifies collecting terms for query suggestion. I saw something on GitHub meant to be used with JavaScript, but I can't remember the name of the project now. Regards. On Tue

Re: Lucene Suggest Phrase

2014-04-01 Thread Jose Carlos Canova

I remember now: the name is /liblevenshtein, and it works with Node.js if I am not wrong. "Lucene suggest" works on the same algorithm, which in practice is enough for words with the same "character sequence". On Tue, Apr 1, 2014 at 1:30 PM, Jose Carlos Canova < jose
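This kind of suggestion is based on the Levenshtein (edit) distance. The sketch below implements the textbook dynamic-programming distance, plus a hypothetical `suggest` helper that scans a small term list. The libraries mentioned use more efficient techniques (for example Levenshtein automata), so treat this only as an illustration of what "close enough" means. It does not show how either library is implemented.

```java
import java.util.ArrayList;
import java.util.List;

final class EditDistanceSuggest {
    private EditDistanceSuggest() {}

    /** Classic Levenshtein distance: insertions, deletions and substitutions each cost 1. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a to b[0..j)
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just computed becomes "previous"
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Hypothetical helper: every dictionary term within maxEdits of the input, in input order. */
    static List<String> suggest(String input, List<String> dictionary, int maxEdits) {
        List<String> out = new ArrayList<>();
        for (String term : dictionary) {
            if (levenshtein(input, term) <= maxEdits) {
                out.add(term);
            }
        }
        return out;
    }
}
```

Scanning every term is fine for a toy list but grows linearly with the dictionary. That cost is the reason real suggesters avoid a full scan.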

Re: background merge hit exception

2014-04-05 Thread Jose Carlos Canova

It seems that you want to force the maximum number of segments to 1. On a previous thread someone answered that the number of segments affects the index size and is not related to index integrity (that is, the size of the index may vary with the number of segments). On version 4.6 there is a small issue o

Re: background merge hit exception

2014-04-08 Thread Jose Carlos Canova

> throw new OutOfMemoryError(e.toString());
> }
> }
> }
> }
> }
>
> } else {
> FileInpu

Re: NRT facet issue (bug?), hard to reproduce, please advise

2014-04-12 Thread Jose Carlos Canova

One thing that may affect this, and that I usually forget, is that if your object has a unique identifier (client_no), that identifier must be used in the overridden "equals" method and be part of the hashCode computation; otherwise, if you store this object in a collection and different routines ac
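A minimal sketch of that advice, assuming the identifier is a `long` called `clientNo` (a hypothetical stand-in for client_no) and the class is a hypothetical `Client`. Both `equals` and `hashCode` are derived from the identifier only. Hash-based collections then treat two instances with the same id as the same object, even if other, mutable fields differ or change later.

```java
/** Hypothetical entity; clientNo plays the role of the unique identifier (client_no). */
final class Client {
    private final long clientNo;
    private String name; // mutable, deliberately NOT part of equals/hashCode

    Client(long clientNo, String name) {
        this.clientNo = clientNo;
        this.name = name;
    }

    String getName() { return name; }

    void setName(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Client)) return false;
        return clientNo == ((Client) o).clientNo; // identity is the unique id only
    }

    @Override
    public int hashCode() {
        return Long.hashCode(clientNo); // consistent with equals: same id, same hash
    }
}
```

Including `name` in `hashCode` would be a trap. Renaming a client already stored in a `HashSet` would move it to a different bucket, and `contains` would no longer find it.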

Re: Multiply instead of summing two scores

2014-04-12 Thread Jose Carlos Canova

Hmm, you don't have a document weight; you have a document score relative to the other documents in the index, computed at search time. In practice, the document "weight" will be the sum of the weights of the terms relative to the index. You might find this presentation useful: http://www.cs.cmu.edu/
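This textbook tf-idf sketch in Java makes "the sum of the weights of the terms relative to the index" concrete. It is not Lucene's exact scoring formula, since Lucene 4.x's default similarity adds normalization, boosts and other factors. It only shows why a score depends on the rest of the index: the idf part changes whenever document frequencies in the collection change. Documents are modeled as hypothetical term-to-frequency maps.

```java
import java.util.List;
import java.util.Map;

final class TfIdfSketch {
    private TfIdfSketch() {}

    /** Textbook idf: ln(N / df); 0 for a term that occurs in no document. */
    static double idf(String term, List<Map<String, Integer>> corpus) {
        long df = corpus.stream().filter(doc -> doc.containsKey(term)).count();
        return df == 0 ? 0.0 : Math.log((double) corpus.size() / df);
    }

    /** Score of one document for a query: sum over query terms of tf * idf. */
    static double score(List<String> queryTerms, Map<String, Integer> doc,
                        List<Map<String, Integer>> corpus) {
        double sum = 0.0;
        for (String term : queryTerms) {
            int tf = doc.getOrDefault(term, 0); // raw term frequency in this document
            sum += tf * idf(term, corpus);
        }
        return sum;
    }
}
```

A term that appears in every document gets idf = ln(1) = 0 and contributes nothing. Adding documents to the index can therefore change the score of a document that did not change.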

Re: make data search as index progress.

2014-04-14 Thread Jose Carlos Canova

Hello, that's because NRTCachingDirectory uses an in-memory cache to "mimic in memory the Directory that you used to index your files". In theory the commit is needed because you need to flush the recently added documents; otherwise those documents will not be available for search until the end of th

Re: make data search as index progress.

2014-04-15 Thread Jose Carlos Canova

e
> what went wrong.
>
> /Jason
>
> On Mon, Apr 14, 2014 at 9:01 PM, Jose Carlos Canova <jose.carlos.can...@gmail.com> wrote:
>
> > Hello,
> >
> > That's because NRTCachingDirectory uses a in cache memory to "

Re: is there a historical reason why default conjunction operator is "OR"?

2014-04-16 Thread Jose Carlos Canova

In fact you have both: the documents you see first are the results containing all the words (AND), followed by the ORed results, which makes perfect sense. Google sometimes marks in a result, with a "strike-through", which word was not found. But it is not as powerful as logical operators on qu

Re: is there a historical reason why default conjunction operator is "OR"?

2014-04-16 Thread Jose Carlos Canova

> e of the terms to be ranked higher, so it merely LOOKS like the terms were ANDed. This gives you the best of both worlds.
>
> Using explicit operators gives you "precision", which power users will appreciate. Average users just get annoyed when the search engine is
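This is a rough illustration of "OR semantics, but documents matching more of the terms rank higher", which is why OR results can look as if the terms were ANDed. In this sketch a document is a hypothetical set of terms, and ranking is an explicit sort on how many query terms each document contains. Lucene gets a similar effect through its scoring, not through this sort. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

final class OrRanking {
    private OrRanking() {}

    /** Counts how many of the query terms appear in the document's term set. */
    static int overlap(Set<String> docTerms, List<String> queryTerms) {
        int n = 0;
        for (String t : queryTerms) {
            if (docTerms.contains(t)) n++;
        }
        return n;
    }

    /**
     * OR semantics: keep every document matching at least one term, then sort so
     * documents matching more terms come first (List.sort is stable, ties keep input order).
     */
    static List<Set<String>> rank(List<Set<String>> docs, List<String> queryTerms) {
        List<Set<String>> hits = new ArrayList<>();
        for (Set<String> doc : docs) {
            if (overlap(doc, queryTerms) > 0) hits.add(doc);
        }
        hits.sort(Comparator.comparingInt((Set<String> d) -> overlap(d, queryTerms)).reversed());
        return hits;
    }
}
```

Documents that contain every query term end up at the top. Partial matches are still returned below them instead of being dropped, as they would be under strict AND.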

Re: Getting IndexWriterConfig details for a closed index

2014-04-22 Thread Jose Carlos Canova

You can persist the IndexConfiguration somewhere: copy it into a Serializable object and write that to a file using an ObjectOutputStream, store it in a persistent mechanism such as a database (or, the fever of the moment, a JSON store), or do as Solr does and use an XML file. I
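This is a minimal sketch of the ObjectOutputStream option. It assumes you copy the values you care about into your own Serializable class. Here that class is a hypothetical `IndexSettings` with three illustrative fields, because the sketch does not rely on IndexWriterConfig itself being serializable. The values are written to a file and read back later, for example after the index has been closed.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Path;

/** Hypothetical snapshot of the settings you want to remember about an index. */
final class IndexSettings implements Serializable {
    private static final long serialVersionUID = 1L;

    final String analyzerClassName; // e.g. the Analyzer class used at indexing time
    final double ramBufferSizeMB;   // illustrative field, copied by hand from your config
    final String openMode;          // illustrative field, stored as plain text

    IndexSettings(String analyzerClassName, double ramBufferSizeMB, String openMode) {
        this.analyzerClassName = analyzerClassName;
        this.ramBufferSizeMB = ramBufferSizeMB;
        this.openMode = openMode;
    }

    /** Writes these settings to a file with an ObjectOutputStream. */
    void save(Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file.toFile()))) {
            out.writeObject(this);
        }
    }

    /** Reads settings previously written by save(). */
    static IndexSettings load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file.toFile()))) {
            return (IndexSettings) in.readObject();
        }
    }
}
```

Storing plain values such as class names and numbers, rather than live objects, keeps the file readable by later versions of your program. The same fields could just as well go into a database row, a JSON document or an XML file.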

Re: Fields, Index segments and docIds (second Try)

2014-04-29 Thread Jose Carlos Canova

My suggestion is not to worry about the docId. In practice it is an "internal Lucene" id, quite similar to a rowId in a database: each index may assign a different docId to a translated document (that is Lucene's concern). You may use your own ID that relates one document to another on different
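This toy model in plain Java shows the idea without Lucene. Each hypothetical `ToyIndex` assigns its own internal id, which is just the document's position, much like a rowId. Lookups use the application's own external id instead, so the English and translated versions of a document are linked even though their internal ids differ. In a real Lucene index the external id would typically be stored in its own field and looked up with a query on that field. That part is not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory "index": the list position plays the role of an internal docId. */
final class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();
    private final Map<String, Integer> byExternalId = new HashMap<>();

    /** Adds a document and returns its internal id (just its position, like a rowId). */
    int add(String externalId, Map<String, String> fields) {
        int internalId = docs.size();
        docs.add(fields);
        byExternalId.put(externalId, internalId);
        return internalId;
    }

    /** Looks a document up by the application's own id, never by the internal one. */
    Optional<Map<String, String>> findByExternalId(String externalId) {
        Integer internalId = byExternalId.get(externalId);
        return internalId == null ? Optional.empty() : Optional.of(docs.get(internalId));
    }
}
```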

Re: How to locate a Phrase inside text (like a Browser text searcher)

2014-05-11 Thread Jose Carlos Canova

Try to use the Lucene wildcard: *John*Mail*. The analyzer is just how you want the terms segmented in your index; the query parser is how you tokenize the terms that you want to query against the index (something like that). But Lucene allows you to use the wildcard to handle "other cases" th
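To show what a pattern such as *John*Mail* means for a single term, here is a sketch that translates `*` (any run of characters, including none) and `?` (exactly one character) into an anchored Java regex. Lucene's WildcardQuery does not work this way internally, so `WildcardSketch` is a hypothetical helper for illustrating the semantics only. Matching here is case-sensitive, so the pattern has to match the form in which your Analyzer actually stored the term.

```java
import java.util.regex.Pattern;

final class WildcardSketch {
    private WildcardSketch() {}

    /** Converts '*' (any run of chars) and '?' (exactly one char) into a regex; other chars are literal. */
    static Pattern toRegex(String wildcard) {
        StringBuilder re = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                re.append(".*");
            } else if (c == '?') {
                re.append('.');
            } else {
                re.append(Pattern.quote(String.valueOf(c))); // escape regex metacharacters
            }
        }
        return Pattern.compile(re.toString(), Pattern.DOTALL);
    }

    /** True if the whole term matches the wildcard pattern (Matcher.matches() is anchored). */
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }
}
```

For example, `matches("*John*Mail*", "sendJohnAnEMail")` is true, while `"MailJohn"` does not match because the pattern requires "John" before "Mail".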


16 matches
