Re: Can I use HFS in Lucene 2.3.1?

2008-04-25 Thread Mathieu Lecarme
Alex Chew wrote: Hi, does somebody have experience building a distributed application with Lucene and Hadoop/HFS? Lucene 2.3.1 does not seem to expose HFSDirectory. Any advice will be appreciated. Regards, Alex Have a look at Nutch. M.

Re: Lucene and Google Web 1T 5 Gram

2008-04-24 Thread Mathieu Lecarme
Rafael Turk wrote: Hi Mathieu, *What do you want to do?* A spell checker and related keyword suggestion. Here is a spell checker which I am trying to finalize: https://admin.garambrogne.net/projets/revuedepresse/browser/trunk/src/java If you want an ngram => popularity map, just use a berkl

Re: Lucene and Google Web 1T 5 Gram

2008-04-23 Thread Mathieu Lecarme
Rafael Turk wrote: Hi Folks, I'm trying to load the Google Web 1T 5-gram corpus into Lucene. (This corpus contains English word n-grams and their observed frequency counts. The length of the n-grams ranges from unigrams (single words) to five-grams.) I'm loading each ngram (each row is an ngram) as an

Re: Lucene index on relational data

2008-04-12 Thread Mathieu Lecarme
Regarding data and its relationships - the use case I am trying to solve is to partition my data into 2 indexes: a primary index that contains the majority of the data and is fairly static, and a secondary index that will have related information for the same data set as the primary index, and this relate

Re: Lucene index on relational data

2008-04-11 Thread Mathieu Lecarme
ild a Filter for the second, just like with the previous JDBC example. You can even cache the filter, like Solr does with its faceted search. M. Regards, Rajesh --- Mathieu Lecarme <[EMAIL PROTECTED]> wrote: Have a look at Compass 2.0M3 http://www.kimchy.org/searchable-cascading-map
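A minimal sketch of the filter-caching idea mentioned above, against the Lucene 2.x API; the field name and value are hypothetical placeholders:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.TermQuery;

public class RelatedFilter {
    /** Builds a Filter from the secondary criteria and wraps it in a cache,
     *  so repeated searches against the primary index reuse the same bit set. */
    public static Filter build() {
        Filter related = new QueryWrapperFilter(
                new TermQuery(new Term("category", "books"))); // hypothetical field/value
        // Pass the returned Filter to IndexSearcher.search(query, filter, n).
        return new CachingWrapperFilter(related);
    }
}
```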

Re: Using Lucene partly as DB and 'joining' search results.

2008-04-11 Thread Mathieu Lecarme
Antony Bowesman wrote: We're planning to archive email over many years and have been looking at using a DB to store mail metadata and Lucene for the indexed mail data, or just Lucene on its own with email data and structure stored as XML and the raw message stored in the file system. For so

Re: Lucene index on relational data

2008-04-11 Thread Mathieu Lecarme
Have a look at Compass 2.0M3 http://www.kimchy.org/searchable-cascading-mapping/ Your multiple indexes will be nice for massive writes. With a classical read/write ratio, Compass will be much easier. M. Rajesh Parab wrote: Hi, We are using Lucene 2.0 to index data stored inside relational dat

Re: Use of Lucene for DB Search

2008-04-10 Thread Mathieu Lecarme
Have a look at Compass. M. Prashant Saraf wrote: Hi, We are planning to provide search functionality in a web-based application. Can we use Lucene to search data from databases like Oracle and MS-SQL? Thanks and Regards प्रशांत सराफ (Prashant Saraf) S

Re: designing a dictionary filter with multiple word entries

2008-04-09 Thread Mathieu Lecarme
Allen Atamer wrote: My dictionary filter currently implements next() and everything works well when dictionary entries are replaced one-to-one. For example: Can => Canada. A problem arises when I try to replace it with more than one word. Going through next() I encounter "shutdown". But
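For illustration, a sketch of a one-to-many dictionary filter against the old next()-based TokenFilter API of Lucene 2.x; the dictionary map is a stand-in for however the entries are actually loaded:

```java
import java.io.IOException;
import java.util.LinkedList;
import java.util.Map;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

/** Dictionary filter that can replace one token by several, e.g. "NY" => "New York". */
public class DictionaryFilter extends TokenFilter {
    private final Map<String, String[]> dictionary;           // entry -> replacement words
    private final LinkedList<Token> pending = new LinkedList<Token>();

    public DictionaryFilter(TokenStream input, Map<String, String[]> dictionary) {
        super(input);
        this.dictionary = dictionary;
    }

    public Token next() throws IOException {
        if (!pending.isEmpty()) {
            return pending.removeFirst();                      // flush queued replacement words
        }
        Token token = input.next();
        if (token == null) {
            return null;
        }
        String[] replacement = dictionary.get(token.termText());
        if (replacement == null || replacement.length == 0) {
            return token;                                      // no entry, keep the token as-is
        }
        // Emit the replacement words one by one; all reuse the original offsets.
        for (int i = 1; i < replacement.length; i++) {
            pending.add(new Token(replacement[i], token.startOffset(), token.endOffset()));
        }
        return new Token(replacement[0], token.startOffset(), token.endOffset());
    }
}
```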

Re: Questions about use of SpellChecker: Constructor and Simillarity...

2008-04-08 Thread Mathieu Lecarme
I'm cool :) I just think you are overcomplicating things. Yes... I can use two words and OR. Suppose I query on these: The Lord of Rings: Return of King; The Lord of Rings: Fellowship; The Lord of Rings: The Two Towers; The Lord of Weapons; The Lord of War. Suppose a user searches: "The Lord of Rings

Re: Questions about use of SpellChecker: Constructor and Simillarity...

2008-04-08 Thread Mathieu Lecarme
On 8 Apr 2008 at 18:34, Karl Wettin wrote: dreampeppers99 wrote: 1. Why do I need to pass a Directory object (obligatory) to the constructor of SpellChecker? Mainly because it is a nasty piece of code. But it does a good job. Because SpellChecker uses a directory to store its data. It can be FSDirectory

Re: Questions about use of SpellChecker: Constructor and Simillarity...

2008-04-08 Thread Mathieu Lecarme
Use ShingleFilter. I'm working on a wider SpellChecker, I'll post a third patch soon. https://admin.garambrogne.net/projets/revuedepresse/browser/trunk/src/java M. dreampeppers99 wrote: Hi, I have two questions about this GREAT tool.. (framework, library... "whatever") Well, I decided to put spe
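A sketch of an analyzer wrapping the contrib ShingleFilter, assuming its two-argument constructor and the Lucene 2.x/3.x Analyzer API, to emit word n-grams up to five tokens wide:

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

/** Emits single words plus word n-grams ("shingles") up to 5 tokens wide,
 *  roughly what loading the Web 1T corpus as shingles would need. */
public class ShingleAnalyzer extends Analyzer {
    private final Analyzer base = new StandardAnalyzer();

    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new ShingleFilter(base.tokenStream(fieldName, reader), 5);
    }
}
```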

Re: Indexing and Searching from within a single Document

2008-04-08 Thread Mathieu Lecarme
[EMAIL PROTECTED] wrote: The need is: I have millions of entries in a database, each entry in roughly this format: ID, Name, Description, start (number), stop (number). Currently my application uses the database to do the search; queries are in the following format: Select * fr

Re: Error tolerant text search with Lucene?

2008-04-04 Thread Mathieu Lecarme
Marjan Celikik wrote: Mathieu Lecarme wrote: however I don't fully understand what you mean by "iterate over your query". I would like a conceptual answer for how this is done with Lucene, not a technical one.. Your query is a tree, with BooleanQuery as branches and other que

Re: Error tolerant text search with Lucene?

2008-04-04 Thread Mathieu Lecarme
Marjan Celikik wrote: Mathieu Lecarme wrote: You have to iterate over your query: if it's a BooleanQuery, keep it; if it's a TermQuery, replace it with a BooleanQuery holding all variants of the Term with Occur.SHOULD. M. Thanks.. however I don't fully understand what
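A sketch of that query rewrite against the Lucene 2.x query API; variantsOf() is a hypothetical hook for whatever produces the term variants (spell checker, stemmer, etc.):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class QueryExpander {
    /** Recursively rewrites a query, replacing each TermQuery by a BooleanQuery
     *  holding the original term and its variants as SHOULD clauses. */
    public static Query expand(Query query) {
        if (query instanceof TermQuery) {
            Term term = ((TermQuery) query).getTerm();
            BooleanQuery expanded = new BooleanQuery();
            expanded.add(query, BooleanClause.Occur.SHOULD);
            for (String variant : variantsOf(term.text())) {
                expanded.add(new TermQuery(new Term(term.field(), variant)),
                             BooleanClause.Occur.SHOULD);
            }
            return expanded;
        }
        if (query instanceof BooleanQuery) {
            BooleanQuery rewritten = new BooleanQuery();
            for (BooleanClause clause : ((BooleanQuery) query).getClauses()) {
                rewritten.add(expand(clause.getQuery()), clause.getOccur());
            }
            return rewritten;
        }
        return query; // other query types are kept as-is
    }

    private static String[] variantsOf(String word) {
        return new String[0]; // hypothetical: plug in your spell checker / variant generator here
    }
}
```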

Re: Error tolerant text search with Lucene?

2008-04-04 Thread Mathieu Lecarme
Marjan Celikik wrote: Hi everyone, I know that there are packages that support the "Did you mean ... ?" search feature with Lucene, which tries to find the most suitable correct-word query.. however, so far I haven't encountered the opposite search feature: given a correct query, find all docum

Re: stemming in Lucene

2008-04-02 Thread Mathieu Lecarme
Wojtek H wrote: Hi all, Snowball stemmers are part of Lucene, but for a few languages only. We have documents in various languages and so need stemmers for many languages (in particular Polish). One of the ideas is to use ispell dictionaries. There are ispell dicts for many languages and so thi

Re: Integrating Spell Checker contributed to Lucene

2008-03-26 Thread Mathieu Lecarme
Ivan Vasilev wrote: Thanks Mathieu, I tried to check it out but without success. Anyway, I can do it manually, but as the contribution is still not approved by Lucene, our chiefs will not want it included in our project for now. It's the right decision. I hope the third patch will be good

Re: Integrating Spell Checker contributed to Lucene

2008-03-26 Thread Mathieu Lecarme
Ivan Vasilev wrote: Thanks Mathieu for your help! The contribution that you have made to Lucene with this patch seems to be great, but the Hunspell dictionary is under the LGPL, which our company's lawyer does not like. It's the spelling tool used by OpenOffice and Firefox. Data must be multi l

Re: Integrating Spell Checker contributed to Lucene

2008-03-25 Thread Mathieu Lecarme
Ivan Vasilev wrote: Hi Guys, Has anybody integrated the Spell Checker contributed to Lucene? http://blog.garambrogne.net/index.php?post/2008/03/07/A-lexicon-approach-for-Lucene-index https://issues.apache.org/jira/browse/LUCENE-1190 I need advice on where to get a free dictionary file (one

Re: Call Lucene default command line Search from PHP script

2008-03-25 Thread Mathieu Lecarme
milu07 wrote: Hello, My machine is Ubuntu 7.10. I am working with Apache Lucene. I am done with the indexer and have tried the command line Searcher (the default command line tool included in the Lucene package: http://lucene.apache.org/java/2_3_1/demo2.html). When I use this at the command line: java Searcher

Re: Relevance

2008-03-19 Thread Mathieu Lecarme
luceneuser wrote: Hi All, I need help with retrieving results based on relevance + freshness. As of now, I get results based on either relevance or freshness. How can I achieve this? Lucene retrieves results by relevance but also fetches old results. I need more relevan

Re: Language identification ??

2008-03-14 Thread Mathieu Lecarme
Raghu Ram wrote: To complicate it further ... the text for which language identification has to be done is small, in most cases a short sentence like "I like Pepsi". Can something be done for this? Drinking water? More seriously, if ngram-pattern language guessing is too ambiguous, sear

Re: Language identification ??

2008-03-14 Thread Mathieu Lecarme
Itamar Syn-Hershko wrote: For what it's worth, I did something similar in my BidiAnalyzer so I can index both Hebrew/Semitic texts and English/Latin words without switching analyzers, giving each the proper treatment. I did it simply by testing the first char and looking at its numeric value -

Re: Search against an index on a mapped drive ...

2008-03-14 Thread Mathieu Lecarme
Dragon Fly wrote: Hi, I'd like to find out if I can do the following with Lucene (on Windows). On server A: - An index writer creates/updates the index. The index is physically stored on server A. - An index searcher searches against the index. On server B: - Maps to the index directory.

Re: Language identification ??

2008-03-14 Thread Mathieu Lecarme
Raghu Ram wrote: Hi all, I guess this question is a bit off track. Are there any language identification modules inside Lucene? If not, can somebody please suggest a good one? Thank you. Nutch provides a tool for that, with ngram patterns, just like OO.o does. M.

Using Lucene from a scripting language without any Java coding

2008-03-12 Thread Mathieu Lecarme
Here is a POC for using Lucene, via Compass, from PHP or Python (other languages will come later), with only XML configuration, object notation, and native use of the scripting language. http://blog.garambrogne.net/index.php?post/2008/03/11/Using-Compass-without-dirtying-its-hands-with-java It's

Re: Best way to do Query inflation?

2008-03-10 Thread Mathieu Lecarme
https://admin.garambrogne.net/projets/revuedepresse/browser/trunk/src/java/lexicon/src/java/org/apache/lucene/lexicon/QueryUtils.java M. Itamar Syn-Hershko wrote: Hi all, I'm looking for the best way to inflate a query, so a query like: "synchronous AND colour" -- will become something lik

Re: Using a thesaurus/onthology

2008-03-05 Thread Mathieu Lecarme
Borgman, Lennart wrote: Is there any possibility to use a thesaurus or an ontology when indexing/searching with Lucene? Yes, the WordNet contrib does that. And with a token filter, it's easy to use your own. What do you want to do? M.

Re: bigram analysis

2008-03-03 Thread Mathieu Lecarme
Not sure, you might want to ask on Nutch. From a strict language standpoint, the notion of a stopword in my mind is a bit dubious. If the word really has no meaning, then why does the language have it to begin with? In a search context, it has been treated as of minimal use in the early da

Re: Does Lucene support partition-by-keyword indexing?

2008-03-03 Thread Mathieu Lecarme
And yes, each node contains its own documents and builds an index against them. On Mon, Mar 3, 2008 at 1:30 AM, Mathieu Lecarme <[EMAIL PROTECTED]> wrote: Thanks for your clear and long answer. With Terms split across different nodes you get different drawbacks: PrefixQuery and Fu

Re: Avoid stemming to get exact word in search results

2008-03-03 Thread Mathieu Lecarme
There's no syntax to restore a stemmed word. Stemming is done when the news is indexed, so the index never knows the complete word. I submitted a patch for that: https://issues.apache.org/jira/browse/LUCENE-1190 Be careful: RssBandit uses the .NET Lucene, not the Java version. M. secou wrote: Hi,

Re: Does Lucene support partition-by-keyword indexing?

2008-03-02 Thread Mathieu Lecarme
The documents to be indexed are not necessarily web pages. They are mostly files stored on each node's file system. Node failures are also handled by replicas. The index for each term will be replicated on multiple nodes whose nodeIDs are near each other. This mechanism is handled

Re: Does Lucene support partition-by-keyword indexing?

2008-03-02 Thread Mathieu Lecarme
On 2 Mar 2008 at 03:05, 仇寅 wrote: Hi, I agree with your point that it is easier to partition the index by document. But the partition-by-keyword approach has much greater scalability than the partition-by-document approach. Each query involves communicating with a constant number of nodes, whi

Re: Does Lucene support partition-by-keyword indexing?

2008-03-01 Thread Mathieu Lecarme
The easiest way is to split the index by Document. In Lucene, an index contains Documents and an inverted index of Terms. If you want to put Terms in different places, Documents will be duplicated in each index, with only a part of their Terms. How will you manage node failure in your network? They were so

Re: Alternate spelling suggestion (was [Resent] Document boosting based on .. semantics? )

2008-02-29 Thread Mathieu Lecarme
Hi, Mathieu Lecarme wrote: On a related topic, I'm also searching for a way to suggest alternate spellings of words to the user, when we find a word which is used very infrequently in the index or is not in the index at all. I'm based in Austria; when I e.g. search for "r

Re: SOC: Lulu, a Lua implementation of Lucene

2008-02-29 Thread Mathieu Lecarme
Grant Ingersoll wrote: On Feb 29, 2008, at 5:39 AM, Mathieu Lecarme wrote: Petite Abeille wrote: A proposal for a Lua entry for the "Google Summer of Code" '08: A Lua implementation of Lucene. For me, Lua is just glue between C-coded objects, a super config fil

Re: SOC: Lulu, a Lua implementation of Lucene

2008-02-29 Thread Mathieu Lecarme
Petite Abeille wrote: A proposal for a Lua entry for the "Google Summer of Code" '08: A Lua implementation of Lucene. For me, Lua is just glue between C-coded objects, a super config file, like it's used in lighttpd or WoW. Will Lulu work on top of Lucy? Did I miss something? M.

Re: Indexing source code files

2008-02-28 Thread Mathieu Lecarme
Dharmalingam wrote: I am working on some sort of search mechanism to link a requirement (i.e. a query) to source code files (i.e., documents). For that purpose, I indexed the source code files using Lucene. Contrary to the traditional natural language search scenario, we search for code files that

Re: How do I get a text summary

2008-02-28 Thread Mathieu Lecarme
[EMAIL PROTECTED] wrote: If you want something from an index it has to be IN the index. So, store a summary field in each document and make sure that field is part of the query. And how could one automatically create such a summary? Have a look at http://alias-i.com/lingpipe/index.h

Re: Rebuilding Document from index?

2008-02-26 Thread Mathieu Lecarme
Yes, I've found a tester! A patch was submitted for this kind of job: https://issues.apache.org/jira/browse/LUCENE-1190 And here is the svn work in progress: https://admin.garambrogne.net/subversion/revuedepresse/trunk/src/java/lexicon And the web version: https://admin.garambrogne.net/projets

Re: [Resent] Document boosting based on .. semantics?

2008-02-20 Thread Mathieu Lecarme
Markus Fischer wrote: Hi, [Resent: guess I sent the first before I completed my subscription, just in case it comes up twice ...] The subject may be a bit weird but I couldn't find a better way to describe the problem I'm trying to solve. If I'm not mistaken, one factor of scoring is the

Re: Apostrophe filtering in StandardFilter

2008-01-29 Thread Mathieu Lecarme
christophe blin wrote: Hi, thanks for the pointer to the elision filter, but I am currently stuck with lucene-core-2.2.0 found in the maven2 central repository (it does not contain this class). I'll watch for an upgrade to 2.3 in the future. You can backport it easily with copy-paste. M.

Re: Why exactly are fuzzy queries so slow?

2007-11-25 Thread Mathieu Lecarme
Well, the javadoc says: "prefixLength - length of common (non-fuzzy) prefix". So this is some kind of "wildcard fuzzy" but not real fuzzy anymore. I understand the optimization but right now I can hardly imagine a reasonable use-case. Who cares whether the Levenshtein distance is at the beginning, middle

Re: Why exactly are fuzzy queries so slow?

2007-11-24 Thread Mathieu Lecarme
Fuzzy terms are simply not indexed. If you want fast fuzzy search, you should index words and their ngrams; it's the "did you mean" pattern. You first select indexed words which share ngrams with the query word, the distance is computed with Levenshtein, and you use these words as synon
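The contrib SpellChecker implements this pattern (ngram index over terms, then edit-distance ranking); a minimal sketch against the Lucene 2.x-era contrib API, with a hypothetical path for the spell index:

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.spell.LuceneDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class DidYouMean {
    public static String[] suggest(IndexReader reader, String field, String word)
            throws IOException {
        // The spell index stores each term of "field" together with its ngrams.
        Directory spellDir = FSDirectory.getDirectory("/tmp/spell-index"); // hypothetical path
        SpellChecker spell = new SpellChecker(spellDir);
        spell.indexDictionary(new LuceneDictionary(reader, field));
        // Candidates are fetched by shared ngrams, then ranked by edit distance.
        return spell.suggestSimilar(word, 5);
    }
}
```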

Re: Searching Exact Word from Index

2007-09-10 Thread Mathieu Lecarme
Laxmilal Menaria wrote: > Hello Everyone, > > I want to search 'abc-d' as an exact keyword, not 'abc d'. KeywordAnalyzer can > be used for this purpose. StandardAnalyzer creates different tokens for > 'abc-d' as 'abc' and 'd'. > But I can not use this, because I am indexing the content of a text fil
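One way to keep 'abc-d' intact while still tokenizing the file content is a per-field analyzer; a sketch against the Lucene 2.x/3.x analysis API, where the "code" field name is hypothetical:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class ExactFieldAnalyzer {
    /** Keeps the file content tokenized as usual, but analyzes a separate
     *  "code" field with KeywordAnalyzer so "abc-d" stays one token. */
    public static Analyzer build() {
        PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
        analyzer.addAnalyzer("code", new KeywordAnalyzer());
        // Pass the same analyzer to both the IndexWriter and the QueryParser.
        return analyzer;
    }
}
```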

Re: reg-ex based stop word removal

2007-08-22 Thread Mathieu Lecarme
sandeep chawla wrote: > Hi, > > I am working on a search application. This application requires me to > implement a stop filter > using a stop word list. I have implemented a stop filter using Lucene's API. > > I want to take my application one step further. > > I want to remove all the words

Re: Indexing PDF documents with structure information

2007-08-14 Thread Mathieu Lecarme
Thomas Arni wrote: > Hello Luceners > > I have started a new project and need to index PDF documents. > There are several projects around which allow extracting the content, > like PDFBox, xpdf and PJClassic. > > As far as I have studied the FAQs and examples, all these > tools allow simple text ex

Re: Lucene in large database contexts

2007-08-11 Thread Mathieu Lecarme
With Compass, indexing is linked to your database transaction: when your object is persisted, it's indexed too. All your questions are handled cleanly and silently by Compass; just have a look at the source code if you don't want to use this product. M. On 10 Aug 2007 at 12:24, Antonello Prove

Re: frequent phrases

2007-08-10 Thread Mathieu Lecarme
Some tools exist for finding duplicated parts in documents. You split the document into phrases and build ngrams from words. If you want complete phrases, work with all the words; for a partial match, work with 5-word ngrams, for example. The ngram list is converted to hashes, and each hash is used as an indexed Field for t

Re: MoreLikeThis for multiple documents

2007-07-26 Thread Mathieu Lecarme
Jens Grivolla wrote: > Hello, > > I'm looking to extract significant terms characterizing a set of > documents (which in turn relate to a topic). > > This basically comes down to functionality similar to determining the > terms with the greatest offer weight (as used for blind relevance > feedba

Re: Lucene and Eastern languages (Japanese, Korean and Chinese)

2007-07-25 Thread Mathieu Lecarme
On Tuesday 24 July 2007 at 13:01 -0700, Shaw, James wrote: > Hi, guys, > I found Analyzers for Japanese, Korean and Chinese, but not stemmers; > the Snowball stemmers only include European languages. Does stemming > not make sense for ideograph-based languages (i.e., no stemming is > needed for

Re: Synonym Search and stemming

2007-07-23 Thread Mathieu Lecarme
IMHO, stemming hurts the index. A stemmed index can't be used for completion or other kinds of search. But stemming is nice for synonym search. You should look at the spellchecker code. If you index your words along with their stemmed versions, you can provide a synonym filter, just like the WordNet example. M. Le lu

Re: Lucene indexing for PDM system like Windchill

2007-07-23 Thread Mathieu Lecarme
On Sunday 22 July 2007 at 13:17 -0500, Dmitry wrote: > Mathieu, > I never used Compass; I know that there is Shards/Search integration with > Hibernate, but it is absolutely different from what I actually need; probably I can > take a look at it. Anyway, thanks > thanks, > DT Not only Hibernate

Re: search through all fields

2007-07-22 Thread Mathieu Lecarme
e a relational database? On 7/17/07, Mathieu Lecarme <[EMAIL PROTECTED]> wrote: http://www.opensymphony.com/compass/ The project is free, it follows Lucene versions quickly, the forum is great, and the lead developer reacts quickly. M. Mohammad Norouzi wrote: > Mathieu, >

Re: Lucene indexing for PDM system like Windchill

2007-07-22 Thread Mathieu Lecarme
If you want to index Hibernate-persisted data, just use Compass. M. On 22 Jul 2007 at 04:19, Dmitry wrote: Folks, trying to integrate a PDM system: a WTPart object with the Lucene indexing/search framework. Part of the work is integration with the persistence layer + index storage + MySQL. Could no

Re: search through all fields

2007-07-17 Thread Mathieu Lecarme
http://www.opensymphony.com/compass/ The project is free, it follows Lucene versions quickly, the forum is great, and the lead developer reacts quickly. M. Mohammad Norouzi wrote: > Mathieu, > I need an object mapper for Lucene; would you please give me the > Compass web > site? is it open so

Re: search through all fields

2007-07-17 Thread Mathieu Lecarme
Sorry, I use Compass, an object mapper for Lucene, and it provides a special field "all"; I thought it was a Lucene feature. M. Renaud Waldura wrote: > Often documents can be divided into "metadata" and "contents" sections. Say > you're indexing Web pages, you could index them with HEAD data all

Re: search through all fields

2007-07-14 Thread Mathieu Lecarme
You can use the "all" special field, but you lose the different boost values. M. On 14 Jul 2007 at 10:50, Mohammad Norouzi wrote: Hello all, is there any way to search through all the fields without using MultiFieldQueryParser? Currently I am using this parser but it requires passing all
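A sketch of building such an "all" field by hand at index time, using Lucene 2.x field constants; the field names are hypothetical, and per-field boosts are lost as noted above:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class CatchAllField {
    /** Duplicates the field values into one "all" field so a single-field
     *  query can match any of them. */
    public static Document withAll(String title, String body, String author) {
        Document doc = new Document();
        doc.add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("body", body, Field.Store.NO, Field.Index.TOKENIZED));
        doc.add(new Field("author", author, Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("all", title + " " + body + " " + author,
                          Field.Store.NO, Field.Index.TOKENIZED));
        return doc;
    }
}
```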

Re: User Defined Matcher

2007-07-13 Thread Mathieu Lecarme
You create your own class which extends SynonymTokenFilter and pipe it with your Analyzer. M. On 13 Jul 2007 at 19:09, Mohsen Saboorian wrote: Thanks for the quick replies. Mathieu Lecarme wrote: Use synonyms in the query parser. Sorry, I didn't get the point of synonyms. Ca

Re: User Defined Matcher

2007-07-13 Thread Mathieu Lecarme
Mohsen Saboorian wrote: > Is it possible to inject my own matching mechanism into Lucene > IndexSearcher? In other words, is it possible that my own method be called > in order to collect results (hits)? Suppose the case that I want to match - > for example - "foo" with both "foo" and "oof

Re: Index partitioning by term

2007-07-04 Thread Mathieu Lecarme
Ndapa Nakashole wrote: > I am considering using Lucene in my mini Grid-based search engine. I > would > like to partition my index by term as opposed to partitioning by > document. From > what I have read in the mailing list so far, it seems like partitioning > by term > is impossible with Lucene. am

Re: inserting millions of entries

2007-06-28 Thread Mathieu Lecarme
Stop writing, scp the index to another computer, play with it, scp indexModified back to the server, mv indexModified indexCurrent, all done. mv is atomic. Jens Grivolla wrote: > Hi, > > I have a Lucene index with a few million entries, and I will need to > add batches of a few hundred thousand or a few mil

Re: Scaling up to several machines with Lucene

2007-06-28 Thread Mathieu Lecarme
Samuel LEMOINE wrote: > I'm acutely interested in this issue too, as I'm working on a > distributed architecture for Lucene. I'm only at the very beginning of > my study so I can't help you much, but Hadoop may fit > your requirements. It's a sub-project of Lucene aiming to paralle

Re: Scaling up to several machines with Lucene

2007-06-28 Thread Mathieu Lecarme
Server one handles the website. Server two is a light version of Tomcat which handles the Lucene search. In front, a lighttpd which uses server two for /search and server one for everything else. You can add Lucene servers with round robin in lighttpd with this scheme. Be careful with fault tolerance and index

Re: Content Summarization

2007-06-18 Thread Mathieu Lecarme
It's not so far from Lucene! http://en.wikipedia.org/wiki/Sentence_extraction Have a look at WordNet (http://wordnet.princeton.edu/). Get some lists of articles, verbs, nouns, and affix rules (like aspell, myspell ...). You will use more cooking rules than code. M. On 18 Jun 2007 at 20:29, Mordo,

Re: Lucene for chinese search

2007-06-18 Thread Mathieu Lecarme
Lee Li Bin wrote: > Hi, > > I still have problems searching for Chinese words. > The XML file which is the datasource and the analyzer have already been encoded. > I have tested StandardAnalyzer, CJKAnalyzer, and ChineseAnalyzer, but it > still can't get any results. > > 1. Do we need any encoding

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Mathieu Lecarme
Compass uses a trick to manage parent-child indexing. What if you index "collection", with a Date field holding the newest picture inside, and put all the pictures' keywords into the collection? Then, with a keyword search, you will find the collection with the highest tag occurrence count, and date s

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Mathieu Lecarme
Walt explains differently what I said. Lucene can be used efficiently for selecting objects, without sorting or scoring anything; then, with the id stored in Lucene, you can sort yourself with a simple Sortable implementation. The only limit is that Lucene must not give you too many results; with your

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Mathieu Lecarme
e rules. >> >> M. >> >> Antoine Baudoux wrote: >>> The problem is that I want Lucene to do the sorting, because the query >>> would return thousands of results, and I'm displaying documents one >>> page at a time. >>> >>>

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Mathieu Lecarme
with at most 300 elements you can sort it with strange rules. M. Antoine Baudoux wrote: > The problem is that I want Lucene to do the sorting, because the query > would return thousands of results, and I'm displaying documents one > page at a time. > > > On 15 Jun 2007, at 17

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Mathieu Lecarme
The first step is to feed a Set with "collection". The second step is to sort it. With a SortedSet, you can do that, can't you? M. Antoine Baudoux wrote: > Could you be more precise? I don't understand what you mean. > > > > On 15 Jun 2007, at 17:20, Mathieu Lecarme wrote:

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-15 Thread Mathieu Lecarme
Your request seems to be a two-step query. First step, you select images, and then collections. Second step, you sort the collections. BitVector can help you? M. Antoine Baudoux wrote: > Hi, > > I'm developing an image database. Each Lucene document > representing an image contains (among ot

Re: Wildcard query with untokenized punctuation (again)

2007-06-14 Thread Mathieu Lecarme
If you don't use the same tokenizer for indexing and searching, you will have trouble like this. Mixing exact match (with ") and wildcards (*) is a strange idea. Typographical rules say that you have a space after a comma, no? Is your field tokenized? M. Renaud Waldura wrote: > My very simple

Re: How to implement AJAX search~Lucene Search part?

2007-06-09 Thread Mathieu Lecarme
You can work like the Lucene spelling tools do: a specific index with one word per Document, boosted by something proportional to the number of occurrences (with log and math magic). The magic is n Fields holding the starting ngrams, not stored, not tokenized. For example, if you want to index the word "carott",
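A sketch of such a suggestion document, using Lucene 2.x field constants; the field names, prefix-length cap, and boost formula are illustrative assumptions:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class SuggestionDocument {
    /** One document per word: the word itself plus its starting ngrams
     *  ("c", "ca", "car", ...) as untokenized fields, boosted by frequency.
     *  Assumes occurrences >= 1. */
    public static Document build(String word, int occurrences) {
        Document doc = new Document();
        doc.add(new Field("word", word, Field.Store.YES, Field.Index.UN_TOKENIZED));
        for (int i = 1; i <= Math.min(4, word.length()); i++) {
            doc.add(new Field("start" + i, word.substring(0, i),
                              Field.Store.NO, Field.Index.UN_TOKENIZED));
        }
        doc.setBoost((float) (1.0 + Math.log(occurrences)));
        return doc;
    }
}
```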

Re: Indexing MSword Documents

2007-06-08 Thread Mathieu Lecarme
Why not use Document? http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/document/Document.html HTMLDocument manages HTML stuff like encoding, headers, and other specifics. Nutch uses specific Word tools (http://lucene.apache.org/nutch/apidocs/org/ap

Re: How to implement AJAX search~Lucene Search part?

2007-06-08 Thread Mathieu Lecarme
If you do that, you enumerate every term! If you use an alphabetically sorted collection, you can stop when the matches stop, but you still have to test every term before the first match. Lucene gives you tools to match the beginning of a term, just use them! M. On 8 Jun 2007 at 14:57, Patrick Turcotte wrote: H
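A minimal sketch of the prefix matching Lucene already provides; the field name is hypothetical:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;

public class PrefixCompletion {
    /** Matches every term in "field" that starts with the typed characters,
     *  letting Lucene walk only that slice of the term dictionary. */
    public static Query forPrefix(String field, String typed) {
        return new PrefixQuery(new Term(field, typed));
    }
}
```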

Re: How to implement AJAX search~Lucene Search part?

2007-06-08 Thread Mathieu Lecarme
Have a look at the opensearch.org specification; your auto-completion will work with IE7 and Firefox 2. JSON serialization is quicker than XML. Be careful to limit the number of responses. A search for "test*" works very well in my project with tens of thousands of documents. Begin completion onl