Re: spell checking for combined words

2008-01-15 Thread Karl Wettin
On 16 Jan 2008, at 00:33, solr_user wrote: I did try the Lucene SpellChecker. Currently the Lucene SpellChecker does not have the ability to suggest splitting of combined words. Is there a plan to add this capability to the Lucene SpellChecker any time soon? Very few plans in this project,

Re: Why there is no IndexWriter.deleteDocument(int docNum) method?

2008-01-15 Thread Yonik Seeley
On Jan 15, 2008 7:15 PM, Alexei Dets <[EMAIL PROTECTED]> wrote: > Hi! > I'm curious, is there any particular reason why Lucene offers > IndexReader.deleteDocument(int docNum) but not > IndexWriter.deleteDocument(int docNum)? Document ids are transient and can change. To figure out which ids you wa

Why there is no IndexWriter.deleteDocument(int docNum) method?

2008-01-15 Thread Alexei Dets
Hi! I'm curious, is there any particular reason why Lucene offers IndexReader.deleteDocument(int docNum) but not IndexWriter.deleteDocument(int docNum)? Rather typical (I think) potential use case: for (int i = 0; i < indexReader.maxDoc(); ++i) { if (!indexReader.isDeleted(i)) { Document do

Re: spell checking for combined words

2008-01-15 Thread solr_user
I did try the Lucene SpellChecker. Currently the Lucene SpellChecker does not have the ability to suggest splitting of combined words. Is there a plan to add this capability to the Lucene SpellChecker any time soon? I also did not quite understand your idea of producing N-word shingles and then

Re: Lucene sorting case-sensitive by default?

2008-01-15 Thread Antony Bowesman
Erick Erickson wrote: doc.add( new Field( "f", "This is Some Mixed, case Junk($*%& With Ugly SYmbols", Field.Store.YES, Field.Index.TOKENIZED)); pr

IndexWriter.deleteDocument()

2008-01-15 Thread Michael Prichard
When I run through and delete a few documents from my index, is it wise to call .flush() afterwards? Or is it better to close the index? Thanks! Michael

Re: spell checking for combined words

2008-01-15 Thread Otis Gospodnetic
Have you tried the Lucene spellchecker first? I think it could be adapted to do what you want, especially with the help of LUCENE-400 to produce N-word shingles (which you can then index with the SpellChecker). I'm quite sure this could be done, in fact, and would be a nice addition to SpellChecker in general
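A minimal plain-Java sketch of what "N-word shingles" are: every run of N consecutive tokens joined into one string. It is not the LUCENE-400 code or its API; the class name `Shingles` and the separator parameter are illustrative assumptions. Joining with an empty separator yields entries such as "applecomputer", which is roughly the kind of string one could feed to a spell-check dictionary so that a run-together query has something to match.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java word n-gram ("shingle") generation; hypothetical helper, not Lucene API. */
final class Shingles {
    private Shingles() {}

    /**
     * Returns every run of minSize..maxSize consecutive tokens, joined with
     * the separator, ordered by start position and then by length.
     */
    static List<String> of(List<String> tokens, int minSize, int maxSize, String separator) {
        if (minSize < 1 || maxSize < minSize) {
            throw new IllegalArgumentException("need 1 <= minSize <= maxSize");
        }
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            // Stop growing the shingle once it would run past the last token.
            for (int n = minSize; n <= maxSize && start + n <= tokens.size(); n++) {
                out.add(String.join(separator, tokens.subList(start, start + n)));
            }
        }
        return out;
    }
}
```

For example, `Shingles.of(List.of("apple", "computer"), 1, 2, "")` gives `[apple, applecomputer, computer]`.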

Re: spell checking for combined words

2008-01-15 Thread solr_user
I don't have a list of common "combined word" queries. Splitting of words seems to be quite a standard thing; most search engines and spell checkers have this ability. It would be nice if Lucene provided this out of the box. karl wettin-3 wrote: > > > On 14 Jan 2008, at 19:47, solr_user wrote: >

Re: Use Lucene in order to generate tag cloud

2008-01-15 Thread Otis Gospodnetic
Dominique, look at LUCENE-400 issue in JIRA, that will help. It will be in Lucene 2.4. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Dominique Béjean <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, January 15, 2008 12:13:4

Use Lucene in order to generate tag cloud

2008-01-15 Thread Dominique Béjean
Hi, Does anybody know of an implementation using Lucene to generate tag clouds? The idea is to index some documents in a temporary index in order to find the most frequent 1-term, 2-term and 3-term sequences. A stop word list will eliminate common words. Ideally, terms like “driver”, “d
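The counting Dominique describes can be sketched in memory before involving an index: tally every 1- to 3-word sequence and drop sequences that contain a stop word. This is plain Java, not Lucene; the tokenizer (lowercase, split on anything that is not a letter or digit), the class name `TagCloud`, and the choice to treat a stop word as a phrase boundary (rather than deleting it and joining its neighbours) are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** In-memory tag-cloud candidate counting; hypothetical helper, not Lucene code. */
final class TagCloud {
    private TagCloud() {}

    /** Counts every 1..maxLen word sequence that contains no stop word. */
    static Map<String, Integer> countPhrases(String text, Set<String> stopWords, int maxLen) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t); // split() can yield a leading empty string
        }
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            StringBuilder phrase = new StringBuilder();
            for (int n = 1; n <= maxLen && i + n <= tokens.size(); n++) {
                String word = tokens.get(i + n - 1);
                if (stopWords.contains(word)) break; // no phrase may span a stop word
                if (n > 1) phrase.append(' ');
                phrase.append(word);
                counts.merge(phrase.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** The k most frequent phrases; ties are broken alphabetically. */
    static List<String> top(Map<String, Integer> counts, int k) {
        List<String> phrases = new ArrayList<>(counts.keySet());
        phrases.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // descending count
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return phrases.subList(0, Math.min(k, phrases.size()));
    }
}
```

Against a real index, the same counts would more likely come from indexing shingles (as in the LUCENE-400 reply) and reading term frequencies; the sketch only pins down what is being counted.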

RE: [SOLVED] issue sorting results by string field

2008-01-15 Thread Dominique Béjean
Yes, it works much better if you help Lucene find the sort field type with new Sort(new SortField("pubdate", SortField.STRING, true)) Thank you -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Tuesday, 15 January 2008 17:19 To: java-user@lucene.apache.org Subj

Re: lucene as a graph store

2008-01-15 Thread Otis Gospodnetic
Re indexing performance, you are not making use of various IndexWriter parameters. My suggestion: wait another week, Lucene 2.3 will be out then. Check IndexWriter javadocs for various knobs for improving indexing performance. Actually, check the Wiki, there is a page about just that there.

Re: lucene as a graph store

2008-01-15 Thread Otis Gospodnetic
Aha, I see, a Document represents a node and all other nodes connected to it. So you can really only find 2 nodes connected with an edge with a single query, but not, say, the number of edges (degrees?) between any 2 nodes? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Re: Integrating dynamic data into Lucene search/ranking

2008-01-15 Thread Otis Gospodnetic
Tobias, The question is a little too open, I think. Perhaps start by saying what you've tried, what doesn't work, what you think won't work, the actual rate of change, the size of your index and, very importantly, how quickly you need to see index changes (adds, deletes, updates). How about t

Re: Sort does not work for me

2008-01-15 Thread Otis Gospodnetic
Aron, I believe we now have class ExtendedFieldCacheImpl extends FieldCacheImpl implements ExtendedFieldCache And this should support sorting by longs. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Aron Sogor <[EMAIL PROTECTED]> To: java-user

Re: lucene as a graph store

2008-01-15 Thread Cam Bazz
Well, let's say I have a list representation of a graph like src:1 dst:2, src:2 dst:3, src:1 dst:3. outgoingEdgesOf(1) returns 2 and 3. incomingEdgesOf(3) returns 1 and 2. In a Lucene index it works out nicely with term queries. I can search for incoming, outgoing, or edgeExist with a boolean term qu
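The edge-list layout above can be mirrored with two inverted maps, which is roughly what term queries on the `src` and `dst` fields give you. This is an in-memory sketch for reasoning about the three operations Cam names, not Lucene code; the class name `EdgeList` is hypothetical, and nodes are assumed to be ints.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** In-memory analogue of an edge-list graph store; hypothetical, not Lucene API. */
final class EdgeList {
    private final Map<Integer, Set<Integer>> bySrc = new HashMap<>(); // src -> dst values
    private final Map<Integer, Set<Integer>> byDst = new HashMap<>(); // dst -> src values

    /** Records one edge, like indexing a document with fields src and dst. */
    void addEdge(int src, int dst) {
        bySrc.computeIfAbsent(src, k -> new TreeSet<>()).add(dst);
        byDst.computeIfAbsent(dst, k -> new TreeSet<>()).add(src);
    }

    /** Analogue of a term query on src:node, collecting dst values. */
    Set<Integer> outgoingEdgesOf(int node) {
        return Collections.unmodifiableSet(bySrc.getOrDefault(node, Collections.emptySet()));
    }

    /** Analogue of a term query on dst:node, collecting src values. */
    Set<Integer> incomingEdgesOf(int node) {
        return Collections.unmodifiableSet(byDst.getOrDefault(node, Collections.emptySet()));
    }

    /** Analogue of a boolean query requiring both src:a and dst:b. */
    boolean edgeExists(int a, int b) {
        return outgoingEdgesOf(a).contains(b);
    }
}
```

With the three example edges, `outgoingEdgesOf(1)` is {2, 3} and `incomingEdgesOf(3)` is {1, 2}. Multi-hop questions (path length between two nodes) need repeated lookups, which matches Otis's point that a single query only answers one-edge questions.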

Integrating dynamic data into Lucene search/ranking

2008-01-15 Thread Tobias Lohr
I have a more architectural question, which may be somewhat off topic, but since I want to implement it using Java and Lucene, this seems the right forum: I'm thinking of an approach to design a system that integrates dynamic information into search (and ranking) functionality using Lucene.

Re: spell checking for combined words

2008-01-15 Thread Otis Gospodnetic
Hi, If some misspellings are very common, you could also turn them into synonyms. I have not tried finding any information about this, but I *think* Google may be doing that. I run a social service called Simpy at simpy.com and have Google Alerts for "simpy", but those alerts often contain matc

Re: lucene as a graph store

2008-01-15 Thread Otis Gospodnetic
Hi, - Original Message From: Cam Bazz <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, January 15, 2008 8:50:07 AM Subject: Re: lucene as a graph store Usually for implementing things like page rank, or doing centrality metric calculations or maybe Dijkstra's shortest

Re: issue sorting results by string field

2008-01-15 Thread Otis Gospodnetic
Dominique, I don't have the javadoc/source in front of me, but shouldn't that be new Sort(new SortField(.)) ? I'm not sure if the underlying sort implementation is smart enough to avoid re-doing the same work when you call these constructors for *every* search. If it's not smart enoug

RE: generate-maven-artifacts

2008-01-15 Thread Steven A Rowe
Hi Sergey, On 01/15/2008 at 9:57 AM, Sergey Kabashnyuk wrote: > Hi all. > I am trying to build Maven artifacts from tags/lucene_2_2_0 > by calling "ant generate-maven-artifacts" > > But BUILD FAILED > /java/src/lucene/svn/java/tags/lucene_2_2_0/build.xml:366: The following > error occurred while

IndexWriter.optimize()

2008-01-15 Thread Cam Bazz
Hello, I have been running some experiments on Lucene. To speed up indexing, I have disabled autocommit, and I flush the IndexWriter every 512 objects. So far I have tried 256, 512, 1024, and 2048, and I have seen an incredible difference in indexing speed. However, if I the time required to

Re: IndexWriter.DISABLE_AUTO_FLUSH

2008-01-15 Thread Michael McCandless
Hi, This option is new in the soon-to-be-released 2.3 version of Lucene (not present in 2.2.0). Mike Cam Bazz wrote: Hello; Has IndexWriter.DISABLE_AUTO_FLUSH been deprecated? I am using Lucene core 2.2.0, and although it is in the documentation, I cannot access IndexWriter.DISABL

issue sorting results by string field

2008-01-15 Thread Dominique Béjean
Hi, I need to sort my search results by descending publication date. To do this, I added a field like this in all documents doc.add(new Field("pubdate", date, Field.Store.YES, Field.Index.UN_TOKENIZED)); Where date contains string formatted in this way “mmddhhmmss “ Searche

generate-maven-artifacts

2008-01-15 Thread Sergey Kabashnyuk
Hi all. I am trying to build Maven artifacts from tags/lucene_2_2_0 by calling "ant generate-maven-artifacts", but BUILD FAILED /java/src/lucene/svn/java/tags/lucene_2_2_0/build.xml:366: The following error occurred while executing this line: /java/src/lucene/svn/java/tags/lucene_2_2_0/common-bui

Re: How?

2008-01-15 Thread Erick Erickson
You really have to tell us more about what you're trying to do to get a meaningful reply. What do you mean you create the index on a table? Are you using some sort of embedded SQL to query the table and then create a Lucene index? How big is the index? What search are you submitting? What does your sea

IndexWriter.DISABLE_AUTO_FLUSH

2008-01-15 Thread Cam Bazz
Hello; Has IndexWriter.DISABLE_AUTO_FLUSH been deprecated? I am using Lucene core 2.2.0, and although it is in the documentation, I cannot access IndexWriter.DISABLE_AUTO_FLUSH. Best, C.B.

Re: lucene as a graph store

2008-01-15 Thread Cam Bazz
Usually, for implementing things like page rank, centrality metric calculations, or maybe Dijkstra's shortest path, this kind of (list of edges) graph does not give the best performance. I would like to use Lucene for simple operations like the neighbors of this node, or the 2nd-degree neighbors of this node. is

Re: spell checking for combined words

2008-01-15 Thread Karl Wettin
On 14 Jan 2008, at 19:47, solr_user wrote: Does the Lucene spell checker have the ability to suggest splitting of combined words? So, e.g., if I have the words "apple" and "computer" in my index and I type "applecomputer", how can I make it suggest "apple computer"? It would probably
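One common way to get the "applecomputer" to "apple computer" suggestion is a dictionary-driven word break: try every split point and keep the split whose pieces are all known terms. Below is a minimal dynamic-programming sketch that prefers the split with the fewest words. It is plain Java and not part of the Lucene SpellChecker; the dictionary (for instance, terms read from the index) and the `maxWordLength` bound are assumptions, and `WordBreaker` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Dictionary-based splitting of run-together words; hypothetical helper, not Lucene API. */
final class WordBreaker {
    private WordBreaker() {}

    /** Returns the split into the fewest dictionary words, or an empty list if there is none. */
    static List<String> split(String input, Set<String> dictionary, int maxWordLength) {
        int n = input.length();
        int[] best = new int[n + 1]; // best[i] = fewest words covering input[0, i)
        int[] prev = new int[n + 1]; // start index of the last word in that best split
        Arrays.fill(best, Integer.MAX_VALUE);
        best[0] = 0;
        for (int end = 1; end <= n; end++) {
            for (int start = Math.max(0, end - maxWordLength); start < end; start++) {
                if (best[start] != Integer.MAX_VALUE
                        && best[start] + 1 < best[end]
                        && dictionary.contains(input.substring(start, end))) {
                    best[end] = best[start] + 1;
                    prev[end] = start;
                }
            }
        }
        if (best[n] == Integer.MAX_VALUE) return List.of(); // no full cover exists
        List<String> words = new ArrayList<>();
        for (int end = n; end > 0; end = prev[end]) {
            words.add(0, input.substring(prev[end], end)); // walk back through the split
        }
        return words;
    }
}
```

Preferring the fewest words is one heuristic among several; ranking candidate splits by term frequency in the index would be a natural refinement.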

Re: Sort does not work for me

2008-01-15 Thread Aron Sogor
For the lucky person who hits the same problem, I found the issue: http://issues.apache.org/jira/browse/LUCENE-463 Lucene sees numbers in the field, thinks it is an int, and then overflows the int. Force the sort field to be SortField.STRING. Aron Sogor wrote: Let me qualify my question: Sort is not wor
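To see why a date-like term breaks when treated as an int, and why an explicit string sort works (as in the pubdate thread), here is a small plain-Java check. It does not reproduce Lucene's sort-type detection; it only shows that a 14-digit value such as "20080115120000" (an assumed example format) is outside the int range, and that fixed-width, zero-padded digit strings sort chronologically as plain strings. `PubdateTerms` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java illustration of int overflow vs. string ordering; hypothetical helper. */
final class PubdateTerms {
    private PubdateTerms() {}

    /** True if the decimal term fits in a long but not in an int (throws NumberFormatException if not a number). */
    static boolean overflowsInt(String term) {
        long value = Long.parseLong(term);
        return value > Integer.MAX_VALUE || value < Integer.MIN_VALUE;
    }

    /**
     * Newest first, comparing the raw strings in reverse order, like a reversed
     * string sort. Only correct when every value has the same width and zero padding.
     */
    static List<String> newestFirst(List<String> pubdates) {
        List<String> sorted = new ArrayList<>(pubdates);
        sorted.sort(Comparator.reverseOrder());
        return sorted;
    }
}
```

Integer.MAX_VALUE is 2147483647 (10 digits), so any 14-digit timestamp overflows, while the string comparison never needs to parse the value at all.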

Re: lucene as a graph store

2008-01-15 Thread Grant Ingersoll
I guess the question comes down to what kind of things are you going to do w/ this graph? How often are you updating links, etc? I can't say Lucene was designed for this kind of thing, but I am constantly amazed at what people use Lucene for, so I won't say it can't be done. I don't know

RE: Index merging and optimizing

2008-01-15 Thread spring
> But it also seems that the parallel/not parallel decision is > something you control on the back end, so I'm not sure the user > is involved in the merge question at all. In other words, you could > easily split the indexing task up amongst several machines and/or > processes and combine all the

Re: lucene as a graph store

2008-01-15 Thread Karl Wettin
On 15 Jan 2008, at 13:17, Cam Bazz wrote: Typically, when the number of objects in a B-tree-based structure (in an OODBMS, for example) increases, the search and add times also increase. Will Lucene have the same problem, and how can I overcome it if it does? There is a benchmark package in the contri

Re: Spell Check + Adding records

2008-01-15 Thread Karl Wettin
On 15 Jan 2008, at 07:02, rakeshxp wrote: Hello Everyone, Hi Rakesh, is there any way in which I can dynamically add records to the spell checker? (Reindexing every time is big overkill.) Start by getting the source code if you don't have it. It should not be a big deal, but it might t

lucene as a graph store

2008-01-15 Thread Cam Bazz
Hello; I would like to use Lucene as a graph store. The graph representation is a list of edges. Consider the code below: final int commitCount = 16 * 1024; final int numObj = 1024 * 1024; Analyzer analyzer = new KeywordAnalyzer(); FSDirectory directory = FSDirectory.g

Re: Cannot bind RMIMessenger exception:non-JRMP server at remote endpoint

2008-01-15 Thread linuxmasterjedi
Quoting Chris Hostetter <[EMAIL PROTECTED]>: > > : Trying config file at path /var/www/.lsearch.conf > : Trying config file at path /usr/local/search/ls2/lsearch.conf > : 0[main] INFO org.wikimedia.lsearch.util.UnicodeDecomposer - Loaded > unicode > : decomposer > : java.rmi.ConnectIOException:

Re: A question about IndexerReader.termPositions()

2008-01-15 Thread Grant Ingersoll
Wildcard "terms" get expanded by the rewrite() method on WildcardQuery to Term instances during processing. Thus, you would have to use the TermEnum that WildcardQuery uses in order to get the individual terms first; then you could get the term positions. -Grant On Jan 15, 2008, at 3:39 AM, T
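Grant's two-step approach (expand the wildcard into concrete terms, then fetch positions term by term) can be sketched over a hypothetical in-memory `term -> positions` map. This is not Lucene code: in Lucene the first step would go through a term enumeration and the second through `IndexReader.termPositions(Term)`. The sketch supports `*` (any run of characters) and `?` (exactly one character), matching Lucene's wildcard syntax; the class name `WildcardPositions` is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Two-step wildcard expansion over an in-memory term map; hypothetical, not Lucene API. */
final class WildcardPositions {
    private WildcardPositions() {}

    /** Translates '*' and '?' into a regex; every other character is matched literally. */
    static Pattern toRegex(String wildcard) {
        StringBuilder re = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') re.append(".*");
            else if (c == '?') re.append('.');
            else re.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(re.toString());
    }

    /** Step 1: the concrete terms the wildcard matches, in sorted term order. */
    static List<String> expand(SortedMap<String, List<Integer>> index, String wildcard) {
        Pattern p = toRegex(wildcard);
        List<String> terms = new ArrayList<>();
        for (String term : index.keySet()) {
            if (p.matcher(term).matches()) terms.add(term);
        }
        return terms;
    }

    /** Step 2: positions for each expanded term. */
    static Map<String, List<Integer>> positions(SortedMap<String, List<Integer>> index, String wildcard) {
        Map<String, List<Integer>> out = new TreeMap<>();
        for (String term : expand(index, wildcard)) out.put(term, index.get(term));
        return out;
    }
}
```

This scans every term for simplicity; a real term enumeration can begin at the literal prefix before the first wildcard character instead.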

A question about IndexerReader.termPositions()

2008-01-15 Thread Terry Yang
Hi all, Playing with an algorithm (Summarize/Highlight Based on Slide Windows), I found that IndexReader.termPositions(Term term) does not support wildcard terms. Is it meaningful or not to write a patch to support wildcard terms?

RE: Lucene sorting case-sensitive by default?

2008-01-15 Thread Toke Eskildsen
On Mon, 2008-01-14 at 10:58 -0500, Alex Wang wrote: > Toke, you mentioned "Using a Collator works but does take a fair amount > of memory", can you please elaborate a little more on that. Thanks. We have an index with 10 million records that takes up 37GB. Practically all records have a title, whi
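To make the case-sensitivity point concrete: plain `String` ordering compares UTF-16 code units, so "Zebra" sorts before "apple", whereas a `java.text.Collator` at SECONDARY strength ignores case differences. The sketch below precomputes one `CollationKey` per title, which trades memory for cheaper comparisons; that is offered only as one illustration of why collation over many titles can be memory-hungry, not as a description of how Lucene implements locale-based sorting. `TitleSort` is a hypothetical name.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Case-sensitive vs. collated title ordering; hypothetical helper, plain JDK. */
final class TitleSort {
    private TitleSort() {}

    /** Default String order: compares UTF-16 code units, so ASCII uppercase sorts before lowercase. */
    static List<String> codePointOrder(List<String> titles) {
        List<String> out = new ArrayList<>(titles);
        out.sort(Comparator.naturalOrder());
        return out;
    }

    /** Locale-aware order that ignores case, using one precomputed key per title. */
    static List<String> collatedOrder(List<String> titles, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.SECONDARY); // case differences are tertiary, so ignored
        List<CollationKey> keys = new ArrayList<>();
        for (String t : titles) keys.add(collator.getCollationKey(t)); // extra object per title
        keys.sort(Comparator.naturalOrder());
        List<String> out = new ArrayList<>();
        for (CollationKey k : keys) out.add(k.getSourceString());
        return out;
    }
}
```

For `["apple", "Zebra", "banana"]`, the first method gives `[Zebra, apple, banana]` and the second, with `Locale.ENGLISH`, gives `[apple, banana, Zebra]`.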