semantic vectors

2009-03-31 Thread nitin gopi
Hi all, I want to know everything about semantic vectors. I want to know how it indexes documents such that the results are semantically better than those of a normal search. I also want to know how it differs from the semantic web, which uses ontologies and metadata.

What is the right query syntax for matching some field's substring?

2009-03-31 Thread Bon
Hi all, I have a question about query syntax. There is a Lucene text field whose value looks like ,11,12,15,16, and I want to find documents whose field value includes one of the numbers I am looking for (11 or 15). How can I do that? I tried a query like
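
One way to think about the question above: if the field value is split on commas when it is indexed, each number becomes its own term, and a lookup for 11 or 15 is an exact term match rather than a substring match. The plain-Java sketch below only illustrates that idea. It does not use Lucene, and the class and method names (`CommaFieldMatch`, `tokens`, `containsAny`) are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

final class CommaFieldMatch {

    // Split a value such as ",11,12,15,16," into its non-empty tokens.
    static Set<String> tokens(String fieldValue) {
        Set<String> result = new LinkedHashSet<>();
        for (String part : fieldValue.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    // True if the field value contains at least one of the wanted tokens.
    static boolean containsAny(String fieldValue, String... wanted) {
        Set<String> present = tokens(fieldValue);
        return Arrays.stream(wanted).anyMatch(present::contains);
    }

    public static void main(String[] args) {
        // Field value from the question: contains both 11 and 15.
        System.out.println(containsAny(",11,12,15,16,", "11", "15")); // true
        // 111 and 150 contain the characters "11" and "15" but are different tokens.
        System.out.println(containsAny(",111,150,", "11", "15"));     // false
    }
}
```

The second call shows why token matching differs from a substring test: 111 contains the characters 11 but is not the token 11.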

Re: IndexWriter.deleteDocuments(Query query)

2009-03-31 Thread John Wang
So do you think it is a good addition/change to the current api now? -John On Tue, Mar 31, 2009 at 2:18 PM, Yonik Seeley wrote: > On Tue, Mar 31, 2009 at 4:58 PM, John Wang wrote: > > I fail to see the difference of exposing the api to allow for a Query > > instance to be passed in vs a DocIdSe

Re: API to get index info

2009-03-31 Thread John Wang
Excellent! Thanks -John On Tue, Mar 31, 2009 at 2:25 PM, Yonik Seeley wrote: > On Tue, Mar 31, 2009 at 4:55 PM, John Wang wrote: > > Maybe I am missing something. I don't see any calls that would gimme the > > number of segments. Are you suggesting: > IndexCommit.getFileNames().size()? > > Ind

Re: API to get index info

2009-03-31 Thread Yonik Seeley
On Tue, Mar 31, 2009 at 4:55 PM, John Wang wrote: > Maybe I am missing something. I don't see any calls that would gimme the > number of segments. Are you suggesting: IndexCommit.getFileNames().size()? IndexReader.getSequentialSubReaders().length The stats page of Solr now displays the number of

Re: IndexWriter.deleteDocuments(Query query)

2009-03-31 Thread Yonik Seeley
On Tue, Mar 31, 2009 at 4:58 PM, John Wang wrote: > I fail to see the difference of exposing the api to allow for a Query > instance to be passed in vs a DocIdSet. I was commenting specifically on your idea to allow deletion by int[] (docids) on the IndexWriter. DocIdSet is a different issue - i

Re: IndexWriter.deleteDocuments(Query query)

2009-03-31 Thread John Wang
I fail to see the difference of exposing the api to allow for a Query instance to be passed in vs a DocIdSet. In this specific case, Query is essentially a factory to produce a DocIdSetIterator (or Scorer). Isn't that what DocIdSet is? Thanks -John On Tue, Mar 31, 2009 at 12:57 PM, Yonik Seeley wrot

Re: API to get index info

2009-03-31 Thread John Wang
Maybe I am missing something. I don't see any calls that would gimme the number of segments. Are you suggesting: IndexCommit.getFileNames().size()? Thanks -John On Tue, Mar 31, 2009 at 1:04 PM, Yonik Seeley wrote: > On Tue, Mar 31, 2009 at 3:43 PM, John Wang wrote: > > Can we have an API that

Re: API to get index info

2009-03-31 Thread Yonik Seeley
On Tue, Mar 31, 2009 at 3:43 PM, John Wang wrote: > Can we have an API that exposes index information, e.g. number of segments > etc.? Should already all be obtainable via public access: IndexReader.getSequentialSubReaders() and IndexReader.getIndexCommit() -Yonik http://www.lucidimagination.co

Re: IndexWriter.deleteDocuments(Query query)

2009-03-31 Thread Yonik Seeley
On Tue, Mar 31, 2009 at 3:41 PM, John Wang wrote: > Also, can we expose  IndexWriter.deleteDocuments(int[] docids)? Exposing internal ids from the IndexWriter may not be a good idea given that they are transient. -Yonik http://www.lucidimagination.com --

API to get index info

2009-03-31 Thread John Wang
Can we have an API that exposes index information, e.g. number of segments etc.? (or simply make SegmentInfo(s) public classes) We currently do this by working around package-level protection by sneaking in a subclass in the org.apache.index package. We are moving towards OSGi, and split-packages

IndexWriter.deleteDocuments(Query query)

2009-03-31 Thread John Wang
Hi guys: The IndexWriter.deleteDocuments(Query query) API does not really make sense to me. Wouldn't IndexWriter.deleteDocuments(DocIdSet set) be better, since we don't really care about scoring for this call? Also, can we expose IndexWriter.deleteDocuments(int[] docids)? Using the current api is

Re: Creating lucene index from databases

2009-03-31 Thread Chris Lu
kranthi, Maybe you should use DBSight Lite to get started and get familiar with Lucene features. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight.com

Re: Lucene index sizes and performance

2009-03-31 Thread sunnyfr
Hi Chris, Just 10-15% of the index size for memory: how does that work? Does it just look in each merged segment? Is that why it gets slower when I commit? Thanks chrislusf wrote: > > Not really suggestion but some points to consider. > (a) Greatly depending on your hardware, espe

Re: Empty SinkTokenizer

2009-03-31 Thread Raymond Balmès
Well I wanted an order because in my first analysis I'm collecting terms which I put in a 2nd field. I can live with whatever order (creation or alpha) I just needed to know and also was wondering why it is that way, looks to me as an extra complication. -Raymond- On Tue, Mar 31, 2009 at 3:24 PM,

Re: Empty SinkTokenizer

2009-03-31 Thread Grant Ingersoll
I might add that I don't know that we explicitly ever declare they must be in order, but it has always been my understanding that they should be and I confirm this by several conversations in the past: http://www.lucidimagination.com/search/document/274ec8c1c56fdd54/order_of_field_objects_with

Re: Empty SinkTokenizer

2009-03-31 Thread Grant Ingersoll
I'm going to bring this over to java-dev. -Grant On Mar 30, 2009, at 11:34 AM, Raymond Balmès wrote: lucene 2.4.0 On Mon, Mar 30, 2009 at 2:18 PM, Grant Ingersoll wrote: On Mar 30, 2009, at 4:42 AM, Raymond Balmès wrote: I found out that the fields are processed in alpha order... an

Re: Lucene Index dump into Solr Index folder

2009-03-31 Thread Grant Ingersoll
You need to create a schema.xml for your index that describes the index, etc. The example schema in Solr likely does not fit your needs. I'd also suggest asking on solr-user, as you may get more info there. -Grant On Mar 31, 2009, at 12:47 AM, Allahbaksh Mohammedali Asadullah wrote: Hi

RE: Creating lucene index from databases

2009-03-31 Thread Allahbaksh Mohammedali Asadullah
Hi, You can use LuSql; it is very handy if you already have data in the database. http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql Regards, Allahbaksh Allahbaksh Mohammedali Asadullah, http://allahbaksh.blogspot.com Starting a startup is hard, but having a 9 to 5 job is hard too,

Creating lucene index from databases

2009-03-31 Thread kranthi reddy
Hi all, I am new to Lucene. I want to build a search engine. The entire content I want to search is stored in a MySQL database. Is it possible to use the content in a SQL database to build an index using Lucene? If it is possible, please give a few tips on how it can be done. Thank you fo
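
Conceptually, indexing database content means reading each row, splitting its text into terms, and recording which rows contain each term. The sketch below shows that flow in plain Java, with an in-memory map standing in for the result of a SQL query. It is neither Lucene nor JDBC code, and the names (`RowIndexSketch`, `buildIndex`, the sample rows) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class RowIndexSketch {

    // Build a term -> row-id map from rows given as (primary key -> text column).
    static Map<String, Set<Integer>> buildIndex(Map<Integer, String> rows) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, String> row : rows.entrySet()) {
            // Lowercase the text and split on runs of non-word characters.
            String[] terms = row.getValue().toLowerCase(Locale.ROOT).split("\\W+");
            for (String term : terms) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, t -> new TreeSet<>()).add(row.getKey());
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Stand-in for rows fetched from a database table (assumed sample data).
        Map<Integer, String> rows = new LinkedHashMap<>();
        rows.put(1, "Lucene is a search library");
        rows.put(2, "MySQL stores the content to search");

        Map<String, Set<Integer>> index = buildIndex(rows);
        System.out.println(index.get("search")); // [1, 2]
        System.out.println(index.get("lucene")); // [1]
    }
}
```

Keeping the primary key alongside each term is the part that carries over to a real setup: a search hit can then be traced back to the database row it came from.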