Hi Joaquin,

I'm very interested to know the indexing performance inside the Oracle JVM, especially with large amounts of data.

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes

On 9/14/07, Marcelo Ochoa <[EMAIL PROTECTED]> wrote:
>
> From: J. Delgado <[EMAIL PROTECTED]>
> Date: Sep 13, 2007 7:27 PM
> Subject: Oracle-Lucene integration (OJVMDirectory and Lucene Domain
> Index) - LONG
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
>
> I'm very happy to announce the partial rework and extension of
> LUCENE-724 (Oracle-Lucene Integration), primarily based on new
> requirements from LendingClub.com, who commissioned the work from
> Marcelo Ochoa, the contributor of the original patch (great job
> Marcelo!). As LendingClub.com's contribution to the Lucene community,
> we have posted the code on a public CVS (SourceForge) as explained
> below.
>
> Here at Lending Club (www.lendingclub.com) we have very specific
> needs regarding the indexing of both structured and unstructured data,
> most of it transactional in nature and sitting in our Oracle 10gR2 DB,
> with a highly complex schema. Our "ranking" of loans in the inventory
> includes components of exact, textual and hardcore mathematical
> calculations including time, amount and spatial constraints. This
> integration of Lucene into Oracle as a Domain Index will now allow us
> to query this inventory in real time. Going against the Lucene index,
> created on "synthetic documents" composed of fields populated
> from diverse tables (user data store), eliminates the need to create
> very complex joins to link data from different tables at query time.
> This, along with the support of the full Lucene query language, makes
> this a great alternative to:
>
> - Using Lucene outside the database, which requires "crawling" the data
>   and storing the index outside the database, losing all the benefits
>   of a fully transactional system and a secure environment.
>
> - Using Oracle Text, which is very powerful but lacks the extensibility
>   and flexibility that Lucene offers (for example, being able to query
>   the index directly from the Java layer or implementing our own ranking
>   algorithm), though to be completely fair some of it is addressed in
>   the new Oracle DB 11g version.
>
> If anyone is interested in learning more about how we are going to use
> this within Lending Club, please drop me a line.
>
> BTW, please make sure you check us out: "Lending Club
> (http://www.lendingclub.com/), the rapidly growing people-to-people
> (P2P) lending service that launched as a Facebook application in May
> 2007, today announced the public availability of its services with the
> launch of LendingClub.com. Lending Club connects lenders and borrowers
> based upon shared affinities, enabling them to bypass banks to secure
> better interest rates on loans"... more about the announcement here:
> http://www.sys-con.com/read/428678.htm. We have seen many entrepreneurs
> applying for loans and being helped by regular people to build their
> business with the money obtained at very low interest.
>
> OK, without further marketing stuff (sorry for that), here is the
> original note sent to me by Marcelo that summarizes all the cool new
> functionality:
>
> OJVMDirectory, a Lucene integration running inside the Oracle JVM, is
> going one step further.
>
> This new release includes:
>
> - Synchronized with the latest Lucene 2.2.0 production release.
> - Replaced the in-memory storage, which used a Vector-based
>   implementation, with direct BLOB IO, reducing memory usage for
>   large indexes.
> - Support for user data stores: you are no longer limited to indexing
>   one column at a time (a limit of the Data Cartridge API on 10g); you
>   can now index multiple columns of the base table plus columns of
>   related tables joined together.
> - User data stores can be customized by the user: by writing a simple
>   Java class, users can control which columns are indexed, the padding
>   used, or any other processing done before the document is added.
> - There is a DefaultUserDataStore which gets all columns of the query
>   and builds a Lucene Document with Fields representing each database
>   column; these fields are automatically padded if they hold NUMBER
>   data or rounded if they hold DATE data, for example.
> - The lcontains() SQL operator supports the full Lucene QueryParser
>   syntax to provide access to all indexed columns; see examples below.
> - Support for the DOMAIN_INDEX_SORT and FIRST_ROWS hints: if you want
>   rows ordered by the lscore() operator (ascending or descending), the
>   optimizer hint will assume that the Lucene Domain Index returns
>   rowids in the proper order, avoiding an inline view to sort them.
> - Automatic index synchronization using AQ callbacks.
> - The Lucene Domain Index creates an extra table named IndexName$T and
>   an Oracle AQ named IndexName$Q, with its storage table IndexName$QT,
>   in the user's schema, so you can alter the storage preferences if
>   you want.
> - The ojvm project is in SourceForge.net CVS, so anybody can get it and
>   collaborate ;)
> - Tested against 10gR2 and 11g databases.
>
> Some sample usages:
>
> create table t2 (
>   f4 number primary key,
>   f5 VARCHAR2(200));
> create table t1 (
>   f1 number,
>   f2 CLOB,
>   f3 number,
>   CONSTRAINT t1_t2_fk FOREIGN KEY (f3)
>     REFERENCES t2(f4) ON DELETE cascade);
> create index it1 on t1(f3) indextype is lucene.LuceneIndex
>   parameters('Analyzer:org.apache.lucene.analysis.SimpleAnalyzer;ExtraCols:f2');
>
> alter index it1
>   parameters('ExtraCols:f2,t2.f5;ExtraTabs:t2;WhereCondition:t1.f3=t2.f4;DecimalFormat:000');
>
> The Lucene domain index will store columns f2 and f3 of table t1 plus
> f5 of table t2.
>
> So you can then query with:
>
> select lscore(1),f2 from t1 where lcontains(f3, 'f2:test',1) > 0;
>
> or
>
> select lscore(1),f2 from t1
>   where lcontains(f3, 'f2:test and f3:[001 to 200]',1) > 0;
>
> select /*+ DOMAIN_INDEX_SORT */ lscore(1),f2,t2.f5
>   from t1,t2
>   where lcontains(f3, 'f2:test1 and f3:[001 to 200] and t2.f5:test2',1) > 0
>   and t1.f3=t2.f4
>   order by lscore(1) asc;
>
> In the last example, Oracle's optimizer will assume that the Lucene
> Domain Index first resolves the set of rowids matching "f2:test1 and
> f3:[001 to 200] and t2.f5:test2", then accesses table t1 by index
> rowid and performs the join with t2.
>
> More examples and information can be found at:
>
> http://dbprism.cvs.sourceforge.net/dbprism/ojvm/Readme.txt?revision=1.10&view=markup
>
> --
> Marcelo F. Ochoa
> http://marcelo.ochoa.googlepages.com/home
>
> Cheers!
>
> Joaquin Delgado, PhD
> CTO, Lending Club
> www.lendingclub.com
>
> --
> Marcelo F. Ochoa
> http://marceloochoa.blogspot.com/
> http://marcelo.ochoa.googlepages.com/home
> ______________
> Do you know DBPrism? Look @ DB Prism's Web Site
> http://www.dbprism.com.ar/index.html
> More info?
> Chapter 17 of the book "Programming the Oracle Database using Java &
> Web Services"
> http://www.amazon.com/gp/product/1555583296/
> Chapter 21 of the book "Professional XML Databases" - Wrox Press
> http://www.amazon.com/gp/product/1861003587/
> Chapter 8 of the book "Oracle & Open Source" - O'Reilly
> http://www.oreilly.com/catalog/oracleopen/
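
A side note on the DecimalFormat:000 parameter and the f3:[001 to 200] range in the quoted examples: as far as I know, Lucene 2.2 compares the terms of a range query as plain strings, so numeric values only fall into the expected range when they are padded to a fixed width, which is presumably why the default user data store pads NUMBER columns. The standalone Java sketch below illustrates that idea only. It is not code from the ojvm project; the class and method names (PaddedKeys, pad, inRange) are made up for illustration, and the string comparison is a stand-in for how an inclusive term range behaves.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class PaddedKeys {
    // Hypothetical helper: formats a number with a fixed-width pattern
    // such as "000", so that 7 becomes "007". Locale.ROOT keeps the digits
    // independent of the machine's default locale.
    static String pad(long value, String pattern) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.ROOT);
        return new DecimalFormat(pattern, symbols).format(value);
    }

    // Hypothetical helper: true when lower <= term <= upper under plain
    // string ordering, used here as a stand-in for an inclusive term range.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        long[] values = {7, 42, 150, 300};
        for (long v : values) {
            String raw = Long.toString(v);
            String padded = pad(v, "000");
            // Compare the same value as an unpadded and as a padded term.
            System.out.println(raw + " in [1, 200] as text: " + inRange(raw, "1", "200")
                + " | " + padded + " in [001, 200] as text: " + inRange(padded, "001", "200"));
        }
    }
}
```

Run as is, the unpadded terms "7" and "42" fall outside the textual range "1" to "200" even though the numbers are within 1 to 200, while their padded forms "007" and "042" fall inside "001" to "200". The pattern width has to cover the largest value you expect to index, otherwise the ordering breaks again.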