ANN: Oracle Lucene Domain Index 3.0.2 released

2010-09-15 Thread Marcelo Ochoa
Just a few words to announce a new release (http://sourceforge.net/projects/dbprism/files/odi/3.0.2.1.0/) of Oracle Lucene Domain Index (http://docs.google.com/View?docid=ddgw7sjp_569gf8c7cd8). This zip is valid for 10g and 11g database versions (10g using classes back-ported from 1.5 to 1.4). This r

Re: NAS vs SAN vs Server Disk RAID

2010-02-26 Thread Marcelo Ochoa
Hi Ian: Just out of curiosity ;) Which distributed file system are you using on top of your NAS storage? Best regards, Marcelo. On Thu, Feb 25, 2010 at 6:54 AM, Ian Lea wrote: > We've run lucene on NAS, although not with indexes anything like as > large as 1Tb, and gave up because NFS and lucen

Re: If you could have one feature in Lucene...

2010-02-24 Thread Marcelo Ochoa
> What would it be? An extended query parser syntax (http://lucene.apache.org/java/2_9_1/queryparsersyntax.html) including geo-location search. For example: hsin (great circle): name:Minneapolis AND _val_:"recip(hsin(0.78, -1.6, lat_rad, lon_rad, 3963.205), 1, 1
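For readers unfamiliar with the great-circle function mentioned above, here is a minimal Java sketch of a haversine distance with inputs in radians, plus a reciprocal scoring helper of the form a / (m * x + b). The class name, method signatures, argument order, and the second point in `main` are assumptions made for illustration only; they are not the query parser syntax from the message, whose remaining arguments are cut off.

```java
public final class GreatCircle {
    /** Earth radius in miles, the same constant that appears in the hsin call above. */
    public static final double EARTH_RADIUS_MILES = 3963.205;

    /** Haversine distance between two points whose coordinates are given in radians. */
    public static double hsin(double lat1, double lon1, double lat2, double lon2, double radius) {
        double sinLat = Math.sin((lat2 - lat1) / 2.0);
        double sinLon = Math.sin((lon2 - lon1) / 2.0);
        double a = sinLat * sinLat + Math.cos(lat1) * Math.cos(lat2) * sinLon * sinLon;
        // Clamp to 1.0 so rounding error cannot push asin outside its domain.
        return 2.0 * radius * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Reciprocal scoring a / (m * x + b): closer points get a higher score. */
    public static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        // The second point and the recip constants are arbitrary values for the demo.
        double miles = hsin(0.78, -1.6, 0.79, -1.61, EARTH_RADIUS_MILES);
        System.out.println("distance in miles: " + miles);
        System.out.println("score: " + recip(miles, 1, 1, 1));
    }
}
```

Wrapping the distance in a reciprocal turns "nearer" into "higher score", which is why the two functions appear together in a relevance expression.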

Re: Performance tips when creating a large index from database.

2009-10-22 Thread Marcelo Ochoa
Hi Paul: Most of the time spent indexing big tables goes to the full table scan and network data transfer. Please take a quick look at my OOW08 presentation about Oracle Lucene integration: http://docs.google.com/present/view?id=ddgw7sjp_156gf9hczxv especially slides 13 and 14 wh

Re: JDBC access to a Lucene index

2009-10-19 Thread Marcelo Ochoa
Hi Zukka: This is a similar approach to Lucene Domain Index: http://docs.google.com/Doc?id=ddgw7sjp_54fgj9kg But Lucene Domain Index is a specific implementation for Oracle Databases 10g/11g which is integrated through the ODCI API and replaces the Lucene file system storage with a BLOB storag

ANN: New release of Lucene Domain Index for Oracle

2009-09-29 Thread Marcelo Ochoa
Hi All: A new binary distribution of Lucene Domain Index (2.9.0.1.0) for Oracle 10g/11g has been released. Lucene Domain Index is an integration of the Lucene project running inside the Oracle JVM and integrated into the SQL layer by adding a new index type. This new version uses the latest Lucene 2.9.

Re: Lucene 2.9 RC2 now available for testing

2009-09-07 Thread Marcelo Ochoa
Hi All: I have already integrated Lucene 2.9RC2 with Lucene Domain Index: http://docs.google.com/Doc?id=ddgw7sjp_54fgj9kg As usual, a new Lucene version makes for a faster product :) All my internal tests run OK and I only need to re-test on a 10g database. Once Lucene 2.9 is ready for produ

Re: SpatialQuery for location based search using Lucene

2009-06-27 Thread Marcelo Ochoa
Hi: Do you know the Local Lucene extension? http://sourceforge.net/projects/locallucene/ Some tests are similar to your example: http://locallucene.svn.sourceforge.net/viewvc/locallucene/trunk/locallucene/src/java/com/pjaol/search/test/UnitTests/ Best regards, Marcelo. On Sat, Jun 27, 2009 at 5:

Re: [ANN] Luke 0.9.2 release

2009-03-20 Thread Marcelo Ochoa
Hi Andrzej: > > If you tried to access this url during last couple hours the site was down. > It should be up again - apparently I went over the allocated bandwidth and > the hosting company disabled the site without any warning or even > notification. It's time to look for a better home for Luke

Re: Merging database index with fulltext index

2009-03-02 Thread Marcelo Ochoa
Hi: The key to avoiding bad performance when merging a database result is to reduce the number of rows visited by your first query. As an example, take a look at these two queries using Lucene Domain Index; the two are equivalent: Option A: select * from (select rownum as ntop_pos,q.* fro

Re: Faceted Search using Lucene

2009-02-22 Thread Marcelo Ochoa
Hi Amin: Please take a look at this blog post: http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html Best regards, Marcelo. On Sun, Feb 22, 2009 at 1:18 PM, Amin Mohammed-Coleman wrote: > Hi > > Sorry to re send this email but I was wondering if I could get some advice > o

Re: java.io.IOException: read past EOF non-corrupt index

2009-01-07 Thread Marcelo Ochoa
Hi: Could you try opening the index using Luke, but with the JDK bundled with the Oracle DB? I mean, try to use Luke as a standalone application on the same machine but outside the OJVM, using the JDK at: $ORACLE_HOME/jdk which was used to compile most of the classes running inside the OJV

Re: Any Spanish analyzer available?

2008-10-24 Thread Marcelo Ochoa
Zhang: I have written a simple SpanishAnalyzer for the Lucene Domain Index test suites, which index Spanish WikiPedia dumps. This simple analyzer has a list of stop words and is faster than SnowballAnalyzer, which also performs stemming. You can get the code using CVS from SourceForge.net servers or

Re: Lucene vs. Database

2008-10-01 Thread Marcelo Ochoa
Mathieu: > Crawling a DB is not a good idea. Indexing while writing/deleting is > clever. These operations also consume network traffic in architectures like Solr WS. Also there is a waste of network traffic when a query is filtered against relational data (slides 15 and 18 of Google presentati

Re: Lucene vs. Database

2008-10-01 Thread Marcelo Ochoa
Hi Zoran: One of the biggest issues with Lucene DB integration is the network traffic consumed as a consequence of indexing or updating operations, apart from transactionality, which can be relaxed in some applications. During our Oracle Open World presentation we presented some of these issues comp

Re: Caused by: java.io.IOException: read past EOF on Slave

2008-09-30 Thread Marcelo Ochoa
your previous > Input/Output implementations, which worked with Lucene 2.3, work win 2.4? > It's kinda spooky. > > Mike > > Marcelo Ochoa wrote: > >> Michael: >> I have OJVMDirectory working with 2.4rc2 code base. >> I have refactored Output and Input streams cl

Re: Caused by: java.io.IOException: read past EOF on Slave

2008-09-30 Thread Marcelo Ochoa
s odd to me that there would be a bug in your Directory > implementation that 2.3 didn't tickle but 2.4 did. > > Can you dump the index, as stored in BLOB columns, out into the filesystem, > and run CheckIndex on it? (Or maybe run CheckIndex within Oracle). > > Mike > >

Re: Caused by: java.io.IOException: read past EOF on Slave

2008-09-26 Thread Marcelo Ochoa
mat in 2.4 to use "true" UTF8 encoding for all > text content; not sure that this applies here (to BufferedIndexReader it's > all bytes) but it may. > > BufferedIndexReader in general can do random IO, especially when reading the > term dict file (*.tis), when you > &g

Re: Caused by: java.io.IOException: read past EOF on Slave

2008-09-26 Thread Marcelo Ochoa
Michael: I just started testing 2.4rc2 running inside OJVM. I found a similar stack trace during indexing: IW 3 [Root Thread]: flush: segment=_3 docStoreSegment=_3 docStoreOffset=0 flushDocs=true flushDeletes=true flushDocStores=false numDocs=2 numBufDelTerms=2 IW 3 [Root Thread]: index before

Re: Newbie question: using Lucene to index hierarchical information.

2008-09-10 Thread Marcelo Ochoa
Hi Leonid: If you are not familiar with Oracle XMLDB schema mappings, here is an example of how to store WikiPedia XML dumps in an Oracle database, but using an XML-to-relational model: http://marceloochoa.blogspot.com/2007/12/uploading-wikipedia-dumps-to-oracle.html The structure of WikiPedia dumps s

Re: MoreLikeThis return no results

2008-09-01 Thread Marcelo Ochoa
Hi Dave: The MoreLikeThis object has two parameters that control its behavior: mlt.setMinTermFreq(minTermFreq.intValue()); mlt.setMinDocFreq(minDocFreq.intValue()); By default MinTermFreq is 2, so if your document has no terms with a frequency of at least 2, it will return a query with
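To make the effect of a minimum term frequency concrete, here is a small plain-Java sketch that does not use the Lucene API. It only illustrates the idea that terms occurring fewer than `minTermFreq` times in a document are discarded, so a short document can end up with no terms at all. The class name, method name, and whitespace tokenization are assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public final class MinTermFreqSketch {
    /** Returns the terms of the text that occur at least minTermFreq times. */
    public static Set<String> interestingTerms(String text, int minTermFreq) {
        // Count how often each whitespace-separated, lower-cased token occurs.
        Map<String, Integer> freq = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                freq.merge(token, 1, Integer::sum);
            }
        }
        // Keep only the tokens that reach the threshold.
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            if (e.getValue() >= minTermFreq) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Every term occurs once, so nothing survives a threshold of 2.
        System.out.println(interestingTerms("lucene inside oracle", 2));
        // Lowering the threshold to 1 keeps all three terms.
        System.out.println(interestingTerms("lucene inside oracle", 1));
    }
}
```

With short documents, lowering the threshold to 1 is the usual way to get a non-empty set of terms.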

Re: Lucene Indexing DB records?

2008-08-22 Thread Marcelo Ochoa
> Actually there are many projects for Lucene + Database. Here is a list I > know: > > * Hibernate Search > * Compass, (also Hibernate + Lucene) > * Solr + DataImportHandler (Searching + Crawler) > * DBSight, (Specific for database, closed source, but very customizable, > easy to setup) > * Browse

Re: Using lucene as a database... good idea or bad idea?

2008-07-30 Thread Marcelo Ochoa
Hi John: Have you tried, or do you know, Lucene Domain Index for Oracle databases? http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html If you are using Oracle 10g/11g, it is completely integrated into the Oracle memory space, like Oracle Text but based on Lucene. No network round trip i

ANN: A Lucene-OJVM native REST WS example

2008-07-15 Thread Marcelo Ochoa
Hi all: For people who are using Lucene Oracle integration project: http://marceloochoa.blogspot.com/2008/07/lucene-ojvm-native-rest-ws.html Best regards, Marcelo. -- Marcelo F. Ochoa http://marceloochoa.blogspot.com/ http://marcelo.ochoa.googlepages.com/home __ Do you Know DBPris

OT: Oracle-Lucene integration at Oracle Open World 08

2008-06-06 Thread Marcelo Ochoa
Hi: Oracle-Lucene integration will have a place at Oracle Open World 08 (Oracle Develop - San Francisco Marriott). My friend Kuassi Mensah and I will present the architecture as an example of a database-intensive solution. Here is the link to the content catalogue: http://www28.cplan.com/cc208/

Re: Typical Indexing performance

2008-06-03 Thread Marcelo Ochoa
Hi: Here is my latest test of Oracle-Lucene integration (Lucene 2.3.2 binary dist. / Oracle 11g): http://marceloochoa.blogspot.com/2008/06/new-binary-release-of-lucene-oracle.html Tested against Spanish Wikipedia dumps and using Wikipedia Analyzer/Tokenizer. There are independent times for upl

ANN: New release Lucene-Oracle integration

2008-06-01 Thread Marcelo Ochoa
Hi All: I am just releasing a new binary distribution of Oracle-Lucene integration using the Lucene-OJVM Data Cartridge. Here is the change log: * Compiled against Lucene 2.3.2 production release * Used latest API for merging based on RAM usage * Use Writer for deleting during Sync * Confirm 4x impr

Re: Opening an index directory inside a jar

2008-05-30 Thread Marcelo Ochoa
Hi Ravi: I am not a Lucene guru, but IMO you have to write a new Directory class which opens the jar and provides access to Lucene. Maybe a subclass of FSDirectory will work, but only for read-only behaviour. I have written a similar set of classes to implement Lucene storage inside the Oracle JVM using B

Re: Help to solve an issue when upgrading Lucene-Oracle integration to lucene 2.3.1

2008-05-07 Thread Marcelo Ochoa
Hi Michael: First, thanks a lot for your time. See comments below. > Is there any way to capture & serialize the actual documents being > added (this way I can "replay" those docs to reproduce it)? Documents are a VARCHAR2 column from Oracle's all_source system view; in fact it is a table like: c

Re: Help to solve an issue when upgrading Lucene-Oracle integration to lucene 2.3.1

2008-05-06 Thread Marcelo Ochoa
gards, Marcelo. On Tue, May 6, 2008 at 7:00 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Hi Marcelo, > > Hmmm something is not right. > > Somehow the byte slices, which DocumentsWriter uses to hold the postings in > RAM, became corrupt. > > Is this eas

Help to solve an issue when upgrading Lucene-Oracle integration to lucene 2.3.1

2008-05-06 Thread Marcelo Ochoa
Hi Lucene experts: I am working on upgrading the Lucene-Oracle integration project to the latest Lucene 2.3.1 code. After correcting a minor issue in the OJVMDirectory file implementation I have the integration running with the latest 2.3.1 code. But it only works with small indexes, I think indexes which are lower

Re: hybrid query (lucene + db)

2008-05-02 Thread Marcelo Ochoa
Hi Stéphane: If you are using Oracle Spatial I assume that you are using Oracle for storing text too :) Have you taken a look at the Oracle-Lucene integration project sponsored by LendingClub.com? http://docs.google.com/Doc?id=ddgw7sjp_54fgj9kg http://sourceforge.net/project/showfiles.php?group_id=5

New binary distribution of Oracle-Lucene integration

2008-04-07 Thread Marcelo Ochoa
Hi all: I just released a new version of Oracle-Lucene integration implemented as a Domain Index. The binary distribution has a very straightforward installation and testing step; downloads are at the SF.net web site: http://sourceforge.net/project/showfiles.php?group_id=56183&package_id=255524&releas

Re: Lucene+Oracle Integration

2008-02-14 Thread Marcelo Ochoa
Hi Mitesh: Lucene-OJVM integration is not tested against lucene-2.3.0 version. I'll do it ASAP. Best regards, Marcelo. On Thu, Feb 14, 2008 at 10:01 AM, Mitesh Soni <[EMAIL PROTECTED]> wrote: > > > > > I have run the build file in the lucene-2.3.0\contrib\ojvm successfully. But > I cannot cr

Re: Indexing Wikipedia dumps

2007-12-18 Thread Marcelo Ochoa
Hi All: Just to add a simple hack, I have posted on my blog an entry named "Uploading WikiPedia Dumps to Oracle databases": http://marceloochoa.blogspot.com/2007_12_01_archive.html with instructions to upload WikiPedia dumps to Oracle XMLDB, which means transforming an XML file to an object-relationa

Fwd: Oracle-Lucene Domain Index (New Release)

2007-12-13 Thread Marcelo Ochoa
ion of text indexing and search using Lucene within the Oracle relational database. Many thanks to Marcelo Ochoa, the developer that made it all happen! Among the goodies you will find in the new release are: * LuceneDomainIndex.countHits() function to replace select count from .. where lcontains

Re: Lucene jdbc

2007-11-26 Thread Marcelo Ochoa
Mike: If you work with Oracle databases you can take a look at Oracle Lucene integration. http://www.infoq.com/news/2007/10/lucene-oracle http://issues.apache.org/jira/browse/LUCENE-724 By using OJVMDirectory you have Lucene integrated at the Oracle Engine as a new Domain Index, so you can use

Re: Oracle-Lucene integration (OJVMDirectory and Lucene Domain Index) - LONG

2007-09-20 Thread Marcelo Ochoa
Hi Chris: First, sorry for the delay :( I have some preliminary performance tests using Oracle 11g running in a VMware virtual machine with a 400Mb SGA (the virtual machine uses 812Mb RAM for Oracle Enterprise Linux 4.0). This virtual machine is hosted on modest hardware, a Pentium IV 2.18Ghz wit

Fwd: Oracle-Lucene integration (OJVMDirectory and Lucene Domain Index) - LONG

2007-09-14 Thread Marcelo Ochoa
Integration), primarily based on new requirements from LendingClub.com, who commissioned the work to Marcelo Ochoa, the contributor of the original patch (great job Marcelo!). As contribution of LendingClub.com to the Lucene community we have posted the code on a public CVS (sourceforge) as explained belo

Re: Performance of DbDirectory

2007-06-13 Thread Marcelo Ochoa
Hi Simon: If you are using Oracle 10g you can take advantage of running Lucene inside the Oracle JVM. Look at this contribution: http://issues.apache.org/jira/browse/LUCENE-724 The most important difference of this implementation is that all the operations between Lucene to index table data

Does someone have compared NFS versus OCFS2 in a Lucene grid installation?

2007-04-30 Thread Marcelo Ochoa
Hi All: Has anyone compared NFS versus OCFS2 in a Lucene grid installation? The Oracle Cluster Filesystem 2 has shipped by default since Linux kernel 2.6.16-rc1+. OCFS2 is a cluster-optimized file system used by the Oracle RAC configuration (http://oss.oracle.com/projects/ocfs2/). One o

Re: Lucene Index of Oracle RDBMS

2007-01-08 Thread Marcelo Ochoa
ecognizable companies/agencies where this has been done before. Any additional suggestions for where I can search for this information would be appreciated. Again thanks for your help, V/R, Max Aronin -Original Message- From: Marcelo Ochoa [mailto:[EMAIL PROTECTED] Sent: Monday, January

Re: Lucene Index of Oracle RDBMS

2007-01-08 Thread Marcelo Ochoa
Hi Max: I am working on an Oracle-Lucene integration, see patch: http://issues.apache.org/jira/browse/LUCENE-724 Today I'll upload the latest development release, which includes several performance enhancements and new methods to integrate with the Oracle Data Cartridge API without using data informa

Re: efficient ways of updating document

2007-01-05 Thread Marcelo Ochoa
John: I have implemented a batch (delete, insert, update) operation for the Oracle Lucene Domain Index using OJVMDirectory, see patch: http://issues.apache.org/jira/browse/LUCENE-724 The strategy used in this solution is to enqueue all operations on tables which have a column indexed by Lucene
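The enqueue-then-apply strategy described above can be sketched in plain Java as follows. This is not the code from the LUCENE-724 patch: the class name, the `Op` enum, and the in-memory map standing in for the Lucene index are all hypothetical, chosen only to show changes being recorded by rowid and applied later in one batch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public final class SyncQueueSketch {
    public enum Op { INSERT, UPDATE, DELETE }

    /** One pending change, keyed by the rowid of the affected table row. */
    private static final class Change {
        final Op op;
        final String rowid;
        final String text;

        Change(Op op, String rowid, String text) {
            this.op = op;
            this.rowid = rowid;
            this.text = text;
        }
    }

    private final Deque<Change> pending = new ArrayDeque<>();
    // Stand-in for the real index: rowid -> indexed text.
    private final Map<String, String> index = new HashMap<>();

    /** Records a change without touching the index. */
    public void enqueue(Op op, String rowid, String text) {
        pending.addLast(new Change(op, rowid, text));
    }

    /** Applies all pending changes in arrival order and returns how many were applied. */
    public int sync() {
        int applied = 0;
        Change c;
        while ((c = pending.pollFirst()) != null) {
            if (c.op == Op.DELETE) {
                index.remove(c.rowid);
            } else {
                index.put(c.rowid, c.text);
            }
            applied++;
        }
        return applied;
    }

    public int pendingCount() {
        return pending.size();
    }

    public String lookup(String rowid) {
        return index.get(rowid);
    }
}
```

The design trade-off is that table writes stay cheap because they only append to the queue, while the index lags behind the table until the next sync.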

Oracle/Lucene integration -status-

2006-12-21 Thread Marcelo Ochoa
Hi: Yesterday, I uploaded a new version of the Oracle/Lucene integration using BLOB as storage for the inverted index and the Oracle JVM for running the Lucene framework inside the Oracle Database, see it at the Jira: http://issues.apache.org/jira/browse/LUCENE-724 This new version includes a fu

Re: lucene functionality

2006-12-13 Thread Marcelo Ochoa
Hi Chris: > (1) Each field is searchable and indexable. ...and I assumed the real problem is being able to address use cases like "find all documents where the DRECONTENT contains the words "Action" and the words "News" near each other -- using stemming and other Text Analysis tricks I may want

Re: lucene functionality

2006-12-13 Thread Marcelo Ochoa
Hi Chris: On 12/13/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: : For 10 million records We recommend an strong database such as Oracle. eh ... who is "We" in that statement? We are independent consultants working for many years with Oracle databases ;) I Suspect you'll find other peop

Re: lucene functionality

2006-12-13 Thread Marcelo Ochoa
Hi Mark: For 10 million records we recommend a strong database such as Oracle. You can annotate the schema (.xsd) which describes your XML record to store some fields in traditional VARCHAR2 or NUMBER columns to query them faster, and in a CLOB column. You can find more information at: http://ww

Re: Full text searching on documents saved in database as BLOB

2006-12-01 Thread Marcelo Ochoa
Hi Inderjeet: I am working on a full text searching implementation for Oracle Databases running Lucene on the Oracle JVM. The text searching functionality is ready now; you can get the latest code, uploaded on Tuesday; see the attachment text for the details of the new functionality included: http://i

Re: Oracle and Lucene Integration

2006-11-23 Thread Marcelo Ochoa
Hi Vladimir: I don't think you can define rowid on 'insert' operations (ie, when a new entry in the table is created) - it's a 'hidden'/automatic field Oracle maintains itself... The rowid is available on the Data Cartridge API, see this output from tests: Insert. newval: 'chau' rowid: 'AAAP1EA

Re: Oracle and Lucene Integration

2006-11-23 Thread Marcelo Ochoa
use Oracle anywhere at the moment. You probably don't want that rowid field tokenized, by the way. Otis ----- Original Message From: Marcelo Ochoa <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Wednesday, November 22, 2006 8:44:58 AM Subject: Re: Oracle and Lucene Integration

Re: Oracle and Lucene Integration

2006-11-22 Thread Marcelo Ochoa
ch put the modification in a queue to be processed with a new Sync method regularly using an Oracle's DBMS_SCHEDULER or by the user's application code. Best regards, Marcelo. On 11/22/06, Doug Cutting <[EMAIL PROTECTED]> wrote: Marcelo Ochoa wrote: > Then I'll move the code outs

Re: Oracle and Lucene Integration

2006-11-22 Thread Marcelo Ochoa
Olenin <[EMAIL PROTECTED]> wrote: Hi, Marcelo, Yes, putting it in the public space would be great. I personally would be very interested to have a look. Can it be posted on the 'lucene' website? Vlad -Original Message- From: Marcelo Ochoa [mailto:[EMAIL PROTECTED] Sent: Wedn

Re: Oracle and Lucene Integration

2006-11-22 Thread Marcelo Ochoa
Hi Mark: Very interesting. So how does this solution manage mapping Oracle primary keys to and from Lucene doc ids? I am storing the rowid value as a Document field, here is a code snippet: Document doc = new Document(); doc.add(new Field("rowid", rowid, Field.Store.Y

Oracle and Lucene Integration

2006-11-22 Thread Marcelo Ochoa
Hi all: I have read on this list many threads about integrating the Lucene indexing framework with Oracle. http://www.gossamer-threads.com/lists/lucene/java-user/41104?search_string=oracle%20jvm%20BLOB;#41104 So it pushed me to work on a Lucene and Oracle JVM (a Java virtual machine running inside the Or