I don't think you can have much, if any, influence on the docid that Lucene
assigns. When you add a document, it's guaranteed to have an ID greater than
any already in the index. I just think (but don't depend) on it being N + 1
where N is the largest docid already in the index.

But here's the really critical part. A document ID can change. If you delete
a document and re-optimize the index (I'm pretty sure the optimization is
necessary), all the documents with docids greater than the one you deleted
will be re-assigned.

People have recommended instead, that you store things like ROWID in a new
field in the Lucene document and leave the docid alone......

Or something like that <G>...

BTW, I commend your efforts and contributions to the Lucene corpus with
this, keep up the good work!

Best
Erick

On 11/23/06, Marcelo Ochoa <[EMAIL PROTECTED]> wrote:

Otis:
  I am new to Lucene API and searching technologies :)

                  doc.add(new Field("rowid", rowid, Field.Store.YES,
                          Field.Index.UN_TOKENIZED));
                  Done!!.

  Also the Oracle ROWID format has a portion which can be used as the
document id into the Lucene document, this will simplify the delete
operation, for example, because with the rowid we can use
reader.deleteDocument(idFromRowIDValue).

http://download-east.oracle.com/docs/cd/B19306_01/server.102/b14220/datatype.htm#sthref3899
  But I don't know how to add documents with an specific id.
  Somebody can help me showing a code snipped with an adding operation
using a predefined ID?
  Rowid number start with 0 and are sequentially assigned.
  Best regards, Marcelo.

On 11/23/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Wow, very cool, even though I don't use Oracle anywhere at the moment.
> You probably don't want that rowid field tokenized, by the way.
>
> Otis
>
> ----- Original Message ----
> From: Marcelo Ochoa <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Wednesday, November 22, 2006 8:44:58 AM
> Subject: Re: Oracle and Lucene Integration
>
> Hi Mark:
> > Very interesting.
> >
> > So how does this solution manage mapping Oracle primary keys to and
from Lucene doc ids?
>   I am storing the rowid value as a Document field, here a code sniped
>                 Document doc = new Document();
>                 doc.add(new Field("rowid", rowid, Field.Store.YES,
>                           Field.Index.TOKENIZED));
>                 Object value = rs.getObject(2);
>                 String valueStr = null;
>                 if (value!=null) { // Sanity checks
>                   if (value instanceof CLOB)
>                     valueStr =
> ((CLOB)value).getSubString(1,(int)((CLOB)value).length());
>                   else if (value instanceof XMLType)
>                     valueStr =
> ((XMLType)value).extract("//text()","").getStringVal();
>                   else
>                     valueStr = value.toString();
>                   doc.add(new
> Field(col,valueStr,Field.Store.NO,Field.Index.TOKENIZED));
>                   writer.addDocument(doc);
>
>   So when I am querying I can get the rowid back using:
>                 if (iterator.hasNext()) {
>                     // append rowid to collection
>                     Hit hit = (Hit) iterator.next();
>                     try {
>                         rid =  hit.get("rowid");
>                         score = hit.getScore();
>                     } catch (IOException e) {
>                         e.printStackTrace();
>                         throw new SQLException(e.getMessage());
>                     }
>                     rlist[i] = new String(rid);
>                     slist.put(rid,new Float(score));
>                     idx++;
>                 } else {.............
>   and passing the rowid to the Oracle execution plan which is
> collecting in bacth of 2000 rowids.
> >
> > >> Another benefits of using the Data Cartridge API is that if the
> > >>table T1 has insert, update or delete rows operations a
corresponding
> > >>Java method will be called to automatically update the Lucene Index.
> >
> > I suspect the tricky bit is optimizing the opening/closing of Lucene
IndexReaders/Writers especially in the event of large batches of database
updates.
> > Does this API pass the transactional info which would help organize
the batching of the Lucene reader.delete and writer.add calls?
>   Well, I think that Oracle Text uses a Queue to store large batches,
> because it use a ctx_sys.sync procedure to update the index ;)
>   We can make the same solution using Oracle AQ.
> >
> > Cheers
> > Mark
>  Best regards, Marcelo.
> --
> Marcelo F. Ochoa
> http://marcelo.ochoa.googlepages.com/home
> ______________
> Do you Know DBPrism? Look @ DB Prism's Web Site
> http://www.dbprism.com.ar/index.html
> More info?
> Chapter 17 of the book "Programming the Oracle Database using Java &
> Web Services"
> http://www.amazon.com/gp/product/1555583296/
> Chapter 21 of the book "Professional XML Databases" - Wrox Press
> http://www.amazon.com/gp/product/1861003587/
> Chapter 8 of the book "Oracle & Open Source" - O'Reilly
> http://www.oreilly.com/catalog/oracleopen/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>


--
Marcelo F. Ochoa
http://marcelo.ochoa.googlepages.com/home
______________
Do you Know DBPrism? Look @ DB Prism's Web Site
http://www.dbprism.com.ar/index.html
More info?
Chapter 17 of the book "Programming the Oracle Database using Java &
Web Services"
http://www.amazon.com/gp/product/1555583296/
Chapter 21 of the book "Professional XML Databases" - Wrox Press
http://www.amazon.com/gp/product/1861003587/
Chapter 8 of the book "Oracle & Open Source" - O'Reilly
http://www.oreilly.com/catalog/oracleopen/

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Reply via email to