Just a few words to announce a new release
(http://sourceforge.net/projects/dbprism/files/odi/3.0.2.1.0/) of
Oracle Lucene Domain Index
(http://docs.google.com/View?docid=ddgw7sjp_569gf8c7cd8). This zip is
valid for both 10g and 11g databases (on 10g it uses classes back-ported
from Java 1.5 to 1.4).
This r
Hi Ian:
Just out of curiosity ;)
Which distributed file system are you using on top of your NAS storage?
Best regards, Marcelo.
On Thu, Feb 25, 2010 at 6:54 AM, Ian Lea wrote:
> We've run lucene on NAS, although not with indexes anything like as
> large as 1Tb, and gave up because NFS and lucen
> What would it be?
An extended query parser syntax
(http://lucene.apache.org/java/2_9_1/queryparsersyntax.html) including
geo-location search.
For example:
hsin (great circle): name:Minneapolis
AND _val_:"recip(hsin(0.78, -1.6, lat_rad, lon_rad, 3963.205), 1, 1
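Reading the example as Solr-style function syntax, hsin(...) is the haversine great-circle distance between a fixed point (0.78, -1.6 radians, roughly Minneapolis) and each document's lat_rad/lon_rad fields on a sphere of radius 3963.205 (Earth's radius in miles), and recip(x, m, a, b) = a / (m*x + b) turns that distance into a boost that shrinks with distance. The sketch below is plain Java under that interpretation, not Solr's implementation; the query line is cut off after m = 1 and a = 1, so b = 1 is an assumed value.

```java
/** Sketch of the hsin/recip geo boost above; an illustration, not Solr's code. */
final class GeoBoost {

    /** Great-circle distance between two points via the haversine formula (angles in radians). */
    static double haversine(double lat1, double lon1, double lat2, double lon2, double radius) {
        double sinHalfDLat = Math.sin((lat2 - lat1) / 2);
        double sinHalfDLon = Math.sin((lon2 - lon1) / 2);
        double h = sinHalfDLat * sinHalfDLat
                + Math.cos(lat1) * Math.cos(lat2) * sinHalfDLon * sinHalfDLon;
        // Clamp: rounding can push h a hair above 1 for antipodal points.
        return 2 * radius * Math.asin(Math.sqrt(Math.min(1.0, h)));
    }

    /** recip(x, m, a, b) = a / (m * x + b); for positive m, a, b it decreases as x grows. */
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    /** Boost for a document at (docLatRad, docLonRad); b = 1 is an assumed value. */
    static double distanceBoost(double docLatRad, double docLonRad) {
        double miles = haversine(0.78, -1.6, docLatRad, docLonRad, 3963.205);
        return recip(miles, 1, 1, 1);
    }
}
```

With these values a document exactly at the query point gets a boost of 1, and one d miles away gets 1/(d + 1), so nearer documents rank higher when the boost is combined with the name:Minneapolis match.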
Hi Paul:
Most of the time spent indexing big tables goes into the full table
scan and the network data transfer.
Please take a quick look at my OOW08 presentation about Oracle
Lucene integration:
http://docs.google.com/present/view?id=ddgw7sjp_156gf9hczxv
especially slides 13 and 14 wh
Hi Zukka:
This is a similar approach to Lucene Domain Index:
http://docs.google.com/Doc?id=ddgw7sjp_54fgj9kg
But Lucene Domain Index is a specific implementation for Oracle
Databases 10g/11g which is integrated through the ODCI API and
replaces the Lucene file system storage with a BLOB storag
Hi All:
A new binary distribution of Lucene Domain Index (2.9.0.1.0) for
Oracle 10g/11g has been released.
Lucene Domain Index is an integration of the Lucene project running inside
the Oracle JVM and hooked into the SQL layer by adding a new index
type.
This new version uses the latest Lucene 2.9.
Hi All:
I have already integrated Lucene 2.9RC2 with Lucene Domain Index:
http://docs.google.com/Doc?id=ddgw7sjp_54fgj9kg
As usual, a new Lucene version makes for a faster product :)
All my internal tests run OK and I only need to re-test on a 10g database.
Once Lucene 2.9 is ready for produ
Hi:
Do you know the Local Lucene extension?
http://sourceforge.net/projects/locallucene/
Some tests are similar to your example:
http://locallucene.svn.sourceforge.net/viewvc/locallucene/trunk/locallucene/src/java/com/pjaol/search/test/UnitTests/
Best regards, Marcelo.
On Sat, Jun 27, 2009 at 5:
Hi Andrzej:
>
> If you tried to access this url during last couple hours the site was down.
> It should be up again - apparently I went over the allocated bandwidth and
> the hosting company disabled the site without any warning or even
> notification. It's time to look for a better home for Luke
Hi:
The key to avoiding bad performance when merging with a database
result is to reduce the number of rows visited by your first query.
As an example, take a look at these two queries using Lucene Domain
Index; the two are equivalent:
Option A:
select * from (select rownum as ntop_pos,q.* fro
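The idea behind limiting the first query to a top-N window (the rownum/ntop_pos pattern) is bounded top-N selection: keep only the n best hits while scanning, so only n rows ever reach the later join with relational data. The sketch below is a generic plain-Java illustration of that technique with a min-heap, not Lucene Domain Index code; the ScoredRow class and its scores are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical hit: a row identifier plus a relevance score. */
final class ScoredRow {
    final String rowid;
    final float score;
    ScoredRow(String rowid, float score) { this.rowid = rowid; this.score = score; }
}

final class TopN {
    /** Returns the n highest-scoring rows, best first, using O(n) extra memory. */
    static List<ScoredRow> topN(Iterable<ScoredRow> rows, int n) {
        Comparator<ScoredRow> byScore = Comparator.comparingDouble(r -> r.score);
        // Min-heap: the weakest of the current top n sits at the head.
        PriorityQueue<ScoredRow> heap = new PriorityQueue<>(Math.max(1, n), byScore);
        for (ScoredRow r : rows) {
            if (heap.size() < n) {
                heap.add(r);
            } else if (n > 0 && r.score > heap.peek().score) {
                heap.poll();   // drop the current weakest
                heap.add(r);
            }
        }
        List<ScoredRow> result = new ArrayList<>(heap);
        result.sort(byScore.reversed());
        return result;
    }
}
```

Whatever the size of the scanned result, the join stage then only sees n rows, which is the point of pushing the limit into the first query.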
Hi Amin:
Please take a look at this blog post:
http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html
Best regards, Marcelo.
On Sun, Feb 22, 2009 at 1:18 PM, Amin Mohammed-Coleman wrote:
> Hi
>
> Sorry to re send this email but I was wondering if I could get some advice
> o
Hi:
Could you try to open the index with Luke, but using the JDK bundled
with the Oracle DB?
I mean, run Luke as a standalone application on the same
machine, outside the OJVM, using the JDK at:
$ORACLE_HOME/jdk
which was used to compile most of the classes running inside the OJV
Zhang:
I wrote a simple SpanishAnalyzer for the Lucene Domain Index test
suites, which index Spanish Wikipedia dumps.
This simple analyzer has a list of stop words and is faster than
SnowballAnalyzer, which also performs stemming.
You can get the code using CVS from the SourceForge.net servers or
> Crawling a DB is not a good idea. Indexing while writing/deleting is
> clever.
These operations also consume network traffic in architectures like Solr WS.
Network traffic is also wasted when a query is filtered
against relational data (slides 15 and 18 of Google presentati
Hi Zoran:
One of the biggest issues with Lucene DB integration is the network
traffic consumed as a consequence of indexing or updating operations,
apart from transactionality, which can be relaxed in some
applications.
During our Oracle Open World presentation we presented some of these
issues comp
your previous
> Input/Output implementations, which worked with Lucene 2.3, work win 2.4?
> It's kinda spooky.
>
> Mike
>
> Marcelo Ochoa wrote:
>
>> Michael:
>> I have OJVMDirectory working with the 2.4rc2 code base.
>> I have refactored Output and Input streams cl
s odd to me that there would be a bug in your Directory
> implementation that 2.3 didn't tickle but 2.4 did.
>
> Can you dump the index, as stored in BLOB columns, out into the filesystem,
> and run CheckIndex on it? (Or maybe run CheckIndex within Oracle).
>
> Mike
>
>
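Dumping the BLOB-stored index files to disk so CheckIndex can run on them could look roughly like this from a JDBC client. The copy uses only java.sql.Blob.getBinaryStream() and Files.copy; the table and column names in the comment are hypothetical, since the OJVMDirectory storage schema is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.sql.Blob;
import java.sql.SQLException;

final class BlobDump {
    /**
     * Copies one index file stored in a BLOB into dir/fileName.
     * Typical use (hypothetical schema):
     *   SELECT name, data FROM some_index_files_table
     *   -> dumpBlob(rs.getBlob("data"), dir, rs.getString("name"))
     */
    static Path dumpBlob(Blob blob, Path dir, String fileName) throws SQLException, IOException {
        Files.createDirectories(dir);
        Path target = dir.resolve(fileName);
        try (InputStream in = blob.getBinaryStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            blob.free(); // release resources held by the driver for this BLOB
        }
        return target;
    }
}
```

Once every file of the index is in one directory, it can be checked outside Oracle with `java -cp lucene-core.jar org.apache.lucene.index.CheckIndex /path/to/dump`.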
mat in 2.4 to use "true" UTF8 encoding for all
> text content; not sure that this applies here (to BufferedIndexReader it's
> all bytes) but it may.
>
> BufferedIndexReader in general can do random IO, especially when reading the
> term dict file (*.tis), when you
>
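Why a buffered reader can still end up doing random IO is easiest to see in a simplified model: sequential reads are served from an in-memory buffer, but a seek outside the buffered window throws the buffer away and forces a fresh read at the new position. The sketch below is a plain-Java analog for illustration only, not Lucene's BufferedIndexInput; the byte[] source stands in for a file or BLOB, and the refills counter is added just to make physical reads visible.

```java
/** Simplified buffered reader over a random-access byte source (illustrative only). */
final class BufferedInputSketch {
    private final byte[] source;   // stands in for a file or BLOB
    private final byte[] buffer;
    private long bufferStart = 0;  // source offset of buffer[0]
    private int bufferLength = 0;  // valid bytes currently in the buffer
    private int bufferPos = 0;     // next byte to read within the buffer
    int refills = 0;               // number of physical reads, for demonstration

    BufferedInputSketch(byte[] source, int bufferSize) {
        this.source = source;
        this.buffer = new byte[bufferSize];
    }

    byte readByte() {
        if (bufferPos >= bufferLength) refill();
        return buffer[bufferPos++];
    }

    /** Moves to an absolute position; reuses the buffer only if the target is inside it. */
    void seek(long pos) {
        if (pos >= bufferStart && pos < bufferStart + bufferLength) {
            bufferPos = (int) (pos - bufferStart);
        } else {
            bufferStart = pos;  // invalidate: the next read refills from pos
            bufferLength = 0;
            bufferPos = 0;
        }
    }

    long getFilePointer() { return bufferStart + bufferPos; }

    private void refill() {
        long start = bufferStart + bufferPos;
        if (start >= source.length) throw new IllegalStateException("read past EOF");
        int len = (int) Math.min(buffer.length, source.length - start);
        System.arraycopy(source, (int) start, buffer, 0, len);
        bufferStart = start;
        bufferLength = len;
        bufferPos = 0;
        refills++;
    }
}
```

A term dictionary lookup that jumps around the file behaves like repeated far seeks here: each one costs a new physical read, so a Directory whose reads are not correct at arbitrary offsets can pass sequential tests and still fail.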
Michael:
I just started testing 2.4rc2 running inside the OJVM.
I found a similar stack trace during indexing:
IW 3 [Root Thread]: flush: segment=_3 docStoreSegment=_3
docStoreOffset=0 flushDocs=true flushDeletes=true flushDocStores=false
numDocs=2 numBufDelTerms=2
IW 3 [Root Thread]: index before
Hi Leonid:
If you are not familiar with Oracle XMLDB schema mappings, here is an
example of how to store Wikipedia XML dumps in an Oracle database
using the XML-to-relational model:
http://marceloochoa.blogspot.com/2007/12/uploading-wikipedia-dumps-to-oracle.html
The structure of WikiPedia dumps s
Hi Dave:
The MoreLikeThis object has two parameters which control its behavior:
mlt.setMinTermFreq(minTermFreq.intValue());
mlt.setMinDocFreq(minDocFreq.intValue());
By default MinTermFreq is 2, so if your document has no term with a
frequency of at least 2, it will return a query with
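How the two thresholds prune candidate terms can be sketched in plain Java: a term survives only if it occurs at least minTermFreq times in the source document and in at least minDocFreq documents of the index. This mirrors the documented meaning of setMinTermFreq/setMinDocFreq but is not MoreLikeThis's own code; the frequency maps are hypothetical inputs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class MltTermFilter {
    /**
     * Keeps terms whose frequency in the source document is >= minTermFreq
     * and whose document frequency in the index is >= minDocFreq.
     */
    static List<String> candidateTerms(Map<String, Integer> termFreqs,
                                       Map<String, Integer> docFreqs,
                                       int minTermFreq, int minDocFreq) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int tf = e.getValue();
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            if (tf >= minTermFreq && df >= minDocFreq) kept.add(e.getKey());
        }
        return kept;
    }
}
```

For a short document in which every term appears once, minTermFreq = 2 leaves no candidate terms at all; lowering it with setMinTermFreq(1) lets those terms through.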
> Actually there are many projects for Lucene + Database. Here is a list I
> know:
>
> * Hibernate Search
> * Compass, (also Hibernate + Lucene)
> * Solr + DataImportHandler (Searching + Crawler)
> * DBSight, (Specific for database, closed source, but very customizable,
> easy to setup)
> * Browse
Hi John:
Have you tested, or heard of, Lucene Domain Index for Oracle databases?
http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
If you are using Oracle 10g/11g, it is completely integrated into the Oracle
memory space, like Oracle Text, but based on Lucene.
No network round trip i
Hi all:
For people who are using Lucene Oracle integration project:
http://marceloochoa.blogspot.com/2008/07/lucene-ojvm-native-rest-ws.html
Best regards, Marcelo.
--
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home
__
Do you Know DBPris
Hi:
Oracle-Lucene integration will have a place at Oracle Open World 08
(Oracle Develop - San Francisco Marriott).
My friend Kuassi Mensah and I will present the architecture as an
example of a database-intensive solution.
Here is the link to the content catalogue:
http://www28.cplan.com/cc208/
Hi:
Here are my latest tests of the Oracle-Lucene integration (Lucene 2.3.2
binary dist. / Oracle 11g):
http://marceloochoa.blogspot.com/2008/06/new-binary-release-of-lucene-oracle.html
Tested against Spanish Wikipedia dumps using the Wikipedia Analyzer/Tokenizer.
There are independent times for upl
Hi All:
I have just released a new binary distribution of the Oracle-Lucene
integration using the Lucene-OJVM Data Cartridge.
Here is the change log:
* Compiled against Lucene 2.3.2 production release
* Used latest API for merging based on RAM usage
* Use Writer for deleting during Sync
* Confirm 4x impr
Hi Ravi:
I am not a Lucene guru, but IMO you have to write a new Directory
class which opens the jar and provides access to Lucene.
Maybe a subclass of FSDirectory will work, but only for read-only access.
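The core of such a read-only, jar-backed directory is listing the index files inside the jar and reading their bytes, which java.util.jar already provides. The helper below is hypothetical and is not a Lucene Directory subclass; plugging it into Lucene would mean implementing Directory/IndexInput on top of it, with every write operation unsupported.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical read-only view of index files packed in a jar (not Lucene's API). */
final class JarIndexFiles implements AutoCloseable {
    private final JarFile jar;
    private final String prefix; // folder inside the jar holding the index, e.g. "index/"

    JarIndexFiles(String jarPath, String prefix) throws IOException {
        this.jar = new JarFile(jarPath);
        this.prefix = prefix;
    }

    /** Names of the files under the prefix, like a directory listing. */
    List<String> listAll() {
        List<String> names = new ArrayList<>();
        Enumeration<JarEntry> entries = jar.entries();
        while (entries.hasMoreElements()) {
            JarEntry e = entries.nextElement();
            if (!e.isDirectory() && e.getName().startsWith(prefix)) {
                names.add(e.getName().substring(prefix.length()));
            }
        }
        return names;
    }

    /** Reads a whole file; fine for a sketch, a real IndexInput would stream and seek. */
    byte[] readFile(String name) throws IOException {
        JarEntry e = jar.getJarEntry(prefix + name);
        if (e == null) throw new FileNotFoundException(name);
        try (InputStream in = jar.getInputStream(e)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
            return out.toByteArray();
        }
    }

    @Override
    public void close() throws IOException {
        jar.close();
    }
}
```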
I have written a similar set of classes to implement Lucene storage inside
the Oracle JVM using B
Hi Michael:
First, thanks a lot for your time.
See comments below.
> Is there any way to capture & serialize the actual documents being
> added (this way I can "replay" those docs to reproduce it)?
Documents are a VARCHAR2 column from Oracle's all_source system
view; in fact it is a table like:
c
gards, Marcelo.
On Tue, May 6, 2008 at 7:00 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> Hi Marcelo,
>
> Hmmm something is not right.
>
> Somehow the byte slices, which DocumentsWriter uses to hold the postings in
> RAM, became corrupt.
>
> Is this eas
Hi Lucene experts:
I am working on upgrading the Lucene-Oracle integration project to the latest
Lucene 2.3.1 code.
After correcting a minor issue in the OJVMDirectory file implementation, I
have the integration running with the latest 2.3.1 code.
But it only works with small indexes; I think indexes which are lower
Hi Stéphane:
If you are using Oracle Spatial, I assume that you are also using Oracle
for storing text :)
Have you taken a look at the Oracle-Lucene integration project sponsored
by LendingClub.com?
http://docs.google.com/Doc?id=ddgw7sjp_54fgj9kg
http://sourceforge.net/project/showfiles.php?group_id=5
Hi all:
I just released a new version of Oracle-Lucene integration
implemented as a Domain Index.
The binary distribution has very straightforward installation and
testing steps; downloads are at the SF.net web site:
http://sourceforge.net/project/showfiles.php?group_id=56183&package_id=255524&releas
Hi Mitesh:
Lucene-OJVM integration has not been tested against lucene-2.3.0 yet.
I'll do it ASAP.
Best regards, Marcelo.
On Thu, Feb 14, 2008 at 10:01 AM, Mitesh Soni <[EMAIL PROTECTED]> wrote:
>
> I have run the build file in the lucene-2.3.0\contrib\ojvm successfully. But
> I cannot cr
Hi All:
Just to add a simple hack: I have posted an entry on my blog named
"Uploading WikiPedia Dumps to Oracle databases":
http://marceloochoa.blogspot.com/2007_12_01_archive.html
with instructions to upload Wikipedia dumps to Oracle XMLDB, that is,
transforming an XML file into an object-relationa
ion of text indexing
and search using Lucene within the Oracle relational database. Many
thanks to Marcelo Ochoa, the developer that made it all happen!
Among the goodies you will find in the new release are:
* LuceneDomainIndex.countHits() function to replace select count from
.. where lcontains
Mike:
If you work with Oracle databases, you can take a look at the Oracle
Lucene integration.
http://www.infoq.com/news/2007/10/lucene-oracle
http://issues.apache.org/jira/browse/LUCENE-724
By using OJVMDirectory you have Lucene integrated into the Oracle
engine as a new Domain Index, so you can use
Hi Chris:
First, sorry for the delay :(
I have some preliminary performance tests using Oracle 11g running
in a VMware virtual machine with a 400MB SGA (the virtual machine uses
812MB of RAM for Oracle Enterprise Linux 4.0). This virtual machine is
hosted on modest hardware, a Pentium IV 2.18GHz wit
Integration), primarily based on new
requirements from LendingClub.com, who commissioned the work to
Marcelo Ochoa, the contributor of the original patch (great job
Marcelo!). As a contribution from LendingClub.com to the Lucene community,
we have posted the code on a public CVS (sourceforge) as explained
belo
Hi Simon:
If you are using Oracle 10g, you can take advantage of running
Lucene inside the Oracle JVM.
Look at this contribution:
http://issues.apache.org/jira/browse/LUCENE-724
The most important difference of this implementation is that all
the operations between Lucene and the table data being indexed
Hi All:
Has anyone compared NFS versus OCFS2 in a Lucene grid installation?
Oracle Cluster File System 2 has shipped by default since Linux
kernel 2.6.16-rc1+.
OCFS2 is a cluster-optimized file system used by the Oracle RAC
configuration (http://oss.oracle.com/projects/ocfs2/).
One o
ecognizable companies/agencies
where this has been done before. Any additional suggestions for where I
can search for this information would be appreciated.
Again thanks for your help,
V/R,
Max Aronin
-----Original Message-----
From: Marcelo Ochoa [mailto:[EMAIL PROTECTED]
Sent: Monday, January
Hi Max:
I am working on an Oracle-Lucene integration; see the patch:
http://issues.apache.org/jira/browse/LUCENE-724
Today I'll upload the latest development release, which includes
several performance enhancements and new methods to integrate with the
Oracle Data Cartridge API without using data informa
John:
I have implemented batch (delete, insert, update) operations for the
Oracle Lucene Domain Index using OJVMDirectory; see the patch:
http://issues.apache.org/jira/browse/LUCENE-724
The strategy used in this solution is to enqueue all operations on
the table which have a column indexed by Lucene
Hi:
Yesterday, I uploaded a new version of the Oracle/Lucene integration
using BLOB as storage for the inverted index and the Oracle JVM for
running the Lucene framework inside the Oracle Database; see it in
Jira:
http://issues.apache.org/jira/browse/LUCENE-724
This new version includes a fu
Hi Chris:
> (1) Each field is searchable and indexable.
...and I assumed the real problem is being able to address use cases like
"find all documents where the DRECONTENT contains the words "Action" and
the words "News" near each other -- using stemming and other Text Analysis
tricks i may want
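"Near each other" is what proximity (span) queries express. The plain-Java sketch below shows only the underlying check, assuming the analyzer has already produced the token sequence (stemming is ignored): two distinct terms match if some occurrence of one lies within maxDistance positions of an occurrence of the other, in either order. It is an illustration, not Lucene's SpanNearQuery implementation.

```java
import java.util.List;

final class Proximity {
    /**
     * True if distinct tokens a and b occur within maxDistance positions of each other,
     * in any order. Tracking the last position of each term finds the closest pair in one pass.
     */
    static boolean near(List<String> tokens, String a, String b, int maxDistance) {
        int lastA = -1, lastB = -1;
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (t.equals(a)) {
                lastA = i;
                if (lastB >= 0 && i - lastB <= maxDistance) return true;
            }
            if (t.equals(b)) {
                lastB = i;
                if (lastA >= 0 && i - lastA <= maxDistance) return true;
            }
        }
        return false;
    }
}
```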
Hi Chris:
On 12/13/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: For 10 million records We recommend a strong database such as Oracle.
eh ... who is "We" in that statement?
We are independent consultants who have worked with Oracle databases for many years ;)
I suspect you'll find other peop
Hi Mark:
For 10 million records We recommend a strong database such as Oracle.
You can annotate the schema (.xsd) that describes your XML records
to store some fields in traditional VARCHAR2 or NUMBER columns for faster
querying, and in a CLOB column.
You can find more information at:
http://ww
Hi Inderjeet:
I am working on a full-text search implementation for Oracle
databases running Lucene on the Oracle JVM.
The text search functionality is already working. You can get the latest
code, uploaded on Tuesday, and see the attached text for details of
the new functionality included:
http://i
Hi Vladimir:
I don't think you can define rowid on 'insert' operations (ie, when a
new entry in the table is created) - it's a 'hidden'/automatic field
Oracle maintains itself...
The rowid is available through the Data Cartridge API; see this output from the tests:
Insert. newval: 'chau' rowid: 'AAAP1EA
use Oracle anywhere at the moment.
You probably don't want that rowid field tokenized, by the way.
Otis
----- Original Message
From: Marcelo Ochoa <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, November 22, 2006 8:44:58 AM
Subject: Re: Oracle and Lucene Integration
ch
put the modification in a queue to be processed with a new Sync method
regularly using an Oracle's DBMS_SCHEDULER or by the user's
application code.
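The deferred strategy (record each change on the indexed table, apply them later in a Sync run) can be sketched as a simple operation queue keyed by rowid. Everything here is hypothetical: the PendingOps/Op names and the two callbacks are illustrative, and in the project itself the queue lives in the database and Sync is triggered by DBMS_SCHEDULER or application code. Treating an update as delete plus re-add reflects that an inverted index has no in-place update.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

/** Hypothetical queue of pending index changes, drained by a periodic sync. */
final class PendingOps {
    enum Kind { INSERT, UPDATE, DELETE }

    static final class Op {
        final Kind kind;
        final String rowid;
        Op(Kind kind, String rowid) { this.kind = kind; this.rowid = rowid; }
    }

    private final Deque<Op> queue = new ArrayDeque<>();

    /** Called on each DML event: only records the change, no index work yet. */
    synchronized void enqueue(Kind kind, String rowid) {
        queue.addLast(new Op(kind, rowid));
    }

    /** Applies all queued changes in arrival order; returns how many were applied. */
    synchronized int sync(Consumer<String> deleteByRowid, Consumer<String> addRow) {
        int applied = 0;
        while (!queue.isEmpty()) {
            Op op = queue.pollFirst();
            if (op.kind != Kind.INSERT) deleteByRowid.accept(op.rowid); // UPDATE, DELETE
            if (op.kind != Kind.DELETE) addRow.accept(op.rowid);        // INSERT, UPDATE
            applied++;
        }
        return applied;
    }

    synchronized int size() { return queue.size(); }
}
```

Batching this way keeps DML statements cheap and lets one Sync open the index writer once for many changes.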
Best regards, Marcelo.
On 11/22/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
Marcelo Ochoa wrote:
> Then I'll move the code outs
Olenin <[EMAIL PROTECTED]> wrote:
Hi, Marcelo,
Yes, putting it in the public space would be great. I personally would
be very interested to have a look. Can it be posted on the 'lucene'
website?
Vlad
-----Original Message-----
From: Marcelo Ochoa [mailto:[EMAIL PROTECTED]
Sent: Wedn
Hi Mark:
Very interesting.
So how does this solution manage mapping Oracle primary keys to and from Lucene
doc ids?
I am storing the rowid value as a Document field; here is a code snippet:
Document doc = new Document();
doc.add(new Field("rowid", rowid, Field.Store.Y
Hi all:
I have read many threads on this list about Lucene indexing framework
integration with Oracle.
http://www.gossamer-threads.com/lists/lucene/java-user/41104?search_string=oracle%20jvm%20BLOB;#41104
That pushed me to work on a Lucene and Oracle JVM (a Java virtual
machine running inside the Or