Re: creating Array of IndexReaders

2008-06-19 Thread Sebastin
Hi Otis, Right now I am using MultiReader by just collecting an array of IndexReaders: IndexReader[] readArray = { indexIR1, indexIR2, indexIR3, indexIR4 }; // merged reader IndexReader mergedReader = new MultiReader(readArray); It's not possible for me to

Re: indexing unsupported mime types using Lucene

2008-06-19 Thread Gaurav Sharma
Hi Otis, I haven't tried Tika. Is it open source? Had you heard about LIUS before, or is it talked about around the industry? And what about Solr? It seems you worked on Solr and Nutch. Otis Gospodnetic wrote: > > Gaurav, have you tried Tika? (sub-project of Apache Lucene) > > > Otis > -- > Sematext -- ht

Re: creating Array of IndexReaders

2008-06-19 Thread Otis Gospodnetic
Hi, Have you looked at MultiReader? Opening IndexReaders like that will cost you... Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Sebastin <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Friday, June 20, 2008 2:04:12 AM > S

creating Array of IndexReaders

2008-06-19 Thread Sebastin
Hi All, I need to create IndexReaders dynamically based on the user input. For example, if the user needs to see the records from June 17 to June 20: Directory indexFsDir1 = FSDirectory.getDirectory("C:\\200806\\17\\outgoing1", false); IndexReader indexIR1 = IndexReader.open(indexFs
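
A rough sketch of the "dynamic" part of the question, in plain Java with no Lucene calls: it only builds the list of per-day index paths for a date range. The layout root\yyyyMM\dd\leaf is an assumption read off the single path in the post, and the class name DailyIndexPaths and the method pathsFor are hypothetical. Each resulting path could then be opened with IndexReader.open and the readers collected into the MultiReader mentioned in the replies.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists one index directory path per day in a date range.
class DailyIndexPaths {
    // Assumed layout, inferred from "C:\200806\17\outgoing1": root\yyyyMM\dd\leaf
    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("yyyyMM");
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("dd");

    // Returns the paths for every day from 'from' to 'to', both inclusive.
    // An empty list is returned when 'from' is after 'to'.
    static List<String> pathsFor(String root, LocalDate from, LocalDate to, String leaf) {
        List<String> paths = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            paths.add(root + "\\" + d.format(MONTH) + "\\" + d.format(DAY) + "\\" + leaf);
        }
        return paths;
    }

    public static void main(String[] args) {
        // The June 17 to June 20 range from the post.
        List<String> paths = pathsFor("C:", LocalDate.of(2008, 6, 17),
                LocalDate.of(2008, 6, 20), "outgoing1");
        for (String p : paths) {
            System.out.println(p);
        }
    }
}
```

Building the list first keeps the number of readers tied to the requested range instead of to a fixed set of variables such as indexIR1 to indexIR4.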

Re: Arbitrary String to String Similarity Score

2008-06-19 Thread Otis Gospodnetic
Hi, Have a look at MoreLikeThis: [EMAIL PROTECTED] trunk]$ ff \*MoreLikeThis\*.java ./contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThisQuery.java ./contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThis.java I think that or something a lot like it is what yo

Re: Copying a part of index and index structure

2008-06-19 Thread Otis Gospodnetic
Hi, Not doable with Lucene as far as I know. I'm not even certain you would want to split by term. What would that do to TF-IDF in your distributed search? What's wrong with splitting at the doc level? There are about half a dozen distributed (Lucene) search solutions floating around, why not r

Re: Copying a part of index and index structure

2008-06-19 Thread Anshum
Hey Otis, I guess the Lucene API would only help me remove documents from an index and not 'terms'. I need to remove terms from the index for all documents. Any clue as to how to get it done? I'm currently analyzing the internal index structure. I really need to get it done and if it works out I guess

Re: Arbitrary String to String Similarity Score

2008-06-19 Thread Sangrish
Given two text documents, I want to quantify how similar they are to each other. Say, I want to find the cosine similarity score between any two given documents. I am trying to use Lucene for it (is it good for this purpose?) This use case is different from querying against a s
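
For reference, cosine similarity between two texts can be sketched without Lucene at all. The following is a minimal illustration using raw term counts, not what Lucene's scoring does: the tokenization (lowercasing and splitting on anything that is not a letter or digit) is an assumption, there is no IDF weighting, stemming, or stop-word removal, and the class name CosineSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: cosine similarity over raw term-frequency vectors.
class CosineSketch {
    // Counts term occurrences; the tokenization rule here is an assumption.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> tf) {
        double sum = 0.0;
        for (int count : tf.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Dot product of the two vectors divided by the product of their lengths.
    // Returns 0.0 when either text contains no terms.
    static double cosine(String a, String b) {
        Map<String, Integer> ta = termFrequencies(a);
        Map<String, Integer> tb = termFrequencies(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : ta.entrySet()) {
            Integer other = tb.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(ta);
        double nb = norm(tb);
        if (na == 0.0 || nb == 0.0) {
            return 0.0;
        }
        return dot / (na * nb);
    }

    public static void main(String[] args) {
        // Two shared terms out of three in each text, so the score is about 2/3.
        System.out.println(cosine("lucene index search", "lucene search engine"));
    }
}
```

Because both inputs are treated symmetrically, this differs from scoring a query against an indexed document, which is the distinction the post is drawing.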

Re: Copying a part of index and index structure

2008-06-19 Thread Otis Gospodnetic
Hi, I don't think there are tools for taking a single index and sharding it. So you'll have to create a new index and remove what you need to remove from the old big index. I could be wrong :) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > F

Re: Where to get NGramSpeller.java

2008-06-19 Thread Otis Gospodnetic
I think you really want to get the SpellCheck from the Java Lucene's contrib/spell . I think this stuff is in nightly builds. If not, check it out of svn - it just got updated a bit, so it's different than in Lucene 2.3.2. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch ---

Re: Arbitrary String to String Similarity Score

2008-06-19 Thread Grant Ingersoll
You might also have a look at the MemoryIndex. Question, though, is what are you hoping to gain from doing a Query against a single String? Are you doing a FuzzyQuery? You might look at the SecondString project on SourceForge for doing string comparisons. I guess I am a bit confused by y

Arbitrary String to String Similarity Score

2008-06-19 Thread Sangrish
I have a use case for comparing two given strings (attached to a specific field) using Lucene and getting the similarity scores. I tried but could not find any built-in way to do so. Hence, assuming that Lucene only compares a Query against indexed documents, I came up with the following approach: (

Re: Where to get NGramSpeller.java

2008-06-19 Thread Geeti Gupta
NGramSpeller: source code from David Spencer ([EMAIL PROTECTED]) http://www.fsc.follett.com/destiny/licenseagreement/OpenSource.pdf On Thu, Jun 19, 2008 at 11:34 PM, sumittyagi <[EMAIL PROTECTED]> wrote: > > HI, > i need to download this file which is NGramSpeller.java > more information about t

Where to get NGramSpeller.java

2008-06-19 Thread sumittyagi
Hi, I need to download the file NGramSpeller.java. More information about this file is here: http://www.marine-geo.org/services/oai/docs/javadoc/org/apache/lucene/spell/NGramSpeller.html But where can I get its source file? Any ideas, please? -- View this message in context: http

Re: RAMDirectory IndexInput and IndexOutput

2008-06-19 Thread Jason Rutherglen
Created a RAMDirectory like directory class that uses ByteArrayRandomAccessIO from http://reader.imagero.com/uio/ to allow concurrent random file access. On Thu, Jun 19, 2008 at 3:33 PM, Jason Rutherglen < [EMAIL PROTECTED]> wrote: > Looks like it cannot be used for a log system that needs concur

Re: indexing unsupported mime types using Lucene

2008-06-19 Thread Otis Gospodnetic
Gaurav, have you tried Tika? (sub-project of Apache Lucene) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Gaurav Sharma <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Wednesday, June 18, 2008 10:07:22 AM > Subject: indexing

Re: RAMDirectory IndexInput and IndexOutput

2008-06-19 Thread Jason Rutherglen
Looks like it cannot be used for a log system that needs concurrent read/write access to a file. Back to RandomAccessFile, which will have buffering issues. Any experience with http://reader.imagero.com/uio/ ? On Thu, Jun 19, 2008 at 3:20 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > createOutput(

Re: RAMDirectory IndexInput and IndexOutput

2008-06-19 Thread Yonik Seeley
createOutput() creates a new file, overwriting the old one. If you open the IndexInput before you call createOutput() for the 2nd time, you should see the file. And you definitely shouldn't have more than one IndexOutput open on the same file (but that's not your problem here). -Yonik On Thu, Ju

Re: RAMDirectory IndexInput and IndexOutput

2008-06-19 Thread Jason Rutherglen
Here's code that reproduces it. public void testMain() throws IOException { RAMDirectory ramDirectory = new RAMDirectory(); IndexOutput output = ramDirectory.createOutput("test"); byte[] bytes = "hello world".getBytes("UTF-8"); output.writeBytes(bytes, bytes.length); output.flu

Re: RAMDirectory IndexInput and IndexOutput

2008-06-19 Thread Jason Rutherglen
Yes. Also close. But then reopen the IndexOutput again later, then open the IndexInput. I'm not sure if this is the recommended usage of these APIs. It seems everywhere else in the Lucene code base only one is open at a time. On Thu, Jun 19, 2008 at 12:50 PM, Yonik Seeley <[EMAIL PROTECTED]> wr

Re: RAMDirectory IndexInput and IndexOutput

2008-06-19 Thread Yonik Seeley
Did you try calling flush() on the IndexOutput before opening the IndexInput? -Yonik On Thu, Jun 19, 2008 at 12:13 PM, Jason Rutherglen <[EMAIL PROTECTED]> wrote: > Seeing strange behavior with RAMDirectory. Is a file designed to support > IndexOutput being open concurrently with IndexInput?

Re: looking for efficient way to dump index info

2008-06-19 Thread Erick Erickson
What's the high-level goal here? The reason I ask is that I'm not sure what *use* these scores are to you. Perhaps someone will have a better approach if you post what it is you're trying to accomplish... Best Erick On Thu, Jun 19, 2008 at 1:06 PM, Gerardo Segura <[EMAIL PROTECTED]> wrote: > He

RAMDirectory IndexInput and IndexOutput

2008-06-19 Thread Jason Rutherglen
Seeing strange behavior with RAMDirectory. Is a file designed to support IndexOutput being open concurrently with IndexInput? I open an IndexInput with IndexOutput open, with data written to the file previously, and the IndexInput is reporting a file length of 0, while Directory.fileLength() rep

looking for efficient way to dump index info

2008-06-19 Thread Gerardo Segura
Hello list, I need to generate a report with all the terms, the document ids where they appear and the score in each document. My current approach is to get a Term enumeration from the index and construct a query for each of them. But as I am a newbie with the library I wonder if there is a be

Re: Improving search performance with the results returned

2008-06-19 Thread Grant Ingersoll
This is exactly how a score-sorted (the default) search works in Lucene. It attempts to return the most relevant results first. Have a look at the docs and the demo and try it out. -Grant On Jun 18, 2008, at 10:59 PM, syedfa wrote: Dear Fellow Java/Lucene developers: I want to know if

RE: indexing unsupported mime types using Lucene

2008-06-19 Thread Steven A Rowe
Hi Gaurav, To which mime types are you referring? I can't think of a tool designed for this, but one thing you might try is checking whether the input is compressed/packed, and if so first decompressing/unpacking it, and then using the "strings" program (available on Linux and Cygwin) to extra