Hi Otis,
Right now I am using MultiReader by just collecting an array of IndexReaders:
IndexReader[] readArray =
{ indexIR1, indexIR2, indexIR3, indexIR4};
//merged reader
IndexReader mergedReader = new MultiReader(readArray);
It's not possible for me to
Hi Otis,
I haven't tried Tika.
Is it open source?
Have you heard about LIUS before, or is it talked about around the industry?
And what about Solr? It seems you have worked on Solr and Nutch.
Otis Gospodnetic wrote:
>
> Gaurav, have you tried Tika? (sub-project of Apache Lucene)
>
>
> Otis
> --
> Sematext -- ht
Hi,
Have you looked at MultiReader? Opening IndexReaders like that will cost you...
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Sebastin <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Friday, June 20, 2008 2:04:12 AM
> S
Hi All,
I need to create dynamic IndexReaders based on user input.
For example,
if the user needs to see the records from June 17 to June 20:
Directory indexFsDir1 =
FSDirectory.getDirectory("C:\\200806\\17\\outgoing1", false);
IndexReader indexIR1 = IndexReader.open(indexFs
Hi,
Have a look at MoreLikeThis:
[EMAIL PROTECTED] trunk]$ ff \*MoreLikeThis\*.java
./contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThisQuery.java
./contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThis.java
I think that or something a lot like it is what yo
Hi,
Not doable with Lucene as far as I know. I'm not even certain you would want
to split by term. What would that do to TF-IDF in your distributed search?
What's wrong with splitting at the doc level? There are about half a dozen
distributed (Lucene) search solutions floating around, why not r
Hey Otis,
I guess the Lucene API would only help me remove documents from an index, not
'terms'. I need to remove terms from the index for all documents. Any clue
as to how to get it done? I'm currently analyzing the internal index
structure. I really need to get it done and if it works out I guess
Given two text documents, I want to quantify how similar they are to
each other. Say I want to find the cosine similarity score
between any two given documents. I am trying to use Lucene for it (is it
good for this purpose?).
This use case is different from querying against a s
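For reference, the cosine similarity asked about above can be computed directly from term counts, without an index. This is only a minimal sketch in plain Java: it assumes whitespace tokenization and raw term frequencies (no IDF weighting and no Lucene Analyzer), and the class and method names are hypothetical, not part of Lucene.

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSimilarity {

    // Count how often each lowercased, whitespace-separated token occurs.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> tf) {
        double sum = 0.0;
        for (int count : tf.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between the two raw term-frequency vectors.
    // Returns 0.0 when either text contains no tokens.
    static double cosine(String a, String b) {
        Map<String, Integer> ta = termFrequencies(a);
        Map<String, Integer> tb = termFrequencies(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : ta.entrySet()) {
            Integer other = tb.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(ta);
        double normB = norm(tb);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }
}
```

Calling `CosineSimilarity.cosine(textA, textB)` yields a value between 0 and 1 for these non-negative count vectors. Lucene's own scoring adds IDF and length normalization, so its scores will not match this number.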
Hi,
I don't think there are tools for taking a single index and sharding it. So
you'll have to create a new index and remove what you need to remove from the
old big index. I could be wrong :)
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> F
I think you really want to get the SpellCheck from Java Lucene's
contrib/spell. I think this stuff is in nightly builds. If not, check it out
of svn - it just got updated a bit, so it's different than in Lucene 2.3.2.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
---
You might also have a look at the MemoryIndex. Question, though, is
what are you hoping to gain from doing a Query against a single
String? Are you doing a FuzzyQuery? You might look at the
SecondString project on SourceForge for doing string comparisons.
I guess I am a bit confused by y
I have a use case for comparing two given strings (attached to a specific
field) using Lucene and getting the similarity scores.
I tried but could not find any built-in way to do so. Hence assuming that
Lucene only compares a Query against Indexed documents, I came up with the
following approach:
(
NGramSpeller: source code from David Spencer ([EMAIL PROTECTED])
http://www.fsc.follett.com/destiny/licenseagreement/OpenSource.pdf
On Thu, Jun 19, 2008 at 11:34 PM, sumittyagi <[EMAIL PROTECTED]> wrote:
>
> HI,
> i need to download this file which is NGramSpeller.java
> more information about t
Hi,
I need to download the file NGramSpeller.java.
More information about this file is here:
http://www.marine-geo.org/services/oai/docs/javadoc/org/apache/lucene/spell/NGramSpeller.html
Where can I get its source code?
Any ideas, please?
--
View this message in context:
http
I created a RAMDirectory-like directory class that uses
ByteArrayRandomAccessIO from http://reader.imagero.com/uio/ to allow
concurrent random file access.
On Thu, Jun 19, 2008 at 3:33 PM, Jason Rutherglen <
[EMAIL PROTECTED]> wrote:
> Looks like it cannot be used for a log system that needs concur
Gaurav, have you tried Tika? (sub-project of Apache Lucene)
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Gaurav Sharma <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Wednesday, June 18, 2008 10:07:22 AM
> Subject: indexing
Looks like it cannot be used for a log system that needs concurrent read/write
access to a file. Back to RandomAccessFile, which will have buffering
issues. Does anyone have experience with http://reader.imagero.com/uio/ ?
On Thu, Jun 19, 2008 at 3:20 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> createOutput(
createOutput() creates a new file, overwriting the old one.
If you open the IndexInput before you call createOutput() for the 2nd
time, you should see the file.
And you definitely shouldn't have more than one IndexOutput open on
the same file (but that's not your problem here).
-Yonik
On Thu, Ju
Here's code that reproduces it.
public void testMain() throws IOException {
RAMDirectory ramDirectory = new RAMDirectory();
IndexOutput output = ramDirectory.createOutput("test");
byte[] bytes = "hello world".getBytes("UTF-8");
output.writeBytes(bytes, bytes.length);
output.flu
Yes. Also close. But then reopen the IndexOutput again later, then open
the IndexInput. I'm not sure if this is the recommended usage of these
APIs. It seems everywhere else in the Lucene code base only one is open at
a time.
On Thu, Jun 19, 2008 at 12:50 PM, Yonik Seeley <[EMAIL PROTECTED]> wr
Did you try calling flush() on the IndexOutput before opening the IndexInput?
-Yonik
On Thu, Jun 19, 2008 at 12:13 PM, Jason Rutherglen
<[EMAIL PROTECTED]> wrote:
> Seeing strange behavior with RAMDirectory. Is a file designed to supported
> IndexOutput being open concurrently with IndexInput?
What's the high-level goal here? The reason I ask is that
I'm not sure what *use* these scores are to you. Perhaps
someone will have a better approach if you post what it
is you're trying to accomplish...
Best
Erick
On Thu, Jun 19, 2008 at 1:06 PM, Gerardo Segura <[EMAIL PROTECTED]>
wrote:
> He
Seeing strange behavior with RAMDirectory. Is a file designed to support
IndexOutput being open concurrently with IndexInput? I open an IndexInput
with IndexOutput open, with data written to the file previously, and the
IndexInput is reporting a file length of 0, while Directory.fileLength()
rep
Hello list,
I need to generate a report with all the terms, the document ids where
they appear and the score in each document.
My current approach is to get a Term enumeration from the index and
construct a query for each of them.
But as I am a newbie with the library I wonder if there is a be
This is exactly how a score-sorted (the default) search works in
Lucene. It attempts to return the most relevant results first. Have
a look at the docs and the demo and try it out.
-Grant
On Jun 18, 2008, at 10:59 PM, syedfa wrote:
Dear Fellow Java/Lucene developers:
I want to know if
Hi Gaurav,
To which mime types are you referring?
I can't think of a tool designed for this, but one thing you might try is
checking whether the input is compressed/packed, and if so first
decompressing/unpacking it, and then using the "strings" program (available on
Linux and Cygwin) to extra