You are optimizing before the threads are finished adding to the index.
I think this should work:
IndexWriter writer = new IndexWriter("D:\\index", new StandardAnalyzer(),
true);
File file = new File(args[0]);
Thread t1 = new Thread(new IndexFiles(writer, file));
Thread t2 = new Thread(new IndexFiles(writer, file));
t1.start();
t2.start();
t1.join();  // wait for both threads to finish adding docs
t2.join();
writer.optimize();  // only optimize/close after all threads are done
writer.close();
I am running this on OpenVMS V8.2-1 on IA64. For a small number of files
this works fine.
I checked the resources and I have enough disk and RAM available.
Regards,
Rishi
Michael McCandless-2 wrote:
>
> It's very odd that CheckIndex has no trouble opening the segment's
> files, yet when you run optimize the OS reports a "file not found"
> exception (errno 5). Something odd is happening at the OS/filesystem
> level.
>
> What OS are you running on?
I would like to know if there is a simple way to force Lucene to adopt the
simple cosine similarity of the term frequency vectors of the documents and
the query for ranking the result. In practice the score sc_i of the document
i should be given by:
sc_i = (D_i . Q) / (|D_i| * |Q|)
where D_i = the term frequency vector of document i and Q = the term
frequency vector of the query.
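For illustration only (outside Lucene's Similarity framework; the class and method names below are invented for this sketch), the formula can be computed directly over two term-frequency maps:

```java
import java.util.Map;

public class CosineScore {
    // sc = (D . Q) / (|D| * |Q|) over raw term frequencies.
    public static double cosine(Map<String, Integer> d, Map<String, Integer> q) {
        double dot = 0, dNorm = 0, qNorm = 0;
        for (Map.Entry<String, Integer> e : d.entrySet()) {
            int f = e.getValue();
            dNorm += f * f;                      // accumulate |D|^2
            Integer qf = q.get(e.getKey());
            if (qf != null) dot += f * qf;       // accumulate D . Q
        }
        for (int f : q.values()) qNorm += f * f; // accumulate |Q|^2
        if (dNorm == 0 || qNorm == 0) return 0;
        return dot / (Math.sqrt(dNorm) * Math.sqrt(qNorm));
    }
}
```

Getting Lucene itself to score this way would mean overriding Similarity (tf, idf, lengthNorm), which only approximates pure cosine.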
In 2.9 there will be IndexWriter#getReader().
BTW, note that even if someone deletes, your reader may not see this delete.
If you use IndexWriter to delete docs, the open reader won't see those
deletes. So you may still have a problem.
I don't know how much stuff users can index, and how often
Users can index a lot of stuff, so I'd rather not keep things in memory
for too long.
Even if I keep a set of things added, how do I know if something has been
deleted? It seems rather difficult to keep this set of documents added in
sync with the index reader on the index
How many documents do you index between reader refreshes? If it's not too
many, I'd keep a Set of the terms for those documents and check every
incoming document against the set and then the reader.
Note that the set holds only the terms of those documents your reader
doesn't see. You should clear() it after yo
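That bookkeeping might be sketched like this in plain Java (all names here are hypothetical, and the boolean parameter stands in for a real IndexReader/TermDocs lookup):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: track terms (e.g. unique doc ids) indexed since the last reader
// refresh, so duplicates can be caught before the reader sees them.
public class PendingTerms {
    private final Set<String> pending = new HashSet<String>();

    // Call when a document is added via IndexWriter.
    public void onAdd(String idTerm) { pending.add(idTerm); }

    // A doc is a duplicate if it was added after the reader was opened,
    // or if the open reader already has it (readerHasIt stands in for a
    // real lookup against the IndexReader).
    public boolean isDuplicate(String idTerm, boolean readerHasIt) {
        return pending.contains(idTerm) || readerHasIt;
    }

    // Call right after reopening the reader: it now sees those docs.
    public void onReaderRefresh() { pending.clear(); }
}
```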
Hi all!
I'm currently running a big lucene index and one of my main concerns is
the integrity of the data entered. A few things come to mind, like
enforcing that certain fields be non-blank, forcing certain formats etc...
All these validations are easy to do with lucene, since I can validate
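A sketch of that kind of pre-index validation (the field names and rules are invented for the example; the real check would run just before IndexWriter.addDocument):

```java
import java.util.Map;
import java.util.regex.Pattern;

// Sketch: reject a document before it reaches the index.
public class DocValidator {
    private static final Pattern DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    public static boolean isValid(Map<String, String> fields) {
        String title = fields.get("title");
        // enforce non-blank field
        if (title == null || title.trim().length() == 0) return false;
        String date = fields.get("date");
        // enforce a format when the field is present
        if (date != null && !DATE.matcher(date).matches()) return false;
        return true;
    }
}
```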
Hello,
we used different analyzers and regenerated the index each time with the same
results...used Luke each time already.
Currently we're using SnowBall and Luke can't find any documents using the
supplied query examples below (in zzz-all).
Same happened using StandardAnalyzer (for both, inde
> hm...tried that...but it doesn't seem to be working for me though
Discarding lengthNorm didn't work for you. Very interesting. I am not sure,
but I think inverse document frequency is causing the problem. Probably one
of the query words (a very common word) has a high document frequency, and the docs having
I would just throw your doc into a MemoryIndex (lives in contrib/
memory, I think; it only holds one doc), get the Vector and do what
you need to do. So you would kind of be doing indexing, but not really.
On Aug 13, 2009, at 8:43 AM, joe_coder wrote:
Grant, thanks for responding.
My i
Several, all of which boil down to "what analyzers are you using during
indexing and searching?". Without that information, we can't say much.
Also, I'd recommend you get a copy of Luke and examine your index
to see whether what's in there is what you expect.
And query.toString and (as Grant says)
For example, I am able to do
Analyzer analyzer = new StandardAnalyzer(); // or any other analyzer
TokenStream ts = analyzer.tokenStream("myfield",
    new StringReader("some text goes here"));
Token t = ts.next();
while (t != null) {
  System.out.println("token: " + t);
  t = ts.next();
}
Grant, thanks for responding.
My issue is that I am not planning to use Lucene (as I don't need any
search capability, at least yet). All I have is a text document, and I need
to extract keywords and their frequency (which could be a simple split on
space and tracking the count). But I realize th
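The split-and-count idea can indeed be a few lines of plain Java (this skips stemming, stopwords and synonyms, which is exactly what Lucene's analyzers would add):

```java
import java.util.HashMap;
import java.util.Map;

public class TermFreq {
    // Naive term frequencies: lowercase, split on whitespace, count.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> freq = new HashMap<String, Integer>();
        for (String tok : text.toLowerCase().split("\\s+")) {
            if (tok.length() == 0) continue; // skip empty leading token
            Integer n = freq.get(tok);
            freq.put(tok, n == null ? 1 : n + 1);
        }
        return freq;
    }
}
```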
It's very odd that CheckIndex has no trouble opening the segment's
files, yet when you run optimize the OS reports a "file not found"
exception (errno 5). Something odd is happening at the OS/filesystem
level.
What OS are you running on?
Can you boil this down to a smallish standalone test that
On Aug 13, 2009, at 7:40 AM, joe_coder wrote:
I was wondering if there is any way to directly use Lucene API to extract
terms from a given string. My requirement is that I have a text document
for which I need a term frequency vector (after stemming, removing
stopwords and synonyms checks).
I tried creating the index in different disks but still i see the issue :-(
I tried to index documents in other disks also and got the same exception.
I also tried
$ java org.apache.lucene.index.CheckIndex /SYS$SYSDEVICE/RISHI/melon_1600/
-segment _61
NOTE: testing will be more thorough if you
Hello,
We're experiencing a problem using Lucene 2.4.1 and Compass 2.1.4 using
wildcard search.
Attribute values containing slashes can be searched using the full word,
but not using wildcards. We already tried different analyzers with the
same result.
Slash isn't mentioned as a stop word onl
I noticed the exception is "Caused by: java.io.FileNotFoundException:
/SYS$SYSDEVICE/RISHI/melon_1600/_61.cfs (i/o error (errno:5))"
I searched for i/o error (errno:5) and found some information which
associates it w/ a more native IO problem, like corrupt file due to system
crash etc.
Did you ex
It is a local file system.
We are using lucene 2.4 and java 1.5
Regards,
Rishi
Shai Erera wrote:
>
> Is that a local file system, or a network share?
>
> On Thu, Aug 13, 2009 at 1:07 PM, rishisinghal
> wrote:
>
>>
>> >>Is there any chance that two writers are open on this directory?
>> No,
I was wondering if there is any way to directly use Lucene API to extract
terms from a given string. My requirement is that I have a text document for
which I need a term frequency vector ( after stemming, removing stopwords
and synonyms checks ). The result needs to be the terms and frequency.
I
Hi
I have recently created an indexing reference project using Spring
Integration. May not help you with what you're doing but it might be
interesting for creating asynchronous indexing using JMS.
http://code.google.com/p/lucene-indexing-with-si/
Cheers
Amin
On Thu, Aug 13, 2009 at 11:53 AM,
Thanks Simon! :)
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw
On Thu, Aug 13, 2009 at 3:58 PM, Simon Willnauer <
simon.willna...@googlemail.com> wrote:
> On Thu, Aug 13, 20
Hi all,
I am new to multi-thread programming and Lucene. I want to change the
indexing demo of lucene143 into a multi-threaded one. I create one instance
of IndexWriter which is shared by three threads. But I find that the time it
costs when three threads are used is approximately three times that of
On Thu, Aug 13, 2009 at 12:24 PM, Anshum wrote:
> Hey Simon,
> Thanks for the comment, though would be great to have the comment @ the
> blog! :)
done!
Simon
> About testing vanilla sphinx Vs Sphinx, have that pipelined but would be
> some time before I go ahead and do that.
> I'm also planning a
Hey Simon,
Thanks for the comment, though would be great to have the comment @ the
blog! :)
About testing vanilla sphinx Vs Sphinx, have that pipelined but would be
some time before I go ahead and do that.
I'm also planning a benchmark (of search & indexing) of 2.4 & 2.9 (when
it's here) with the
hm...tried that...but it doesn't seem to be working for me though
Ahmet Arslan wrote:
>
>> I am trying to boost results that have all the query
>> in it to increase its ranking. But both the query unfortunately does not
>> > seems to effect it
>
> Did you read last two messages on this thread?
>
>
Is that a local file system, or a network share?
On Thu, Aug 13, 2009 at 1:07 PM, rishisinghal wrote:
>
> >>Is there any chance that two writers are open on this directory?
> No, that's not true.
>
> >>something external to Lucene is removing files from the directory.
> No this also has rare chanc
>>Is there any chance that two writers are open on this directory?
No, that's not true.
>>something external to Lucene is removing files from the directory.
No, this also has rare chances, as I am the owner of these files and other
than me no one can delete them :-)
Here are all the files in the
> I am trying to boost results that have all the query terms in it to
> increase its ranking. But unfortunately both queries do not seem to
> affect it.
Did you read last two messages on this thread?
http://www.nabble.com/Generating-Query-for-Multiple-Clauses-in-a-Single-Field-td24694748.html
Is there any chance that two writers are open on this directory? Or,
something external to Lucene is removing files from the directory.
It looks like there were at least two missing files (_37
On Thu, Aug 13, 2009 at 5:19 AM, rishisinghal wrote:
>
> Hi,
>
> I am trying to index documents and whe
Hi,
I am trying to index documents and when all is complete and optimize is
called I get
IFD [main]: setInfoStream
deletionpolicy=org.apache.lucene.index.keeponlylastcommitdeletionpol...@4fced0
IW 0 [main]: setInfoStream:
dir=org.apache.lucene.store.FSDirectory@/SYS$SYSDEVICE/RISHI/melon_1600
au
I am trying to boost results that have all the query terms in it to increase
its ranking. But unfortunately both queries do not seem to affect it.
Ahmet Arslan wrote:
>
>> thanks for the suggestion, but unfortunately it does not
>> work.
>
> What are you trying to do? Both Adriano's and my query
Anshum, thanks for posting this on the list. I have a few comments on
that benchmark, while being happy that Lucene has the upper hand in
yours.
I wonder if you can publish the various modifications you made to
either of those? If not, would it be possible to run the benchmarks
against the vanilla ver
Once you open issues on the spatial / analyzers/smartcn contribs feel
free to assign me to them.
simon
On Thu, Aug 13, 2009 at 9:47 AM, Amin Mohammed-Coleman wrote:
> Cool! I'll be on the case.
>
> Cheers!
>
> Amin
>
> On Thu, Aug 13, 2009 at 8:44 AM, Simon Willnauer
> wrote:
>>
>> There is a lot
Cool! I'll be on the case.
Cheers!
Amin
On Thu, Aug 13, 2009 at 8:44 AM, Simon Willnauer <
simon.willna...@googlemail.com> wrote:
> There is a lot of code in /contrib which needs proper documentation,
> refactoring and clean-up.
> For refactoring you can have a quick look at /analyzers/smartcn.
There is a lot of code in /contrib which needs proper documentation,
refactoring and clean-up.
For refactoring you can have a quick look at /analyzers/smartcn.
Clean-up and documentation is needed in /contrib/spatial which still
suffers from lots of legacy comments and certainly legacy code.
I gue
I would like to know if there is a simple way to force Lucene to adopt the
simple cosine similarity of the term frequency vectors of the documents and
the query for ranking the result.
Thank you
Claudio
Christian,
if you haven't done so, you might find Luke
(http://www.getopt.org/luke/) very helpful to see what has been
indexed and how.
simon
On Thu, Aug 13, 2009 at 6:10 AM, Christian
Bongiorno wrote:
> turns out the index is being built with lower-case terms which is why we
> aren't getting hits
Thanks for your replies. I have checked out trunk and have started looking
at what I can do. Any more suggestions as usual always welcome.
Thanks all!
Amin
On Wed, Aug 12, 2009 at 10:28 PM, Chris Hostetter
wrote:
>
> : that you use. Also, we are nearing 2.9 release, so it would
> : be great t