Here is a self-contained example:
I verified with Luke that no 's' is indexed. The output I get is:
testChars
:(bloom's*) got 0 Query is: :bloom's*
:(bloom) got 1 Query is: :bloom
:(bloom AND b*) got 1 Query is: +:bloom +:b*
So what I don't understand is why
> at indexing your analyzer is converting "bloom's" to "bloom" but not at search time.
> Which implies that you aren't using the same analyzer in both cases.
>
>
> --
> Ian.
>
>
> On Mon, Jul 11, 2011 at 4:19 PM, jm wrote:
> > Hi,
>
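For what it's worth, the behaviour above can be reproduced with a small self-contained sketch (Lucene 3.x-era API; StandardAnalyzer here stands in for the custom analyzer, so the exact tokens produced are an assumption). The key point is that QueryParser never runs wildcard terms through the analyzer, so "bloom's*" is searched literally even when the same analyzer is used on both sides:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class SameAnalyzerDemo {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_33);

        // Index one document with the analyzer.
        IndexWriter writer = new IndexWriter(dir, analyzer,
                IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        doc.add(new Field("content", "bloom's",
                Field.Store.NO, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.close();

        // Search with the SAME analyzer instance. Plain terms are
        // analyzed; wildcard terms ("bloom's*") are NOT, so whatever
        // the analyzer did to the apostrophe at index time, the
        // wildcard query keeps it literally.
        QueryParser parser = new QueryParser(Version.LUCENE_33, "content", analyzer);
        IndexSearcher searcher = new IndexSearcher(dir);
        System.out.println("bloom   -> "
                + searcher.search(parser.parse("bloom"), 10).totalHits);
        System.out.println("bloom's* -> "
                + searcher.search(parser.parse("bloom's*"), 10).totalHits);
        searcher.close();
    }
}
```

Stepping through displayTokensWithFullDetails (as in the thread) shows what the analyzer actually emits; the hit counts follow from that.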
Hi,

My env is jdk1.6 and lucene3.3.

At index time I have this:
Directory directory = FSDirectory.open(new
File("d:\\temp\\lucene.index"));
IndexWriter writer = new IndexWriter(directory, myAnalizer,
IndexWriter.MaxFieldLength.UNLIMITED);
Document doc = new Document();
maybe http://youdebug.kenai.com/ could be useful. If you are lucky you could
get it to set a breakpoint when the recursive call has reached depth X.
On Fri, Apr 29, 2011 at 1:40 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:
> Hi,
>
> OK, so it looks like it's not MemoryIndex and its C
Hi Otis,
I have exactly the same scenario, also on 3.1, the only difference is I have
less queries, like 20 or 30.
Yesterday this code processed over 1 million incoming docs (in
a synthetic test), and did not get any error...
Not very helpful maybe but just so you get some other feedback.
javie
mostly status of the indexes, whether there is some corruption or all is ok.
On Thu, Apr 14, 2011 at 9:20 PM, Simon Willnauer <
simon.willna...@googlemail.com> wrote:
> what kind of diagnostics are you looking for?
>
> simon
>
> On Thu, Apr 14, 2011 at 9:14 PM, jm wrote:
> Best
> Erick
>
> On Thu, Apr 14, 2011 at 11:56 AM, jm wrote:
>
> > Hi,
> >
> > I need to collect some diagnostic info from customer sites, so I would
> like
> > to get info on the status of lucene indexes...but I don't want the
> process
>
Hi,
I need to collect some diagnostic info from customer sites, so I would like
to get info on the status of lucene indexes...but I don't want the process
of collecting to take very long.
So I am considering Checkindex. I tested in a small index (60k docs) and it
took 12 seconds. A site usually h
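For reference, running CheckIndex programmatically looks roughly like this (3.x-era API sketch; the index path is taken from the command line). CheckIndex reads every segment's postings, which is why run time grows with index size:

```java
import java.io.File;
import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.FSDirectory;

public class IndexDiagnostics {
    public static void main(String[] args) throws Exception {
        // Open the index directory read-only for checking.
        CheckIndex checker = new CheckIndex(FSDirectory.open(new File(args[0])));

        // checkIndex() walks all segments and reports per-segment status.
        CheckIndex.Status status = checker.checkIndex();
        System.out.println(status.clean
                ? "index is OK"
                : status.numBadSegments + " broken segment(s) detected");
    }
}
```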
or maybe MemoryIndex (in contrib) is more suited to what he wants
On Fri, Apr 1, 2011 at 1:10 PM, Ian Lea wrote:
> RAMDirectory. The clue is in the name ...
>
>
> --
> Ian.
>
>
> On Fri, Apr 1, 2011 at 11:08 AM, Patrick Diviacco
> wrote:
> > Is there a way to index data into memory without wr
4:27 PM, Grant Ingersoll wrote:
> Hi JM,
>
> On Jun 23, 2010, at 4:01 AM, jm wrote:
>
>> Hi,
>>
>> I am trying to compile some arguments in favour of lucene as
>> management is deciding whether to standardize on lucene or a competing
>> commercial product
I am pretty sure at least some number already exists...cause I have
seen mentioned several times things like '3.0 is 3 times faster than
2.4 in benchmark x' and things like that, the only thing is that
numbers are not probably consolidated in one place
On Fri, Jun 25, 2010 at 12:27 AM, Itamar Syn-
http://search-lucene.com/
>
>
>
> - Original Message
>> From: jm
>> To: java-user@lucene.apache.org
>> Sent: Wed, June 23, 2010 5:57:32 PM
>> Subject: Re: arguments in favour of lucene over commercial competition
>>
>> yes, in my case
>> To: java-user
>> Sent: Wed, June 23, 2010 5:15:46 PM
>> Subject: Re: arguments in favour of lucene over commercial competition
>>
>> Just curious. What commercial alternatives are out there?
>
> On Wed, Jun 23,
>> 2010 at 04:01, jm <
>>
--
> http://search-lucene.com/
>
>
> * ...
>
> Otis
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message
>> From: jm
>> To: java-user@lucene.apa
Hi,
I am trying to compile some arguments in favour of lucene as
management is deciding whether to standardize on lucene or a competing
commercial product (we have a couple of products, one using lucene,
another using the commercial product, imagine which one I am using). I searched
the lists but could not f
; Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -Original Message-
>> From: jm [mailto:jmugur...@gmail.com]
>> Sent: Wednesday, April 21, 2010 3:59 PM
>> To: java-user@lucene.apache
oh, yes it does extend CharTokenizer..thanks Ahmet. I had searched
lucene source code for 256 and found nothing suspicious, and that was
itself suspicious cause it looked clearly like an inner limit. Of
course I should have searched for 255...
I'll see how I proceed cause I don't want to use a cus
I am analyzing this with my custom analyzer:
String s = "mail77 mail8 tc ro45mine durante
jjkk
stead. Then
> maybe post the code for your custom analyzer, or step through in a
> debugger or however you prefer to debug code.
>
>
> --
> Ian.
>
>
> On Wed, Apr 21, 2010 at 8:20 AM, jm wrote:
>> I am using a TermQuery so no analyzer used...
>> protected
>
>
> --
> Ian.
>
>
> On Tue, Apr 20, 2010 at 4:58 PM, jm wrote:
>> I am encountering a strange issue. I have a CustomStopAnalyzer. If I
>> do this (supporting code taken from AnalyzerUtils in LIA3 source code
>> Mike upload
I am encountering a strange issue. I have a CustomStopAnalyzer. If I
do this (supporting code taken from AnalyzerUtils in LIA3 source code
Mike uploaded):
Analyzer customStopAnalyzer = new CustomStopAnalyzer();
AnalyzerUtils.displayTokensWithFullDetails(customStopAnalyzer,
"mail77")
On Fri, Apr 16, 2010 at 5:01 PM, jm wrote:
> oh, just FYI, the documents I add are the same, they have all the same
I meant 'are NOT the same size'...
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.a
> running on this one machine?
>
> Or... is it possible two IWs were accidentally opened on the same
> directory? Are you changing the lock factory for your FSDirectory?
>
> Mike
>
> On Thu, Apr 15, 2010 at 11:24 AM, jm wrote:
>> not sure if it matters, but I have
not sure if it matters, but I have to correct my statement, where this
has happened was both times win2008 R1 64bits, local filesystem.
I am trying to reproduce in my dev workstation but unable so far.
On Thu, Apr 15, 2010 at 10:11 AM, jm wrote:
> Hi Mike
>
> I have a server side,
e?
>
> Is this just a local filesystem (disk) under vista?
>
> Mike
>
> On Wed, Apr 14, 2010 at 7:41 AM, jm wrote:
>> Hi,
>>
>> I am trying to chase an issue in our code and it is being quite
>> difficult. We have seen two instances (see below) where we ge
Hi,
I am trying to chase an issue in our code and it is being quite
difficult. We have seen two instances (see below) where we get the
same error. I have been trying to reproduce but it has been impossible
so far. I have several threads, some might be creating indices and
adding documents, others c
I have an issue with my custom analyzer...see the following code:
public static Analyzer getAnalyzer() {
// cache the analyzer
if (analyzer == null) {
analyzer = new CustomStopAnalyzer(); //does some basic
customization, nothing too fancy
//test
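One caveat with the lazy null-check cache above: if getAnalyzer() can be called from several threads, two of them may race and each construct an analyzer. A plain-Java sketch of the initialization-on-demand holder idiom avoids that without any synchronization on the read path (Object here is a stand-in for the real CustomStopAnalyzer):

```java
public class AnalyzerCache {
    // The JVM runs a class's static initializer exactly once, under its
    // own internal lock, so Holder gives race-free lazy initialization
    // with no volatile or synchronized needed when reading.
    private static final class Holder {
        static final Object INSTANCE = createAnalyzer();
    }

    // Stand-in for "new CustomStopAnalyzer()".
    static Object createAnalyzer() {
        return new Object();
    }

    public static Object getAnalyzer() {
        return Holder.INSTANCE;  // triggers Holder's init on first call only
    }
}
```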
then the total count is in tp.totalHits -- simple. The above query will
> still count all hits, but return only 1. Adjust according to your needs (e.g.
> 10).
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetap
Hi,
I need to find out how many hits a query will get, is this a valid
way? (Lucene 3.0)
Query lucquery = ...;
IndexSearcher[] indexes = ...;
MultiSearcher ms = new MultiSearcher(indexes);
TopDocs tp = ms.search(lucquery, Integer.MAX_VALUE);
int hits = tp.totalHits;
Thanks for the replies Ian and Robert. In my case, I am in a bit of an
uneasy position: I cannot reindex, and the original docs are gone...
What would you recommend? I have to choose one value, and some
customers started using our system with lucene 2.3, others with lucene
2.4 and others with 2.9.
My usage o
someone?
On Tue, Feb 16, 2010 at 11:47 AM, jm wrote:
> Hi,
>
> previously I was using 2.9 (upgraded from 2.4 but did not fix warnings
> etc). Now I have upgraded to 3.0, so I had to fix all deprecated
> methods etc. My question is with Version type parameter in some
> Token* cl
Hi,
previously I was using 2.9 (upgraded from 2.4 but did not fix warnings
etc). Now I have upgraded to 3.0, so I had to fix all deprecated
methods etc. My question is with Version type parameter in some
Token* classes.
Some of our customers have our product with lucene 2.4 (some upgraded
from 2.
On Mon, Nov 30, 2009 at 2:34 PM, Michael McCandless
wrote:
> On Mon, Nov 30, 2009 at 7:22 AM, jm wrote:
>> No other exceptions I could spot.
>
> OK
>
>> OS: win2003 32bits, with NTFS. This is a vm running on vmware fusion on a
>> mac.
>
> That should be
g addIndexes? Do you ever forcefully remove
> the write.lock, or use Lucene's API to unlock the index? The more
> info you can give about how your app uses Lucene, the better.
>
> And if possible run with infoStream enabled so we can maybe capture
> the corruption happening.
>
> Mike
>
> On Fri, Nov 27, 2009 at 7:09 AM, jm wrote:
>> I manually did CheckIndex in all indexes and found two with issues:
>>
>> first
>> Segments file=segments_42w numSegments=21 version=FORMAT_HAS_PROX [Lucene
count; avg 0
term/freq vector fields per doc]
WARNING: 1 broken segments (containing 17119 documents) detected
I have not been able to reproduce this in my env.
javier
On Fri, Nov 27, 2009 at 12:23 PM, jm wrote:
> Ok, I got the index from the production machine, but I am having some
> p
ture & post the resulting output leading
> up to the exception?
>
> Mike
>
> On Thu, Nov 26, 2009 at 7:12 AM, jm wrote:
>> The process is still running and ops dont want to stop it. As soon as
>> stops I'll try checkindex.
>>
>> Its created brand new
>
> Can you run CheckIndex on your index and post the output?
>
> Was this index created from scratch on Lucene 2.4.1? Or, created from
> an earlier Lucene version?
>
> Mike
>
> On Thu, Nov 26, 2009 at 6:03 AM, jm wrote:
>> or are we really? I think we are on 1.6 updat
or are we really? I think we are on 1.6 update 14, right?
sorry, I'm lost right now on JDK version numbering
On Thu, Nov 26, 2009 at 12:01 PM, jm wrote:
> on second thought...I hadn't noticed the jdk numbers properly, we are
> using b28, and JDK 6 Update 10 (b28) is the one fixin
on second thought...I hadn't noticed the jdk numbers properly, we are
using b28, and JDK 6 Update 10 (b28) is the one fixing this...
ok, forget this then
thanks!
On Thu, Nov 26, 2009 at 11:55 AM, jm wrote:
> Hi,
>
> Don't know if this should be here or in java-dev, posting to
Hi,
Don't know if this should be here or in java-dev, posting to this one
first. In one of our installations, we have encountered an exception:
Exception in thread "Lucene Merge Thread #0"
org.apache.lucene.index.MergePolicy$MergeException:
org.apache.lucene.index.CorruptIndexException: docs out o
onsumption just for the
> writers?
>
> Mike
>
> jm wrote:
>
>> yes I have tested with up to 512MB, although I don't have the hprof
>> dump file of those tests, they also got the OOM. I was just wondering
>> whether having so many instances of FreqProxTermsWriter$Posti
, Dec 16, 2008 at 11:24 AM, Ian Lea wrote:
> Can you not just give the process some more memory? 128Mb seems very
> low for what you are doing.
>
>
> --
> Ian.
>
>
> On Mon, Dec 15, 2008 at 6:28 PM, jm wrote:
>> Hi,
>>
>> I am having a memory issue wi
Hi,
I am having a memory issue with Lucene 2.4. I am starting a process
with 128MB of RAM; this process handles incoming requests from others,
and indexes objects in a number of lucene indexes.
My lucene docs, all have 6 fields:
-one is small: Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVe
I am using MemoryIndex in a similar scenario. I have not as many
queries though, less than 100, but several 'articles' coming per
second.
Works nicely.
On Sun, Nov 23, 2008 at 10:00 AM, Erik Hatcher
<[EMAIL PROTECTED]> wrote:
>
> On Nov 22, 2008, at 10:57 PM, Ian Holsman wrote:
>>
>> Hi. apologie
Thanks for the reply Mike.
> You can also wrap any other deletion policy (it doesn't have to be
> KeepOnlyLastCommit).
>
> When you want to do a backup, make sure to do try/finally, ie:
>
>IndexCommitPoint cp = dp.snapshot();
>try {
> Collection files = cp.getFileNames();
>
>
Hi guys,
I want to make use of the possibility of hot backups in 2.3. If I
understand correctly, the only thing I need to do is to open the
writers with SnapshotDeletionPolicy, is that correct?
SnapshotDeletionPolicy dp = new SnapshotDeletionPolicy(new
KeepOnlyLastCommitDeletionPolicy());
final I
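Mike's try/finally advice, fleshed out as a sketch (3.x-era signatures; `copyFile` is a hypothetical copy-to-backup helper, and the IndexWriter constructor taking a deletion policy moved around between 2.3 and 3.x, so check the version in use):

```java
SnapshotDeletionPolicy dp =
        new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
IndexWriter writer = new IndexWriter(directory, analyzer, dp,
        IndexWriter.MaxFieldLength.UNLIMITED);

// snapshot() pins the files of the current commit: the writer keeps
// indexing and merging, but will not delete these files until
// release() is called, so they can be copied safely while "hot".
IndexCommit commit = dp.snapshot();
try {
    for (String fileName : commit.getFileNames()) {
        copyFile(directory, fileName);  // hypothetical helper
    }
} finally {
    dp.release();  // always release, or the snapshotted files leak
}
```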
I am very interested indeed, do I understand correctly that the tweak
you made reduces the memory when searching if you have many docs in
the index? I am omitting norms too.
If that is the case, can someone point me to the required
change that should be made? I understand from Yonik's comm
I haven't upgraded yet but I think I read on the aperture list that
they already had some extractors for some office 2007 stuff in trunk
some time ago.
On Nov 8, 2007 3:09 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> You might also consider asking on the Tika (a Lucene subproject
> currently in Incu
Hi,
I understand optimizing could take longer when index is bigger, so it
might take a while when index is huge.
I think I remember seeing something on the lucene list about optimizing
not fully, only to a less-than-optimal state, but using less
time; is that correct?
Does some
We had to develop vb code to convert pst to eml files.
I am using mbox, works fine for me. And I am also using aperture, but only
for extracting text from non-mail files (like office etc), works fine too.
On 7/2/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
Anyone have any recommendations on
two?
Field(Name, Value, Store, index)
*
*Field(Name, Value, Store, index, Field.TermVector.NO)
Best
Erick
On 3/14/07, jm <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I want to make my index as small as possible. I noticed about
> field.setOmitNorms(true), I read in the list the di
Hi,
I want to make my index as small as possible. I noticed about
field.setOmitNorms(true), I read in the list the diff is 1 byte per
field per doc, not huge but hey...is the only effect the score being
different? I hardly mind about the score so that would be ok.
And can I add to an index witho
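The option being asked about looks like this in 3.x-era code (a sketch; the field name and text variable are made up). Omitting norms drops the one-byte-per-field-per-doc norms entry, at the cost of losing length normalization and index-time field boosts in scoring:

```java
Field body = new Field("body", text, Field.Store.NO, Field.Index.ANALYZED);
// No norms byte for this field: smaller index, but documents are no
// longer scored by field length, and field boosts are ignored.
body.setOmitNorms(true);
doc.add(body);
// Caveat: norms are "sticky" per field. If some documents in the index
// were added with norms enabled for this field, merges re-create norms
// for every document in that field.
```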
, Michael McCandless <[EMAIL PROTECTED]> wrote:
"jm" <[EMAIL PROTECTED]> wrote:
> I have two processes running in parallel, each one adding and deleting
> to its own set of indexes. Since I upgraded to 2.1 I am getting a NPE
> at RAMDirectory.java line 207 in one of
Hello all,
I have two processes running in parallel, each one adding and deleting
to its own set of indexes. Since I upgraded to 2.1 I am getting a NPE
at RAMDirectory.java line 207 in one of the processes.
Line 207 is:
RAMFile existing = (RAMFile)fileMap.get(name);
the stack trace is:
java
AIL PROTECTED]> wrote:
On Wednesday 14 February 2007 17:12, jm wrote:
> So my question, is it possible to disable some of the caching lucene
> does so the memory consumption will be smaller (I am a bit concerned
> on the memory usage side)? Or the memory savings would not pay off?
Hi,
I updated my code to use 2.1 (IndexWriter deleting docs etc), and when
using native locks I still get a lock like this:
lucene-2361bf484af61abc81e6e7f412ad43af-n-write.lock
and when using SimpleFSLockFactory:
lucene-2361bf484af61abc81e6e7f412ad43af-write.lock
From the changes.txt:
9. LUCEN
Hi,
That last thread about caching reminded me of something. Me need is
actually the opposite...
I use lucene to search in hundreds/thousands of indexes. Doing a
lucene query on a set of the indexes is only one of the steps involved
in my 'queries', and some of the other steps take longer than l
yes that would be ok for me, as long as I can reuse my child analyzer.
On 11/27/06, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:
On Nov 27, 2006, at 9:57 AM, jm wrote:
> On 11/27/06, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:
>>
>> On Nov 26, 2006, at 8:57 AM, jm w
On 11/27/06, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:
On Nov 26, 2006, at 8:57 AM, jm wrote:
> I tested this. I use a single static analyzer for all my documents,
> and the caching analyzer was not working properly. I had to add a
> method to clear the cache each time a new
ight help depends on how expensive your analyzer
> chain is. For some examples on how to set up analyzers for chains
> of token streams, see MemoryIndex.keywordTokenStream and class
> AnalzyerUtil in the same package.
>
> Wolfgang.
>
> On Nov 22, 2006, at 4:15 AM, jm wro
up analyzers for chains
> of token streams, see MemoryIndex.keywordTokenStream and class
> AnalzyerUtil in the same package.
>
> Wolfgang.
>
> On Nov 22, 2006, at 4:15 AM, jm wrote:
>
>> checking one last thing, just in case...
>>
>> as I mentioned, I have previously
not seem to
be an easy way to do that no?
thanks
On 11/21/06, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:
On Nov 21, 2006, at 12:38 PM, jm wrote:
> Ok, thanks, I'll give MemoryIndex a go, and if that is not good enough
> I will explore the other options then.
To get started you
Ok, thanks, I'll give MemoryIndex a go, and if that is not good enough
I will explore the other options then.
On 11/21/06, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:
On Nov 21, 2006, at 7:43 AM, jm wrote:
> Hi,
>
> I have to decide between using a RAMDirectory and Me
Hi,
I have to decide between using a RAMDirectory and MemoryIndex, but
not sure what approach will work better...
I have to run many items (tens of thousands) against some queries (100
at most), but I have to do it one item at a time. And I already have
the lucene Document associated with each
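For the one-item-at-a-time pattern described above, MemoryIndex (in contrib) is built for exactly this: index a single document in RAM, run every query against it, then throw it away. A sketch (the analyzer, query list, and handleMatch callback are assumptions):

```java
// One MemoryIndex per incoming item; it holds exactly one document.
MemoryIndex index = new MemoryIndex();
index.addField("content", itemText, analyzer);  // one call per field

for (Query q : queries) {
    // search() returns a relevance score; > 0.0f means the query
    // matched this single in-memory document.
    if (index.search(q) > 0.0f) {
        handleMatch(q);  // hypothetical per-match callback
    }
}
```

This avoids the Directory/IndexWriter/IndexReader lifecycle entirely, which is why it tends to beat RAMDirectory for transient single-document matching.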
eleted and locking works ok
On 6/20/06, jm <[EMAIL PROTECTED]> wrote:
Hi,
I am trying to reuse lucene's Lock for my own purposes, I need to
lock several java processes and I thought I could reuse the Lock
stuff. I understand lucene locks work across jvm.
But I cannot make it work. I
Hi,
I am trying to reuse lucene's Lock for my own purposes, I need to
lock several java processes and I thought I could reuse the Lock
stuff. I understand lucene locks work across jvm.
But I cannot make it work. I tried to reproduce my problem in a small class:
public class SysLock {
privat
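If reusing Lucene's Lock proves awkward, plain java.nio file locks give the same cross-process behaviour with no Lucene dependency; a minimal sketch along the lines of the class above:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class SysLock {
    private final FileChannel channel;
    private final FileLock lock;

    private SysLock(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    // Returns null if another process already holds the lock.
    // (Within one JVM, re-locking the same file throws
    // OverlappingFileLockException instead.)
    public static SysLock tryAcquire(File lockFile) throws Exception {
        FileChannel ch = new RandomAccessFile(lockFile, "rw").getChannel();
        FileLock l = ch.tryLock();  // OS-level lock, visible across JVMs
        if (l == null) {
            ch.close();
            return null;
        }
        return new SysLock(ch, l);
    }

    public void release() throws Exception {
        lock.release();
        channel.close();
    }
}
```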
ok, thanks for letting me know.
I entered a bug, 556.
javi
On 4/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> Hi Jim,
>
> This went to the old mailing list...
> Could you email this to java-user@lucene.apache.org
> and maybe open a JIRA bug for it?
>
> -Yonik
On 4/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> Hi Jim,
>
> This went to the old mailing list...
> Could you email this to java-user@lucene.apache.org
> and maybe open a JIRA bug for it?
>
> -Yonik
>
> On 4/26/06, jm <[EMAIL PROTECTED]> wrote:
> >
Hi,
I have encountered an issue with lucene1.9.1. It involves
MatchAllDocsQuery, MultiSearcher and a custom HitCollector. The
following code throws java.lang.UnsupportedOperationException.
If I remove the MatchAllDocsQuery condition (comment whole //1
block), or if I dont use the custom hitcoll
Hi,
I have an index with about 20 different fields.
I'd like to query my index to get the list of all different terms for
a given field.
Is it something possible in a simple way? I mean simpler than getting
every terms of the index and then keeping only those which match the
given field.
Thanks i
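In the pre-4.0 API there is a direct way: IndexReader.terms(Term) positions a TermEnum at the first term at or after the given one, and terms are ordered by field first, so you can walk just one field's terms and stop when the field changes. A sketch (the field name "category" is an assumption):

```java
IndexReader reader = IndexReader.open(directory);
// Seek to the first term of the wanted field (empty text sorts first).
TermEnum termEnum = reader.terms(new Term("category", ""));
try {
    while (termEnum.term() != null
            && "category".equals(termEnum.term().field())) {
        System.out.println(termEnum.term().text());
        if (!termEnum.next()) {
            break;  // exhausted the whole term dictionary
        }
    }
} finally {
    termEnum.close();
    reader.close();
}
```

Because the enumeration stops as soon as the field changes, this never touches the terms of the other ~19 fields.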
Hi,
Actually I have the same problem. Queries are working on a few fields
but not all of them, although the index is ok (checked it with luke).
But I have no idea to solve that...
Jean-Marie Tinghir
2005/6/22, Urs Eichmann <[EMAIL PROTECTED]>:
> My index consists of about 26 fields. I have a
Hi,
> Just looking for pointers. I'm new to java and lucene.
I guess you're new to J2EE too... You should study servlets and jsps a bit.
You'll have to call Searcher.java in your servlet, unless you use
frameworks like struts then it'll be in your business code.
Good luck!
Jean-Marie Tinghir
> do you keep your indexWriter open all the time during process?
I think that might be the real cause. And as it reopens it all the
time, the mergeFactor isn't used at all I guess...
I'll try to modify that.
Thanks.
Jean-Marie Tinghir
-
s to index 450 MB.
But the difference in time is due to the fact of indexing in one index or not.
JM
> Could you qualify a bit more about what is slow?
Well, it just took 145 minutes to index 2670 files (450 MB) in one
index (29 MB).
It only took 33 minutes when I did it into ~10 indexes (global size of 32 MB).
> Perhaps you need to optimize the index?
Perhaps, never tried it.
Hi,
I have a 25 Mb index and was wondering if it would be better to divide
it in about 10 indexes and search in it with MutliSearcher.
Would searching be faster this way?
The indexing would be faster I guess, as it is getting slower and
slower while indexes get bigger.
But searching?
Jean-Marie
> : When I search "hotliner:such" I get a 0 result. ("such" gets the same)
> : But when I search "hotliner:such*", I get the 277 expected results!
>
> (or treating it as a stop word)
You're right!
Unfortunately someone's name in my company is 'such' and you made me
realize that it's also a common
Hi Lucene community,
I'm facing a strange problem, that you'll probably understand as I'm
only a newbie to Lucene.
When I search "hotliner:such" I get a 0 result. ("such" gets the same)
But when I search "hotliner:such*", I get the 277 expected results!
Why is the first query not working?
Thanks