he.org/core/10_0_0/core/org/apache/lucene/index/DirectoryReader.html#open(org.apache.lucene.index.IndexCommit,int,java.util.Comparator)
> .
>
> So if you don't need adding, updating or deleting documents, this could be
> a fit.
>
> On Thu, Dec 19, 2024 at 1:43 PM Ian Lea wro
Hi
On trying to open an old index using lucene 10.0.0 I'm getting
this exception:
Exception in thread "main"
org.apache.lucene.index.IndexFormatTooOldException: Format version is not
supported (resource
BufferedChecksumIndexInput(MemorySegmentIndexInput(path="/whatever.../segments_3"))):
This ind
xact same options (index options, points dimensions, norms,
> doc values type, etc.) as already indexed documents that also have
> this field.
>
> However it's a bug that Lucene fails to open an index that was legal
> in Lucene 8. Can you file a JIRA issue?
>
> On Mon, Dec
Hi
We have a long-standing index with some mandatory fields and some optional
fields that has been through multiple lucene upgrades without a full
rebuild and on testing out an upgrade from version 8.11.0 to 9.0.0, when
opening an IndexWriter we are hitting the exception
Exception in thread "main"
A. Non-PMC.
--
Ian.
On Wed, Jun 17, 2020 at 1:28 PM jim ferenczi wrote:
> I vote option A (PMC vote)
>
> Le mer. 17 juin 2020 à 14:24, Felix Kirchner <
> felix.kirch...@uni-wuerzburg.de> a écrit :
>
> > A
> >
> > non-PMC
> >
> > Am 16.06.2020 um 00:08 schrieb Ryan Ernst:
> > > Dear Lucene an
What are the full package names for these interfaces? I don't think they
are org.apache.lucene.
--
Ian.
On Wed, Aug 2, 2017 at 9:00 AM, Ranganath B N
wrote:
> Hi,
>
> It's not about the file formats. Rather, it is about the LuceneInputFormat
> and LuceneOutputFormat interfaces, which deal with
Looks like your screenshot didn't make it, but never mind: I'm sure we all
know what text files look like.
A join on two ID fields sounds more like SQL database territory than
Lucene. Lucene is not an SQL database. But I typed "lucene join" into a
well known search engine and the top hit
index folder using java
> (File.listFiles()) it lists 1761 files in that folder. This count goes down
> to a double digit number when I restart the tomcat.
>
> Thanks for looking into it.
>
> --
> Regards
> -Siraj Haider
> (212) 306-0154
>
> -Original Mess
The most common cause is unclosed index readers. If you run lsof against
the tomcat process id and see that some deleted files are still open,
that's almost certainly the problem. Then all you have to do is track it
down in your code.
--
Ian.
On Thu, May 4, 2017 at 10:09 PM, Siraj Haider wro
not found in
> version 5.x
>
> Any suggestion to bypass that?
>
> Sorry for my bad English.
>
> 2017-02-17 19:40 GMT+08:00 Ian Lea :
> > Hi
> >
> >
> > SimpleAnalyzer uses LetterTokenizer which divides text at non-letters.
> > Your add and sea
Hi
SimpleAnalyzer uses LetterTokenizer which divides text at non-letters.
Your add and search methods use the analyzer but the delete method doesn't.
Replacing SimpleAnalyzer with KeywordAnalyzer in your program fixes it.
You'll need to make sure that your id field is left alone.
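A minimal sketch of that fix (the field name "id" and the value are placeholders; Lucene 5.x style, so adjust constructors for your version):

```java
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;

// KeywordAnalyzer emits the whole field value as a single token, so an
// id like "doc42" is not split or stripped by LetterTokenizer rules.
IndexWriterConfig cfg = new IndexWriterConfig(new KeywordAnalyzer());
IndexWriter writer = new IndexWriter(dir, cfg);  // dir: your Directory

// deleteDocuments(Term) matches the exact indexed token, which now
// equals the original id value at both index and delete time.
writer.deleteDocuments(new Term("id", "doc42"));
```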
Good to see a
oal.search.ConstantScoreQuery?
"A query that wraps another query and simply returns a constant score equal
to the query boost for every document that matches the query. It therefore
simply strips off all scores and returns a constant one."
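For example (a sketch; `searcher` is assumed to be an open IndexSearcher and the field name is a placeholder):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

// Every document matching the wrapped query gets the same score,
// so ranking among the matches is flat.
Query scoring = new TermQuery(new Term("title", "lucene"));
Query constant = new ConstantScoreQuery(scoring);
TopDocs hits = searcher.search(constant, 10);
```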
--
Ian.
On Mon, Jan 9, 2017 at 11:39 AM, Taher Galal
w
No, it implies that Lucene is a low level library that allows people like
you and me, application developers, to develop applications that meet our
business and technical needs.
Like you, most of the things I work with prefer documents where the search
terms are close together, often preferably in
Sounds to me like it's related to the index not having been closed properly
or still being updated or something. I'd worry about that.
--
Ian.
On Thu, Jun 16, 2016 at 11:19 AM, Mukul Ranjan wrote:
> Hi,
>
> I'm observing below exception while getting instance of indexWriter-
>
> java.lang.Ill
I'd definitely go for b). The index will of course be larger for every
extra bit of data you store but it doesn't sound like this would make much
difference. Likewise for speed of indexing.
--
Ian.
On Wed, Jun 15, 2016 at 2:25 PM, Geebee Coder wrote:
> Hi there,
> I would like to use Lucene
Would
http://lucene.apache.org/core/5_5_0/core/org/apache/lucene/index/IndexReader.html#document(int,%20java.util.Set)
be what you are looking for?
--
Ian.
On Mon, May 16, 2016 at 1:39 PM, wrote:
> Hello,
>
> I am storing close to 100 fields in a single document which is being
> indexed. Ther
not provide his Solr config! :-) In any case, it would be
>> > > good to get the Analyzer + code you use while indexing and also the
>> > > code (+ Analyzer) that creates the query while searching.
>> > >
>> > > Uwe
>> > >
>> > > -
Hi
Can you provide a few examples of cpn values that a) are and b) are
not being found, for both indexing and searching?
You may also find some of the tips at
http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
useful.
You haven't shown the code that create
= whatever(iw, data-source-2)
...
t1.start()
t2.start()
...
wait ...
iw.close()
--
Ian.
> On Wed, Sep 9, 2015 at 11:23 AM, Ian Lea wrote:
>
>> The link that I sent,
>> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed is for Lucene,
>> not Solr. The seco
:
> Thanks a lot !
>
> But do you know some links that helps implement these optimization options
> without the Solr (using only lucene) ?
>
> I am using lucene 4.9.
>
> More thanks.
>
> Humberto
>
>
> On Wed, Sep 9, 2015 at 5:23 AM, Ian Lea wrote:
>
>
See also http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
Also double check that it's Lucene that you should be concentrating
on. In my experience it's often the reading of the data from a
database, if that's what you are doing, that is the bottleneck.
--
Ian.
On Wed, Sep 9, 2015 at 6:
From a glance, you need to close the old reader after calling
openIfChanged if it gives you a new one.
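The usual pattern is something like this (a sketch against the DirectoryReader API):

```java
import org.apache.lucene.index.DirectoryReader;

// openIfChanged returns null when nothing changed; otherwise it
// returns a NEW reader and the caller must close the old one.
DirectoryReader newReader = DirectoryReader.openIfChanged(reader);
if (newReader != null) {
    reader.close();      // release files held by the old reader
    reader = newReader;  // search against the fresh reader from now on
}
```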
See
https://lucene.apache.org/core/5_3_0/core/org/apache/lucene/index/DirectoryReader.html#openIfChanged(org.apache.lucene.index.DirectoryReader).
You may wish to pay attention to the words abo
Hi - I suggest you narrow the problem down to a small self-contained
example and if you still can't get it to work, show us the code. And
tell us what version of Lucene you are using.
--
Ian.
On Mon, Jun 1, 2015 at 5:20 PM, Rahul Kotecha
wrote:
> Hi All,
> I am trying to query an index.
>
> Is there a difference between using StoredField and using other types of
> fields with Field.Store.YES?
It will depend on what the other type of field is. As the javadoc for
Field states, the xxxField classes are sugar. If you are doing
standard things on standard data it's generally easier to
Can you use a BooleanFilter (or ChainedFilter in 4.x) alongside your
BooleanQuery? Seems more logical and I suspect would solve the problem.
Caching filters can be good too, depending on how often your data changes.
See CachingWrapperFilter.
--
Ian.
On Tue, Mar 10, 2015 at 12:45 PM, Chris Bamf
Take a look at the first section of
https://lucene.apache.org/core/4_10_3/MIGRATE.html. There's probably
something there that will help you.
--
Ian.
On Wed, Mar 11, 2015 at 11:03 AM, wangdong wrote:
> Can anybody help me?
>
>
>> I am confused about the api in lucene 4.10.3.
>>
>> I want to ge
I think if you follow the Field.fieldType().numericType() chain you'll
end up with INT or DOUBLE or whatever.
But if you know you stored it as an IntField then surely you already
know it's an integer? Unless you sometimes store different things in
the one field. I wouldn't do that.
--
Ian.
O
ery, and I want to make sure
> I match only index entries that do not have more than 2 tokens, is there a
> way to do that too?
>
> Thanks
>
> On Wed, Feb 18, 2015 at 2:23 AM, Ian Lea wrote:
>
>> Break the query into words then add them as TermQuery instances as
>>
Break the query into words then add them as TermQuery instances as
optional clauses to a BooleanQuery with a call to
setMinimumNumberShouldMatch(2) somewhere along the line. You may want
to do some parsing or analysis on the query terms to avoid problems of
case matching and the like.
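A sketch of that approach (Lucene 4.x style; the "content" field name and the naive lowercase/whitespace split stand in for real analysis):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

BooleanQuery bq = new BooleanQuery();
for (String word : queryText.toLowerCase().split("\\s+")) {
    // Each word is optional on its own...
    bq.add(new TermQuery(new Term("content", word)),
           BooleanClause.Occur.SHOULD);
}
// ...but at least two of them must match.
bq.setMinimumNumberShouldMatch(2);
```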
--
Ian.
t;
> What I am currently doing is duplicating the data into 2 different fields
> and having my own PerFieldAnalyzerWrapper just like you pointed out
>
> Is there a good way to do this in a single-pass? Like how Bi-Grams or
> Common-Grams do…
>
> --
> Ravi
>
> On Tue
Sounds like a job for
org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper.
--
Ian.
On Tue, Feb 17, 2015 at 8:51 AM, Ravikumar Govindarajan
wrote:
> We have a requirement in that E-mail addresses need to be added in a
> tokenized form to one field while untokenized form is added to
:
> Thanks Ian for your help. But I didn't get aol search, what it is ? tried
> searching in google but couldn't find.
>
> Thanks
>
> On Fri, Feb 13, 2015 at 3:00 AM, Ian Lea wrote:
>
>> I think you can do it with 4 simple queries:
>>
>> 1) +flyi
I think you can do it with 4 simple queries:
1) +flying +shooting
2) +flying +fighting
etc.
or BooleanQuery equivalents with MUST clauses. Use
oal.search.TotalHitCountCollector and it should be blazingly fast,
even if you have more than 100 docs.
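Query 1 might look like this (a sketch; the field name is a placeholder and `searcher` is an open IndexSearcher):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TotalHitCountCollector;

BooleanQuery bq = new BooleanQuery();
bq.add(new TermQuery(new Term("tags", "flying")), BooleanClause.Occur.MUST);
bq.add(new TermQuery(new Term("tags", "shooting")), BooleanClause.Occur.MUST);

// Counts hits without scoring or collecting documents.
TotalHitCountCollector collector = new TotalHitCountCollector();
searcher.search(bq, collector);
int count = collector.getTotalHits();
```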
--
Ian.
On Thu, Feb 12, 2015 at 5:42 PM, Ma
HOULD logic and boosts and whatever else I wanted.
--
Ian.
On Wed, Feb 11, 2015 at 2:37 PM, Jon Stewart
wrote:
> Ok... so how does anyone ever use date-time queries in lucene with the
> new recommended way of using longs?
>
>
> Jon
>
>
> On Wed, Feb 11, 2015 at 9:26 A
s handed a
> field name and query components (e.g., "created", "2010-01-01",
> "2014-12-31"), which I can derive from, parse the timestamp strings,
> and then turn the whole thing into a numeric range query component?
>
>
> Jon
>
>
> On Wed, Feb
To the best of my knowledge you are spot on with everything you say,
except that the component to parse the strings doesn't exist. I
suspect that a contribution to add that to StandardQueryParser might
well be accepted.
--
Ian.
On Wed, Feb 11, 2015 at 4:21 AM, Jon Stewart
wrote:
> Hello,
>
>
If you only ever want to retrieve based on exact match you could index
the name field using org.apache.lucene.document.StringField. Do be
aware that it is exact: if you do nothing else, a search for "a" will
not match "A" or "A ".
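A sketch of the exact-match route (field name and value are placeholders):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// StringField is indexed as a single untokenized term.
Document doc = new Document();
doc.add(new StringField("name", "Acme Widgets", Field.Store.YES));

// Retrieval must use the identical string: case, spacing and all.
Query q = new TermQuery(new Term("name", "Acme Widgets"));
```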
Or you could do something with start and end markers, e.g. index yo
gt; bquery.add(queryFieldA, BooleanClause.Occur.SHOULD);
> bquery.add(queryFieldB, BooleanClause.Occur.SHOULD);
>
> this is the correct way?
>
>
> Gesendet: Dienstag, 10. Februar 2015 um 17:31 Uhr
> Von: "Ian Lea"
> An: java-user@lucene.apache.org
> Betreff: Re: combine to
org.apache.lucene.search.BooleanQuery.
--
Ian.
On Tue, Feb 10, 2015 at 3:28 PM, Sascha Janz wrote:
>
> Hi,
>
> i want to combine two MultiTermQueries.
>
> One searches over FieldA, one over FieldB. Both queries should be combined
> with "OR" operator.
>
> so in lucene Syntax i want to searc
che"
>
> Score : 1 :0.27094576
> 3 :0.27094576
> 2 :0.010494952
>
>
> If we go by query it is giving same score ..It is not working.
>
> Thanks
> Priyanka
>
>
> On Fri, Jan 23, 2015 at 3:19 PM, Ian Lea wrote:
>
>> How about "home~10 h
How about "home~10 house~10 flat". See
http://lucene.apache.org/core/4_10_3/queryparser/index.html
--
Ian.
On Fri, Jan 23, 2015 at 7:17 AM, Priyanka Tufchi
wrote:
> Hi ALL
>
> I am working on a project which uses lucene for searching . I am
> struggling with boolean based Query : Actual Scena
Are you asking if your two suggestions
1) a MultiPhraseQuery or
2) a BooleanQuery made up of multiple PhraseQuery instances
are equivalent? If so, I'd say that they could be if you build them
carefully enough. For the specific examples you show I'd say not and
would wonder if you get correct h
hose the force merge as alternative with less afford. Could
>>> forceMergeDeletes serve our purpose here?
>>
>> It could, but has the same problem like above. The only difference to
>> forceMerge is that it only merges segments which have deletions.
>>
>>>
Do you need to call forceMerge(1) at all? The javadoc, certainly for
recent versions of lucene, advises against it. What version of lucene
are you running?
It might be helpful to run lsof against the index directory
before/during/after the merge to see what files are coming or going,
or if there
How are you storing the id field? A wild guess might be that this
error might be caused by having some documents with id stored,
perhaps, as a StringField or TextField and some as an IntField.
--
Ian.
On Wed, Jan 14, 2015 at 2:07 PM, Sascha Janz wrote:
>
> hello,
>
> i am using lucene 4.6. i
Presumably no exception is thrown from the new IndexWriter() call?
I'd double check that, and try some harmless method call on the
writer and make sure that works. And run CheckIndex against the
index.
--
Ian.
On Tue, Jan 6, 2015 at 5:05 PM, Brian Call
wrote:
> Hi Tomoko,
>
> Thank you f
Hi
I can't give an exact answer to your question but my experience has
been that it's best to leave all the merge/buffer/etc settings alone.
If you are doing a bulk update of a large number of docs then it's no
surprise that you are seeing a heavy IO load. If you can, it's likely
to be worth giv
Telling us the version of lucene and the OS you're running on is
always a good idea.
A guess here is that you aren't closing index readers, so the JVM will
be holding on to deleted files until it exits.
A combination of du, ls, and lsof commands should prove it, or just
lsof: run it against the j
Toronto != toronto. From the javadocs for StandardAnalyzer:
Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter,
LowerCaseFilter does what you would expect.
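You can see this for yourself by dumping the tokens (a sketch against the 4.x analysis API; the version constant may need adjusting for your release):

```java
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_44);
TokenStream ts = analyzer.tokenStream("f", new StringReader("Toronto"));
CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
ts.reset();
while (ts.incrementToken()) {
    System.out.println(term.toString());  // prints "toronto"
}
ts.end();
ts.close();
```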
--
Ian.
On Fri, Oct 3, 2014 at 3:52 AM, Xu Chu <1989ch...@gmail.com> wrote:
> Hi everyone
>
> In the followi
PerFieldAnalyzerWrapper is the way to mix and match fields and analyzers.
Personally I'd simply store the case-insensitive field with a call to
toLowerCase() on the value and equivalent on the search string.
You will of course use more storage, but you don't need to store the
text contents for bo
ause
> for the error.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -Original Message-
>> From: Ian Lea [mailto:ian@gmail.com]
>> Sent: Wednesday, Se
Wed, Sep 10, 2014 at 7:01 AM, Ian Lea wrote:
>> Hi
>>
>>
>> On running a quick test after a handful of minor code changes to deal
>> with 4.10 deprecations, a program that updates an existing index
>> failed with
>>
>> Exception in thread "main&qu
Hi
On running a quick test after a handful of minor code changes to deal
with 4.10 deprecations, a program that updates an existing index
failed with
Exception in thread "main" java.lang.IllegalStateException: cannot
write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)
at org.apache.luc
You tell it what you want. See the javadocs for
org.apache.lucene.document.Field and friends such as TextField.
--
Ian.
On Mon, Aug 4, 2014 at 2:43 PM, Sachin Kulkarni wrote:
> Hi,
>
> I am using lucene 4.6.0 to index a dataset.
> I have the following fields:
> doctitle, docbody, docname, doc
Retrieving stored data is always likely to take longer than not doing
so. There are some tips in
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed.
But taking over a minute to retrieve data for 50 hits sounds
excessive. Are you sure about those figures?
--
Ian.
On Thu, Jul 31, 2014 at
;
>
> // --
>
> TopDocs results = searcher.search(Searchedquery, 10);
> ScoreDoc[] hits = results.scoreDocs;
>
>
> for (int i = 0; i < hits.length; ++i) {
> int docId = hits[i].doc; //
>
> Document d = searcher.doc(docId);
> int sys_DocID=d.get("DocID");
You need to supply more info. Tell us what version of lucene you are
using and provide a very small completely self-contained example or
test case showing exactly what you expect to happen and what is
happening instead.
--
Ian.
On Fri, Jul 18, 2014 at 11:50 AM, Rajendra Rao
wrote:
> Hello
>
>
Probably because something in the analysis chain is removing the
hyphen. Check out the javadocs. Generally you should also make sure
you use the same analyzer at index and search time.
--
Ian.
On Fri, Jul 18, 2014 at 6:52 AM, itisismail wrote:
> Hi I have created index with 1 field with simp
Might be able to do it with some combination of SpanNearQuery, with
suitable values for slop and inOrder, combined into a BooleanQuery
with setMinimumNumberShouldMatch = number of SpanNearQuery instances -
1.
So, making this up as I go along, you'd have
SpanNearQuery sn1 = B after A, slop 0, in o
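Filled out, that sketch might look like the following (field name and terms are placeholders):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

// sn1: "b" directly after "a"; sn2: "c" directly after "b".
SpanNearQuery sn1 = new SpanNearQuery(new SpanQuery[] {
        new SpanTermQuery(new Term("f", "a")),
        new SpanTermQuery(new Term("f", "b")) }, 0, true);
SpanNearQuery sn2 = new SpanNearQuery(new SpanQuery[] {
        new SpanTermQuery(new Term("f", "b")),
        new SpanTermQuery(new Term("f", "c")) }, 0, true);

BooleanQuery bq = new BooleanQuery();
bq.add(sn1, BooleanClause.Occur.SHOULD);
bq.add(sn2, BooleanClause.Occur.SHOULD);
bq.setMinimumNumberShouldMatch(1);  // number of span queries - 1
```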
There's no magic to it - just build a query or six and fire them at
your newly opened reader. If you want to put the effort in you could
track recent queries and use them, or make sure you warm up searches
on particular fields. Likewise, if you use Lucene's sorting and/or
filters, it might be wor
It's more likely to be a demonstration that concurrent programming is
hard, results often hard to predict and debugging very hard.
Or perhaps you simply need to add acceptsDocsOutOfOrder() to your
collector, returning false.
Either way, hard to see any evidence of a thread-safety problem in lucen
Read the javadocs to understand the difference between commit() and
flush(). You need commit(), or close().
There are no hard and fast rules and it depends on how much data you
are indexing, how fast, how many searches you're getting and how up to
date they need to be. And how much you worry abo
The migration guide that came out with 4.0 is probably the best place to start.
http://lucene.apache.org/core/4_8_1/MIGRATE.html is from the current
release but probably hasn't changed since 4.0. There's also the
changes file with every release. And if you browse the list archives
I expect you'l
The one that meets your requirements most easily will be the best.
If people will want to search for words in particular fields you'll
need to split it but if they only ever want to search across all
fields there's no point.
A common requirement is to want both, in which case you can split it
and
ger
> class? In Lucene 4.5 I cannot find the class (missing a maven dependency?).
> Can anyone point me to a working example?
>
> Cheers,
>
> Klaus
>
>
>
> On Fri, Jan 3, 2014 at 11:49 AM, Ian Lea wrote:
>
>> You will indeed get poor performance if you commi
You'll have to reindex.
--
Ian.
On Mon, Jan 6, 2014 at 2:11 PM, manoj raj wrote:
> Hi,
>
> I have stored fields. I want to delete a single field in all documents. Can
> i do that without reindexing? if yes, is it costly operations..?
>
>
> Thanks,
> Manoj.
You will indeed get poor performance if you commit for every doc. Can
you compromise and commit every, say, 1000 docs, or once every few
minutes, or whatever makes sense for your app.
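A sketch of the batched compromise (`writer` and `docs` are assumed to exist):

```java
import org.apache.lucene.document.Document;

int sinceCommit = 0;
for (Document doc : docs) {
    writer.addDocument(doc);
    // A durable commit point every 1000 docs instead of every doc.
    if (++sinceCommit >= 1000) {
        writer.commit();
        sinceCommit = 0;
    }
}
writer.commit();  // pick up the final partial batch
```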
Or look at lucene's near-real-time search features. Google "Lucene
NRT" for info.
Or use Elastic Search.
--
I
How do you know it's not working? My favourite suggestion: post a
very small self-contained RAMDirectory based program or test case, or
maybe 2 in this case, for 3.6 and 4.3, that demonstrates the problem.
--
Ian.
On Fri, Nov 29, 2013 at 6:00 AM, VIGNESH S wrote:
> Hi,
>
> I try deleting the
Pasting that line into a chunk of code works fine for me, with 4.5
rather than 4.3 but I don't expect that matters. Have you got a) all
the right jars in your classpath and b) none of the wrong jars?
--
Ian.
On Wed, Nov 13, 2013 at 11:20 AM, Hang Mang wrote:
> Hi guys,
>
> I'm using Lucene 4.3
he doc does not give that exception.
> However, I'm still not sure what went wrong in using the other constructor
> for TextField...
>
> Thanks
>
> PS: Sorry about that, didn't realize that while posting :( . Updated the
> message subject now.
>
>
> On
Have you set an analyzer when you create your IndexWriter?
--
Ian.
P.S. Please start new questions in new messages with sensible subjects.
On Mon, Nov 11, 2013 at 9:00 AM, Rohit Girdhar wrote:
> Hi
>
> I was trying to use the lucene JAVA API to create an index. I am repeatedly
> getting Null
Boosting query clauses means more "this clause is more important than
that clause" rather than "make the score for this search higher". I
use it for biblio searching when want to search across multiple fields
and want matches in titles to be more important than matches in
blurbs. Amended version
combinations of filter/query construction.
>
> On Oct 11, 2013, at 7:33 AM, Ian Lea wrote:
>
>> Are you going to be caching and reusing the filters e.g. by
>> CachingWrapperFilter? The main benefit of filters is in reuse. It
>> takes time to build them in the first plac
If you're using Solr you'd be better off asking this on the Solr list:
http://lucene.apache.org/solr/discussion.html.
You might also like to clarify what you want with regard to sentence
vs document. If you want to display the sentences of a matched doc,
surely you just do it: store what you need
If you want to keep hyphens you could try WhitespaceAnalyzer. But
that may of course have knock on effects on other searches. Don't
forget to use the same analyzer for indexing and searching, unless
you're doing clever things.
An alternative is to create the queries directly in code, but you'll
m.out.println("total no of docs " + topDocs5.totalHits);
>
> }
>
> }
>
>
> I observed that the file path seperator that i am using in the field and
> lucene escape charater seem to be same. so whenever i am using a escape
> character in the query the search
other fields
> and not working on "filePath" field.
>
> TIA,
> Nischal Y
>
>
> On Mon, Oct 14, 2013 at 4:55 PM, Ian Lea wrote:
>
>> Do some googling on leading wildcards and read things like
>> http://www.gossamer-threads.com/lists/lucene/java-user/17573
Do some googling on leading wildcards and read things like
http://www.gossamer-threads.com/lists/lucene/java-user/175732 and pick
an option you like.
--
Ian.
On Mon, Oct 14, 2013 at 9:12 AM, nischal reddy
wrote:
> Hi,
>
> I have problem with doing wild card search on file path fields.
>
> i ha
I'd start with the simple approach of a stored field and only worry
about performance if you needed to. Field caching would likely help
if you did need to.
--
Ian.
On Mon, Oct 14, 2013 at 2:04 AM, Stephen GRAY wrote:
> UNOFFICIAL
> Hi everyone,
>
> I'd appreciate some help with a problem I'm
With multiple fields of the same name vs a single field I doubt you'd
be able to tell the difference in performance or matching or scoring
in normal use. There may be some matching/ranking effect if you are
looking at, say, span queries across the multiple fields.
Try it out and see what happens.
Are you going to be caching and reusing the filters e.g. by
CachingWrapperFilter? The main benefit of filters is in reuse. It
takes time to build them in the first place, likely roughly equivalent
to running the underlying query although with variations as you
describe. Or are you saying that qu
Looks like you can achieve most of what you want by using AND rather
than OR. I think that all the should/should not examples you give
will work if you use AND on your content field.
For ordering, I suggest you look at SpanNearQuery. That can consider
order and slop, the distance between the sea
Looks like you've got some XML processing in there somewhere. Nothing
to do with lucene. This code:
public static void main(String[] _args) throws Exception {
QueryParser qp = new QueryParser(Version.LUCENE_44,
"x",
new StandardAnalyzer(Version.LUCENE_44));
for (String s : _args) {
System
();
t.test(_args[0], _args[1]);
}
}
On Thu, Oct 3, 2013 at 4:10 PM, VIGNESH S wrote:
> Hi,
>
> sorry.. thats my typo..
>
> Its not failing because of that
>
>
> On Thu, Oct 3, 2013 at 8:17 PM, Ian Lea wrote:
>
>> Are you sure it's not failing because
Are you sure it's not failing because "adhoc" != "ad-hoc"?
--
Ian.
On Thu, Oct 3, 2013 at 3:07 PM, VIGNESH S wrote:
> Hi,
>
> I am Trying to do Multiphrase Query in Lucene 4.3. It is working Perfect
> for all scenarios except the below scenario.
> When I try to Search for a phrase which is pre
I'd write a shutdown method that calls close() in a controlled manner
and invoke it at 23:55. You could also call commit() at whatever
interval makes sense to you but if you carried on killing the JVM
you'd still be liable to lose any docs indexed since the last commit.
This is standard stuff jus
wrote:
> Ian,
> Thanks for your reply..
> I am facing the same problem if i use whiteSpaceTokenizer also.
> My analyzer works perfect in case of Lucene 3.6.
>
> Thanks and Regards
> Vignesh Srinivasan
>
> On Thu, Oct 3, 2013 at 3:23 PM, Ian Lea wrote:
>
>> Cer
t should preserve.
>>
>> I created my analyzer with tokenizer which returns
>> Character.isDefined(cn) && (!Character.isWhitespace(cn)).
>> My analyzer will use a lower case filter on top of the tokenizer. This works
>> perfectly in the case of 3.6
>> In 4.3 it is creating p
Yes, as I suggested, you could search on your unique id and not index
if already present. Or, as Uwe suggested, call updateDocument instead
of add, again using the unique id.
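The updateDocument route in sketch form (the "id" field name is a placeholder; `writer` and `uniqueId` are assumed):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.Term;

// Atomically deletes any existing document whose "id" term matches,
// then adds the new document, so duplicates never accumulate.
Document doc = new Document();
doc.add(new StringField("id", uniqueId, Field.Store.YES));
// ... add the other fields here ...
writer.updateDocument(new Term("id", uniqueId), doc);
```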
--
Ian.
On Tue, Oct 1, 2013 at 6:41 PM, gudiseashok wrote:
> I am really sorry if something made you confuse, as I sai
I'm still a bit confused about exactly what you're indexing, when, but
if you have a unique id and don't want to add or update a doc that's
already present, add the unique id to the index and search (TermQuery
probably) for each one and skip if already present.
Can't you change the log rotation/co
milliseconds as unique keys are a bad idea unless you are 100% certain
you'll never be creating 2 docs in the same millisecond. And are you
saying the log record A1 from file a.log indexed at 14:00 will have
the same unique id as the same record from the same file indexed at
14:30 or will it be di
I'm not aware of a lucene rather than Solr or whatever tutorial. A
search for something like "lucene sharding" will get hits.
Why don't you want to use Solr or Katta or similar? They've already
done much of the hard work.
How much data are you talking about?
What are your master-master require
fix.add(new Term("content",
> s));
> } else {
> break;
> }
> }
> while (trm.next() != null);
> }
>
>
>
> On Mon, Sep 30, 2013 at 3:01 PM, Ian Lea wr
the same logic and it is working.
>> >
>> > In Lucene 4.3,I implemented the Index for that using
>> >
>> > FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
>> >
>> >
>>
>> offsetsTyp
Is this OOM happening as part of your early morning optimize or at
some other point? By optimize do you mean IndexWriter.forceMerge(1)?
You really shouldn't have to use that. If the index grows forever
without it then something else is going on which you might wish to
report separately.
--
Ian.
I use the code below to do something like this. Not exactly what you
want but should be easy to adapt.
public List findTerms(IndexReader _reader,
String _field) throws IOException {
List l = new ArrayList();
Fields ff = MultiFields.getFields(_reader);
Terms tr
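A complete version of that loop might look like this (a sketch against the Lucene 4.x API):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

public List<String> findTerms(IndexReader reader, String field)
        throws IOException {
    List<String> result = new ArrayList<String>();
    Terms terms = MultiFields.getTerms(reader, field);
    if (terms != null) {                  // null if the field is absent
        TermsEnum te = terms.iterator(null);
        BytesRef term;
        while ((term = te.next()) != null) {
            result.add(term.utf8ToString());
        }
    }
    return result;
}
```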
.
Maybe try storing this field without analysis, or just with something
simple like downcasing, and searching with a PrefixQuery? I think
that would work.
--
Ian.
On Fri, Sep 20, 2013 at 1:48 PM, Ramprakash Ramamoorthy
wrote:
> On Fri, Sep 20, 2013 at 6:11 PM, Ian Lea wrote:
>
>> It
It's reasonable that "block-major" won't find anything.
"block-major-57" should match.
The split into block and major-57 will be because, from the javadocs
for ClassicTokenizer, "Splits words at hyphens, unless there's a
number in the token, in which case the whole token is interpreted as a
produc
org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper in
analyzers-common is what you need. There's an example in the
javadocs. Build and use the wrapper instance in place of
StandardAnalyzer or whatever you are using now.
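In sketch form (field names are placeholders; the version constant may need adjusting for your release):

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

// "code" gets KeywordAnalyzer; every other field falls back to
// StandardAnalyzer.
Map<String, Analyzer> perField = new HashMap<String, Analyzer>();
perField.put("code", new KeywordAnalyzer());
Analyzer wrapper = new PerFieldAnalyzerWrapper(
        new StandardAnalyzer(Version.LUCENE_44), perField);

// Use the wrapper at both index and search time.
IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_44, wrapper);
```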
--
Ian.
On Mon, Sep 16, 2013 at 5:36 PM, Scott Smith wrote
Not exactly dumb, and I can't tell you exactly what is happening here,
but lucene stores some info at the index level rather than the field
level, and things can get confusing if you don't use the same Field
definition consistently for a field.
From the javadocs for org.apache.lucene.document.Fie