Re: IndexFormatTooOldException

2024-12-19 Thread Ian Lea
he.org/core/10_0_0/core/org/apache/lucene/index/DirectoryReader.html#open(org.apache.lucene.index.IndexCommit,int,java.util.Comparator) > . > > So if you don't need adding, updating or deleting documents, this could be > a fit. > > On Thu, Dec 19, 2024 at 1:43 PM Ian Lea wro

IndexFormatTooOldException

2024-12-19 Thread Ian Lea
Hi On trying to open an old index using lucene 10.0.0 I'm getting this exception: Exception in thread "main" org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource BufferedChecksumIndexInput(MemorySegmentIndexInput(path="/whatever.../segments_3"))): This ind

Re: Lucene 9.0.0 inconsistent index options

2021-12-14 Thread Ian Lea
xact same options (index options, points dimensions, norms, > doc values type, etc.) as already indexed documents that also have > this field. > > However it's a bug that Lucene fails to open an index that was legal > in Lucene 8. Can you file a JIRA issue? > > On Mon, Dec

Lucene 9.0.0 inconsistent index options

2021-12-13 Thread Ian Lea
Hi We have a long-standing index with some mandatory fields and some optional fields that has been through multiple lucene upgrades without a full rebuild. On testing an upgrade from version 8.11.0 to 9.0.0, we hit this exception when opening an IndexWriter: Exception in thread "main"

Re: [VOTE] Lucene logo contest

2020-06-18 Thread Ian Lea
A. Non-PMC. -- Ian. On Wed, Jun 17, 2020 at 1:28 PM jim ferenczi wrote: > I vote option A (PMC vote) > > On Wed, 17 Jun 2020 at 14:24, Felix Kirchner < > felix.kirch...@uni-wuerzburg.de> wrote: > > > A > > > > non-PMC > > > > On 16.06.2020 at 00:08, Ryan Ernst wrote: > > > Dear Lucene an

Re: lucene Input and Output format

2017-08-02 Thread Ian Lea
What are the full package names for these interfaces? I don't think they are org.apache.lucene. -- Ian. On Wed, Aug 2, 2017 at 9:00 AM, Ranganath B N wrote: > Hi, > > It's not about the file formats. Rather It is about LuceneInputFormat > and LuceneOutputFormat interfaces which deals with

Re: join on two txt files data using apache lucene

2017-07-14 Thread Ian Lea
Looks like your screenshot didn't make it, but never mind: I'm sure we all know what text files look like. A join on two ID fields sounds more like SQL database territory rather than lucene. Lucene is not an SQL database. But I typed "lucene join" into a well known search engine and the top hit

Re: Un-used index files are not getting released

2017-05-09 Thread Ian Lea
index folder using java > (File.listFiles()) it lists 1761 files in that folder. This count goes down > to a double digit number when I restart the tomcat. > > Thanks for looking into it. > > -- > Regards > -Siraj Haider > (212) 306-0154 > > -Original Mess

Re: Un-used index files are not getting released

2017-05-05 Thread Ian Lea
The most common cause is unclosed index readers. If you run lsof against the tomcat process id and see that some deleted files are still open, that's almost certainly the problem. Then all you have to do is track it down in your code. -- Ian. On Thu, May 4, 2017 at 10:09 PM, Siraj Haider wro

Re: unable to delete document via the IndexWriter.deleteDocuments(term) method

2017-02-17 Thread Ian Lea
not found in > version 5.x > > Any suggestion to bypass that? > > Sorry for my bad English. > > 2017-02-17 19:40 GMT+08:00 Ian Lea : > > Hi > > > > > > SimpleAnalyzer uses LetterTokenizer which divides text at non-letters. > > Your add and sea

Re: unable to delete document via the IndexWriter.deleteDocuments(term) method

2017-02-17 Thread Ian Lea
Hi SimpleAnalyzer uses LetterTokenizer which divides text at non-letters. Your add and search methods use the analyzer but the delete method doesn't. Replacing SimpleAnalyzer with KeywordAnalyzer in your program fixes it. You'll need to make sure that your id field is left alone. Good to see a
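
To see why the delete misses, here is a minimal plain-Java sketch of what splitting at non-letters does to an id value. It is a stand-in written for illustration, not the Lucene API: the class name, the method name and the id value "doc-42" are all made up, and the lower-casing step is included on the assumption that SimpleAnalyzer also lower-cases its tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for letter-based tokenization; NOT Lucene code.
final class LetterSplitSketch {

    // Split text into lower-cased runs of letters, dropping every other character.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String id = "doc-42"; // hypothetical id value
        List<String> indexed = letterTokens(id);
        System.out.println("indexed tokens: " + indexed);
        // The raw id never reaches the index, so a delete by the raw term finds nothing.
        System.out.println("raw id present: " + indexed.contains(id));
    }
}
```

An analyzer that leaves the value alone, which is what the switch to KeywordAnalyzer achieves, keeps the indexed term identical to the one passed to the delete call.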

Re: Disabling Lucene Scoring/Ranking

2017-01-09 Thread Ian Lea
oal.search.ConstantScoreQuery? "A query that wraps another query and simply returns a constant score equal to the query boost for every document that matches the query. It therefore simply strips of all scores and returns a constant one." -- Ian. On Mon, Jan 9, 2017 at 11:39 AM, Taher Galal w

Re: Favoring Terms Occurring in Close Proximity

2016-06-27 Thread Ian Lea
No, it implies that Lucene is a low level library that allows people like you and me, application developers, to develop applications that meet our business and technical needs. Like you, most of the things I work with prefer documents where the search terms are close together, often preferably in

Re: LockFactory issue observed in lucene while getting instance of indexWriter

2016-06-16 Thread Ian Lea
Sounds to me like it's related to the index not having been closed properly or still being updated or something. I'd worry about that. -- Ian. On Thu, Jun 16, 2016 at 11:19 AM, Mukul Ranjan wrote: > Hi, > > I'm observing below exception while getting instance of indexWriter- > > java.lang.Ill

Re: Using Lucene to model ownership of documents

2016-06-16 Thread Ian Lea
I'd definitely go for b). The index will of course be larger for every extra bit of data you store but it doesn't sound like this would make much difference. Likewise for speed of indexing. -- Ian. On Wed, Jun 15, 2016 at 2:25 PM, Geebee Coder wrote: > Hi there, > I would like to use Lucene

Re: Selective Output fields in Search Result. Lucene 5.5.0

2016-05-16 Thread Ian Lea
Would http://lucene.apache.org/core/5_5_0/core/org/apache/lucene/index/IndexReader.html#document(int,%20java.util.Set) be what you are looking for? -- Ian. On Mon, May 16, 2016 at 1:39 PM, wrote: > Hello, > > I am storing close to 100 fields in a single document which is being > indexed. Ther

Re: Need help in alphanumeric search

2015-10-01 Thread Ian Lea
not provide his Solr config! :-) In any case, it would be >> > > good to get the Analyzer + code you use while indexing and also the >> > > code (+ Analyzer) that creates the query while searching. >> > > >> > > Uwe >> > > >> > > -

Re: Need help in alphanumeric search

2015-09-28 Thread Ian Lea
Hi Can you provide a few examples of values of cpn that a) are and b) are not being found, for indexing and searching. You may also find some of the tips at http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F useful. You haven't shown the code that create

Re: Improvement performance of my indexing with Lucene

2015-09-09 Thread Ian Lea
= whatever(iw, data-source-2) ... t1.start() t2.start() ... wait ... iw.close() -- Ian. > On Wed, Sep 9, 2015 at 11:23 AM, Ian Lea wrote: > >> The link that I sent, >> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed is for Lucene, >> not Solr. The seco

Re: Improvement performance of my indexing with Lucene

2015-09-09 Thread Ian Lea
: > Thanks a lot ! > > But do you know some links that helps implement these optimization options > without the Solr (using only lucene) ? > > I am using lucene 4.9. > > More thanks. > > Humberto > > > On Wed, Sep 9, 2015 at 5:23 AM, Ian Lea wrote: > >

Re: Improvement performance of my indexing with Lucene

2015-09-09 Thread Ian Lea
See also http://wiki.apache.org/lucene-java/ImproveIndexingSpeed Also double check that it's Lucene that you should be concentrating on. In my experience it's often the reading of the data from a database, if that's what you are doing, that is the bottleneck. -- Ian. On Wed, Sep 9, 2015 at 6:

Re: IndexWriter is not closing the FDs (deleted files)

2015-09-01 Thread Ian Lea
From a glance, you need to close the old reader after calling openIfChanged if it gives you a new one. See https://lucene.apache.org/core/5_3_0/core/org/apache/lucene/index/DirectoryReader.html#openIfChanged(org.apache.lucene.index.DirectoryReader). You may wish to pay attention to the words abo

Re: IndexReader returns all fields, but IndexSearcher does not

2015-06-02 Thread Ian Lea
Hi - I suggest you narrow the problem down to a small self-contained example and if you still can't get it to work, show us the code. And tell us what version of Lucene you are using. -- Ian. On Mon, Jun 1, 2015 at 5:20 PM, Rahul Kotecha wrote: > Hi All, > I am trying to query an index. >

Re: Difference between StoredField vs Other Fields with Field.Store.YES

2015-03-11 Thread Ian Lea
> Is there a difference between using StoredField and using other types of > fields with Field.Store.YES? It will depend on what the other type of field is. As the javadoc for Field states, the xxxField classes are sugar. If you are doing standard things on standard data it's generally easier to

Re: Filtering question

2015-03-11 Thread Ian Lea
Can you use a BooleanFilter (or ChainedFilter in 4.x) alongside your BooleanQuery? Seems more logical and I suspect would solve the problem. Caching filters can be good too, depending on how often your data changes. See CachingWrapperFilter. -- Ian. On Tue, Mar 10, 2015 at 12:45 PM, Chris Bamf

Re: get the DocsEnum in lucene4.10.3

2015-03-11 Thread Ian Lea
Take a look at the first section of https://lucene.apache.org/core/4_10_3/MIGRATE.html. There's probably something there that will help you. -- Ian. On Wed, Mar 11, 2015 at 11:03 AM, wangdong wrote: > Can anybody help me? > > >> I am confused about the api in lucene 4.10.3. >> >> I want to ge

Re: Indexing an IntField but getting SotredField from found Document

2015-02-19 Thread Ian Lea
I think if you follow the Field.fieldType().numericType() chain you'll end up with INT or DOUBLE or whatever. But if you know you stored it as an IntField then surely you already know it's an integer? Unless you sometimes store different things in the one field. I wouldn't do that. -- Ian. O

Re: Indexing Query

2015-02-18 Thread Ian Lea
ery, and I want to make sure > I match only index entries that do not have more than 2 tokens, is there a > way to do that too? > > Thanks > > On Wed, Feb 18, 2015 at 2:23 AM, Ian Lea wrote: > >> Break the query into words then add them as TermQuery instances as >>

Re: Indexing Query

2015-02-17 Thread Ian Lea
Break the query into words then add them as TermQuery instances as optional clauses to a BooleanQuery with a call to setMinimumNumberShouldMatch(2) somewhere along the line. You may want to do some parsing or analysis on the query terms to avoid problems of case matching and the like. -- Ian.
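
The matching rule described here (every query word is optional, but at least two must be present) can be illustrated in plain Java. This is only a sketch of the semantics, not Lucene code: the class and method names are invented, whitespace splitting stands in for real analysis, and lower-casing is the simplest possible answer to the case-matching concern.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration of "at least N optional clauses must match"; NOT Lucene code.
final class MinShouldMatchSketch {

    // Break text into a set of lower-cased, whitespace-separated words.
    static Set<String> words(String text) {
        Set<String> out = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    // True when at least minShouldMatch distinct query words occur in the document.
    static boolean matches(String query, String document, int minShouldMatch) {
        Set<String> docWords = words(document);
        int hits = 0;
        for (String w : words(query)) {
            if (docWords.contains(w)) {
                hits++;
            }
        }
        return hits >= minShouldMatch;
    }

    public static void main(String[] args) {
        System.out.println(matches("red fast car", "A fast red bicycle", 2));
        System.out.println(matches("red fast car", "a slow red bicycle", 2));
    }
}
```

In Lucene itself each word would become a TermQuery added as an optional clause, with the threshold set through setMinimumNumberShouldMatch(2), as the message says.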

Re: URL/Email tokenizer

2015-02-17 Thread Ian Lea
> > What I am currently doing is duplicating the data into 2 different fields > and having my own PerFieldAnalyzerWrapper just like you pointed out > > Is there a good way to do this in a single-pass? Like how Bi-Grams or > Common-Grams do… > > -- > Ravi > > On Tue

Re: URL/Email tokenizer

2015-02-17 Thread Ian Lea
Sounds like a job for org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper. -- Ian. On Tue, Feb 17, 2015 at 8:51 AM, Ravikumar Govindarajan wrote: > We have a requirement in that E-mail addresses need to be added in a > tokenized form to one field while untokenized form is added to

Re: occurrence of two terms with the highest frequency

2015-02-13 Thread Ian Lea
: > Thanks Ian for your help. But I didn't get aol search, what it is ? tried > searching in google but couldn't find. > > Thanks > > On Fri, Feb 13, 2015 at 3:00 AM, Ian Lea wrote: > >> I think you can do it with 4 simple queries: >> >> 1) +flyi

Re: occurrence of two terms with the highest frequency

2015-02-12 Thread Ian Lea
I think you can do it with 4 simple queries: 1) +flying +shooting 2) +flying +fighting etc. or BooleanQuery equivalents with MUST clauses. Use aol.search.TotalHitCountCollector and it should be blazingly fast, even if you have more than 100 docs. -- Ian. On Thu, Feb 12, 2015 at 5:42 PM, Ma

Re: StandardQueryParser with date/time fields stored as longs

2015-02-11 Thread Ian Lea
HOULD logic and boosts and whatever else I wanted. -- Ian. On Wed, Feb 11, 2015 at 2:37 PM, Jon Stewart wrote: > Ok... so how does anyone ever use date-time queries in lucene with the > new recommended way of using longs? > > > Jon > > > On Wed, Feb 11, 2015 at 9:26 A

Re: StandardQueryParser with date/time fields stored as longs

2015-02-11 Thread Ian Lea
s handed a > field name and query components (e.g., "created", "2010-01-01", > "2014-12-31"), which I can derive from, parse the timestamp strings, > and then turn the whole thing into a numeric range query component? > > > Jon > > > On Wed, Feb

Re: StandardQueryParser with date/time fields stored as longs

2015-02-11 Thread Ian Lea
To the best of my knowledge you are spot on with everything you say, except that the component to parse the strings doesn't exist. I suspect that a contribution to add that to StandardQueryParser might well be accepted. -- Ian. On Wed, Feb 11, 2015 at 4:21 AM, Jon Stewart wrote: > Hello, > >

Re: search on a field by a single word

2015-02-11 Thread Ian Lea
If you only ever want to retrieve based on exact match you could index the name field using org.apache.lucene.document.StringField. Do be aware that it is exact: if you do nothing else, a search for "a" will not match "A" or "A ". Or you could do something with start and end markers e.g. index yo
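
A small plain-Java sketch of the "it is exact" caveat, and of one possible way round it: apply the same normalization to the value before indexing and to the search string before querying. The class and method names are made up, and trimming plus lower-casing is just one assumed choice of normalization. None of this is Lucene code.

```java
import java.util.Locale;

// Plain-Java illustration of exact matching on an un-analyzed value; NOT Lucene code.
final class ExactMatchSketch {

    // One possible normalization: strip surrounding whitespace and lower-case.
    static String normalize(String value) {
        return value.trim().toLowerCase(Locale.ROOT);
    }

    // Exact match means character-for-character equality.
    static boolean exactMatch(String indexed, String searched) {
        return indexed.equals(searched);
    }

    public static void main(String[] args) {
        // Without normalization "a" matches neither "A" nor "A ".
        System.out.println(exactMatch("A", "a"));
        System.out.println(exactMatch("A ", "a"));
        // With the same normalization on both sides they match.
        System.out.println(exactMatch(normalize("A "), normalize("a")));
    }
}
```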

Re: Re: combine to MultiTermQuery with OR

2015-02-10 Thread Ian Lea
> bquery.add(queryFieldA, BooleanClause.Occur.SHOULD); > bquery.add(queryFieldB, BooleanClause.Occur.SHOULD); > > this is the correct way? > > > Sent: Tuesday, 10 February 2015 at 17:31 > From: "Ian Lea" > To: java-user@lucene.apache.org > Subject: Re: combine to

Re: combine to MultiTermQuery with OR

2015-02-10 Thread Ian Lea
org.apache.lucene.search.BooleanQuery. -- Ian. On Tue, Feb 10, 2015 at 3:28 PM, Sascha Janz wrote: > > Hi, > > i want to combine two MultiTermQueries. > > One searches over FieldA, one over FieldB. Both queries should be combined > with "OR" operator. > > so in lucene Syntax i want to searc

Re: Boolean Search Query is not workng

2015-01-23 Thread Ian Lea
che" > > Score : 1 :0.27094576 > 3 :0.27094576 > 2 :0.010494952 > > > If we go by query it is giving same score ..It is not working. > > Thanks > Priyanka > > > On Fri, Jan 23, 2015 at 3:19 PM, Ian Lea wrote: > >> How about "home~10 h

Re: Boolean Search Query is not workng

2015-01-23 Thread Ian Lea
How about "home~10 house~10 flat". See http://lucene.apache.org/core/4_10_3/queryparser/index.html -- Ian. On Fri, Jan 23, 2015 at 7:17 AM, Priyanka Tufchi wrote: > Hi ALL > > I am working on a project which uses lucene for searching . I am > struggling with boolean based Query : Actual Scena

Re: MultiPhraseQuery:Rewrite to BooleanQuery

2015-01-21 Thread Ian Lea
Are you asking if your two suggestions 1) a MultiPhraseQuery or 2) a BooleanQuery made up of multiple PhraseQuery instances are equivalent? If so, I'd say that they could be if you build them carefully enough. For the specific examples you show I'd say not and would wonder if you get correct h

Re: forceMerge(1) grows index and does not shrink back

2015-01-20 Thread Ian Lea
hose the force merge as alternative with less afford. Could >>> forceMergeDeletes serve our purpose here? >> >> It could, but has the same problem like above. The only difference to >> forceMerge is that it only merges segments which have deletions. >> >>>

Re: forceMerge(1) grows index and does not shrink back

2015-01-19 Thread Ian Lea
Do you need to call forceMerge(1) at all? The javadoc, certainly for recent versions of lucene, advises against it. What version of lucene are you running? It might be helpful to run lsof against the index directory before/during/after the merge to see what files are coming or going, or if there

Re: trouble with Collector and FieldCache

2015-01-15 Thread Ian Lea
How are you storing the id field? A wild guess might be that this error might be caused by having some documents with id stored, perhaps, as a StringField or TextField and some as an IntField. -- Ian. On Wed, Jan 14, 2015 at 2:07 PM, Sascha Janz wrote: > > hello, > > i am using lucene 4.6. i

Re: AlreadyClosedException on new index

2015-01-06 Thread Ian Lea
Presumably no exception is thrown from the new IndexWriter() call? I'd double check that, and try some harmless method call on the writer and make sure that works. And run CheckIndex against the index. -- Ian. On Tue, Jan 6, 2015 at 5:05 PM, Brian Call wrote: > Hi Tomoko, > > Thank you f

Re: "batch-update"-pattern, NoMergeScheduler?

2014-12-23 Thread Ian Lea
Hi I can't give an exact answer to your question but my experience has been that it's best to leave all the merge/buffer/etc settings alone. If you are doing a bulk update of a large number of docs then it's no surprise that you are seeing a heavy IO load. If you can, it's likely to be worth giv

Re: Index keeps growing, then shrinks on restart

2014-11-11 Thread Ian Lea
Telling us the version of lucene and the OS you're running on is always a good idea. A guess here is that you aren't closing index readers, so the JVM will be holding on to deleted files until it exits. A combination of du, ls, and lsof commands should prove it, or just lsof: run it against the j

Re: SpanTermQuery Issue

2014-10-03 Thread Ian Lea
Toronto != toronto. From the javadocs for StandardAnalyzer: Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, LowerCaseFilter does what you would expect. -- Ian. On Fri, Oct 3, 2014 at 3:52 AM, Xu Chu <1989ch...@gmail.com> wrote: > Hi everyone > > In the followi

Re: Case sensitivity

2014-09-19 Thread Ian Lea
PerFieldAnalyzerWrapper is the way to mix and match fields and analyzers. Personally I'd simply store the case-insensitive field with a call to toLowerCase() on the value and equivalent on the search string. You will of course use more storage, but you don't need to store the text contents for bo

Re: 4.10.0: java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)

2014-09-10 Thread Ian Lea
ause > for the error. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > >> -Original Message- >> From: Ian Lea [mailto:ian@gmail.com] >> Sent: Wednesday, Se

Re: 4.10.0: java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)

2014-09-10 Thread Ian Lea
Wed, Sep 10, 2014 at 7:01 AM, Ian Lea wrote: >> Hi >> >> >> On running a quick test after a handful of minor code changes to deal >> with 4.10 deprecations, a program that updates an existing index >> failed with >> >> Exception in thread "main"

4.10.0: java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)

2014-09-10 Thread Ian Lea
Hi On running a quick test after a handful of minor code changes to deal with 4.10 deprecations, a program that updates an existing index failed with Exception in thread "main" java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40) at org.apache.luc

Re: How does Lucene decide which fields to index?

2014-08-04 Thread Ian Lea
You tell it what you want. See the javadocs for org.apache.lucene.document.Field and friends such as TextField. -- Ian. On Mon, Aug 4, 2014 at 2:43 PM, Sachin Kulkarni wrote: > Hi, > > I am using lucene 4.6.0 to index a dataset. > I have the following fields: > doctitle, docbody, docname, doc

Re: Fetching stored data takes more time

2014-08-04 Thread Ian Lea
Retrieving stored data is always likely to take longer than not doing so. There are some tips in http://wiki.apache.org/lucene-java/ImproveSearchingSpeed. But taking over a minute to retrieve data for 50 hits sounds excessive. Are you sure about those figures? -- Ian. On Thu, Jul 31, 2014 at

Re: More Like This query is not working.

2014-07-21 Thread Ian Lea
; > > // -- > > TopDocs results = searcher.search(Searchedquery, 10); > ScoreDoc[] hits = results.scoreDocs; > > > for (int i = 0; i < hits.length; ++i) { > int docId = hits[i].doc; // > > Document d = searcher.doc(docId); > int sys_DocID=d.get("DocID");

Re: More Like This query is not working.

2014-07-18 Thread Ian Lea
You need to supply more info. Tell us what version of lucene you are using and provide a very small completely self-contained example or test case showing exactly what you expect to happen and what is happening instead. -- Ian. On Fri, Jul 18, 2014 at 11:50 AM, Rajendra Rao wrote: > Hello > >

Re: Lucene Query Wrong Result for phrase.

2014-07-18 Thread Ian Lea
Probably because something in the analysis chain is removing the hyphen. Check out the javadocs. Generally you should also make sure you use the same analyzer at index and search time. -- Ian. On Fri, Jul 18, 2014 at 6:52 AM, itisismail wrote: > Hi I have created index with 1 field with simp

Re: Can phrasequery allow mismatch?

2014-07-17 Thread Ian Lea
Might be able to do it with some combination of SpanNearQuery, with suitable values for slop and inOrder, combined into a BooleanQuery with setMinimumNumberShouldMatch = number of SpanNearQuery instances - 1. So, making this up as I go along, you'd have SpanNearQuery sn1 = B after A, slop 0, in o

Re: Warm up IndexReader

2014-07-14 Thread Ian Lea
There's no magic to it - just build a query or six and fire them at your newly opened reader. If you want to put the effort in you could track recent queries and use them, or make sure you warm up searches on particular fields. Likewise, if you use Lucene's sorting and/or filters, it might be wor

Re: IndexSearcher.doc thread safe problem

2014-07-09 Thread Ian Lea
It's more likely to be a demonstration that concurrent programming is hard, results often hard to predict and debugging very hard. Or perhaps you simply need to add acceptsDocsOutOfOrder() to your collector, returning false. Either way, hard to see any evidence of a thread-safety problem in lucen

Re: re-use IndexWriter

2014-07-08 Thread Ian Lea
Read the javadocs to understand the difference between commit() and flush(). You need commit(), or close(). There are no hard and fast rules and it depends on how much data you are indexing, how fast, how many searches you're getting and how up to date they need to be. And how much you worry abo

Re: Lucene Upgrade from 2.9.x to 4.7.x

2014-05-29 Thread Ian Lea
The migration guide that came out with 4.0 is probably the best place to start. http://lucene.apache.org/core/4_8_1/MIGRATE.html is from the current release but probably hasn't changed since 4.0. There's also the changes file with every release. And if you browse the list archives I expect you'l

Re: Which is better ,Search through query and whole text document or search through query with document field.

2014-02-13 Thread Ian Lea
The one that meets your requirements most easily will be the best. If people will want to search for words in particular fields you'll need to split it but if they only ever want to search across all fields there's no point. A common requirement is to want both, in which case you can split it and

Re: Slow Index Writes

2014-01-07 Thread Ian Lea
ger > class? In Lucene 4.5 I cannot find the class (missing a maven dependency?). > Can anyone point me to a working example? > > Cheers, > > Klaus > > > > On Fri, Jan 3, 2014 at 11:49 AM, Ian Lea wrote: > >> You will indeed get poor performance if you commi

Re: Delete a field in old documents

2014-01-07 Thread Ian Lea
You'll have to reindex. -- Ian. On Mon, Jan 6, 2014 at 2:11 PM, manoj raj wrote: > Hi, > > I have stored fields. I want to delete a single field in all documents. Can > i do that without reindexing? if yes, is it costly operations..? > > > Thanks, > Manoj.

Re: Slow Index Writes

2014-01-03 Thread Ian Lea
You will indeed get poor performance if you commit for every doc. Can you compromise and commit every, say, 1000 docs, or once every few minutes, or whatever makes sense for your app. Or look at lucene's near-real-time search features. Google "Lucene NRT" for info. Or use Elastic Search. -- I
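
The "commit every N docs" compromise could look something like the plain-Java sketch below. The Sink interface is a hypothetical stand-in for whatever does the indexing (in Lucene that would be an IndexWriter), and the names and the batch size of 1000 are only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of batching commits; Sink is a made-up stand-in, NOT the Lucene API.
final class BatchCommitSketch {

    // Hypothetical minimal view of something that accepts docs and commits them.
    interface Sink {
        void add(String doc);
        void commit();
    }

    // Stand-in implementation that only counts calls, so the sketch can run on its own.
    static final class CountingSink implements Sink {
        int added;
        int commits;
        public void add(String doc) { added++; }
        public void commit() { commits++; }
    }

    // Add every doc, committing once per full batch and once more for any remainder.
    // Returns the number of commits issued.
    static int indexAll(Iterable<String> docs, Sink sink, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int commits = 0;
        int pending = 0;
        for (String doc : docs) {
            sink.add(doc);
            pending++;
            if (pending == batchSize) {
                sink.commit();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {
            // Final partial batch: without this the tail would stay uncommitted.
            sink.commit();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add("doc" + i);
        }
        CountingSink sink = new CountingSink();
        System.out.println("commits: " + indexAll(docs, sink, 1000));
    }
}
```

The trade-off is the usual one: anything added since the last commit is lost if the process dies, so the batch size is a choice between throughput and how much work you can afford to redo.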

Re: Deletion of Index not happening in Lucene 4.3

2013-11-29 Thread Ian Lea
How do you know it's not working? My favourite suggestion: post a very small self-contained RAMDirectory based program or test case, or maybe 2 in this case, for 3.6 and 4.3, that demonstrates the problem. -- Ian. On Fri, Nov 29, 2013 at 6:00 AM, VIGNESH S wrote: > Hi, > > I try deleting the

Re: java.lang.NoSuchFieldError: STOP_WORDS_SET

2013-11-13 Thread Ian Lea
Pasting that line into a chunk of code works fine for me, with 4.5 rather than 4.3 but I don't expect that matters. Have you got a) all the right jars in your classpath and b) none of the wrong jars? -- Ian. On Wed, Nov 13, 2013 at 11:20 AM, Hang Mang wrote: > Hi guys, > > I'm using Lucene 4.3

Re: IndexWriter.addDocument() gives NullPointerException when used with a doc containing TextField

2013-11-11 Thread Ian Lea
he doc does not give that exception. > However, I'm still not sure what went wrong in using the other constructor > for TextField... > > Thanks > > PS: Sorry about that, didn't realize that while posting :( . Updated the > message subject now. > > > On

Re: subscribe

2013-11-11 Thread Ian Lea
Have you set an analyzer when you create your IndexWriter? -- Ian. P.S. Please start new questions in new messages with sensible subjects. On Mon, Nov 11, 2013 at 9:00 AM, Rohit Girdhar wrote: > Hi > > I was trying to use the lucene JAVA API to create an index. I am repeatedly > getting Null

Re: PhraseQuery boost doesn't affect ScoreDoc.score

2013-10-17 Thread Ian Lea
Boosting query clauses means more "this clause is more important than that clause" rather than "make the score for this search higher". I use it for biblio searching when I want to search across multiple fields and want matches in titles to be more important than matches in blurbs. Amended version

Re: Optimizing Filters

2013-10-17 Thread Ian Lea
combinations of filter/query construction. > > On Oct 11, 2013, at 7:33 AM, Ian Lea wrote: > >> Are you going to be caching and reusing the filters e.g. by >> CachingWrapperFilter? The main benefit of filters is in reuse. It >> takes time to build them in the first plac

Re: Search sentence from document based on keyword as input using lucene

2013-10-17 Thread Ian Lea
If you're using Solr you'd be better off asking this on the Solr list: http://lucene.apache.org/solr/discussion.html. You might also like to clarify what you want with regard to sentence vs document. If you want to display the sentences of a matched doc, surely you just do it: store what you need

Re: QueryParser stripping off Hyphen from query

2013-10-15 Thread Ian Lea
If you want to keep hyphens you could try WhitespaceAnalyzer. But that may of course have knock on effects on other searches. Don't forget to use the same analyzer for indexing and searching, unless you're doing clever things. An alternative is to create the queries directly in code, but you'll

Re: wildcard search not working on file paths

2013-10-14 Thread Ian Lea
m.out.println("total no of docs " + topDocs5.totalHits); > > } > > } > > > I observed that the file path seperator that i am using in the field and > lucene escape charater seem to be same. so whenever i am using a escape > character in the query the search

Re: wildcard search not working on file paths

2013-10-14 Thread Ian Lea
other fields > and not working on "filePath" field. > > TIA, > Nischal Y > > > On Mon, Oct 14, 2013 at 4:55 PM, Ian Lea wrote: > >> Do some googling on leading wildcards and read things like >> http://www.gossamer-threads.com/lists/lucene/java-user/17573

Re: wildcard search not working on file paths

2013-10-14 Thread Ian Lea
Do some googling on leading wildcards and read things like http://www.gossamer-threads.com/lists/lucene/java-user/175732 and pick an option you like. -- Ian. On Mon, Oct 14, 2013 at 9:12 AM, nischal reddy wrote: > Hi, > > I have problem with doing wild card search on file path fields. > > i ha

Re: Calculating min, max and sum of a field in docs returned by search [SEC=UNOFFICIAL]

2013-10-14 Thread Ian Lea
I'd start with the simple approach of a stored field and only worry about performance if you needed to. Field caching would likely help if you did need to. -- Ian. On Mon, Oct 14, 2013 at 2:04 AM, Stephen GRAY wrote: > UNOFFICIAL > Hi everyone, > > I'd appreciate some help with a problem I'm

Re: Performance/scoring impacts with multiple occurrences of a field

2013-10-11 Thread Ian Lea
With multiple fields of the same name vs a single field I doubt you'd be able to tell the difference in performance or matching or scoring in normal use. There may be some matching/ranking effect if you are looking at, say, span queries across the multiple fields. Try it out and see what happens.

Re: Optimizing Filters

2013-10-11 Thread Ian Lea
Are you going to be caching and reusing the filters e.g. by CachingWrapperFilter? The main benefit of filters is in reuse. It takes time to build them in the first place, likely roughly equivalent to running the underlying query although with variations as you describe. Or are you saying that qu

Re: Multiple Keywords - Regular and Any Order Search

2013-10-11 Thread Ian Lea
Looks like you can achieve most of what you want by using AND rather than OR. I think that all the should/should not examples you give will work if you use AND on your content field. For ordering, I suggest you look at SpanNearQuery. That can consider order and slop, the distance between the sea

Re: queries with "&&" doesn't work but "AND" does

2013-10-10 Thread Ian Lea
Looks like you've got some XML processing in there somewhere. Nothing to do with lucene. This code: public static void main(String[] _args) throws Exception { QueryParser qp = new QueryParser(Version.LUCENE_44, "x", new StandardAnalyzer(Version.LUCENE_44)); for (String s : _args) { System

Re: Problem with MultiPhrase Query in Lucene 4.3

2013-10-03 Thread Ian Lea
(); t.test(_args[0], _args[1]); } } On Thu, Oct 3, 2013 at 4:10 PM, VIGNESH S wrote: > Hi, > > sorry.. thats my typo.. > > Its not failing because of that > > > On Thu, Oct 3, 2013 at 8:17 PM, Ian Lea wrote: > >> Are you sure it's not failing because

Re: Problem with MultiPhrase Query in Lucene 4.3

2013-10-03 Thread Ian Lea
Are you sure it's not failing because "adhoc" != "ad-hoc"? -- Ian. On Thu, Oct 3, 2013 at 3:07 PM, VIGNESH S wrote: > Hi, > > I am Trying to do Multiphrase Query in Lucene 4.3. It is working Perfect > for all scenarios except the below scenario. > When I try to Search for a phrase which is pre

Re: Handling abrupt shutdown while indexing

2013-10-03 Thread Ian Lea
I'd write a shutdown method that calls close() in a controlled manner and invoke it at 23:55. You could also call commit() at whatever interval makes sense to you but if you carried on killing the JVM you'd still be liable to lose any docs indexed since the last commit. This is standard stuff jus

Re: Multiphrase Query in Lucene 4.3

2013-10-03 Thread Ian Lea
wrote: > Ian, > Thanks for your reply.. > I am facing the same problem if i use whiteSpaceTokenizer also. > My analyzer works perfect in case of Lucene 3.6. > > Thanks and Regards > Vignesh Srinivasan > > On Thu, Oct 3, 2013 at 3:23 PM, Ian Lea wrote: > >> Cer

Re: Multiphrase Query in Lucene 4.3

2013-10-03 Thread Ian Lea
t should preserve. >> >> I created my analyzer with tokenizer which returns >> Character.isDefined(cn) && (!Character.isWhitespace(cn)). >> My analyzer will use a lower case filter on top of the tokenizer. This works >> perfectly in case of 3.6 >> In 4.3 it is creating p

Re: Rendexing problem: Indexing folder size is keep on growing for same remote folder

2013-10-02 Thread Ian Lea
Yes, as I suggested, you could search on your unique id and not index if already present. Or, as Uwe suggested, call updateDocument instead of add, again using the unique id. -- Ian. On Tue, Oct 1, 2013 at 6:41 PM, gudiseashok wrote: > I am really sorry if something made you confuse, as I sai

Re: Rendexing problem: Indexing folder size is keep on growing for same remote folder

2013-10-01 Thread Ian Lea
I'm still a bit confused about exactly what you're indexing, when, but if you have a unique id and don't want to add or update a doc that's already present, add the unique id to the index and search (TermQuery probably) for each one and skip if already present. Can't you change the log rotation/co

Re: Rendexing problem: Indexing folder size is keep on growing for same remote folder

2013-10-01 Thread Ian Lea
milliseconds as unique keys are a bad idea unless you are 100% certain you'll never be creating 2 docs in the same millisecond. And are you saying the log record A1 from file a.log indexed at 14:00 will have the same unique id as the same record from the same file indexed at 14:30 or will it be di

Re: Multi server

2013-10-01 Thread Ian Lea
I'm not aware of a lucene rather than Solr or whatever tutorial. A search for something like "lucene sharding" will get hits. Why don't you want to use Solr or Katta or similar? They've already done much of the hard work. How much data are you talking about? What are your master-master require

Re: Multiphrase Query in Lucene 4.3

2013-09-30 Thread Ian Lea
fix.add(new Term("content", > s)); > } else { > break; > } > } > while (trm.next() != null); > } > > > > On Mon, Sep 30, 2013 at 3:01 PM, Ian Lea wr

Re: Multiphrase Query in Lucene 4.3

2013-09-30 Thread Ian Lea
the same logic and it is working. >> > >> > In Lucene 4.3,I implemented the Index for that using >> > >> > FieldType offsetsType = new FieldType(TextField.TYPE_STORED); >> > >> > >> >> offsetsTyp

Re: Lucene 4.4.0 mergeSegments OutOfMemoryError

2013-09-26 Thread Ian Lea
Is this OOM happening as part of your early morning optimize or at some other point? By optimize do you mean IndexWriter.forceMerge(1)? You really shouldn't have to use that. If the index grows forever without it then something else is going on which you might wish to report separately. -- Ian.

Re: Multiphrase Query in Lucene 4.3

2013-09-26 Thread Ian Lea
I use the code below to do something like this. Not exactly what you want but should be easy to adapt. public List findTerms(IndexReader _reader, String _field) throws IOException { List l = new ArrayList(); Fields ff = MultiFields.getFields(_reader); Terms tr

Re: Strange behaviour of tokenizer with wildcard queries

2013-09-20 Thread Ian Lea
. Maybe try storing this field without analysis, or just with something simple like downcasing, and searching with a PrefixQuery? I think that would work. -- Ian. On Fri, Sep 20, 2013 at 1:48 PM, Ramprakash Ramamoorthy wrote: > On Fri, Sep 20, 2013 at 6:11 PM, Ian Lea wrote: > >> It

Re: Strange behaviour of tokenizer with wildcard queries

2013-09-20 Thread Ian Lea
It's reasonable that "block-major" won't find anything. "block-major-57" should match. The split into block and major-57 will be because, from the javadocs for ClassicTokenizer, "Splits words at hyphens, unless there's a number in the token, in which case the whole token is interpreted as a produc

Re: Lucene Query Syntax with analyzed and unanalyzed text

2013-09-16 Thread Ian Lea
org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper in analyzers-common is what you need. There's an example in the javadocs. Build and use the wrapper instance in place of StandardAnalyzer or whatever you are using now. -- Ian. On Mon, Sep 16, 2013 at 5:36 PM, Scott Smith wrote

Re: Multiple field instances and Field.Store.NO

2013-09-16 Thread Ian Lea
Not exactly dumb, and I can't tell you exactly what is happening here, but lucene stores some info at the index level rather than the field level, and things can get confusing if you don't use the same Field definition consistently for a field. From the javadocs for org.apache.lucene.document.Fie
