[ANNOUNCE] Apache Lucene 8.6.3 released

2020-10-08 Thread Jason Gerlowski
The Lucene PMC is pleased to announce the release of Apache Lucene 8.6.3. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. This

Re: [VOTE] Lucene logo contest, third time's a charm

2020-09-02 Thread Jason Gerlowski
A1, A2, D (binding) On Wed, Sep 2, 2020 at 10:47 AM Michael McCandless wrote: > > A2, A1, C5, D (binding) > > Thank you to everyone for working so hard to make such cool looking possible > future Lucene logos! And to Ryan for the challenging job of calling this > VOTE :) > > Mike McCandless >

Re: [VOTE] Lucene logo contest

2020-06-16 Thread Jason Gerlowski
Option "A" On Tue, Jun 16, 2020 at 8:37 PM Man with No Name wrote: > > A, clean and modern. > > On Mon, Jun 15, 2020 at 6:08 PM Ryan Ernst wrote: >> >> Dear Lucene and Solr developers! >> >> In February a contest was started to design a new logo for Lucene [1]. That >> contest concluded, and I

Re: term frequency

2016-11-24 Thread Jason Wee
The exception line does not match the code you pasted, but do make sure your object is actually not null before accessing its method. On Thu, Nov 24, 2016 at 5:42 PM, huda barakat wrote: > I'm using SOLRJ to find term frequency for each term in a field, I wrote > this code but it is not working: > >

Lucene IndexSearcher PrefixQuery seach getting really slow after a while

2016-11-03 Thread Jason Wu
Hi Team, We are using lucene 4.8.1 to do some info searches every day for years. However, recently we encounter some performance issues which greatly slow down the lucene search. After application running for a while, we are facing below issues, which IndexSearcher PrefixQuery taking much lon

Re: Request for help with Lucene search engine

2015-06-26 Thread Jason Wee
maybe start with this? https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/core/src/test/org/apache/lucene/search/TestDocValuesScoring.java hth jason On Fri, Jun 26, 2015 at 7:40 PM, Rim REKIK wrote: > Dear, > I m trying Lucene to work with Lucene search engine. But I m asking if &

Re: Global ordinal based query time join documentation

2015-06-06 Thread Jason Wee
https://svn.apache.org/viewvc/lucene/dev/branches/branch_5x/lucene/join/src/test/org/apache/lucene/search/join/TestJoinUtil.java?view=markup&pathrev=1671777 https://svn.apache.org/viewvc?view=revision&revision=1671777 https://issues.apache.org/jira/browse/LUCENE-6352 hth jason On Fr

Re: Java8 and lucene version

2015-05-06 Thread Jason Wee
) immediately. hth jason On Thu, May 7, 2015 at 4:19 AM, Pushyami Gundala wrote: > Hi, We are using lucene 2.9.4 version for our application that has search. > We are planning on upgrading our application to run on java 8. My Question > is when we move to java 8 does the lucene-2.9.

Re: Making lucene indexing multi threaded

2014-10-27 Thread Jason Wu
DB result set - When I loop the result set, I reuse the same Document instance. - At the end of each loop, I call indexWriter.addDocument(doc) 4. After all docs are added, call IndexWriter.commit() 5. IndexWriter.close(); Thank you, Jason -- View this message in context

RE: Making lucene indexing multi threaded

2014-10-27 Thread Jason Wu
n and take 22 mins. Did you have any similar experience like the above before? Thank you, Jason -- View this message in context: http://lucene.472066.n3.nabble.com/Making-lucene-indexing-multi-threaded-tp4087830p4166116.html Sent from the Lucene - Java Users mailing list archive at Nabbl

Re: Making lucene indexing multi threaded

2014-10-27 Thread Jason Wu
Hi Nischal, I had a similar indexing issue. My Lucene indexing took 22 mins for 70 MB docs. When I debugged the problem, I found that indexWriter.addDocument(doc) was taking a really long time. Have you already found a solution for it? Thank you, Jason -- View this message in context: http

Lucene Indexing performance issue

2014-10-22 Thread Jason Wu
pplication. Can you give me some suggestions about my issue? Thank you, Jason

Re: make data search as index progress.

2014-05-02 Thread Jason Wee
different settings for the index writer config and merge policy. Thank for the lengthy information and we have also make our code reachable via github.com /Jason On Wed, Apr 16, 2014 at 10:55 AM, Jose Carlos Canova < jose.carlos.can...@gmail.com> wrote: > No, the index remains, you c

Re: make data search as index progress.

2014-04-15 Thread Jason Wee
dex speed get very very slow (like 10-20doc per second) unfortunately and at times, after index on N files, it just stalled forever, am not sure what went wrong. /Jason On Mon, Apr 14, 2014 at 9:01 PM, Jose Carlos Canova < jose.carlos.can...@gmail.com> wrote: > Hello, > &

make data search as index progress.

2014-04-14 Thread Jason Wee
needed to pass in IndexWriter and DirectoryReader to make it searchable. Thanks and appreciate any advice. /Jason

Re: background merge hit exception

2014-04-09 Thread Jason Wee
time with large merge segments, that is 50. if (writer != null && forceMerge) { writer.forceMerge(50); writer.commit(); } With these changed, the exceptions reported initially, is no longer happening. Thank you again. Jason On Tue, Apr 8, 2014 at 8:50 PM, Jose Carlo

Re: background merge hit exception

2014-04-08 Thread Jason Wee
terConfig(Version.LUCENE_46, > analyzer); yes, we were and still referencing lucene_46 in our analyzer. /Jason On Sat, Apr 5, 2014 at 9:01 PM, Jose Carlos Canova < jose.carlos.can...@gmail.com> wrote: > Seems that you want to force a max number of segments to 1, > On a previous threa

background merge hit exception

2014-04-03 Thread Jason Wee
eScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) We do not know what is wrong as our understanding on lucene is limited. Can someone give explanation on what is happening, or which might be the possible error source is? Thank you and any advice is appreciated. /Jason

Re: codec mismatch

2014-03-06 Thread Jason Wee
wrongly. It was set from 0 all the time when it should be set based on lucene called seek(position). Thank you again. Jack, it is educational purpose and we think lucene is a fantastic software and we would like to learn it in details. Jason On Mon, Feb 17, 2014 at 10:31 PM, Jack Krupansky

Re: codec mismatch

2014-02-17 Thread Jason Wee
le name? > > Mike McCandless > > http://blog.mikemccandless.com > > > On Fri, Feb 14, 2014 at 3:13 AM, Jason Wee wrote: > > Hello, > > > > This is my first question to lucene mailing list, sorry if the question > > sounds funny. > > > > I have

codec mismatch

2014-02-14 Thread Jason Wee
Hello, This is my first question to lucene mailing list, sorry if the question sounds funny. I have been experimenting to store lucene index files on cassandra, unfortunately the exception got overwhelmed. Below are the stacktrace. org.apache.lucene.index.CorruptIndexException: codec mismatch: a

Re: deleteDocuments(Term... terms) takes a long time to do nothing.

2013-12-16 Thread Jason Corekin
tried to search by query I used to filenames stored in each document as the query, which was essentially equivalent to deleting by term. You email helped me to realize this and in turn change my query to be time range based, which now takes seconds to run. Thank You Jason Corekin >It sou

Re: deleteDocuments(Term... terms) takes a long time to do nothing.

2013-12-14 Thread Jason Corekin
Mike, Thanks for the input, it will take me some time to digest and trying everything you wrote about. I will post back the answers to your questions and results to from the suggestions you made once I have gone over everything. Thanks for the quick reply, Jason On Sat, Dec 14, 2013 at 5:13

Re: deleteDocuments(Term... terms) takes a long time to do nothing.

2013-12-14 Thread Jason Corekin
;,filename, Field.Store.YES)); On Sat, Dec 14, 2013 at 1:28 AM, Jason Corekin wrote: > Let me start by stating that I almost certain that I am doing something > wrong, and that I hope that I am because if not there is a VERY large bug > in Lucene. What I am trying to d

deleteDocuments(Term... terms) takes a long time to do nothing.

2013-12-13 Thread Jason Corekin
or Lucene 4.6. If anyone has any ideas as to what I might be doing wrong, I would really appreciate reading what you have to say. Thanks in advance. Jason private void cloneDB() throws QueryNodeException { Document doc

Assistance for Unified Index Proces

2013-08-14 Thread Mark Jason B. Nacional
ified Index". In this implementation, we have only one index file to manage. I just want to get information as to how am I going to implemented it in a an optimal way. Any suggestion would be perfect! :) Thanks! Mark Jason Nacional Junior Software Engineer

Lucene VSM scoring

2013-07-09 Thread Jason Z.
Hi, The Lucene docs mention that Lucene implements a tf-idf weighting scheme for scoring. Is there any way to modify Lucene to implement a custom weighting scheme for the VSM? Thank you.

Looking for case studies for 'Lucene and Solr: The Definitive Guide' from O'Reilly

2012-12-17 Thread Jason Rutherglen
Cloud * Hadoop integration Thanks, Jason Rutherglen, Jack Krupansky, and Ryan Tabora http://shop.oreilly.com/product/0636920028765.do - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-ma

Re: RAMDirectory unexpectedly slows

2012-06-04 Thread Jason Rutherglen
t. Is that right? > > What about the ByteBufferDirectory? Can this specific directory utilize the > 2GB memory I grant to the app? > > On Mon, Jun 4, 2012 at 10:58 PM, Jason Rutherglen < > jason.rutherg...@gmail.com> wrote: > >> If you want the index to be stored

Re: RAMDirectory unexpectedly slows

2012-06-04 Thread Jason Rutherglen
If you want the index to be stored completely in RAM, there is the ByteBuffer directory [1]. Though I do not see the point in putting an index in RAM, it will be cached in RAM regardless in the OS system IO cache. 1. https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/ap

Re: date issues

2012-02-22 Thread Jason Toy
ya > www.findbestopensource.com > > > On Thu, Feb 23, 2012 at 11:55 AM, Jason Toy wrote: > >> I have a solr instance with about 400m docs. For text searches it is >> perfectly fine. When I do searches that calculate the amount of times a >> word appeared in the doc

date issues

2012-02-22 Thread Jason Toy
n the mailing list and on google and not sure what to use, I would appreciate any pointers. Thanks. Jason

Re: frequent keyword computation within a search ( and timeinterval )

2012-01-05 Thread Jason Rutherglen
red > SUM, stats would do it. > > Erick > > On Thu, Jan 5, 2012 at 7:23 PM, Jason Rutherglen > wrote: >>> Short answer is that no, there isn't an aggregate >>> function. And you shouldn't even try >> >> If that is the case why does a 'st

Re: frequent keyword computation within a search ( and timeinterval )

2012-01-05 Thread Jason Rutherglen
> Short answer is that no, there isn't an aggregate > function. And you shouldn't even try If that is the case why does a 'stats' component exist for Solr with the SUM function built in? http://wiki.apache.org/solr/StatsComponent On Thu, Jan 5, 2012 at 1:37 PM, Erick Erickson wrote: > You will

BigInteger usage in numeric Trie range queries

2011-11-28 Thread Jason Rutherglen
Even though the NumericRangeQuery.new* methods do not support BigInteger, the underlying recursive algorithm supports any sized number. Has this been explored?

Re: ElasticSearch

2011-11-16 Thread Jason Rutherglen
The docs are slim on examples. On Wed, Nov 16, 2011 at 3:35 PM, Peter Karich wrote: > >>> even high complexity as ES supports lucene-like query nesting via JSON >> That sounds interesting.  Where is it described in the ES docs?  Thanks. > > "Think of the Query DSL as an AST of queries" > http://w

Re: ElasticSearch

2011-11-16 Thread Jason Rutherglen
> even high complexity as ES supports lucene-like query nesting via JSON That sounds interesting. Where is it described in the ES docs? Thanks. On Wed, Nov 16, 2011 at 1:36 PM, Peter Karich wrote: >  Hi, > > its not really fair to compare NRT of Solr to ElasticSearch. > ElasticSearch provides

RE: Case insensitive sortable column

2011-10-11 Thread Sendros, Jason
If that's not an option, create another column with the same data lowercased and search on the new column while displaying the original column. Jason -Original Message- From: Greg Bowyer [mailto:gbow...@shopzilla.com] Sent: Tuesday, October 11, 2011 10:43 PM To: java
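The lowercased-copy idea in this reply can be sketched in plain Java. This is only an illustration of the approach, applied to sorting as the thread subject suggests; the `Row` class and `sortCaseInsensitive` method are hypothetical names, and nothing here is Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only; this is not Lucene API.
final class LowercaseSortKeyDemo {

    /** Hypothetical row: the original value plus a derived lowercased key. */
    static final class Row {
        final String display;  // shown to the user, case preserved
        final String sortKey;  // used only for ordering

        Row(String display) {
            this.display = display;
            this.sortKey = display.toLowerCase(Locale.ROOT);
        }
    }

    /** Orders values by the lowercased key and returns the original values. */
    static List<String> sortCaseInsensitive(List<String> values) {
        List<Row> rows = new ArrayList<>();
        for (String value : values) {
            rows.add(new Row(value));
        }
        rows.sort(Comparator.comparing((Row r) -> r.sortKey));
        List<String> ordered = new ArrayList<>();
        for (Row row : rows) {
            ordered.add(row.display);
        }
        return ordered;
    }
}
```

In an index, the equivalent would presumably be a second field holding the lowercased value, with the original field kept for display.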

RE: searching / sorting on timestamp and update efficiency

2011-09-22 Thread Sendros, Jason
n to avoid memory leaks. Jason -Original Message- From: Sam Jiang [mailto:sam.ji...@karoshealth.com] Sent: Thursday, September 22, 2011 10:18 AM To: java-user@lucene.apache.org Subject: searching / sorting on timestamp and update efficiency Hi all I have some questions about how I sh

RE: deleting with sorting and max document

2011-09-14 Thread Sendros, Jason
Vincent, I think you may be looking for the following method: http://lucene.apache.org/java/2_9_2/api/all/org/apache/lucene/index/IndexWriter.html#deleteDocuments%28org.apache.lucene.search.Query%29 Jason -Original Message- From: v.se...@lombardodier.com [mailto:v.se

RE: Lucene scoring and random result order

2011-08-25 Thread Sendros, Jason
You can sort on multiple values. Keep the primary sort as a relevancy sort, and choose something else to sort on to keep the rest of the responses fairly static. http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/search/Sort.html Example: Sort sortBy = new Sort(new SortField[] { Sort
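The tie-breaking idea described here (relevancy first, then a second stable key) can be illustrated without Lucene. The sketch below is plain Java and does not complete the truncated `Sort`/`SortField` example; `Hit`, `rank`, and the choice of document id as the secondary key are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration only; Hit is a hypothetical type, not a Lucene class.
final class MultiKeySortDemo {

    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Primary key: score, highest first. Secondary key: docId, lowest first. */
    static List<Integer> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        Comparator<Hit> byScoreDesc =
                Comparator.comparingDouble((Hit h) -> h.score).reversed();
        sorted.sort(byScoreDesc.thenComparingInt(h -> h.docId));
        List<Integer> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.docId);
        }
        return ids;
    }
}
```

With a secondary key in place, hits that share a score always come back in the same order, which is what keeps the result list from appearing random between requests.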

RE: i'm having some trouble with class FSDirectory

2011-08-24 Thread Sendros, Jason
Hi Mostafa, Try looking through the API for help with these types of questions: http://lucene.apache.org/java/3_3_0/api/all/org/apache/lucene/store/FSDirectory.html You can use a number of FSDirectory subclasses depending on your circumstances. Hope this helps! Jason -Original Message

How to determine memory required for searching

2011-08-04 Thread Trieu, Jason T
ts of resources. Perhaps 8 GB of memory is just simply not enough to handle an index of 600 million documents. But before telling management that they must get more memory, I'd to see if there might be other ways to accomplish this. Thanks in advance. Jason

Searching for Empty Field

2011-07-14 Thread Trieu, Jason T
the latest postings on this topic were a few years old, I am wondering if there have been any changes in Lucene query syntax to support searching for empty fields. Has anyone been successfully searched for empty fields with recent Lucene releases? Thanks Jason

how to approach phrase queries and term grouping

2011-06-22 Thread Jason Guild
mentioned in the text if that helps. Thanks for any help you can provide. Jason

Re: Index size and performance degradation

2011-06-13 Thread Jason Rutherglen
> deletions made by readers merely mark it for > deletion, and once a doc has been marked for deletions it is deleted for all > intents and purposes, right? There's the point-in-timeness of a reader to consider. > Does the N in NRT represent only the cost of reopening a searcher? Aptly put, and

Re: Index size and performance degradation

2011-06-13 Thread Jason Rutherglen
> I don't think we'd do the post-filtering solution, but instead maybe > resolve the deletes "live" and store them in a transactional data I think Michael B. aptly described the sequence ID approach for 'live' deletes? On Mon, Jun 13, 2011 at 3:00 PM, Michael McCandless wrote: > Yes, adding dele

found a bug, not sure if its lucene or solr

2011-06-03 Thread Jason Toy
in the document. For that reason I believe the bug is in solr and not in lucene, but I'm not certain. Jason Toy socmetrics http://socmetrics.com @jtoy

Lucene Util question

2011-04-08 Thread Jason Rutherglen
Is http://code.google.com/a/apache-extras.org/p/luceneutil/ designed to replace or augment the contrib benchmark? For example it looks like SearchPerfTest would be useful for executing queries over a pre-built index. Though there's no indexing tool in the code tree?

Re: DocIdSet to represent small numberr of hits in large Document set

2011-04-05 Thread Jason Rutherglen
I think Solr has a HashDocSet implementation? On Tue, Apr 5, 2011 at 3:19 AM, Michael McCandless wrote: > Can we simply factor out (poach!) those useful-sounding classes from > Nutch into Lucene? > > Mike > > http://blog.mikemccandless.com > > On Tue, Apr 5, 2011 at 2:24 AM, Antony Bowesman > w

Append Codec random testing

2011-03-21 Thread Jason Rutherglen
I'm seeing an error when using the misc Append codec. java.lang.AssertionError at org.apache.lucene.store.ByteArrayDataInput.readBytes(ByteArrayDataInput.java:107) at org.apache.lucene.index.codecs.BlockTermsReader$FieldReader$SegmentTermsEnum._next(BlockTermsReader.java:661) at org.apache.luce

Is ConcurrentMergeScheduler useful for multiple running IndexWriter's?

2011-03-04 Thread Jason Rutherglen
ConcurrentMergeScheduler is tied to a specific IndexWriter, however if we're running in an environment (such as Solr's multiple cores, and other similar scenarios) then we'd have a CMS per IW. I think this effectively disables CMS's max thread merge throttling feature?

Proper way to deal with shared indexer exception

2011-02-25 Thread Jason Tesser
AlreadyClosedException ace OR ClosedChannelException OR IOException what would be the best to do with my shared searcher * * 2. is reopen enough? or should I get a brand new searcher? Thanks, Jason Tesser dotCMS Lead Development Manager 1-305-858-1422

Re: Last/max term in Lucene 4.x

2011-02-21 Thread Jason Rutherglen
ordered IDs stored in the index, so that remaining documents (that lets say were left in RAM prior to process termination) can be indexed. It's an inferred transaction checkpoint. On Mon, Feb 21, 2011 at 5:31 AM, Michael McCandless wrote: > On Sun, Feb 20, 2011 at 8:47 PM, Jason Rutherglen &

Re: Last/max term in Lucene 4.x

2011-02-20 Thread Jason Rutherglen
rd. How would I seek to the last term in the index using VarGaps? Or do I need to interact directly with the FST class (and if so I'm not sure what to do there either). Thanks Mike. On Sun, Feb 20, 2011 at 2:51 PM, Michael McCandless wrote: > On Sat, Feb 19, 2011 at 8:42 AM, Jason Rutherg

Re: Last/max term in Lucene 4.x

2011-02-19 Thread Jason Rutherglen
h the existing) to automatically store the max term? On Sat, Feb 19, 2011 at 3:33 AM, Michael McCandless wrote: > I don't quite understand your question Jason... > > Seeking to the first term of the field just gets you the smallest term > (in unsigned byte[] order, ie Unicode order

Last/max term in Lucene 4.x

2011-02-18 Thread Jason Rutherglen
This could be a rhetorical question. The way to find the last/max term that is a unique per document is to use TermsEnum to seek to the first term of a field, then call seek to the docFreq-1 for the last ord, then get the term, or is there a better/faster way?

Re: Storing an ID alongside a document

2011-02-03 Thread Jason Rutherglen
> there is a entire RAM resident part and a Iterator API that reads / > streams data directly from disk. > look at DocValuesEnum vs, Source Nice, thanks! On Thu, Feb 3, 2011 at 12:20 AM, Simon Willnauer wrote: > On Thu, Feb 3, 2011 at 3:23 AM, Jason Rutherglen > wrote: >>

Re: Storing an ID alongside a document

2011-02-02 Thread Jason Rutherglen
s branch) > > -Yonik > http://lucidimagination.com > > > On Wed, Feb 2, 2011 at 1:03 PM, Jason Rutherglen > wrote: > >> I'm curious if there's a new way (using flex or term states) to store >> IDs alongside a document and retrieve the IDs of the top N resul

Storing an ID alongside a document

2011-02-02 Thread Jason Rutherglen
I'm curious if there's a new way (using flex or term states) to store IDs alongside a document and retrieve the IDs of the top N results? The goal would be to minimize HD seeks, and not use field caches (because they consume too much heap space) or the doc stores (which require two seeks). One pos

Re: API access to in-memory tii file (3.x not flex).

2010-11-10 Thread Jason Rutherglen
Yeah that's customizing the Lucene source. :) I should have gone into more detail, I will next time. On Wed, Nov 10, 2010 at 2:10 PM, Michael McCandless wrote: > Actually, the .tii file pre-flex (3.x) is nearly identical to the .tis > file, just that it only contains every 128th term. > > If you

Re: API access to in-memory tii file (3.x not flex).

2010-11-10 Thread Jason Rutherglen
In a word, no. You'd need to customize the Lucene source to accomplish this. On Wed, Nov 10, 2010 at 1:02 PM, Burton-West, Tom wrote: > Hello all, > > We have an extremely large number of terms in our indexes.  I want to be able > to extract a sample of the terms, say something like every 128th

Re: Recreate segment infos

2010-10-05 Thread Jason Rutherglen
egment is given the same name as the first segment that > shares it.  However, unfortunately, because of merging, it's possible > that this mapping is not easy (maybe not possible, depending on the > merge policy...) to reconstruct.  I think this'll be the hardest part > :) > &

Recreate segment infos

2010-10-04 Thread Jason Rutherglen
Let's say the segment infos file is missing, and I'm aware of CheckIndex, however is there a tool to recreate a segment infos file?

Surge 2010 Early Registration ends Tuesday!

2010-08-27 Thread Jason Dixon
t and guarantee your seat to this year's event! -- Jason Dixon OmniTI Computer Consulting, Inc. jdi...@omniti.com 443.325.1357 x.241

Register now for Surge 2010

2010-08-02 Thread Jason Dixon
your seat to this year's event! http://omniti.com/surge/2010/register Thanks, -- Jason Dixon OmniTI Computer Consulting, Inc. jdi...@omniti.com 443.325.1357 x.241

Last day to submit your Surge 2010 CFP!

2010-07-09 Thread Jason Dixon
your business sponsor/exhibit at Surge 2010, please contact us at su...@omniti.com. Thanks! -- Jason Dixon OmniTI Computer Consulting, Inc. jdi...@omniti.com 443.325.1357 x.241

CFP for Surge Scalability Conference 2010

2010-07-02 Thread Jason Dixon
icipating as an exhibitor, please visit the Surge website or contact us at su...@omniti.com. Thanks, -- Jason Dixon OmniTI Computer Consulting, Inc. jdi...@omniti.com 443.325.1357 x.241

Re: Last Call: Lucene Revolution CFP Closes Tomorrow Wednesday, June 23, 2010, 12 Midnight PDT

2010-06-22 Thread Jason Rutherglen
Grant, I can probably do the 3 billion document one from Prague, or a realtime search one... I spaced on submitting for ApacheCon. Are there cool places in the Carolinas to hang? Cheers bro, Jason On Tue, Jun 22, 2010 at 10:51 AM, Grant Ingersoll wrote: > Lucene Revolution Call

CFP for Surge Scalability Conference 2010

2010-06-14 Thread Jason Dixon
n Surge is just what you've been waiting for. For more information, including CFP, sponsorship of the event, or participating as an exhibitor, please contact us at su...@omniti.com. Thanks, -- Jason Dixon OmniTI Computer Consulting, Inc. jdi...@omniti.co

Monitoring low level IO

2010-06-03 Thread Jason Rutherglen
This is more of a unix related question than Lucene specific however because Lucene is being used, I'm asking here as perhaps other people have run into a similar issue. On an Amazon EC2 merge, read, and write operations are possibly blocking due to underlying IO. Is there a tool that you have use

Re: Lucene Challenge - sum, count, avg, etc.

2010-04-01 Thread Jason Eacott
Thanks for the ref - didn't know about Pig before. the language and approach looks useful, so now I'm wondering if it couldn't be used across lucene over hadoop too. If data was indexed in lucene and Pig knew that, then it could make for an interesting alternate lucene query language. could this w

fastest way to gather simple terms that match documents?

2010-03-31 Thread Jason Eacott
keen to avoid that option if possible. Is there a quick way to discover this information? All I need is a list of terms (as simple strings would be fine), I don't care how many were found or what position or anything else. just which ones matched. thoug

Is it safe to use reopen on IndexReader

2010-03-31 Thread Jason Tesser
= new IndexSearcher(ir.reopen(true)); if(ir != indexSearcher.getIndexReader()){ ir.close(); } Is the if(ir != indexSearcher.getIndexReader()){ check needed? Thanks, Jason Tesser dotCMS Lead Development Manager 1-305-858-1422

Re: If you could have one feature in Lucene...

2010-02-25 Thread Jason Rutherglen
long - whatever > happened to CSF? That feature is so 2006, and we still > don't have it? I'm completely disturbed about the whole situation myself. > > Who the heck is in charge here? > > On 02/25/2010 12:51 PM, Jason Rutherglen wrote: >> >> It'd be great to

Re: IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Jason Rutherglen
Peter, Perhaps other concurrent operations? Jason On Tue, Feb 23, 2010 at 10:43 AM, Peter Keegan wrote: > Using Lucene 2.9.1, I have the following pseudocode which gets repeated at > regular intervals: > > 1. FSDirectory dir = FSDirectory.open(java.io.File); > 2. dir.set

Re: Analyzer for stripping non alpha-numeric characters?

2010-02-04 Thread Jason Rutherglen
Answering my own question... PatternReplaceFilter doesn't output multiple tokens... Which means messing with capture state... On Thu, Feb 4, 2010 at 2:16 PM, Jason Rutherglen wrote: > Transferred partially to solr-user... > > Steven, thanks for the reply! > > I wonder if

Re: Analyzer for stripping non alpha-numeric characters?

2010-02-04 Thread Jason Rutherglen
wrote: > Hi Jason, > > Solr's PatternReplaceFilter(ts, "\\P{Alnum}+$", "", false) should work, > chained after an appropriate tokenizer. > > Steve > > On 02/04/2010 at 12:18 PM, Jason Rutherglen wrote: >> Is there an anal
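The regular expression in the quoted reply can be tried on its own with `java.util.regex`. This is a minimal sketch that reuses only the pattern; it does not call Solr's PatternReplaceFilter, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Plain-Java illustration only; it reuses the regex from the quoted reply
// but does not call Solr's PatternReplaceFilter.
final class TrailingNonAlnumDemo {

    // \P{Alnum} matches any character that is not an ASCII letter or digit;
    // java.util.regex treats \p{Alnum} as ASCII-only unless
    // UNICODE_CHARACTER_CLASS is set. The $ anchors the run to the token end.
    private static final Pattern TRAILING = Pattern.compile("\\P{Alnum}+$");

    /** Removes a trailing run of non-alphanumeric characters, if any. */
    static String strip(String token) {
        return TRAILING.matcher(token).replaceFirst("");
    }
}
```

Because the pattern is anchored at the end, punctuation inside a token (for example the hyphen in `a-b`) is left alone; note that with the default ASCII-only character class, trailing non-ASCII letters would also be stripped.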

Analyzer for stripping non alpha-numeric characters?

2010-02-04 Thread Jason Rutherglen
Is there an analyzer that easily strips non alpha-numeric from the end of a token?

Re: file open handles?

2010-01-26 Thread Jason Rutherglen
Jamie, How often are you calling getReader? Is it only these files? Jason On Tue, Jan 26, 2010 at 12:58 PM, Jamie wrote: > Ok. I spoke too soon. The problem is not solved. I am still seeing these > file handles lying around. Is this something I should be worried about? > We are no

Re: file open handles?

2010-01-26 Thread Jason Rutherglen
You can call close on the reader obtained via writer.getReader. Well, actually, you'll need to. :) The underlying writer will not be affected though. On Tue, Jan 26, 2010 at 11:45 AM, Jamie wrote: > Hi Jason > > No .I wasn't sure whether I needed to or not. We have ju

Re: file open handles?

2010-01-26 Thread Jason Rutherglen
Jamie, Are you calling close on the reader? Jason On Tue, Jan 26, 2010 at 11:23 AM, Jamie wrote: > Hi Erick > > Our app is a long running server. Is it a problem if indexes are never > closed? Our searchers > do see the latest snapshot as we use writer.getReader() method for

Re: Tag Index patch (LUCENE-1292) status?

2010-01-21 Thread Jason Rutherglen
from these lower Lucene levels, I don't see working on it in the near future. Jason On Thu, Jan 21, 2010 at 8:18 PM, Chris Harris wrote: > I'm probably not going to work on it right now. > > It might be nice, though, to make sure I have the right big-picture > idea of the tag

Re: Tag Index patch (LUCENE-1292) status?

2010-01-19 Thread Jason Rutherglen
Hi Chris, It's not actively being worked on. Are you interested in working on it? Jason On Tue, Jan 19, 2010 at 4:42 PM, Chris Harris wrote: > I'm interested in the Tag Index patch (LUCENE-1292), in particular > because of how it enables you to modify certain fields withou

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Jason Rutherglen
Actually I meant to say indexes... However when optimize(numsegments) is used they're segments... On Wed, Jan 13, 2010 at 3:05 PM, Otis Gospodnetic wrote: > I think Jason meant "15-20GB segments"? >  Otis > -- > Sematext -- http://sematext.co

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Jason Rutherglen
Right... It all blends together, I need an NLP analyzer for my emails On Wed, Jan 13, 2010 at 3:05 PM, Otis Gospodnetic wrote: > I think Jason meant "15-20GB segments"? >  Otis > -- > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch > > > > > __

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Jason Rutherglen
Chavalittumrong wrote: > Seems like optimize() only cares about final number of segments rather than > the size of the segment. Is it so? > > On Wed, Jan 13, 2010 at 2:35 PM, Jason Rutherglen < > jason.rutherg...@gmail.com> wrote: > >> There's a different method

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Jason Rutherglen
is only used during index time and will be ignored > by by the Optimize() process? > > > On Wed, Jan 13, 2010 at 1:57 PM, Jason Rutherglen < > jason.rutherg...@gmail.com> wrote: > >> Oh ok, you're asking about optimizing... I think that's a different >&g

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Jason Rutherglen
Oh ok, you're asking about optimizing... I think that's a different algorithm inside LogMergePolicy. I think it ignores the maxMergeMB param. On Wed, Jan 13, 2010 at 1:49 PM, Trin Chavalittumrong wrote: > Thanks, Jason. > > Is my understanding correct that LogByteSizeMergeP

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Jason Rutherglen
on at best. Jason On Wed, Jan 13, 2010 at 1:36 PM, Trin Chavalittumrong wrote: > Hi, > > > > I am trying to optimize the index which would merge different segment > together. Let say the index folder is 1Gb in total, I need each segmentation > to be no larger than 200Mb. I tried

Re: Term Frequency for phrases

2010-01-08 Thread Jason Rutherglen
I'm not going to go into too much code level detail, however I'd index the phrases using tri-gram shingles, and as uni-grams. I think this'll give you the results you're looking for. You'll be able to quickly recall the count of a given phrase aka tri-gram such as "blue_shorts_burough" On Fri, J
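As a rough illustration of the shingle idea in this reply, the plain-Java sketch below builds tri-gram shingles joined with underscores and counts them alongside the uni-grams. It only models the term counts; a real index would produce shingles in the analysis chain, and the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; this does not use Lucene's analysis classes.
final class ShingleCountDemo {

    /** Joins every run of `size` consecutive tokens with underscores. */
    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join("_", tokens.subList(i, i + size)));
        }
        return out;
    }

    /** Counts uni-grams and tri-gram shingles in one map. */
    static Map<String, Integer> countTerms(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        for (String shingle : shingles(tokens, 3)) {
            counts.merge(shingle, 1, Integer::sum);
        }
        return counts;
    }
}
```

Looking up a three-word phrase then becomes a single term lookup on its shingle, which is why the count can be recalled quickly.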

Re: Is there a way to limit the size of an index?

2010-01-07 Thread Jason Rutherglen
The naming is unclear, when I looked at this I had to thumb through the code a fair bit before discerning if it was the input segments or the output segment of a merge (it's the former). Though I find the current functionality somewhat odd because it will inherently exceed the given size with a mer

Re: Question about many fields within a single index

2009-12-31 Thread Jason Tesser
right we do analyze a number of fields. We use the WHiteSpace whenever we have a text field. So maybe 5 on average per guy. Can be more of course. Thanks, Jason Tesser dotCMS Lead Development Manager 1-305-858-1422 On Wed, Dec 30, 2009 at 10:44 PM, Tom Hill wrote: > Hi - > > One

Question about many fields within a single index

2009-12-30 Thread Jason Tesser
hoping not to have to have many indexes under the covers if I can avoid it but I don't want performance to suffer either. Any thoughts? Thanks, Jason Tesser dotCMS Lead Development Manager 1-305-858-1422

CJKAnalyzer phrase slop?

2009-12-13 Thread Jason Rutherglen
Does CJK support phrase slop? (I'm assuming no)

Re: MatchAllDocsQuery and InstantiatedIndex on Lucene 2.9.1

2009-12-10 Thread Jason Fennell
3 Bremen >> http://www.thetaphi.de >> eMail: u...@thetaphi.de >> >> -Original Message- >>> From: Jason Fennell [mailto:jdfenn...@gmail.com] >>> Sent: Wednesday, December 09, 2009 7:48 PM >>> To: java-user@lucene.apache.org >>> Su

Re: NearSpansUnordered payloads not returning all the time

2009-12-09 Thread Jason Rutherglen
Mike, Is this the thread? http://www.lucidimagination.com/search/document/1e87d488a904b89f/spannearquery_s_spans_payloads#8103efdc9705a763 Maybe we need a recommended workaround for this? Jason On Wed, Dec 9, 2009 at 1:17 PM, Michael McCandless wrote: > That sounds familiar... try to tr

MatchAllDocsQuery and InstantiatedIndex on Lucene 2.9.1

2009-12-09 Thread Jason Fennell
I'm trying to upgrade our application from Lucene 2.4.1 to Lucene 2.9.1. I've been using an InstantiatedIndex to do a bunch of unit testing, but am running into a some problems with Lucene 2.9.1. In particular, when I try to run a MatchAllDocsQuery on my InstantiatedIndex (which worked fine on 2.4.

Re: NearSpansUnordered payloads not returning all the time

2009-12-09 Thread Jason Rutherglen
if that included sometimes > missing payloads... > > Mike > > On Tue, Dec 8, 2009 at 7:34 PM, Jason Rutherglen > wrote: >> Howdy, >> >> I am wondering if anyone has seen >> NearSpansUnordered.getPayload() not return payloads that are >> verifiably ac

NearSpansUnordered payloads not returning all the time

2009-12-08 Thread Jason Rutherglen
. I'll put together a test case, however the difficulty is that we're only seeing the issue with largish 800 MB indexes, which could make the test case a little crazy. Jason
