Re: Can I use multiple writers of different applications on a same FSDirectory?

2012-02-14 Thread Mihai Caraman
You can use a separate FSDirectory for each writer and then search all of them at once. This is the recommended way if you have different apps. On 14 February 2012, 19:06, Ian Lea wrote: > You can only have one writer against one index at a time. Lucene's > locking will prevent anything el

Performance question

2011-07-13 Thread Mihai Caraman
Hello, My name is Mihai and I'm trying to write a Java search (later I'll need to port it to PyLucene) over billions of mentions, like Twitter statuses. Mentions are grouped by some containing keywords. I'm thinking of partitioning the index for faster results as follows:

Re: Performance question

2011-07-14 Thread Mihai Caraman
Thank you for the reply. If you need more info to understand the question, I'll try to be as prompt as possible. > -if I search on last week's index and the individual index (this needs to be > opened at search request!?) will it be faster than using a single huge index > for all groups, for all w

Re: Questions on index Writer

2011-07-16 Thread Mihai Caraman
> indexWriter = new IndexWriter(FSDirectory.open(new File(indexDirName)),
>     getAnalyzer(), true, MaxFieldLength.UNLIMITED);
> Does this statement clean up existing index files?
Yes.
> If yes, then how do I tackle a scenario where, let's say, I brought down my application server hosting code

HighFreqTerms for results set

2011-07-18 Thread Mihai Caraman
So I looked around and found no viable solution for this problem: How to extract the most frequent terms in the search result set after submitting the query. HighFreqTerms and docFreq
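
For readers following this thread, here is a minimal sketch of the brute-force idea behind the question: counting terms over the hits only, rather than over the whole index. It is plain Java and does not use any Lucene API; `ResultSetTermCounter` and `topTerms` are hypothetical names, and the sketch assumes the analyzed terms of each matching document are already available (for example from stored fields or term vectors), which the original post does not confirm.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: ranks terms by how often
// they occur in the documents of a result set.
class ResultSetTermCounter {

    // hitTerms holds one list of already-analyzed terms per matching document
    // (an assumption for this sketch).
    static List<String> topTerms(List<List<String>> hitTerms, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> terms : hitTerms) {
            for (String term : terms) {
                counts.merge(term, 1, Integer::sum); // total occurrences across all hits
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        // Highest count first; ties are broken alphabetically for a deterministic order.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return ranked.subList(0, Math.min(n, ranked.size()));
    }
}
```

Walking every hit like this costs time proportional to the size of the result set, so it is only practical for small result sets or a capped number of top hits.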

Re: HighFreqTerms for results set

2011-07-18 Thread Mihai Caraman
2011/7/18 Manish Bafna > Use Facet by that field. It will bring up top words. > > On Mon, Jul 18, 2011 at 6:03 PM, Mihai Caraman wrote: > > > So I looked around and found no viable solution for this problem: > > How to extract the most frequent terms in the searc

Re: Re: HighFreqTerms for results set

2011-07-19 Thread Mihai Caraman
Yeah, that's too slow to use. Thank you very much for your answers. I really appreciate it. All the best, Mihai C

Re: HighFreqTerms for results set

2011-07-21 Thread Mihai Caraman
It's only available in Solr and it's based on UnInvertedField. Lucene 3.4.0 should have it implemented too. I ran a small index in Solr and it does the job by showing

Re: please help

2011-07-21 Thread Mihai Caraman
Before you get into Java, you should know that it's possible to find a Lucene implementation for your language. Lucene is available in Python, C#/.NET, etc. Search for that first. If you choose Java, you'll need to take baby steps. Install an IDE (Eclipse, NetBeans...) to get rid of all t

Separating IndexWriter with NRT

2011-07-22 Thread Mihai Caraman
I trust that some of you had to run the indexing as a service/jar and the search as a servlet/war. How can I achieve this while still keeping the search near real time (this is difficult because IndexReader needs direct access to the IndexWriter instance)? If no Lucene users know this, where else sho

Re: Is There a Way To Split The Lucene Index Segments To Smaller Size Less Than 1 GB

2011-07-27 Thread Mihai Caraman
> > > smaller segments of size less than 1 GB, can you please know me? > As I recall, the optimize mechanism can be told how many segments to create. So you can check your index size and work out how many segments to ask for before optimizing.
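
The arithmetic in this reply can be sketched in a few lines of plain Java. `SegmentCountSketch` and `segmentsFor` are hypothetical names; the result is the kind of value one would pass to `IndexWriter.optimize(int maxNumSegments)` in Lucene 3.x. Note the assumption: the calculation only bounds the average segment size, since merged segments need not come out equal, so some headroom may be needed to keep every segment under the limit.

```java
// Hypothetical helper, not part of Lucene: estimates how many segments to
// request so that the average segment stays at or below a size limit.
class SegmentCountSketch {

    static int segmentsFor(long indexBytes, long maxSegmentBytes) {
        if (maxSegmentBytes <= 0) {
            throw new IllegalArgumentException("maxSegmentBytes must be positive");
        }
        if (indexBytes <= 0) {
            return 1; // an empty index still needs at least one segment
        }
        // Ceiling division: round up so the average never exceeds the limit.
        long segments = (indexBytes + maxSegmentBytes - 1) / maxSegmentBytes;
        return Math.toIntExact(segments);
    }
}
```

For example, an index slightly larger than 5 GB with a 1 GB limit yields 6 segments.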

Re: Overriding default handling of '/' and '-'

2011-08-17 Thread Mihai Caraman
QueryParser is to blame, so avoid using it. Like you said, just filtering is enough. That's how I did it: the query arrived already split in two, the part that needed to be (full-text) analyzed and a second part that I filtered on as an exact match (I suppose this applies to you too). 2011/8/17

Re: Regarding multiple index creation and Searching

2011-08-17 Thread Mihai Caraman
heard that ~80 million docs per index (varying with average document size). @Uwe Schindler: Is hashed distribution really necessary when using MultiReader? I did hear that Solr uses a continuous hashing algorithm with shards of indexes. But MultiReader didn't say anything about hashing.
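
To make the term concrete, this is a minimal sketch of what simple hashed distribution of documents across several indexes looks like. It is plain modulo routing, not the algorithm Solr uses and not anything built into MultiReader; `ShardRouter` and `shardFor` are hypothetical names, and the document id is assumed to be a string.

```java
// Hypothetical helper, not part of Lucene or Solr: picks the index (shard)
// a document is written to, based on a hash of its id.
class ShardRouter {

    static int shardFor(String docId, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        // floorMod keeps the result in [0, shardCount) even when hashCode() is negative.
        return Math.floorMod(docId.hashCode(), shardCount);
    }
}
```

The same id always maps to the same shard, which is what makes updates and deletes land in the right index. Changing `shardCount` remaps most ids, which is the problem that more elaborate hashing schemes try to reduce.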

Re: Analysis

2011-08-22 Thread Mihai Caraman
http://snowball.tartarus.org/ for stemming 2011/8/22 Saar Carmi > Hi > Where can I find a guide for building analyzers, filters and tokenizers? > > Saar >

Re: [ANNOUNCE] Apache Lucene 3.4.0 released

2011-09-20 Thread Mihai Caraman
How can you use the TaxonomyReader in NRT search? Fantastic job! 2011/9/15 Michael McCandless > September 14 2011, Apache Lucene™ 3.4.0 available > > The Lucene PMC is pleased to announce the release of Apache Lucene 3.4.0. > > Apache Lucene is a high-performance, full-featured text search engi

Re: Re: Re: search match documents and pagination in lucene3.x

2011-09-21 Thread Mihai Caraman
totalHits = searcher.search( query,searcher.maxDoc()).scoreDocs.length
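
Since the thread is about pagination, here is a minimal sketch of the page arithmetic that follows once the total hit count is known. It is plain Java with no Lucene calls; `PageWindow` and `bounds` are hypothetical names, and pages are assumed to be 1-based. The returned pair would be used to slice the ranked hit array.

```java
// Hypothetical helper, not part of Lucene: computes which slice of a ranked
// hit list belongs to a given page.
class PageWindow {

    // Returns {fromInclusive, toExclusive} for a 1-based page number.
    static int[] bounds(int totalHits, int page, int pageSize) {
        if (totalHits < 0 || page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("invalid paging arguments");
        }
        // long arithmetic avoids overflow for large page numbers.
        long from = Math.min((long) (page - 1) * pageSize, totalHits);
        long to = Math.min(from + pageSize, totalHits);
        return new int[] {(int) from, (int) to};
    }
}
```

A page past the end yields an empty window (from equals to) rather than an error, so the caller can simply render no results.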

Missing Facet link

2011-09-21 Thread Mihai Caraman
On the documentation page http://lucene.apache.org/java/3_4_0/ there's no link under contrib to http://lucene.apache.org/java/3_4_0/api/contrib-facet/index.html

FacetedSearch DrillDown

2011-09-21 Thread Mihai Caraman
Hello gurus, Cutting to the chase, I index this: CategoryPath(lvl1,lvl2,lvl3) I want to group things as deep as lvl3. Which should be more efficient: *search for categoryPath(lvl1) to get lvl2 results: search lvl2 number of times for categoryPath(lvl1,lvl2) to get lvl3 results* ? or *search drilldo

Re: FacetedSearch DrillDown

2011-09-21 Thread Mihai Caraman
2011/9/21 Shai Erera > What do you mean "up to lvl3"? > "as *deep* as lvl3" :P In this example, let's look at these lvls as a tree (like an n-ary tree) with the root in a unique value at (the top) lvl 1 ..one with category [l1, l2, l3] and one with [l1, l2], All documents have the same depth (of categor

LuceneTaxonomyWriter unexpected shutdown

2011-09-26 Thread Mihai Caraman
Something strange happened, with no error whatsoever. I ran a taxwriter in parallel with an IndexWriter, but because the IndexReader was near real time, I reopened the taxwriter and refreshed the taxreader every 3 min. Day one, OK. Day two, OK. Day three, no more indexing, results were only from the previou

TaxWriter leakage?

2011-09-29 Thread Mihai Caraman
There may be some leakage while using ThreadedIndexWriter... The app starts as a listener servlet in Tomcat 6. First start, all OK. First close, none of these lines appear: *INFO: A valid shutdown command was received via the shutdown port. Stopping the Server instance. Sep 29, 2011 3:28:34 PM org.apa

Re: TaxWriter leakage?

2011-09-29 Thread Mihai Caraman
Hmm.. if I leave it a couple of minutes before restarting, it doesn't log the proper shutdown steps, but it does restart correctly. 2011/9/29 Mihai Caraman > There may be some leakage while using ThreadedIndexWriter... > > The app starts as a listener servlet in Tomcat 6 > Fi

Re: [ANN] Luke 3.4.0 release

2011-10-03 Thread Mihai Caraman
Same on Win7 and Ubuntu 11. 2011/10/3 Erick Erickson > Same thing happened to me on a Mac, Java 1.6 > > FWIW > Erick > > On Mon, Oct 3, 2011 at 7:45 AM, Shai Erera wrote: > > Thanks Andrezj ! > > > > I downloaded the standalone lukeall-3.4.0.jar and ran "java -jar > > lukeall-3.4.0.jar" and I ge

Re: [ANN] Luke 3.4.0 release

2011-10-03 Thread Mihai Caraman
Works, it can open the tax Index too, ty! 2011/10/3 Andrzej Bialecki > On 03/10/2011 16:09, Mihai Caraman wrote: > >> same on win7 and ubuntu11. >> >> 2011/10/3 Erick Erickson >> > >> >> Same thing happened to me on a Mac, Java 1.6 >>> >

Re: TaxWriter leakage?

2011-10-03 Thread Mihai Caraman
Uwe : > Maybe another Java7 bug? Are you using Java 7? > Nope, Java 1.6 Shai > return getParentArray().getArray()[ordinal]; > Can you give me a Lucene jar with printouts for when it throws this NPE? ...How is using ThreadedIndexWriter related > When I remove the Threaded version, it doesn't gi

Re: TaxWriter leakage?

2011-10-04 Thread Mihai Caraman
I also think that there is nothing special in the second restart, except > that by that time there were other servlets up (?) which were able to > trigger simultaneous AddDoc requests, exposing this bug... > > Makes sense? > It does. Can those two Ts be from the ThreadedIndexWriter?

Re: TaxWriter leakage?

2011-10-04 Thread Mihai Caraman
> (org.myapp.search.CustomLucene.ThreadedIndexWriter). > > It's just the threaded example from Lucene in Action SE (Listing 10.1) > I opened https://issues.apache.org/jira/browse/LUCENE-3484 for this and will > fix this soon. > Patch in that issue shows how to reproduce this error. > > Thanks for rep

Re: TaxWriter leakage?

2011-10-05 Thread Mihai Caraman
2011/10/4 Doron Cohen > LUCENE-3484 is resolved. > Mihai, could you give it a try and see if this solves the NPE problem in > your setup? > As Jim Carrey would say: Like a glove!

Re: lucene 4.0 and DocumentsWriterPerThreadPool compared to lucene 3.4

2011-10-11 Thread Mihai Caraman
Hey, you should compare with the ThreadedIndexWriter too :). I'll attach the source from the Lucene in Action SE book and you can just replace new IndexWriter(... with new ThreadedIndexWriter(... See if those results make a difference. Also, I presume you don't have a single-core CPU. 2011/10/11

Re: How can i search lucene java user list archive?

2011-10-20 Thread Mihai Caraman
http://apache-pivot-users.399431.n3.nabble.com/how-to-search-mailing-list-td1876948.html 2011/10/20 janwen > I want to know how to search the java user list archive. > There is no search function on the site: > http://mail-archives.apache.org/mod_mbox/lucene-java-user/ > Any idea? > thanks > > 2

Re: Return Lucene field name when a query is matched

2011-10-20 Thread Mihai Caraman
So now you have something like query[title,content,header,...]. Evidently you can find out by running query[title], query[content], query[header] separately. But you'd have to merge the results. Maybe there's a collector for this. 2011/10/20 damian2b > Hi, > > I was given a task to investigate whether it is po

Taxonomy indexer debug

2011-11-24 Thread Mihai Caraman
Hello, I'm having an issue with using NRT and Tax. After a couple of days of running continuously, the TaxonomyReader doesn't return results anymore (but the taxindex has them). How can I debug this?! Does the taxonomy index have a log output like IndexWriter has? Will that be enough or relevant? Current

Re: Taxonomy indexer debug

2011-11-28 Thread Mihai Caraman
All packages used: core3.4, queries3.4, facet3.5. Once every 3 minutes I *refreshTax* and once per day I *reopenEveryting*.
*InitWriters()*
writer = new ThreadedIndexWriter
taxWriter = new LuceneTaxonomyWriter
// because the reader can't start if it doesn't have a valid taxIndex directory
taxWriter.c

Re: Taxonomy indexer debug

2011-11-28 Thread Mihai Caraman
rve more data to it. > Any particular reason why not using the same version in all 3? > There was a concurrency bug at some point, and after it was fixed, I got a nightly build to use until the official 3.5 release. > Doron > > On Mon, Nov 28, 2011 at 1:01 PM, Mihai Caraman wrote

Quoted search on Analyzed fields

2011-11-29 Thread Mihai Caraman
field = new Field("author", (author).toLowerCase(), Field.Store.NO, Field.Index.NOT_ANALYZED);
field.setIndexOptions(FieldInfo.IndexOptions.DOCS_ONLY);
field.setOmitNorms(true);
When I switched the above configuration from NOT_ANALYZED to ANALYZED, Luke's results for autho

Re: Quoted search on Analyzed fields

2011-11-29 Thread Mihai Caraman
Still no difference, it may be because of some other hidden bug. Anyway, adding freq and positions will be a no-no because of space :) so bye bye quotes. Thank you