Thanks Mikhail.
On 2/13/20 5:05 AM, Mikhail Khludnev wrote:
Hello,
I picked two first questions for reply.
> does this class offer any Shingling capability embedded to it?
>
No, it doesn't allow expanding a wildcard phrase with shingles.
> I could not find any api within this class ComplexPhraseQueryParser for
> that purpose.
>
There is none.
org.apache.lucene.search.PhraseWildcardQuery
looks very good; I hope this makes it into a Lucene
build soon.
Thanks
Thanks David, can I look at the source code?
I think ComplexPhraseQueryParser uses
something similar.
I will check the differences but do you know the differences for quick
reference?
Thanks
Hi,
See org.apache.lucene.search.PhraseWildcardQuery in Lucene's sandbox
module. It was recently added by my amazing colleague Bruno. At this time
there is no query parser that uses it in Lucene, unfortunately, but you can
rectify this for your own purposes. I hope this query "graduates" to
Lucene core.
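For anyone curious, a minimal sketch of building "term1 term2*" directly with
this class, assuming the 8.x sandbox API (the field name and the expansion
cap of 100 are illustrative):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseWildcardQuery; // lucene-sandbox module
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.BytesRef;

class WildcardPhraseSketch {
  static Query term1ThenTerm2Prefix() {
    // Cap each multi-term clause at 100 expansions (illustrative value).
    PhraseWildcardQuery.Builder builder =
        new PhraseWildcardQuery.Builder("body", 100);
    builder.addTerm(new BytesRef("term1"));                           // exact term
    builder.addMultiTerm(new PrefixQuery(new Term("body", "term2"))); // term2*
    return builder.build();
  }
}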
Hi,-
Regarding these mechanisms below I mentioned:
does this class offer any Shingling capability embedded to it?
I could not find any api within this class ComplexPhraseQueryParser for
that purpose.
For instance, does this class offer the most commonly used words api?
I can then use one of
Thanks but I thought this class would have a mechanism to fix this issue.
Thanks
> On Feb 4, 2020, at 4:14 AM, Mikhail Khludnev wrote:
It's slow per se, since it loads term positions. The usual advice is
shingling or edge ngrams. Note, if this is not a text but a string or enum,
other tricks may apply. Another idea: perhaps
IntervalQueries can be smarter and faster in certain cases, although they
are backed by the same term positions.
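A bare-bones sketch of the edge-ngram route against a recent analysis API
(gram sizes are illustrative). With prefixes indexed as real terms, a query
like term2* becomes an exact term lookup instead of a per-query wildcard
expansion:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// Index-time analyzer that also emits the 1..10-char prefixes of every token.
class EdgeNGramIndexAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer source = new StandardTokenizer();
    TokenStream result = new LowerCaseFilter(source);
    result = new EdgeNGramTokenFilter(result, 1, 10, true); // keep original token too
    return new TokenStreamComponents(source, result);
  }
}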
How can this slowdown be resolved?
Is this another limitation of this class?
Thanks
> On Feb 3, 2020, at 4:14 PM, baris.ka...@oracle.com wrote:
Please ignore the first comparison there. I was comparing {term1 with
2 chars} vs {term1 with >= 5 chars + term2 with 1 char}.
The slowdown is:
The query "term1 term2*" slows down 400 times (~1500 millisecs) compared
to "term1*" when term1 has >5 chars and term2 is still 1 char.
Best regards
Hi,-
I hope everyone is doing great.
I saw this issue with this class such that if you search for "term1*"
it is good (i.e., 4 millisecs when it has >= 5 chars, and ~250
millisecs when it has 2 chars),
but when you search for "term1 term2*" where term2 is a single
char, the performance degrades dramatically.
Thanks again.
- Original Message -
From: Erick Erickson
To: java-user@lucene.apache.org; sol myr
Cc:
Sent: Sunday, October 23, 2011 7:18 PM
Subject: Re: performance question - number of documents
"Why would it matter...top 5 matches" Because Lucene has to calculate
the
This may not be directly relevant to Lucene, but I wanted to learn:
how does a web search engine do something like this?
Do they also "score every matching document on every query", OR
do they pick a subset first based on some static/offline ranking criteria
and then do what Lucene does, OR
do they sea
"Why would it matter...top 5 matches" Because Lucene has to calculate
the score of all documents in order to insure that it returns those 5 documents.
What if the very last document scored was the most relevant?
Best
Erick
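A small illustration of Erick's point, against a modern API (index path and
field are made up): totalHits counts everything that matched and was scored,
even though only five documents come back.

import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class TopFiveSketch {
  public static void main(String[] args) throws Exception {
    try (DirectoryReader reader =
             DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/index")))) {
      IndexSearcher searcher = new IndexSearcher(reader);
      // Matching docs are scored; a size-5 priority queue keeps the best.
      TopDocs top = searcher.search(new TermQuery(new Term("body", "john")), 5);
      System.out.println("matches: " + top.totalHits);
      for (ScoreDoc sd : top.scoreDocs) {
        System.out.println("doc=" + sd.doc + " score=" + sd.score);
      }
    }
  }
}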
On Sun, Oct 23, 2011 at 3:06 PM, sol myr wrote:
Hi,
We've noticed a Lucene performance phenomenon, and would appreciate an
explanation from anyone familiar with Lucene internals
(I know Lucene as a user, but haven't looked under its hood).
We have a Lucene index of about 30 million records.
We ran 2 queries: "AND" and "OR" ("+john +doe" vs. "john doe").
Thank you for the reply; if you need more info to understand the question,
I'll try to be as prompt as possible.
> -if I search on last week's index and the individual index (this needs to be
> opened at search request!?) will it be faster than using a single huge index
> for all groups, for all weeks?
Searching billions of anything is likely to be challenging. Mark
Miller's document at
http://www.lucidimagination.com/content/scaling-lucene-and-solr looks
well worth a read.
Hello,
My name is Mihai and I'm trying to write a Java search (later I'll need to
port it to PyLucene) over billions of mentions like Twitter statuses.
Mentions are grouped by some containing keywords.
I'm thinking of partitioning the index for faster results as follows:
The queries I'm doing really aren't anything clever...just searching for
phrases on pages of text, sometimes narrowing results by other words that
must appear on the page, or words that cannot appear on the same page. I
don't have experience with those span queries so I can't say much about
them.
Hi Greg,
Thanks for the quick and detailed answer.
What kind of queries do you run? Is it going to work for
SpanNearQueries/SpanNotQueries as well?
Do you also get the word itself at each position?
It would be great if I could search on the content of each payload as well,
but since the payload cont
Sure, I'm happy to give some insight into this. My index itself has a few
fields - one that uniquely identifies the page, one that stores all the text
on the page, and then some others to store characteristics. At indexing
time, the text field for each document is manually created by concatenating
the individual words (space separated).
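A sketch of that document layout, assuming the 2.x-era Field API of this
thread (field names are guesses):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

class PageDocumentSketch {
  // One document per page: an id plus the concatenated word text, so term
  // positions line up with the page's word list.
  static Document buildPageDoc(String pageId, String[] words) {
    Document doc = new Document();
    doc.add(new Field("pageId", pageId,
        Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.add(new Field("text", String.join(" ", words),
        Field.Store.YES, Field.Index.TOKENIZED));
    return doc;
  }
}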
Hi,
Can you please shed some light on what your final architecture looks like?
Do you manually use the PayloadSpanUtil for each document separately?
How did you solve the problem with phrase results?
Thanks in advance for your time,
Eran.
On Tue, Nov 25, 2008 at 10:30 PM, Greg Shackles <[EMAIL PROTECTED]> wrote:
Just wanted to post a little follow-up here now that I've gotten through
implementing the system using payloads. Execution times are phenomenal!
Things that took over a minute to run in my old system take fractions of a
second to run now. I would also like to thank Mark for being very
responsive.
Thanks for the update, Mark. I guess that means I'll have to do the sorting
myself - that shouldn't be too hard, but the annoying part would just be
knowing where one result ends and the next begins since there's no guarantee
that they'll always be the same. Let me know if you find any information.
Yeah, discussion came up on order and I believe we punted - it's up to
you to track order and sort at the moment. I think that was to prevent
those that didn't need it from paying the sort cost, but I have to go
find that discussion again (maybe it's in the issue?). I'll look at the
whole idea again.
On Wed, Nov 19, 2008 at 12:33 PM, Greg Shackles <[EMAIL PROTECTED]> wrote:
> In the searching phase, I would run the search across all page documents,
> and then for each of those pages, do a search with
> PayloadSpanUtil.getPayloadsForQuery that made it so it only got payloads for
> each page at
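For reference, the basic shape of that call with the 2.4-era API (how the
lookup gets limited to a single page isn't shown in the thread, so it is
omitted here as well):

import java.io.IOException;
import java.util.Collection;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.payloads.PayloadSpanUtil;

class PayloadLookupSketch {
  // Collect the payloads at every position where the query matched.
  static Collection<byte[]> matchPayloads(IndexReader reader, Query query)
      throws IOException {
    return new PayloadSpanUtil(reader).getPayloadsForQuery(query);
  }
}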
I have a couple quick questions...it might just be because I haven't looked
at this in a week now (got pulled away onto some other stuff that had to
take priority).
Hi,
I have the same need - to obtain "attributes" for terms stored in some
field. I also need all the results and can't take just the first few docs.
I'm using an older version of Lucene and the method I'm using right now is
this:
1. Store the words as usual in some field.
2. Store the attributes
>
> Right, sounds like you have it spot on. That second * from 3 looks like a
> possible tricky part.
I agree that it will be the tricky part but I think as long as I'm careful
with counting as I iterate through it should be ok (I probably just doomed
myself by saying that...)
Right...you'd do i
Thanks! This all actually sounds promising, I just want to make sure I'm
thinking about this correctly. Does this make sense?
Indexing process:
1) Get list of all words for a page and their attributes, stored in some
sort of data structure
2) Concatenate the text from those words (space separated)
Here is a great PowerPoint on payloads from Michael Busch:
www.us.apachecon.com/us2007/downloads/AdvancedIndexingLucene.ppt.
Essentially, you can store metadata at each term position, so it's an
excellent place to store attributes of the term - they are very fast to
load, efficient, etc.
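A bare-bones sketch of writing such payloads with the 2.x-era token API
(lookupAttributes is a hypothetical stand-in for whatever per-word metadata
you track):

import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.Payload;

// Attaches an application-defined byte[] at every term position.
class AttributePayloadFilter extends TokenFilter {
  AttributePayloadFilter(TokenStream in) { super(in); }

  @Override
  public Token next(Token reusableToken) throws IOException {
    Token t = input.next(reusableToken);
    if (t != null) {
      t.setPayload(new Payload(lookupAttributes(t.term())));
    }
    return t;
  }

  // Hypothetical: encode position-on-page, style, etc. for this word.
  private byte[] lookupAttributes(String term) {
    return term.getBytes();
  }
}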
Hey Mark,
This sounds very interesting. Is there any documentation or examples I
could see? I did a quick search but didn't really find much. It might just
be that I don't know how payloads work in Lucene, but I'm not sure how I
would see this actually doing what I need. My reasoning is this..
If you're new to Lucene, this might be a little much (and maybe I don't
fully understand the problem), but you might try:
Add the attributes to the words in a payload with a PayloadAnalyzer. Do
searching as normal. Use the new PayloadSpanUtil class to get the
payloads for the matching words.
Hi Erick,
Thanks for the response, sorry that I was somewhat vague in the reasoning
for my implementation in the first post. I should have mentioned that the
word details are not details of the Lucene document, but are attributes
about the word that I am storing. Some examples are position on the page.
If I may suggest, could you expand upon what you're trying to
accomplish? Why do you care about the detailed information
about each word? The reason I'm suggesting this is "the XY
problem". That is, people often ask for details about a specific
approach when what they really need is a different approach.
I hope this isn't a dumb question or anything; I'm fairly new to Lucene so
I've been picking it up as I go, pretty much. Without going into too much
detail, I need to store pages of text, and for each word on each page, store
detailed information about it. To do this, I have 2 indexes:
1) pages:
On 6-Sep-07, at 4:41 AM, makkhar wrote:
Hi,
I have an index which contains more than 20K documents. Each document has
the following structure :
field : ID (index and store) typical value - "1000"
field : parameterName (index and store) typical value
You're not expecting too much. On cheap hardware I watch searches on over
5 mil+ docs that match every doc come back in under a second. Are you able
to post your search code?
makkhar wrote:
Hi,
I have an index which contains more than 20K documents. Each document has
the following structure :
My problem is, I am expecting the search itself to happen in the order of a
few milliseconds irrespective of the number of documents it matched. Am I
expecting too much?
I have a sort performance question:
I have a fairly large index consisting of chunks of full-text
transcriptions of television, radio and other media, and I'm trying to
make it searchable and sortable by date. ...
Initially I was sorting based on a unixtime field, but having read
up
I guess I shouldn't be worried about the very first search against the index.
How would a cached searcher implementation look?
-Dave
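One minimal shape such a cached searcher could take, against the 2.x-era API
(no logic shown for reopening after index changes):

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;

// Open one searcher and share it across requests, so the field cache built
// for the first sorted search is reused by later ones.
public class SearcherHolder {
  private static IndexSearcher searcher;

  public static synchronized IndexSearcher get() throws IOException {
    if (searcher == null) {
      searcher = new IndexSearcher("/path/to/index"); // 2.x accepted a path
    }
    return searcher;
  }
}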
-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 20, 2007 4:03 PM
To: java-user@lucene.apache.org
Subject: Re: Sort Performance Question
Are you using a cached IndexSearcher such that successive sorts on
the same field will be more efficient?
Erik
On Mar 20, 2007, at 3:39 PM, David Seltzer wrote:
Hi All,
I have a sort performance question:
I have a fairly large index consisting of chunks of full-text
transcriptions of television, radio and other media, and I'm trying to
make it searchable and sortable by date. The search front-end uses a
ParallelMultiSearcher to search up to
In general, if you are having performance issues with highlighting, the
first thing to do is double check what the bottleneck is: is it accessing
the text to be highlighted, or is it running the highlighter?
You suggested earlier in the thread that the problem was with accessing
the text...
-Original Message-
From: Jason Pump [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 10, 2007 1:49 PM
To: java-user@lucene.apache.org
Subject: Re: Text storing design and performance question
Renaud, one optimization you can do on this is to try the first 10kb, see if
it finds text worth highlighting, if not, with a slight overlap try the next
9.9kb - 19.9kb or just 9.9kb -> end if
-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 10, 2007 9:54 AM
To: java-user@lucene.apache.org
Subject: Re: Text storing design and performance question
Being stateless should not be much of
To: java-user@lucene.apache.org
Subject: RE: Text storing design and performance question
Maybe keeping the data in the DB would make it quicker? Seems like the I/O
performance would cause most of the performance issues you're seeing.
-los
Renaud Waldura-5 wrote:
>
> We used to store
...just storing term vectors would keep the index lean and
allow for fast highlighting?
--Renaud
...content that is searched. Now I'll have lots and lots of content, thinking
of the range of 50GB+, all stored in the DB. Using Lucene, I index all of
this. But since I'm using highlighting features, I'll also need to store the
content into the index. Not sure what the performance implications are during
a search, but I know that indexing performance should be slower as well as
the index size being enormous.
Because I have duplicated data, one in the index and the other in the db,
are there other ways of handling this situation in a more efficient and
performant way? Thanks in advance.
-los
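A sketch of the term-vector idea Renaud raises, against the 2.x-era API:
index the content unstored but with positions and offsets, so a highlighter
can rebuild its token stream from the vectors; the raw text for fragmenting
would still have to come from the DB.

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

class LeanHighlightingDoc {
  // Content is not stored; only term vectors with positions/offsets are
  // kept, keeping the index lean while avoiding re-analysis at highlight time.
  static Document build(String dbText) {
    Document doc = new Document();
    doc.add(new Field("content", dbText,
        Field.Store.NO,
        Field.Index.TOKENIZED,
        Field.TermVector.WITH_POSITIONS_OFFSETS));
    return doc;
  }
}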
> Does it matter what order I add the sub-queries to the BooleanQuery Q?
> That is, is the execution speed for the search faster (slower) if I do:
> Q.add(Q1, BooleanClause.Occur.MUST);
> Q.add(Q2, BooleanClause.Occur.MUST);
> Q.add(Q3, BooleanClause.Occur.MUST);
I was reading a book on SQL query tuning. The gist of it was that the
way to get the best performance (fastest execution) out of a SQL select
statement was to "create" execution plans where the most selective term
in the "where" clause is used first, the next most selective term is
used next, etc.
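For illustration, the same kind of query with the modern
BooleanQuery.Builder. As far as I know, Lucene's conjunction scorer already
leads with the rarest (lowest-cost) clause on its own, much like the SQL
planner described above, so the order clauses are added in should not matter:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

class ClauseOrderSketch {
  static BooleanQuery build() {
    // Source order here is cosmetic; at search time the conjunction
    // iterates the lowest-cost clause first regardless.
    BooleanQuery.Builder b = new BooleanQuery.Builder();
    b.add(new TermQuery(new Term("body", "common")), BooleanClause.Occur.MUST);
    b.add(new TermQuery(new Term("body", "rare")), BooleanClause.Occur.MUST);
    return b.build();
  }
}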
Hi,
I am using a multi-threaded app to index a bunch of data. The app spawns
X number of threads. Each thread writes to a RAMDirectory. When a thread
finishes its work, the contents of the RAMDirectory are written into
the FSDirectory. All threads are passed an instance of the FSWriter when
th
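That merge step might look roughly like this with the classic addIndexes API:

import java.io.IOException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

class RamToFsMerge {
  // Each worker indexes into its own RAMDirectory; when done, the shared
  // FS-based writer folds the in-memory segments into the main index.
  static void mergeInto(IndexWriter fsWriter, RAMDirectory ramDir) throws IOException {
    fsWriter.addIndexes(new Directory[] { ramDir });
  }
}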
I'm using the following java options:
JAVA_OPTS='-Xmx1524m -Xms1524m
-Djava.awt.headless=true'
What is your Java max heap size set to? This is the -Xmx Java option.
Hi,
My Lucene index is not big (about 150M). My computer has 2G RAM but for some
reason when I'm trying to store my index
using org.apache.lucene.store.RAMDirectory it fails with a Java out-of-memory
exception. Also sometimes for the same
search query, the time spent on search can rise 10-20 times
Look at IndexReader.open()
It actually uses a MultiReader if there are multiple segments.
-Yonik
Now hiring -- http://forms.cnet.com/slink?231706
You should run your own tests, but I found the MultiReader to be slower
than a regular IndexReader. I was running on a dual-cpu box and two
separate disk drives.
Charles.
The IndexSearcher(MultiReader) will be faster (it's what's used for
indices with multiple segments too).
-Yonik
Now hiring -- http://forms.cnet.com/slink?231706
I have several indexes I want to search together. What performs better: a
single searcher on a multi reader, or a single multi searcher on multiple
searchers (1 per index)?
Thanks
Mike
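A sketch of the first option, against the 1.9/2.x-era API (paths are
illustrative):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;

class CombinedSearch {
  // One searcher over a MultiReader: the same machinery a multi-segment
  // index uses internally, per Yonik's answer above.
  static IndexSearcher open(String path1, String path2) throws IOException {
    IndexReader[] readers = {
      IndexReader.open(path1),
      IndexReader.open(path2)
    };
    return new IndexSearcher(new MultiReader(readers));
  }
}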
I have 5 indexes, each one is 6GB...I need 512MB of heap size in order to open
the index and run all types of queries. My question is, is it better to just
have one large 30GB index? Will increasing the heap size increase performance?
Can I store an instance of MultiSearcher(OR just Searcher in c