Search at LinkedIn tech talk on Wednesday

2014-05-18 Thread Sriram Sankar
I'm giving a talk that may be interesting to you: http://www.meetup.com/SF-Bay-Areas-Big-Data-Think-Tank/events/180573212/ Please share with others that may be interested. Sriram.

Re: Performance measurements

2013-08-20 Thread Sriram Sankar
nce from a technical point of view between the two ways of > matching items. > > I'll leave more detailed explanations to others, as I might make too > many mistakes or just assume I know something I actually don't :) > > Best regards, > > Arjen > > On 25-7-2013

Re: Performance measurements

2013-07-25 Thread Sriram Sankar
Term("name", name)); FilteredQuery query = new FilteredQuery(tq, conns); On Thu, Jul 25, 2013 at 12:14 AM, Arjen van der Meijden < acmmail...@tweakers.net> wrote: > On 24-7-2013 21:58 Sriram Sankar wrote: > >> On Wed, Jul 24, 2013 at 10:24 AM, Jack Krupansky > >

Re: Performance measurements

2013-07-24 Thread Sriram Sankar
his more). Sriram. > > > -- Jack Krupansky > > -Original Message- From: Sriram Sankar > Sent: Wednesday, July 24, 2013 1:03 PM > To: java-user@lucene.apache.org > Subject: Re: Performance measurements > > > No I do not need scoring. This is a pur

Re: Performance measurements

2013-07-24 Thread Sriram Sankar
n>500. Should I assume that Lucene is that much worse - or is it that this use case has not been optimized? Sriram. On Wed, Jul 24, 2013 at 9:59 AM, Adrien Grand wrote: > Hi, > > On Wed, Jul 24, 2013 at 6:11 PM, Sriram Sankar wrote: > > termA AND (termB1 OR termB2 OR ... OR

Re: Performance measurements

2013-07-24 Thread Sriram Sankar
Clarification - I used an MMap'd index and warmed it up with similar queries, as well as running the identical query many times before starting measurements. I had ample heap space. Sriram. On Wed, Jul 24, 2013 at 9:11 AM, Sriram Sankar wrote: > I did some performance tests on a re

Performance measurements

2013-07-24 Thread Sriram Sankar
I did some performance tests on a real index using a query having the following pattern: termA AND (termB1 OR termB2 OR ... OR termBn) The results were not good and I was wondering if I may be doing something wrong (and what I would need to do to improve performance), or is it just that the OR is
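
For readers following the thread, here is a minimal sketch of what a query of the form termA AND (termB1 OR ... OR termBn) computes over sorted posting lists. It is plain Java over int arrays, not Lucene code, and it does not claim to reflect how Lucene evaluates the query; the class and method names are hypothetical. In this naive form each candidate from termA may touch up to n cursors, which hints at why cost can grow with n.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
final class AndOfOrSketch {

    // Returns the doc ids that appear in `a` and in at least one list of `bs`.
    // Assumption: every posting list is sorted ascending without duplicates.
    static int[] andOfOr(int[] a, int[][] bs) {
        int[] cursor = new int[bs.length]; // one read position per termB list
        List<Integer> hits = new ArrayList<>();
        for (int doc : a) {
            for (int i = 0; i < bs.length; i++) {
                // Advance this list to its first entry >= doc.
                while (cursor[i] < bs[i].length && bs[i][cursor[i]] < doc) {
                    cursor[i]++;
                }
                if (cursor[i] < bs[i].length && bs[i][cursor[i]] == doc) {
                    hits.add(doc);
                    break; // one matching termB is enough for the OR
                }
            }
        }
        int[] out = new int[hits.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = hits.get(i);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] termA = {1, 4, 7, 9, 12};
        int[][] termBs = {{2, 4, 10}, {7, 8}, {12, 13}};
        System.out.println(Arrays.toString(andOfOr(termA, termBs))); // prints [4, 7, 12]
    }
}
```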

Another question on sorting documents

2013-07-17 Thread Sriram Sankar
The approach we have discussed in an earlier thread uses: writer.addIndexes(new SortingAtomicReader(...)); I want to confirm (this is not absolutely clear to me yet) that the above call will not create multiple segments - i.e., the output will be optimized. We are also trying another approach -

Re: posting list strings

2013-07-14 Thread Sriram Sankar
The large majority of terms in my index are not text terms. For example, I have connection terms. Suppose user 543 and user 664 are connected. Then the doc corresponding to user 543 will have a term connection:664 indexed. It is not useful to do prefix matching on this - and ideally I'd not wan

Re: NRT + static rank based sorting

2013-07-12 Thread Sriram Sankar
Thanks! On Tue, Jul 9, 2013 at 2:13 PM, Adrien Grand wrote: > Hi Sriram, > > On Tue, Jul 9, 2013 at 5:06 AM, Sriram Sankar wrote: > > I've finally got something running and will send you some performance > > numbers as promised shortly. In the meanwhile, I've a

Re: posting list strings

2013-07-12 Thread Sriram Sankar
Thanks! On Tue, Jul 9, 2013 at 2:34 PM, Uwe Schindler wrote: > Hi, > > You can replace the term by their hash directly in the analyzer chain. > Just write a custom TermToBytesRef attribute that hashes the term to a > constant-length byte[] (using a AttributeFactory)! :-) This would give you > a
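
The hashing step suggested in this reply can be sketched on its own in plain Java. This shows only how a term string might be reduced to a constant-length byte[]; it does not show the analyzer or attribute wiring mentioned in the message. The choice of SHA-256, the 8-byte length, and all names below are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical illustration class; not part of Lucene.
final class TermHashSketch {

    // Assumed fixed length for every hashed term, in bytes.
    static final int HASH_BYTES = 8;

    // Maps a term of any length to exactly HASH_BYTES bytes.
    // Distinct terms can collide; a shorter hash makes that more likely.
    static byte[] hashTerm(String term) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] full = md.digest(term.getBytes(StandardCharsets.UTF_8));
            return Arrays.copyOf(full, HASH_BYTES); // keep only the leading bytes
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] h = hashTerm("connection:664");
        System.out.println(h.length); // prints 8
    }
}
```

The trade-off is that the original term text can no longer be recovered from the hashed form, so prefix or range matching on such terms is lost.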

NRT + static rank based sorting

2013-07-08 Thread Sriram Sankar
Hi Mike, I've finally got something running and will send you some performance numbers as promised shortly. In the meanwhile, I've a question regarding the use of real time indexing along with ordering by static rank. Before each search, I do the reopen as follows: public void refresh() thr

posting list strings

2013-07-05 Thread Sriram Sankar
It looks like Lucene stores the string names of the posting lists in the index. How compact is this storage (when there may be a very large number of posting lists, and the string lengths may be large - for example, I may have an entry that looks like "Cn:4593846->8957363,485". I've seen other sy

Re: Strategy for optimal read-only index

2013-06-25 Thread Sriram Sankar
Thanks! Sriram. On Tue, Jun 25, 2013 at 10:01 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Tue, Jun 25, 2013 at 12:49 PM, Sriram Sankar wrote: > > I have a use case where I build my index only occasionally and am willing > > to pay the cost to build a

Strategy for optimal read-only index

2013-06-25 Thread Sriram Sankar
I have a use case where I build my index only occasionally and am willing to pay the cost to build a read-only index that occupies as small a memory footprint as possible and also remains efficient for posting list traversal. I.e., I will not be making any changes at all once it is built. 1. Wha

Question on MMap'd indices

2013-06-21 Thread Sriram Sankar
In Unicorn (Facebook's search backend), we used mmap'd indices. We could load them on a separate process - which meant that we could make scoring changes and test rapidly since we did not have to reload the index for every run. Is this true for Lucene also? I'm assuming it would be if the entire

Re: segments and sorting

2013-06-20 Thread Sriram Sankar
Thanks. If I end up doing it, we can try to get it in. Sriram. On Wed, Jun 19, 2013 at 1:10 AM, Adrien Grand wrote: > Hi, > > On Wed, Jun 19, 2013 at 12:16 AM, Sriram Sankar wrote: > > Is it possible to do this more efficiently using a merge sort? Assuming > > the in

Re: segments and sorting

2013-06-18 Thread Sriram Sankar
segments are already sorted and therefore would repeat the work? Thanks, Sriram. On Sat, Jun 15, 2013 at 1:52 AM, Adrien Grand wrote: > Hi, > > On Fri, Jun 14, 2013 at 11:24 PM, Sriram Sankar wrote: > > For my use case of having all docs sorted by a static rank and being ab

Re: segments and sorting

2013-06-17 Thread Sriram Sankar
e: > Hi, > > On Fri, Jun 14, 2013 at 11:24 PM, Sriram Sankar wrote: > > For my use case of having all docs sorted by a static rank and being able > > to cut off retrieval after a certain number of docs, I have to sort all > my > > docs using the static rank (and Lu

segments and sorting

2013-06-14 Thread Sriram Sankar
Quick question on segments: For my use case of having all docs sorted by a static rank and being able to cut off retrieval after a certain number of docs, I have to sort all my docs using the static rank (and Lucene 4 has a way to do this). When an index has multiple segments, how does this sorti
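
As a rough illustration of the use case described here, the sketch below assumes doc ids were assigned in static-rank order (best first), so walking a posting list in doc id order yields the best documents first and retrieval can stop after k hits. It is plain Java rather than Lucene code, it ignores segments entirely, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration class; not part of Lucene.
final class EarlyTerminationSketch {

    // Collects at most k matching docs from a posting list.
    // Assumption: `postings` is sorted by doc id and doc ids follow static rank,
    // so the first k matches are also the k best-ranked matches.
    static List<Integer> firstK(int[] postings, IntPredicate matches, int k) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : postings) {
            if (hits.size() >= k) {
                break; // cut off retrieval; the rest of the list is never read
            }
            if (matches.test(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] postings = {0, 1, 2, 3, 4, 5};
        System.out.println(firstK(postings, d -> d % 2 == 0, 2)); // prints [0, 2]
    }
}
```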

Re: posting list traversal code

2013-06-13 Thread Sriram Sankar
Thank you very much. I think I need to play a bit with the code before asking more questions. Here is the context for my questions: I was at Facebook until recently and worked extensively on the Unicorn search backend. Unicorn allows documents to be ordered by a static rank in the posting lists

Re: posting list traversal code

2013-06-12 Thread Sriram Sankar
()) { > int docId = termDocs.doc(); > // work with the document... > } > On Jun 13, 2013, at 1:56 PM, Sriram Sankar wrote: > > > Can someone point me to the code that traverses the

posting list traversal code

2013-06-12 Thread Sriram Sankar
Can someone point me to the code that traverses the posting lists? I'm trying to understand how it works. Thanks, Sriram