I'm giving a talk that may be interesting to you:
http://www.meetup.com/SF-Bay-Areas-Big-Data-Think-Tank/events/180573212/
Please share with others that may be interested.
Sriram.
nce from a technical point of view between the two ways of
> matching items.
>
> I'll leave more detailed explanations to others, as I might make too
> many mistakes or just assume I know something I actually don't :)
>
> Best regards,
>
> Arjen
>
> On 25-7-2013
Term("name", name));
FilteredQuery query = new FilteredQuery(tq, conns);
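The fragment above uses `FilteredQuery` to restrict a term query with a filter (`conns`). Here is a rough plain-Java illustration of what filtering amounts to. It is not Lucene's `FilteredQuery` implementation. The idea is to build a bitset of allowed doc IDs once, for example from all of a user's connection terms, and then keep only those documents from the term's posting list that the bitset accepts. Posting lists are modeled as sorted `int[]` arrays, and `FilterSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Illustrative only: a posting list is modeled as a sorted int[] of doc IDs. */
final class FilterSketch {

    /** Marks every doc ID that appears in any of the given posting lists (the "filter"). */
    static BitSet buildFilter(List<int[]> postingLists, int maxDoc) {
        BitSet allowed = new BitSet(maxDoc);
        for (int[] postings : postingLists) {
            for (int doc : postings) {
                allowed.set(doc);
            }
        }
        return allowed;
    }

    /** Walks the query's postings and keeps only documents the filter accepts. */
    static List<Integer> filter(int[] queryPostings, BitSet allowed) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : queryPostings) {
            if (allowed.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

Building the bitset costs one pass over every filter posting. The bitset can then be cached and reused across queries, whereas a large OR clause is evaluated again on every search unless something caches it.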
On Thu, Jul 25, 2013 at 12:14 AM, Arjen van der Meijden <
acmmail...@tweakers.net> wrote:
> On 24-7-2013 21:58 Sriram Sankar wrote:
>
> >> On Wed, Jul 24, 2013 at 10:24 AM, Jack Krupansky
his more).
Sriram.
>
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Sriram Sankar
> Sent: Wednesday, July 24, 2013 1:03 PM
> To: java-user@lucene.apache.org
> Subject: Re: Performance measurements
>
>
> No, I do not need scoring. This is a pur
n>500.
Should I assume that Lucene is that much worse - or is it that this use
case has not been optimized?
Sriram.
On Wed, Jul 24, 2013 at 9:59 AM, Adrien Grand wrote:
> Hi,
>
> On Wed, Jul 24, 2013 at 6:11 PM, Sriram Sankar wrote:
> > termA AND (termB1 OR termB2 OR ... OR
Clarification - I used an MMap'd index and warmed it up with similar
queries, as well as running the identical query many times before starting
measurements. I had ample heap space.
Sriram.
On Wed, Jul 24, 2013 at 9:11 AM, Sriram Sankar wrote:
> I did some performance tests on a re
I did some performance tests on a real index using a query having the
following pattern:
termA AND (termB1 OR termB2 OR ... OR termBn)
The results were not good, and I was wondering whether I am doing something
wrong (and what I would need to do to improve performance), or whether it is just
that the OR is
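Below is a minimal sketch of one way to evaluate `termA AND (termB1 OR termB2 OR ... OR termBn)`. The sketch iterates termA's postings, and for each candidate it advances the termB lists to that doc, stopping at the first B list that contains it. It assumes postings are sorted `int[]` arrays. `Postings` and `AndOfOrSketch` are hypothetical classes, and this is not how Lucene's BooleanQuery scorers are actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical posting-list cursor over a sorted int[] of doc IDs (not Lucene's API). */
final class Postings {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int pos = -1;

    Postings(int... docs) {
        this.docs = docs;
    }

    /** Current doc: -1 before the first move, NO_MORE_DOCS once exhausted. */
    int docID() {
        if (pos < 0) {
            return -1;
        }
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    /** Moves forward (linear scan) to the first doc >= target; never moves backwards. */
    int advance(int target) {
        while (docID() < target) {
            pos++;
        }
        return docID();
    }
}

final class AndOfOrSketch {
    /** Docs in termA's list that also appear in at least one termB list. */
    static List<Integer> andOfOr(Postings termA, List<Postings> termBs) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = termA.advance(0); doc != Postings.NO_MORE_DOCS; doc = termA.advance(doc + 1)) {
            for (Postings b : termBs) {
                if (b.advance(doc) == doc) {
                    hits.add(doc);
                    break; // one matching OR clause is enough
                }
            }
        }
        return hits;
    }
}
```

Each B list only moves forward, so it is scanned at most once in total. Even so, every candidate from termA can cost up to n `advance` calls, so with n in the hundreds the OR side may dominate the cost. Precomputing the union once as a filter trades that per-candidate work for a single up-front pass.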
The approach we have discussed in an earlier thread uses:
writer.addIndexes(new SortingAtomicReader(...));
I want to confirm (this is not absolutely clear to me yet) that the above
call will not create multiple segments - i.e., the output will be optimized.
We are also trying another approach -
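`SortingAtomicReader` presents a reader's documents renumbered in a new order, which in this thread is static rank. The sketch below shows only the underlying doc-ID mapping, in plain Java. It sorts doc IDs by descending rank, inverts that permutation, and then rewrites a posting list into the new ID space. The rank values and the `StaticRankDocMap` name are illustrative assumptions, and this is not Lucene's own sorting code.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical illustration of renumbering docs by descending static rank. */
final class StaticRankDocMap {

    /** Returns newToOld: entry i is the old doc ID that becomes new doc i (highest rank first). */
    static int[] newToOld(float[] rankByOldDoc) {
        Integer[] order = new Integer[rankByOldDoc.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Arrays.sort on objects is stable, so equal ranks keep their original relative order.
        Arrays.sort(order, Comparator.comparingDouble((Integer d) -> rankByOldDoc[d]).reversed());
        int[] result = new int[order.length];
        for (int i = 0; i < order.length; i++) {
            result[i] = order[i];
        }
        return result;
    }

    /** Inverts newToOld so each old doc ID can be looked up in O(1). */
    static int[] invert(int[] newToOld) {
        int[] oldToNew = new int[newToOld.length];
        for (int newDoc = 0; newDoc < newToOld.length; newDoc++) {
            oldToNew[newToOld[newDoc]] = newDoc;
        }
        return oldToNew;
    }

    /** Rewrites one posting list into the new doc ID space and re-sorts it. */
    static int[] remap(int[] postings, int[] oldToNew) {
        int[] out = new int[postings.length];
        for (int i = 0; i < postings.length; i++) {
            out[i] = oldToNew[postings[i]];
        }
        Arrays.sort(out);
        return out;
    }
}
```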
The large majority of terms in my index are not text terms. For example, I
have connection terms. Suppose user 543 and user 664 are connected. Then
the doc corresponding to user 543 will have a term connection:664 indexed.
It is not useful to do prefix matching on this - and ideally I'd not wan
Thanks!
On Tue, Jul 9, 2013 at 2:13 PM, Adrien Grand wrote:
> Hi Sriram,
>
> On Tue, Jul 9, 2013 at 5:06 AM, Sriram Sankar wrote:
> > I've finally got something running and will send you some performance
> > numbers as promised shortly. In the meanwhile, I've a
Thanks!
On Tue, Jul 9, 2013 at 2:34 PM, Uwe Schindler wrote:
> Hi,
>
> You can replace the term by their hash directly in the analyzer chain.
> Just write a custom TermToBytesRef attribute that hashes the term to a
> constant-length byte[] (using an AttributeFactory)! :-) This would give you
> a
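The suggestion above is to hash each term to a constant-length `byte[]` inside the analysis chain, using a custom attribute. The sketch below covers only the hashing step, in plain Java. Wiring it into Lucene's attribute machinery is not shown. Using 64-bit FNV-1a is an assumption, and any good 64-bit hash would serve. Because hashes can collide, two different terms could, rarely, end up sharing one posting list. `TermHasher` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: maps a term of any length to an 8-byte key. */
final class TermHasher {
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    /** 64-bit FNV-1a over the term's UTF-8 bytes. */
    static long fnv1a64(String term) {
        long hash = FNV_OFFSET_BASIS;
        for (byte b : term.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= FNV_PRIME;
        }
        return hash;
    }

    /** Encodes the hash big-endian as a constant-length 8-byte key. */
    static byte[] toFixedBytes(String term) {
        return ByteBuffer.allocate(Long.BYTES).putLong(fnv1a64(term)).array();
    }
}
```

With 8-byte keys, each connection term costs the same in the terms dictionary regardless of how long the original string was. The trade-off is that prefix or range matching over the original strings no longer works. That loss is acceptable here, since prefix matching on connection terms is described as not useful.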
Hi Mike,
I've finally got something running and will send you some performance
numbers as promised shortly. In the meanwhile, I've a question regarding
the use of real-time indexing along with ordering by static rank. Before
each search, I do the reopen as follows:
public void refresh() thr
It looks like Lucene stores the string names of the posting lists in the
index. How compact is this storage (when there may be a very large number
of posting lists, and the string lengths may be large)? For example, I may
have an entry that looks like "Cn:4593846->8957363,485". I've seen other
sy
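On the storage question: Lucene 4's default block-tree terms dictionary keeps terms in sorted order. It groups terms that share a prefix into blocks, so a common prefix is not stored over and over. The exact on-disk format is not reproduced here. To show why long, similar terms such as `Cn:4593846->8957363,485` can compress well, the sketch below uses plain front coding. Each sorted term records how many leading characters it shares with the previous term, plus the remaining suffix. `FrontCoding` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical front-coding illustration; not Lucene's on-disk terms format. */
final class FrontCoding {

    /** One encoded term: chars shared with the previous term, plus the differing suffix. */
    static final class Entry {
        final int shared;
        final String suffix;

        Entry(int shared, String suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    /** Encodes terms that are already in sorted order. */
    static List<Entry> encode(List<String> sortedTerms) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String term : sortedTerms) {
            int max = Math.min(prev.length(), term.length());
            int shared = 0;
            while (shared < max && prev.charAt(shared) == term.charAt(shared)) {
                shared++;
            }
            out.add(new Entry(shared, term.substring(shared)));
            prev = term;
        }
        return out;
    }

    /** Rebuilds each term from the previous term's prefix plus the stored suffix. */
    static List<String> decode(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            String term = prev.substring(0, e.shared) + e.suffix;
            out.add(term);
            prev = term;
        }
        return out;
    }
}
```

For example, after `Cn:4593846->8957363,485`, the next term `Cn:4593846->8957363,486` is stored as 22 shared characters plus the one-character suffix `6`.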
Thanks!
Sriram.
On Tue, Jun 25, 2013 at 10:01 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> On Tue, Jun 25, 2013 at 12:49 PM, Sriram Sankar wrote:
> > I have a use case where I build my index only occasionally and am willing
> > to pay the cost to build a
I have a use case where I build my index only occasionally and am willing
to pay the cost to build a read-only index that occupies as small a memory
footprint as possible and also remains efficient for posting list
traversal. I.e., I will not be making any changes at all once it is built.
1. Wha
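A common building block for a compact, read-only index is to compress each posting list. Instead of storing the doc IDs themselves, store the gaps between consecutive sorted doc IDs, and write each gap as a variable-length integer. The sketch follows the 7-bits-per-byte scheme that Lucene documents for `DataOutput.writeVInt`: low-order bits come first, and the high bit is set when more bytes follow. Lucene 4's default postings format additionally packs most doc IDs into fixed-size blocks, which this sketch does not attempt. `DeltaVIntPostings` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical delta + vInt codec for a sorted posting list. */
final class DeltaVIntPostings {

    /** Writes gaps between consecutive sorted doc IDs as variable-length ints. */
    static byte[] encode(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int doc : sortedDocs) {
            int gap = doc - prev; // small when matching docs are dense
            prev = doc;
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80); // high bit set: more bytes follow
                gap >>>= 7;
            }
            out.write(gap); // last byte of this gap, high bit clear
        }
        return out.toByteArray();
    }

    /** Decodes count doc IDs by summing the stored gaps. */
    static int[] decode(byte[] bytes, int count) {
        int[] docs = new int[count];
        int pos = 0;
        int prev = 0;
        for (int i = 0; i < count; i++) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += gap;
            docs[i] = prev;
        }
        return docs;
    }
}
```

For example, doc IDs 3, 130, 131 and 100000 become gaps 3, 127, 1 and 99869. These encode in 1 + 1 + 1 + 3 = 6 bytes, compared with 16 bytes as raw ints.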
In Unicorn (Facebook's search backend), we used mmap'd indices. We could
load them in a separate process - which meant that we could make scoring
changes and test rapidly since we did not have to reload the index for
every run. Is this true for Lucene also? I'm assuming it would be if the
entire
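Lucene's `MMapDirectory` memory-maps index files, so reads go through the operating system's page cache. That cache is shared between processes and typically stays populated after a process exits, until the memory is reclaimed, which is the property the question above relies on. Below is a minimal java.nio sketch of mapping a file read-only. `MmapReader` is a hypothetical helper, and a single `MappedByteBuffer` can cover at most `Integer.MAX_VALUE` bytes, so large files would need several mappings.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper; Lucene's MMapDirectory does its own, more careful mapping. */
final class MmapReader {

    /** Maps a whole file read-only; pages are faulted in lazily from the OS page cache. */
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping remains valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    /** Best-effort warm-up: asks the OS to bring the mapped pages into physical memory. */
    static void warm(MappedByteBuffer buffer) {
        buffer.load();
    }
}
```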
Thanks. If I end up doing it, we can try to get it in.
Sriram.
On Wed, Jun 19, 2013 at 1:10 AM, Adrien Grand wrote:
> Hi,
>
> On Wed, Jun 19, 2013 at 12:16 AM, Sriram Sankar wrote:
> > Is it possible to do this more efficiently using a merge sort? Assuming
> > the in
segments are
already sorted and therefore would repeat the work?
Thanks,
Sriram.
On Sat, Jun 15, 2013 at 1:52 AM, Adrien Grand wrote:
> Hi,
>
> On Fri, Jun 14, 2013 at 11:24 PM, Sriram Sankar wrote:
> > For my use case of having all docs sorted by a static rank and being ab
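The question here is whether segments that are each already sorted by static rank could be combined by merging rather than by a full re-sort. A classic k-way merge does this with a heap that holds one cursor per segment, which costs O(N log k) for N documents spread across k segments. The sketch models each segment as an array of ranks already sorted in descending order. It returns (segment, local doc) pairs in merged order. `SegmentMerge` is a hypothetical name, and this does not describe what Lucene's merging actually does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical k-way merge of segments that are each sorted by descending rank. */
final class SegmentMerge {

    /** Returns {segment, localDoc} pairs in descending rank order across all segments. */
    static List<int[]> mergeByRank(float[][] segmentRanks) {
        // Heap entry {segment, position}; the highest-ranked current doc is at the head.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (x, y) -> Float.compare(segmentRanks[y[0]][y[1]], segmentRanks[x[0]][x[1]]));
        for (int s = 0; s < segmentRanks.length; s++) {
            if (segmentRanks[s].length > 0) {
                heap.add(new int[]{s, 0});
            }
        }
        List<int[]> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            merged.add(new int[]{top[0], top[1]});
            // Advance this segment's cursor; the other segments' cursors stay put.
            if (top[1] + 1 < segmentRanks[top[0]].length) {
                heap.add(new int[]{top[0], top[1] + 1});
            }
        }
        return merged;
    }
}
```

Ties between segments come out in an unspecified order. A real implementation would likely break ties deterministically, for example by segment index.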
Quick question on segments:
For my use case of having all docs sorted by a static rank and being able
to cut off retrieval after a certain number of docs, I have to sort all my
docs using the static rank (and Lucene 4 has a way to do this).
When an index has multiple segments, how does this sorti
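Suppose documents have been renumbered so that doc ID order equals descending static rank, as described above. Then the first k matches found while walking posting lists in doc-ID order are the k best-ranked matches, and traversal can stop there. The sketch below does this in plain Java for a two-term AND over sorted arrays. `EarlyTermination` is a hypothetical name, and the sketch assumes the renumbering has already happened.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of cutting off retrieval after k hits. */
final class EarlyTermination {

    /**
     * Intersects two sorted posting lists and stops after k hits.
     * Assumes doc IDs were assigned in descending static-rank order, so the
     * first k common docs are also the k best-ranked ones.
     */
    static List<Integer> topKIntersection(int[] a, int[] b, int k) {
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length && hits.size() < k) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                hits.add(a[i]); // doc present in both lists
                i++;
                j++;
            }
        }
        return hits; // docs after the k-th hit are never visited
    }
}
```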
Thank you very much. I think I need to play a bit with the code before
asking more questions. Here is the context for my questions:
I was at Facebook until recently and worked extensively on the Unicorn
search backend. Unicorn allows documents to be ordered by a static rank in
the posting lists
()) {
> int docId = termDocs.doc();
> // work with the document...
> }
> On Jun 13, 2013, at 1:56 PM, Sriram Sankar wrote:
>
> > Can someone point me to the code that traverses the
Can someone point me to the code that traverses the posting lists? I am
trying to understand how it works.
Thanks,
Sriram
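The quoted loop above uses the Lucene 3.x `TermDocs` style (`termDocs.doc()`). In Lucene 4.x, postings are exposed as a `DocsEnum`, which is a `DocIdSetIterator`. Its `nextDoc()` returns the next matching doc, and returns `NO_MORE_DOCS` (`Integer.MAX_VALUE`) once the list is exhausted. The stand-alone sketch below mirrors only that contract over an in-memory array, so the traversal pattern can run without Lucene. `DocIterator`, `ArrayDocIterator` and `Traversal` are hypothetical classes, not Lucene's.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in mirroring the nextDoc()/NO_MORE_DOCS contract. */
abstract class DocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Moves to the next matching doc and returns it, or NO_MORE_DOCS when exhausted. */
    abstract int nextDoc();
}

/** Posting list backed by a sorted array of doc IDs. */
final class ArrayDocIterator extends DocIterator {
    private final int[] docs;
    private int pos = -1;

    ArrayDocIterator(int... docs) {
        this.docs = docs;
    }

    @Override
    int nextDoc() {
        if (pos < docs.length) {
            pos++; // stop incrementing once exhausted
        }
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
}

final class Traversal {
    /** The standard traversal loop: call nextDoc() until NO_MORE_DOCS. */
    static List<Integer> collect(DocIterator it) {
        List<Integer> seen = new ArrayList<>();
        for (int doc = it.nextDoc(); doc != DocIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
            seen.add(doc); // work with the document...
        }
        return seen;
    }
}
```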