Sweet, thanks Simon. I'll have a go at getting some failing tests passing to begin with.
On 23 May 2012, at 11:59, Simon Willnauer wrote:

> alan,
>
> I merged the branch manually and created a new branch from it. its
> here: https://svn.apache.org/repos/asf/lucene/dev/branches/LUCENE-2878
> the branch compiles but lots of nocommits / todos
>
> if you have questions please ask I will help as much as I can
>
> simon
>
> On Tue, May 22, 2012 at 8:38 PM, Alan Woodward
> <alan.woodw...@romseysoftware.co.uk> wrote:
>> Hey, I reckon I can have a decent go at getting the branch updated. Is it
>> best to work this out as a patch applying to trunk? Any patch that merges
>> in all the trunk changes to the branch is going to be absolutely massive...
>>
>> On 17 May 2012, at 13:15, Simon Willnauer wrote:
>>
>>> ok man. I will try to merge up the branch. I tell you this is going to
>>> be messy and it might not compile but I will make it reasonable so you
>>> can start.
>>>
>>> simon
>>>
>>> On Thu, May 17, 2012 at 8:03 AM, Alan Woodward
>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>> Sorry for vanishing for so long, life unexpectedly caught up with me...
>>>> I'm going to have some time to look at this again next week though, if
>>>> you're interested in picking it up again.
>>>>
>>>> On 21 Mar 2012, at 09:02, Alan Woodward wrote:
>>>>
>>>>> That would be great, thanks! I had a go at merging it last night, but
>>>>> there are a *lot* of changes that I haven't got my head round yet, so it
>>>>> was getting pretty messy.
>>>>>
>>>>> On 21 Mar 2012, at 08:49, Simon Willnauer wrote:
>>>>>
>>>>>> Alan, if you want I can just merge the branch up next week and we
>>>>>> iterate from there?
>>>>>>
>>>>>> simon
>>>>>>
>>>>>> On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson
>>>>>> <erickerick...@gmail.com> wrote:
>>>>>>> Yep, the first challenge is always getting the old patch(es) to
>>>>>>> apply.....
>>>>>>>
>>>>>>> On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward
>>>>>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>>>>>> Thanks for all the offers of help!
>>>>>>>> It looks as though most of the
>>>>>>>> hard work has already been done, which is exactly where I like to pick
>>>>>>>> up projects. :-)
>>>>>>>>
>>>>>>>> Maybe the best place to start would be for me to rebase the branch
>>>>>>>> against trunk, and see what still fits? I think there have been some
>>>>>>>> fairly major changes in the internals since July last year.
>>>>>>>>
>>>>>>>> On 19 Mar 2012, at 17:07, Mike Sokolov wrote:
>>>>>>>>
>>>>>>>>> I posted a patch with a Collector somewhat similar to what you
>>>>>>>>> described, Alan - it's attached to one of the sub-issues
>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3318. It is in a
>>>>>>>>> fairly complete "alpha" state, but has seen no production use of
>>>>>>>>> course, since it relies on the remainder of the unfinished work in
>>>>>>>>> that branch. It works by creating a TokenStream based on match
>>>>>>>>> positions returned from the query and passing that to the existing
>>>>>>>>> Highlighter. Please feel free to get in touch if you decide to look
>>>>>>>>> into that and have questions.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Mike
>>>>>>>>>
>>>>>>>>> On 03/19/2012 11:51 AM, Simon Willnauer wrote:
>>>>>>>>>> On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler<u...@thetaphi.de>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Have you marked that for GSOC? Would be a good idea!
>>>>>>>>>>>
>>>>>>>>>> yes I did
>>>>>>>>>>
>>>>>>>>>>> -----
>>>>>>>>>>> Uwe Schindler
>>>>>>>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>>>>>>>> http://www.thetaphi.de
>>>>>>>>>>> eMail: u...@thetaphi.de
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
>>>>>>>>>>>> Sent: Monday, March 19, 2012 4:43 PM
>>>>>>>>>>>> To: dev@lucene.apache.org
>>>>>>>>>>>> Subject: Re: Using term offsets for hit highlighting
>>>>>>>>>>>>
>>>>>>>>>>>> Alan, you made my day!
>>>>>>>>>>>>
>>>>>>>>>>>> The branch is kind of outdated but I looked at it lately and I can
>>>>>>>>>>>> certainly help to get it up to speed. The feature in that branch is
>>>>>>>>>>>> quite a big one and its in a very early stage. Still I want to
>>>>>>>>>>>> encourage you to take a look and work on it. I promise all my help
>>>>>>>>>>>> with the issues!
>>>>>>>>>>>>
>>>>>>>>>>>> let me know if you have questions!
>>>>>>>>>>>>
>>>>>>>>>>>> simon
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>>>>>>>>>>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Cool, thanks Robert. I'll take a look at the JIRA ticket.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>>>>>>>>>>>>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The project I'm currently working on requires the reporting of exact
>>>>>>>>>>>>>>> hit positions from some pretty hairy queries, not all of which are
>>>>>>>>>>>>>>> covered by the existing highlighter modules. I'm working round this
>>>>>>>>>>>>>>> by translating everything into SpanQueries, and using the getSpans()
>>>>>>>>>>>>>>> method to locate hits (I've extended the Spans interface to make
>>>>>>>>>>>>>>> term offsets available - see
>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3826). This works for
>>>>>>>>>>>>>>> our use-case, but isn't terribly efficient, and obviously isn't
>>>>>>>>>>>>>>> applicable to non-Span queries.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I've seen a bit of chatter on the list about using term offsets to
>>>>>>>>>>>>>>> provide accurate highlighting in Lucene.
>>>>>>>>>>>>>>> I'm going to have a couple of weeks free in April, and I thought I
>>>>>>>>>>>>>>> might have a go at implementing this. Mainly I'm wondering if
>>>>>>>>>>>>>>> there's already been thoughts about how to do it. My current
>>>>>>>>>>>>>>> thoughts are to somehow extend the Weight and Scorer interface to
>>>>>>>>>>>>>>> make term offsets available; to get highlights for a given set of
>>>>>>>>>>>>>>> documents, you'd essentially run the query again, with a filter on
>>>>>>>>>>>>>>> just the documents you want highlighted, and have a custom
>>>>>>>>>>>>>>> collector that gets the term offsets in place of the scores.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Alan, Simon started some initial work on
>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2878
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Some work and prototypes were done in a branch, but it might be
>>>>>>>>>>>>>> lagging behind trunk a bit.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Additionally at the time it was first done, I think we didn't yet
>>>>>>>>>>>>>> support offsets in the postings lists.
>>>>>>>>>>>>>> We've since added this and several codecs support it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> lucidimagination.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>>>>>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org