On Mar 6, 9:15 pm, Jeff FW <jeff...@gmail.com> wrote:
> Clearly, you get to work on cooler projects than I :-)  I had thought
> of the keywords/phrases case, but the other ones are far more
> interesting.  Thanks for the explanation!
>
> -Jeff
>
> On Mar 5, 7:02 pm, Malcolm Tredinnick <malc...@pointy-stick.com>
> wrote:
>
> > On Thu, 2009-03-05 at 12:54 -0800, Jeff FW wrote:
> > > Well, then, that is quite a strange use case :-)  Never mind my simple
> > > methods.  Malcolm's suggestion of an extension for postgres seems like
> > > a good idea--writing functions in various languages (like Python!) is
> > > _really_ easy in postgres.
>
> > > Just out of curiosity (for either of you), what is a search like that
> > > used for?  I've had a lot of strange requests from a lot of (generally
> > > strange) clients, but that's a pretty weird one.
>
> > It's not that weird at all. It simply depends on the domains you're
> > working in. No idea how it might apply to article headlines, although
> > finding "related matches" could well use something like this.
>
> > It's very common for finding overlaps in sequences of strings, though.
> > The almost "standard" example is DNA sequences where you're trying to
> > find if one sequenced set of data (bases extracted from a genetic
> > sample) corresponds to anything else already in the database. Since there
> > can be damage at the extremities of extractions, or even in the middle
> > (or mutations), finding the longest common substring is the standard
> > approach. There's a whole related area of research in finding the
> > longest palindrome sequences, too, for similar matching and folding
> > purposes.
>
> > Plagiarism or even "similar article" testing is another case like this.
> > Finding all "reasonably long" common sequences between a set of source
> > documents and a candidate document is a start.
>
> > One case I built something for was a compressed storage and
> > transmission system for PDF and ODF documents. That required doing,
> > essentially, a context-aware diff'ing process, and pulling out any large
> > chunks of commonality was the first step.
>
> > Finally, not quite the same problem, but highly related, is the issue
> > of, say, quickly finding all tags or other keywords or phrases that
> > appear in a collection of documents. Sometimes partial matching is an
> > appropriate place for generating new phrases, so a modified Aho-Corasick
> > search (just to give you a term to search on if you care) is a starting
> > point.
>
> > This whole domain is a very interesting area for algorithms and
> > implementation.
>
> > Regards,
> > Malcolm
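
For anyone reading this thread later, here is a minimal sketch of the
longest common substring idea mentioned above, written as plain Python
dynamic programming. The function name and the sample sequences are my
own, made up for illustration, and this is the textbook O(len(a) * len(b))
approach rather than anything tuned for real sequence data.

```python
def longest_common_substring(a, b):
    """Return the longest string that occurs contiguously in both a and b.

    If several substrings share the maximum length, the one that ends
    earliest in `a` is returned. Returns "" when nothing is shared.
    """
    best_len = 0   # length of the best match found so far
    best_end = 0   # index in `a` one past the end of that match
    # prev[j] holds the length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous characters
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


# Hypothetical sequences, for illustration only
print(longest_common_substring("GATTACAGG", "TTTACAGC"))
```

For large inputs a suffix-tree or suffix-automaton approach would
probably be a better fit, but the table version is easier to follow.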

I used the headline example because that is the example the Django
documentation shows for icontains.
Mine is a case where the text of a document is searched for matches
against specific values in stored data.
The document in my case, although plain text, contains directory paths,
ordinary text, and many other similar kinds of content.
The stored string might be a directory path, or even a single piece of
text.
So mine is a very ordinary case, nowhere near as exotic as Malcolm's.
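
To make that concrete, here is a rough sketch in plain Python of the
kind of matching I mean, assuming the stored values have already been
loaded into a list. The function name and the sample data are invented
for illustration, and nothing here is Django API. Note that this runs in
the opposite direction from icontains, which checks whether a stored
field contains the string you supply.

```python
def stored_values_in_text(document_text, stored_values):
    """Return the stored values that occur somewhere in document_text.

    Matching is case-insensitive. Empty stored values are skipped,
    since an empty string is trivially a substring of any text.
    """
    haystack = document_text.lower()
    return [
        value for value in stored_values
        if value and value.lower() in haystack
    ]


# Hypothetical data, for illustration only
document = "Backup written to /var/log/app on host Alpha"
stored = ["/var/log/app", "alpha", "/etc/passwd", ""]
print(stored_values_in_text(document, stored))
```

This scans the document once per stored value, which is fine for a
small table. With many stored values, something along the lines of the
Aho-Corasick search Malcolm mentioned would likely scale better.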
