On Mon, Apr 23, 2012 at 10:18 AM, Pedro <pedro...@gmail.com> wrote:
> Ok, thank you all  for the input, however I'm still missing an important 
> detail.
> So I build a suffix tree, but how exactly do I refer to the target documents?
> Should I tie a reference to each document in which the string occurs
> to each node? I can't think of other way to do it.


One way to do this is keep a reference to the node at the end of the string.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
#lang racket
(require (planet dyoo/suffixtree))

;; String reference
(struct sref (str))

;; string->label*: string -> label
;; Creates a label out of a string, but with a reference
;; to the string at the end.
(define (string->label* s)
  (vector->label
   (list->vector (append (string->list s) (list (sref s))))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


Once we have these special characters at the end, we have a way to
cull  them when we search through a tree.  The leaves of our tree will
contain those special sref values, and we can then use them to recover
the matched strings:


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; leaves: node -> (listof node)
;; Get the leaves of the tree rooted at node.
(define (leaves a-node)
  (let loop ([a-node a-node]
             [acc '()])
    (define children (node-children a-node))
    (cond [(eq? children '())
           (cons a-node acc)]
          [else
           (foldl loop acc children)])))


;; leaf-str: node -> string
;; Given a leaf node, get back the string stored in the
;; terminating sref element.
(define (leaf-str a-leaf)
  (define a-label (node-up-label a-leaf))
  (sref-str (label-ref a-label (sub1 (label-length a-label)))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;




When we walk the tree, if we find a matching node, we then can walk
the children to the leaves.


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; find-matches: tree string -> (listof string)
(define (find-matches a-tree str)
  (define (on-success node up-label-offset)
    (map leaf-str (leaves node)))

  (define (on-fail node up-label-offset input-label-offset)
    '())

  (tree-walk a-tree
             (string->label str)
             on-success
             on-fail))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;



For example:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(define a-tree (make-tree))
(tree-add! a-tree (string->label* "peter"))
(tree-add! a-tree (string->label* "piper"))
(tree-add! a-tree (string->label* "pepper"))

(find-matches a-tree "er")
(find-matches a-tree "per")
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


Note that find-matches isn't checking for duplicates, so if we search
for the single string "e", we'll see several matches because there are
several suffixes of the keywords that contain "e".

____________________
  Racket Users list:
  http://lists.racket-lang.org/users

Reply via email to