On Mon, Apr 23, 2012 at 10:18 AM, Pedro <pedro...@gmail.com> wrote: > Ok, thank you all for the input, however I'm still missing an important > detail. > So I build a suffix tree, but how exactly do I refer to the target documents? > Should I tie a reference to each document in which the string occurs > to each node? I can't think of other way to do it.
One way to do this is keep a reference to the node at the end of the string. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; #lang racket (require (planet dyoo/suffixtree)) ;; String reference (struct sref (str)) ;; string->label*: string -> label ;; Creates a label out of a string, but with a reference ;; to the string at the end. (define (string->label* s) (vector->label (list->vector (append (string->list s) (list (sref s)))))) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Once we have these special characters at the end, we have a way to cull them when we search through a tree. The leaves of our tree will contain those special sref values, and we can then use them to recover the matched strings: ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; leaves: node -> (listof node) ;; Get the leaves of the tree rooted at node. (define (leaves a-node) (let loop ([a-node a-node] [acc '()]) (define children (node-children a-node)) (cond [(eq? children '()) (cons a-node acc)] [else (foldl loop acc children)]))) ;; leaf-str: node -> string ;; Given a leaf node, get back the string stored in the ;; terminating sref element. (define (leaf-str a-leaf) (define a-label (node-up-label a-leaf)) (sref-str (label-ref a-label (sub1 (label-length a-label))))) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; When we walk the tree, if we find a matching node, we then can walk the children to the leaves. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; find-matches: tree string -> (listof string) (define (find-matches a-tree str) (define (on-success node up-label-offset) (map leaf-str (leaves node))) (define (on-fail node up-label-offset input-label-offset) '()) (tree-walk a-tree (string->label str) on-success on-fail)) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; For example: ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (define a-tree (make-tree)) (tree-add! a-tree (string->label* "peter")) (tree-add! a-tree (string->label* "piper")) (tree-add! a-tree (string->label* "pepper")) (find-matches a-tree "er") (find-matches a-tree "per") ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Note that find-matches isn't checking for duplicates, so if we search for the single string "e", we'll see several matches because there are several suffixes of the keywords that contain "e". ____________________ Racket Users list: http://lists.racket-lang.org/users