Héllo,
I pleased to share with you the haunt [1] configuration I use for my
english only blog [2]. It includes an implementation of little markdown
parser powered by a parser combinator. Everything is in the file, so you
should just drop the haunt.scm file somewhere and run it with haunt.
An article must look like what follows:
----->8----->8----->8----->8----->8----->8----->8----->8----->8----->8----->8
title: Do It Yourself: a search engine in Scheme Guile
date: 2015-09-20 13:00
I don't have the prentention to know all of those but with a help of
background
knowledge, [NLP coursera](//class.coursera.org/nlp/lecture)
and [another description of a search
engine](//aakashjapi.com/fuckin-search-engines-how-do-they-work/)
I will try to buiolg *ahem* build and blog about a **search engine**
mocking
the different parts and hopefully make a proper release at some point.
----->8----->8----->8----->8----->8----->8----->8----->8----->8----->8----->8
The little markdown support:
- metadata in the header via key, value string pair separated by a
double dot `metadata-name: metadata-value`
- heading via `#` syntax, `#` means h1, `##` means h2 etc.
- inline styles: *italic, **bold** `inline code`
- [links](http://hypermove.net)
- code blocks through the github syntax e.g.
----->8----->8----->8----->8----->8----->8----->8----->8----->8----->8----->8
``` lang
some code
```
----->8----->8----->8----->8----->8----->8----->8----->8----->8----->8----->8
lang must be provided or the parser fails.
Regarding the parser combinator, it's based on
https://github.com/epsil/gll
without the meoization and continuations stuff. One area that is really
poor is how the output tree is built. And that's the reason I did not
port
little markdown parser to guile-parser-combinators [3], both lake a good
way to build the output tree. guile-log does all this, I did not find
time to study it more closely.
My dirty explanation of parser combinators is that it's composition of
procedure that return parser procedure ie. (some-parser input-stream) so
they are parser functors. A parser can fail or succeed.
Those functor are composed using **control** procedures called
combinators. Those combinator are split into two categories:
- single step combinators e.g. and, or, maybe, not, alt/any
- multiple step combinators e.g. seq/each, many/zero-or-more (star
glob), one-or-more (plus glob)
E.g. (many (parse-string "guile")) will parse as many "guile" token
there is in the input stream and step in the stream as many times so the
next parser will reads things that are not "guile". Whereas (not (and
(parse-string "love") (parse-string "like"))) will parse any token that
is neither "love" or "like" but only consume one token in the stream.
There is actually no magic, on how parser combinators can parse diverse
kind of grammars, it's only because they do try every parser and
backtrack when it fails. It possible to backtrack without using call/cc
because of the use of persistent data structure. This makes me think
about minikanren.
Anyway! Happy hacking!
Amirouiche aka. amz3
[1] http://haunt.dthompson.us/
[2] http://www.hypermove.net
[3] https://git.dthompson.us/guile-parser-combinators.git
;;; Little markdown configuration for Haunt
;;; Copyright © 2015 David Thompson <da...@gnu.org>
;;; Copyright © 2015 Amirouche Boubekki <amirou...@hypermove.net>
;;;
;;; Haunt is free software; you can redistribute it and/or modify it
;;; under the terms of the GNU General Public License as published by
;;; the Free Software Foundation; either version 3 of the License, or
;;; (at your option) any later version.
;;;
;;; Haunt is distributed in the hope that it will be useful, but
;;; WITHOUT ANY WARRANTY; without even the implied warranty of
;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
;;; General Public License for more details.
;;;
;;; You should have received a copy of the GNU General Public License
;;; along with Haunt. If not, see <http://www.gnu.org/licenses/>.
(use-modules (haunt site)
(haunt reader)
(haunt asset)
(haunt page)
(haunt post)
(haunt html)
(haunt utils)
(haunt builder blog)
(haunt builder atom)
(haunt builder assets)
(srfi srfi-19)
(srfi srfi-26)
(ice-9 rdelim)
(ice-9 match)
(web uri))
(use-modules (sxml match))
(use-modules (srfi srfi-19))
(define %cc-by-sa-link
'(a (@ (href "https://creativecommons.org/licenses/by-sa/4.0/"))
"Creative Commons Attribution Share-Alike 4.0 International"))
(define haunt-theme
(theme #:name "Haunt"
#:layout
(lambda (site title body)
`((doctype "html")
(head
(meta (@ (charset "utf-8")))
(title ,(string-append title " â " (site-title site)))
(link (@ (rel "stylesheet") (href "/static/normalize.css")))
(link (@ (rel "stylesheet") (href "/static/main.css")))
(link (@ (rel "stylesheet") (href "/static/prism.css"))))
(body
(div (@ (id "background"))
(video (@ (autoplay "") (loop "") (poster
"/static/video/poster.jpeg"))
(source (@ (src "/static/video/space.ogv") (type
"video/ogg")))))
(div (@ (id "container"))
(h1 (a (@ (href "//hypermove.net")) "hypermove"))
,body
(footer (@ (class "text-center"))
(p (small "Copyright © 2015 Amirouche Boubekki"))
(p
(small "The text and images on this site are free
culture works available under the " ,%cc-by-sa-link " license."))))
(script (@ (src "/static/prism.js"))))))
#:post-template
(lambda (post)
`((h1 ,(post-ref post 'title))
(div ,(post-sxml post))))
#:collection-template
(lambda (site title posts prefix)
(define (post-uri post)
(string-append "/" (or prefix "")
(site-post-slug site post) ".html"))
`((h2 "notes")
(ul
,@(map (lambda (post)
`(li
(a (@ (href ,(post-uri post)))
,(post-ref post 'title)
" â "
,(date->string* (post-date post)))))
(posts/reverse-chronological posts)))))))
(define %collections
`(("Home" "index.html" ,posts/reverse-chronological)))
(use-modules (srfi srfi-9)) ;; records
(use-modules (ice-9 match))
(use-modules (sxml simple))
;;;
;;; macro to quickly define immutable records
;;;
;;
;; FIXME: Taken from Guile (maybe should be in (srfi srfi-99))
;; adapted to make it possible to declare record type like `<abc>' and
keep
;; field accessor bracket free. record name *must* have brackets or
everything
;; is broken
;;
;; Usage:
;;
;; (define-record-type <abc> field-one field-two)
;; (define zzz (make-abc 1 2))
;; (abc-field-one zzz) ;; => 1
;;
(define-syntax define-record-type*
(lambda (x)
(define (%id-name name) (string->symbol (string-drop (string-drop-right
(symbol->string name) 1) 1)))
(define (id-name ctx name)
(datum->syntax ctx (%id-name (syntax->datum name))))
(define (id-append ctx . syms)
(datum->syntax ctx (apply symbol-append (map syntax->datum syms))))
(syntax-case x ()
((_ rname field ...)
(and (identifier? #'rname) (and-map identifier? #'(field ...)))
(with-syntax ((cons (id-append #'rname #'make- (id-name #'rname
#'rname)))
(pred (id-append #'rname (id-name #'rname #'rname) #'?))
((getter ...) (map (lambda (f)
(id-append f (id-name #'rname
#'rname) #'- f))
#'(field ...))))
#'(define-record-type rname
(cons field ...)
pred
(field getter)
...))))))
;;;
;;; parser combinator library
;;;
;;; setup datastructures with a helper
(define-record-type* <success> value rest)
(define-record-type* <failure> text)
(define* ((bind p fn) str)
"helper to chain parser procedures"
(match (p str)
[($ <success> val rest) ((fn val) rest)]
[($ <failure> rest) (make-failure rest)]))
;;; definitions of parser combinators
;; seq & seq*
(define (seq a b)
(define* ((succeed val) str)
(make-success val str))
(define (join x y)
(if (null? y)
x
(if (and (string? x) (string? y))
;; XXX: do not reverse x and y
(string-append x y)
(append x y))))
(bind a (lambda (x) (bind b (lambda (y) (succeed (join x y)))))))
(define-syntax seq*
(syntax-rules ()
((seq* a b) (seq a b))
((seq* a b c d ...) (seq a (seq* b c d ...)))))
;; alt & alt*
(define* ((alt a b) str)
(let ((result (a str)))
(match result
[($ <success> val rest) result]
[($ <failure> rest) (b str)])))
(define-syntax alt*
(syntax-rules ()
((alt* a b) (alt a b))
((alt* a b c d ...) (alt a (alt* b c d ...)))))
;; zero-or-more aka. `*` & one-or-more `+` combinators
(define* ((zero-or-more a) str)
"match A zero or more times"
(define (join x y)
(if (null? y)
x
(if (string? y)
(string-append y x)
(append y x))))
(let loop ((rest str)
(out '()))
(if (equal? rest "")
(make-success out "")
(if (null? rest)
(make-success '() out)
(match (a rest)
[($ <success> val rest) (loop rest (join val out))]
[($ <failure> rest) (make-success out rest)])))))
(define (one-or-more a)
"match A one or more times"
(seq a (zero-or-more a)))
;; not* combinator
(define* ((not* parser) str)
"match current char if PARSER doesn't match"
(match (parser str)
[($ <success> val rest) (make-failure str)]
[($ <failure> rest) (make-success (substring str 0 1) (substring str 1))]))
;; if* combinator (sometimes called `and` combinator)
(define* ((if* a b) str)
"match B if A match. A doesn't consume input"
(match (a str)
[($ <success> val rest) (b str)]
[($ <failure> rest) (make-failure str)]))
;; string matchers
(define* ((string! value) str)
"match VALUE string and store in <success> value"
(if (equal? str "")
(make-failure str)
(let* ((len (min (string-length str) (string-length value)))
(head (substring str 0 len))
(tail (substring str len)))
(if (equal? head value)
(make-success head tail)
(make-failure str)))))
(define* ((string value) str)
"match VALUE but do not store it in <success> value"
(match ((string! value) str)
[($ <success> val rest) (make-success '() rest)]
[($ <failure> rest) (make-failure str)]))
;; eol and eof matcher
(define (eol str)
"match end-of-line and don't store anything"
(if (equal? str "")
(make-success '() "")
(if (equal? (string-ref str 0) #\newline)
(make-success '() (substring str 1))
(make-failure str))))
(define (eof str)
"match end-of-file and don't store anything"
(if (equal? str "")
(make-success '() "")
(make-failure str)))
;;; other helpers
(define (unnest a)
;; XXX: output sanitization...
;; keep the nesting level to the minimum
(match a
[(x) (unnest x)]
[(x ...) a]
[_ (list a)]))
(define* ((node name parser) str)
"Prepend PARSER match with NAME"
(match (parser str)
[($ <success> val rest) (make-success (list (append (list name) val)) rest)]
[($ <failure> rest) (make-failure str)]))
(define-syntax define-parser
;; define configurable parser or plain parser as a combination of other
parsers
(syntax-rules ()
((define-parser (name a ...) b ...) (define (name a ...)
(lambda args (apply b ... args))))
((define-parser name b ...) (define name
(lambda args (apply b ... args))))))
;;;
;;; little markdown parser
;;;
;;; text
(define (single! pattern)
"check that there is a PATTERN appears only one time on the current line"
(if* (seq* (string pattern) (zero-or-more (not* (alt* (string pattern) eol
eof))) (alt eol eof)) (string! pattern)))
(define-parser text
;; parser a text without any style applied"
(node #:text (one-or-more (alt* (not* (alt* (string! "**") ;; command
characters
(string! "*")
(string! "[")
(string! "`")
;; neither text end
(seq eol eol)
eof))
(single! "**")
(single! "*")
(single! "`")))))
;;; inline styles
(define-parser (enclosed keyword pattern)
(node keyword (seq* (string pattern) (one-or-more (not* (string! pattern)))
(string pattern))))
(define italic (enclosed #:italic "*"))
(define bold (enclosed #:bold "**"))
(define code (enclosed #:code "`"))
(define link
(node #:link (seq* (string "[")
(node #:text (one-or-more (not* (string "]"))))
(string "]")
(string "(")
(node #:url (one-or-more (not* (string ")"))))
(string ")"))))
;;; block parser definition
;; paragraph
(define-parser inline
(alt* link bold italic code text))
(define-parser paragraph
(node #:paragraph (seq (one-or-more inline) (alt (one-or-more eol) eof))))
;;; section parser
(define-parser (section keyword pattern)
;; Parse a KEYWORD section using PATTERN
(node keyword (seq* (string pattern)
(one-or-more (not* eol))
(alt (one-or-more eol) eof))))
(define h1 (section #:h1 "#"))
(define h2 (section #:h2 "##"))
(define h3 (section #:h3 "###"))
(define h4 (section #:h4 "####"))
(define h5 (section #:h5 "#####"))
;;; code-block
(define-parser code-block
;; Parse a KEYWORD section using PATTERN
(node #:code-block (seq* (string "```")
(node #:lang (one-or-more (not* eol)))
eol
(node #:code (one-or-more (not* (string "```"))))
(string "```")
(alt (one-or-more eol) eof))))
;;; metadata
(define metadata-token
(node #:metadata
(seq* (node #:name (one-or-more (not* (alt (string ":") eol))))
(string ":")
(node #:value (one-or-more (not* eol)))
eol)))
;;; little markdown parser
(define-parser markdown-parser
(seq* (zero-or-more metadata-token)
eol
(one-or-more (alt* eol eof h5 h4 h3 h2 h1 code-block paragraph))))
;;;
;;; parser output processing
;;;
;; (define lipsum "abc [def](ghj) uiop
;; **xyz** tuv
;; ")
;; (use-modules (srfi srfi-26))
;; (map (cut pk 'm <>) (success-value (markdown lipsum)))
;; (define (success->ast success)
;; (success-value success))
;;; tests
(define lipsum "title: Story of lazy fox
# Story
## What happens
Behold *this* is a text about **something** that is happening
in the life of Laz Yfox. You now the guy `next door`. You smile?
Then you know him.
```scheme
(map display (list 'Laz 'Yfox))
```
## What will happen
Little did you know he was a [lazy fox](http://laz.yfox). Writing sofware
lazily.
**Obviously**. Only when required. And hacker know that's it's always
required to craft good software.
")
(map pk (success-value (inline "text [foo](http://lipsum)")))
(map pk (success-value (markdown-parser lipsum)))
(define (ast->sxml ast)
(match ast
((#:metadata (#:name . name) (#:value . value))
(list (cons name (string-trim-both value))))
((#:paragraph paragraph ...) `(p ,(map ast->sxml paragraph)))
((#:code-block (#:lang . lang) (#:code . code))
`(pre (@ (class ,(string-append "language-" lang))) (code
,(string-trim-both code))))
((#:h1 . text) `(h1 ,(string-trim-both text)))
((#:h2 . text) `(h2 ,(string-trim-both text)))
((#:h3 . text) `(h3 ,(string-trim-both text)))
((#:h4 . text) `(h4 ,(string-trim-both text)))
((#:h5 . text) `(h5 ,(string-trim-both text)))
((#:text . text) text)
((#:bold . text) `(b ,text))
((#:italic . text) `(em ,text))
((#:code . text) `(code ,text))
((#:link (#:text . text) (#:url . url)) `(a (@ (href ,url)) ,text))))
(define-public (string->sxml string)
(map ast->sxml (success-value (markdown-parser string))))
(define (markdown port)
(string->sxml (read-string port)))
(define (sxml->metadata+content sxml)
(define title (cdaar sxml))
(define date (string->date (cdaadr sxml) "~Y-~m-~d ~H:~M"))
(values `((title . ,title)
(date . ,date))
(cddr sxml)))
(define (read-markdown-post port)
(sxml->metadata+content (markdown port)))
(define markdown-reader
(make-reader (make-file-extension-matcher "md")
(cut call-with-input-file <> read-markdown-post)))
(site #:title "hypermove.net"
#:domain "hypermove.net"
#:default-metadata
'((author . "Amirouche Boubekki")
(email . "amirou...@hypermove.net"))
#:readers (list markdown-reader)
#:builders (list (blog #:theme haunt-theme #:collections %collections)
(atom-feed)
(atom-feeds-by-tag)
(static-directory "static")))