raym - Thanks so much for the code snippet.  That was just what I needed to 
get unstuck and start playing around with the parser.  I really appreciate 
the help.

- Dave
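
A note for anyone reading this thread with multi-gigabyte logs in mind: the doall in parse-log-file (quoted below) realizes every request before the reader closes, so the full result ends up in memory. Here is a minimal sketch of one alternative, assuming it is acceptable to handle each request through a callback while the file is still open. The names process-requests! and handle are made up for illustration and are not part of raym's code. The regex here allows leading whitespace before the thread id, as in the sample log.

```clojure
(require '[clojure.java.io :as io])

(defn process-requests!
  "Reads file line by line and calls (handle request) each time a ReqEnd
  closes a request that also has a ReqStart. Only requests still in
  flight are kept in the accumulated state. Returns nil."
  [file handle]
  (with-open [rdr (io/reader file)]
    (reduce
     (fn [state line]
       (if-let [[_ thread-id tag value]
                (re-find #"^\s*(\d+)\s+(\S+)\s+(.+)$" line)]
         ;; append this value under [thread-id tag]
         (let [state (update-in state [thread-id tag] (fnil conj []) value)]
           (if (= tag "ReqEnd")
             (let [request (get state thread-id)]
               ;; skip requests whose ReqStart was never seen
               (when (contains? request "ReqStart")
                 (handle request))
               ;; the thread id gets reused, so forget it now
               (dissoc state thread-id))
             state))
         ;; lines that do not match the pattern are ignored
         state))
     {}
     (line-seq rdr))
    nil))

(comment
  ;; "varnish.log" is a placeholder path
  (process-requests! "varnish.log" println))
```

The trade-off is that handle runs for its side effects, so this gives up the lazy sequence in exchange for bounded memory.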

On Sunday, October 21, 2012 6:10:51 AM UTC-5, raym wrote:
>
> As Dave says, you can do this using line-seq, but you'll have to 
> accumulate some state as you read the lines so you can return all the 
> lines for a given thread's ReqStart to ReqEnd. Once you've returned 
> that block, you can delete the state for that thread-id, so your 
> accumulated state will only contain the 'active' requests. If you're 
> processing a very large file, you're best returning a lazy sequence of 
> the data. 
>
> Something like this should get you started: 
>
> (require '[clojure.java.io :as io]) 
>
> (defn parse-line 
>   [s] 
>   (let [[_ thread_id tag value] (re-find #"^\s*(\d+)\s+(\S+)\s+(.+)$" s)] 
>     [thread_id tag value])) 
>
> (defn parse-lines 
>   ([lines] 
>      (parse-lines lines {})) 
>   ([lines state] 
>      (lazy-seq 
>       (when (seq lines) 
>         (let [[thread-id tag value] (parse-line (first lines)) 
>               state (assoc-in state [thread-id tag] 
>                               (conj (get-in state [thread-id tag] []) value))] 
>           (if (= tag "ReqEnd") 
>             (cons (get state thread-id) 
>                   (parse-lines (rest lines) (dissoc state thread-id))) 
>             (parse-lines (rest lines) state))))))) 
>
> (defn parse-log-file 
>   [file] 
>   (with-open [logfile (io/reader file)] 
>     (doall 
>      (filter #(and (get % "ReqStart") (get % "ReqEnd")) 
>              (parse-lines (line-seq logfile)))))) 
>
>
> On 21 October 2012 02:54, Dave <da...@sevenventures.net> wrote: 
> > Clojurists - I'm fairly new to Clojure and didn't realize how broken I've 
> > become using imperative languages all my life.  I'm stumped as to how to 
> > parse a Varnish (www.varnish-cache.org) log file using Clojure.  The main 
> > problem is that for a single request a Varnish log file generates multiple 
> > log lines, and each line is interspersed with lines from other threads. 
> > These log files can be several gigabytes in size (so using a stable sort 
> > of the entire log by thread id is out of the question). 
> > 
> > Below I've included a small example log file and an example output Clojure 
> > data structure.  Let me thank everyone in advance for any hints / help they 
> > can provide on this seemingly simple problem. 
> > 
> > Rules of the Varnish Log File 
> > 
> > The first number on each line is the thread id (not unique and gets reused 
> > frequently) 
> > Each ReqStart marks the start of a request and the last number on the line 
> > is the unique transaction id (e.g. 118591777) 
> > ReqEnd denotes the end of the processing of the request by the thread 
> > Each line is atomically written, however many threads generate log lines 
> > that are interspersed with other requests (threads) 
> > These log files can be VERY large (10+ Gigabytes in the case of my 
> > application) so using a stable sort by thread id or anything that loads the 
> > entire file into memory is out of the question. 
> > 
> > 
> > Example Varnish Log file 
> >    40 ReqEnd       c 118591771 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.000042200 
> >    15 ReqStart     c 10.102.41.121 4187 118591777 
> >    15 RxRequest    c GET 
> >    15 RxURL        c /json/engagement 
> >    15 RxHeader     c host: www.example.com 
> >    30 ReqStart     c 10.102.41.121 3906 118591802 
> >    15 RxHeader     c Accept: application/json 
> >    30 RxRequest    c GET 
> >    30 RxURL        c /ws/boxtops/user/ 
> >    30 RxHeader     c host: www.example.com 
> >    15 ReqEnd       c 118591777 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.000042200 
> >    30 RxHeader     c Accept: application/xml 
> >    30 ReqEnd       c 118591802 1350759611.326084614 1350759611.329720259 0.005002737 0.003598213 0.000037432 
> >    15 ReqStart     c 10.102.41.121 4187 118591808 
> >    15 RxRequest    c GET 
> >    15 RxURL        c /ws/boxtops/user/ 
> >    30 ReqStart     c 10.102.41.121 3906 118591810 
> >    15 RxHeader     c host: www.example.com 
> >    15 RxHeader     c Accept: application/xml 
> >    30 RxRequest    c GET 
> >    30 RxURL        c /registration/success 
> >    30 RxHeader     c host: www.example.com 
> >    46 TxRequest    - GET 
> >    30 RxHeader     c Accept: text/html 
> >    46 TxURL        - /registration/success 
> >    15 ReqEnd       c 118591808 1350759611.442447424 1350759611.444925785 0.016906023 0.002441406 0.000036955 
> >    30 ReqEnd       c 118591810 1350759611.521781683 1350759611.525400877 0.098322868 0.003532171 0.000087023 
> > 
> > Desired Output 
> > { 
> >   118591802 
> >   { :ReqStart ["10.102.41.121 3906 118591802"] 
> >     :RxRequest ["GET"] 
> >     :RxURL ["/ws/boxtops/user/"] 
> >     :RxHeader ["host: www.example.com" "Accept: application/xml"] 
> >               or better yet 
> >     :RxHeader {:host "www.example.com" :Accept "application/xml"} 
> >     :ReqEnd ["118591802 1350759611.326084614 1350759611.329720259 0.005002737 0.003598213 0.000037432"] } 
> >   118591777 
> >   { :ReqStart ["10.102.41.121 4187 118591777"] 
> >     :RxRequest ["GET"] 
> >     :RxURL ["/json/engagement"] 
> >     :RxHeader ["host: www.example.com" "Accept: application/json"] 
> >     :ReqEnd ["118591777 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.000042200"] } 
> >   118591808 
> >   { :ReqStart ["10.102.41.121 4187 118591808"] 
> >     :RxRequest ["GET"] 
> >     :RxURL ["/ws/boxtops/user/"] 
> >     :RxHeader ["host: www.example.com" "Accept: application/xml"] 
> >     :ReqEnd ["118591808 1350759611.442447424 1350759611.444925785 0.016906023 0.002441406 0.000036955"] } 
> >   118591810 
> >   { :ReqStart ["10.102.41.121 3906 118591810"] 
> >     :RxRequest ["GET"] 
> >     :RxURL ["/registration/success"] 
> >     :RxHeader ["host: www.example.com" "Accept: text/html"] 
> >     :ReqEnd ["118591810 1350759611.521781683 1350759611.525400877 0.098322868 0.003532171 0.000087023"] } 
> > } 
> > 
> > -- 
> > You received this message because you are subscribed to the Google 
> > Groups "Clojure" group. 
> > To post to this group, send email to clo...@googlegroups.com 
> > Note that posts from new members are moderated - please be patient with 
> > your first post. 
> > To unsubscribe from this group, send email to 
> > clojure+u...@googlegroups.com 
> > For more options, visit this group at 
> > http://groups.google.com/group/clojure?hl=en 
>
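
On the "or better yet" shape in the desired output quoted above: a rough sketch of one way to get there, assuming each block is the map of tag string to vector of values that parse-lines returns, and assuming each value still starts with the one-character marker column ("c", "b" or "-") seen in the sample log. The names strip-marker, header-map, tidy-request, txid and index-by-txid are made up for illustration.

```clojure
(require '[clojure.string :as str])

(defn strip-marker
  "Drops the leading marker column (assumed to be c, b or -) from a value."
  [v]
  (str/replace v #"^[cb-]\s+" ""))

(defn header-map
  "Turns [\"host: www.example.com\" ...] into {:host \"www.example.com\" ...}.
  A repeated header name keeps only its last value."
  [headers]
  (into {}
        (for [h headers
              :let [[k v] (str/split h #":\s*" 2)]
              :when v]
          [(keyword k) v])))

(defn tidy-request
  "Keywordizes the tags, strips the marker column, and maps the headers."
  [block]
  (let [m (into {}
                (for [[tag values] block]
                  [(keyword tag) (mapv strip-marker values)]))]
    (if (:RxHeader m)
      (update-in m [:RxHeader] header-map)
      m)))

(defn txid
  "Transaction id: the last whitespace-separated field of the ReqStart line."
  [request]
  (Long/parseLong (last (str/split (first (:ReqStart request)) #"\s+"))))

(defn index-by-txid
  "Builds {txid request}. This holds every request in memory, so it only
  suits small samples, not the multi-gigabyte case."
  [blocks]
  (into {}
        (for [b blocks
              :let [r (tidy-request b)]]
          [(txid r) r])))

;; one block, written out by hand in the shape parse-lines produces
(def sample-block
  {"ReqStart"  ["c 10.102.41.121 4187 118591777"]
   "RxRequest" ["c GET"]
   "RxURL"     ["c /json/engagement"]
   "RxHeader"  ["c host: www.example.com" "c Accept: application/json"]
   "ReqEnd"    ["c 118591777 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.000042200"]})

(index-by-txid [sample-block])
```

For the large files, calling tidy-request on each block as it comes off the lazy sequence is probably the more useful piece; index-by-txid is only there to mirror the desired output.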
