raym - Thanks so much for the code snippet. That was just what I needed to get unstuck and start playing around with the parser. I really appreciate the help.
- Dave

On Sunday, October 21, 2012 6:10:51 AM UTC-5, raym wrote:
>
> As Dave says, you can do this using line-seq, but you'll have to
> accumulate some state as you read the lines so you can return all the
> lines for a given thread's ReqStart to ReqEnd. Once you've returned
> that block, you can delete the state for that thread-id, so your
> accumulated state will only contain the 'active' requests. If you're
> processing a very large file, you're best returning a lazy sequence of
> the data.
>
> Something like this should get you started:
>
> (require '[clojure.java.io :as io])
>
> ;; Split one log line into [thread-id tag value].
> (defn parse-line
>   [s]
>   (let [[_ thread_id tag value] (re-find #"^(\d+)\s+(\S+)\s+(.+)$" s)]
>     [thread_id tag value]))
>
> ;; Lazily emit one map of tag -> values per request. The state map
> ;; holds only the requests that have not reached ReqEnd yet.
> (defn parse-lines
>   ([lines]
>      (parse-lines lines {}))
>   ([lines state]
>      (lazy-seq
>       (when (seq lines)
>         (let [[thread-id tag value] (parse-line (first lines))
>               state (assoc-in state [thread-id tag]
>                               (conj (get-in state [thread-id tag] [])
>                                     value))]
>           (if (= tag "ReqEnd")
>             ;; Request finished: emit its block and drop its state.
>             (cons (get state thread-id)
>                   (parse-lines (rest lines) (dissoc state thread-id)))
>             (parse-lines (rest lines) state)))))))
>
> ;; Keep only the blocks that have both a ReqStart and a ReqEnd.
> (defn parse-log-file
>   [file]
>   (with-open [logfile (io/reader file)]
>     (doall
>      (filter #(and (get % "ReqStart") (get % "ReqEnd"))
>              (parse-lines (line-seq logfile))))))
>
>
> On 21 October 2012 02:54, Dave <da...@sevenventures.net> wrote:
> > Clojurists - I'm fairly new to Clojure and didn't realize how broken I've
> > become using imperative languages all my life. I'm stumped as to how to
> > parse a Varnish (www.varnish-cache.org) log file using Clojure. The main
> > problem is that for a single request a varnish log file generates multiple
> > log lines and each line is interspersed with lines from other threads.
> > These log files can be several gigabytes in size (so using a stable sort of
> > the entire log by thread id is out of the question).
> >
> > Below I've included a small example log file and an example output Clojure
> > data structure. Let me thank everyone in advance for any hints / help they
> > can provide on this seemingly simple problem.
> >
> > Rules of the Varnish Log File
> >
> > The first number on each line is the thread id (not unique and gets reused
> > frequently)
> > Each ReqStart marks the start of a request and the last number on the line
> > is the unique transaction id (e.g. 118591777)
> > ReqEnd denotes the end of the processing of the request by the thread
> > Each line is atomically written, however many threads generate log lines
> > that are interspersed with other requests (threads)
> > These log files can be VERY large (10+ Gigabytes in the case of my
> > application) so using a stable sort by thread id or anything that loads the
> > entire file into memory is out of the question.
> >
> >
> > Example Varnish Log file
> > 40 ReqEnd c 118591771 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.000042200
> > 15 ReqStart c 10.102.41.121 4187 118591777
> > 15 RxRequest c GET
> > 15 RxURL c /json/engagement
> > 15 RxHeader c host: www.example.com
> > 30 ReqStart c 10.102.41.121 3906 118591802
> > 15 RxHeader c Accept: application/json
> > 30 RxRequest c GET
> > 30 RxURL c /ws/boxtops/user/
> > 30 RxHeader c host: www.example.com
> > 15 ReqEnd c 118591777 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.000042200
> > 30 RxHeader c Accept: application/xml
> > 30 ReqEnd c 118591802 1350759611.326084614 1350759611.329720259 0.005002737 0.003598213 0.000037432
> > 15 ReqStart c 10.102.41.121 4187 118591808
> > 15 RxRequest c GET
> > 15 RxURL c /ws/boxtops/user/
> > 30 ReqStart c 10.102.41.121 3906 118591810
> > 15 RxHeader c host: www.example.com
> > 15 RxHeader c Accept: application/xml
> > 30 RxRequest c GET
> > 30 RxURL c /registration/success
> > 30 RxHeader c host: www.example.com
> > 46 TxRequest - GET
> > 30 RxHeader c Accept: text/html
> > 46 TxURL - /registration/success
> > 15 ReqEnd c 118591808 1350759611.442447424 1350759611.444925785 0.016906023 0.002441406 0.000036955
> > 30 ReqEnd c 118591810 1350759611.521781683 1350759611.525400877 0.098322868 0.003532171 0.000087023
> >
> > Desired Output
> > {
> >   118591802
> >   { :ReqStart ["10.102.41.121 3906 118591802"]
> >     :RxRequest ["GET"]
> >     :RxURL ["/ws/boxtops/user/"]
> >     :RxHeader ["host: www.example.com" "Accept: application/xml"]
> >     or better yet
> >     :RxHeader {:host "www.example.com" :Accept "application/xml"}
> >     :ReqEnd ["118591802 1350759611.326084614 1350759611.329720259 0.005002737 0.003598213 0.000037432"] }
> >   118591777
> >   { :ReqStart ["10.102.41.121 4187 118591777"]
> >     :RxRequest ["GET"]
> >     :RxURL ["/json/engagement"]
> >     :RxHeader ["host: www.example.com" "Accept: application/json"]
> >     :ReqEnd ["118591777 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.000042200"] }
> >   118591808
> >   { :ReqStart ["10.102.41.121 4187 118591808"]
> >     :RxRequest ["GET"]
> >     :RxURL ["/ws/boxtops/user/"]
> >     :RxHeader ["host: www.example.com" "Accept: application/xml"]
> >     :ReqEnd ["118591808 1350759611.442447424 1350759611.444925785 0.016906023 0.002441406 0.000036955"] }
> >   118591810
> >   { :ReqStart ["10.102.41.121 3906 118591810"]
> >     :RxRequest ["GET"]
> >     :RxURL ["/registration/success"]
> >     :RxHeader ["host: www.example.com" "Accept: text/html"]
> >     :ReqEnd ["118591810 1350759611.521781683 1350759611.525400877 0.098322868 0.003532171 0.000087023"] }
> > }
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Clojure" group.
> > To post to this group, send email to clo...@googlegroups.com
> > Note that posts from new members are moderated - please be patient with your
> > first post.
> > To unsubscribe from this group, send email to
> > clojure+u...@googlegroups.com
> > For more options, visit this group at
> > http://groups.google.com/group/clojure?hl=en
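
For anyone picking up this thread later: the blocks that raym's parse-lines emits are keyed by string tags and still carry the "c" / "-" column at the front of each value, so they are one step short of the "Desired Output" in the original post. Below is a minimal sketch of that last step, not part of raym's code. The names strip-marker, headers->map, transaction-id and reshape-block are made up for illustration, and the sketch assumes the marker column is always a single non-space character and that the transaction id is the last token of the ReqStart value (as the rules in the original post say).

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: drop the leading one-character column ("c" or "-").
;; Assumes that column is always a single non-space character.
(defn strip-marker
  [value]
  (str/replace value #"^\S\s+" ""))

;; Hypothetical helper: ["host: www.example.com" ...] -> {:host "www.example.com" ...}
;; Splits on the first colon only, so values containing colons stay intact.
(defn headers->map
  [headers]
  (into {}
        (map (fn [h]
               (let [[k v] (str/split h #":\s*" 2)]
                 [(keyword k) v]))
             headers)))

;; The transaction id is the last whitespace-separated token of ReqStart.
;; It is kept as a string here rather than parsed into a number.
(defn transaction-id
  [block]
  (-> (get block "ReqStart") first (str/split #"\s+") last))

;; Turn one block from parse-lines into [transaction-id request-map].
(defn reshape-block
  [block]
  (let [cleaned (into {}
                      (map (fn [[tag values]]
                             [(keyword tag) (mapv strip-marker values)])
                           block))]
    [(transaction-id block)
     (update-in cleaned [:RxHeader] headers->map)]))

;; One block, in the shape parse-lines would produce for thread 30.
(def example-block
  {"ReqStart"  ["c 10.102.41.121 3906 118591802"]
   "RxRequest" ["c GET"]
   "RxURL"     ["c /ws/boxtops/user/"]
   "RxHeader"  ["c host: www.example.com" "c Accept: application/xml"]
   "ReqEnd"    ["c 118591802 1350759611.326084614 1350759611.329720259 0.005002737 0.003598213 0.000037432"]})

(reshape-block example-block)
```

Mapping reshape-block over the lazy sequence keeps things lazy. Pouring the result into a single map with (into {} ...) gives the exact shape Dave asked for, but it also holds every request in memory, which may not be what you want for a 10+ gigabyte log.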