If all you need is strings of the form CVE-NNNN-NNNNN, then by all means
just run a regex against the plain text of the page. If you need to do DOM
traversal, jsoup is a good choice. Otherwise, as Mark said, tree-seq is a
great choice if you don't want to play with clojure.walk.
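
Here is a rough sketch of both approaches, in case it helps. It assumes
the Hickory shape from your sample (maps with a :content vector whose
leaves are strings). The names cve-pattern, cves-in-text, cves-in-hickory
and sample are made up for illustration, and the pattern allows four or
more digits in the last group, since CVE sequence numbers are not always
five digits long.

```clojure
;; Assumed pattern: "CVE-", a 4-digit year, then 4 or more digits.
(def cve-pattern #"CVE-\d{4}-\d{4,}")

(defn cves-in-text
  "Distinct CVE IDs found in a plain string, in order of first appearance."
  [text]
  (distinct (re-seq cve-pattern text)))

(defn cves-in-hickory
  "Walks a Hickory-style tree with tree-seq and collects the distinct
  CVE IDs found in its string leaves. Only :content is followed, so
  attribute values such as :href are not searched."
  [root]
  (->> (tree-seq map? :content root)   ; maps are branches, strings are leaves
       (filter string?)
       (mapcat #(re-seq cve-pattern %))
       distinct))

;; Hypothetical, trimmed-down version of the data in the original post.
(def sample
  {:type :element, :attrs nil, :tag :div,
   :content
   [{:type :element, :attrs nil, :tag :p,
     :content ["exploitation of a newly identified vulnerability (CVE-2021-40539) in ManageEngine ADSelfService Plus"]}
    "\n\n"
    {:type :element, :attrs nil, :tag :p,
     :content ["CVE-2021-40539, rated critical by the Common Vulnerability Scoring System (CVSS)"]}]})

(cves-in-hickory sample)
;; => ("CVE-2021-40539")
```

One caveat with the tree-seq version: an ID that is split across two text
nodes (say, by an inline tag in the middle of it) will not match, whereas
a regex over the flattened page text would still catch it.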

On Wed, Feb 2, 2022 at 2:58 PM Mark Nutter <manutte...@gmail.com> wrote:

> I don't know how common it is, but have you looked at the `tree-seq`
> function in Clojure? This seems like a good use case for it.
>
> Mark
>
> On Wed, Feb 2, 2022 at 3:22 PM lawrence...@gmail.com <
> lawrence.krub...@gmail.com> wrote:
>
>> Assume I've been cursed to scrape HTML. If I convert the pages to Hickory
>> I end up with a big mass of data which, sadly, lacks many "class" or "id"s
>> that would let me easily pick out the data I need. However, for the most
>> part, the only thing I really need off this page is the CVEs, which look
>> like this:
>>
>> CVE-2021-40539
>>
>> I'm thinking I might write regex against the plain text of the page, but
>> I'm also curious, is it common to take something like Hiccup or Hickory or
>> a zipper and run regex through it? If yes, how is that done?
>>
>> A small part of the data looks like this:
>>
>>                 :content
>>                 [{:type :element,
>>                   :attrs
>>                   {:class "tip-intro", :style "font-size: 15px;"},
>>                   :tag :p,
>>                   :content
>>                   [{:type :element,
>>                     :attrs nil,
>>                     :tag :em,
>>                     :content
>>                     ["This Joint Cybersecurity Advisory uses the MITRE
>> Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK®) framework,
>> Version 8. See the "
>>                      {:type :element,
>>                       :attrs
>>                       {:href
>>                        "
>> https://attack.mitre.org/versions/v9/techniques/enterprise/"},
>>                       :tag :a,
>>                       :content ["ATT&CK for Enterprise"]}
>>                      " for  referenced threat actor tactics and for
>> techniques."]}]}
>>                  "\n\n"
>>                  {:type :element,
>>                   :attrs nil,
>>                   :tag :p,
>>                   :content
>>                   ["This joint advisory is the result of analytic efforts
>> between the Federal Bureau of Investigation (FBI), United States Coast
>> Guard Cyber Command (CGCYBER), and the Cybersecurity and Infrastructure
>> Security Agency (CISA) to highlight the cyber threat associated with active
>> exploitation of a newly identified vulnerability (CVE-2021-40539) in
>> ManageEngine ADSelfService Plus—a self-service password management and
>> single sign-on solution."]}
>>                  "\n\n"
>>                  {:type :element,
>>                   :attrs nil,
>>                   :tag :p,
>>                   :content
>>                   ["CVE-2021-40539, rated critical by the Common
>> Vulnerability Scoring System (CVSS), is an authentication bypass
>> vulnerability affecting representational state transfer (REST) application
>> programming interface (API) URLs that could enable remote code execution.
>> The FBI, CISA, and CGCYBER assess that advanced persistent threat (APT)
>> cyber actors are likely among those exploiting the vulnerability. The
>> exploitation of ManageEngine ADSelfService Plus poses a serious risk to
>> critical infrastructure companies, U.S.-cleared defense contractors,
>> academic institutions, and other entities that use the software. Successful
>> exploitation of the vulnerability allows an attacker to place webshells,
>> which enable the adversary to conduct post-exploitation activities, such as
>> compromising administrator credentials, conducting lateral movement, and
>> exfiltrating registry hives and Active Directory files."]}
>>                  "\n\n"
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "Clojure" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to clojure+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/clojure/5f2bd2a4-5c35-463b-9cb4-eecb9148fc89n%40googlegroups.com
>> <https://groups.google.com/d/msgid/clojure/5f2bd2a4-5c35-463b-9cb4-eecb9148fc89n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>

