Thank you, everyone.
On Wednesday, February 2, 2022 at 3:22:53 PM UTC-5 [email protected]
wrote:
> Assume I've been cursed to scrape HTML. If I convert the pages to Hickory
> I end up with a big mass of data which, sadly, lacks many "class" or "id"s
> that would let me easily pick out the data I need. However, for the most
> part, the only thing I really need off this page is the CVEs, which look
> like this:
>
> CVE-2021-40539
>
> I'm thinking I might write regex against the plain text of the page, but
> I'm also curious, is it common to take something like Hiccup or Hickory or
> a zipper and run regex through it? If yes, how is that done?
>
> A small part of the data looks like this:
>
> :content
> [{:type :element,
> :attrs
> {:class "tip-intro", :style "font-size: 15px;"},
> :tag :p,
> :content
> [{:type :element,
> :attrs nil,
> :tag :em,
> :content
> ["This Joint Cybersecurity Advisory uses the MITRE
> Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK®) framework,
> Version 8. See the "
> {:type :element,
> :attrs
> {:href "https://attack.mitre.org/versions/v9/techniques/enterprise/"},
> :tag :a,
> :content ["ATT&CK for Enterprise"]}
> " for referenced threat actor tactics and for
> techniques."]}]}
> "\n\n"
> {:type :element,
> :attrs nil,
> :tag :p,
> :content
> ["This joint advisory is the result of analytic efforts
> between the Federal Bureau of Investigation (FBI), United States Coast
> Guard Cyber Command (CGCYBER), and the Cybersecurity and Infrastructure
> Security Agency (CISA) to highlight the cyber threat associated with active
> exploitation of a newly identified vulnerability (CVE-2021-40539) in
> ManageEngine ADSelfService Plus—a self-service password management and
> single sign-on solution."]}
> "\n\n"
> {:type :element,
> :attrs nil,
> :tag :p,
> :content
> ["CVE-2021-40539, rated critical by the Common
> Vulnerability Scoring System (CVSS), is an authentication bypass
> vulnerability affecting representational state transfer (REST) application
> programming interface (API) URLs that could enable remote code execution.
> The FBI, CISA, and CGCYBER assess that advanced persistent threat (APT)
> cyber actors are likely among those exploiting the vulnerability. The
> exploitation of ManageEngine ADSelfService Plus poses a serious risk to
> critical infrastructure companies, U.S.-cleared defense contractors,
> academic institutions, and other entities that use the software. Successful
> exploitation of the vulnerability allows an attacker to place webshells,
> which enable the adversary to conduct post-exploitation activities, such as
> compromising administrator credentials, conducting lateral movement, and
> exfiltrating registry hives and Active Directory files."]}
> "\n\n"
>
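
For the archive, here is a minimal sketch of one way to run a regex through
Hickory-style data like the excerpt quoted above. It assumes the page is
already a nested map whose children live under :content, and it uses only
clojure.core. The names page, cve-pattern, text-nodes and find-cves are my
own, hypothetical ones, and the sample data is trimmed down from the quoted
excerpt rather than being the real page.

```clojure
;; Hypothetical, trimmed sample shaped like the quoted Hickory excerpt.
(def page
  {:type :element, :attrs nil, :tag :div,
   :content
   [{:type :element, :attrs nil, :tag :p,
     :content ["a newly identified vulnerability (CVE-2021-40539) in ManageEngine"]}
    "\n\n"
    {:type :element, :attrs nil, :tag :p,
     :content ["CVE-2021-40539, rated critical by the "
               {:type :element,
                :attrs {:href "https://attack.mitre.org/versions/v9/techniques/enterprise/"},
                :tag :a,
                :content ["ATT&CK for Enterprise"]}
               ", is an authentication bypass vulnerability."]}]})

;; CVE IDs are "CVE-", a four-digit year, then four or more digits.
(def cve-pattern #"CVE-\d{4}-\d{4,}")

(defn text-nodes
  "Returns every string reachable through :content, in document order.
  Attribute values (hrefs, styles) are deliberately not visited."
  [tree]
  (filter string? (tree-seq map? :content tree)))

(defn find-cves
  "Returns the distinct CVE IDs found in the text nodes of tree."
  [tree]
  (->> (text-nodes tree)
       (mapcat #(re-seq cve-pattern %))
       distinct))

(find-cves page)
;; => ("CVE-2021-40539")
```

This matches one text node at a time, so an ID that the markup splits across
two nodes would be missed. If that matters, joining the result of text-nodes
with clojure.string/join before calling re-seq is one way around it, at the
cost of possibly matching across element boundaries.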
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/clojure/388183d2-7cf6-4b58-900f-aa0d0074cd91n%40googlegroups.com.