Assume I've been cursed to scrape HTML. If I convert the pages to Hickory, I
end up with a big mass of data that, sadly, has few "class" or "id"
attributes that would let me easily pick out the data I need. However, for
the most part, the only thing I really need from this page is the CVE
identifiers, which look like this:
CVE-2021-40539
I'm thinking I might run a regex against the plain text of the page, but
I'm also curious: is it common to take something like Hiccup or Hickory
data, or a zipper, and run a regex through it? If so, how is that done?
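To make the two options concrete, here is a rough sketch of what I imagine
each would look like. It is not a tested recipe: the function names are my
own and not part of Hickory, and I am assuming the tree is the usual Hickory
shape (maps with a :content vector whose leaves are strings), as produced by
something like (hickory.core/as-hickory (hickory.core/parse html)).

```clojure
;; Sketch only: all names here are hypothetical helpers, not Hickory API.

(def cve-pattern
  "CVE IDs: the literal CVE, a four-digit year, then four or more digits."
  #"CVE-\d{4}-\d{4,}")

;; Option 1: regex over the plain text of the page.
(defn cves-in-text
  "Returns the distinct CVE IDs in string s, in order of first appearance."
  [s]
  (distinct (re-seq cve-pattern s)))

;; Option 2: walk the Hickory tree and apply the same regex to each string leaf.
(defn text-nodes
  "Returns every string leaf of a Hickory-style tree, depth first."
  [node]
  (filter string?
          (tree-seq #(or (map? %) (sequential? %))      ; maps and vectors branch
                    #(if (map? %) (:content %) (seq %)) ; a map's children are its :content
                    node)))

(defn cves-in-tree
  "Returns the distinct CVE IDs found in any string leaf of the tree."
  [node]
  (distinct (mapcat #(re-seq cve-pattern %) (text-nodes node))))
```

One limitation of the tree version as sketched: it matches within each string
leaf separately, so an ID split across two elements (say, partly inside an
<a> tag) would be missed, whereas the plain-text version would still see it.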
A small part of the data looks like this:
:content
[{:type :element,
  :attrs {:class "tip-intro", :style "font-size: 15px;"},
  :tag :p,
  :content
  [{:type :element,
    :attrs nil,
    :tag :em,
    :content
    ["This Joint Cybersecurity Advisory uses the MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK®) framework, Version 8. See the "
     {:type :element,
      :attrs
      {:href "https://attack.mitre.org/versions/v9/techniques/enterprise/"},
      :tag :a,
      :content ["ATT&CK for Enterprise"]}
     " for referenced threat actor tactics and for techniques."]}]}
 "\n\n"
 {:type :element,
  :attrs nil,
  :tag :p,
  :content
  ["This joint advisory is the result of analytic efforts between the Federal Bureau of Investigation (FBI), United States Coast Guard Cyber Command (CGCYBER), and the Cybersecurity and Infrastructure Security Agency (CISA) to highlight the cyber threat associated with active exploitation of a newly identified vulnerability (CVE-2021-40539) in ManageEngine ADSelfService Plus—a self-service password management and single sign-on solution."]}
 "\n\n"
 {:type :element,
  :attrs nil,
  :tag :p,
  :content
  ["CVE-2021-40539, rated critical by the Common Vulnerability Scoring System (CVSS), is an authentication bypass vulnerability affecting representational state transfer (REST) application programming interface (API) URLs that could enable remote code execution. The FBI, CISA, and CGCYBER assess that advanced persistent threat (APT) cyber actors are likely among those exploiting the vulnerability. The exploitation of ManageEngine ADSelfService Plus poses a serious risk to critical infrastructure companies, U.S.-cleared defense contractors, academic institutions, and other entities that use the software. Successful exploitation of the vulnerability allows an attacker to place webshells, which enable the adversary to conduct post-exploitation activities, such as compromising administrator credentials, conducting lateral movement, and exfiltrating registry hives and Active Directory files."]}
 "\n\n"
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
To view this discussion on the web visit
https://groups.google.com/d/msgid/clojure/5f2bd2a4-5c35-463b-9cb4-eecb9148fc89n%40googlegroups.com.