https://bugzilla.redhat.com/show_bug.cgi?id=2319926

            Bug ID: 2319926
           Summary: Review-request: python-html-text - Extract text from
                    HTML
           Product: Fedora
           Version: rawhide
                OS: Linux
            Status: NEW
         Component: Package Review
          Severity: medium
          Assignee: [email protected]
          Reporter: [email protected]
        QA Contact: [email protected]
                CC: [email protected]
  Target Milestone: ---
    Classification: Fedora



spec:
https://download.copr.fedorainfracloud.org/results/fed500/gourmand/fedora-rawhide-x86_64/08156160-python-html-text/python-html-text.spec
srpm:
https://download.copr.fedorainfracloud.org/results/fed500/gourmand/fedora-rawhide-x86_64/08156160-python-html-text/python-html-text-0.6.2-1.fc42.src.rpm

description:
How is html_text different from .xpath('//text()') from LXML
or .get_text() from Beautiful Soup?

- Text extracted with html_text does not contain inline styles,
javascript, comments and other text that is not normally visible
to users;

- html_text normalizes whitespace, but in a way smarter than
.xpath('normalize-space()'), adding spaces around inline elements
(which are often used as block elements in HTML markup), and trying
to avoid adding extra spaces for punctuation;

- html_text can add newlines (e.g. after headers or paragraphs), so
that the output text looks more like how it is rendered in browsers.
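
To make the three points above concrete, here is a minimal sketch using only
the Python standard library. It is not html_text's implementation (which is
built on lxml); the tag sets and the names VisibleTextParser and visible_text
are assumptions chosen for illustration. It drops script/style content and
comments, collapses whitespace, and breaks lines at block-level elements.

```python
import re
from html.parser import HTMLParser

# Assumption: illustrative tag sets, not the ones html_text actually uses.
HIDDEN_TAGS = {"script", "style", "noscript"}
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "br"}


class VisibleTextParser(HTMLParser):
    """Hypothetical helper: collects text a reader would normally see."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.hidden_depth = 0  # > 0 while inside a hidden element

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self.hidden_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS:
            if self.hidden_depth:
                self.hidden_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Keep text only when it is not inside a hidden element.
        if not self.hidden_depth:
            self.parts.append(data)

    # Comments reach handle_comment, a no-op by default, so they are dropped.


def visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse whitespace runs within each line and drop empty lines.
    lines = [re.sub(r"\s+", " ", line).strip() for line in text.split("\n")]
    return "\n".join(line for line in lines if line)


sample = (
    "<h1>Title</h1><style>p {color: red}</style>"
    "<p>Hello   <b>world</b>!</p><!-- hidden --><script>var x = 1;</script>"
)
print(visible_text(sample))
```

The packaged library itself is normally used through
html_text.extract_text(html); the sketch only mirrors the behaviour described
above and does not attempt the punctuation-aware spacing.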

fas: fed500

Comments:
The pytest7 warning appears to be spurious, since pytest7 is not installed.

Reproducible: Always


