Branch: refs/heads/main
Home: https://github.com/WebKit/WebKit
Commit: 6a685086c9911b31715b157db67cf81cfd87e091
https://github.com/WebKit/WebKit/commit/6a685086c9911b31715b157db67cf81cfd87e091
Author: Wenson Hsieh <[email protected]>
Date: 2026-06-11 (Thu, 11 Jun 2026)
Changed paths:
M Source/WebCore/page/text-extraction/TextExtraction.cpp
M Source/WebCore/page/text-extraction/TextExtractionTypes.h
M Source/WebKit/Shared/TextExtractionToStringConversion.cpp
M Source/WebKit/Shared/TextExtractionToStringConversion.h
M Source/WebKit/Shared/WebCoreArgumentCoders.serialization.in
M Source/WebKit/UIProcess/API/Cocoa/WKWebView.mm
M Source/WebKit/WebProcess/WebPage/WebFrame.cpp
M Tools/TestWebKitAPI/Tests/WebKit/WKWebView/TextExtractionTests.mm
Log Message:
-----------
[macOS Golden Gate] Notify Me: add support for PDFs
https://bugs.webkit.org/show_bug.cgi?id=316940
rdar://179406752
Reviewed by Abrar Rahman Protyasha.
Add basic support for extracting text representations from PDF documents as
markdown, HTML, JSON and text tree formats.
* Source/WebCore/page/text-extraction/TextExtraction.cpp:
(WebCore::TextExtraction::extractRecursive):
* Source/WebCore/page/text-extraction/TextExtractionTypes.h:
* Source/WebKit/Shared/TextExtractionToStringConversion.cpp:
(WebKit::formatPDFMarkdownForOutput):
For now, leave PDF content as plain text underneath a single root node when
extracting HTML, JSON and text tree formats. In the future, we could find more
elaborate ways to represent PDFs with higher fidelity (including links and
images).
* Source/WebKit/Shared/TextExtractionToStringConversion.h:
* Source/WebKit/Shared/WebCoreArgumentCoders.serialization.in:
* Source/WebKit/UIProcess/API/Cocoa/WKWebView.mm:
(-[WKWebView _extractDebugTextWithConfigurationWithoutUpdatingFilterRules:assertionScope:completionHandler:]):
* Source/WebKit/WebProcess/WebPage/WebFrame.cpp:
(WebKit::WebFrame::requestTextExtraction):
* Tools/TestWebKitAPI/Tests/WebKit/WKWebView/TextExtractionTests.mm:
(TestWebKitAPI::(TextExtractionTests, ExtractFromPDFAsMarkdown)):
(TestWebKitAPI::(TextExtractionTests, ExtractFromPDFAsTextTree)):
(TestWebKitAPI::(TextExtractionTests, ExtractFromPDFAsHTML)):
(TestWebKitAPI::(TextExtractionTests, ExtractFromPDFAsJSON)):
(TestWebKitAPI::(TextExtractionTests, ExtractFromPDFAsPlainText)):
Canonical link: https://commits.webkit.org/315072@main