Branch: refs/heads/main
  Home:   https://github.com/WebKit/WebKit
  Commit: 6a685086c9911b31715b157db67cf81cfd87e091
      
https://github.com/WebKit/WebKit/commit/6a685086c9911b31715b157db67cf81cfd87e091
  Author: Wenson Hsieh <[email protected]>
  Date:   2026-06-11 (Thu, 11 Jun 2026)

  Changed paths:
    M Source/WebCore/page/text-extraction/TextExtraction.cpp
    M Source/WebCore/page/text-extraction/TextExtractionTypes.h
    M Source/WebKit/Shared/TextExtractionToStringConversion.cpp
    M Source/WebKit/Shared/TextExtractionToStringConversion.h
    M Source/WebKit/Shared/WebCoreArgumentCoders.serialization.in
    M Source/WebKit/UIProcess/API/Cocoa/WKWebView.mm
    M Source/WebKit/WebProcess/WebPage/WebFrame.cpp
    M Tools/TestWebKitAPI/Tests/WebKit/WKWebView/TextExtractionTests.mm

  Log Message:
  -----------
  [macOS Golden Gate] Notify Me: add support for PDFs
https://bugs.webkit.org/show_bug.cgi?id=316940
rdar://179406752

Reviewed by Abrar Rahman Protyasha.

Add basic support for extracting text representations from PDF documents as markdown, HTML, JSON and
text tree formats.

* Source/WebCore/page/text-extraction/TextExtraction.cpp:
(WebCore::TextExtraction::extractRecursive):
* Source/WebCore/page/text-extraction/TextExtractionTypes.h:
* Source/WebKit/Shared/TextExtractionToStringConversion.cpp:
(WebKit::formatPDFMarkdownForOutput):

For now, leave PDF content as plain text underneath a single root node when extracting HTML, JSON
and text tree. In the future, we could find more elaborate ways to represent PDFs in better fidelity
(including links and images).

* Source/WebKit/Shared/TextExtractionToStringConversion.h:
* Source/WebKit/Shared/WebCoreArgumentCoders.serialization.in:
* Source/WebKit/UIProcess/API/Cocoa/WKWebView.mm:
(-[WKWebView _extractDebugTextWithConfigurationWithoutUpdatingFilterRules:assertionScope:completionHandler:]):
* Source/WebKit/WebProcess/WebPage/WebFrame.cpp:
(WebKit::WebFrame::requestTextExtraction):
* Tools/TestWebKitAPI/Tests/WebKit/WKWebView/TextExtractionTests.mm:
(TestWebKitAPI::(TextExtractionTests, ExtractFromPDFAsMarkdown)):
(TestWebKitAPI::(TextExtractionTests, ExtractFromPDFAsTextTree)):
(TestWebKitAPI::(TextExtractionTests, ExtractFromPDFAsHTML)):
(TestWebKitAPI::(TextExtractionTests, ExtractFromPDFAsJSON)):
(TestWebKitAPI::(TextExtractionTests, ExtractFromPDFAsPlainText)):

Canonical link: https://commits.webkit.org/315072@main


