You really need to debug ShouldContentExtractionIncludeIframe <https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/modules/content_extraction/document_chunker.cc;drc=1651676a30cd7abcd177975f7cd0e37bd945f663;bpv=1;bpt=1;l=61>
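
To make concrete what to look for while stepping through it, here is a toy model of a per-iframe gate during text collection. This is only a sketch: `ToyFrame`, `ShouldIncludeChildFrame`, and `CollectText` are hypothetical stand-ins rather than Blink types, and the same-origin rule is an assumption for illustration, not a statement of what document_chunker.cc actually checks.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a frame in the frame tree (not a Blink type).
struct ToyFrame {
  std::string origin;              // serialized origin of the frame's document
  std::string text;                // text owned directly by this frame
  std::vector<ToyFrame> children;  // nested iframes
};

// Hypothetical gate: include a child frame only when it shares the parent's
// origin. The real predicate may apply different or additional conditions.
bool ShouldIncludeChildFrame(const ToyFrame& parent, const ToyFrame& child) {
  return parent.origin == child.origin;
}

// Collects the frame's own text, then recurses into children that pass the
// gate. Rejected children are skipped together with their whole subtree.
std::string CollectText(const ToyFrame& frame) {
  std::string out = frame.text;
  for (const ToyFrame& child : frame.children) {
    if (!ShouldIncludeChildFrame(frame, child)) {
      continue;
    }
    out += "\n" + CollectText(child);
  }
  return out;
}
```

If the real gate returns true for an embedded tweet's frame on the pages you listed, that would explain the extra text, so a breakpoint on its return value for each iframe is a reasonable first step.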

On Tue, Nov 26, 2024 at 1:15 PM 'Salvador Guerrero Ramos' via blink-dev <blink-dev@chromium.org> wrote:

> Hi
>
> I've been working on a prototype that uses the Element.innerText
> <https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/editing/element_inner_text.cc;l=466;drc=277f4ab48eb85f7441f78aed191c31068ce89814>
> API to get text from a web page (I'm calling this API with InnerTextAgent
> <https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/modules/content_extraction/inner_text_agent.cc;l=56;drc=277f4ab48eb85f7441f78aed191c31068ce89814>).
> In some web pages the resulting text includes text from cross-origin
> iframes (e.g. embedded tweets). My expectation is that this would work
> similarly to the InnerText JS API
> <https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/innerText>,
> which does not return iframe text.
>
> I'm able to reproduce this on Desktop and Android builds of Chromium, but
> only on certain websites.
>
> Here's a couple of examples:
> https://www.sfgate.com/weather/article/sf-flood-warning-california-atmospheric-river-19937062.php
> https://www.si.com/nba/celtics/news/celtics-jayson-tatum-reveals-the-simple-reason-boston-took-down-undefeated-cavaliers
>
> Is this the right API to use for this scenario? I'd like to replicate the
> behavior of the JS API.
>
> --
> You received this message because you are subscribed to the Google Groups
> "blink-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to blink-dev+unsubscr...@chromium.org.
> To view this discussion visit
> https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CADBxXZE-fbBq1zRXJq%2Bk57RAVCbom2a%3DNLkgcMKKVss3ifbAhg%40mail.gmail.com