+mcrouse@

On Tuesday, November 26, 2024 at 1:46:23 PM UTC-8 Salvador Guerrero Ramos 
wrote:

> I added log statements to that method as well as its caller, 
> InnerTextBuilder::Build 
> <https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/modules/content_extraction/inner_text_builder.cc;l=36;drc=1651676a30cd7abcd177975f7cd0e37bd945f663>.
> Build() gets the innerText of the HTMLElement it receives, then iterates 
> through its child_iframes and calls ShouldContentExtractionIncludeIframe on 
> each one. According to the logs, the innerText from the argument already 
> contains the iframe text, even before iterating through the iframes.
>
> On Android, ShouldContentExtractionIncludeIframe gets called with 
> third-party iframes, and it correctly determines that the origins are 
> different, so it returns false. On desktop, 
> ShouldContentExtractionIncludeIframe only gets called on about:blank frames.
>
> --Salvador
>
> On Tue, Nov 26, 2024 at 10:23 AM Dave Tapuska <dtap...@chromium.org> 
> wrote:
>
>> You really need to debug ShouldContentExtractionIncludeIframe 
>> <https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/modules/content_extraction/document_chunker.cc;drc=1651676a30cd7abcd177975f7cd0e37bd945f663;bpv=1;bpt=1;l=61>
>>
>> On Tue, Nov 26, 2024 at 1:15 PM 'Salvador Guerrero Ramos' via blink-dev <
>> blin...@chromium.org> wrote:
>>
>>> Hi
>>>
>>> I've been working on a prototype that uses the Element.innerText 
>>> <https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/editing/element_inner_text.cc;l=466;drc=277f4ab48eb85f7441f78aed191c31068ce89814>
>>> API to get text from a web page (I'm calling this API with 
>>> InnerTextAgent 
>>> <https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/modules/content_extraction/inner_text_agent.cc;l=56;drc=277f4ab48eb85f7441f78aed191c31068ce89814>).
>>> On some web pages, the resulting text includes text from cross-origin 
>>> iframes (e.g. embedded tweets). My expectation is that this would work 
>>> similarly to the innerText JS API 
>>> <https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/innerText>, 
>>> which does not return iframe text.
>>>
>>> I'm able to reproduce this on Desktop and Android builds of Chromium, 
>>> but only on certain websites.
>>>
>>> Here are a couple of examples:
>>> https://www.sfgate.com/weather/article/sf-flood-warning-california-atmospheric-river-19937062.php
>>>
>>> https://www.si.com/nba/celtics/news/celtics-jayson-tatum-reveals-the-simple-reason-boston-took-down-undefeated-cavaliers
>>>
>>> Is this the right API to use for this scenario? I'd like to replicate 
>>> the behavior of the JS API.
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "blink-dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to blink-dev+...@chromium.org.
>>> To view this discussion visit 
>>> https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CADBxXZE-fbBq1zRXJq%2Bk57RAVCbom2a%3DNLkgcMKKVss3ifbAhg%40mail.gmail.com
>>>  
>>> <https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CADBxXZE-fbBq1zRXJq%2Bk57RAVCbom2a%3DNLkgcMKKVss3ifbAhg%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>
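
For readers following the thread, here is a minimal sketch of the flow described in the quoted messages: take the text of an element, then walk its child iframes and include only those that pass a same-origin check. It is a standalone model, not Blink code. FakeFrame, ShouldIncludeIframe, BuildText, MakeExamplePage, CheckExample, and the example.* origins are all hypothetical names made up for illustration, and the plain string comparison only stands in for whatever ShouldContentExtractionIncludeIframe actually checks. about:blank handling is not modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a frame; this is NOT a Blink type.
struct FakeFrame {
  std::string origin;                    // e.g. "https://news.example"
  std::string text;                      // text owned by this frame only
  std::vector<FakeFrame> child_iframes;  // directly nested iframes
};

// Assumption: models only the "origins are different, so return false"
// decision reported in the thread, using a plain string comparison.
bool ShouldIncludeIframe(const FakeFrame& parent, const FakeFrame& iframe) {
  return parent.origin == iframe.origin;
}

// Starts from the frame's own text, then appends the text of each child
// iframe that passes the check, recursing into accepted children.
std::string BuildText(const FakeFrame& frame) {
  std::string out = frame.text;
  for (const FakeFrame& child : frame.child_iframes) {
    if (!ShouldIncludeIframe(frame, child)) {
      continue;  // cross-origin iframe: skipped
    }
    out += "\n";
    out += BuildText(child);
  }
  return out;
}

// Builds a page with one cross-origin and one same-origin iframe.
FakeFrame MakeExamplePage() {
  return FakeFrame{
      "https://news.example",
      "article body",
      {FakeFrame{"https://social.example", "embedded post", {}},
       FakeFrame{"https://news.example", "same-origin widget", {}}}};
}

// Usage example: the cross-origin iframe's text is left out.
void CheckExample() {
  assert(BuildText(MakeExamplePage()) == "article body\nsame-origin widget");
}
```

The sketch assumes the starting text holds only the frame's own content. The logs reported above suggest the opposite in the failing cases: the innerText passed in already contains the iframe text, so a filter applied during the iframe loop would run too late to exclude it.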

