I have done some web scraping before, and I think you need a slightly more targeted way to get these titles. Look at which classes identify the cards that contain the movie titles, and then pull the heading out of those. Get the divs into a list (in my case the card div looked something like <div class="jsx-2692754980 listicle-item-image">) and then pull the h3 tag out of each one. One thing to note: React-heavy single-page web apps tend to be hard to scrape, because BeautifulSoup only parses the HTML it is handed and can't run the JavaScript that renders those JSX components.
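Roughly something like the sketch below (untested; 'listicle-item-image' is just the class I saw on my own page, so inspect the Empire page in your browser's dev tools and substitute whatever class actually wraps each movie entry):

import requests
from bs4 import BeautifulSoup

URL = 'https://www.empireonline.com/movies/features/best-movies-2'

# Fetch the raw HTML; a browser-ish User-Agent sometimes helps.
html = requests.get(URL, headers={'User-Agent': 'Mozilla/5.0'}).text
soup = BeautifulSoup(html, 'html.parser')

titles = []
# NOTE: the class name below is an example from my page, not necessarily
# the one on the Empire site -- check dev tools and adjust it.
for card in soup.find_all('div', class_='listicle-item-image'):
    h3 = card.find('h3')          # the heading inside the card, if any
    if h3:
        titles.append(h3.get_text(strip=True))

print(titles)

If the list comes back empty, the titles are most likely rendered by JavaScript after the initial page load, so requests/BeautifulSoup never see them -- that's the point where you need Selenium (or the site's underlying JSON API, if you can find it in the network tab).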
On Thu, Feb 18, 2021 at 9:09 PM Bischoop <bisch...@vimart.net> wrote:
>
> I'm learning scraping at the moment and would like to scrape the movie titles
> from https://www.empireonline.com/movies/features/best-movies-2 .
> In the course I was following, I was supposed to do it with bs4:
>
> titles = soup.find_all(name = 'h3', class_ = 'title')
>
> but it seems the site has changed since then, and now the class is jsx-2692754980:
>
> <h3 class="jsx-2692754980">100) Stand By Me</h3>
>
> Either way, if I try to get those titles by name and class, my list is empty:
>
> titles = soup.find_all(name = 'h3', class_ = 'jsx-2692754980')
>
> I also tried Selenium and managed to get the titles with:
>
> driver.get('https://www.empireonline.com/movies/features/best-movies-2')
> #driver.find_element_by_xpath('/html/body/div/div[3]/div[5]/button[2]').click()
> titles = driver.find_elements_by_css_selector("h3.jsx-2692754980")
>
> tit = []
> for e in titles:
>     tit.append(e.text)
>
> print(tit)
>
> But in Chrome I get a popup asking me to accept cookies, and I need to
> click to accept them.
>
> Is there someone here who knows how I can get those titles with BeautifulSoup,
> and how to deal with the cookies if using Selenium?
>
> --
> Thanks
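For the cookie popup in Selenium specifically: instead of a fixed sleep, you can wait for the accept button to become clickable, click it, and then collect the titles. A rough sketch (untested against the current page; the XPath for the accept button is the one from your own message and may well have changed, and this uses the newer Selenium 4 find_elements/By API rather than find_elements_by_css_selector):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.empireonline.com/movies/features/best-movies-2')

# Wait up to 10 seconds for the cookie banner's accept button and click it.
# The XPath is taken from your message and may need updating.
try:
    accept = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable(
            (By.XPATH, '/html/body/div/div[3]/div[5]/button[2]')))
    accept.click()
except Exception:
    pass  # the banner doesn't always appear; carry on without it

titles = driver.find_elements(By.CSS_SELECTOR, 'h3.jsx-2692754980')
print([t.text for t in titles])

driver.quit()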