Hi, my two cents.

Have a look at Scrapy for scraping. Selenium is a very good tool to learn, but it is mainly for automating UAT of GUIs. Scrapy will do the scraping for you, and you can automate it via cron. It's the same stuff I am doing at the moment.

Hope this helps.
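On the tree-walking question in the quoted message below, here is a minimal sketch of a recursive walk that uses only `.children`, with no `find_all` and no selectors. The markup, the `id` and `class` names, and the function name `walk` are all made up for illustration; the real Audible table will look different, so treat this as a starting point rather than a working scraper.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def walk(node, depth=0):
    """Yield (depth, tag) for every tag below node, in document order."""
    for child in node.children:
        # .children also yields text nodes (NavigableString); skip those.
        if isinstance(child, Tag):
            yield depth, child
            # Recurse first: the child's own children are visited before
            # the loop moves on to the child's next sibling.
            yield from walk(child, depth + 1)


# Hypothetical markup standing in for the library table.
html = """
<table id="library">
  <tr class="book"><td class="title">Book One</td><td class="author">A. Writer</td></tr>
  <tr class="book"><td class="title">Book Two</td><td class="author">B. Author</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
visited = list(walk(soup))
for depth, tag in visited:
    # Indent by depth so the tree shape is visible, then show the attributes.
    print("  " * depth + tag.name, tag.attrs)
```

Looping over `.children` already covers every sibling at one level, so there is no need for a separate sibling check: the function only has to call itself on each child, and the recursion stops by itself when a tag has no tag children.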
On Sun, Jan 27, 2019, 8:34 AM <mhysnm1...@gmail.com wrote:

> All,
>
> Goal of new project.
>
> I want to scrape all the books I have purchased from Audible.com.
> Eventually I want to export this as a CSV file or maybe JSON. I have not
> got that far yet. The reasoning behind this is to learn Selenium for my
> work and get the list of books I have purchased, killing two birds with
> one stone. The work focus is to see if Selenium can automate some of the
> testing I have to do and collect useful information from the web page for
> my reports. This part of the goal is in the future, as I need to build my
> Python skills up.
>
> Thus far, I have been successful in logging into Audible and showing the
> library of books. I am able to store the table of books and want to use
> BeautifulSoup to extract the relevant information. Information I will want
> from the table is:
>
> * Author
> * Title
> * Date purchased
> * Length
> * Is the book in a series (there is a link for this)
> * Link to the page storing the publish details.
> * Download link
>
> Hopefully this has given you enough information on what I am trying to
> achieve at this stage. As I learn more about what I am doing, I am adding
> possible extra tasks, such as verifying whether I have already downloaded
> the book via iTunes.
>
> Learning goals:
>
> Using the BeautifulSoup structure that I have extracted from the page
> source for the table, I want to navigate the tree structure. BeautifulSoup
> provides children, siblings and parents methods. This is where I get stuck
> with programming logic. BeautifulSoup does provide a find_all method plus
> selectors, which I do not want to use for this exercise, as I want to learn
> how to walk a tree starting at the root and visiting each node of the tree.
> Then I can look at the attributes for the tag as I go. I believe I have to
> set up a recursive loop or function call. I am not sure how to do this.
> Pseudo code:
>
> Build table structure
> Start at the root node.
> Check to see if there are any children.
> Pass first child to function.
> Print attributes for tag at this level
> In function, check for any sibling nodes.
> If they exist, call function again
> If no siblings, then start at first sibling and get its child.
>
> This is where I get stuck. Each sibling can have children, and they can
> have siblings. So how do I ensure I visit each node in the tree?
>
> I would be grateful for any tips or tricks for this, as I could use this
> in other situations.
>
> Sean
>
> _______________________________________________
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor