I'm new to Python and programming. Been learning it for 3 weeks now but have 
had lot of obstacles along the way. I found some of your insights very useful 
as a starter but I have come across many more complicated challenges that 
aren't very intuitive.

For example,I'm trying to scrap this web(via university library (fully access) 
so it is a proxy) using selenium (because it is very heavily java script 
driven). There is a button which allows user to navigate to the next page of 
company and my script go and find the elements of interest from each page write 
to a csv and then click to the next page and do it recursively. I have a couple 
of problems need some help with. Firstly the element that I'm really interested 
is only company website(which isn't always there) but when it is there the 
location of the element can change all the time(see 
http://pasteboard.co/2GOHkbAD.png and http://pasteboard.co/2GOK2NBT.png) 
depending on the number of elements in the parent level. I'm using 
driver.find_elements_by_xpath("//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position()
 = 1 or position() = 2 or position() = 3]")

hoping to capture all information(e.g. phone,email,website) and then do some 
cleansing later on However, it appears not all the web elements are captured 
using this method and write to csv from each page. Some pages were written to 
the file but some were missing. I couldn't figure it out why.

A second problem which is a more complicate and have been driving me nuts was 
the the DOM changes as a result of web content changes and elements are 
destroyed and/maybe being recreated after 
driver.find_element_by_id('detail-pagination-next-btn').click()

I have tried uncountable number of methods (e.g. explicit, implicit wait) but 
the stale error still persists as it seems to stays stale as long as it is 
staled.

Have anyone come up with a solution to this and what is the best way to deal 
with DOM tree changes.

Much appreciated for your help. My code is attached:


with open('C:/Python34/email.csv','w') as f:
z=csv.writer(f, delimiter='\t',lineterminator = '\n',)
while True:
        row = []
        for link in 
driver.find_elements_by_xpath("//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position()
 = 1 or position() = 2 or position() = 3]"):
            try:
                row.append(str(link.text))
                z.writerow(link.text)
                WebDriverWait(driver, 
50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]/span')))
                WebDriverWait(driver, 
50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]')))
                WebDriverWait(driver, 
50).until(EC.visibility_of_element_located((By.ID,'detail-pagination-next-btn')))
                WebDriverWait(driver, 
50).until(EC.element_to_be_clickable((By.ID,'detail-pagination-next-btn')))
                WebDriverWait(driver, 
50).until(EC.presence_of_all_elements_located((By.XPATH,"//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position()
 = 1 or position() = 2 or position() = 3]")))
                time.sleep(10)
                c=driver.find_element_by_id('detail-pagination-next-btn')
                WebDriverWait(driver, 
50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]/span')))
                WebDriverWait(driver, 
50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]')))
                WebDriverWait(driver, 
50).until(EC.visibility_of_element_located((By.ID,'detail-pagination-next-btn')))
                WebDriverWait(driver, 
50).until(EC.element_to_be_clickable((By.ID,'detail-pagination-next-btn')))
                WebDriverWait(driver, 
50).until(EC.presence_of_all_elements_located((By.XPATH,"//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position()
 = 1 or position() = 2 or position() = 3]")))
                c.click()
                time.sleep(10)
                continue
            except StaleElementReferenceException as e:
                c=driver.find_element_by_id('detail-pagination-next-btn')
                for link in 
driver.find_elements_by_xpath("//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position()
 = 1 or position() = 2 or position() = 3]"):
                    row.append(str(link.text))
                    z.writerow(link.text)
                    WebDriverWait(driver, 
50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]/span')))
                    WebDriverWait(driver, 
50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]')))
                    WebDriverWait(driver, 
50).until(EC.visibility_of_element_located((By.ID,'detail-pagination-next-btn')))
                    WebDriverWait(driver, 
50).until(EC.element_to_be_clickable((By.ID,'detail-pagination-next-btn')))
                    WebDriverWait(driver, 
50).until(EC.presence_of_all_elements_located((By.XPATH,"//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position()
 = 1 or position() = 2 or position() = 3]")))
                    time.sleep(10)
                    c=driver.find_element_by_id('detail-pagination-next-btn')
                    WebDriverWait(driver, 
50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]/span')))
                    WebDriverWait(driver, 
50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]')))
                    WebDriverWait(driver, 
50).until(EC.visibility_of_element_located((By.ID,'detail-pagination-next-btn')))
                    WebDriverWait(driver, 
50).until(EC.element_to_be_clickable((By.ID,'detail-pagination-next-btn')))
                    WebDriverWait(driver, 
50).until(EC.presence_of_all_elements_located((By.XPATH,"//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position()
 = 1 or position() = 2 or position() = 3]")))
                    c.click()
                    time.sleep(10)


much appreciated
Iverson
-- 
https://mail.python.org/mailman/listinfo/python-list

Reply via email to