Veek. M wrote:

> Chris Angelico wrote:
>
>> On Sun, Jan 31, 2016 at 3:58 PM, Veek. M <vek.m1...@gmail.com> wrote:
>>> I'm parsing html and i'm doing:
>>>
>>> x = root.find_class(...
>>> y = root.find_class(..
>>> z = root.find_class(..
>>>
>>> all 3 are likely to fail so typically i'd have to stick it in a try.
>>> This is a huge pain for obvious reasons.
>>>
>>> try:
>>>     ....
>>> except something:
>>>     x = 'default_1'
>>> (repeat 3 times)
>>>
>>> Is there some other nice way to wrap this stuff up?
>>
>> I'm not sure what you're using to parse HTML here (there are several
>> libraries for doing that), but the first thing I'd look for is an
>> option to have it return a default if it doesn't find something - even
>> if that default has to be (say) None.
>>
>> But failing that, you can always write your own wrapper:
>>
>> def find_class(root, ...):
>>     try:
>>         return root.find_class(...)
>>     except something:
>>         return 'default_1'
>>
>> Or have the default as a parameter, if it's different for the different
>> ones.
>>
>> ChrisA
>
> I'm using lxml.html
>
> def parse_page(self, root):
>     for li_item in root.xpath('//li[re:test(@id, "^item[a-z0-9]+$")]',
>             namespaces={'re': "http://exslt.org/regular-expressions"}):
>         description = li_item.find_class('vip')[0].text_content()
>         link = li_item.find_class('vip')[0].get('href')
>         price_dollar = li_item.find_class('lvprice prc')[0].xpath('span')[0].text
>         bids = li_item.find_class('lvformat')[0].xpath('span')[0].text
>
>         tme_time = li_item.find_class('tme')[0].xpath('span')[0].get('timems')
>         if tme_time:
>             time_hrs = int(tme_time)/1000 - time.time()
>         else:
>             time_hrs = 'No time found'
>
>         shipping = li_item.find_class('lvshipping')[0].xpath('span/span/span')[0].text_content()
>
>         print('{} {} {} {} {}'.format(link, price_dollar, time_hrs,
>                                       shipping, bids))
>         print('-----------------------------------------------------------------')

When you use XPath instead of the chained function calls, your initial

> Pass the statement as a string to a try function?

idea works out naturally:

import time

def parse_page(self, root):
    def get_xpath(path, default="<not available>"):
        # li_item is looked up when get_xpath() is called, so this
        # always queries the current item of the loop below.
        result = li_item.xpath(path)
        if result:
            # Join all matches (text nodes or attribute values).
            return " ".join(part.strip() for part in result)
        # An empty result list means nothing matched.
        return default

    for li_item in root.xpath(
            '//li[re:test(@id, "^item[a-z0-9]+$")]',
            namespaces={'re': "http://exslt.org/regular-expressions"}):
        description = get_xpath("*[@class='vip']//text()")
        link = get_xpath("*[@class='vip']/@href")
        price = get_xpath("*[@class='lvprice prc']/span/text()")
        bids = get_xpath("*[@class='lvformat']/span/text()")

        tme_time = get_xpath("*[@class='tme']/span/@timems", None)
        if tme_time is not None:
            time_hrs = int(tme_time)/1000 - time.time()
        else:
            time_hrs = "No time found"

        shipping = get_xpath(
            "*[@class='lvshipping']/span/span/span//text()")
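
If you would rather keep the chained calls, the wrapper idea from earlier in the thread also works when you pass a callable instead of a string. What follows is only a rough sketch: attempt() and its exceptions parameter are names I made up for illustration, and a plain dict stands in for the parsed item, so it runs without lxml.

```python
def attempt(func, default, exceptions=(IndexError, KeyError, AttributeError)):
    """Call func() and return its result, or default if it raises
    one of the listed exceptions. Hypothetical helper, not part of lxml."""
    try:
        return func()
    except exceptions:
        return default


# Stand-in for a parsed item: class name -> list of matches,
# loosely mimicking find_class() returning a list of elements.
item = {"vip": ["Some description"], "lvformat": []}

description = attempt(lambda: item["vip"][0], "<not available>")
bids = attempt(lambda: item["lvformat"][0], "<not available>")      # empty list
shipping = attempt(lambda: item["lvshipping"][0], "<not available>")  # missing key

print(description, bids, shipping)
```

The lambda delays the lookup until it runs inside the try, which is what the string was meant to achieve. Only the listed exception types are swallowed, so an unrelated bug still raises as usual.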