Veek. M wrote:

> Chris Angelico wrote:
>
>> On Sun, Jan 31, 2016 at 3:58 PM, Veek. M <vek.m1...@gmail.com> wrote:
>>> I'm parsing html and i'm doing:
>>>
>>> x = root.find_class(...
>>> y = root.find_class(..
>>> z = root.find_class(..
>>>
>>> all 3 are likely to fail so typically i'd have to stick it in a try.
>>> This is a huge pain for obvious reasons.
>>>
>>> try:
>>>     ....
>>> except something:
>>>     x = 'default_1'
>>> (repeat 3 times)
>>>
>>> Is there some other nice way to wrap this stuff up?
>>
>> I'm not sure what you're using to parse HTML here (there are several
>> libraries for doing that), but the first thing I'd look for is an
>> option to have it return a default if it doesn't find something - even
>> if that default has to be (say) None.
>>
>> But failing that, you can always write your own wrapper:
>>
>> def find_class(root, ...):
>>     try:
>>         return root.find_class(...)
>>     except something:
>>         return 'default_1'
>>
>> Or have the default as a parameter, if it's different for the different
>> ones.
>>
>> ChrisA
>
> I'm using lxml.html
>
> def parse_page(self, root):
>     for li_item in root.xpath('//li[re:test(@id, "^item[a-z0-9]+$")]',
>             namespaces={'re': "http://exslt.org/regular-expressions"}):
>         description = li_item.find_class('vip')[0].text_content()
>         link = li_item.find_class('vip')[0].get('href')
>         price_dollar = li_item.find_class('lvprice prc')[0].xpath('span')[0].text
>         bids = li_item.find_class('lvformat')[0].xpath('span')[0].text
>
>         tme_time = li_item.find_class('tme')[0].xpath('span')[0].get('timems')
>         if tme_time:
>             time_hrs = int(tme_time)/1000 - time.time()
>         else:
>             time_hrs = 'No time found'
>
>         shipping = li_item.find_class('lvshipping')[0].xpath('span/span/span')[0].text_content()
>
>         print('{} {} {} {} {}'.format(link, price_dollar, time_hrs,
>                                       shipping, bids))
>         print('-----------------------------------------------------------------')

Someone suggested I refactor the find_class/xpath calls into wrapper functions, but I tried it and it didn't look all that great. Just give me a general idea of how to deal with messy crud like this.
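
For reference, the wrapper idea could be sketched roughly as below. This is only an illustration under a couple of assumptions: that the failures in question are an IndexError from indexing an empty find_class()/xpath() result (or an AttributeError from a None along the chain), and that one default per lookup is enough. The helper name `safe` is made up here and is not part of lxml. Plain lists stand in for the lxml results so the snippet runs on its own.

```python
def safe(getter, default=None, exceptions=(IndexError, AttributeError)):
    """Call getter() and return its result, or `default` if the lookup fails.

    `safe` is a hypothetical helper for illustration, not an lxml function.
    """
    try:
        return getter()
    except exceptions:
        return default


# Stand-in data so the sketch runs without lxml installed:
no_matches = []            # like find_class() when nothing matches
one_match = ['$12.50']     # like a lookup that did find something

price_missing = safe(lambda: no_matches[0], default='No price found')
price_found = safe(lambda: one_match[0], default='No price found')

print(price_missing)
print(price_found)
```

With lxml the same call shape would wrap each chained lookup in a lambda, for example `safe(lambda: li_item.find_class('lvformat')[0].xpath('span')[0].text, 'No bids found')`, so each field stays a single line and the try/except lives in one place.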