On Sun, Jan 31, 2016 at 8:21 PM, Steven D'Aprano <st...@pearwood.info> wrote: > Hmmm. Well, I've never used lxml, but the first obvious problem I see is > that your lines: > > description = li_item.find_class('vip')[0].text_content() > > link = li_item.find_class('vip')[0].get('href') > > price_dollar = li_item.find_class('lvprice prc')[0].xpath('span')[0].text > > bids = li_item.find_class('lvformat')[0].xpath('span')[0].text > > > look suspiciously like a violation of the Liskov Substitution Principle. > ("Talk to your dog, not to the dog's legs!") A long series of chained dot > accesses (or equivalent getitem, call, getitem, dot, etc) is a code-smell > suggesting that you are trying to control your dog's individual legs, > instead of just calling the dog.
(Isn't that the Law of Demeter, not LSP?) The principle of "one dot maximum" is fine when dots represent a form of ownership. The dog owns his legs; you own (or, have a relationship with) the dog. But in this case, the depth of subscripting is more about the inherent depth of the document, and it's more of a data thing than a code one. Imagine taking a large and complex JSON blob and loading it into a Python structure with nested lists and dicts - it wouldn't violate software design principles to call up info["records"][3]["name"], even though that's three indirections in a row. Parsing HTML is even worse, as there's generally going to be numerous levels of structure that have no semantic meaning (they're there for layout) - so instead of three levels, you might easily have a dozen. ChrisA -- https://mail.python.org/mailman/listinfo/python-list