On Fri, Sep 2, 2011 at 10:59 AM, Travis Parks <jehugalea...@gmail.com> wrote: > Hello: > > I am working on an algorithms library. It provides LINQ like > functionality to Python iterators. Eventually, I plan on having > feaures that work against sequences and mappings. > > I have the code up at http://code.google.com/p/py-compass. > > This is my first project in Python, so I'd like some feedback. I want > to know if I am following conventions (overall style and quality of > code).
Sure, here are my comments. In the "forever" and "__forever" functions, your use of the term "generator" is confusing. "__forever" is a generator function, because it has a yield statement. Its argument, called "generator", appears to be a callable, not a generator or even necessarily a generator function. Also, note that __forever(lambda: value) is functionally equivalent to the more efficient itertools.repeat(value). The staticmethod __next(iterator) accesses the class it is defined in, which suggests that it might be better written as a classmethod __next(cls, iterator). Each of the LINQ-style methods is divided into two parts: the public method that contains the docstring and some argument checks, and a private staticmethod that contains the implementation. I'm not certain what the purpose of that is. If it's to facilitate overriding the implementation in subclasses, then you need to change the names of the private methods to start with only one _ character so that they won't be mangled by the compiler. The comments before each method that only contain the name of the immediately following method are redundant. aggregate: the default aggregator is unintuitive to me. I would make it a required field and add a separate method called sum that calls aggregate with the operator.add aggregator. Also, the implementation doesn't look correct. Instead of passing in each item to the aggregator, you're passing in the number of items seen so far? The LINQ Aggregate method is basically reduce, so rather than reinvent the wheel I would suggest this: # MISSING is a unique object solely defined to represent missing arguments. # Unlike None we can safely assume it will never be passed as actual data. MISSING = object() def aggregate(self, aggregator, seed=MISSING): if seed is self.MISSING: return reduce(aggregator, self._iterable) else: return reduce(aggregator, self._iterable, seed) Note for compatibility that in Python 3 the reduce function has been demoted from a builtin to a member of the functools module. any: the name of this method could cause some confusion with the "any" builtin that does something a bit different. compare: the loop would more DRY as a for loop: def __compare(first, second, comparison): for firstval, secondval in itertools.izip_longest(first, second, fillvalue=self.MISSING): if firstval is self.MISSING: return -1 elif secondval is self.MISSING: return 1 else: result = comparison(firstval, secondval) if result != 0: return result return 0 concatenate: again, no need to reinvent the wheel. This should be more efficient: def concatenate(self, other): return extend(itertools.chain(self.__iterable, other)) equals: could be just "return self.compare(other, comparison) == 0" __last: the loop could be a for loop: # assume we're looking at the last item and try moving to the next item = result.Value for item in iterator: pass return item lastOrDefault: there's a lot of repeated logic here. This could just be: def lastOrDefault(self, default=None): try: return self.last() except ValueError: return default map / forEach: .NET has to separate these into separate methods due to static typing. It seems a bit silly to have both of them in Python. Also, map would be more efficient as "return itertools.imap(mapper, self.__iterable)" max / min: it would be more efficient to use the builtin: def max(self, key): return max(self.__iterable, key=key) If somebody really needs to pass a comparison function instead of a key function, they can use functools.cmp_to_key. randomSamples: a more canonical way to pass the RNG would be to pass an instance of random.Random rather than an arbitrary function. Then to get a random integer you can just call generator.randrange(total). Note that for the default you can just use the random module itself, which provides default implementations of all the Random methods. Also, for loops: def __randomSamples(iterable, sampleCount, generator): samples = [] iterator = iter(iterable) # fill up the samples with the items from the beginning of the iterable for i, value in zip(xrange(sampleCount), iterator): samples.append(value) # replace items if the generated number is less than the total total = len(samples) for value in iterator: total += 1 index = generator.randrange(total) if index < len(samples): samples[index] = result return samples __reverse: you could just "return reversed(list(iterable))" __rotateLeft: def __rotateLeft(iterable, shift): iterator = iter(iterable) front = list(itertools.islice(iterator, shift)) return itertools.chain(iterator, front) skipWhile: suggest using itertools.dropwhile take: suggest using itertools.islice takeWhile: suggest using itertools.takewhile __toList: "return list(iterable)" __toSet: "return set(iterable)" __toTuple: "return tuple(iterable)". Note that as currently written this actually returns a generator, not a tuple. __where: suggest using itertools.ifilter __zip: suggest using a for loop over itertools.izip(first, second) Lookup: is inconsistent. The overridden __iter__ method returns an iterator over the values of the groups, but all the inherited methods are going to iterate over the keys. Probably you want to pass groups.values() to the superclass __init__ method. Cheers, Ian -- http://mail.python.org/mailman/listinfo/python-list