On Sun, 25 Aug 2013 17:14:37 -0700, Phil Steitz wrote:
On 8/25/13 9:59 AM, Gilles wrote:
On Sun, 25 Aug 2013 09:19:41 -0700, Phil Steitz wrote:
On 8/25/13 8:10 AM, Gilles wrote:
On Sat, 24 Aug 2013 21:55:36 -0000, pste...@apache.org wrote:

[...]

I wonder whether the utility should involve the creation of
an instance of a bona fide "Combination" class.
I.e. instead of a "naked"
  Iterator<int[]> combinationIterator(int n, int k)
there would be
  Iterator<Combination> combinationIterator(int n, int k)
where "Combination" would provide an accessor
  public int[] toArray() {
    // ...
  }

I need the "naked arrays" generated very quickly with minimal
overhead for the K-S use case.

The only "overhead" would be the creation of an instance for
wrapping the "int[]".
[As allocation is fairly cheap in Java, that should not be a
problem.]

My use case uses the arrays
directly, but others could use the indices to extract objects from
lists.  For example,  what I thought would be natural would be to
add
public static <T> Iterator<List<T>> combinationsIterator(List<T>
list, int k)

Would this be the base implementation, from which your case would
be implemented by retrieving "Integer" from the list and copy/convert
them into an "int[]"?

No, the opposite actually.  The base impl is what exists now.  It
works efficiently directly on an int[] array to compute the indices
within the source list (in my use case, {0, 1, ..., n-1} which is
actually canonical).  Then if you want to source subsets of objects
from a list of objects, you can compute the int[] array that
actually *is* the combination and extract the corresponding elements
from the list.

I got it.



What you would get out of this is element lists.  And the
implementation could postpone copy / creation until the indices have
been selected using the existing method.

I don't follow ...

See above.  I am probably still not explaining clearly what I have
in mind.  To me, a "combination" is a subset of the integers {0,
..., n-1}.  Technically it is a set, not a list, but there is a
natural order available and enumeration can work easier if you use
that order and adopt the convention that you represent combinations
as ordered lists of integers.  The LexicographicCombinationIterator
that I committed does that, using primitive arrays to represent the
lists.  The iterator itself uses a primitive array of int[] to
maintain state and create successive iterates.  Use of a single
primitive array eliminates all object creation / garbage collection,
which is not necessary to just enumerate the combinations.  Once you
have a "true" combination represented as an array of int[], you can
use it to specify a "combination"  (really sample or subset) of a
list of any kind of object.

So, IIUC it would be useful to have a utility like
-----
 public static List<T> getSample(List<T> elements,
                                 int[] combination) {
   final List<T> samples = new ArrayList<T>(combination.length);
   for (int index : combination) {
     samples.add(elements.get(index));
   }
 }
-----


 I am not yet sold on the
value of "Combination" as an abstraction, since what you really use
is just the list of selected elements.

Maybe "Combination" is not worth implementing indeed; but what about
a "Combinations" class:

public class Combinations implements Iterable<int[]> {
  private final int n;
  private final int k;

  public Combinations(int n, int k) {
    // ...
  }

  public Iterator<int[]> iterator() {
    return new LexicographicCombinationIterator(n, k);
  }

  public Comparator<int[]> comparator() {
    return new LexicographicCombinationComparator(n, k);
  }

  private static class LexicographicCombinationIterator implements
Iterator<int[]> {
    // [Your implementation]
  }

  private static class LexicographicCombinationComparator
implements Comparator<int[]> {
      public int compare(int[], int[]) {
      // ... using e.g. "lexNorm" implementation.
    }
  }
}

Being "Iterable" allows nice usage in a "for" loop:

for (int[] combination : new Combinations(4, 2)) {
  // ... etc.
}

I thought about that and that is what I meant by the original
suggestion that we possibly make the
LexicographicCombinationsIterator a public class.  I just don't see
the practical use cases for it as a standalone class.

The "for"-loop example above uses the iterator functionality but
in a much less verbose way (no explicit calls to "hasNext()" and
"next()").
This is standard practice for everything that can be handled as a
collection.

Then, we can also have your iterator factory method:
-----
public static Iterator<int[]> combinationsIterator(int n, int k) {
  return new Combinations(4, 2).iterator();
}
-----

Then, we also don't hide some code in "test" where it could be made
visible and be used elsewhere (cf. "lexNorm").

 What I have
immediate practical need for is just a way to get an Iterator<int[]>
based on <n,k>.  As you note above, it is the Iterator that is
useful.  I can imagine use cases for the object-list version
public static <T> Iterator<List<T>> combinationsIterator(List<T>
list, int k ) though.  If we do extract a public class, I would
prefer that it be called something like CombinationsIterator.

I think that a design based on the "Iterable" interface is preferrable.
From it you can always get the iterator functionality (by definition of
the interface), while the reverse is not true.

If we
have other iteration strategies that we want to implement, we could
make it an abstract parent for LexicographicCombinationsIterator.  I
would recommend holding off complicating the design / structure
though until we have these though.

Sorry to insist but it is not a complication, it is a _simplification_ ;-) Instead of a specifically named method ("combinationsIterator"), we rely
on the standard API ("Iterable") of the Java language.


Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to