A SequencedCollection listIterator() is missed; it's useful for List.equals, or to implement a List in general. LinkedHashSet seems incomplete without it. Even something like "ListIterator Collections.forwardListIterator(int index, iterator)", if compatible with equals() of the various list types, would be handy. Doesn't forward only work for sublist?. Only traversal, optional modification. Modification is tricky; an add might have an implicit remove for a Set. If not part of SequencedCollection, maybe "interface ListIteratorProvider { ListIterator listIterator(int index); }" to tag as needed.

This message is primarily about "definition of change and equals" and "pluggable eviction policy"; they're discussed through a simple MRU example implemented with a SequencedSet and a capacity limit. The 'changed' concept comes from Collection<>.

// MRU implementation recommendations:
//      - implement List<> for equals
//      - modifications only permitted through direct MRU methods
public interface MRU<E> extends Collection<E> {
    boolean addItem(E key);        // return changed
    boolean removeItem(E key);     // remove(E) contract
    boolean setCapacity(int size); // return changed
    int getCapacity();
    void clear();
    boolean addAll(Collection< ? extends E> c);
    public static <E> MRU<E> synchronizedMRU(MRU<E> mru) {...}
}

To check out SequencedCollection I sketched out an implementation of MRU, see below. Currently I have two implementations based on LinkedList and LinkedHashSet, each has some ugly stuff. The implementation based on SequencedSet below is clean. All the MRU methods are easily implemented without consideration of the actual implementation, but AddAll() is problematic. removeItem() is included to allow external EvictionPolicy.

Is there a preferred direction for performance? I'm guessing the only difference might be going through a "reversed()" view.

Definition of "change" and "equals()" in a SequencedCollection
==============================================================
The SequencedCollection uses "void addFirst(e)", there's no indication if the collection is changed; this is the only place I'm aware of in the Collections universe where a functional add operation does not indicate changed when it may or may not change the collection (I didn't look, just guessing). List and Set both have "changed" consistent with their "equals()".

Consistent with List equals, the MRU addItem() implementation returns changed true if either sequence or content changed.

The MRU implementation as shown may have an inaccurate return from addAll(); it's possible that "changed == true && before.equals(after)". Collection specifies the addAll() return. It's expensive to produce an accurate result from addAll(). I suppose it could be argued that the definition of changed is ambiguous in this case. Document that AddAll() may return a false positive.

Thinking out loud... In SequencedSet a method getChanged() returning an EnumSet with SEQUENCE_CHANGED, CONTENTS_CHANGED bits. Could automatically clear the bits on every operation or require an explicit call to clear the bits. This wouldn't help the addAll() problem, but supports a workaround for addFirst/addLast.

pluggable Eviction Policy
=========================
In my earlier message, when I shifted to refer to an eviction policy, I acknowledge that even a simple builtin policy was inappropriate. While this thread may not be the best place to discuss it, I'll make this comment here; can start another thread if there's reason.

It's tricky to extend a Collection and implement an Eviction Policy without support from the collection. For example, if there are several ways to add to the collection, then in general each must be intercepted and handled. It may be that simply overriding add(E) is sufficient, but the source for the implementation must be examined for each implementation and on every release.

This suggests an "interface EvictionPolicy" (or EvictionPolicySupport) which, at a minimum, is invoked whenever the collection's size increases; could be invoked whenever the collection is modified. Something like an optional setEvictionPolicy(ep), or solely by implementation.

MRU implementation example
==========================

/** MRU SequenceSet implementation. Newest/mru at start of list */
public abstract class AbstractMRU<E>
extends AbstractSequentialList<E> implements MRU<E> {
    protected final SequencedSet<E> seqset;
    protected int capacity = Integer.MAX_VALUE;

    public AbstractMRU(SequencedSet<E> c, int capacity) {
        this.seqset = c;
        setCapacity(capacity);
    }

    @Override public int size() { return seqset.size(); }
    @Override public Iterator() { return seqset.iterator(); }
    @Override public boolean addAll(Collection<? extends E> c) { addAllInFront(c); }
    @Override public void clear() { seqset.clear(); }
    @Override public boolean addItem(E key) {
        if(key == null)
            throw new IllegalArgExcept();
        if(Objects.equals(key, peekMRU())) return false; // no change
        seqset.addFirst(key);
        evict();
        return true; // must have changed since not early return
    }
    @Override public boolean removeItem(E key) { return seqset.remove(key); }

    @Override boolean setCapacity(int capacity) { // return changed
        if(capacity < 0)
            throw new IllegalArgumentException("Size must be >= 0");
        this.capacity = capacity;
        return evict();
    }
    @Override public int getCapacity() { return capacity; }

    protected E peekMRU() { return seqset.isEmpty ? null : seqset.getFirst(); }
    protected boolean evict() { // return true if size changed
        int extra = size() - capacity;
        if(extra <= 0)
            return false;
        while(--extra >= 0) seqset.removeLast();
        return true;
    }
    protectd abstract boolean addAllInFront(Collection<? extends E> c);
}

/**
 * Newest at the front of the list (opposite of LinkedHashset defaults).
 * This is not dependent on how SeqSet is implemented,
 * but just put it down here to get the mess out of the way.
 */
public LinkedHashSetMRU<E> extends AbstractMRU<E> {
    public LinkedHashSetMRU(int numElements) {
super(LinkedHashSet.newLinkedHashSet<E>(numElements), numElements);
    }

    protected boolean addAllInFront(Collection<? extends E> c) {
        // TODO: check null item
        boolean modified;
        if(c instanceof SequencedCollection sq) {
            for(e : sq.reversed())
                modified |= seqset.addFirst(e);
        } else {
            // lazy
            for(e : new ArrayList(c).reversed())
                modified |= seqset.addFirst(e);
        }
        return evict() || modified;
    }

    // TODO:    implement ListIterator's iterator(index),
    //          only forward iteration should be OK for equals.
    // ListIterator listIterator(int index)
    //     { return Collections.forwardListIterator(index, seqset.iterator()); }

    // Could handle equals for both List,Set.
    // Just List equals seems OK. Use MRU.containsAll directly for Set equals.
    @Override public boolean equals(Object o) {
        if (o == this)
            return true;
        if (!(o instanceof List))
            return false;
        Iterator<E> e1 = seqset.iterator();
        Iterator<?> e2 = ((List<?>) o).iterator();
        while (e1.hasNext() && e2.hasNext())
            if(!Objects.equals(e1.next(), e2.next())
                return false;
        return !(e1.hasNext() || e2.hasNext());
    }
}


On 9/21/22 5:38 PM, Stuart Marks wrote:
Hi, yes, this is the right place to discuss Sequenced Collections. I'm glad you find it promising.

Note that Sequenced Collections actually has very little implementation in it, aside from various reversed views of things. The actual data is still stored in existing concrete collections such as ArrayList, ArrayDeque, LinkedHashMap, and TreeMap.

I think Sequenced Collections has the right set of abstractions in it as it stands, and I don't want to expand its scope by talking about additional concepts like size limits or eviction policy.

However, those things are quite reasonable to discuss independently of the current Sequenced Collections proposal. Having a maximum size on a collection seems independent of sequencing. An eviction policy *might* be based on the sequence, but it might not; consider the various eviction policies available for a cache library such as Caffeine [1].

[1] https://github.com/ben-manes/caffeine/wiki

However, I'm somewhat skeptical of trying to build things like eviction policies directly into collections. It's tempting to add a simple thing like a size and just throw away things in some well-defined order whenever the size is exceeded. The problem is that if this policy doesn't do *exactly* what you want to do, then you're out of luck.

The current (pre Sequenced Collections) LinkedHashMap is a good example of this. It's suitable for a least-recently-inserted expiration policy; there's a method removeEldestEntry() that programs can use to implement a simple policy, size-based or otherwise. (Unfortunately they have to subclass-and-override, but whatever.) The problem is that it allows removal of only one element -- the eldest (first) element.

If you want to change the policy of insertion order of an LHM, you have only one alternative: access order. Enabling this has some weird side effects though. For example, get() now rearranges the order of entries in the map, and is thus a structural modification -- which means that it spoils any iterators over the map's views.

These are both fairly common cases, which is probably why they were added. But they're not very flexible, and if you want to do something slightly different, you're on your own -- and it's pretty hard to implement your own policy, because LHM lacks a bunch of essential operations.

Where the Sequenced Collections proposal helps is that instead of adding more policies, it adds the missing primitive operations. You can add/get/remove at either end, and you can reposition mappings to either end. If you have some different recent-usage policy or some unusual cache eviction policy that I've never heard of, you can use the primitives to implement it yourself. That's much better than trying to bake a few more specific cases into LinkedHashMap or other collections.

Is there a discussion about making the SynchronizedCollection family of classes public?

No. Synchronizing on every collection operation is the wrong level of abstraction. Typical collection usage involves too much external iteration and too much check-then-act logic. Callers would have to wrap those in synchronized blocks, and in general they don't know when that's necessary. Certain transaction-style operations (like Map::computeIfAbsent) can be made to work, but those are all pretty low level.

s'marks



On 9/21/22 9:32 AM, Ernie Rael wrote:
 > I don't see why you think a general collection...

I thought the Subject would be sufficient to indicate that I was not talking about collections in general. I should have been more precise with my words; guess I was just excited by a bi-directional ordered set.

The MRU _example_ is useful; the current collections handle it poorly and Sequenced Collections is ideal. Caches with an eviction policy are common; I suspect caches will be a common use for SequencedSet family. Note there are fixed sized Collections and SequencedCollection borrows heavily from that family. Perhaps this issue should be considered in the context of adding an **Eviction Policy** to appropriate collections.

MRU is a Collection; for example, I pass an MRU to a persistence mechanism that takes a collection. Saying "all methods offered by `Collection` should [not] even be part of an `MRU` interface" is innacurate, especially when considered in the context of a SequencedCollection.

-ernie

PS - Loosely related is extending a Collection and providing a synchronized version. Is there a discussion about making the SynchronizedCollection family of classes public?


On 9/21/22 4:22 AM, John Hendrikx wrote:
I don't see why you think a general collection, that is in 99.9% of the cases not used to implement an MRU, should burden every call to #add with a check to see if it isn't exceeding its maximum size or to see if a maximum size has been set.

This is much better done by composition, as I don't think all methods offered by `Collection` should even be part of an `MRU` interface.

--John

On 20/09/2022 21:08, Ernie Rael wrote:
(There may be a better place to send this, let me know where)

Suggesting an option to limit the size of the collection, e.g "setMaxSize(int)", default of zero means no limit.

I put together "interface MRU<E> extends Collection<E>" some months ago, it has two implementations based on LinkedList and LinkedHashSet. The code can be seen at https://sourceforge.net/p/jvi/raelity-lib/ci/default/tree/lib/src/main/java/com/raelity/lib/

A SequencedCollection, as outlined in the JEP draft 2020/09/01, would be almost perfect to implement MRU; I've run into most of the problems/issues discussed in the JEP draft.

The MRU is a cache, as I use it; it typically has a max size for the collection. Handling this natively in the collection would be ideal; if an add operation would overflow, remove the item at the other end. Note that addAll() is used when initializing from backing store.

FYI, I use a "Supplier<Integer>" to the constructor to provide maxSize, but a property makes much more sense. I'll make that change in MRU for sanity, and get rid of the trim() method. setMaxSize can do the trim.

-ernie





Reply via email to