> On Dec. 8, 2015, 10:55 p.m., Yi Pan (Data Infrastructure) wrote: > > @Amit, thanks for putting up this patch. I have two high-level comments: > > * I would prefer to keep the return type of seek() as an iterator. The > > pattern of the current API looks fine to me: all() and range() gives > > iterators. All it needs is to add seek() API, which also gives an iterator > > as a returned value which starts w/ certain position. With that, > > seekToFirst() and newIterator() are not needed. And what's the particular > > use case for seekToLast()? It does not seem to be useful. Do you have any > > specific use case for it? It would also appears more natural if we just add > > APIs like: first(), last(), instead of seekToLast(). > > > > Thanks! > > Amit Yadav wrote: > @Yi, I will share my thoughts here about designing this change this way. > 1. Seperation of Concerns > Providing an iterator is a funationality of a Store, whereas Iteration > is a property of the Iterator itself. The two APIs newItearator and seek*() > exactly provides this seperation of concern. With this, an iterator can be > created and then seek could be called multiple times on that iterator > allowing random iteration if needed. Keeping these seperate from each other > seems a more elegent solution, IMHO. The API 'all' effectively provides a > convenience function which combines 'newIterator' and 'seekToFirst'. This > design, however, provides more flexible iteration using a single iterator and > therefore is more efficient (If seek() returns an Iterator, then that could > only be used to iterate to the 'next' element) > 2. 'seekToLast' provides a way to iterate in the backward direction, > should the underlying store supports it. RocksDB supports this therefore It > make sense to expose that functionality and include that in the interface > definition of the Iterator. > > Given that we do not always know all the way a samza processor would want > to use an iterator, I thought we could aim to provide this functionality and > make it available. Given, that the changes are minimal to do so, it seems > like a good think to have. Thoughts? > > Yi Pan (Data Infrastructure) wrote: > Hi, @Amit, thanks for the detailed explanation. I understand your ideas > fully now. Yes, I agree that providing seek functionalities to the iterator > is reasonable. The concern I had is: the implementation of KeyValueIterator > in-memory store is NOT-SUPPORTED. I would prefer either the seek > functionalities are added to all key value stores' iterators, or create > another SeekableIterator extends KeyValueIterator as the return type from > newIterator() (maybe changing it to getSeekableIterator()?). > Then, in-memory store can throw exceptions on getSeekableIterator() and > there is no need to add the new seek functions to InMemoryIterator(). Then, > for SeekableIterator, all seek functions in your proposal would apply, in > addition to all the basic iterator methods defined in KeyValueIterator. > Please let me know whether that sounds good to you.
@Yi, Thank you for your comments. I did consider the path you suggested as noted in point 1 of the description. However, createing a SeekableIterator would require creating a almost duplicate hiarchy of all the classes which are implementing the KeyValueStore/Iterator interface, which IMHO is not a clean solution. The idea of Optional methods in an interface defination is not new either, java.utils.iterator has remove() API which is optional. The current implementation ensure that the existing client can take advantage of this new functionality without require too many changes in their code as well. - Amit ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/41068/#review109417 ----------------------------------------------------------- On Feb. 26, 2016, 6:37 p.m., Amit Yadav wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/41068/ > ----------------------------------------------------------- > > (Updated Feb. 26, 2016, 6:37 p.m.) > > > Review request for samza and Yi Pan (Data Infrastructure). > > > Repository: samza > > > Description > ------- > > Added Seek functionality to KeyValueStore and KeyValueStoreIterator. The > following alternatives were considered > > 1. Create a 'SeekingKeyValueIterator' and 'SeekingKeyValueStore' - Have a > completely separate classes which extends their existing > counterparts. This solution is appealing as it provides separation of > concerns. However, this also requires creating a complete > hierarchy of all the classes which are implementing the > KeyValueStore/Iterator interface. Therefore the effort to value didnt > made much sense. > > 2. Use the existing 'range' functionality - This solution requires a new > iterator to be created which could supply the 'end' > key for the range. We could make this user configurable, however this > assumes that the user can somehow come up with an > end key in all the cases. This assumption may not be true all the times. > Also, I feel this is more of a workaround then > a real solution, more of range degenerated to a seekTo functionality and > therefore is not a clean interaface. > > 3. Add seek functionality to existing interfaces and classes :- In the end, > this is what I implemented since it is the cleanest solution > in amongst all the options. This also requires minimum changes. The > downside of this approach is that not al stores can support seek > to a particular key (especially, the one derived from java iterators). > > > Diffs > ----- > > samza-api/src/main/java/org/apache/samza/storage/kv/KeyValueIterator.java > 854ebbfa640e96a87977ae25b67736ef57fadd8e > samza-api/src/main/java/org/apache/samza/storage/kv/KeyValueStore.java > b1fea7bc38d1a44e7bba6bdeb638dc33c0936c6f > > samza-kv-inmemory/src/main/scala/org/apache/samza/storage/kv/inmemory/InMemoryKeyValueStore.scala > 72f25a354eaa98e8df379d07d9cc8613dfafd13a > > samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala > d614f8a58882628129ec30c0fe8800395d60ed99 > samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala > 9a5b2d5f8f76220dfc8637823e2b63dffc138a5d > > samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStoreMetrics.scala > df8efaeefba829a47cb744d7a755cbe9e1f562f1 > > samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStorageEngine.scala > e5a66a4770b9553a1cc48fbb505f52d123c6c754 > > samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStorageEngineMetrics.scala > 233fba91caf041bfb78189efef00ce8fc56f9f15 > > samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStoreMetrics.scala > 967d5097253640ee41ee84e551e15fa2285c00e2 > samza-kv/src/main/scala/org/apache/samza/storage/kv/LoggedStore.scala > 7bba6ff37d8266674e7f15c10c7c146f4a41fc91 > > samza-kv/src/main/scala/org/apache/samza/storage/kv/LoggedStoreMetrics.scala > 743151a310792d4a6ff48ea102e796eb9f630bba > > samza-kv/src/main/scala/org/apache/samza/storage/kv/NullSafeKeyValueStore.scala > 3de257c3117c16b7cfdb7a3d7bb28d5cddc7c10a > > samza-kv/src/main/scala/org/apache/samza/storage/kv/SerializedKeyValueStore.scala > 8e183efcdec6fd3f921fc2bfe1971c95715930ed > > samza-kv/src/main/scala/org/apache/samza/storage/kv/SerializedKeyValueStoreMetrics.scala > 841e4a236da99cbfe121054654c648887f39c3f6 > samza-kv/src/test/scala/org/apache/samza/storage/kv/MockKeyValueStore.scala > 595dd0df6fde50f91ab5a046a193559326f2a1d5 > > Diff: https://reviews.apache.org/r/41068/diff/ > > > Testing > ------- > > > Thanks, > > Amit Yadav > >