Hi all,

The current CacheBasedDataset destroys the cache, and all the data along with
it, when it is closed. There is no option to turn this off.

https://github.com/apache/ignite/blob/master/modules/ml/src/main/java/org/apache/ignite/ml/dataset/impl/cache/CacheBasedDataset.java#L189

/** {@inheritDoc} */
@Override public void close() {
    datasetCache.destroy();
    ComputeUtils.removeData(ignite, datasetId);
    ComputeUtils.removeLearningEnv(ignite, datasetId);
}


Why does it do this?
It means that using SqlDatasetBuilder results in the cached data being deleted
after a model has been trained.
We had to work around this with:

var datasetBuilder = new SqlDatasetBuilder(repo.getCtx().getIgnite(), cacheName, (k, v) -> {
  // ...
});
var wrapper = new DatasetBuilder<Object, BinaryObject>() {
  @Override
  public <C extends Serializable, D extends AutoCloseable> Dataset<C, D> build(
      LearningEnvironmentBuilder envBuilder,
      PartitionContextBuilder<Object, BinaryObject, C> partCtxBuilder,
      PartitionDataBuilder<Object, BinaryObject, C, D> partDataBuilder,
      LearningEnvironment localLearningEnv) {
    var cbd = datasetBuilder.build(envBuilder, partCtxBuilder, partDataBuilder, localLearningEnv);
    return new DatasetWrapper<C, D>(cbd) {
      @Override public void close() {
        System.out.println("Dataset closed");
        // DO NOT call close. Cache based data set deletes the data in
        // the cache like some mad man!
      }
    };
  }

  @Override
  public DatasetBuilder<Object, BinaryObject> withUpstreamTransformer(UpstreamTransformerBuilder builder) {
    return datasetBuilder.withUpstreamTransformer(builder);
  }

  @Override
  public DatasetBuilder<Object, BinaryObject> withFilter(IgniteBiPredicate<Object, BinaryObject> filterToAdd) {
    return datasetBuilder.withFilter(filterToAdd);
  }
};

This works but seems very hacky.
Are we misusing the API somehow? The examples and docs do not mention or
indicate anything about this, as far as I've found.
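
For anyone who wants to see the pattern without Ignite on the classpath, here
is a minimal, self-contained sketch of what the workaround boils down to. Every
name in it (DestructiveDataset, NonClosingDataset, NonClosingDemo) is a
hypothetical stand-in, not an Ignite class, and the in-memory map only plays
the role of the cache. It assumes the dataset's lifetime is managed with
try-with-resources; that is an assumption about the caller, not something
taken from the Ignite source.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a dataset whose close() wipes its backing store. */
class DestructiveDataset implements AutoCloseable {
    private final Map<Integer, String> store;

    DestructiveDataset(Map<Integer, String> store) {
        this.store = store;
    }

    int size() {
        return store.size();
    }

    /** Closing removes the data, like the close() quoted at the top. */
    @Override public void close() {
        store.clear();
    }
}

/** Hypothetical wrapper that delegates reads but deliberately ignores close(). */
class NonClosingDataset implements AutoCloseable {
    private final DestructiveDataset delegate;

    NonClosingDataset(DestructiveDataset delegate) {
        this.delegate = delegate;
    }

    int size() {
        return delegate.size();
    }

    /** Intentionally empty so the backing store outlives the dataset. */
    @Override public void close() {
        // No-op: whoever owns the store is now responsible for cleaning it up.
    }
}

class NonClosingDemo {
    /** Returns how many rows remain in the store after the dataset is closed. */
    static int rowsLeftAfterClose(boolean wrap) {
        Map<Integer, String> store = new HashMap<>();
        store.put(1, "a");
        store.put(2, "b");

        DestructiveDataset raw = new DestructiveDataset(store);
        // try-with-resources calls close() when the block exits.
        try (AutoCloseable ds = wrap ? new NonClosingDataset(raw) : raw) {
            // "Training" would happen here.
        }
        catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return store.size();
    }

    public static void main(String[] args) {
        System.out.println("unwrapped: " + rowsLeftAfterClose(false)); // unwrapped: 0
        System.out.println("wrapped:   " + rowsLeftAfterClose(true));  // wrapped:   2
    }
}
```

The trade-off is that a no-op close() skips all of the cleanup, not just the
destructive part. In the Ignite case that would also mean skipping
ComputeUtils.removeData and ComputeUtils.removeLearningEnv, so that state would
presumably have to be removed some other way.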

Regards,
Courtney Robinson
Founder and CEO, Hypi
https://hypi.io
