IMO we should start working on NiFi 2.0 going forward, and that sounds like a
good opportunity to make such changes in our components.


On Tue, Oct 25, 2022 at 19:33, Mike Thomsen <[email protected]> wrote:

> The hash-based deduplication strategy used the built-in "md5"
> attribute to offload the work to the database. That functionality was
> deprecated and AFAICT gone as of Mongo 5:
>
> https://www.mongodb.com/docs/manual/core/gridfs/#files.md5
>
> I am proposing two changes:
>
> * Remove deduplication
> * Create a MongoDB DistributedMapCache client that can query on the
> file metadata. GridFS stores metadata separately from chunks, which
> makes lookups that way cheap and flexible.
>
> I could easily add that to this PR, which already covers Testcontainers
> integration, making it super easy to test the changed behavior:
>
> https://github.com/apache/nifi/pull/6460
>
> Thoughts?
>
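
For anyone following along, here is a rough sketch of what hash-based
deduplication could look like once the hash has to be computed on the client
because the server-side "md5" field is no longer available. Everything in it
is an assumption for illustration: the class name ContentHashIndex, the choice
of SHA-256, and the in-memory map, which only stands in for a lookup against
the GridFS file metadata. It is not NiFi or MongoDB driver code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: tracks content hashes so duplicates can be detected
// before a file is written. The map is a stand-in for a metadata query.
final class ContentHashIndex {

    // hash (hex) -> name of the first file stored with that content
    private final Map<String, String> hashToFilename = new HashMap<>();

    // Computes the SHA-256 digest of the content as a lowercase hex string.
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] raw = digest.digest(content);
            StringBuilder hex = new StringBuilder(raw.length * 2);
            for (byte b : raw) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Returns true if the content was new and has been recorded,
    // false if identical content was already registered.
    boolean register(String filename, byte[] content) {
        String hash = sha256Hex(content);
        return hashToFilename.putIfAbsent(hash, filename) == null;
    }

    // Returns the name of the file first registered with this content, or null.
    String firstFilenameFor(byte[] content) {
        return hashToFilename.get(sha256Hex(content));
    }
}
```

With this sketch, registering the same bytes under two different names
returns true the first time and false the second, which is the check a
processor would make before deciding whether to store the file.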
