https://bugs.kde.org/show_bug.cgi?id=375573
--- Comment #3 from Mario Frank <mario.fr...@uni-potsdam.de> ---
Hey Dan,

I will answer inline, since there are some things that came to my mind.

(In reply to Dan Dascalescu from comment #2)
> Hey Mario,
>
> Thank you for the explanation. I understand the tradeoff - accuracy in reporting the number of dupes, vs. speedy processing. The solution I propose revolved around lazy calculation - does the user care more about a precise number shown next to the album *when they get to see it*, or to be able to move on to examine the other duplicates in the cluster?

I would expect the latter to be more important than the accuracy. Thus, delaying is an option for me.

> I mentioned "when they get to see it" because after the user deletes one of the duplicates, the list of duplicate clusters in the left pane always scrolls to the top (IMO this could be improved to try to keep the scroll position, but digiKam probably just re-sorts the list), so if they were working on a duplicate cluster below the fold (i.e. if they have scrolled down at all), the number of duplicates in that album won't be visible anyway. In fact, when you deal with many clusters of duplicates, only those items at the top, according to the sort order (Ref. images filename, # of items, or Avg. similarity) will be visible.

Okay, let's switch to your terminology. With duplicates albums, we refer to what you call duplicates clusters (internally called search albums), i.e. the entries in the left table - one duplicates album is one entry here. The scrolling to the top is really annoying. This could be resolved, but I will come to that later.

> Not sure what you meant by "one duplicates album" (needs to be adjusted) - did you mean a cluster (in DUFF terminology, http://duff.dreda.org/) of duplicates (which may be spread across different albums), or an album that contains duplicates, so the count of items in the album needs to be adjusted? In the latter case, that count is even farther from the user's attention, because the user is in the Fuzzy tab, vs. in the Albums tab. Could the recalculation of counts be done only once, when the user leaves the Fuzzy tab?
>
> Also, there are two different scenarios I see when it comes to deleting duplicates:
>
> 1) Deleting images in duplicate clusters one by one, while the user looks at the picture in Preview Mode, to examine it in as large of a size as possible. In this case, only one image is deleted at a time. Would counts be easier to decrement in this case?

Yes, this was my first approach when I tried to fix the referenced bug. But the fact that the image should also vanish from other duplicates clusters would have forced me to decrement there, too. Moreover, the count of images in the internal search albums is defined as the number of image ids, and the cluster list does not know how many of those images still exist. Nevertheless, it is technically possible to let the cluster list find out which images still exist and which do not. But then the average similarity would not be correct anymore, as it is calculated over the complete set of images. This, too, could be solved, since I introduced the similarities between images in the database shortly before the release of 5.4.
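To illustrate, a minimal sketch of such a recalculation (all names here are hypothetical, not digiKam's actual classes; the similarity lookup stands in for the per-pair values stored in the database since 5.4):

#include <functional>
#include <vector>

struct DuplicatesCluster
{
    long long              referenceId;   // id of the reference image
    std::vector<long long> imageIds;      // ids of the candidate duplicates
    double                 avgSimilarity; // value shown in the cluster list
};

// similarityOf stands in for a database lookup of the stored similarity
// between two image ids; stillExists for a check against the collection.
void recomputeCluster(DuplicatesCluster& cluster,
                      const std::function<double(long long, long long)>& similarityOf,
                      const std::function<bool(long long)>& stillExists)
{
    std::vector<long long> remaining;
    double sum = 0.0;

    for (long long id : cluster.imageIds)
    {
        if (stillExists(id))
        {
            remaining.push_back(id);
            sum += similarityOf(cluster.referenceId, id);
        }
    }

    cluster.imageIds      = remaining;
    cluster.avgSimilarity = remaining.empty() ? 0.0 : sum / remaining.size();
}

Recomputing from the stored similarities this way would avoid a full fuzzy rescan while still keeping the count and the average similarity consistent with the images that actually remain.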
> 2) Staying in Thumbnails or Table, selecting multiple images, and deleting them at once.
>
> Finally, question about "the deleted image may be member of other duplicates albums" (this relates to the cluster vs. album distinction) - is the duplicate relationship transitive? I mean, if images A and B are dupes within the similarity range, and B is part of another cluster of duplicates, A should be part of that cluster too, which means only two counts need to be updated: the number of dupes in that cluster, and the number of items in the album the image belongs to.

Theoretically, you are right. If image A is a duplicate of reference images B and C, then B and C have *some* similarity, too. But it is as with audio streams: if stream A is part of streams B and C, the latter streams have *some* similarity at *some* position - perhaps the similar parts are only 2 %. Depending on the given similarity range, this similarity is ignored. So we cannot use transitive closures here.

So, to roll up: if we have duplicates cluster A and we delete an image that is also part of duplicates cluster B, we need to update both clusters in some way - by rescanning or by decrementing counts. If we delete the reference image of cluster A itself, the cluster currently vanishes. As a consequence, the internal search album is removed and you lose context. This is a problem which was not addressed in the referenced bug, and it is a real disturbance in the workflow.

I would thus propose the following: the removal of an image in some duplicates album should signal the list of duplicates clusters to update. The count of images in each cluster is recalculated from the information which images still exist. At the same time, the new average similarity is calculated from the similarities of the remaining images to the reference image. All duplicates clusters which only contain one image are removed from the list, as they are not relevant anymore. All of this should be technically quite easy to implement before the release of 5.5.

What do the other devs think? If this is confirmed, I would do that after I am finished with my small garbage collection project.
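To make the proposal a bit more concrete, here is a rough sketch of the update flow, reusing the DuplicatesCluster / recomputeCluster sketch from above (names are again hypothetical; how a removed reference image should be handled is deliberately left open here, as discussed above):

#include <algorithm>
#include <functional>
#include <vector>

void onImageRemoved(std::vector<DuplicatesCluster>& clusters,
                    const std::function<double(long long, long long)>& similarityOf,
                    const std::function<bool(long long)>& stillExists)
{
    // Refresh count and average similarity of every cluster from the
    // images that still exist.
    for (DuplicatesCluster& cluster : clusters)
    {
        recomputeCluster(cluster, similarityOf, stillExists);
    }

    // Clusters that only contain one image (no duplicate candidates left
    // besides the reference) are not relevant anymore and are dropped.
    clusters.erase(std::remove_if(clusters.begin(), clusters.end(),
                                  [](const DuplicatesCluster& c)
                                  { return c.imageIds.empty(); }),
                   clusters.end());
}

In digiKam this step would be triggered by the signal emitted on image removal, as proposed above; the sketch only shows the recalculation and pruning itself.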