On Sat, Jun 25, 2011 at 02:49, Kent Fredric <kentfred...@gmail.com> wrote:
> I'm strongly of the mind that by making the tag system arbitrarily
> flat, you might be prematurely limiting yourself, as well as risking a
> future where the "tag index" is a sea of meaningless words.
>
> Tags in my mind, should be grouped by the sort of information they are
> trying to convey, as opposed to being arbitrary and completely
> un-grouped.
>
> The present category system only has one namespace, which is more or
> less "what-you-use-it-for", and if your tag system is likewise going
> to take that vector as the only approach, you will ultimately end up
> duplicating the category system, albeit without the present limitation
> that means one package can only exist in one place.
>
> This need not be the case, we can suggest alternative tag namespaces,
> such as : The sorts of files it supports working with, the sorts of
> things it can read, the sorts of things it can write.
>
> At present, things that migrate one type of media to another, such as
> pdf -> image , image -> pdf, image -> video , video -> images , etc
> have to be forced to a sort of useless categorisation system.
>
> However, if via tag data, we were able to annotate a) what can be
> written and b) what can be read, this system could be leveraged to
> epic proportions of win.
>

Okay, apologies in advance for my long-windedness. I hope this all makes sense to everyone.

I should probably clarify that clinging strictly to flatness is not what I'm proposing. Reality has borne out the need for implications and aliases in sanitising an unruly dataset with a complex user-generated index, while arbitrary democratised group building has improved some aspects of discovery. However, I would consider these features to be a lower priority than having a system at all. So to break it down:

Tags - a concise vocabulary used for search. In their default state they are untyped and non-hierarchical. They identify traits of a package. I suggest lower-case, simple, descriptive names. Highest priority.
Example: alien {{converter nogui package_management reads_tgz reads_rpm reads_pkg reads_slp reads_lsb writes_tgz writes_rpm writes_pkg writes_slp writes_lsb}}

Alias - a relationship between two tags establishing equivalence. A query for the left term returns the results of the right. This type of relationship helps reduce dictionary clutter. Low priority.
Example: sound = audio. Attempting to add "sound" to a package will instead add "audio", and searches for "sound" will return the results for "audio".

Implication - a relationship between two tags where the presence of the left term necessarily requires the right. This relationship reduces menial work. Low priority.
Example: mpd -> audio. Adding "mpd" to the package will also add "audio".

Kent, your idea is pretty interesting and I rather like it. Fortunately, it's completely possible within the context of the basic flat layout, as I outlined with alien above. It probably looks ugly to you. This is no illusion; it's pretty ugly. But it also grants us the flexibility to get a basic system in place quickly and without a lot of hassle. We get 90% of the benefit up front, and can extend it as necessary.

Unfortunately for "real" hierarchical methods, people still have difficulty with even simple metadata systems.
Fetch some MP3s off the internet and check their tags, or look at search engine queries, and you'll find an entire class of people hampered by what is currently a largely alien art. In the end, this system needs to be usable by people, and by keeping it primarily flat we ease the conceptual overhead of both implementing and using it. If it can't be implemented on itch-scratching timescales, we have failed. If people can't use it with very little learning curve, we have failed.

A word on vocabulary: As you've no doubt noticed, there seems to be a degree of combinatorial explosion of tags in the method I propose. In practical use, it's not as bad as it looks. For Gentoo, I'd recommend starting from a basic "canonical" list of general tags based on the current category system (subject to discussion and addition/subtraction) and incorporating suggestions like Kent's as they come up. It's okay to control the vocabulary. What you find is that after the initial implementation, it grows fairly slowly. (Even with reads_* and writes_* the number will probably be south of 500 tags for a long time; the current categories dissolve into about 175 tags from what I can see.)

Regards,
Wyatt
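
For anyone who wants to see the flat model concretely, here is a minimal sketch of the tag, alias, and implication rules described earlier in this message. Everything in it is an assumption made for illustration: the class name TagIndex, its method names, and the in-memory dictionaries are hypothetical, not an existing Gentoo or Portage interface, and the package names are just the examples used above.

```python
from collections import defaultdict


class TagIndex:
    """Flat tag store with optional aliases and implications (hypothetical API)."""

    def __init__(self):
        self.aliases = {}                     # left term -> right (canonical) term
        self.implications = defaultdict(set)  # tag -> tags it necessarily requires
        self.package_tags = defaultdict(set)  # package name -> set of flat tags

    def add_alias(self, left, right):
        """Declare 'left = right': left is stored and searched as right."""
        self.aliases[left] = right

    def add_implication(self, left, right):
        """Declare 'left -> right': adding left also adds right."""
        self.implications[left].add(right)

    def resolve(self, tag):
        """Follow alias links to the canonical tag, stopping if a cycle appears."""
        seen = set()
        while tag in self.aliases and tag not in seen:
            seen.add(tag)
            tag = self.aliases[tag]
        return tag

    def tag(self, package, *tags):
        """Add tags to a package, applying aliases and implications."""
        pending = [self.resolve(t) for t in tags]
        while pending:
            current = pending.pop()
            if current in self.package_tags[package]:
                continue
            self.package_tags[package].add(current)
            implied = self.implications.get(current, ())
            pending.extend(self.resolve(t) for t in implied)

    def search(self, *tags):
        """Return the sorted names of packages carrying every requested tag."""
        wanted = {self.resolve(t) for t in tags}
        return sorted(
            name for name, have in self.package_tags.items() if wanted <= have
        )


index = TagIndex()
index.add_alias("sound", "audio")       # sound = audio
index.add_implication("mpd", "audio")   # mpd -> audio

index.tag("alien", "converter", "nogui", "package_management",
          "reads_rpm", "writes_tgz")
index.tag("mpd", "mpd")

print(index.search("sound"))                    # ['mpd']
print(index.search("reads_rpm", "writes_tgz"))  # ['alien']
```

The second query is the sort of read/write lookup Kent asked for, answered with nothing more than plain tags and set intersection. A typed or namespaced scheme could be layered on later without changing the stored data.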