Re: [gentoo-dev] Tags (Was: RFC: split up media-sound/ category)

Wyatt Epp Sat, 25 Jun 2011 03:48:01 -0700

On Sat, Jun 25, 2011 at 02:49, Kent Fredric <kentfred...@gmail.com> wrote:
> I'm strongly of the mind that by making the tag system arbitrarily
> flat, you might be prematurely limiting yourself, as well as risking a
> future where the "tag index" is a sea of meaningless words.
>
> Tags in my mind, should be grouped by the sort of information they are
> trying to convey, as opposed to being arbitrary and completely
> un-grouped.
>
> The present category system only has one namespace, which is more or
> less "what-you-use-it-for", and if your tag system is likewise going
> to take that vector as the only approach, you will ultimately end up
> duplicating the category system, albeit without the present limitation
> that means one package can only exist in one place.
>
> This need not be the case, we can suggest alternative tag namespaces,
> such as : The sorts of files it supports working with, the sorts of
> things it can read, the sorts of things it can write.
>
> At present, things that migrate one type of media to another, such as
> pdf -> image , image -> pdf, image -> video , video -> images , etc
> have to be forced to a sort of useless categorisation system.
>
> However, if via tag data, we were able to annotate a) what can be
> written and b) what can be read, this system could be leveraged to
> epic proportions of win.
>
Okay, apologies in advance for my long-windedness.  I hope this all
makes sense to everyone.


I should probably clarify that clinging strictly to flatness is not
what I'm proposing.  Reality has borne out the need for implications
and aliases in sanitising an unruly dataset with a complex
user-generated index, while arbitrary democratised group building has
improved some aspects of discovery.  However, I would consider these
features to be a lower priority than having a system at all.

So to break it down:
Tags - a concise vocabulary used for search.  In their default state
they are untyped and non-hierarchical.  They identify traits of a
package.  I suggest lower-case names that are simple and descriptive.
Highest priority.
Example: alien {{converter nogui package_management reads_tgz
reads_rpm reads_pkg reads_slp reads_lsb writes_tgz writes_rpm
writes_pkg writes_slp writes_lsb}}

Alias - a relationship between two tags establishing equivalence.
Querying the left term returns the results for the right.  This type of
relationship helps reduce dictionary clutter. Low priority.
Example: sound = audio.  Attempting to add "sound" to a package will
instead add "audio" and searches for sound will return the results for
audio.

Implication - a relationship between two tags where the presence of
the left term necessarily requires the right.  This relationship
reduces menial work.  Low priority.
Example: mpd -> audio.  Adding "mpd" to the package will also add "audio".
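
A minimal sketch of how the three pieces above could fit together, in
Python.  Everything here (the TagIndex class, its method names, and the
package names used in the usage lines) is hypothetical and only meant to
illustrate the semantics described above, not any existing Portage
interface.

```python
from collections import defaultdict


class TagIndex:
    """Flat tag store with optional aliases and implications (illustrative only)."""

    def __init__(self):
        self.aliases = {}                     # left term -> right (canonical) term
        self.implications = defaultdict(set)  # left term -> terms it requires
        self.packages = defaultdict(set)      # package name -> set of tags

    def resolve(self, tag):
        """Follow alias links until a canonical tag is reached."""
        tag = tag.lower()
        seen = set()
        while tag in self.aliases and tag not in seen:
            seen.add(tag)                     # guard against alias cycles
            tag = self.aliases[tag]
        return tag

    def add_alias(self, left, right):
        """Declare left = right: 'left' is stored and queried as 'right'."""
        self.aliases[left.lower()] = right.lower()

    def add_implication(self, left, right):
        """Declare left -> right: tagging with 'left' also adds 'right'."""
        self.implications[self.resolve(left)].add(self.resolve(right))

    def tag(self, package, *tags):
        """Add tags to a package, applying aliases and implications."""
        pending = [self.resolve(t) for t in tags]
        while pending:
            current = pending.pop()
            if current in self.packages[package]:
                continue
            self.packages[package].add(current)
            implied = self.implications.get(current, ())
            pending.extend(self.resolve(t) for t in implied)

    def search(self, *tags):
        """Return the packages that carry every requested tag."""
        wanted = {self.resolve(t) for t in tags}
        return sorted(p for p, have in self.packages.items() if wanted <= have)


index = TagIndex()
index.add_alias("sound", "audio")        # sound = audio
index.add_implication("mpd", "audio")    # mpd -> audio
index.tag("media-sound/mpd", "mpd")
index.tag("media-sound/audacity", "sound", "editor")
```

With this setup, index.search("sound") resolves to a search for "audio"
and returns both packages, even though neither was stored with the
literal tag "sound".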

Kent, your idea is pretty interesting and I rather like it.
Fortunately, it's completely possible within the context of the basic
flat layout, as I outlined with Alien above.  It probably looks ugly
to you-- this is no illusion; it's pretty ugly.  But it also grants us
the flexibility to get a basic system in place quickly and without a
lot of hassle.  We get 90% of the benefit up front, and can extend it
as necessary.
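
To make that concrete, here is a rough Python sketch of a "what reads X
and writes Y" query running on nothing but flat tags.  The alien tag set
is the one from the example above; the second package and the helper
names (converters, formats) are made up for illustration.

```python
def converters(packages, source, target):
    """Return packages tagged with both reads_<source> and writes_<target>."""
    needed = {f"reads_{source}", f"writes_{target}"}
    return sorted(name for name, tags in packages.items() if needed <= tags)


def formats(tags, prefix):
    """List the formats encoded in flat tags such as reads_rpm or writes_tgz."""
    return sorted(t[len(prefix):] for t in tags if t.startswith(prefix))


packages = {
    "alien": {
        "converter", "nogui", "package_management",
        "reads_tgz", "reads_rpm", "reads_pkg", "reads_slp", "reads_lsb",
        "writes_tgz", "writes_rpm", "writes_pkg", "writes_slp", "writes_lsb",
    },
    # Hypothetical package, present only to show a second match.
    "example-pdf2png": {"converter", "nogui", "reads_pdf", "writes_png"},
}

rpm_to_tgz = converters(packages, "rpm", "tgz")
pdf_to_png = converters(packages, "pdf", "png")
alien_reads = formats(packages["alien"], "reads_")
```

The reads_/writes_ prefix is only a naming convention here, so nothing in
the storage layer needs to know about namespaces; a typed scheme could be
layered on later by parsing the same tags.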

Unfortunately for "real" hierarchical methods, people still have
difficulty with even simple metadata systems.  Fetch some MP3s off the
internet and check their tags or look at search engine queries and
you'll find an entire class of people hampered by what is currently a
largely alien art.  In the end, this system needs to be usable by
people, and by keeping it primarily flat we reduce the conceptual
overhead of its implementation and its use.  If it can't be
implemented on itch-scratching timescales, we have failed.  If people
can't use it with very little learning curve, we have failed.

A word on vocabulary:
As you've no doubt noticed, there seems to be a degree of combinatorial
explosion of tags in the method I propose.  In practical use, it's not
as bad as it looks.  For Gentoo, I'd recommend starting from a basic
"canonical" list of general tags based on the current category system
(subject to discussion and addition/subtraction) and incorporating
suggestions like Kent's as they come up.  It's okay to control the
vocabulary.  What
you find is that after the initial implementation, it grows fairly
slowly. (Even with reads_* and writes_* the number will probably be
south of 500 tags for a long time; the current categories dissolve
into about 175 tags from what I can see.)
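
As a rough illustration of seeding and enforcing such a vocabulary, the
Python sketch below splits category names on "-" to get candidate tags
and flags anything outside the list.  Splitting on "-" is my assumption
about how categories would "dissolve" into tags, and the function names
are hypothetical.

```python
def seed_vocabulary(categories):
    """Split category names such as 'media-sound' into candidate tags."""
    vocabulary = set()
    for category in categories:
        vocabulary.update(part for part in category.lower().split("-") if part)
    return vocabulary


def unknown_tags(tags, vocabulary):
    """Return proposed tags that are not yet in the controlled vocabulary."""
    return sorted(set(tags) - vocabulary)


# A handful of existing categories, used here only as sample input.
categories = ["media-sound", "media-video", "app-editors", "dev-lang"]
vocabulary = seed_vocabulary(categories)

# 'reads_rpm' is not in the seed list, so it would go to discussion first.
rejected = unknown_tags(["sound", "reads_rpm"], vocabulary)
```

Shared components such as "media" collapse into a single tag, which is
why the tag count comes out lower than a naive count of category name
parts would suggest.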

Regards,
Wyatt
