On Tue, 19 Jul 2011, C. P. Ghost wrote:
Speaking with my university sysadmin hat on: you're NOT allowed to
peek inside personal files of your users, UNLESS the user has waived
his/her rights to privacy by explicitly agreeing to the TOS and
there's legal language in the TOS that allows staff to inspect files
(and then staff needs to abide by those rules in a very strict and
cautious manner). So unless the TOS are very explicit, a sysadmin or
an IT head can get in deep trouble w.r.t. privacy laws.

Yes, but I am not an expert on privacy laws in France, and I suspect
you are not either. Whether examining the magic number (first four bytes)
of a file constitutes a breach of privacy is a matter for legal advice
applicable to the particular jurisdiction. You certainly can look at the
external package: file size and name.
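
As a rough illustration of looking only at the "external package", here is a
minimal Python sketch that walks a directory tree and reports large files using
nothing but directory entries and stat() metadata. The function name
find_large_files and the 700 MiB default threshold are assumptions made for this
example, not anything from the thread; pick a threshold that suits your site.

```python
import os
import stat


def find_large_files(root, min_bytes=700 * 1024 * 1024):
    """Return (path, size) pairs for regular files of at least min_bytes.

    Only names and stat() metadata are consulted; no file is opened.
    The default threshold (700 MiB) is an arbitrary, illustrative value.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or is not accessible; skip it
            if stat.S_ISREG(info.st_mode) and info.st_size >= min_bytes:
                hits.append((path, info.st_size))
    # Largest files first, since those are the ones worth a closer look.
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits
```

A large file is only a hint, of course: backups, VM images and legitimate video
work will show up in the same list.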
You may want to look for files that are unusually large.
They could possibly be ISOs, dvdrips, HD movie dumps...
Not to forget encrypted RAR files (which btw. could contain anything,
including legitimate content, so be careful here).
We have the same problem here with users sharing movies on the file
servers, and what makes it worse is some of their movie files are
legit because they're, for example, official trailers that are
reworked and redistributed to our customers.
You won't win this; tell your boss it cannot be done.
What can technically be done is that the copyright owner provides a
list of hashes for his files, and requests that you traverse your
filesystems, looking for files that match those hashes. AND, even
then, all you can do is flag the files, and you'll have to check with
the user that he/she doesn't own a license permitting him/her to own
that file!
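
The hash-list approach described above might be sketched in Python as follows.
This is only an illustration under stated assumptions: the hash list is taken to
be SHA-256 hex digests (the thread does not say which algorithm a copyright
owner would use), and the names sha256_of and flag_matches are made up for the
example. It flags paths and does nothing else; the licence check with the user
remains a human job.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def flag_matches(root, known_hashes):
    """Return sorted paths under root whose SHA-256 is in known_hashes.

    known_hashes is assumed to be an iterable of hex digest strings.
    """
    known = {h.lower() for h in known_hashes}
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                if sha256_of(path) in known:
                    flagged.append(path)
            except OSError:
                continue  # unreadable file; leave it alone
    return sorted(flagged)
```

Note that this reads every byte of every file, which is exactly the kind of
access the privacy question is about.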
You cannot generate a hash without opening the file, at least at some automated
level. If you can do that, couldn't you generate a hash of the first four
bytes to match with hashes of known magic numbers? If you can "look" at the
whole file, surely you can "look" at just the first four bytes.
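
A minimal sketch of the four-byte check, in Python. It compares the bytes
directly rather than hashing them, which amounts to the same thing for a
four-byte value. The table below is a small illustrative selection, not a
complete list, and the name sniff is an assumption for this example. Some
formats cannot be recognised this way: an ISO 9660 image carries its "CD001"
marker far into the file, and MP4 has its "ftyp" marker at offset 4, so a real
tool would need more than the first four bytes.

```python
# Four-byte signatures for a few common container and archive formats.
# Illustrative only; real signature tables are much longer.
MAGIC = {
    b"Rar!": "RAR archive",
    b"PK\x03\x04": "ZIP archive",
    b"RIFF": "RIFF container (AVI, WAV)",
    b"OggS": "Ogg container",
    b"\x1a\x45\xdf\xa3": "Matroska/WebM (EBML)",
    b"fLaC": "FLAC audio",
}


def sniff(path):
    """Return a format label based on the first four bytes, or None."""
    with open(path, "rb") as handle:
        head = handle.read(4)  # nothing beyond these four bytes is read
    return MAGIC.get(head)
```

This catches a renamed file (movie.avi saved as notes.txt still starts with
RIFF) but says nothing about what the file contains or whether it is licensed.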
Of course software cannot determine legal issues, such as whether works are
properly licensed or are pornographic according to local legislation, etc.

However, even that isn't foolproof: nothing prevents a user from
flipping a bit or two, rescaling, resampling, splitting the files into
multiple files in a non-obvious manner, adding random bytes at the end
etc...: the result would still be infringing, but can't be detected
automatically (at least not in a reasonable amount of time).
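
To make the point about flipped bits concrete, here is a small Python sketch
showing that a one-bit change, or a single appended byte, gives a different
digest. SHA-256 is an assumption chosen for the example; the helper names
flip_bit and digest are made up as well.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def digest(data: bytes) -> str:
    """SHA-256 hex digest of data."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    original = b"stand-in for a large media file " * 1000
    variants = {
        "original": original,
        "one bit flipped": flip_bit(original, 0),
        "one byte appended": original + b"\x00",
    }
    # The three digests printed here will all differ from one another.
    for label, data in variants.items():
        print(f"{label:18} {digest(data)[:16]}")
```

An exact-match hash list therefore only finds untouched copies.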
This is a bit like security. There is no absolute that can be achieved.
You don't have to be smarter than God, you just have to be smarter than the
users. Now the whole point of infringing schemes is that most dumb users
have to be able to use the files they download. They can reasonably do
things like rename the files or pass them through a commonly available
decoder. No point in trying to "file share" if users have to be the NSA to
play the music.
You can scan (where legal) for the common stuff. You can't find stuff
encoded by Dr. Evil Genius Hacker -- but neither can the party claiming to
be infringed and neither can Suzie Shebop who just wants free music.
--
Lars Eighner
http://www.larseighner.com/index.html
8800 N IH35 APT 1191 AUSTIN TX 78753-5266
_______________________________________________
freebsd-questions@freebsd.org mailing list