> Since this is related to my earlier question: sorry to have let you wait,
> Timo.
>
> Kelly, the reason I brought up my original question was because my use case
> involves delivering videos under load.
> Suppose there is a cluster of 50 nodes with a replication value of three. Now
> if a random node is queried for a file of, say, 500MB, I would expect
> 30MB to be local and 470MB (or 94%) to be retrieved from the cluster,
> assuming local data is used if available.
> If this happens over and over again, you want caching. The trouble with video:
> users might skip parts, and those requests are nasty to most caches. If they
> are few, it is OK to just not cache them, but otherwise we might need a
> solution.
> One approach is to have a proxy "translate" a range request into a series of
> Riak queries, each of which is small enough to be cached in "no time", but
> not so small as to let the overhead outgrow the real data.
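To make the chunk arithmetic behind that proxy idea concrete: a range request for n bytes starting at offset off touches a computable window of fixed-size segments, each of which could be fetched (and cached) as a separate Riak object. The helper below is a sketch of my own; the function name and the 102400-byte segment size are assumptions, not part of any driver.

```go
package main

import "fmt"

// chunkRange maps a byte range starting at off with length n onto the
// indices of the fixed-size segments that hold it. Illustrative only.
func chunkRange(off, n, segSize int64) (first, last int64) {
	first = off / segSize
	last = (off + n - 1) / segSize
	return
}

func main() {
	// With 102400-byte segments, a request for 200000 bytes starting
	// at offset 150000 needs only segments 1 through 3.
	first, last := chunkRange(150000, 200000, 102400)
	fmt.Println(first, last)
}
```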
Just a side note: writing such a proxy would be trivial in Go using the goriakpbc driver. I'm including a typical Go HTTP handler below (it uses a hardcoded "file.mp4" key; in reality you'd inspect the http.Request first, and all error checking is removed for clarity). This handler supports such range queries and fetches only the chunks of data that are needed from Riak. You can choose the chunk/segment size when creating the file.

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"

	riak "github.com/tpjg/goriakpbc"
)

func handler(w http.ResponseWriter, r *http.Request) {
	f, err := riak.OpenFile("files", "file.mp4")
	if err != nil {
		fmt.Printf("Error getting file from Riak - %v\n", err)
		return
	}
	http.ServeContent(w, r, "file.mp4", time.Now(), f)
}

func main() {
	// Copy a local MP4 file into Riak, in 102400-byte segments
	riak.ConnectClient("127.0.0.1:8087")
	src, _ := os.Open("file.mp4")
	dst, _ := riak.CreateFile("files", "file.mp4", "video/mp4", 102400)
	io.Copy(dst, src)

	http.HandleFunc("/", handler)
	http.ListenAndServe(":8888", nil)
}

> You are right that RiakCS does a fine job in what it does, but typical caches
> can only query it sequentially, which precludes caching of range requests on
> any object of relevant size.
>
> Timo, of course you are somewhat reinventing the wheel; half the work you
> have to do is similar to how RiakCS stores files. However, since the other
> half cannot "just use" S3, I fail to see the alternative, apart from
> cancelling the project.

I realize I am somewhat re-inventing the wheel. However, I do not need an "S3 interface" and have no need for accounting / different users; we are only using the Riak datastore internally. I don't know Riak CS in detail, but it seems to require setting up more components than plain Riak, plus possibly a single point of failure with the Stanchion instance.

> That said, I am not an expert in these matters, but your basic approach to
> big files looks sound to me.
> I am curious about the use case of growing files.
> My first thought was that if I wanted to store logs in a database, I would
> prefer to store e.g. single lines as values, or rows in a relational database.

I'm storing log files in Riak because I really like the operational aspects of Riak. Setting up (replicated) relational databases and keeping them humming along is much more complex, IMHO. I'm storing the log files on a daily (or hourly) basis instead of each line as a separate value to make the key easy to compute. Scanning a Riak database over a range of keys is typically not recommended; this way I simply know / can calculate my key, which would be "device-yyyymmdd.log" or something like that.

Regards,
Timo
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com