> 
> Since this is related to my earlier question: sorry to have kept you 
> waiting, Timo.
> 
> Kelly, the reason I brought up my original question is that my use case 
> involves delivering videos under load.
> Suppose there is a cluster of 50 nodes with a replication value of three, 
> so each node holds 3/50 = 6% of the data. If a random node is queried for 
> a file of, say, 500MB, I would then expect 30MB to be local and 470MB (or 
> 94%) to be retrieved from the rest of the cluster, assuming local data is 
> used if available.
> If this happens over and over again, you want caching. The trouble with 
> video is that users might skip parts, and those range requests are nasty 
> to most caches. If they are few, it is OK to just not cache them, but 
> otherwise we might need a solution.
> One approach is to have a proxy "translate" a range request into a series 
> of Riak queries, each of which is small enough to be cached in "no time", 
> but not so small that the overhead outgrows the real data.

Just a side note: writing such a proxy would be trivial in Go using the 
goriakpbc driver. I’m including a typical Go http handler below (it uses a 
hardcoded “file.mp4” key; in reality you’d analyze the http.Request first, 
and most error checking is removed for clarity). This handler supports such 
range queries and fetches only the chunks of data that are needed from Riak. 
You can choose the chunk/segment size when creating the file.

package main

import (
        "fmt"
        "io"
        "net/http"
        "os"
        "time"

        riak "github.com/tpjg/goriakpbc"
)

// handler serves the file via http.ServeContent, which honours Range
// headers; since the file returned by riak.OpenFile is an io.ReadSeeker,
// only the chunks covering the requested range are read from Riak.
func handler(w http.ResponseWriter, r *http.Request) {
        f, err := riak.OpenFile("files", "file.mp4")
        if err != nil {
                fmt.Printf("Error getting file from Riak - %v\n", err)
                return
        }
        http.ServeContent(w, r, "file.mp4", time.Now(), f)
}

func main() {
        // Copy a local MP4 file into Riak, stored in 100KB chunks
        riak.ConnectClient("127.0.0.1:8087")
        src, _ := os.Open("file.mp4")
        dst, _ := riak.CreateFile("files", "file.mp4", "video/mp4", 102400)
        io.Copy(dst, src)

        http.HandleFunc("/", handler)
        http.ListenAndServe(":8888", nil)
}
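
To see the range support in action, here is a minimal sketch of a client 
function you could drop into the same file (the byte offsets are made-up 
example values, and it needs "io/ioutil" added to the imports):

func fetchRange() {
        req, _ := http.NewRequest("GET", "http://localhost:8888/", nil)
        // Ask for roughly one megabyte from the middle of the video;
        // http.ServeContent will answer with 206 Partial Content.
        req.Header.Set("Range", "bytes=1000000-1999999")
        resp, _ := http.DefaultClient.Do(req)
        defer resp.Body.Close()
        n, _ := io.Copy(ioutil.Discard, resp.Body)
        fmt.Printf("%s - %d bytes received\n", resp.Status, n)
}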

> 
> You are right that RiakCS does a fine job at what it does, but typical 
> caches can only query it sequentially, which precludes caching of range 
> requests on any object of relevant size.
> 
> Timo, of course you are somewhat reinventing the wheel; half the work you 
> have to do is similar to how RiakCS stores files. However, since the other 
> half cannot "just use" S3, I fail to see the alternative, apart from 
> cancelling the project.

I realize I am somewhat re-inventing the wheel. However, I do not need an “S3 
interface” and have no need for accounting or different users; we are only 
using the Riak datastore internally. I don’t know Riak CS in detail, but it 
seems to involve setting up more components than plain Riak, plus possibly a 
single point of failure with the Stanchion instance.

> That said, I am not an expert in these matters, but your basic approach to 
> big files looks sound to me.
> I am curious about the use case of growing files. 
> My first thought was that if I wanted to store logs in a database, I would 
> prefer to store, e.g., single lines as values, or rows in a relational 
> database.

I’m storing log files in Riak because I really like the operational aspects of 
Riak. Setting up (replicated) relational databases and keeping them humming 
along is much more complex IMHO. I’m storing the log files on a daily (or 
hourly) basis instead of storing each line as a separate value, to make the 
key easy to compute. Scanning a Riak database over a range of keys is 
typically not recommended; this way I always know / can calculate my key, 
which would be “device-yyyymmdd.log” or something like that.
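
For illustration, a minimal sketch of that scheme using the same goriakpbc 
calls as in the handler example above (the “logs” bucket and the “router1” 
device name are made-up example values):

package main

import (
        "fmt"
        "io"
        "os"
        "time"

        riak "github.com/tpjg/goriakpbc"
)

// logKey computes the key for a device's daily log file,
// e.g. "router1-20130426.log" - no key scan is ever needed.
func logKey(device string, day time.Time) string {
        return fmt.Sprintf("%s-%s.log", device, day.Format("20060102"))
}

func main() {
        riak.ConnectClient("127.0.0.1:8087")
        // Yesterday's log is directly addressable by its computed key.
        day := time.Now().AddDate(0, 0, -1)
        f, err := riak.OpenFile("logs", logKey("router1", day))
        if err != nil {
                fmt.Printf("No log for %s - %v\n", day.Format("20060102"), err)
                return
        }
        io.Copy(os.Stdout, f) // stream the whole day's log
}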

Regards,
Timo
