Is Riak the right software for me?
I am looking for software to host millions of files, each roughly 2-20 MB, for web serving. A few (~5%) of the files are "hot" and accessed heavily, but most of the files are cold. The system is going to work like a big cache, so I need to make sure that the cluster won't run out of space while new files are being added. I'd like to store the last time each file was accessed and delete the least recently accessed files so that there is always 20-30% free space in the cluster. This system is currently spread across a lot of different standalone servers and serves a few Gbit/s of bandwidth.

Is this possible with Riak? Is there anything I should take into consideration?

I did a little testing with a 4-node cluster. It is basically the default configuration; I only changed the IP addresses and disabled Nagle buffering for HTTP. When I access a 20 MB file through the REST API, it sometimes takes 4-8 seconds until any data is sent to the client. Setting the "r" parameter (r=1) improved the wait time a little, but it is still unacceptable for production use. Is there some config option I am missing that would return the requested file without such long wait times?

Best Regards
Philip
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
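The eviction policy described above (record a last-access time per file, then delete the least recently accessed files until enough space is free) can be sketched as a pure function. This is only an illustration, not a Riak feature: the module name `lru_evict`, the `{Key, LastAccess, SizeBytes}` tuple shape, and the idea that the application tracks access times and sizes itself are all assumptions.

```erlang
-module(lru_evict).
-export([plan/3]).

%% plan(Files, UsedBytes, TargetUsedBytes) -> [Key]
%% Files is a list of {Key, LastAccess, SizeBytes} tuples (hypothetical shape).
%% Returns the keys to delete, least recently accessed first, stopping as
%% soon as the projected usage drops to TargetUsedBytes or below.
plan(Files, Used, Target) ->
    %% Sort ascending by the second tuple element, the last-access time.
    Sorted = lists:keysort(2, Files),
    pick(Sorted, Used, Target, []).

%% Enough space has been freed: return the keys in eviction order.
pick(_Files, Used, Target, Acc) when Used =< Target ->
    lists:reverse(Acc);
%% Nothing left to evict, even though the target was not reached.
pick([], _Used, _Target, Acc) ->
    lists:reverse(Acc);
%% Evict the oldest remaining file and subtract its size from the usage.
pick([{Key, _LastAccess, Size} | Rest], Used, Target, Acc) ->
    pick(Rest, Used - Size, Target, [Key | Acc]).
```

For example, with 90 units used and a target of 75 (25% free on a capacity of 100), `lru_evict:plan([{<<"a">>, 300, 10}, {<<"b">>, 100, 10}, {<<"c">>, 200, 10}], 90, 75)` selects `<<"b">>` and then `<<"c">>`. How the access times are collected and how the deletes are issued is left open here.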
riak_core app using riak_kv
What is the best way for a riak_core app to use riak_kv for persistent storage? I thought that Riak Search did this, but I haven't found it in that code yet. Should I use the Erlang client over HTTP or protobuf, or is there an API or module I can call directly, since my app runs as a member of the cluster?
Re: Is Riak the right software for me?
It looks like these long wait times occur when the request hits a node that does not store the requested file. The node downloads the whole file into memory and only then sends the data to the client. How about piping the download directly to the output stream or (even better) redirecting the client to a node that has the file, so that overall network usage decreases?

On 4 March 2012 at 11:40, Philip wrote:
> When I want to access a 20MB file through the REST API it sometimes
> takes 4-8 seconds until any data is being sent to the client.
Re: Is Riak the right software for me?
A memory-based caching layer will be added to the frontends, so the hot files won't really be a problem. I only mentioned them because the memory layer might fail, which would increase the load on the data source. However, I am still not sure whether it is possible to use MapReduce to get an ordered list of the last-accessed objects within a bucket. Also, the vclock consistency check shouldn't be performed for certain requests. I don't update any items, and even if I did, it wouldn't be a problem if an outdated version were served for a short period of time. The check simply consumes too many resources when you are dealing with larger objects.

On 4 March 2012 at 17:08, Paul Armstrong wrote:
> On 04/03/2012, at 2:40, Philip wrote:
>
>> I'd like to store the last time a file was accessed and delete the
>> least accessed files to make sure there is always 20-30% free space
>> in the cluster.
>
> Couchbase is probably a better solution for your needs (let it eject things
> based on LRU). Another option is to have memcache in front of Riak so your
> hot objects are fast and kicked by LRU and you have a large, replicable store
> behind it.
Re: Is Riak the right software for me?
Are you looking for something like this? https://help.basho.com/entries/358219-can-riak-be-used-to-build-a-secure-s3-alternative Ideally you would run a reverse proxy in front of Riak, so Riak would be your store and the reverse proxy would serve files directly to the client.

On Mar 4, 2012, at 6:54 AM, Philip wrote:
> It looks like these high wait times occur when the request hits a node
> which does not store the requested file. The node will download the
> whole file into memory and after that it will send the data to the
> client.
Search quite slow
I was doing some benchmarks with search, and while thinking about how to implement something more or less relevant I found this: http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/. So I forked it and added Riak: https://github.com/Techmind/opensearch (though building the Java client took too much time: riak-java-client doesn't support search queries, and SolrJ depends on old libraries). It is a single-node test (the other tests use one node too), and I don't think I could have screwed up in too many places because the test is quite simple. I had to filter out words like `with|of|and|the|a` because Riak complained about `too_many_results`. The results were roughly 30-40 minutes for indexing and 20 seconds for querying: indexing is slower than SQLite, and querying is a bit faster than SQLite but much slower than any other search engine =) I think Basho should either really discourage people from using Riak Search or optimize it somehow, at least for something like batch indexing.

Regards,
Bogunov Ilya
Unicode String problem
Hi everybody! I can't put a Unicode string into Riak. I was following the riak-erlang-client docs, and this doesn't work:

Object = riakc_obj:new(<<"snippet">>, <<"odam">>, <<"Одамлардан тинглаб хикоя">>).
** exception error: bad argument

I googled but couldn't find anything meaningful about this issue. So I'd be very grateful if someone could refer me to relevant documentation or give me some hints to solve the problem. Thanks!

-- Buriwoy
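A likely cause, offered here as an assumption rather than a confirmed diagnosis: each character in a plain `<<"...">>` literal becomes one byte-sized segment, which cannot hold code points above 255, so depending on the Erlang release the expression fails with `bad argument` or is silently truncated. A minimal sketch of two standard ways to build a UTF-8 binary follows; the module and function names (`utf8_value`, `from_literal/0`, `from_string/1`) are hypothetical.

```erlang
%% -*- coding: utf-8 -*-
-module(utf8_value).
-export([from_literal/0, from_string/1]).

%% The /utf8 type specifier encodes every code point of the literal as
%% UTF-8 instead of forcing each character into a single byte.
from_literal() ->
    <<"Одам"/utf8>>.

%% Convert a list of Unicode code points to a UTF-8 binary at run time.
from_string(Chars) ->
    unicode:characters_to_binary(Chars).
```

With either form the value handed to the client is an ordinary binary, for example `riakc_obj:new(<<"snippet">>, <<"odam">>, <<"Одамлардан тинглаб хикоя"/utf8>>)`. Whether characters typed into the shell arrive as Unicode code points also depends on the shell and terminal encoding, so testing from a compiled module may be more predictable.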
Questions on configuring public and private ips for riak on ubuntu
Hello all,

I have a few questions on networking configs for Riak.

I have both a public IP and a private IP for each Riak node. I want Riak to communicate over the private IP addresses to take advantage of free bandwidth, but I would also like the option to interface with Riak using the public IPs if need be (e.g. for testing, demos, etc.).

I'm gathering that the way people do this is by setting up app.config to use IP "0.0.0.0" to listen on all IPs. I'm also gathering that vm.args needs to have a name that is unique in the cluster, so I would need to use the hostname for the -name option (e.g. r...@www.fake-node-domain-name-1.com).

My hosts file would contain:

127.0.0.1 localhost.localdomain localhost
x.x.x.x www.fake-node-domain-name-1.com mynode-1

where x.x.x.x is the public IP, not the private one.

This is where I start to get lost.

As it sits, if I attempt to join using the private IPs I get the unreachable error, yet I can connect with telnet to and from the equivalent nodes.

So I could add a second IP to the hosts file, but since I need to keep the public one as well, how is Riak going to use the private IPs for the gossip ring, hinted handoff, etc.?

There are obviously some networking basics I am missing.

Any guidance from those of you who have done this?

Thanks.
Tim
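One possible arrangement, sketched under assumptions that should be checked against the installed Riak version: name each node after its private address in vm.args, so that Erlang distribution (and with it ring gossip) resolves peers over the private network, and bind the client listeners in app.config to that same private address. The address 10.0.0.1 is hypothetical, and the key names below are the ones used in Riak 1.x app.config files; later releases may place them differently.

```erlang
%% app.config (fragment; all other settings omitted)
[
 {riak_core, [
   %% HTTP listener bound to the private address only.
   %% 10.0.0.1 is a hypothetical private IP.
   {http, [{"10.0.0.1", 8098}]}
 ]},
 {riak_kv, [
   %% Protocol Buffers listener on the same private address.
   {pb_ip, "10.0.0.1"},
   {pb_port, 8087}
 ]}
].

%% vm.args is a separate file; the relevant line is shown as a comment:
%%   -name riak@10.0.0.1
```

With this layout every node would join using the riak@<private-ip> names. Nothing listens on the public interface, so access for demos would need something like a tunnel or a proxy rather than a directly exposed port.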
Re: Questions on configuring public and private ips for riak on ubuntu
This is a "Very Bad" idea. Do not expose your Riak instance over a public IP address. Riak has no internal security mechanism to keep people from doing very bad things to your data, configuration, etc.

-Alexander Sicular
@siculars

On Mar 5, 2012, at 12:43 AM, Tim Robinson wrote:
> I want Riak to communicate over the private ip addresses to take advantage
> of free bandwidth, but I would also like the option to interface with riak
> using the public ip's if need be (i.e. for testing / demo's etc).
Re: Questions on configuring public and private ips for riak on ubuntu
Right now I am just loading data for test purposes. It's nice to be able to run some benchmarks against the private network (which is 1 Gbit/s) while being able to poke a hole in the firewall when I want to do a test or demo.

Tim

On Sunday, March 4, 2012 at 9:15pm, Alexander Sicular wrote:
> this is a "Very Bad" idea. do not expose your riak instance over a public ip
> address. riak has no internal security mechanism to keep people from doing
> very bad things to your data, configuration, etc.
Re: Questions on configuring public and private ips for riak on ubuntu
Yeah, I read your blog post when it first came out. I liked it. I appreciate the warning, but practically speaking I'm really just not worried about it. It's a test environment on an external VPS that no one knows the details of. A demo to the company means showing image/content-type loading, JSON in the browser with proper indentation, and Riak Control. SSH isn't going to do that for me. I'm using public data for the testing, and I can blow the whole thing away at any time. Aside from warnings, does anyone want to help with the question?

Thanks,
Tim

On Sunday, March 4, 2012 at 10:41pm, Aphyr wrote:
> I can get SSH access over Riak's HTTP and protobufs interfaces in about
> five seconds, and can root a box shortly after that, depending on kernel.
> Please don't do it. Just don't.
>
> http://aphyr.com/posts/224-do-not-expose-riak-to-the-internet
> http://aphyr.com/posts/218-systems-security-a-primer
>
> --Kyle
Re: Questions on configuring public and private ips for riak on ubuntu
ssh -NL 8098:localhost:8098 your.vps.com

--Kyle

On 03/04/2012 09:55 PM, Tim Robinson wrote:
> Demo to the company means show image/content-type load, JSON via browser
> with proper indentation, and Riak Control. SSH isn't going to do that for me.
Re: licenses (was Re: riakkit, a python riak object mapper, has hit beta!)
Hey Andrey,

I've spent well over a decade dealing with licensing issues. One thing that I've learned is that licensing is a personal choice, and it is nearly impossible to alter somebody's philosophy. I find people fall into the GPL camp ("free software") or the Apache/BSD camp ("permissive / open source"), so I always recommend GPLv3 or ALv2. (I find that people choosing weak reciprocal licenses like LGPL, EPL, MPL, CDDL, etc. should make up their mind and go to GPL or AL.)

In any case, license choice and arguments for one license over another are best left to personal email rather than a public mailing list like riak-users. Changing minds doesn't happen on a mailing list :-)

Cheers,
-g

On Fri, Mar 2, 2012 at 05:24, Andrey V. Martyanov wrote:
> Hi Justin,
>
> Sorry for the late response, I didn't see your message! In fact, I know the
> differences between the two. But what is the profit of using it? Why not
> just use BSD, for example, like many open source projects do? The biggest
> minus of LGPL is that many people think that it's the same as GPL and have
> problems understanding it. Even you think that I don't know the difference!
> :) Why? Because it's a common practice. A lot of people really don't know
> the difference. That's why I said before that (L)GPL is overcomplicated. If
> you open the LGPL main page [1], the first thing you will see is "Why you
> shouldn't use the Lesser GPL for your next library". Is that normal? It
> confuses people. There is a lot of profit in pulling back the changes
> you've made: a lot of people see it, fix it, comment on it, improve it and
> so on. Why does the license force me to do that? It shouldn't.
>
> [1] http://www.gnu.org/licenses/lgpl.html
>
> Best regards,
> Andrey Martyanov
>
> On Fri, Mar 2, 2012 at 8:29 AM, Justin Sheehy wrote:
>>
>> Hi, Andrey.
>>
>> On Mar 1, 2012, at 10:18 PM, "Andrey V. Martyanov" wrote:
>>
>> > Sorry for GPL, it's a typo. I just don't like GPL-based licenses,
>> > including LGPL. I think it's overcomplicated.
>>
>> You are of course free to dislike anything you wish, but it is worth
>> mentioning that GPL and LGPL are very different licenses; the LGPL is
>> missing infectious aspects of the GPL.
>>
>> There are many projects which could not use GPL code compatibly with their
>> preferred license but which can safely use LGPL code.
>>
>> Justin