Is Riak the right software for me?

2012-03-04 Thread Philip
I am looking for software to host millions of files, each on the order of
2-20MB, for web serving. A few (~5%) of the files are "hot" and accessed
heavily, but most of the files are cold. The system is going to work
like a big cache, and therefore I need to make sure that the cluster
won't run out of space while new files are being added. I'd like to
store the last time a file was accessed and delete the least accessed
files to make sure there is always 20-30% free space in the cluster.
This system is currently spread across a lot of different standalone
servers and serves a few Gbit/s of bandwidth.
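
The eviction policy described above can be sketched roughly as follows. This is only an illustration under stated assumptions: "least accessed" is read here as "least recently accessed", the function name `pick_evictions`, the in-memory dict of sizes and timestamps, and the 25% default are all hypothetical, and nothing here is a Riak API.

```python
def pick_evictions(files, capacity_bytes, min_free_fraction=0.25):
    """Return keys to delete, least recently accessed first, until the
    free-space target is met.

    files: dict mapping key -> (size_bytes, last_access_timestamp)
    """
    used = sum(size for size, _ in files.values())
    target_used = capacity_bytes * (1.0 - min_free_fraction)
    evict = []
    # Walk the files from oldest to newest last-access time.
    for key, (size, _) in sorted(files.items(), key=lambda kv: kv[1][1]):
        if used <= target_used:
            break
        evict.append(key)
        used -= size
    return evict


# Hypothetical data: four 20-byte files on a 100-byte "cluster".
files = {
    "a.bin": (20, 100),  # (size, last access time)
    "b.bin": (20, 300),
    "c.bin": (20, 200),
    "d.bin": (20, 400),
}
print(pick_evictions(files, capacity_bytes=100))
```

In a real deployment the last-access times would have to be recorded somewhere on every read, which is itself a write load worth measuring.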

Is this possible with riak? Is there anything I should take into consideration?

I did a little testing with a 4-node cluster, basically with the
default configuration: I only changed the IP addresses and disabled
Nagle buffering for HTTP. When I access a 20MB file through
the REST API, it sometimes takes 4-8 seconds until any data is
sent to the client. I used the "r" parameter (r=1) and the
wait time improved a little, but it is still unacceptable for
production use. Is there some config option I am missing here to get
the requested file without such high wait times?

Best Regards
Philip

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


riak_core app using riak_kv

2012-03-04 Thread Adam Schepis
What is the best way for a riak_core app to use riak_kv for persistent
storage? I thought that riak search did this but haven't found it in
that code yet. Should I use the erlang interface via http or protobuf
or is there an API or module I can use since my app is running as a
member of the cluster?



Re: Is Riak the right software for me?

2012-03-04 Thread Philip
It looks like these high wait times occur when the request hits a node
that does not store the requested file. The node downloads the
whole file into memory and only then sends the data to the
client. How about piping the download directly to the output stream
or (even better) redirecting the client to a node that has the file, so that
overall network usage decreases?
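
The "piping" idea can be illustrated with a generic chunked relay. This is a minimal sketch, not a description of how Riak's HTTP interface works and not a Riak option; the name `relay` and the 64 KiB chunk size are assumptions made for the example.

```python
import io


def relay(source, sink, chunk_size=64 * 1024):
    """Copy from source to sink chunk by chunk and return the bytes copied.

    Only one chunk is held in memory at a time, so the first bytes can
    reach the client before the whole object has been fetched.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        sink.write(chunk)
        total += len(chunk)
    return total


# In-memory stand-ins for the upstream node and the client connection.
source = io.BytesIO(b"x" * 200_000)
sink = io.BytesIO()
copied = relay(source, sink)
print(copied)
```

With this shape, time to first byte depends on the first chunk rather than on the full 20MB transfer.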



Re: Is Riak the right software for me?

2012-03-04 Thread Philip
A memory-based caching layer will be added to the frontends, so the hot
files won't really be a problem. I just mentioned it because the
memory layer might fail and result in increased use of the data
source.

However, I am still not sure whether it is possible to use MapReduce to get
an ordered list of the last accessed objects within a bucket. Also, the
vclock consistency check shouldn't be performed for certain requests.
I don't update any items, and even if I did, it wouldn't be a problem if an
outdated version were served for a short period of time. The check simply
consumes too many resources if you are dealing with larger objects.

On 4 March 2012 at 17:08, Paul Armstrong wrote:
> Couchbase is probably a better solution for your needs (let it eject things 
> based on LRU). Another option is to have memcache in front of Riak so your 
> hot objects are fast and kicked by LRU and you have a large, replicable store 
> behind it.



Re: Is Riak the right software for me?

2012-03-04 Thread Shekhar Vemuri
Are you looking for something like this? 

https://help.basho.com/entries/358219-can-riak-be-used-to-build-a-secure-s3-alternative

Ideally you would run a reverse proxy in front of riak. So riak would be your 
store and the reverse proxy would serve files directly to the client...



Search quite slow

2012-03-04 Thread Bogunov
I was doing some benchmarks with search, and while thinking about how to implement
something more or less relevant I found this:
http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/.
So I forked it and added Riak: https://github.com/Techmind/opensearch (though
building the Java client took too much time, since riak-java-client doesn't
support search querying and SolrJ depends on old libraries).

It is a single-node test (as the other tests use one node too), and I don't think I
could have screwed up in too many places because the test is quite simple. I had
to filter out words like `with|of|and|the|a` because Riak complained about
`too_many_results`.

The results were roughly 30-40 min for indexing and 20 seconds for querying: indexing
slower than SQLite and querying a bit faster than SQLite, but much slower
than any other search engine =)

I think Basho should either discourage people from using riak-search or
optimize it somehow, at least for something like batch indexing.

-- 
email: bogu...@gmail.com
skype: i.bogunov
phone: +7 968 842 5783
Regards, Bogunov Ilya


Unicode String problem

2012-03-04 Thread Buri Arslon
Hi everybody!

I can't store a Unicode string in Riak. I was following the riak-erlang-client
docs, and this doesn't work:

  Object = riakc_obj:new(<<"snippet">>, <<"odam">>, <<"Одамлардан тинглаб хикоя">>).
 ** exception error: bad argument

I googled but couldn't find anything meaningful about this issue, so I'd
be very grateful if someone could refer me to relevant documentation or
give me some hints to solve the problem.
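
One possible reading of the error (an assumption; nothing in this thread confirms it) is that the value contains characters whose code points do not fit in a single byte, so the text has to be encoded, for example as UTF-8, before it can be stored as a binary value. On the Erlang side that would presumably mean something like `unicode:characters_to_binary/1` or the `/utf8` binary type specifier. The byte-level idea can be sketched like this:

```python
text = "Одамлардан тинглаб хикоя"

# Cyrillic code points are well above 255, so one character cannot be
# stored as one byte without an explicit encoding step.
code_points = [ord(ch) for ch in text]
print(max(code_points) > 255)

# UTF-8 turns the text into a plain byte sequence that a key/value
# store can hold, and decoding gives the original text back.
encoded = text.encode("utf-8")
print(len(text), len(encoded))
print(encoded.decode("utf-8") == text)
```

The reader of the value then needs to know that it is UTF-8, for instance via the object's content type.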

Thanks!
-- Buriwoy


Questions on configuring public and private ips for riak on ubuntu

2012-03-04 Thread Tim Robinson
Hello all,

I have a few questions on networking configs for riak.
 
I have both a public IP and a private IP for each Riak node. I want Riak to 
communicate over the private IP addresses to take advantage of free bandwidth, 
but I would also like the option to interface with Riak using the public IPs 
if need be (e.g. for testing, demos, etc.).

I'm gathering that the way people do this is by setting up app.config to use IP 
"0.0.0.0" to listen on all IPs. I'm also gathering that vm.args needs to have a 
name that is unique in the cluster, so I would need to use the hostname for the 
-name option (i.e. r...@www.fake-node-domain-name-1.com).

My hosts file would contain:

127.0.0.1  localhost.localdomain  localhost
x.x.x.x    www.fake-node-domain-name-1.com  mynode-1


where x.x.x.x is the public ip not the private.

This is where I start to get lost.

As it sits, if I attempt to join using the private IPs I get the 
unreachable error, yet I can connect via telnet to/from the equivalent nodes. 

So I could add a second IP to the hosts file, but since I need to keep the 
public one as well, how is Riak going to use the private IPs for the gossip 
ring, hinted handoff, etc.?
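
As a rough illustration of why the hosts file matters here: the host part of the -name value is resolved by ordinary name lookup, so whichever address that name maps to is, presumably, the one other nodes connect to. The sketch below only mimics a hosts-file lookup; `parse_hosts`, the 203.0.113.10 and 10.0.0.1 addresses, and the "first entry wins" rule are assumptions for the example, not Riak behaviour taken from this thread.

```python
def parse_hosts(text):
    """Map every hostname and alias in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # assumed: the first entry wins
    return table


# Hypothetical hosts file in which the node's name maps to a public address.
PUBLIC_HOSTS = """
127.0.0.1     localhost.localdomain  localhost
203.0.113.10  www.fake-node-domain-name-1.com  mynode-1
"""

# Hypothetical variant in which the same name maps to a private address.
PRIVATE_HOSTS = """
127.0.0.1     localhost.localdomain  localhost
10.0.0.1      www.fake-node-domain-name-1.com  mynode-1
"""

node_host = "www.fake-node-domain-name-1.com"
print(parse_hosts(PUBLIC_HOSTS)[node_host])
print(parse_hosts(PRIVATE_HOSTS)[node_host])
```

Under that assumption, pointing the node hostnames at the private addresses in each cluster member's hosts file would keep inter-node traffic on the private network, while the listeners bound to 0.0.0.0 stay reachable on either interface.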

There's obviously some networking basics I am missing.

Any guidance from those of you who have done this?

Thanks.
Tim







Re: Questions on configuring public and private ips for riak on ubuntu

2012-03-04 Thread Alexander Sicular
This is a "Very Bad" idea. Do not expose your Riak instance over a public IP 
address. Riak has no internal security mechanism to keep people from doing very 
bad things to your data, configuration, etc.

-Alexander Sicular

@siculars



Re: Questions on configuring public and private ips for riak on ubuntu

2012-03-04 Thread Tim Robinson
Right now I am just loading data for test purposes. It's nice to be able to do 
some benchmarks against the private network (which is @1Gbit/s)... while being 
able to poke a hole in the firewall when I want to do a test/demo.
 
Tim



Re: Questions on configuring public and private ips for riak on ubuntu

2012-03-04 Thread Tim Robinson
Yeah, I read your blog post when it first came out. I liked it.

I appreciate the warning, but practically speaking I'm really just not worried 
about it. It's a test environment on an external VPS that no one knows the info 
for. Demo to the company means show image/content-type load, JSON via browser 
with proper indentation, and Riak Control. SSH isn't going to do that for me.

I'm using public data for the testing. I can blow the whole thing away any 
time. 

Aside from the warnings, does anyone want to help with the question?

Thanks,
Tim


-Original Message-
From: "Aphyr" 
Sent: Sunday, March 4, 2012 10:41pm
To: "Tim Robinson" 
Subject: Re: Questions on configuring public and private ips for riak on ubuntu

I can get SSH access over Riak's HTTP and protobufs interfaces in about 
five seconds, and can root a box shortly after that, depending on 
kernel. Please don't do it. Just don't.

http://aphyr.com/posts/224-do-not-expose-riak-to-the-internet
http://aphyr.com/posts/218-systems-security-a-primer

--Kyle



Re: Questions on configuring public and private ips for riak on ubuntu

2012-03-04 Thread Aphyr
ssh -NL 8098:localhost:8098 your.vps.com

--Kyle



Re: licenses (was Re: riakkit, a python riak object mapper, has hit beta!)

2012-03-04 Thread Greg Stein
Hey Andrey,

I've spent well over a decade dealing with licensing issues. One thing
that I've learned is that licensing is a personal choice and decision,
and it is nearly impossible to alter somebody's philosophy. I find
people fall into the GPL camp ("free software"), or the Apache/BSD
camp ("permissive / open source"), so I always recommend GPLv3 or
ALv2. (I find people choosing weak reciprocal licenses like LGPL, EPL,
MPL, CDDL, etc should make up their mind and go to GPL or AL)

In any case... license choice and arguments for one over the other is
best left to personal email, rather than a public mailing list like
riak-users. Changing minds doesn't happen on a mailing list :-)

Cheers,
-g

On Fri, Mar 2, 2012 at 05:24, Andrey V. Martyanov  wrote:
> Hi Justin,
>
> Sorry for the late response, I didn't see your message! In fact, I know the
> differences between the two. But what is the benefit of using it? Why not
> just use BSD, for example, like many open source projects do? The biggest
> drawback of LGPL is that many people think that it's the same as GPL and have
> problems understanding it. Even you think that I don't know the difference!
> :) Why? Because it's common. A lot of people really don't know
> the difference. That's why I said before that (L)GPL is overcomplicated. If
> you open the LGPL main page [1], the first thing you will see is "Why you
> shouldn't use the Lesser GPL for your next library". Is that normal? It
> confuses people. There is a lot of benefit in contributing back the changes
> you've made: a lot of people see them, fix them, comment on them, improve
> them, and so on. Why does the license force me to do that? It shouldn't.
>
> [1] http://www.gnu.org/licenses/lgpl.html
>
> Best regards,
> Andrey Martyanov
>
> On Fri, Mar 2, 2012 at 8:29 AM, Justin Sheehy  wrote:
>>
>> Hi, Andrey.
>>
>> On Mar 1, 2012, at 10:18 PM, "Andrey V. Martyanov" 
>> wrote:
>>
>> > Sorry for GPL, it's a typo. I just don't like GPL-based licenses,
>> > including LGPL. I think it's overcomplicated.
>>
>> You are of course free to dislike anything you wish, but it is worth
>> mentioning that GPL and LGPL are very different licenses; the LGPL is
>> missing infectious aspects of the GPL.
>>
>> There are many projects which could not use GPL code compatibly with their
>> preferred license but which can safely use LGPL code.
>>
>> Justin