Python client: three issues
Hi all, I'm digging into Riak, and its Python client. I've discovered several issues and would like to provide feedback and get some input, please. 1. I'd like Riak to generate keys for some of my objects, and I'm using the HTTP API. Pretty simple: do a POST, and the Location: header will tell me the key. However, looking at the code... it doesn't appear that object.store() actually reads and returns that key. Am I missing something here? Maybe there is a Proper Way to do this? 2. The HTTP transport code doesn't use httplib properly at all. When I wrote the httplib library, my primary design point was to enable persistent, HTTP/1.1 connections. That was the whole idea behind HTTPConnection. The transport library opens a *new* connection for every request. I'm not surprised that the performance suffers, since a TCP setup handshake needs to occur on every request. ... So: are there any plans to rebuild the HTTP support to use persistent connections? 3. The Luwak support reads the *entire* "file" into memory. My understanding is that Luwak is for "large" files. If those files are (say) a gigabyte, then the current code is going to impact clients in a *very* bad way. I can see that the code wants to read the response, then return the connection back to the pool... but that just isn't workable. A connection needs to be held by the reader, which then streams it to completion (or close), and only then should the connection be returned. Now, if Luwak is defined for files less than (say) 10 megabytes, then maybe this approach isn't wrong, but I haven't seen any documentation about size limitations. Any thoughts on this? Thanks! -g ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
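A minimal sketch of issues 1 and 2 above, assuming Python 3's `http.client` (the successor to `httplib`). The helper names `key_from_location` and `store_new_object` are hypothetical and are not part of riak-python-client; the `/riak/<bucket>` path follows the HTTP API as described in the message. The idea is that one `HTTPConnection` is reused for every request, and the generated key is read back from the `Location:` header of the POST response.

```python
import http.client


def key_from_location(location):
    """Return the last path segment of a Location header value."""
    return location.rstrip("/").rsplit("/", 1)[-1]


def store_new_object(conn, bucket, body,
                     content_type="application/octet-stream"):
    """POST `body` to the bucket over an already-open connection.

    `conn` is an http.client.HTTPConnection (or any object with the same
    request()/getresponse() methods). Returns the server-generated key.
    """
    conn.request("POST", "/riak/%s" % bucket, body=body,
                 headers={"Content-Type": content_type})
    response = conn.getresponse()
    response.read()  # drain the body so the connection can be reused
    if response.status != 201:
        raise RuntimeError("unexpected status %d" % response.status)
    return key_from_location(response.getheader("Location"))


# Usage, assuming a Riak node is listening on 127.0.0.1:8098:
#
#   conn = http.client.HTTPConnection("127.0.0.1", 8098)
#   first = store_new_object(conn, "test", b"first")
#   second = store_new_object(conn, "test", b"second")  # same TCP connection
#   conn.close()
```

Draining the response before returning matters here: `http.client` will not send a second request on the same connection until the previous response has been fully read.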
Re: Python client: three issues
On Sun, Sep 4, 2011 at 22:12, Brett Hoerner wrote: > On Sun, Sep 4, 2011 at 7:05 PM, Greg Stein wrote: >> 1. I'd like Riak to generate keys for some of my objects, and I'm >> using the HTTP API. Pretty simple: do a POST, and the Location: header >> will tell me the key. However, looking at the code... it doesn't >> appear that object.store() actually reads and returns that key. Am I >> missing something here? Maybe there is a Proper Way to do this? > > You're right, it looks like this functionality was added without the > logic to pull the key back out of the response. It definitely should. Okay. Just wanted to make sure that I wasn't missing something. I'll craft up a patch. >... >> 2. The HTTP transport code doesn't use httplib properly at all. When I >> wrote the httplib library, my primary design point was to enable >> persistent, HTTP/1.1 connections. That was the whole idea behind >> HTTPConnection. The transport library opens a *new* connection for >> every request. I'm not surprised that the performance suffers, since a >> TCP setup handshake needs to occur on every request. ... So: are there >> any plans to rebuild the HTTP support to use persistent connections? > > I believe I've read that Riak itself doesn't support keep-alive. > Someone else would have to comment for sure here. Regardless, this > feature would be handy because many people use Riak over HTTP through > a reverse proxy. It looks like Riak supports persistent connections, according to this post: http://lists.basho.com/pipermail/riak-users_lists.basho.com/2010-March/000744.html I'd like to understand if there is a roadmap for the HTTP client, and if >> 3. The Luwak support reads the *entire* "file" into memory. My >> understanding is that Luwak is for "large" files. If those files are >> (say) a gigabyte, then the current code is going to impact clients in >> a *very* bad way. I can see that the code wants to read the response, >> then return the connection back to the pool... 
but that just isn't >> workable. A connection needs to be held by the reader, which then >> streams it to completion (or close), and only then should the >> connection be returned. Now, if Luwak is defined for files less than >> (say) 10 megabytes, then maybe this approach isn't wrong, but I >> haven't seen any documentation about size limitations. Any thoughts on >> this? > > You're 100% right that it should spool to disk, Luwak is meant for > very large files. I see no reason to spool to disk. Just keep the socket open, and let the application read as necessary. That said, thanks for validating my understanding of Luwak files. > The Python client has been pretty quiet in my experience (though I > only started messing with it about a month ago). I'm sure patches for > any/all of the above are welcome. I'm hoping the Basho guys have > someone on it to review pull requests and such sooner than later. Can do. I've already submitted a pull request to allow the client to work with older versions of Python (e.g., the Python 2.5 installed on my Mac OS Leopard laptop). Cheers, -g
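The "keep the socket open and let the application read as necessary" approach can be sketched as a generator. This is only an illustration under stated assumptions: `stream_body` and the `release` callback are hypothetical names, not part of riak-python-client, and `response` stands for any object with a `read(n)` method, such as the `HTTPResponse` returned by `http.client.HTTPConnection.getresponse()`.

```python
import io


def stream_body(response, release, chunk_size=64 * 1024):
    """Yield the body in chunks; call release() once reading is finished.

    `release` is a hypothetical callback that hands the connection back
    to whatever pool owns it.
    """
    try:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        # Runs on normal completion, on error, or when the caller closes
        # the generator before reaching the end of the body.
        release()


# Stand-in for a large Luwak response; a real caller would pass the
# HTTPResponse object instead of a BytesIO.
fake_response = io.BytesIO(b"x" * 10)
total = sum(len(chunk) for chunk in
            stream_body(fake_response, lambda: None, chunk_size=4))
```

Only one chunk is held in memory at a time. If the caller abandons the generator early, unread data is still pending on the socket, so a real `release` callback would probably need to close that connection rather than return it to the pool.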
more Python client questions
I've got a couple more questions: 1. Why is "pycurl" used in the http transport? What benefits does it offer? Based on reading the code, I see no additional functionality, so I don't understand why this complexity exists. I'd like to submit a changeset that just removes it, but would like to understand if there is a reason/history for using pycurl. 2. What is the policy around the API on the transports? As a concrete example, I'd like to remove the HOST and PORT parameters from RiakHttpTransport.http_request() (and remove them from the return values of build_rest_path()). The parameters are always self._host and self._port. Given the semantics of the transport, these would *never* change... even by third party code that might call into these functions. Would removing these parameters pose problems w.r.t API stability guarantees? Thanks, -g
Re: more Python client questions
Well... I'm looking at some work on transports/http.py and that host/port is passed around *as if* there is a possibility/reason to vary the values. But when you get down to it, .http_request can just use ._host and ._port, and we can eliminate these spurious parameters from thought. In particular, I'd like a transport that manages N persistent connections, so requests will be using a "connection" parameter rather than a host/port combination. This is why I ask about API guarantees. My assumption is that transports/transport.py defines the only API that must be preserved, and all other methods are open to change... but I wanted to ask for confirmation first. Cheers, -g On Mon, Sep 5, 2011 at 10:44, Jonathan Langevin wrote: > Re: 2. Not that I disagree with removing those values, but is there some > additional benefit that you're expecting by removing the static values? > (just curious)
Re: Python client: three issues
On Mon, Sep 5, 2011 at 16:14, Mark Phillips wrote: >... > I just wanted to jump in here and reiterate that patches, issues, and > opinions about how the python code works/should work are all > encouraged. Mathias Meyer and Jared Morrow (among others) have been > handling issues, pull requests, releases, etc., for the last few > months and should be able to do so moving forward. Between them, other > members of the Basho dev team, and a quickly-growing group of python > users, we should be able to handle pull requests, issues, and other > contributions at a pretty good clip. Thanks, Mark. I figured as much, but wanted to know what *kinds* of changes follow the existing roadmap (if any). If I can submit patches that improve things for my situation, *and* they are deemed applicable to the standard client... then we all win :-) Cheers, -g
incorrect link, or missing download
Hello, I went to download Riak for Mac OS, and started to follow the instructions on: http://wiki.basho.com/Installing-on-Mac-OS-X.html It says to download: http://downloads.basho.com/riak/riak-0.14/riak-0.14.2-osx-i386.tar.gz However, that resource does not exist. There is only 0.14.0: http://downloads.basho.com/riak/riak-0.14/riak-0.14.0-osx-i386.tar.gz Something needs to be fixed :-) Cheers, -g
Re: incorrect link, or missing download
Thanks, Jared. Not a problem. Since this is simple testing, I can work with 0.14.0. My production server (EC2/Ubuntu) is built from 0.14.2 source, so I have no serious concern here. That said, I'd advise fixing the link on that web page, to avoid future questions. Cheers, -g On Mon, Sep 5, 2011 at 22:37, Jared Morrow wrote: > Riak 0.14.2 was not built for OSX unfortunately. Before now, the OSX > builds were simply done by hand by doing a 'make rel' and packaging the > results of the 'rel/riak' directory. This manual step was not done for > 0.14.1 or 0.14.2. OSX packaging has been added to riak mainline now > though, so all future releases will be built for OSX. For now, you can > produce your own binaries in the way I mentioned above. > Sorry for the inconvenience and confusion. > -Jared
Review request: revamped http transport
Hi all, I've got a rough first cut of a modified RiakHttpTransport class. You can review it here: https://github.com/gstein/riak-python-client/commits/newhttp (note that it includes changes from my 'relax-deps' and 'simplify' branches; just look at the most recent commit(s))

This code has been tested only in very simple form. I haven't done an extensive series of tests: no multi-thread, multiple hosts, etc. I *have* verified that the HTTP connection is persistent, as expected, so it should be much faster than normal. But I don't have any performance tests to run. ... my hope is that some more experienced Riak Python developers can do some testing.

This is just a rough draft for initial review of the approach. It needs comments and documentation, if people like the direction.

Some comments on the ConnectionManager:
* when creating a ConnectionManager, you can specify multiple hosts (e.g., all the riak servers)
* in a single-threaded environment, only one connection will be opened right now. My next change will be to pre-open one connection to each host.
* when new connections are needed (i.e., in a multi-threaded environment), it round-robins across the set of servers
* higher-level logic can remove hosts if they go down (I still need to remove existing conns)
* similarly, hosts can be added as they are discovered or added to the ring

With some more work to pass along multiple hosts to the constructor of RiakHttpTransport, the full capability is reached, and RiakHttpReuseTransport and RiakHttpPoolTransport can be removed. Similar changes can be made in pbc.py to use a ConnectionManager for connections to the server(s), and RiakPbcCachedTransport can be removed.

Cheers, -g
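The ConnectionManager behaviour listed in this message (multiple hosts, round-robin for new connections, add/remove of hosts) could look roughly like the sketch below. It is not the code from the 'newhttp' branch: the class body and the method names `take`, `give_back`, `add_host`, and `remove_host` are assumptions made for illustration, written against Python 3's `http.client`.

```python
import http.client
import threading


class ConnectionManager:
    """Hand out persistent connections, round-robin across known hosts."""

    def __init__(self, hosts):
        self._lock = threading.Lock()
        self._hosts = list(hosts)   # [(host, port), ...]
        self._idle = []             # connections waiting to be reused
        self._next = 0              # round-robin position

    def add_host(self, host, port):
        with self._lock:
            if (host, port) not in self._hosts:
                self._hosts.append((host, port))

    def remove_host(self, host, port):
        """Forget a host and close any idle connections to it."""
        with self._lock:
            self._hosts = [h for h in self._hosts if h != (host, port)]
            keep = []
            for conn in self._idle:
                if (conn.host, conn.port) == (host, port):
                    conn.close()
                else:
                    keep.append(conn)
            self._idle = keep

    def take(self):
        """Return an idle connection, or a new one to the next host."""
        with self._lock:
            if self._idle:
                return self._idle.pop()
            if not self._hosts:
                raise RuntimeError("no hosts available")
            host, port = self._hosts[self._next % len(self._hosts)]
            self._next += 1
        # HTTPConnection connects lazily, on the first request.
        return http.client.HTTPConnection(host, port)

    def give_back(self, conn):
        """Return a connection for reuse, unless its host was removed."""
        with self._lock:
            if (conn.host, conn.port) in self._hosts:
                self._idle.append(conn)
                return
        conn.close()
```

In a single-threaded program this opens only one connection, because `take()` always finds the connection that was just given back; additional connections (and therefore the round-robin) only come into play when several threads hold connections at once.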
Re: Review request: revamped http transport
On Wed, Sep 7, 2011 at 20:42, Brett Hoerner wrote: > Greg, can you make a Github pull request from your branch? > > It'd be easier to review the commits together (well, git log is easy) > but more importantly comment in-line. Done: https://github.com/basho/riak-python-client/pull/56 Thanks! -g
Python transports
Hey, all... After a couple comments on my recent work, and some archaeology on the Python/Riak work over the past month... I've realized that I might have a very different view of RiakTransport compared to what I'm seeing in the current work. I figured it best to bring that to the forefront and discuss: In my view, RiakTransport is used by RiakClient (and others) to "talk to the Riak server". Some of the current work, and some proposed pull requests, seem to take the position that a RiakTransport is "one connection to a server, and the client should manage those". Needless to say, I'm in favor of my own position :-) ... I think it is best to transfer *all* responsibility for talking to the server(s) to the transport layer. I really don't think the client/bucket/object layers should know anything about talking to the server(s). I'd like to see the transport layer be told about all server(s) available, and then it Just Works. I'm still a newbie with all this code, and need to keep plugging away at the higher levels of functionality and compensation for problems. I'd like to build up some code that contacts *one* given server, asks for all of the ring servers, and then opens connections to those servers. And then, it should (automatically) maintain client connections based on what is happening with the Riak cluster. The current (proposed) code manages connections to N servers, but has no automatic add/remove based on changes in the cluster status. I think this happens at a layer *just* above the actual transport. ie. something tracks the changes in the ring status and its servers, and transmits those changes into the transport layer, which alters its communication with that cluster (regardless of whether that communication is via HTTP or protobuf). Okee doke. That's the end of my brain dump and future thoughts on the transport and communication layer. I'd really like some feedback, review, and thoughts. Thanks! 
-g
Re: Riak 1.0 pre-release series
On Wed, Sep 7, 2011 at 19:24, David Smith wrote: >... > The first stage is "pre-release" (aka PR) builds that represent the > feature-complete Riak 1.0, with minor cleanup and tweaks still > ongoing. You WILL find some rough edges with packaging and the > features -- we're still working through these and will do pre-releases > as often as necessary to get these issues resolved. We'll also be Understood. I just checked out the link (provided below) and did not find a *source* distribution. Could you work that into the distribution? In my particular situation, I'm working on Mac OS 10.5 (Leopard) and Ubuntu 9.x. I can manage building if given a source distro, but would certainly prefer pre-built packages (32-bit on both platforms). >... > We would love to get your feedback on these builds, good or bad. We're > trying this new process to do a better job of incorporating community > feedback BEFORE the release. :) hehe... helpful :-) ... I'm not at production load, so I won't be able to find server bugs for you. But I'll provide packaging feedback where I can (as above). Most of my feedback will be on the client side, and I'm already working to engage Basho devs on the riak-python-client project. >... > creating duplicate issues. Also note that the docs on wiki.basho.com > may still reflect 0.14.2; we'll be addressing those over next few > weeks. I had a one-line change for one of those pages last week, and sent that to this mailing list (and it was promptly fixed; yay!). What is the recommended "best" process for feedback? Should we all fork the website, make changes, and send pull requests? Send mail here? File issues? >... Cheers, -g
Re: Riak 1.0 pre-release series
On Thu, Sep 8, 2011 at 07:47, David Smith wrote: > On Thu, Sep 8, 2011 at 5:41 AM, Greg Stein wrote: > >> I just checked out the link (provided below) and did not find a >> *source* distribution. Could you work that into the distribution? > > Yes, there should be one of those -- will ping Jared. Thanks! >> In my particular situation, I'm working on Mac OS 10.5 (Leopard) and >> Ubuntu 9.x. I can manage building if given a source distro, but would >> certainly prefer pre-built packages (32-bit on both platforms). > > There is a i386 .deb which may work on ubuntu9; would be curious to > know if it does. :) I'm away [on vacation] from my ubuntu systems right now, but will try it late next week and provide feedback. >... >> I had a one-line change for one of those pages last week, and sent >> that to this mailing list (and it was promptly fixed; yay!). What is >> the recommended "best" process for feedback? Should we all fork the >> website, make changes, and send pull requests? Send mail here? File > > Pull requests are easiest from a procedural standpoint. Will do. Cheers, -g
Re: Python transports
Yup, agreed. I can certainly see use-cases where a client may have particular knowledge about data/server locality, and wants to open connections to *just* those servers. The transports that I am advocating open connections to servers described by one layer higher. That layer could say "all Riak servers in the cluster", or it could say "these three servers holding copies of data". An auto-discovery layer (to add and remove) in-between the client and the transport would be handy. I can envision that, and my proposed code makes it possible, but (at my current stage) I don't have a use for it. I'd hope that others can build on my connection pooling work to see that through. Cheers, -g On Thu, Sep 8, 2011 at 07:49, Phil Stanhope wrote: > I like the idea of RiakTransport as you describe it. It opens the door to > other potential underlying transports and isolating a client from knowledge > of those (websockets, tcp, zeromq messaging come to mind). I'm not > suggesting that RIAK will ever support these transports by this comment, > however. > I also agree with the need to have RiakRingAwareTransport as an additional > layer that *might* be used by a client. There may be valid reasons why a > client might want to force particular traffic onto a subset of the ring > (e.g. M/R config, Search Config, forcing read/write traffic onto different > nodes, etc). Again, I'm not suggesting that using a subset of the ring for > particular operations is the best practice. But it may be necessary to do so > in order to validate and do certain types of testing to prove or disprove > certain access patterns. > -phil
Re: Riak 1.0 pre-release series
No worries. That is what "pre-release" is all about :-) ... I'm focusing on the Python support right now, and 0.14.x works just fine for that, but I'll look into the 1.0-pre in the next week or two. Thanks! -g On Thu, Sep 8, 2011 at 13:02, Jared Morrow wrote: > Sorry about the lack of a source package, that was my oversight. I just > uploaded one to downloads.basho.com. > Thanks for the feedback, > -Jared
Re: Python transports
On my 'newhttp' branch, the ConnectionManager class handles all connections to the server(s) for the transports (many connections to many servers). The transport can worry about what goes onto the wire, and the CM worries about the underlying connections. Today, there are multiple transports, each attempting to manage connections in various ways, and frankly... doing a poor job of it. The HTTP transport opens a new connection for every request. To borrow a phrase from Scott Conant on 'Chopped', that "drives me to anger" :-) So yes, my changes are a big improvement in that we now have *persistent* connections to the server, with the commensurate performance improvement. The basic problem is that "RiakTransport" conflates wire formatting *and* connection management. I'm trying to separate those two, in order to improve the connections. Once they are separated, then connection policies (as you describe) will be much easier to implement. Your timeout work is great, and I completely agree with you: timeouts are a "must have" in a production environment. I don't see any problem adding timeouts; my comment on your github commit was more about 2.5 compatibility, than denial. And I consider it my problem to figure out how to make it work in a 2.5 environment. Cheers, -g On Thu, Sep 8, 2011 at 13:47, Brett Hoerner wrote: > I own the pull request that adds (very basic) pooling logic to the > Riak client. I left Transports alone (each is a single connection) and > decided to have pools be another class you pick just like transport. > This allows you instantly use any pooling logic (never remove down > servers, delete down servers, round robin, whatever) with *any* > transport class: Http, Http keep-alive, PBC, CachedPBC, etc. > > I've added pooling and timeouts and use both in production because I'm > honestly not sure how you use Riak in a highly available way without > them... so we need to make sure they both work well/easily under this > new scheme. 
> > Do your changes make anything new possible or are they just about cleanup?
Re: Python transports
Hi all, I've been putting more thought into this problem, particularly in contrast to the "client manages N transport [connections]" approach. I believe that the latter is not very workable given the variance in underlying transports. Not enough information is available to the client to manage the connections properly without putting transport-specific information into the client (e.g., the difference between EPIPE and ECONNREFUSED). I think it would be wrong to put connection-related code into the client. Here are the three layers that I see, and the direction my branch takes: 1. client: responsible for a high-level API for applications. It maps this API into the underlying transport primitives (as defined by riak.transports.transport.RiakTransport). 2. transport: maps the primitives into the appropriately formatted wire request(s), and handles the response(s). 3. connection manager (CM): handles multiple connections to multiple hosts for use by the transport. The CM creates connection objects that understand the protocol (e.g., HTTPConnection), which is also an object that the transport understands how to use. The connection objects can signal errors to the CM for removal when (say) the server goes down or is otherwise unavailable. The client does need to be aware of host/port pairs, and pass those into the transport for provision to the CM (or possibly the client creates the appropriate CM, and passes that to the transport). The different types of CMs (connection type, connection policies/params, etc.) imply that the client may be the one to create this, with the right params. Otherwise, the transport would get a list of host/port pairs and CM options, and the transport would create the CM with connection objects appropriate to the transport. To follow up with my original concern, and to show a concrete example: In connection.py, we would create a subclass of HTTPConnection that overrides the .connect() method.
If the superclass raises ECONNREFUSED, then the subclass would remove the host from the CM. The subclass does not have to manage EPIPE since it knows that HTTPConnection can already manage that itself (except for certain types of variant-sized requests, such as needs to be done for Luwak requests). There is a Socket class in connection.py for managing bare sockets for the protobuf connections. That needs to create a .send() method that manages EPIPE in some way. It would also have logic for ECONNREFUSED similar to our HTTPConnection subclass: remove the host from the available set. Long-running client applications need to monitor the state of the ring, and propagate join/leave changes into the available host/port pairs in the CM. If one server returns ECONNREFUSED and is removed from the available set, but it is determined that the failure is *transient*, then the client would need to recognize that and put it back into the set. I do not have an answer for how the system can know the problem was transient, or how it recognizes the server is back. Possibly, the host moves to an "offline" list, and the CM periodically pings it to see if it is alive (again). If the client removes it (due to a detected ring change), then it removes it from the offline list. Possibly after time period T, it is removed from the offline list. I believe these are all workable details. Cheers, -g
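The HTTPConnection subclass described in this message might be sketched as follows, using Python 3's `http.client`, where ECONNREFUSED surfaces as `ConnectionRefusedError`. Both class names and the `mark_offline` method are hypothetical; `HostSet` is only a stand-in for the part of the CM that tracks available and offline hosts, and the periodic ping and time period T are left out.

```python
import http.client


class HostSet:
    """Hypothetical stand-in for the CM's bookkeeping of hosts."""

    def __init__(self, hosts):
        self.available = set(hosts)   # hosts that new connections may use
        self.offline = set()          # hosts waiting to be re-checked

    def mark_offline(self, host, port):
        self.available.discard((host, port))
        self.offline.add((host, port))


class ManagedHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that reports refused connections to its manager."""

    def __init__(self, host, port, manager, **kwargs):
        super().__init__(host, port, **kwargs)
        self._manager = manager

    def connect(self):
        try:
            super().connect()
        except ConnectionRefusedError:
            # errno ECONNREFUSED: nothing is listening on this host/port,
            # so take the host out of rotation and let the caller retry
            # on a different connection.
            self._manager.mark_offline(self.host, self.port)
            raise
```

Re-raising keeps the policy decision (retry elsewhere, give up, back off) with the caller, while the bookkeeping stays inside the connection layer and out of the client.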
Question about Content-Type
Hey all,

I just ran into a questionable situation. On the following doc page:

http://wiki.basho.com/HTTP-Store-Object.html

It says:

"
Important headers:
Content-Type must be set for the stored object. Set what you expect to receive back when next requesting it.
"

So I ran the following test via telnet against a 0.14.0 server. Note the Content-Length is 5 given invisible characters: "def\r\n".

$ telnet 127.0.0.1 8098
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
POST /riak/test?returnbody=true HTTP/1.0
Content-Length: 5
Content-Type: application/octet-stream

def
HTTP/1.0 201 Created
X-Riak-Vclock: a85hYGBgzGDKBVIsbI0n/DOYEhnzWBmKDz84zpcFAA==
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.7.3 (participate in the frantic)
Location: /riak/test/F9uE8Heo6v0wtXY2rMKOktQbPGH
Link: ; rel="up"
Date: Sun, 11 Sep 2011 05:30:27 GMT
Content-Type: application/json
Content-Length: 5

def
Connection closed by foreign host.

My question is why the response says application/json when I gave the server application/octet-stream. Did I miss some RTFM documentation?

Thanks,
-g
[RFC] Python client: move to properties
Hi all, There are some non-Pythonic patterns in riak-python-client that should be pretty easy to switch. Things like client.get_r() and client.set_r() are kinda silly. Python long ago moved past the getter/setter paradigm, with the notion of directly exposing instance attributes. As Guido has said numerous times, "we're all adults here", so we don't need to wrap simple attribute access/change in methods which do exactly that anyways. Just allow apps to directly change it, since the developers doing that *are* adults and know what they're doing. For something like bucket.get_r()... first: it should not have a damned argument that is returned if you pass in a value. That is nonsense. Don't call the function if you're going to pass an argument! The remaining logic looks at a local value, or defers to the client default value. We can make "bucket.r" a property, and create a getter that does the right thing. Applications can then use "bucket.r" and they will get the right logic, rather than the messier "bucket.get_r()". There are similar changes throughout (eg. get_transport). But... this goes back to a question that I've raised earlier: what kinds of API guarantees are being made on the interface? Can we simply get rid of the .get_r() method? If we choose to do so, is there a deprecation policy? With a policy in place, then it would be easier for me (and others) to provide patches, knowing that they conform to compatibility requirements. As it stands, I'd be happy to code a patch, but am wary that my effort would be rejected per some (unstated) policy. I don't know how much of a compatibility policy lies with Basho or the community. Dunno how to help there. And back to the start: can we get the code simplified, and move towards properties? Cheers, -g ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
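A sketch of the "bucket.r" property proposed in the RFC above, for illustration only. The Client and Bucket classes here are stand-ins, not the real riak-python-client classes, and the _r attribute name is an assumption. The @r.setter form needs Python 2.6 or later.

```python
class Client(object):
    """Stand-in client: 'r' is a plain attribute, no getter/setter."""

    def __init__(self, r=2):
        self.r = r


class Bucket(object):
    """Stand-in bucket: 'r' is a property that defers to the client."""

    def __init__(self, client, name):
        self.client = client
        self.name = name
        self._r = None          # None means "use the client default"

    @property
    def r(self):
        # Local value wins; otherwise fall back to the client's default.
        if self._r is not None:
            return self._r
        return self.client.r

    @r.setter
    def r(self, value):
        self._r = value


client = Client(r=2)
bucket = Bucket(client, "test")
print(bucket.r)     # client default
bucket.r = 1
print(bucket.r)     # bucket-local override
```

Applications get the fallback logic of get_r() through ordinary attribute access, and there is no argument to pass in just to get it handed back.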
Re: [RFC] Python client: move to properties
On Wed, Sep 14, 2011 at 10:40, Russell Brown wrote: > > On 14 Sep 2011, at 12:37, Mathias Meyer wrote: > >> The short answer: yes, we can and we should. I had that on my radar for a >> while too, because it felt un-Pythonic. >> >> As for deprecation, there's no specific rule for the Python client yet. I'm >> happy to accept a patch for it for e.g. a version of the client 1.4.0 with >> an announcement that support for these getters/setters will be removed in >> said version. I'm not a fan of removing things in patch versions myself, but >> that's certainly up for discussion. > > I'd really like to see the python client be more pythonic. That said, I'd > really like current production users to keep running their code, and take > advantage of new features without recoding their use of existing features. > For the Java client I've deprecated a lot of the API, but it will stay on for > a couple of releases. No-one likes living with warts on software but > deprecating is a good way to show that the un-idiomatic stuff is on the way > out, without pulling the rug from under users. > >> The Python client is quite old and has come a long way, so I'm all for >> getting rid of the cruft that came with it. >> > > Me too, but after it has been deprecated for at least one release? That isn't > an official policy, it is just the one I have set for the Java client. > Deprecate for one release, remove in the next. Does that sound reasonable for > the Python client, too?

For at least three developers :-P ... there seems to be rough consensus here:

* Documented APIs are first marked as "deprecated" in documentation/docstrings, and will produce appropriate warnings from the code (but remain available and with their old semantics)

* One release is made with the deprecated APIs still in place; the next release afterwards may remove all deprecated APIs

I've switched to this policy for my 'newhttp' branch. You'll note that I have added an 'api' classvar to the transports.
The client code alters its invocations based on that API. Also note that the transport APIs are *not* documented, so I have taken the position of "okay to change". Third parties who *write* their own transport class and pass to the "documented" transport_class argument of RiakClient/RiakSearch may need to change their code. If their transport inherits from RiakHttpTransport or RiakPbcTransport, then it will "suddenly" jump from api==1 to api==2. If they wrote their transport from scratch (and/or subclassing RiakTransport), then it will remain as api==1 and function as before. I don't have a better solution (without applying further thought), but am going to say "good enough" with the position that RiakTransport and its subclasses are not documented and (thus) subject to change. Applications that simply pass a builtin transport will require no change. For all those getters/setters, we can mark them as deprecated (using [1]) and then simply add the new properties. Cheers, -g [1] https://github.com/gstein/riak-python-client/commit/85e9d5460787d2ad6b3e07b106f8cd71e05d ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
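For readers who want to see what "mark as deprecated, keep the old semantics, emit a warning" can look like, here is a generic sketch using the standard warnings module. It is not the helper from the commit referenced as [1] above; the deprecated() decorator and the ExampleBucket class are hypothetical names for illustration.

```python
import functools
import warnings


def deprecated(replacement):
    """Wrap a callable so each call emits a DeprecationWarning."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                "%s() is deprecated; use %s instead"
                % (func.__name__, replacement),
                DeprecationWarning,
                stacklevel=2,      # point at the caller, not the wrapper
            )
            return func(*args, **kwargs)
        return wrapper
    return decorate


class ExampleBucket(object):
    def __init__(self, r=2):
        self.r = r                 # new style: plain attribute

    @deprecated("the 'r' attribute")
    def get_r(self):
        # Old getter stays available for one release, unchanged in behavior.
        return self.r
```

Note that Python hides DeprecationWarning by default in many configurations, so applications may need "python -W default" (or a warnings filter) to actually see the message.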
Re: Anyone going to Surge?
I'll be there Wednesday/Thursday. Not attending the conference itself, but "around" (hanging in the hallway or the bar). On Sun, Sep 25, 2011 at 20:00, Mark Phillips wrote: > I mentioned this in Friday's Recap but I figured it was worth asking > again: anyone going to Surge [1] next week? I'll be there along with > Sean Cribbs, Ian Plosker, Ryan Zezeski, and a few other members of the > Basho team. If you're attending and haven't already notified me, let > me know (or I'll have trouble buying you a beverage or two). > > Mark > > 1 - http://omniti.com/surge/2011 > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Running the python client's test suite
Your search/luwak tests are failing, presumably because those options are not enabled in your Riak installation. You can disable them in the test suite by doing: $ SKIP_SEARCH=1 SKIP_LUWAK=1 python setup.py test You also seem to be running into a problem with leftover keys in one of the test buckets. That problem is fixed by pull request #67, which has not been merged :-( ... if you grab that fix, then a couple of your errors will go away. I see some additional failures based on extra siblings... I don't know what is happening there. Cheers, -g On Mon, Oct 3, 2011 at 13:22, Honza Král wrote: > Hi everybody, > > I cannot get the test suite in the python client to run, I have tried > on Arch Linux on my notebook and then on an Ubunty Natty system on EC2 > with riak 1.0.0: > > wget http://downloads.basho.com/riak/riak-1.0.0/riak_1.0.0-1_amd64.deb > sudo dpkg -i riak_1.0.0-1_amd64.deb > sudo /etc/init.d/riak start > > sudo apt-get install python-protobuf > git clone git://github.com/basho/riak-python-client.git > > virtualenv riak > . riak/bin/activate > cd riak-python-client > python setup.py install > python setup.py test > > and I get: > FAILED (failures=6, errors=19) > > The full test output can be found at: > http://www.honzakral.com/riak_python_test.out > > If anybody can point out what I did wrong or how to make the tests > work I would be most grateful. > > Thanks! > > Honza Král > E-Mail: honza.k...@gmail.com > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?
I'm with Kyle on this one. Even better, my 'newhttp' branch on Github enables this kind of multiple-connection and automatic fail-over. That branch does have a basic sketch for automatic addition/removal of Riak nodes as you manipulate your cluster. I'll need it one day, but not "now", so I haven't finished it yet (the monitor.py background thread). Regarding security: it is the same for option A and B and C (you're just shifting stuff around, but it is pretty much all the same). Put your webservers in one security group, and the Riak nodes in another. Open the Riak ports *only* to the webserver security group and to each other. Avoiding two services on one machine (e.g web + riak) is also much easier to manage/maintain. Just have web machines and riak machines. Cheers, -g On Tue, Oct 4, 2011 at 17:09, Aphyr wrote: > Option C: Deploy your web servers with a list of hosts to connect to. Have > the clients fail over when a riak node goes down. Lower latency without > sacrificing availability. If you're using protobufs, this may not be as big > of an issue. > > --Kyle > > On 10/04/2011 02:04 PM, O'Brien-Strain, Eamonn wrote: >> >> I am contemplating two different architectures for deploying Riak nodes >> and web servers. >> >> Option A: Riak nodes are in their own cluster of dedicated machines >> behind a load balancer. Web servers talk to the Riak nodes via the load >> balancer. (See diagram http://eamonn.org/i/riak-arch-A.png ) >> >> Option B: Each web server machine also has a Riak node, and there are also >> some Riak-only machines. Each web server only talks to its own localhost >> Riak node. (See diagram http://eamonn.org/i/riak-arch-B.png ) >> >> >> All machines will deployed as elastic cloud instances. I will want to >> spin up and spin down instances, particularly the web servers, as demand >> varies. Both load balancers are non-sticky. Web servers are currently >> talking to Riak via HTTP (though might change that to protocol buffers in >> the future). 
Currently Riak is configured with the default options. >> >> Here is my thinking of the comparative advantages: >> >> Option A: >> >> - Better for security, because can lock down the Riak load balancer to >> only open a single port and only for connections from the web servers. >> - Less churn for Riak of nodes entering and leaving the Riak cluster (as >> web servers spin up and down) >> - More flexibility in scaling storage and web tiers independently of each >> other >> >> Option B: >> >> - Faster localhost connection from web server to Riak >> >> I think availability is similar for the two options. >> >> The web server response time is the primary metric I want to optimize. >> Most web server requests will cause several requests to Riak. >> >> What other factors should I take into account? What measurements could I >> make to help me decide between the architectures? Are there other >> architectures I should consider? Should I add memcached? Does anyone have >> any experiences they could share in deploying such systems? >> >> Thanks. >> __ >> Eamonn >> >> ___ >> riak-users mailing list >> riak-users@lists.basho.com >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com >> > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?
On Oct 4, 2011 7:04 PM, "Mike Oxford" wrote: > > On Tue, Oct 4, 2011 at 3:59 PM, Greg Stein wrote: > > Regarding security: it is the same for option A and B and C (you're > > just shifting stuff around, but it is pretty much all the same). Put > > your webservers in one security group, and the Riak nodes in another. > > Open the Riak ports *only* to the webserver security group and to each > > other. > > Not quite the same. If you get rooted on a webhead you don't want your > data there (esp with an erl shell.) Ah. Yeah. Quite true. > > Avoiding two services on one machine (e.g web + riak) is also much > > easier to manage/maintain. Just have web machines and riak machines. > > I disagree; it's more work to maintain two machines correctly. However > the extra work is worth it for security/scalability. Note that his original description had two machine types: web+riak, and riak-only. My point was about that two service box being a pain. Given that you have two types, then break up the boxes into the two -only formats and increase your security. Cheers, -g ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?
On Oct 4, 2011 7:01 PM, "Mike Oxford" wrote: > > You'll want to run protobufs if you're looking to optimize your > response time; HTTP sockets (even to localhost) will require much more > overhead and time. Hmm? The protocol seems moot, compared to inter-node comms when r > 1. Protocol parsing just doesn't seem like much of a factor. On my laptop, I was seeing a 3ms response time against one node. I can't imagine that parsing was more than a few percent, no matter the protocol. (and no, I have no specific numbers to confirm/deny my thought experiment here) > Even better would be unix sockets if they're available, and you can > bypass the whole TCP stack. What? Is that even an option for Riak? I haven't seen anything about that. >... Cheers, -g ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?
I don't see that multiplexing or TCP setup is specific to HTTP. The only difference between protobuf and HTTP is what goes on the wire. Not how the wire is managed. (and with that said, the Python client managed the wire in the most horrible ways imaginable for the HTTP Client; I've since fixed that on my branch) On Oct 4, 2011 11:37 PM, "Aphyr" wrote: > Internode times in our datacenter at SL are indistinguishible from > loopback; TCP/IP processing dominates. HTTP, on the other hand, involves > either in-depth connection management/multiplexing, or TCP/IP > setup/teardown latency at either end of a request. In read-write heavy > apps, protobufs outperforms HTTP in throughput by 2x or more, against > objects of 500-4000 bytes. That's with the ruby client; ymmv. > > --Kyle > > On 10/04/2011 07:18 PM, Greg Stein wrote: >> >> On Oct 4, 2011 7:01 PM, "Mike Oxford" > <mailto:moxf...@gmail.com>> wrote: >> > >> > You'll want to run protobufs if you're looking to optimize your >> > response time; HTTP sockets (even to localhost) will require much more >> > overhead and time. >> >> Hmm? The protocol seems moot, compared to inter-node comms when r > 1 >> Protocol parsing just doesn't seem like much of a factor. On my laptop, >> I was seeing a 3ms response time against one node. I can't imagine that >> parsing was more than a few percent, no matter the protocol. >> >> (and no, I have no specific numbers to confirm/deny my thought >> experiment here) >> >> > Even better would be unix sockets if they're available, and you can >> > bypass the whole TCP stack. >> >> What? Is that even an option for Riak? I haven't seen anything about that. >> >> >... >> >> Cheers, >> -g >> >> >> >> ___ >> riak-users mailing list >> riak-users@lists.basho.com >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Hey, Python client users
Hey all, The Basho folks have been slow to integrate changes, given their busy schedule with the 1.0 release. I've had a couple branches hanging out for a while to deal with HTTP problems and to deal with Issue #53. They've been separate for better review/merging by Basho, but it finally created too many problems for my own work to keep them separate. I've just now merged them into a single branch so that I can get my own work done. This doesn't have timeout capabilities (Brett has this on a branch, but without future direction from Basho and how that timeout work interacts with future http work, it is unclear on where to go with developing timeouts). But if you can deal without timeouts right now, then I would state this branch is the best option for Python access to Riak. Since this is also divorced from my pull requests for issue 53 and fixing http, I might go ahead and add timeouts soon. Even if the timeout work isn't Right, it would still be useful. Anyhow, enough description. If you're using Python, then I'd highly recommend: https://github.com/gstein/riak-python-client/tree/proper I hope that helps, and let me know if you run into any problems with it. Cheers, -g ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Hey, Python client users
On Mon, Oct 17, 2011 at 15:13, Russell Brown wrote: > > On 16 Oct 2011, at 06:18, Greg Stein wrote: > >> Hey all, >> >> The Basho folks have been slow to integrate changes, given their busy >> schedule with the 1.0 release. > > This is true. Sorry. But we are now putting some time into the client > libraries in general and the python client in particular. No need to apologize! I'd much rather you work on the 1.0 release than the libraries! I can fix the libraries to my needs; I have *no* idea how to fix Riak itself. Go wild :-) >> I've had a couple branches hanging out >> for a while to deal with HTTP problems and to deal with Issue #53. >> They've been separate for better review/merging by Basho, but it >> finally created too many problems for my own work to keep them >> separate. I've just now merged them into a single branch so that I can >> get my own work done. > > Yeah, I started merging your P/Rs in a while back, but then the last 2 > conflict with a couple of P/Rs from Brett Hoerner. I guess at the point we > stalled as there is some decision to be made about the best way to handle > pooling. Right. Timeouts are needed; Brett is spot-on with that. They're definitely needed for a production system. The open question is how to work them into the client, and that depends upon the development direction. > I'm reading over both Brett and your changes today. > > There is quite some work to merge the commits you've been doing into the > official repo, but I'd like to get that done rather than have a competing > fork. To be clear: I'm not attempting to create a long-term competing fork. I've been using my 'newhttp' branch for proper connections to Riak, but then I smashed into the lack of .store() returning an identifier. In the past, I kinda switched branches based on whether I needed good http work, or .store to work... but I got tired of that. So I created a new branch and merged it all. And I like to share my work, I like to help people, and I like feedback.
If my fully-merged branch can help people? Great. Win. But no... I expect that when you guys (Basho) have cycles, we'll figure out the right approach for connection management and get that committed, and my branches will disappear. >... >> Anyhow, enough description. If you're using Python, then I'd highly >> recommend: >> https://github.com/gstein/riak-python-client/tree/proper >> >> I hope that helps, and let me know if you run into any problems with it. > > Would you recommend taking this fork and merging it with the > basho/riak-python-client? I'm building a business based on that branch. So yeah: I have complete faith in it. I fully believe it is the correct direction to go for the Python client. But: I also sent the P/R to create a discussion on the approach (since Brett has a different approach, I felt discussion was warranted). I think it is the right approach, but I did not want to invest time in full documentation if it was to be rejected. So a merge would be great, but further documentation would be needed before the next client release. There will be some test updates since my code simplifies the set of transports. There is some deprecation of the old APIs that needs to happen, and I set up the prep-work for that, but didn't bother with all of that since I was waiting on selection. The short answer: my branch puts connection management into the transport, and Brett's patches put that into the client. A decision needs to be made, and that will determine future work. I've previously emailed with an explanation for why I believe the transport class should handle this (via a connection manager), and why (IMO) the client should not be worrying about it. Making this decision removes blocks for future changes (e.g. timeouts). Anyway... I'd merge 'proper' into the upstream master branch (to fix http and issue 53), and then I'd be on the hook for documentation, test changes, backwards compat work, etc. I believe it is totally the right direction.
Cheers, -g
Re: Hey, Python client users
On Tue, Oct 18, 2011 at 10:53, Soren Hansen wrote: > 2011/10/16 Greg Stein : >> Anyhow, enough description. If you're using Python, then I'd highly >> recommend: >> https://github.com/gstein/riak-python-client/tree/proper >> >> I hope that helps, and let me know if you run into any problems with it. > > Thanks for this. Your changes look great. > > I take it you're not using Python 2.7? I'm using Python 2.7 in production, and 2.5 on my dev machine (Mac OS, Leopard). I've recently installed 2.7 on my dev box for its builtin 'ssl' module, and so more of my dev work is moving to 2.7. I need an HTTPSConnection subclass to be used by the Riak client, which should be possible by subclassing some transport stuff; I'll be validating that work shortly. > "The unicode problem"[1][2] > makes it rather unusable for me with Python 2.7. :( (Not due to your > changes, but the Riak Python client in general) > > [1]: https://issues.basho.com/show_bug.cgi?id=649 > [2]: https://github.com/basho/riak-python-client/issues/32 What is it about 2.7 that makes the problem worse for you? Are you using Unicode keys or bucket names? Cheers, -g
Re: Hey, Python client users
On Oct 18, 2011 11:12 AM, "Greg Stein" wrote: >... > The short answer: my branch puts connection management into the > transport, and Brett's patches puts that into the client. A decision > needs to be made, and that will determine future work. I've previously > emailed with an explanation for why I believe the transport class > should handle this (via a connection manager), and why (IMO) the > client should not be worrying about it. Making this decision removes > blocks for future changes (eg. timeouts). Thoughts, Russell? Cheers, -g ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Hey, Python client users
On Oct 20, 2011 7:44 AM, "Russell Brown" wrote: >... > I had a chat with Reid Draper about it (he is much more pythonic than me), he concurs. I just need to do a bit more reading through the code, and unless anyone here objects (anyone? class? anyone?), I'll start work at merging your fork into the Basho repo so we can get back on track. I hope it is no work; I think that it is close to basho/master, and can merge clean. I can't check for a while, as I'm on planes for the next 22 hours :-( After you merge, I'll start three branches: one for doc/test fixes, another for deprecating bogus transports, and one to work with Brett on timeouts. I'd like his input on that part, as we both want timeout handling for our production systems. Thanks! -g ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: key garbage collection
On Thu, Nov 3, 2011 at 02:39, Justin Karneges wrote: >... > Say you have an operation that requires creating two keys, A and B, and you > succeed in creating A but fail in creating B. How do you delete A after the > fact? I have two ideas: > > 1) Run periodic MapReduce operations that do full db scans looking for garbage > keys and deleting them (this seems really horrible, but I'll admit I'm new to > distributed DBs and MapReduce). I believe that you will *always* need to do this. Without transactions, you can always end up with cruft. Best you can do is minimize how often you need to run the scavenge process. > 2) Maintain cleanup logs that explicitly identify possibly offending keys, for > optimized cleanup processing. These logs need to be stored *somewhere*, but that storage could also fail. That is why I believe you'll need a periodic full scan for garbage. (and note this applies whether "storage" is memory, disk, Riak, or whatever else) >... > So far so good. Now for handling cleanup. Periodically, we scan the > "cleanup" bucket for keys to process. Since keys only exist in this bucket at > the moment of a write (they are deleted immediately afterwards), in practice > there should hardly be any keys in here at any single point in time. We're > talking single digits here. Much better than a full db scan to find garbage > keys. Also, the keys to process can be narrowed down by time (e.g. > 5 > minutes ago) based on the key name. This will minimize your scans, but not eliminate them. You may not be able to write to the "cleanup" bucket because you've lost all network connectivity to the Riak cluster. Not a bad assumption, given that you could not write out B (what makes you think you could write to "cleanup"?). Personally, rather than attempting to write something else to a failing Riak cluster, I'd suggest keeping these keys in memory along with a background thread that periodically attempts to clean them up. 
You're gonna lose the keys if the client dies, but hey... as I said: best you can do is to minimize the full scans. >... Cheers, -g ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
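The in-memory approach suggested above (remember the orphaned keys, retry deleting them from a background thread) could be sketched roughly as follows. Everything here is hypothetical: PendingCleanup is not part of any Riak client, and delete_key is an assumed callable that deletes one key and raises on failure. As noted in the reply, the pending set is lost if the process dies, so this only reduces how often a full scan is needed.

```python
import threading


class PendingCleanup(object):
    """Hypothetical sketch: remember orphaned keys and retry deleting them."""

    def __init__(self, delete_key):
        self._delete_key = delete_key    # callable(bucket, key); raises on failure
        self._pending = set()
        self._lock = threading.Lock()

    def add(self, bucket, key):
        """Record a key (e.g. A) whose partner write (e.g. B) failed."""
        with self._lock:
            self._pending.add((bucket, key))

    def sweep(self):
        """Try each pending delete once; return how many are still pending."""
        with self._lock:
            items = list(self._pending)
        for item in items:
            try:
                self._delete_key(*item)
            except Exception:
                continue                 # still failing; retry on the next sweep
            with self._lock:
                self._pending.discard(item)
        with self._lock:
            return len(self._pending)

    def run_forever(self, interval, stop_event):
        """Body for a background thread; stops when stop_event is set."""
        while not stop_event.wait(interval):
            self.sweep()
```

A long-running application would start run_forever() in a daemon thread with a threading.Event it can set at shutdown.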
Re: Problem installing Riak Python client
On Thu, Nov 10, 2011 at 11:51, Nate Lawson wrote: >... > BTW, are there any plans for the Riak python client to use the protobuf C > library directly via ctypes? The pure python implementation of protobuf seems > a little slow. Not that I've seen. I plan to use the HTTP interface because I can encrypt it, and I can avoid MitM attacks. That isn't possible with the protobuf interface. I think you'll need to find somebody that deploys heavy use of the protobuf interface to be interested enough to improve its speed. For the basic problem spawning this thread: I've issued a pull request for my "deprecate" branch which disables the protobuf requirement. You'll be able to install the client without needing protobuf to be installed. Of course, you'll need protobuf if you want to use that transport... but if you stick to HTTP, then you'll be fine. Note that I've sped up the Python HTTP transport. It is definitely faster (via persistent connections), but I haven't done a comparison against protobufs yet. Basho has a benchmarking tool that I might try. Cheers, -g ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Problem installing Riak Python client
On Nov 10, 2011 4:26 PM, "Nate Lawson" wrote: > > > On Nov 10, 2011, at 4:04 PM, Greg Stein wrote: > > > On Thu, Nov 10, 2011 at 11:51, Nate Lawson wrote: > >> ... > >> BTW, are there any plans for the Riak python client to use the protobuf C library directly via ctypes? The pure python implementation of protobuf seems a little slow. > > > > Not that I've seen. I plan to use the HTTP interface because I can > > encrypt it, and I can avoid MitM attacks. That isn't possible with the > > protobuf interface. I think you'll need to find somebody that deploys > > heavy use of the protobuf interface to be interested enough to improve > > its speed. > > There should be an SSL option for Riak with protobufs, perhaps on an alternate port. No reason to go to http just to get SSL. Certainly, but I haven't heard anyone thinking of that either. Unless/until somebody codes that up, then I'm sticking to HTTP(S). > > Note that I've sped up the Python HTTP transport. It is definitely > > faster (via persistent connections), but I haven't done a comparison > > against protobufs yet. Basho has a benchmarking tool that I might try. > > I wonder if gzip encoding would also help for larger keys/values? It absolutely would. It is generally faster to compress/decompress than the time spent to transfer the extra bytes on the wire. I dunno Erlang, but I could certainly fix the Python client to deal with potential compression. Cheers, -g ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
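As a rough way to gauge what gzip would buy for a given value, the sketch below compresses a sample JSON payload with the standard gzip module and compares sizes. The sample data and the helper names gzip_bytes()/gunzip_bytes() are made up for illustration, and nothing in this thread confirms whether the Riak server accepts compressed request bodies; this only shows the client-side half.

```python
import gzip
import io
import json


def gzip_bytes(data):
    """Return 'data' (a byte string) compressed in gzip format."""
    buf = io.BytesIO()
    f = gzip.GzipFile(fileobj=buf, mode="wb")
    try:
        f.write(data)
    finally:
        f.close()                  # close() flushes the gzip trailer into buf
    return buf.getvalue()


def gunzip_bytes(data):
    """Inverse of gzip_bytes()."""
    f = gzip.GzipFile(fileobj=io.BytesIO(data), mode="rb")
    try:
        return f.read()
    finally:
        f.close()


# Made-up, fairly repetitive sample value.
records = [{"user": "user%d" % i, "active": True} for i in range(200)]
value = json.dumps(records).encode("utf-8")
packed = gzip_bytes(value)
print("%d bytes raw, %d bytes gzipped" % (len(value), len(packed)))
```

Small or already-compressed values may not shrink at all, so a client would probably want a size threshold before bothering.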
Re: Luwak PUT Content-Range
On Nov 29, 2011 5:08 AM, "John Axel Eriksson" wrote: > > Is it possible to incrementally add to a file in Luwak using PUT and the Content-Range header. I just assumed that it was but I can't seem to > get the expected results, it just overwrites whatever the key contents were before. The reason I want to do this is because we have some pretty > large files I don't want to load fully into memory before PUTing them. Hmm? You can read from a file and write to the http socket. There is no reason or need to load the entire contents into memory. I don't know what client you're using, but I do know the Python client is broken in this regard. It erroneously loads the full content into memory. But there is nothing from the Riak server that demands such an approach. Cheers, -g ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
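The "read from a file and write to the socket" approach mentioned above can be sketched as a simple chunked copy. The function names iter_chunks() and stream_to() and the 64 KB chunk size are assumptions for illustration; 'send' is any callable that accepts a byte string, for example the send() method of an httplib HTTPConnection after the request line and headers (including Content-Length) have been written with putrequest(), putheader() and endheaders().

```python
def iter_chunks(fileobj, chunk_size=64 * 1024):
    """Yield successive chunks from a file-like object until EOF."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_to(fileobj, send, chunk_size=64 * 1024):
    """Copy fileobj to 'send' one chunk at a time; return bytes sent."""
    total = 0
    for chunk in iter_chunks(fileobj, chunk_size):
        send(chunk)                # only one chunk is held in memory at a time
        total += len(chunk)
    return total
```

Memory use stays bounded by chunk_size regardless of how large the file is, which is the property the Python client was missing for Luwak at the time of this thread.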
Re: Connection Pooling with the python-riak-client
On Fri, Jan 20, 2012 at 15:53, Michael Clemmons wrote: >... > I've decided to go back and cleanup that approach and reapply it to the > current master branch. To my surprise I found that the ConnectionManager > supports multiple connections and has the tools to add and remove them. > Looking at the simple case of the RiakHttpTransport layer and the > http_request method it looks like it should grab a new connection(or an old > one) and try it and if it fails move on to the next putting the host port > pair at the bottom of the list. Right. That is the intent. I will note that a failure does not automatically remove a host/port pair, but fails the particular request (after N retries). I'm not sure that is entirely "the best strategy" (see below, ref: many strategies), but the plumbing should be there for applications to decide the proper behavior. It may be possible to decide a best/default strategy so that (most) applications do not have to get involved. > So it looked like all I needed to do was update the client code to accept a > list of hostport pairs and it would just work, which sounded too easy to be > true. I tested it anyways and if I use one hostport that is a working riak > node it connects and everything works. If I include 2 nodes one working and > one a random port it fails no matter the order. So its not just trying to > connect to the first and failing its connecting to them all and failing if > any fail. Well, yeah. You started the thing up, saying all the host/port pairs were proper. It is telling you they are not :-) One question: when does the failure happen? At instantiation time, or later at request time? On the first request, or some later request? (as I recall, it should lazy-open all host/port pairs, so the failure should not happen until later... and only when the pair is attempted to be used) > Anyone have any idea of why its built this way and what other solutions The overall intent is to connect to (at least) one known working node. 
That node can then be queried for "all" other known working nodes, which are then added into the ConnectionManager (CM). The (long-lived) process can then continue to monitor the status of the ring and make corresponding updates to the CM. The code does not (yet) have a well-defined process for *removal* of non-working nodes. That is a complex application-level decision. Should it remove the host/port permanently? If it is just a network glitch, or the particular host is having transient issues, then maybe the pair should be kept around (but unused) and re-installed in a minute or two when the host starts replying again. Maybe you just remove the pair and wait for a general background monitoring thread to note its existence and reinstall the pair. For a single-threaded, short-lived application, the multiple host/port pair capability is not very useful. That functionality is really necessary for multiple threads and/or long-lived processes. In this scenario, as existing connections get used up, the CM will spin up new connections for threads to use to perform their operations. (the underlying connections are persistent and reused until the server decides to close them, at which point the client will attempt to reopen and reuse the connection again) What happens when you give the CM a list of *working* host/port pairs? Does that still fail for you? It is true that when one goes down, then some level of the stack should remove the pair, but "which level" just hasn't been decided. There is also a "monitor" concept that has been sketched out in the code, but not implemented. See riak/transports/monitor.py. That should be used in a long-running application to periodically hit the riak servers, querying what nodes are in the ring, and adding new ones and removing broken ones. I sketched it out, but neither I nor anybody else has worked further on such logic. > people have worked out on their own?
My intent is to do this so it merges > cleanly with the current master, and doesn't introduce unnecessary change, > to increase the likelihood of a successful pull request. For production code, some of this host/port pair management needs to be done. It would be nice to have the monitor (thread) completed, but that may not be appropriate for your application. I think that Brett's work on timeouts is necessary for production code. The key decision point here is Python compatibility support. If the library requires 2.7, then it should be quite easy to merge his changes. I think (but don't recall offhand) that the timeout parameter for HTTPConnection might be available in 2.6, but I definitely know it is not available for Python 2.5. When I began my work on the client, I was targeting 2.5 and made many compatibility changes with that in mind. This was primarily to support my 2.5-based dev environment, even though I was going to deploy to 2.7. I eventually upgraded my dev environment, so compatibility isn't a huge concern for me any more. I would leave
Re: Connection Pooling with the python-riak-client
On Tue, Jan 24, 2012 at 12:34, Michael Clemmons wrote: > Greg, > You're amazing, thanks. In my application it's failing on the start of the > application, I do not believe while trying to do a request, but it's possible; > let me grok and get back to you with some tracebacks. Sounds good. I just looked at the code and it "should" lazy-connect. You shouldn't see any failures at setup time. Only when you make a request and it tries to establish the connection. > As far as I'm aware, to define more than one host/port with the client you > still have to hack the client. Adding an optional hostports or servers > parameter would be simple. Yes. I was focusing on the lower-level code, and didn't hack RiakClient's API. I'm thinking two things: add hostports, and a start_monitor parameter (and code up the latter, of course). > Being able to define the connection manager as a kwarg might be a good > option. If the intent is to define the connection manager by subclassing the > transport, things make more sense. There are a couple of paths to take, at least. The current code says "subclass the transport and override the default_cm classvar". That is sufficient, but maybe there is a better/clearer approach. > I think for multiple nodes round robin > might be the most sane default for long-term or short-term connections. It does imply that you'll distribute your request load across the ring. If N clients are running, then this is important (otherwise, all clients would just hit the first node in the config). There are certainly more sophisticated algorithms possible, but the round-robin used right now should work for most users. Note that while a particular request is in progress, its connection is NOT in the ConnectionManager. That will ensure that you don't back up a bunch of future requests behind a single, slow request. Only when the (slow) request completes will the connection be returned to the CM for usage by another request.
> Thanks again for replying, I'll see what happens when I try this with > multiple live nodes, and get back with more thoughts. Excellent! Cheers, -g
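For anyone following the thread, here is a minimal sketch of the behavior described above: lazy-connect, round-robin over host/port pairs, a failed pair going to the bottom of the list, and a connection being out of the pool while a request uses it. This is NOT the client's actual ConnectionManager. The names LazyConnection, RoundRobinManager, request, and fake_opener are hypothetical, and the sketch simplifies one thing: it raises when the pool is empty rather than spinning up new connections.

```python
import threading
from collections import deque


class LazyConnection:
    """One host/port pair whose connection is opened on first use."""

    def __init__(self, host, port, opener):
        self.host = host
        self.port = port
        self._opener = opener   # hypothetical callable: opener(host, port)
        self._conn = None

    def get(self):
        # Lazy-connect: nothing is opened until a request needs it.
        if self._conn is None:
            self._conn = self._opener(self.host, self.port)
        return self._conn


class RoundRobinManager:
    """Illustrative stand-in, not the riak client's ConnectionManager."""

    def __init__(self, hostports, opener):
        self._lock = threading.Lock()
        self._idle = deque(LazyConnection(h, p, opener) for h, p in hostports)

    def take(self):
        # The entry leaves the pool while a request is using it.
        with self._lock:
            if not self._idle:
                raise LookupError("no idle connections")
            return self._idle.popleft()

    def giveback(self, entry):
        # Returned entries go to the bottom of the list.
        with self._lock:
            self._idle.append(entry)


def request(manager, send, retries=3):
    """Try up to `retries` entries in turn; fail the request if all fail."""
    last_error = None
    for _ in range(retries):
        entry = manager.take()
        try:
            return send(entry.get())
        except OSError as exc:
            last_error = exc
        finally:
            manager.giveback(entry)
    if last_error is None:
        raise ValueError("retries must be at least 1")
    raise last_error


opened = []


def fake_opener(host, port):
    # Stand-in for opening a socket; the host named "bad" always refuses.
    if host == "bad":
        raise ConnectionRefusedError(host)
    opened.append((host, port))
    return "conn:%s:%d" % (host, port)


manager = RoundRobinManager([("bad", 9), ("good", 8098)], fake_opener)
result = request(manager, send=lambda conn: conn)
```

With one bad and one good pair, the request fails over to the good pair, and nothing is opened until that first request. Whether a repeatedly failing pair should be dropped, parked, or retried later is left open here, as it is in the discussion above.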
Re: Python Client Thread Safe?
IIRC, that error was seen when connecting to Riak 0.14.0. What version of Riak are you connecting to? I might also suggest using the current, unreleased Python client from the master branch on GitHub. It has much better support for threads and persistent connections. Just don't use variant client_id values and PB connections (switch to Riak 1.1 and ignore client_id). One Client can be shared across threads, but not Objects. I don't recall whether Buckets can be shared. Cheers, -g
On Feb 26, 2012 2:39 PM, "Jim Adler" wrote:
> I'm getting the following error while using protocol buffers
> (RiakPbcTransport) with more than one thread (stack trace below):
>
>   Socket returned short packet length 0 - expected 4'
>
> I'm using the 1.3.0 Python client on Mac and Ubuntu 11.04 and have seen
> the same error on both OS's. A single-thread works fine as does the
> RiakHttpTransport.
>
> Anyone have this problem?
>
> Thanks,
> Jim
>
> File "/usr/local/Cellar/python/2.7.2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/riak-1.3.0-py2.7.egg/riak/bucket.py", line 260, in get
>   return obj.reload(r)
> File "/usr/local/Cellar/python/2.7.2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/riak-1.3.0-py2.7.egg/riak/riak_object.py", line 373, in reload
>   Result = t.get(self, r, vtag)
> File "/usr/local/Cellar/python/2.7.2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/riak-1.3.0-py2.7.egg/riak/transports/pbc.py", line 195, in get
>   msg_code, resp = self.send_msg(MSG_CODE_GET_REQ, req, None)
> File "/usr/local/Cellar/python/2.7.2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/riak-1.3.0-py2.7.egg/riak/transports/pbc.py", line 387, in send_msg
>   return self.recv_msg(conn, expect)
> File "/usr/local/Cellar/python/2.7.2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/riak-1.3.0-py2.7.egg/riak/transports/pbc.py", line 413, in recv_msg
>   self.recv_pkt(conn)
> File "/usr/local/Cellar/python/2.7.2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/riak-1.3.0-py2.7.egg/riak/transports/pbc.py", line 460, in recv_pkt
>   len(nmsglen))
> RiakError: 'Socket returned short packet length 0 - expected 4'
Re: Python Client Thread Safe?
To clarify my thread-sharing comments: those are with respect to the unreleased client. And I've checked: the Bucket objects are sharable, too. Obviously, not if multiple threads are calling competing .set_r() or somesuch on the client or bucket. The short answer is share Client and Bucket objects to get access to Objects, and keep those per-thread. The underlying transport will manage multiple connections, persistently, in a thread-safe manner. (and I'm unclear on threading issues for the 1.3.0 python client) Cheers, -g On Mon, Feb 27, 2012 at 22:37, Greg Stein wrote: > IIRC, that error was seen when connecting to Riak 0.14.0. What version of > Riak are you connecting to? > > I might also suggest using the current, unreleased Python client from the > master branch on GitHub. It has much better support for threads and > persistent connections. Just don't use variant client_id values and PB > connections (switch to Riak 1.1 and ignore client_id). > > One Client can be shared across threads, but not Objects. I don't recall > whether Buckets can be shared. > > Cheers, > -g > > On Feb 26, 2012 2:39 PM, "Jim Adler" wrote: >> >> I'm getting the following error while using protocol buffers >> (RiakPbcTransport) with more than one thread (stack trace below): >> >> Socket returned short packet length 0 - expected 4' >> >> I'm using the 1.3.0 Python client on Mac and Ubuntu 11.04 and have seen >> the same error on both OS's. A single-thread works fine as does the >> RiakHttpTransport. >> >> Anyone have this problem? 
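To illustrate the sharing rule above (share the Client and Bucket, keep Objects per-thread), here is a small sketch. It does not use the riak package at all: SharedBucket and Record are hypothetical stand-ins, and the lock-protected dict only mimics a transport that is safe to call from several threads.

```python
import threading


class SharedBucket:
    """Hypothetical stand-in for a bucket that is safe to share."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._store = {}

    def new(self, key, data):
        # Each call hands back a fresh object owned by the calling thread.
        return Record(self, key, data)

    def _write(self, key, data):
        with self._lock:
            self._store[key] = data

    def read(self, key):
        with self._lock:
            return self._store[key]


class Record:
    """Hypothetical per-thread object; never passed between threads."""

    def __init__(self, bucket, key, data):
        self.bucket = bucket
        self.key = key
        self.data = dict(data)

    def store(self):
        self.bucket._write(self.key, dict(self.data))


def worker(bucket, n):
    # The bucket is shared; the record is created and used in this thread only.
    rec = bucket.new("key-%d" % n, {"worker": n})
    rec.data["squared"] = n * n
    rec.store()


bucket = SharedBucket("test")
threads = [threading.Thread(target=worker, args=(bucket, n)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The point is only the ownership pattern: mutable per-request state lives in an object that a single thread creates and uses, while the long-lived handle is the thing that gets shared.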
Re: licenses (was Re: riakkit, a python riak object mapper, has hit beta!)
Hey Andrey, I've spent well over a decade dealing with licensing issues. One thing that I've learned is that licensing is a personal choice and decision, and it is nearly impossible to alter somebody's philosophy. I find people fall into the GPL camp ("free software"), or the Apache/BSD camp ("permissive / open source"), so I always recommend GPLv3 or ALv2. (I find people choosing weak reciprocal licenses like LGPL, EPL, MPL, CDDL, etc should make up their mind and go to GPL or AL) In any case... license choice and arguments for one over the other are best left to personal email, rather than a public mailing list like riak-users. Changing minds doesn't happen on a mailing list :-) Cheers, -g On Fri, Mar 2, 2012 at 05:24, Andrey V. Martyanov wrote: > Hi Justin, > > Sorry for the late response, I didn't see your message! In fact, I know the > differences between the two. But, what is the profit of using it? Why not > just use BSD, for example, like many open source projects do? The biggest > minus of LGPL is that many people think that it's the same as GPL and have > problems understanding it. Even you think that I don't know the difference! > :) Why? Because, it's a common practice. A lot of people really don't know > the difference. That's why I said before that (L)GPL is overcomplicated. If > you open the LGPL main page [1], first thing you will see is "Why you > shouldn't use the Lesser GPL for your next library". Is it normal? It > confuses people. There is a lot of profit in pulling back the changes > you've made - a lot of people see it, fix it, comment it, improve it and so > on. Why does the license force me to do that? It shouldn't. > > [1] http://www.gnu.org/licenses/lgpl.html > > Best regards, > Andrey Martyanov > > On Fri, Mar 2, 2012 at 8:29 AM, Justin Sheehy wrote: >> >> Hi, Andrey. >> >> On Mar 1, 2012, at 10:18 PM, "Andrey V. Martyanov" >> wrote: >> >> > Sorry for GPL, it's a typo. I just don't like GPL-based licenses, >> > including LGPL.
I think it's overcomplicated. >> >> You are of course free to dislike anything you wish, but it is worth >> mentioning that GPL and LGPL are very different licenses; the LGPL is >> missing infectious aspects of the GPL. >> >> There are many projects which could not use GPL code compatibly with their >> preferred license but which can safely use LGPL code. >> >> Justin
Re: riak-python-client2, a rewrite of the official client
On Thu, Mar 15, 2012 at 11:13, Shuhao Wu wrote: >... > Erlang. In my client, a chunk of the code actually comes from the original > client as they work with a few adaptations. Yes, I noticed that you retain transports/connection.py, but why did you strip out all the comments?! That is definitely not a good path towards a long-term maintainable codebase. I put a lot of commentary into connection.py because there are explicit race conditions in there. The documentation was necessary to clarify where the races exist, what is being done about it, and why the code works properly. With your stripped-down version, all of that information is LOST. Future maintainers will not understand the issues and will break it, or they will think there are issues that don't really exist. I could just see somebody saying "holy crap! in the presence of threads, this won't work" and then going and introducing a Queue in there. Totally senseless and overkill. But because you stripped the code... no benefit, and probably harm. I'll take the somewhat-messy Basho client, over your version where you explicitly remove useful knowledge. I would also note that you have ALREADY introduced bugs. Even in the very simplest case. Consider the following:

    client = riak2.client.Client('server1')
    bucket = client.bucket('test')
    ob = bucket.get('key')
    data = ob.get_data()
    client = riak2.client.Client('server2')
    bucket = client.bucket('migrate')
    bucket.new('key', data)

Pretty simple, hm? Migrate some data from one server to another. It doesn't work. Your get_http_cm() class method is a broken idea. It is effectively a global variable. Because the Client('server2') call does not provide a connection manager, it "reuses" the one created for the 'server1' connection. >... How long and how much energy are you willing to expend? Of your own time, and of others who may want to use your code? Also, consider that you've turned something like bucket.get_r() into just bucket.r.
Sure, that looks good, but examine the current bucket code: if the bucket doesn't provide a value, then it defers to the client. These are not just simple attributes. What you really want is to use Python's properties. That'll make them *look* like attributes, but you can execute the fallback code. And you could do that on the *existing* client, and benefit everybody, without introducing a bunch of bugs by trying to do it yourself. You could then use riak.util.deprecated() to mark .get_r() as deprecated. In a future revision, then you could clear them out. The client that everybody uses would then benefit. From a community/social dynamic, you're forking the project and going your own way. That doesn't help the broader community. A few people might use your client, but most will stick to Basho's client. It may feel nice and fun for you, but applying your efforts to the official client *will* help everybody. And after a round of deprecation, then you can clear out all of the stuff you find messy. But spending some time to do this right, and to *work with* the existing community will produce a much larger benefit to all. -g
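As a rough illustration of the property suggestion above: the sketch below makes r look like a plain attribute while still deferring to the client when the bucket has no value of its own, and keeps get_r() around with a deprecation warning. Client and Bucket here are hypothetical stand-ins, not the riak classes. The signature of riak.util.deprecated() is not shown in this thread, so the sketch uses the standard warnings module instead.

```python
import warnings


class Client:
    """Hypothetical stand-in holding the client-wide default."""

    def __init__(self, r=2):
        self.r = r


class Bucket:
    """Hypothetical stand-in showing a property with a fallback."""

    def __init__(self, client, name):
        self._client = client
        self.name = name
        self._r = None  # None means "not set on this bucket"

    @property
    def r(self):
        # Looks like an attribute, but defers to the client when unset.
        if self._r is not None:
            return self._r
        return self._client.r

    @r.setter
    def r(self, value):
        self._r = value

    def get_r(self):
        # Old accessor kept for one deprecation cycle.
        warnings.warn("get_r() is deprecated; use the .r property",
                      DeprecationWarning, stacklevel=2)
        return self.r


client = Client(r=2)
bucket = Bucket(client, "test")
inherited = bucket.r   # falls back to the client's value
bucket.r = 3
overridden = bucket.r  # the bucket's own value wins
```

Existing callers of get_r() keep working and get a warning, which is the "round of deprecation" described above before the old accessors are removed.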
Re: riak-python-client2, a rewrite of the official client
On Thu, Mar 15, 2012 at 14:08, Armon Dadgar wrote:
>> Message: 4
>> Date: Thu, 15 Mar 2012 11:13:03 -0400
>> From: Shuhao Wu
>> To: "Andrey V. Martyanov"
>> Cc: riak-users@lists.basho.com
>> Subject: Re: riak-python-client2, a rewrite of the official client
>>
>> I've looked into just modifying and contributing to the existing library, and
>> found several issues with it. Here's my main motivation for a rewrite:
>>
>> 1. The current structure of riak-python-client is somewhat messy. Everything
>> depends on each other. Just look at things like RiakLink and RiakIndexEntry.
>> They're unnecessary and overcomplicate the code. Furthermore, if you look at
>> the transports, they are very much dependent on things like RiakObject, and
>> RiakObject is pretty much nothing without the transport. It's almost like a
>> circular dependency. So instead, I redesigned the transports to operate
>> independently from things like RiakObject. To do this, simply modifying
>> is not feasible and it will result in an almost complete rewrite anyway due
>> to the dependency problems I described.
>> 2. There's a lot of "bloat" in the current riak-python-client. A simple
>> example would be get_ and set_, as well as things like RiakLink and
>> RiakIndexEntry. To get rid of those would pretty much require a rewrite as
>> well.
>> 3. Basho currently does not have a dedicated python developer working on this.
>> I don't know this for sure but I think their resources, in terms of clients,
>> go mainly to java, ruby and javascript, though that's just my observation.
>>
>> My primary goal of having a rewrite is to hopefully simplify the code base as
>> well as improve some aspects of the python client (such as not using deprecated
>> functions such as apply) and (hopefully) increase the speed of the client.
>> After examining the code (which I had to do while rewriting), I don't think
>> simply modifying the current codebase could fix its issues (there are more
>> issues than what I've stated), and I don't think it will take as long as
>> people think. The current code base has about 4k lines of python and 0.5k
>> lines of Erlang. In my client, a chunk of the code actually comes from the
>> original client as they work with a few adaptations.
>>
>> As far as road map goes, I'm currently just rewriting all the functionalities
>> provided by the current python client, and here's a list of things that Sean
>> would like to see accomplished, which I will work on once I have all the
>> functionalities of the current client complete:
>>
>> https://gist.github.com/1959278
>>
>> I hope I've answered all the questions. If there are any more
>> questions/comments, feel free to shoot them my way.
>>
>> Shuhao
>
> I agree that the Python client does need some work to clean it up,
> and make it more idiomatic Python, but I'm not sure that a total rewrite
> is necessary.
>
> Most of the abstractions are good, but they just need a bit of cleanup.
> I think there is always a tendency to assume things are unnecessarily messy,
> but once you get to rewriting it you end up running into the pain points that
> drove those design decisions.
>
> If there is interest, we could just formulate a roadmap for a set of breaking
> changes to the existing client, and release it as a new version of the same
> project.
>
> Things I would like to see:
> * Cleanup the transport interface
> * RiakObject needs to be simplified
> * Support a RiakJSONObject subclass which has the encoding logic that is
>   inside RiakObject now
> * Indexes need to be cleaned up
> * MapReduce interfaces feel a bit dirty
> * Much improved exception hierarchy
> * RiakClient / RiakBucket interface needs to be cleaned up
Right. Evolution of a client that everybody uses.
It isn't like Basho is refusing to accept pull requests from the community. If they were a total black hole, then there might be an argument for forking the project and evolving it. But that would at least be from a fork; I don't see a valid argument for start-from-scratch. And they are not a black hole. Basho's responsiveness on the Python client might not be "awesome", but they *are* willing to engage. I met up with them last fall, have exchanged numerous emails with them, and even had a conference call about the Python client. Not to mention all the discussion and the pull requests that I provided, which got merged. They *are* working with the community. In my experience, communities exist to develop/maintain/focus on a code base. Multiple/competing codebases end up fracturing the community. That rarely leads to a long-term, sustainable, healthy outcome. Cheers, -g