Python client: three issues

2011-09-04 Thread Greg Stein
Hi all,

I'm digging into Riak, and its Python client. I've discovered several
issues and would like to provide feedback and get some input, please.

1. I'd like Riak to generate keys for some of my objects, and I'm
using the HTTP API. Pretty simple: do a POST, and the Location: header
will tell me the key. However, looking at the code... it doesn't
appear that object.store() actually reads and returns that key. Am I
missing something here? Maybe there is a Proper Way to do this?

2. The HTTP transport code doesn't use httplib properly at all. When I
wrote the httplib library, my primary design point was to enable
persistent, HTTP/1.1 connections. That was the whole idea behind
HTTPConnection. The transport library opens a *new* connection for
every request. I'm not surprised that the performance suffers, since a
TCP setup handshake needs to occur on every request. ... So: are there
any plans to rebuild the HTTP support to use persistent connections?

3. The Luwak support reads the *entire* "file" into memory. My
understanding is that Luwak is for "large" files. If those files are
(say) a gigabyte, then the current code is going to impact clients in
a *very* bad way. I can see that the code wants to read the response,
then return the connection back to the pool... but that just isn't
workable. A connection needs to be held by the reader, which then
streams it to completion (or close), and only then should the
connection be returned. Now, if Luwak is defined for files less than
(say) 10 megabytes, then maybe this approach isn't wrong, but I
haven't seen any documentation about size limitations. Any thoughts on
this?

Thanks!
-g

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Python client: three issues

2011-09-05 Thread Greg Stein
On Sun, Sep 4, 2011 at 22:12, Brett Hoerner  wrote:
> On Sun, Sep 4, 2011 at 7:05 PM, Greg Stein  wrote:
>> 1. I'd like Riak to generate keys for some of my objects, and I'm
>> using the HTTP API. Pretty simple: do a POST, and the Location: header
>> will tell me the key. However, looking at the code... it doesn't
>> appear that object.store() actually reads and returns that key. Am I
>> missing something here? Maybe there is a Proper Way to do this?
>
> You're right, it looks like this functionality was added without the
> logic to pull the key back out of the response. It definitely should.

Okay. Just wanted to make sure that I wasn't missing something. I'll
craft up a patch.
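
For reference, a minimal sketch of pulling the generated key back out of
the response, written for Python 3. The helper name key_from_location is
hypothetical (not part of the client), and the /riak/<bucket>/<key> path
layout is an assumption about what the server puts in Location:

```python
from urllib.parse import urlsplit, unquote


def key_from_location(location):
    """Return the last path segment of a Location header value.

    Assumes a key-less POST is answered with a Location such as
    /riak/<bucket>/<key>; the path prefix itself is not inspected.
    """
    path = urlsplit(location).path.rstrip("/")
    if not path:
        raise ValueError("Location header has no path: %r" % (location,))
    # Keys may be percent-encoded in the URL, so decode the final segment.
    return unquote(path.rsplit("/", 1)[-1])
```

A store() implementation could call this on the value of
response.getheader("Location") after a successful POST and assign the
result to the object's key.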

>...
>> 2. The HTTP transport code doesn't use httplib properly at all. When I
>> wrote the httplib library, my primary design point was to enable
>> persistent, HTTP/1.1 connections. That was the whole idea behind
>> HTTPConnection. The transport library opens a *new* connection for
>> every request. I'm not surprised that the performance suffers, since a
>> TCP setup handshake needs to occur on every request. ... So: are there
>> any plans to rebuild the HTTP support to use persistent connections?
>
> I believe I've read that Riak itself doesn't support keep-alive.
> Someone else would have to comment for sure here. Regardless, this
> feature would be handy because many people use Riak over HTTP through
> a reverse proxy.

It looks like Riak supports persistent connections, according to this post:
  
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2010-March/000744.html
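
A rough sketch of what reuse looks like, assuming Python 3, where httplib
lives on as http.client. The PersistentHTTP wrapper is hypothetical and
is not the client's transport class:

```python
import http.client


class PersistentHTTP:
    """Hold one HTTPConnection open and reuse it across requests."""

    def __init__(self, host, port, timeout=10):
        # No socket is opened here; http.client connects on first use.
        self._conn = http.client.HTTPConnection(host, port, timeout=timeout)

    def request(self, method, path, body=None, headers=None):
        self._conn.request(method, path, body=body, headers=headers or {})
        response = self._conn.getresponse()
        # The body has to be read in full before the same connection
        # can carry the next request.
        data = response.read()
        return response.status, dict(response.getheaders()), data

    def close(self):
        self._conn.close()
```

Every call to request() goes over the same TCP connection as long as the
server keeps it open, so the setup handshake is paid once rather than per
request.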

I'd like to understand if there is a roadmap for the HTTP client, and if

>> 3. The Luwak support reads the *entire* "file" into memory. My
>> understanding is that Luwak is for "large" files. If those files are
>> (say) a gigabyte, then the current code is going to impact clients in
>> a *very* bad way. I can see that the code wants to read the response,
>> then return the connection back to the pool... but that just isn't
>> workable. A connection needs to be held by the reader, which then
>> streams it to completion (or close), and only then should the
>> connection be returned. Now, if Luwak is defined for files less than
>> (say) 10 megabytes, then maybe this approach isn't wrong, but I
>> haven't seen any documentation about size limitations. Any thoughts on
>> this?
>
> You're 100% right that it should spool to disk, Luwak is meant for
> very large files.

I see no reason to spool to disk. Just keep the socket open, and let
the application read as necessary. That said, thanks for validating my
understanding of Luwak files.
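
To illustrate what I mean by letting the application read as necessary,
here is a small sketch. It assumes only that the response object has a
read(n) method, as http.client.HTTPResponse does; iter_chunks is a
hypothetical helper, and io.BytesIO stands in for a real response:

```python
import io


def iter_chunks(response, chunk_size=64 * 1024):
    """Yield the body of a file-like response in fixed-size pieces."""
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            # End of body: only now is it safe to hand the connection
            # back to whatever pool owns it.
            break
        yield chunk


# Stand-in for a large Luwak response body.
body = io.BytesIO(b"x" * 10)
sizes = [len(chunk) for chunk in iter_chunks(body, chunk_size=4)]
```

At no point is more than chunk_size bytes held in memory, regardless of
how large the stored file is.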

> The Python client has been pretty quiet in my experience (though I
> only started messing with it about a month ago). I'm sure patches for
> any/all of the above are welcome. I'm hoping the Basho guys have
> someone on it to review pull requests and such sooner than later.

Can do. I've already submitted a pull request to allow the client to
work with older versions of Python (e.g. the Python 2.5 installed on my
Mac OS Leopard laptop).

Cheers,
-g



more Python client questions

2011-09-05 Thread Greg Stein
I've got a couple more questions:

1. Why is "pycurl" used in the http transport? What benefits does it
offer? Based on reading the code, I see no additional functionality,
so I don't understand why this complexity exists. I'd like to submit a
changeset that just removes it, but would like to understand if there
is a reason/history for using pycurl.

2. What is the policy around the API on the transports? As a concrete
example, I'd like to remove the HOST and PORT parameters from
RiakHttpTransport.http_request() (and remove them from the return
values of build_rest_path()). The parameters are always self._host and
self._port. Given the semantics of the transport, these would *never*
change... even by third party code that might call into these
functions. Would removing these parameters pose problems w.r.t. API
stability guarantees?


Thanks,
-g



Re: more Python client questions

2011-09-05 Thread Greg Stein
Well... I'm looking at some work on transports/http.py and that host/port is
passed around *as if* there is a possibility/reason to vary the values. But
when you get down to it, .http_request can just use ._host and ._port, and
we can eliminate these spurious parameters from thought.

In particular, I'd like a transport that manages N persistent connections,
so requests will be using a "connection" parameter rather than a host/port
combination. This is why I ask about API guarantees. My assumption is that
transports/transport.py defines the only API that must be preserved, and all
other methods are open to change... but I wanted to ask for confirmation
first.
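
To make the proposed signature change concrete, a sketch under stated
assumptions: SketchHttpTransport is a hypothetical class, not the real
RiakHttpTransport, and the "riak" prefix and port 8098 are assumed
defaults. The point is only that build_rest_path() returns a path alone,
and http_request() takes a connection rather than a host/port pair:

```python
from urllib.parse import quote


class SketchHttpTransport:
    """Hypothetical transport; not the riak-python-client class."""

    def __init__(self, host="127.0.0.1", port=8098, prefix="riak"):
        self._host = host
        self._port = port
        self._prefix = prefix

    def build_rest_path(self, bucket, key=None):
        # Returns only the path; host and port stay on the instance.
        path = "/%s/%s" % (self._prefix, quote(bucket, safe=""))
        if key is not None:
            path += "/" + quote(key, safe="")
        return path

    def http_request(self, conn, method, path, headers=None, body=None):
        # `conn` is any object with the request()/getresponse()
        # interface of http.client.HTTPConnection.
        conn.request(method, path, body=body, headers=headers or {})
        response = conn.getresponse()
        return response.status, response.read()
```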

Cheers,
-g

On Mon, Sep 5, 2011 at 10:44, Jonathan Langevin
wrote:

> Re: 2. Not that I disagree with removing those values, but is there some
> additional benefit that you're expecting by removing the static values?
> (just curious)*
>
>  <http://www.loomlearning.com/>
>  Jonathan Langevin
> Systems Administrator
> Loom Inc.
> Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com -
> www.loomlearning.com - Skype: intel352
> *
>
>
> On Mon, Sep 5, 2011 at 8:47 AM, Greg Stein  wrote:
>
>> I've got a couple more questions:
>>
>> 1. Why is "pycurl" used in the http transport? What benefits does it
>> offer? Based on reading the code, I see no additional functionality,
>> so I don't understand why this complexity exists. I'd like to submit a
>> changeset that just removes it, but would like to understand if there
>> is a reason/history for using pycurl.
>>
>> 2. What is the policy around the API on the transports? As a concrete
>> example, I'd like to remove the HOST and PORT parameters from
>> RiakHttpTransport.http_request() (and remove them from the return
>> values of build_rest_path()). The parameters are always self._host and
>> self._port. Given the semantics of the transport, these would *never*
>> change... even by third party code that might call into these
>> functions. Would removing these parameters pose problems w.r.t API
>> stability guarantees?
>>
>>
>> Thanks,
>> -g
>>
>>
>
>


Re: Python client: three issues

2011-09-05 Thread Greg Stein
On Mon, Sep 5, 2011 at 16:14, Mark Phillips  wrote:
>...
> I just wanted to jump in here and reiterate that patches, issues, and
> opinions about how the python code works/should work are all
> encouraged. Mathias Meyer and Jared Morrow (among others) have been
> handling issues, pull requests, releases, etc., for the last few
> months and should be able to do so moving forward. Between them, other
> members of the Basho dev team, and a quickly-growing group of python
> users, we should be able to handle pull requests, issues, and other
> contributions at a pretty good clip.

Thanks, Mark. I figured as much, but wanted to know what *kinds* of
changes follow the existing roadmap (if any). If I can submit patches
that improve things for my situation, *and* are deemed applicable to
the standard client... then we all win :-)

Cheers,
-g



incorrect link, or missing download

2011-09-05 Thread Greg Stein
Hello,

I went to download Riak for Mac OS, and started to follow the instructions on:
  http://wiki.basho.com/Installing-on-Mac-OS-X.html

It says to download:
  http://downloads.basho.com/riak/riak-0.14/riak-0.14.2-osx-i386.tar.gz

However, that resource does not exist. There is only 0.14.0:
  http://downloads.basho.com/riak/riak-0.14/riak-0.14.0-osx-i386.tar.gz


Something needs to be fixed :-)

Cheers,
-g



Re: incorrect link, or missing download

2011-09-05 Thread Greg Stein
Thanks, Jared. Not a problem. Since this is simple testing, I can work
with 0.14.0. My production server (EC2/Ubuntu) is built from 0.14.2
source, so I have no serious concern here.

That said, I'd advise fixing the link on that web page, to avoid
future questions.

Cheers,
-g

On Mon, Sep 5, 2011 at 22:37, Jared Morrow  wrote:
> Riak 0.14.2 was not built for OSX unfortunately.   Before now, the OSX
> builds were simply done by hand by doing a 'make rel' and packaging the
> results of the 'rel/riak' directory.   This manual step was not done for
> 0.14.1 or 0.14.2.   OSX packaging has been added to riak mainline now
> though, so all future releases will be built for OSX.   For now, you can
> produce your own binaries in the way I mentioned above.
> Sorry for the inconvenience and confusion.
> -Jared
>
> On Mon, Sep 5, 2011 at 8:27 PM, Greg Stein  wrote:
>>
>> Hello,
>>
>> I went to download Riak for Mac OS, and started to follow the instructions
>> on:
>>  http://wiki.basho.com/Installing-on-Mac-OS-X.html
>>
>> It says to download:
>>  http://downloads.basho.com/riak/riak-0.14/riak-0.14.2-osx-i386.tar.gz
>>
>> However, that resource does not exist. There is only 0.14.0:
>>  http://downloads.basho.com/riak/riak-0.14/riak-0.14.0-osx-i386.tar.gz
>>
>>
>> Something needs to be fixed :-)
>>
>> Cheers,
>> -g
>>
>
>



Review request: revamped http transport

2011-09-07 Thread Greg Stein
Hi all,

I've got a rough first cut of a modified RiakHttpTransport class. You
can review it here:
  https://github.com/gstein/riak-python-client/commits/newhttp

(note that it includes changes from my 'relax-deps' and 'simplify'
branches; just look at the most recent commit(s))


This code has only been tested in very simple form. I haven't done any
extensive testing: no multi-threading, no multiple hosts, etc. I
*have* verified that the HTTP connection is persistent, as expected,
so it should be much faster than before. But I don't have any
performance tests to run... my hope is that some more experienced
Riak Python developers can do some testing.

This is just a rough draft for initial review on the approach. It
needs comments and documentation, if people like the direction.

Some comments on the ConnectionManager:

* when creating a ConnectionManager, you can specify multiple hosts
(e.g. all the Riak servers)
* in a single-threaded environment, only one connection will be opened
right now. My next change will be to pre-open one connection to each
host.
* when new connections are needed (i.e. in a multi-threaded
environment), it round-robins across the set of servers
* higher-level logic can remove hosts if they go down (I still need to
remove existing conns)
* similarly, hosts can be added as they are discovered or added to the ring
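
The behaviour in the list above can be sketched roughly as follows. This
is an illustration written for Python 3, not the code on the branch:
SketchConnectionManager and its method names are hypothetical, and the
connect callable is an assumed hook for creating a connection.

```python
import threading


class SketchConnectionManager:
    """Hand out reusable connections, round-robin across hosts."""

    def __init__(self, hosts, connect):
        self._hosts = list(hosts)      # [(host, port), ...]
        self._connect = connect        # callable(host, port) -> connection
        self._idle = []                # [((host, port), connection), ...]
        self._cursor = 0
        self._lock = threading.Lock()

    def add_host(self, host, port):
        with self._lock:
            if (host, port) not in self._hosts:
                self._hosts.append((host, port))

    def remove_host(self, host, port):
        with self._lock:
            if (host, port) in self._hosts:
                self._hosts.remove((host, port))
            # Idle connections to the removed host are dropped as well.
            stale = [c for hp, c in self._idle if hp == (host, port)]
            self._idle = [(hp, c) for hp, c in self._idle
                          if hp != (host, port)]
        for conn in stale:
            conn.close()

    def acquire(self):
        with self._lock:
            if self._idle:
                return self._idle.pop()
            if not self._hosts:
                raise RuntimeError("no hosts available")
            # A new connection is needed: take the next host in turn.
            hostport = self._hosts[self._cursor % len(self._hosts)]
            self._cursor += 1
        return hostport, self._connect(*hostport)

    def release(self, hostport, conn):
        with self._lock:
            if hostport in self._hosts:
                self._idle.append((hostport, conn))
                return
        # The host was removed while the connection was in use.
        conn.close()
```

A single-threaded caller that always releases before the next acquire
only ever opens one connection; new connections (and the round-robin)
come into play when several threads hold connections at once.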


With some more work to pass along multiple hosts to the constructor of
RiakHttpTransport, then the full capability is reached and
RiakHttpReuseTransport and RiakHttpPoolTransport can be removed.

Similar changes can be made in pbc.py to use a ConnectionManager for
connections to the server(s), and RiakPbcCachedTransport can be
removed.

Cheers,
-g



Re: Review request: revamped http transport

2011-09-07 Thread Greg Stein
On Wed, Sep 7, 2011 at 20:42, Brett Hoerner  wrote:
> Greg, can you make a Github pull request from your branch?
>
> It'd be easier to review the commits together (well, git log is easy)
> but more importantly comment in-line.

Done: https://github.com/basho/riak-python-client/pull/56


Thanks!
-g



Python transports

2011-09-08 Thread Greg Stein
Hey, all...

After a couple comments on my recent work, and some archaeology on the
Python/Riak work over the past month... I've realized that I might
have a very different view of RiakTransport compared to what I'm
seeing in the current work. I figured it best to bring that to the
forefront and discuss:

In my view, RiakTransport is used by RiakClient (and others) to "talk
to the Riak server".

Some of the current work, and some proposed pull requests, seem to
take the position that a RiakTransport is "one connection to a server,
and the client should manage those".

Needless to say, I'm in favor of my own position :-) ... I think it is
best to transfer *all* responsibility for talking to the server(s) to
the transport layer. I really don't think the client/bucket/object
layers should know anything about talking to the server(s). I'd like
to see the transport layer be told about all server(s) available, and
then it Just Works.

I'm still a newbie with all this code, and need to keep plugging away
at the higher levels of functionality and compensation for problems.
I'd like to build up some code that contacts *one* given server, asks
for all of the ring servers, and then opens connections to those
servers. And then, it should (automatically) maintain client
connections based on what is happening with the Riak cluster. The
current (proposed) code manages connections to N servers, but has no
automatic add/remove based on changes in the cluster status. I think
this happens at a layer *just* above the actual transport. ie.
something tracks the changes in the ring status and its servers, and
transmits those changes into the transport layer, which alters its
communication with that cluster (regardless of whether that
communication is via HTTP or protobuf).
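
As a sketch of that "layer just above the transport": the two functions
below are hypothetical, and they assume the transport side exposes
add_host(host, port) and remove_host(host, port). How the current ring
membership is fetched from the cluster is deliberately left out.

```python
def diff_hosts(known, current):
    """Return (added, removed) between two collections of (host, port)."""
    known, current = set(known), set(current)
    return sorted(current - known), sorted(known - current)


def sync_hosts(manager, known, current):
    """Push ring membership changes into the transport layer.

    `manager` is any object with add_host(host, port) and
    remove_host(host, port). Returns the new set of known hosts.
    """
    added, removed = diff_hosts(known, current)
    for host, port in added:
        manager.add_host(host, port)
    for host, port in removed:
        manager.remove_host(host, port)
    return set(current)
```

A long-running application would call sync_hosts() whenever it learns of
a ring change, keeping the returned set as `known` for the next call.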


Okee doke. That's the end of my brain dump and future thoughts on the
transport and communication layer. I'd really like some feedback,
review, and thoughts.

Thanks!
-g



Re: Riak 1.0 pre-release series

2011-09-08 Thread Greg Stein
On Wed, Sep 7, 2011 at 19:24, David Smith  wrote:
>...
> The first stage is "pre-release" (aka PR) builds that represent the
> feature-complete Riak 1.0, with minor cleanup and tweaks still
> ongoing. You WILL find some rough edges with packaging and the
> features -- we're still working through these and will do pre-releases
> as often as necessary to get these issues resolved. We'll also be

Understood.

I just checked out the link (provided below) and did not find a
*source* distribution. Could you work that into the distribution?

In my particular situation, I'm working on Mac OS 10.5 (Leopard) and
Ubuntu 9.x. I can manage building if given a source distro, but would
certainly prefer pre-built packages (32-bit on both platforms).

>...
> We would love to get your feedback on these builds, good or bad. We're
> trying this new process to do a better job of incorporating community
> feedback BEFORE the release. :)

hehe... helpful :-) ... I'm not at production load, so I won't be able
to find server bugs for you. But I'll provide packaging feedback where
I can (as above). Most of my feedback will be on the client side, and
I'm already working to engage Basho devs on the riak-python-client
project.

>...
> creating duplicate issues. Also note that the docs on wiki.basho.com
> may still reflect 0.14.2; we'll be addressing those over next few
> weeks.

I had a one-line change for one of those pages last week, and sent
that to this mailing list (and it was promptly fixed; yay!). What is
the recommended "best" process for feedback? Should we all fork the
website, make changes, and send pull requests? Send mail here? File
issues?

>...

Cheers,
-g



Re: Riak 1.0 pre-release series

2011-09-08 Thread Greg Stein
On Thu, Sep 8, 2011 at 07:47, David Smith  wrote:
> On Thu, Sep 8, 2011 at 5:41 AM, Greg Stein  wrote:
>
>> I just checked out the link (provided below) and did not find a
>> *source* distribution. Could you work that into the distribution?
>
> Yes, there should be one of those -- will ping Jared.

Thanks!

>> In my particular situation, I'm working on Mac OS 10.5 (Leopard) and
>> Ubuntu 9.x. I can manage building if given a source distro, but would
>> certainly prefer pre-built packages (32-bit on both platforms).
>
> There is a i386 .deb which may work on ubuntu9; would be curious to
> know if it does. :)

I'm away [on vacation] from my ubuntu systems right now, but will try
it late next week and provide feedback.

>...
>> I had a one-line change for one of those pages last week, and sent
>> that to this mailing list (and it was promptly fixed; yay!). What is
>> the recommended "best" process for feedback? Should we all fork the
>> website, make changes, and send pull requests? Send mail here? File
>
> Pull requests are easiest from a procedural standpoint.

Will do.

Cheers,
-g



Re: Python transports

2011-09-08 Thread Greg Stein
Yup, agreed. I can certainly see use-cases where a client may have
particular knowledge about data/server locality, and wants to open
connections to *just* those servers. The transports that I am
advocating open connections to servers described by one layer higher.
That layer could say "all Riak servers in the cluster", or it could
say "these three servers holding copies of  data".

An auto-discovery layer (to add and remove) in-between the client and
the transport would be handy. I can envision that, and my proposed
code makes it possible, but (at my current stage) I don't have a use
for it. I'd hope that others can build on my connection pooling work
to see that through.

Cheers,
-g

On Thu, Sep 8, 2011 at 07:49, Phil Stanhope  wrote:
> I like the idea of RiakTransport as you describe it. It opens the door to
> other potential underlying transports and isolating a client from knowledge
> of those (websockets, tcp, zeromq messaging come to mind). I'm not
> suggesting that RIAK will ever support these transports by this comment,
> however.
> I also agree with the need to have RiakRingAwareTransport as an additional
> layer that *might* be used by a client. There may be valid reasons why a
> client might want to force particular traffic onto a subset of the ring
> (e.g. M/R config, Search Config, forcing read/write traffic onto different
> nodes, etc). Again, I'm not suggesting that using a subset of the ring for
> particular operations is the best practice. But it may be necessary to do so
> in order to validate and do certain types of testing to prove or disprove
> certain access patterns.
> -phil
>
> On Thu, Sep 8, 2011 at 7:31 AM, Greg Stein  wrote:
>>
>> Hey, all...
>>
>> After a couple comments on my recent work, and some archaeology on the
>> Python/Riak work over the past month... I've realized that I might
>> have a very different view of RiakTransport compared to what I'm
>> seeing in the current work. I figured it best to bring that to the
>> forefront and discuss:
>>
>> In my view, RiakTransport is used by RiakClient (and others) to "talk
>> to the Riak server".
>>
>> Some of the current work, and some proposed pull requests, seem to
>> take the position that a RiakTransport is "one connection to a server,
>> and the client should manage those".
>>
>> Needless to say, I'm in favor of my own position :-) ... I think it is
>> best to transfer *all* responsibility for talking to the server(s) to
>> the transport layer. I really don't think the client/bucket/object
>> layers should know anything about talking to the server(s). I'd like
>> to see the transport layer be told about all server(s) available, and
>> then it Just Works.
>>
>> I'm still a newbie with all this code, and need to keep plugging away
>> at the higher levels of functionality and compensation for problems.
>> I'd like to build up some code that contacts *one* given server, asks
>> for all of the ring servers, and then opens connections to those
>> servers. And then, it should (automatically) maintain client
>> connections based on what is happening with the Riak cluster. The
>> current (proposed) code manages connections to N servers, but has no
>> automatic add/remove based on changes in the cluster status. I think
>> this happens at a layer *just* above the actual transport. ie.
>> something tracks the changes in the ring status and its servers, and
>> transmits those changes into the transport layer, which alters its
>> communication with that cluster (regardless of whether that
>> communication is via HTTP or protobuf).
>>
>>
>> Okee doke. That's the end of my brain dump and future thoughts on the
>> transport and communication layer. I'd really like some feedback,
>> review, and thoughts.
>>
>> Thanks!
>> -g
>>
>
>



Re: Riak 1.0 pre-release series

2011-09-08 Thread Greg Stein
No worries. That is what "pre-release" is all about :-) ... I'm
focusing on the Python support right now, and 0.14.x works just fine
for that, but I'll look into the 1.0-pre in the next week or two.

Thanks!
-g

On Thu, Sep 8, 2011 at 13:02, Jared Morrow  wrote:
> Sorry about the lack of a source package, that was my oversight.   I just
> uploaded one to downloads.basho.com.
> Thanks for the feedback,
> -Jared
> On Thu, Sep 8, 2011 at 5:52 AM, Greg Stein  wrote:
>>
>> On Thu, Sep 8, 2011 at 07:47, David Smith  wrote:
>> > On Thu, Sep 8, 2011 at 5:41 AM, Greg Stein  wrote:
>> >
>> >> I just checked out the link (provided below) and did not find a
>> >> *source* distribution. Could you work that into the distribution?
>> >
>> > Yes, there should be one of those -- will ping Jared.
>>
>> Thanks!
>>
>> >> In my particular situation, I'm working on Mac OS 10.5 (Leopard) and
>> >> Ubuntu 9.x. I can manage building if given a source distro, but would
>> >> certainly prefer pre-built packages (32-bit on both platforms).
>> >
>> > There is a i386 .deb which may work on ubuntu9; would be curious to
>> > know if it does. :)
>>
>> I'm away [on vacation] from my ubuntu systems right now, but will try
>> it late next week and provide feedback.
>>
>> >...
>> >> I had a one-line change for one of those pages last week, and sent
>> >> that to this mailing list (and it was promptly fixed; yay!). What is
>> >> the recommended "best" process for feedback? Should we all fork the
>> >> website, make changes, and send pull requests? Send mail here? File
>> >
>> > Pull requests are easiest from a procedural standpoint.
>>
>> Will do.
>>
>> Cheers,
>> -g
>>
>
>



Re: Python transports

2011-09-08 Thread Greg Stein
On my 'newhttp' branch, the ConnectionManager class handles all
connections to the server(s) for the transports (many connections to
many servers). The transport can worry about what goes onto the wire,
and the CM worries about the underlying connections.

Today, there are multiple transports, each attempting to manage
connections in various ways, and frankly... doing a poor job of it.
The HTTP transport opens a new connection for every request. To borrow
a phrase from Scott Conant on 'Chopped', that "drives me to anger" :-)
So yes, my changes are a big improvement in that we now have
*persistent* connections to the server, with the commensurate
performance improvement.

The basic problem is that "RiakTransport" conflates wire formatting
*and* connection management. I'm trying to separate those two, in
order to improve the connections. Once they are separated, then
connection policies (as you describe) will be much easier to
implement.

Your timeout work is great, and I completely agree with you: timeouts
are a "must have" in a production environment. I don't see any problem
adding timeouts; my comment on your github commit was more about 2.5
compatibility, than denial. And I consider it my problem to figure out
how to make it work in a 2.5 environment.

Cheers,
-g

On Thu, Sep 8, 2011 at 13:47, Brett Hoerner  wrote:
> I own the pull request that adds (very basic) pooling logic to the
> Riak client. I left Transports alone (each is a single connection) and
> decided to have pools be another class you pick just like transport.
> This allows you to instantly use any pooling logic (never remove down
> servers, delete down servers, round robin, whatever) with *any*
> transport class: Http, Http keep-alive, PBC, CachedPBC, etc.
>
> I've added pooling and timeouts and use both in production because I'm
> honestly not sure how you use Riak in a highly available way without
> them... so we need to make sure they both work well/easily under this
> new scheme.
>
> Do your changes make anything new possible or are they just about cleanup?
>
>
>
> On Thu, Sep 8, 2011 at 5:17 AM, Greg Stein  wrote:
>> Yup, agreed. I can certainly see use-cases where a client may have
>> particular knowledge about data/server locality, and wants to open
>> connections to *just* those servers. The transports that I am
>> advocating open connections to servers described by one layer higher.
>> That layer could say "all Riak servers in the cluster", or it could
>> say "these three servers holding copies of  data".
>>
>> An auto-discovery layer (to add and remove) in-between the client and
>> the transport would be handy. I can envision that, and my proposed
>> code makes it possible, but (at my current stage) I don't have a use
>> for it. I'd hope that others can build on my connection pooling work
>> to see that through.
>>
>> Cheers,
>> -g
>>
>> On Thu, Sep 8, 2011 at 07:49, Phil Stanhope  wrote:
>>> I like the idea of RiakTransport as you describe it. It opens the door to
>>> other potential underlying transports and isolating a client from knowledge
>>> of those (websockets, tcp, zeromq messaging come to mind). I'm not
>>> suggesting that RIAK will ever support these transports by this comment,
>>> however.
>>> I also agree with the need to have RiakRingAwareTransport as an additional
>>> layer that *might* be used by a client. There may be valid reasons why a
>>> client might want to force particular traffic onto a subset of the ring
>>> (e.g. M/R config, Search Config, forcing read/write traffic onto different
>>> nodes, etc). Again, I'm not suggesting that using a subset of the ring for
>>> particular operations is the best practice. But it may be necessary to do so
>>> in order to validate and do certain types of testing to prove or disprove
>>> certain access patterns.
>>> -phil
>>>
>>> On Thu, Sep 8, 2011 at 7:31 AM, Greg Stein  wrote:
>>>>
>>>> Hey, all...
>>>>
>>>> After a couple comments on my recent work, and some archaeology on the
>>>> Python/Riak work over the past month... I've realized that I might
>>>> have a very different view of RiakTransport compared to what I'm
>>>> seeing in the current work. I figured it best to bring that to the
>>>> forefront and discuss:
>>>>
>>>> In my view, RiakTransport is used by RiakClient (and others) to "talk
>>>> to the Riak server".
>>>>
>>>> Some of the current work, and some proposed pull requests, seem to

Re: Python transports

2011-09-10 Thread Greg Stein
Hi all,

I've been putting more thought into this problem, particularly in
contrast to the "client manages N transport [connections]." I believe
that the latter is not very workable given the variance in underlying
transports. Not enough information is available to the client to
manage the connections properly without putting transport-specific
information into the client (eg. the difference between EPIPE and
ECONNREFUSED). I think it would be wrong to put connection-related
code into the client.

Here are the three layers that I see, and the direction my branch takes:

1. client: responsible for a high-level API for applications. It maps
this API into the underlying transport primitives (as defined by
riak.transports.transport.RiakTransport).

2. transport: maps the primitives into the appropriately formatted
wire request(s), and handles the response(s).

3. connection manager (CM): handles multiple connections to multiple
hosts for use by the transport.


The CM creates connection objects that understand the protocol (e.g.
HTTPConnection), which is also an object that the transport
understands how to use. The connection objects can signal errors to
the CM for removal when (say) the server goes down or is otherwise
unavailable.

The client does need to be aware of host/port pairs, and pass those
into the transport for provision to the CM. (or possibly the client
creates the appropriate CM, and passes that to the transport). The
different types of CMs (connection type, connection policies/params
etc) imply that the client may be the one to create this, with the
right params. Otherwise, the transport would get a list of host/port
pairs and CM options, and the transport would create the CM with
connection objects appropriate to the transport.

To follow up with my original concern, and to show a concrete example:

In connection.py, we would create a subclass of HTTPConnection that
overrides the .connect() method. If the superclass raises
ECONNREFUSED, then the subclass would remove the host from the CM.

The subclass does not have to manage EPIPE since it knows that
HTTPConnection can already manage that itself (except for certain
types of variant-sized requests, such as needs to be done for Luwak
requests).
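
The HTTPConnection subclass described above could be sketched roughly as
follows. This is only an illustration, not the code on the branch: it uses
Python 3's http.client (the renamed httplib), and the ConnectionManager
class, its remove_host() method, and the ManagedHTTPConnection name are
all hypothetical.

```python
import errno
import http.client  # Python 3 name for the httplib module discussed here


class ConnectionManager(object):
    """Hypothetical CM: tracks the (host, port) pairs believed usable."""

    def __init__(self, hostports):
        self.hostports = list(hostports)

    def remove_host(self, host, port):
        """Drop a pair from the available set; ignore pairs already gone."""
        try:
            self.hostports.remove((host, port))
        except ValueError:
            pass


class ManagedHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that reports refused connections to its CM."""

    def __init__(self, cm, host, port, **kwargs):
        http.client.HTTPConnection.__init__(self, host, port, **kwargs)
        self._cm = cm

    def connect(self):
        try:
            http.client.HTTPConnection.connect(self)
        except OSError as e:
            if e.errno == errno.ECONNREFUSED:
                # Server is down or unreachable: take it out of rotation.
                self._cm.remove_host(self.host, self.port)
            # Re-raise so the transport can retry on another host.
            raise
```

The error is re-raised on purpose: the subclass only updates the CM's
bookkeeping, and the transport still decides what to do with the failed
request.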

There is a Socket class in connection.py for managing bare sockets for
the protobuf connections. That needs to create a .send() method that
manages EPIPE in some way. It would also have logic for ECONNREFUSED
similar to our HTTPConnection subclass: remove the host from the
available set.

Long-running client applications need to monitor the state of the
ring, and propagate join/leave changes into the available host/port
pairs in the CM. If one server returns ECONNREFUSED and is removed
from the available set, but it is determined that the failure is *transient*, then
the client would need to recognize that and put it back into the set.
I do not have an answer for how the system can know the problem was
transient, or how it recognizes the server is back. Possibly, the host
moves to an "offline" list, and the CM periodically pings it to see if
it is alive (again). If the client removes it (due to a detected ring
change), then it removes it from the offline list. Possibly after time
period T, it is removed from the offline list. I believe these are all
workable details.
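
One possible shape for the "offline" list idea, as a minimal sketch. The
HostSet class, its method names, the max_offline parameter, and the ping
callable are all assumptions made for illustration; none of them come
from the client code.

```python
import time


class HostSet(object):
    """Hypothetical CM bookkeeping: an available set plus an offline list."""

    def __init__(self, hostports, max_offline=300.0, clock=time.time):
        self.available = list(hostports)
        self.offline = {}            # (host, port) -> time it went offline
        self.max_offline = max_offline
        self._clock = clock

    def mark_offline(self, hostport):
        """Called when a connection reports ECONNREFUSED for this pair."""
        if hostport in self.available:
            self.available.remove(hostport)
            self.offline[hostport] = self._clock()

    def forget(self, hostport):
        """Called when a ring change says the node has left for good."""
        if hostport in self.available:
            self.available.remove(hostport)
        self.offline.pop(hostport, None)

    def recheck(self, ping):
        """Ping offline hosts; restore live ones, expire stale ones."""
        now = self._clock()
        for hostport, since in list(self.offline.items()):
            if ping(hostport):
                del self.offline[hostport]
                self.available.append(hostport)
            elif now - since > self.max_offline:
                # Offline for longer than time period T: give up on it.
                del self.offline[hostport]
```

A background thread in the CM would call recheck() periodically, passing
whatever liveness check suits the transport.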

Cheers,
-g



Question about Content-Type

2011-09-10 Thread Greg Stein
Hey all,

I just ran into a questionable situation. On the following doc page:
  http://wiki.basho.com/HTTP-Store-Object.html

It says:

"
Important headers:
Content-Type must be set for the stored object. Set what you expect to
receive back when next requesting it.
"

So I ran the following test via telnet against a 0.14.0 server. Note
that the Content-Length is 5 because the body includes the invisible
line ending: "def\r\n".


$ telnet 127.0.0.1 8098
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
POST /riak/test?returnbody=true HTTP/1.0
Content-Length: 5
Content-Type: application/octet-stream

def
HTTP/1.0 201 Created
X-Riak-Vclock: a85hYGBgzGDKBVIsbI0n/DOYEhnzWBmKDz84zpcFAA==
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.7.3 (participate in the frantic)
Location: /riak/test/F9uE8Heo6v0wtXY2rMKOktQbPGH
Link: </riak/test>; rel="up"
Date: Sun, 11 Sep 2011 05:30:27 GMT
Content-Type: application/json
Content-Length: 5

def
Connection closed by foreign host.


My question is why the response says application/json when I gave the
server application/octet-stream. Did I miss some RTFM documentation?

Thanks,
-g



[RFC] Python client: move to properties

2011-09-14 Thread Greg Stein
Hi all,

There are some non-Pythonic patterns in riak-python-client that should
be pretty easy to switch. Things like client.get_r() and
client.set_r() are kinda silly. Python long ago moved past the
getter/setter paradigm, with the notion of directly exposing instance
attributes. As Guido has said numerous times, "we're all adults here",
so we don't need to wrap simple attribute access/change in methods
which do exactly that anyways. Just allow apps to directly change it,
since the developers doing that *are* adults and know what they're
doing.

For something like bucket.get_r()... first: it should not have a
damned argument that is returned if you pass in a value. That is
nonsense. Don't call the function if you're going to pass an argument!
The remaining logic looks at a local value, or defers to the client
default value. We can make "bucket.r" a property, and create a getter
that does the right thing. Applications can then use "bucket.r" and
they will get the right logic, rather than the messier
"bucket.get_r()".

There are similar changes throughout (eg. get_transport).
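
A minimal sketch of what a "bucket.r" property could look like. The
Client and Bucket classes here are simplified stand-ins rather than the
real riak-python-client classes, and the default value of 2 is only an
assumption.

```python
class Client(object):
    """Hypothetical stand-in for the client, holding the default R value."""

    def __init__(self, r=2):
        # Plain attribute: applications read and assign client.r directly.
        self.r = r


class Bucket(object):
    """Hypothetical stand-in for a bucket with an optional local R value."""

    def __init__(self, client, name):
        self.client = client
        self.name = name
        self._r = None  # None means "defer to the client default"

    @property
    def r(self):
        """Local R value if one was set, else the client's default."""
        if self._r is not None:
            return self._r
        return self.client.r

    @r.setter
    def r(self, value):
        self._r = value
```

With this, "bucket.r" reads the effective value and "bucket.r = 1" sets a
bucket-local override, with no getter or setter methods in the public API.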

But... this goes back to a question that I've raised earlier: what
kinds of API guarantees are being made on the interface? Can we simply
get rid of the .get_r() method? If we choose to do so, is there a
deprecation policy? With a policy in place, then it would be easier
for me (and others) to provide patches, knowing that they conform to
compatibility requirements. As it stands, I'd be happy to code a
patch, but am wary that my effort would be rejected per some
(unstated) policy.

I don't know how much of a compatibility policy lies with Basho or the
community. Dunno how to help there.

And back to the start: can we get the code simplified, and move
towards properties?

Cheers,
-g



Re: [RFC] Python client: move to properties

2011-09-15 Thread Greg Stein
On Wed, Sep 14, 2011 at 10:40, Russell Brown  wrote:
>
> On 14 Sep 2011, at 12:37, Mathias Meyer wrote:
>
>> The short answer: yes, we can and we should. I had that on my radar for a 
>> while too, because it felt un-Pythonic.
>>
>> As for deprecation, there's no specific rule for the Python client yet. I'm 
>> happy to accept a patch for it for e.g. a version of the client 1.4.0 with 
>> an announcement that support for these getters/setters will be removed in 
>> said version. I'm not a fan of removing things in patch versions myself, but 
>> that's certainly up for discussion.
>
> I'd really like to see the python client be more pythonic. That said, I'd 
> really like current production users to keep running their code, and take 
> advantage of new features without recoding their use of existing features. 
> For the Java client I've deprecated a lot of the API, but it will stay on for 
> a couple of releases. No-one likes living with warts on software but 
> deprecating is a good way to show that the un-idiomatic stuff is on the way 
> out, without pulling the rug from under users.
>
>> The Python client is quite old and has come a long way, so I'm all for 
>> getting rid of the cruft that came with it.
>>
>
> Me too, but after it has been deprecated for at least one release? That isn't 
> an official policy, it is just the one I have set for the Java client. 
> Deprecate for one release, remove in the next. Does that sound reasonable for 
> the Python client, too?

For at least three developers :-P ... there seems to be rough consensus here:

* Documented APIs are first marked as "deprecated" in
documentation/docstrings, and will produce appropriate warnings from
the code (but remain available and with their old semantics)
* One release is made with the deprecated APIs; the next release afterwards
may remove all deprecated APIs

I've switched to this policy for my 'newhttp' branch. You'll note that
I have added an 'api' classvar to the transports. The client code
alters its invocations based on that API.

Also note that the transport APIs are *not* documented, so I have
taken the position of "okay to change". Third parties who *write*
their own transport class and pass to the "documented" transport_class
argument of RiakClient/RiakSearch may need to change their code. If
their transport inherits from RiakHttpTransport or RiakPbcTransport,
then it will "suddenly" jump from api==1 to api==2. If they wrote
their transport from scratch (and/or subclassing RiakTransport), then
it will remain as api==1 and function as before. I don't have a better
solution (without applying further thought), but am going to say "good
enough" with the position that RiakTransport and its subclasses are
not documented and (thus) subject to change.
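
To illustrate how an 'api' class variable propagates through inheritance,
here is a small sketch. The class bodies are empty stand-ins, the
ThirdParty* names and transport_api() helper are hypothetical, and the
getattr() default of 1 is an assumption about how a client might treat a
transport that declares nothing.

```python
class RiakTransport(object):
    """Stand-in for the base transport; the real class has many methods."""
    api = 1


class RiakHttpTransport(RiakTransport):
    """Stand-in for a builtin transport that has moved to the new API."""
    api = 2


class ThirdPartySubclass(RiakHttpTransport):
    """Inherits from a builtin transport, so it silently reports api == 2."""


class ThirdPartyFromBase(RiakTransport):
    """Subclasses only the base class, so it keeps reporting api == 1."""


class ThirdPartyFromScratch(object):
    """Written with no Riak base class at all; declares no 'api' attribute."""


def transport_api(transport_class):
    """Return the API version a client would use for this transport."""
    # Assumption: a transport that declares nothing is treated as api == 1.
    return getattr(transport_class, "api", 1)
```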

Applications that simply pass a builtin transport will require no change.

For all those getters/setters, we can mark them as deprecated (using
[1]) and then simply add the new properties.
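
As a rough sketch of the deprecate-then-remove policy, using only the
standard warnings module. The deprecated() decorator and the Client class
below are illustrative and are not the helper from the commit in [1].

```python
import functools
import warnings


def deprecated(message):
    """Mark a method as deprecated; it keeps working but warns per call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller's line.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


class Client(object):
    """Hypothetical client exposing the old accessors and the attribute."""

    def __init__(self, r=2):
        self.r = r

    @deprecated("get_r() is deprecated; read the 'r' attribute instead")
    def get_r(self):
        return self.r

    @deprecated("set_r() is deprecated; assign to the 'r' attribute instead")
    def set_r(self, value):
        self.r = value
```

Old code keeps its semantics for one release and sees a
DeprecationWarning; new code uses "client.r" directly.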

Cheers,
-g

[1] 
https://github.com/gstein/riak-python-client/commit/85e9d5460787d2ad6b3e07b106f8cd71e05d



Re: Anyone going to Surge?

2011-09-25 Thread Greg Stein
I'll be there Wednesday/Thursday. Not attending the conference itself,
but "around" (hanging in the hallway or the bar).

On Sun, Sep 25, 2011 at 20:00, Mark Phillips  wrote:
> I mentioned this in Friday's Recap but I figured it was worth asking
> again: anyone going to Surge [1] next week? I'll be there along with
> Sean Cribbs, Ian Plosker, Ryan Zezeski, and a few other members of the
> Basho team. If you're attending and haven't already notified me, let
> me know (or I'll have trouble buying you a beverage or two).
>
> Mark
>
> 1 - http://omniti.com/surge/2011
>



Re: Running the python client's test suite

2011-10-03 Thread Greg Stein
Your search/luwak tests are failing, presumably because those options
are not enabled in your Riak installation. You can disable them in the
test suite by doing:

$ SKIP_SEARCH=1 SKIP_LUWAK=1 python setup.py test

You also seem to be running into a problem with leftover keys in one
of the test buckets. That problem is fixed by pull request #67, which
has not been merged :-( ... if you grab that fix, then a couple of
your errors will go away.

I see some additional failures based on extra siblings... I don't know
what is happening there.

Cheers,
-g

On Mon, Oct 3, 2011 at 13:22, Honza Král  wrote:
> Hi everybody,
>
> I cannot get the test suite in the python client to run, I have tried
> on Arch Linux on my notebook and then on an Ubuntu Natty system on EC2
> with riak 1.0.0:
>
> wget http://downloads.basho.com/riak/riak-1.0.0/riak_1.0.0-1_amd64.deb
> sudo dpkg -i riak_1.0.0-1_amd64.deb
> sudo /etc/init.d/riak start
>
> sudo apt-get install python-protobuf
> git clone git://github.com/basho/riak-python-client.git
>
> virtualenv riak
> . riak/bin/activate
> cd riak-python-client
> python setup.py install
> python setup.py test
>
> and I get:
> FAILED (failures=6, errors=19)
>
> The full test output can be found at:
> http://www.honzakral.com/riak_python_test.out
>
> If anybody can point out what I did wrong or how to make the tests
> work I would be most grateful.
>
> Thanks!
>
> Honza Král
> E-Mail: honza.k...@gmail.com
>



Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread Greg Stein
I'm with Kyle on this one. Even better, my 'newhttp' branch on Github
enables this kind of multiple-connection and automatic fail-over.

That branch does have a basic sketch for automatic addition/removal of
Riak nodes as you manipulate your cluster. I'll need it one day, but
not "now", so I haven't finished it yet (the monitor.py background
thread).

Regarding security: it is the same for option A and B and C (you're
just shifting stuff around, but it is pretty much all the same). Put
your webservers in one security group, and the Riak nodes in another.
Open the Riak ports *only* to the webserver security group and to each
other.

Avoiding two services on one machine (e.g. web + riak) is also much
easier to manage/maintain. Just have web machines and riak machines.

Cheers,
-g

On Tue, Oct 4, 2011 at 17:09, Aphyr  wrote:
> Option C: Deploy your web servers with a list of hosts to connect to. Have
> the clients fail over when a riak node goes down. Lower latency without
> sacrificing availability. If you're using protobufs, this may not be as big
> of an issue.
>
> --Kyle
>
> On 10/04/2011 02:04 PM, O'Brien-Strain, Eamonn wrote:
>>
>> I am contemplating two different architectures for deploying Riak nodes
>> and web servers.
>>
>> Option A:  Riak nodes are in their own cluster of dedicated machines
>> behind a load balancer.  Web servers talk to the Riak nodes via the load
>> balancer. (See diagram http://eamonn.org/i/riak-arch-A.png )
>>
>> Option B: Each web server machine also has a Riak node, and there are also
>> some Riak-only machines.  Each web server only talks to its own localhost
>> Riak node. (See diagram http://eamonn.org/i/riak-arch-B.png )
>>
>>
>> All machines will be deployed as elastic cloud instances.  I will want to
>> spin up and spin down instances, particularly the web servers, as demand
>> varies.  Both load balancers are non-sticky.  Web servers are currently
>> talking to Riak via HTTP (though might change that to protocol buffers in
>> the future).  Currently Riak is configured with the default options.
>>
>> Here is my thinking of the comparative advantages:
>>
>> Option A:
>>
>>  - Better for security, because can lock down the Riak load balancer to
>> only open a single port and only for connections from the web servers.
>>  - Less churn for Riak of nodes entering and leaving the Riak cluster (as
>> web servers spin up and down)
>>  - More flexibility in scaling storage and web tiers independently of each
>> other
>>
>> Option B:
>>
>>  - Faster localhost connection from web server to Riak
>>
>> I think availability is similar for the two options.
>>
>> The web server response time is the primary metric I want to optimize.
>>  Most web server requests will cause several requests to Riak.
>>
>> What other factors should I take into account?  What measurements could I
>> make to help me decide between the architectures?  Are there other
>> architectures I should consider? Should I add memcached? Does anyone have
>> any experiences they could share in deploying such systems?
>>
>> Thanks.
>> __
>> Eamonn
>>



Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread Greg Stein
On Oct 4, 2011 7:04 PM, "Mike Oxford"  wrote:
>
> On Tue, Oct 4, 2011 at 3:59 PM, Greg Stein  wrote:
> > Regarding security: it is the same for option A and B and C (you're
> > just shifting stuff around, but it is pretty much all the same). Put
> > your webservers in one security group, and the Riak nodes in another.
> > Open the Riak ports *only* to the webserver security group and to each
> > other.
>
> Not quite the same.  If you get rooted on a webhead you don't want your
> data there (esp with an erl shell.)

Ah. Yeah. Quite true.

> > Avoiding two services on one machine (e.g. web + riak) is also much
> > easier to manage/maintain. Just have web machines and riak machines.
>
> I disagree; it's more work to maintain two machines correctly.  However
> the extra work is worth it for security/scalability.

Note that his original description had two machine types: web+riak, and
riak-only. My point was about that two-service box being a pain. Given that
you have two types, then break up the boxes into the two -only formats and
increase your security.

Cheers,
-g


Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread Greg Stein
On Oct 4, 2011 7:01 PM, "Mike Oxford"  wrote:
>
> You'll want to run protobufs if you're looking to optimize your
> response time; HTTP sockets (even to localhost) will require much more
> overhead and time.

Hmm? The protocol seems moot, compared to inter-node comms when r > 1.
Protocol parsing just doesn't seem like much of a factor. On my laptop, I
was seeing a 3ms response time against one node. I can't imagine that
parsing was more than a few percent, no matter the protocol.

(and no, I have no specific numbers to confirm/deny my thought experiment
here)

> Even better would be unix sockets if they're available, and you can
> bypass the whole TCP stack.

What? Is that even an option for Riak? I haven't seen anything about that.

>...

Cheers,
-g


Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread Greg Stein
I don't see that multiplexing or TCP setup is specific to HTTP.

The only difference between protobuf and HTTP is what goes on the wire. Not
how the wire is managed.

(and with that said, the Python client managed the wire in the most horrible
ways imaginable for the HTTP client; I've since fixed that on my branch)

On Oct 4, 2011 11:37 PM, "Aphyr"  wrote:
> Internode times in our datacenter at SL are indistinguishable from
> loopback; TCP/IP processing dominates. HTTP, on the other hand, involves
> either in-depth connection management/multiplexing, or TCP/IP
> setup/teardown latency at either end of a request. In read-write heavy
> apps, protobufs outperforms HTTP in throughput by 2x or more, against
> objects of 500-4000 bytes. That's with the ruby client; ymmv.
>
> --Kyle
>
> On 10/04/2011 07:18 PM, Greg Stein wrote:
>>
>> On Oct 4, 2011 7:01 PM, "Mike Oxford" <moxf...@gmail.com> wrote:
>> >
>> > You'll want to run protobufs if you're looking to optimize your
>> > response time; HTTP sockets (even to localhost) will require much more
>> > overhead and time.
>>
>> Hmm? The protocol seems moot, compared to inter-node comms when r > 1.
>> Protocol parsing just doesn't seem like much of a factor. On my laptop,
>> I was seeing a 3ms response time against one node. I can't imagine that
>> parsing was more than a few percent, no matter the protocol.
>>
>> (and no, I have no specific numbers to confirm/deny my thought
>> experiment here)
>>
>> > Even better would be unix sockets if they're available, and you can
>> > bypass the whole TCP stack.
>>
>> What? Is that even an option for Riak? I haven't seen anything about
>> that.
>>
>> >...
>>
>> Cheers,
>> -g
>>
>>
>>


Hey, Python client users

2011-10-15 Thread Greg Stein
Hey all,

The Basho folks have been slow to integrate changes, given their busy
schedule with the 1.0 release. I've had a couple branches hanging out
for a while to deal with HTTP problems and to deal with Issue #53.
They've been separate for better review/merging by Basho, but it
finally created too many problems for my own work to keep them
separate. I've just now merged them into a single branch so that I can
get my own work done.

This doesn't have timeout capabilities (Brett has this on a branch,
but without future direction from Basho and how that timeout work
interacts with future http work, it is unclear where to go with
developing timeouts). But if you can deal without timeouts right now,
then I would state this branch is the best option for Python access to
Riak. Since this is also divorced from my pull requests for issue 53
and fixing http, I might go ahead and add timeouts soon. Even if the
timeout work isn't Right, it would still be useful.

Anyhow, enough description. If you're using Python, then I'd highly recommend:
  https://github.com/gstein/riak-python-client/tree/proper

I hope that helps, and let me know if you run into any problems with it.

Cheers,
-g



Re: Hey, Python client users

2011-10-18 Thread Greg Stein
On Mon, Oct 17, 2011 at 15:13, Russell Brown  wrote:
>
> On 16 Oct 2011, at 06:18, Greg Stein wrote:
>
>> Hey all,
>>
>> The Basho folks have been slow to integrate changes, given their busy
>> schedule with the 1.0 release.
>
> This is true. Sorry. But we are now putting some time into the client 
> libraries in general and the python client in particular.

No need to apologize! I'd much rather you work on the 1.0 release than
the libraries! I can fix the libraries to my needs; I have *no* idea
how to fix Riak itself. Go wild :-)

>> I've had a couple branches hanging out
>> for a while to deal with HTTP problems and to deal with Issue #53.
>> They've been separate for better review/merging by Basho, but it
>> finally created too many problems for my own work to keep them
>> separate. I've just now merged them into a single branch so that I can
>> get my own work done.
>
> Yeah, I started merging your P/Rs in a while back, but then the last 2 
> conflict with a couple of P/Rs from Brett Hoerner. I guess at that point we 
> stalled as there is some decision to be made about the best way to handle 
> pooling.

Right. Timeouts are needed; Brett is spot-on with that. They're
definitely needed for a production system. The open question is how to
work them into the client, and that depends upon the development
direction.

> I'm reading over both Brett and your changes today.
>
> There is quite some work to merge the commits you've been doing into the 
> official repo, but I'd like to get that done rather than have a competing 
> fork.

To be clear: I'm not attempting to create a long-term competing fork.
I've been using my 'newhttp' branch for proper connections to Riak,
but then I smashed into the lack of .store() returning an identifier.
In the past, I kinda switched branches based on whether I needed good
http work, or .store to work... but I got tired of that. So I created
a new branch and merged it all. And I like to share my work, I like to
help people, and I like feedback. If my fully-merged branch can help
people? Great. Win.

But no... I expect that when you guys (Basho) have cycles, that we'll
figure out the right approach for connection management and get that
committed, and my branches will disappear.

>...
>> Anyhow, enough description. If you're using Python, then I'd highly 
>> recommend:
>>  https://github.com/gstein/riak-python-client/tree/proper
>>
>> I hope that helps, and let me know if you run into any problems with it.
>
> Would you recommend taking this fork and merging it with the 
> basho/riak-python-client?

I'm building a business based on that branch. So yeah: I have complete
faith in it. I fully believe it is the correct direction to go for the
Python client.

But: I also sent the P/R to create a discussion on the approach (since
Brett has a different approach, I felt discussion was warranted). I
think it is the right approach, but I did not want to invest time in
full documentation if it was to be rejected. So a merge would be
great, but further documentation would be needed before the next
client release. There will be some test updates since my code
simplifies the set of transports. There is some deprecation of the old
APIs that needs to happen, and I set up the prep-work for that, but
didn't bother with all of that since I was waiting on selection.

The short answer: my branch puts connection management into the
transport, and Brett's patches put that into the client. A decision
needs to be made, and that will determine future work. I've previously
emailed with an explanation for why I believe the transport class
should handle this (via a connection manager), and why (IMO) the
client should not be worrying about it. Making this decision removes
blocks for future changes (eg. timeouts).

Anyway... I'd merge 'proper' into the upstream master branch (to fix
http and issue 53), and then I'd be on the hook for documentation,
test changes, backwards compat work, etc. I believe it is totally the
right direction.

Cheers,
-g



Re: Hey, Python client users

2011-10-18 Thread Greg Stein
On Tue, Oct 18, 2011 at 10:53, Soren Hansen  wrote:
> 2011/10/16 Greg Stein :
>> Anyhow, enough description. If you're using Python, then I'd highly 
>> recommend:
>>  https://github.com/gstein/riak-python-client/tree/proper
>>
>> I hope that helps, and let me know if you run into any problems with it.
>
> Thanks for this. Your changes look great.
>
> I take it you're not using Python 2.7?

I'm using Python 2.7 in production, and 2.5 on my dev machine (Mac OS,
Leopard). I've recently installed 2.7 on my dev box for its builtin
'ssl' module, and so more of my dev work is moving to 2.7. I need an
HTTPSConnection subclass to be used by the Riak client, which should
be possible by subclassing some transport stuff; I'll be validating
that work shortly.

> "The unicode problem"[1][2]
> makes it rather unusable for me with Python 2.7. :( (Not due to your
> changes, but the Riak Python client in general)
>
> [1]: https://issues.basho.com/show_bug.cgi?id=649
> [2]: https://github.com/basho/riak-python-client/issues/32

What is it about 2.7 that makes the problem worse for you? Are you
using Unicode keys or bucket names?

Cheers,
-g



Re: Hey, Python client users

2011-10-20 Thread Greg Stein
On Oct 18, 2011 11:12 AM, "Greg Stein"  wrote:
>...
> The short answer: my branch puts connection management into the
> transport, and Brett's patches put that into the client. A decision
> needs to be made, and that will determine future work. I've previously
> emailed with an explanation for why I believe the transport class
> should handle this (via a connection manager), and why (IMO) the
> client should not be worrying about it. Making this decision removes
> blocks for future changes (eg. timeouts).

Thoughts, Russell?

Cheers,
-g


Re: Hey, Python client users

2011-10-20 Thread Greg Stein
On Oct 20, 2011 7:44 AM, "Russell Brown"  wrote:
>...
> I had a chat with Reid Draper about it (he is much more pythonic than me),
> he concurs. I just need to do a bit more reading through the code, and
> unless anyone here objects (anyone? class? anyone?), I'll start work at
> merging your fork into the Basho repo so we can get back on track.

I hope it is no work; I think that it is close to basho/master, and can
merge clean. I can't check for a while, as I'm on planes for the next 22
hours :-(

After you merge, I'll start three branches: one for doc/test fixes, another
for deprecating bogus transports, and one to work with Brett on timeouts.
I'd like his input on that part, as we both want timeout handling for our
production systems.

Thanks!
-g


Re: key garbage collection

2011-11-03 Thread Greg Stein
On Thu, Nov 3, 2011 at 02:39, Justin Karneges  wrote:
>...
> Say you have an operation that requires creating two keys, A and B, and you
> succeed in creating A but fail in creating B.  How do you delete A after the
> fact?  I have two ideas:
>
> 1) Run periodic MapReduce operations that do full db scans looking for garbage
> keys and deleting them (this seems really horrible, but I'll admit I'm new to
> distributed DBs and MapReduce).

I believe that you will *always* need to do this. Without
transactions, you can always end up with cruft. Best you can do is
minimize how often you need to run the scavenge process.

> 2) Maintain cleanup logs that explicitly identify possibly offending keys, for
> optimized cleanup processing.

These logs need to be stored *somewhere*, but that storage could also
fail. That is why I believe you'll need a periodic full scan for
garbage.

(and note this applies whether "storage" is memory, disk, Riak, or
whatever else)

>...
> So far so good.  Now for handling cleanup.  Periodically, we scan the
> "cleanup" bucket for keys to process.  Since keys only exist in this bucket at
> the moment of a write (they are deleted immediately afterwards), in practice
> there should hardly be any keys in here at any single point in time.  We're
> talking single digits here.  Much better than a full db scan to find garbage
> keys.  Also, the keys to process can be narrowed down by time (e.g. > 5
> minutes ago) based on the key name.

This will minimize your scans, but not eliminate them. You may not be
able to write to the "cleanup" bucket because you've lost all network
connectivity to the Riak cluster. Not a bad assumption, given that you
could not write out B (what makes you think you could write to
"cleanup"?).

Personally, rather than attempting to write something else to a
failing Riak cluster, I'd suggest keeping these keys in memory along
with a background thread that periodically attempts to clean them up.
You're gonna lose the keys if the client dies, but hey... as I said:
best you can do is to minimize the full scans.
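
A minimal sketch of that in-memory approach, assuming a delete callable
supplied by the application that raises when the cluster cannot be
reached. The PendingCleanup class and its method names are hypothetical.

```python
import threading


class PendingCleanup(object):
    """Hypothetical in-memory set of keys that still need deleting."""

    def __init__(self, delete):
        self._delete = delete          # callable(key); raises on failure
        self._keys = set()
        self._lock = threading.Lock()

    def add(self, key):
        """Remember a key written as part of an operation that failed."""
        with self._lock:
            self._keys.add(key)

    def pending(self):
        with self._lock:
            return sorted(self._keys)

    def run_once(self):
        """Try each pending key once; keep the ones that still fail."""
        with self._lock:
            keys = list(self._keys)
        for key in keys:
            try:
                self._delete(key)
            except Exception:
                continue               # still unreachable; retry later
            with self._lock:
                self._keys.discard(key)

    def start(self, interval=60.0):
        """Run run_once() every `interval` seconds on a daemon thread."""
        stop = threading.Event()

        def loop():
            while not stop.wait(interval):
                self.run_once()

        thread = threading.Thread(target=loop)
        thread.daemon = True
        thread.start()
        return stop                    # call stop.set() to end the loop
```

Anything still in the set when the process dies is lost, which is why the
periodic full scan remains the backstop.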

>...

Cheers,
-g



Re: Problem installing Riak Python client

2011-11-10 Thread Greg Stein
On Thu, Nov 10, 2011 at 11:51, Nate Lawson  wrote:
>...
> BTW, are there any plans for the Riak python client to use the protobuf C 
> library directly via ctypes? The pure python implementation of protobuf seems 
> a little slow.

Not that I've seen. I plan to use the HTTP interface because I can
encrypt it, and I can avoid MitM attacks. That isn't possible with the
protobuf interface. I think you'll need to find somebody that deploys
heavy use of the protobuf interface to be interested enough to improve
its speed.

For the basic problem spawning this thread: I've issued a pull request
for my "deprecate" branch which disables the protobuf requirement.
You'll be able to install the client without needing protobuf to be
installed. Of course, you'll need protobuf if you want to use that
transport... but if you stick to HTTP, then you'll be fine.

Note that I've sped up the Python HTTP transport. It is definitely
faster (via persistent connections), but I haven't done a comparison
against protobufs yet. Basho has a benchmarking tool that I might try.

Cheers,
-g



Re: Problem installing Riak Python client

2011-11-10 Thread Greg Stein
On Nov 10, 2011 4:26 PM, "Nate Lawson"  wrote:
>
>
> On Nov 10, 2011, at 4:04 PM, Greg Stein wrote:
>
> > On Thu, Nov 10, 2011 at 11:51, Nate Lawson  wrote:
> >> ...
> >> BTW, are there any plans for the Riak python client to use the
> >> protobuf C library directly via ctypes? The pure python implementation of
> >> protobuf seems a little slow.
> >
> > Not that I've seen. I plan to use the HTTP interface because I can
> > encrypt it, and I can avoid MitM attacks. That isn't possible with the
> > protobuf interface. I think you'll need to find somebody that deploys
> > heavy use of the protobuf interface to be interested enough to improve
> > its speed.
>
> There should be an SSL option for Riak with protobufs, perhaps on an
> alternate port. No reason to go to http just to get SSL.

Certainly, but I haven't heard anyone thinking of that either.

Unless/until somebody codes that up, then I'm sticking to HTTP(S).

> > Note that I've sped up the Python HTTP transport. It is definitely
> > faster (via persistent connections), but I haven't done a comparison
> > against protobufs yet. Basho has a benchmarking tool that I might try.
>
> I wonder if gzip encoding would also help for larger keys/values?

It absolutely would. Compressing and decompressing generally takes less
time than transferring the extra bytes over the wire.

I dunno Erlang, but I could certainly fix the Python client to deal with
potential compression.
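
As a rough illustration of what client-side compression could look like (a sketch only): the encode_body/decode_body helpers and the min_size threshold are hypothetical, and whether the Riak server accepts a gzip Content-Encoding is not established in this thread.

```python
import gzip


def encode_body(value, min_size=512):
    """Return (body, headers); gzip the value only when it is large enough."""
    headers = {}
    if len(value) >= min_size:
        value = gzip.compress(value)
        headers["Content-Encoding"] = "gzip"
    return value, headers


def decode_body(body, headers):
    """Undo encode_body() based on the Content-Encoding header."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(body)
    return body


# A repetitive JSON-ish value, the kind that compresses well.
payload = b'{"user": "greg", "tags": ["riak", "python"]}' * 200
body, headers = encode_body(payload)
```

Small values are left alone because the gzip framing can make them larger, not smaller.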

Cheers,
-g


Re: Luwak PUT Content-Range

2011-11-29 Thread Greg Stein
On Nov 29, 2011 5:08 AM, "John Axel Eriksson"  wrote:
>
> Is it possible to incrementally add to a file in Luwak using PUT and the
> Content-Range header. I just assumed that it was but I can't seem to
> get the expected results, it just overwrites whatever the key contents
> were before. The reason I want to do this is because we have some pretty
> large files I don't want to load fully into memory before PUTing them.

Hmm? You can read from a file and write to the http socket. There is no
reason or need to load the entire contents into memory.

I don't know what client you're using, but I do know the Python client is
broken in this regard. It erroneously loads the full content into memory.
But there is nothing from the Riak server that demands such an approach.
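
A sketch of the read-from-file, write-to-socket approach, using Python 3's http.client, which accepts an iterable as the request body. The iter_chunks and put_stream names, the chunk size, and the need to pass the length up front are illustration choices, not anything taken from the Riak client.

```python
import http.client
import io


def iter_chunks(fileobj, chunk_size=64 * 1024):
    """Yield successive chunks so only one chunk is in memory at a time."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


def put_stream(host, port, path, fileobj, length):
    """Stream fileobj to the server without buffering the whole body."""
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.request("PUT", path, body=iter_chunks(fileobj),
                     headers={"Content-Length": str(length)})
        return conn.getresponse().status
    finally:
        conn.close()


# Local check of the chunking only (no server needed).
chunks = list(iter_chunks(io.BytesIO(b"x" * 150), chunk_size=64))
```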

Cheers,
-g


Re: Connection Pooling with the python-riak-client

2012-01-23 Thread Greg Stein
On Fri, Jan 20, 2012 at 15:53, Michael Clemmons  wrote:
>...
> I've decided to go back and cleanup that approach and reapply it to the
> current master branch.  To my surprise I found that the ConnectionManager
> supports multiple connections and has the tools to add and remove them.
> Looking at the simple case of the RiakHttpTransport layer and the
> http_request method it looks like it should grab a new connection(or an old
> one) and try it and if it fails move on to the next putting the host port
> pair at the bottom of the list.

Right. That is the intent.

I will note that a failure does not automatically remove a host/port
pair, but fails the particular request (after N retries). I'm not sure
that is entirely "the best strategy" (see below, ref: many
strategies), but the plumbing should be there for applications to
decide the proper behavior. It may be possible to decide a
best/default strategy so that (most) applications do not have to get
involved.
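
The behaviour described above can be sketched roughly as follows. This is not the ConnectionManager code; request_with_retries, AllHostsFailed, and the `send` callable are hypothetical names. A failed pair is moved to the bottom of the list rather than removed, and the request fails only after N attempts.

```python
from collections import deque


class AllHostsFailed(Exception):
    pass


def request_with_retries(hostports, send, retries=3):
    """Try up to `retries` times, rotating a failed pair to the end.

    `hostports` is a deque of (host, port) tuples; `send` is any callable
    that takes one pair and raises IOError on failure. Failed pairs are NOT
    removed, only moved to the bottom of the list.
    """
    last_error = None
    for _ in range(retries):
        hostport = hostports[0]
        try:
            return send(hostport)
        except IOError as exc:
            last_error = exc
            hostports.rotate(-1)  # failed pair goes to the bottom
    raise AllHostsFailed(last_error)


def fake_send(hostport):
    # Pretend that nothing is listening on port 9999.
    if hostport[1] == 9999:
        raise IOError("connection refused")
    return hostport


pairs = deque([("a", 9999), ("b", 8098)])
answered_by = request_with_retries(pairs, fake_send)
```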

> So it looked like all I needed to do was update the client code to accept a
> list of hostport pairs and it would just work, which sounded too easy to be
> true.  I tested it anyways and if I use one hostport that is a working riak
> node it connects and everything works.  If I include 2 nodes one working and
> one a random port it fails no matter the order.  So its not just trying to
> connect to the first and failing its connecting to them all and failing if
> any fail.

Well, yeah. You started the thing up, saying all the host/port pairs
were proper. It is telling you they are not :-)

One question: when does the failure happen? At instantiation time, or
later at request time? On the first request, or some later request?

(as I recall, it should lazy-open all host/port pairs, so the failure
should not happen until later... and only when the pair is attempted
to be used)
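
For what it's worth, lazy opening is how the standard library connection class behaves too. A small sketch with Python 3's http.client (the port number and the first_request_fails helper are assumptions for the example): building the connection object touches no socket, so a bad host/port pair can only fail once a request is attempted.

```python
import http.client

# Port 1 is assumed to have nothing listening on it.
conn = http.client.HTTPConnection("127.0.0.1", 1, timeout=2)

# Construction succeeded and no socket has been opened yet.
opened_at_setup = conn.sock is not None


def first_request_fails(conn):
    """Return True if the first real use of the connection fails."""
    try:
        conn.request("GET", "/ping")
        conn.getresponse()
        return False
    except (OSError, http.client.HTTPException):
        return True
    finally:
        conn.close()
```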

> Anyone have any idea of why its built this way and what other solutions

The overall intent is to connect to (at least) one known working node.
That node can then be queried for "all" other known working nodes,
which are then added into the ConnectionManager (CM). The (long-lived)
process can then continue to monitor the status of the ring and make
corresponding updates to the CM.

The code does not (yet) have a well-defined process for *removal* of
non-working nodes. That is a complex application-level decision.
Should it remove the host/port permanently? If it is just a network
glitch, or the particular host is having transient issues, then maybe
the pair should be kept around (but unused) and re-installed in a
minute or two when the host starts replying again. Maybe you just
remove the pair and wait for a general background monitoring thread to
note their existence and reinstall the pair.
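
One of those strategies, "keep the pair around but unused, then re-install it later", might look like the sketch below. HostPool and its methods are hypothetical, and this version re-installs blindly after a cool-down; a real one would probably probe the host first.

```python
import time


class HostPool:
    """Hypothetical pool that benches failed pairs instead of deleting them."""

    def __init__(self, hostports, cooldown=60.0, clock=time.monotonic):
        self.active = list(hostports)
        self.benched = {}          # (host, port) -> time it may return
        self.cooldown = cooldown
        self.clock = clock         # injectable so the logic is testable

    def mark_down(self, hostport):
        """Take a pair out of rotation for `cooldown` seconds."""
        if hostport in self.active:
            self.active.remove(hostport)
            self.benched[hostport] = self.clock() + self.cooldown

    def usable(self):
        """Re-install any pair whose cool-down has expired, then list them."""
        now = self.clock()
        for hostport, ready_at in list(self.benched.items()):
            if now >= ready_at:
                del self.benched[hostport]
                self.active.append(hostport)
        return list(self.active)
```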

For a single-threaded, short-lived application, the multiple host/port
pair capability is not very useful. That functionality is really
necessary for multiple threads and/or long-lived processes. In this
scenario, as existing connections get used up, the CM will spin up new
connections for threads to use to perform their operations. (the
underlying connections are persistent and reused until the server
decides to close them, where the client will attempt to reopen and
reuse the connection again)

What happens when you give the CM a list of *working* host/port pairs?
Does that still fail for you? It is true that when one goes down, then
some level of the stack should remove the pair, but "which level" just
hasn't been decided.

There is also a "monitor" concept that has been sketched out in the
code, but not implemented. See riak/transports/monitor.py. That should
be used in a long-running application to periodically hit the riak
servers, querying what nodes are in the ring, and adding new ones and
removing broken ones. I sketched it out, but neither I nor anybody
else has worked further on that logic.
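
The core of such a monitor pass is just a set difference. A sketch, with hypothetical names (reconcile, monitor_once) and with the ring query left to a caller-supplied function, since how the ring membership is fetched is not covered here:

```python
def reconcile(known, ring_members):
    """Return (to_add, to_remove) so that `known` matches the ring."""
    known = set(known)
    ring_members = set(ring_members)
    return sorted(ring_members - known), sorted(known - ring_members)


def monitor_once(known, fetch_ring_members):
    """One monitoring pass; `fetch_ring_members` is supplied by the caller."""
    to_add, to_remove = reconcile(known, fetch_ring_members())
    # Keep surviving pairs in their existing order, then append the new ones.
    return [hp for hp in known if hp not in to_remove] + to_add


known = [("a", 8098), ("b", 8098)]
updated = monitor_once(known, lambda: [("b", 8098), ("c", 8098)])
```

A long-running application would call monitor_once periodically from a background thread and push the result into the connection manager.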

> people have worked out on their own?  My intent is to do this so it merges
> cleanly with the current master, and doesn't introduce unnecessary change,
> to increase the likelyhood of a successful pull request.

For production code, some of this host/port pair management needs to
be done. It would be nice to have the monitor (thread) completed, but
that may not be appropriate for your application.

I think that Brett's work on timeouts is necessary for production
code. The key decision point here is Python compatibility support. If
the library requires 2.7, then it should be quite easy to merge his
changes. I think (but don't recall offhand) that the timeout parameter
for HTTPConnection might be available in 2.6, but I definitely know it
is not available for Python 2.5. When I began my work on the client, I
was targeting 2.5 and made many compatibility changes with that in
mind. This was primarily to support my 2.5-based dev environment, even
though I was going to deploy to 2.7. I eventually upgraded my dev
environment, so compatibility isn't a huge concern for me any more. I
would leave

Re: Connection Pooling with the python-riak-client

2012-01-24 Thread Greg Stein
On Tue, Jan 24, 2012 at 12:34, Michael Clemmons  wrote:
> Greg,
> Your amazing thanks.  In my application its failing on the start of the
> application, I do not believe while trying to do a request but its possible
> let me grok and get back to you with some trace backs.

Sounds good. I just looked at the code and it "should" lazy-connect.
You shouldn't see any failures at setup time. Only when you make a
request and it tries to establish the connection.

> As far as Im aware to define more than one hostport with the client you
> still have to hack the client.  Adding an optional hostports or servers
> parameter would be simple.

Yes. I was focusing on the lower-level code, and didn't hack
RiakClient's API. I'm thinking two things: add hostports, and a
start_monitor parameter (and code up the latter, of course).

> Being able to define the connection manager as a kwarg might be a good
> option.  If the intent is to define the conextmanager by subclassing the
> transport, things make more sense.

There are a couple paths to take, at least. The current code says
"subclass the transport and override the default_cm classvar". That is
sufficient, but maybe there is a better/clearer approach.
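
Both options can be sketched side by side. The classes below are stand-ins, not the real transport or ConnectionManager, and the exact semantics of default_cm in the client are assumed here: the subclass overrides the class variable, while the `cm` keyword shows the alternative of passing a manager in directly.

```python
class ConnectionManager:
    """Stand-in for a connection manager holding host/port pairs."""

    def __init__(self, hostports=()):
        self.hostports = list(hostports)


class HttpTransport:
    """Stand-in transport; uses `default_cm` unless a manager is passed in."""
    default_cm = ConnectionManager([("127.0.0.1", 8098)])

    def __init__(self, cm=None):
        self.cm = cm if cm is not None else self.default_cm


class ClusterTransport(HttpTransport):
    # Option 1: override the class variable to point at a multi-node manager.
    default_cm = ConnectionManager([("riak1", 8098), ("riak2", 8098)])


# Option 2: hand a manager to the constructor as a keyword argument.
custom = HttpTransport(cm=ConnectionManager([("riak3", 8098)]))
```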

> I think for multiple nodes round robin
> might be the most sane default for longterm or short term connections.

It does imply that you'll distribute your request load across the
ring. If N clients are running, then this is important (otherwise, all
clients would just hit the first node in the config). There are
certainly more sophisticated algorithms possible, but the round-robin
used right now should work for most users.

Note that if a particular request takes a while, the connection is NOT
in the ConnectionManager. That will ensure that you don't back up a
bunch of future requests behind a single, slow request. Only when the
(slow) request completes will the connection be returned to the CM for
usage by another request.
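
A minimal sketch of that checkout/return behaviour, with a hypothetical ConnectionPool class (not the real ConnectionManager). It relies on list.pop() and list.append() being atomic in CPython rather than on a lock or Queue, and for brevity it returns a connection to the pool even if the request raised.

```python
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical pool: a connection in use is absent from the pool."""

    def __init__(self, factory):
        self._factory = factory      # callable that opens a new connection
        self._idle = []

    @contextmanager
    def borrow(self):
        try:
            conn = self._idle.pop()  # atomic in CPython
        except IndexError:
            conn = self._factory()   # nothing idle: open another connection
        try:
            yield conn               # the caller runs its request here
        finally:
            self._idle.append(conn)  # only now is it visible to other users
```

While a slow request holds its connection, a second caller finds the idle list empty and simply gets a new connection instead of queueing behind the first.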

> Thanks again for replying, I'll see what happens when I try this with
> multiple live nodes, and get back with more thoughts.

Excellent!

Cheers,
-g



Re: Python Client Thread Safe?

2012-02-27 Thread Greg Stein
IIRC, that error was seen when connecting to Riak 0.14.0. What version of
Riak are you connecting to?

I might also suggest using the current, unreleased Python client from the
master branch on GitHub. It has much better support for threads and
persistent connections. Just don't use variant client_id values and PB
connections (switch to Riak 1.1 and ignore client_id).

One Client can be shared across threads, but not Objects. I don't recall
whether Buckets can be shared.

Cheers,
-g
On Feb 26, 2012 2:39 PM, "Jim Adler"  wrote:

> I'm getting the following error while using protocol buffers
> (RiakPbcTransport) with more than one thread (stack trace below):
>
>Socket returned short packet length 0 - expected 4'
>
> I'm using the 1.3.0 Python client on Mac and Ubuntu 11.04 and have seen
> the same error on both OS's. A single-thread works fine as does the
> RiakHttpTransport.
>
> Anyone have this problem?
>
> Thanks,
> Jim
>
>   File "/usr/local/Cellar/python/2.7.2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/riak-1.3.0-py2.7.egg/riak/bucket.py", line 260, in get
>     return obj.reload(r)
>   File "/usr/local/Cellar/python/2.7.2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/riak-1.3.0-py2.7.egg/riak/riak_object.py", line 373, in reload
>     Result = t.get(self, r, vtag)
>   File "/usr/local/Cellar/python/2.7.2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/riak-1.3.0-py2.7.egg/riak/transports/pbc.py", line 195, in get
>     msg_code, resp = self.send_msg(MSG_CODE_GET_REQ, req, None)
>   File "/usr/local/Cellar/python/2.7.2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/riak-1.3.0-py2.7.egg/riak/transports/pbc.py", line 387, in send_msg
>     return self.recv_msg(conn, expect)
>   File "/usr/local/Cellar/python/2.7.2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/riak-1.3.0-py2.7.egg/riak/transports/pbc.py", line 413, in recv_msg
>     self.recv_pkt(conn)
>   File "/usr/local/Cellar/python/2.7.2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/riak-1.3.0-py2.7.egg/riak/transports/pbc.py", line 460, in recv_pkt
>     len(nmsglen))
> RiakError: 'Socket returned short packet length 0 - expected 4'


Re: Python Client Thread Safe?

2012-02-27 Thread Greg Stein
To clarify my thread-sharing comments: those are with respect to the
unreleased client. And I've checked: the Bucket objects are sharable,
too. Obviously, not if multiple threads are calling competing .set_r()
or somesuch on the client or bucket. The short answer is share Client
and Bucket objects to get access to Objects, and keep those
per-thread.
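
In code, the sharing pattern looks roughly like this. FakeBucket and FakeObject are stand-ins so the sketch runs without a Riak server; they are not the client's classes. The bucket is shared, while each thread creates and uses its own object.

```python
import threading


class FakeObject:
    """Stand-in for a Riak object; one instance per thread, never shared."""

    def __init__(self, key):
        self.key = key
        self.data = None


class FakeBucket:
    """Stand-in for a bucket; shared because it holds no per-request state."""

    def __init__(self, name):
        self.name = name

    def new(self, key):
        return FakeObject(key)


def worker(bucket, n, results):
    obj = bucket.new("key-%d" % n)   # created and used in this thread only
    obj.data = {"worker": n}
    results[n] = obj.key


bucket = FakeBucket("test")          # shared by every thread
results = {}
threads = [threading.Thread(target=worker, args=(bucket, n, results))
           for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```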

The underlying transport will manage multiple connections,
persistently, in a thread-safe manner.

(and I'm unclear on threading issues for the 1.3.0 python client)

Cheers,
-g



Re: licenses (was Re: riakkit, a python riak object mapper, has hit beta!)

2012-03-04 Thread Greg Stein
Hey Andrey,

I've spent well over a decade dealing with licensing issues. One thing
that I've learned is that licensing is a personal choice and decision,
and it is nearly impossible to alter somebody's philosophy. I find
people fall into the GPL camp ("free software"), or the Apache/BSD
camp ("permissive / open source"), so I always recommend GPLv3 or
ALv2. (I find people choosing weak reciprocal licenses like LGPL, EPL,
MPL, CDDL, etc should make up their mind and go to GPL or AL)

In any case... license choice and arguments for one over the other is
best left to personal email, rather than a public mailing list like
riak-users. Changing minds doesn't happen on a mailing list :-)

Cheers,
-g

On Fri, Mar 2, 2012 at 05:24, Andrey V. Martyanov  wrote:
> Hi Justin,
>
> Sorry for the late response, I didn't  see your message! In fact, I know the
> differences between the two. But, what is the profit of using it? Why don't
> just use BSD, for example, like many open source projects do. The biggest
> minus of LGPL is that many people think that it's the same as GPL and have
> problems understanding it. Even your think that I don't know the difference!
> :) Why? Because, it's a common practice. A lot of people really don't know
> the difference. That's why I said before that (L)GPL is overcomplicated. If
> you open the LGPL main page [1], first thing you will see is "Why you
> shouldn't use the Lesser GPL for your next library". Is it normal? It
> confuses people. There are a lot of profit in pulling back the changes
> you've made - a lot of people see it, fix it, comment it, improve it and so
> on. Why the license forces me to to that? It shouldn't.
>
> [1] http://www.gnu.org/licenses/lgpl.html
>
> Best regards,
> Andrey Martyanov
>
> On Fri, Mar 2, 2012 at 8:29 AM, Justin Sheehy  wrote:
>>
>> Hi, Andrey.
>>
>> On Mar 1, 2012, at 10:18 PM, "Andrey V. Martyanov" 
>> wrote:
>>
>> > Sorry for GPL, it's a typo. I just don't like GPL-based licenses,
>> > including LGPL. I think it's overcomplicated.
>>
>> You are of course free to dislike anything you wish, but it is worth
>> mentioning that GPL and LGPL are very different licenses; the LGPL is
>> missing infectious aspects of the GPL.
>>
>> There are many projects which could not use GPL code compatibly with their
>> preferred license but which can safely use LGPL code.
>>
>> Justin
>>
>>


Re: riak-python-client2, a rewrite of the official client

2012-03-16 Thread Greg Stein
On Thu, Mar 15, 2012 at 11:13, Shuhao Wu  wrote:
>...
> Erlang. In my client, a chunk of the code actually comes from the original
> client as they work with a few adaptations.

Yes, I noticed that you retain transports/connection.py, but why did
you strip out all the comments?! That is definitely not a good path
towards a long-term maintainable codebase. I put a lot of commentary
into connection.py because there are explicit race conditions in
there. The documentation was necessary to clarify where the races
exist, what is being done about it, and why the code works properly.
With your stripped-down version, all of that information is LOST.
Future maintainers will not understand the issues and break it, or
they will think there are issues that don't really exist. I could just
see somebody saying "holy crap! in the presence of threads, this won't
work" and go and introduce a Queue in there. Totally senseless and
overkill.

But because you stripped the code... no benefit, and probably harm.

I'll take the somewhat-messy Basho client, over your version where you
explicitly remove useful knowledge.

I would also note that you have ALREADY introduced bugs. Even in the
very simplest case. Consider the following:

  client = riak2.client.Client('server1')
  bucket = client.bucket('test')
  ob = bucket.get('key')
  data = ob.get_data()

  client = riak2.client.Client('server2')
  bucket = client.bucket('migrate')
  bucket.new('key', data)

Pretty simple, hm? Migrate some data from one server to another.

It doesn't work.

Your get_http_cm() class method is a broken idea. It is effectively a
global variable. Because the Client('server2') call does not provide a
connection manager, it "reuses" the one created for the 'server1'
connection.
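
The failure mode can be reproduced with a few stand-in classes. This is not riak2's actual code, only a sketch of the same pattern: a manager cached on the class behaves like a global, so the first host wins for every later client.

```python
class ConnectionManager:
    """Stand-in manager that just remembers which host it talks to."""

    def __init__(self, host):
        self.host = host


class BrokenClient:
    """Caches ONE manager on the class, so the first host wins forever."""
    _http_cm = None

    @classmethod
    def get_http_cm(cls, host):
        if cls._http_cm is None:
            cls._http_cm = ConnectionManager(host)
        return cls._http_cm

    def __init__(self, host):
        self.cm = self.get_http_cm(host)


class PerInstanceClient:
    """Each client owns its manager unless the caller passes one in."""

    def __init__(self, host, cm=None):
        self.cm = cm if cm is not None else ConnectionManager(host)


first = BrokenClient("server1")
second = BrokenClient("server2")     # silently reuses the server1 manager
fixed = PerInstanceClient("server2")
```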

>...

How long and how much energy are you willing to expend? Of your own
time, and of others who may want to use your code?

Also, consider that you've turned something like bucket.get_r() into
just bucket.r. Sure, that looks good, but examine the current bucket
code: if the bucket doesn't provide a value, then it defers to the
client. These are not just simple attributes. What you really want is
to use Python's properties. That'll make them *look* like attributes,
but you can execute the fallback code. And you could do that on the
*existing* client, and benefit everybody, without introducing a bunch
of bugs by trying to do it yourself.
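
A sketch of the property approach, using simplified stand-in Client and Bucket classes (not the real ones): bucket.r reads like a plain attribute but still falls back to the client when the bucket has no value of its own.

```python
class Client:
    """Stand-in client holding the default R value."""

    def __init__(self, r=2):
        self.r = r


class Bucket:
    """Stand-in bucket whose R value defers to the client when unset."""

    def __init__(self, client, r=None):
        self._client = client
        self._r = r              # None means "not set on this bucket"

    @property
    def r(self):
        # Fall back to the client's value when the bucket has none.
        return self._r if self._r is not None else self._client.r

    @r.setter
    def r(self, value):
        self._r = value


client = Client(r=2)
bucket = Bucket(client)
inherited = bucket.r             # comes from the client
bucket.r = 3
overridden = bucket.r            # now comes from the bucket itself
```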

You could then use riak.util.deprecated() to mark .get_r() as
deprecated. In a future revision, then you could clear them out. The
client that everybody uses would then benefit.
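
The signature of riak.util.deprecated() is not shown in this thread, so the decorator below is a hypothetical stand-in built on the standard warnings module. It shows the general shape: the old getter keeps working but warns, and the attribute is the preferred spelling.

```python
import functools
import warnings


def deprecated(message):
    """Hypothetical decorator: warn on every call, then run the function."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 points the warning at the caller, not the wrapper.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorate


class Bucket:
    """Stand-in bucket with a plain attribute and a deprecated getter."""

    def __init__(self, r=2):
        self.r = r

    @deprecated("get_r() is deprecated; use the .r attribute")
    def get_r(self):
        return self.r
```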


From a community/social dynamic, you're forking the project and going
your own way. That doesn't help the broader community. A few people
might use your client, but most will stick to Basho's client. It may
feel nice and fun for you, but applying your efforts to the official
client *will* help everybody. And after a round of deprecation, then
you can clear out all of the stuff you find messy. But spending some
time to do this right, and to *work with* the existing community will
produce a much larger benefit to all.

-g



Re: riak-python-client2, a rewrite of the official client

2012-03-16 Thread Greg Stein
On Thu, Mar 15, 2012 at 14:08, Armon Dadgar  wrote:
>> Message: 4
>> Date: Thu, 15 Mar 2012 11:13:03 -0400
>> From: Shuhao Wu 
>> To: "Andrey V. Martyanov" 
>> Cc: riak-users@lists.basho.com
>> Subject: Re: riak-python-client2, a rewrite of the official client
>> Message-ID:
>>       
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> I'm looked into just modifying and contributing to the existing library, and
>> found several issues with it. Here's my main motivation for a rewrite:
>>
>>  1. The current structure riak-python-client is somewhat messy. Everything
>>     depends on each other. Just look at things like RiakLink and
>>     RiakIndexEntry. They're unnecessary and overcomplicates the code.
>>     Furthermore, if you look the transports, it's very much dependent on
>>     things like RiakObject, and RiakObject is pretty much nothing without
>>     the transport. It's almost like a circular dependency. So instead, I
>>     redesigned the transports to operate independently from things like
>>     RiakObject. To do this, simply modifying is not feasible and it will
>>     result in an almost complete rewrite anyway due to the dependencies
>>     problems I described.
>>  2. There's a lot of "bloat" in the current riak-python-client. A simple
>>     example would be get_ and set_, as well as things like RiakLink and
>>     RiakIndexEntry. To get rid of those would pretty much require a rewrite
>>     as well.
>>  3. Basho currently do not have a dedicated python developer working on this.
>>     I don't know this for sure but I think their resources, in terms of
>>     clients, go mainly to java, ruby and javascript, though that's just my
>>     observations.
>>
>> My primary goal of having a rewrite is hopefully simplify the code base as
>> well as improve some aspects of the python client (such as not using
>> deprecated functions such as apply) and (hopefully) increase the speed of
>> the client. After examining the code (which I had to do while rewriting), I
>> don't think simply modifying the current codebase could fix its issues
>> (there are more issues then what I've stated), and I don't think it will
>> take as long as people think. The current code base has about 4k lines of
>> python and 0.5k lines of Erlang. In my client, a chunk of the code actually
>> comes from the original client as they work with a few adaptations.
>>
>> As far as road map goes, I'm currently just rewriting all the functionalities
>> provided by the current python client, and here's a list of things that Sean
>> would like to see accomplished, which I will work on once I have all the
>> functionalities of the current client complete:
>>
>>    https://gist.github.com/1959278
>>
>> I hope I've answered all the questions. If there is any more
>> questions/comments, feel free to shoot it my way.
>>
>> Shuhao
>>
>
> I agree that the Python client does need some work to clean it up,
> and make it more idiomatic Python, but I'm not sure that a total rewrite
> is necessary.
>
> Most of the abstractions are good, but they just need a bit of cleanup.
> I think there is always a tendency to assume things are unnecessarily messy,
> but once you get to rewriting it you end up running into the pain points that
> drove those design decisions.
>
> If there is interest, we could just formulate a roadmap for a set of
> breaking changes to the existing client, and release it as a new version of
> the same project.
>
> Things I would like to see:
>  * Cleanup the transport interface
>  * RiakObject needs to be simplified
>  * Support a RiakJSONObject subclass which has the encoding logic that is
>    inside RiakObject now
>  * Indexes need to be cleaned up
>  * MapReduce interfaces feel a bit dirty
>  * Much improved exception hierarchy
>  * RiakClient / RiakBucket interface needs to cleaned up

Right. Evolution of a client that everybody uses.

It isn't like Basho is refusing to accept pull requests from the
community. If they were a total black hole, then there may be an
argument for forking the project and evolving it. But that would at
least be from a fork. I don't see a valid argument for
start-from-scratch.

Yet that isn't the case. Basho's responsiveness on the Python client
might not be "awesome", but they *are* willing to engage. I met up
with them last fall, have exchanged numerous emails with them, and
even had a conference call about the Python client. Not to mention all
the discussion and the pull requests that I provided, which got
merged. They *are* working with the community.

In my experience, communities exist to develop/maintain/focus on a
code base. Multiple/competing codebases end up fracturing the
community. That rarely leads to a long-term, sustainable, healthy
outcome.

Cheers,
-g
