Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation

2014-08-10 Thread Mike Bayer

On Aug 10, 2014, at 9:59 AM, Amrith Kumar  wrote:

>  
> To Mike Bayer’s point about data distribution and transaction management; 
> yes, we handle all the details relating to handling data consistency and 
> providing atomic transactions during Insert/Update/Delete operations.
>  
> As a company, we at Tesora are committed to OpenStack and are significant 
> participants in Trove (the database-as-a-service project for OpenStack). You 
> can verify this yourself on Stackalytics [7] or [8]. If you would like to 
> consider it as a part of your solution to oslo.db, we’d be thrilled to work 
> with the OpenStack community to make this work, both from a technical and a 
> business/licensing perspective. You can catch most of our dev team on either 
> #openstack-trove or #tesora.
>  
> Some of us from Tesora, Percona and Mirantis are planning an ops panel 
> similar to the one at Atlanta, for the Summit in Paris. I would definitely 
> like to meet with more of you in Paris and discuss how we address issues of 
> scale in the database that powers an OpenStack implementation.


OK well just to be clear, oslo.db is Python code that basically provides 
in-application helpers and patterns to work with databases, primarily through 
SQLAlchemy.   So it’s essentially OpenStack-specific patterns and recipes on 
top of SQLAlchemy.   It has very little to do with the use of special 
database backends that know how to partition among shards and/or master/slaves 
(I thought the original proposal was for master/slave).   So the Tesora 
product would be 99% “drop in”, with at most some configuration flags set on 
the Python side and everything else handled by the backend’s own configuration, 
since the proposal here is for “transparent”, which is taken to mean “no app 
changes are needed”.   My only point was that an application-layer reader/writer 
distribution approach would need to work at the level of transactions, not 
statements, and therefore would need to know at transaction start time what the 
nature of the transaction would be (and thus requires some small declaration at 
the top, hence code changes…code changes that I think are a good thing, as 
explicit declaration of reader/writer methods up top can be handy in other ways 
too).


>  
> Thanks,
>  
> -amrith
>  
> --
>  
> Amrith Kumar, CTO Tesora (www.tesora.com)
>  
> Twitter: @amrithkumar 
> IRC: amrith @freenode
>  
>  
> [1] http://www.tesora.com/solutions/database-virtualization-engine
> [2] http://www.tesora.com/solutions/downloads/products
> [3] 
> http://www.mysqlperformanceblog.com/2014/06/24/benchmarking-tesoras-database-virtualisation-engine-sysbench/
> [4] 
> http://www.tesora.com/blog/perconas-evaluation-our-database-virtualization-engine
> [5] http://resources.tesora.com/site/download/percona-benchmark-whitepaper
> [6] 
> http://www.tesora.com/blog/ingesting-over-100-rows-second-mysql-aws-cloud
> [7] http://stackalytics.com/?module=trove-group&metric=commits
> [8] http://stackalytics.com/?module=trove-group&metric=marks
>  
>  
>  
>  
>  
> From: Mike Wilson [mailto:geekinu...@gmail.com] 
> Sent: Friday, August 08, 2014 7:35 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation
>  
> Li Ma,
>  
> This is interesting, In general I am in favor of expanding the scope of any 
> read/write separation capabilities that we have. I'm not clear what exactly 
> you are proposing, hopefully you can answer some of my questions inline. The 
> thing I had thought of immediately was detection of whether an operation is 
> read or write and integrating that into oslo.db or sqlalchemy. Mike Bayer has 
> some thoughts on that[1] and there are other approaches around that can be 
> copied/learned from. These sorts of things are clear to me and while moving 
> towards more transparency for the developer, still require context. Please, 
> share with us more details on your proposal.
>  
> -Mike
>  
> [1] 
> http://www.percona.com/doc/percona-xtradb-cluster/5.5/wsrep-system-index.html
> [2] 
> http://techspot.zzzeek.org/2012/01/11/django-style-database-routers-in-sqlalchemy/
>  
> 
> On Thu, Aug 7, 2014 at 10:03 PM, Li Ma  wrote:
> Getting a massive amount of information from data storage to be displayed is
> where most of the activity happens in OpenStack. The two activities of reading
> data and writing (creating, updating and deleting) data are fundamentally
> different.
> 
> The optimization for these two opposite database activities can be done by
> physically separating the databases that service these two different
> activities. All the writes go to database servers, which then replicates the
> written data to the datab

Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation

2014-08-10 Thread Mike Bayer

On Aug 10, 2014, at 11:17 AM, Li Ma  wrote:

> 
> How about Galera multi-master cluster? As Mike Bayer said, it is virtually 
> synchronous by default. It is still possible that outdated rows are queried 
> that make results not stable.

not sure if I said that :).  I know extremely little about galera.


> 
> 
> Let's move forward to synchronous replication, like Galera with causal-reads 
> on. The dominant advantage is that it has consistent relational dataset 
> support. The disadvantage are that it uses optimistic locking and its 
> performance sucks (also said by Mike Bayer :-). For optimistic locking 
> problem, I think it can be dealt with by retry-on-deadlock. It's not the 
> topic here.

I *really* don’t think I said that, because I like optimistic locking, and I’ve 
never used Galera ;).

Where I am ignorant here is of what exactly occurs if you write some rows 
within a transaction with Galera, then do some reads in that same transaction.  
 I’d totally guess that Galera would need to first have SELECTs come from a 
slave node, then the moment it sees any kind of DML / writing, it transparently 
switches the rest of the transaction over to a writer node.   No idea, but it 
has to be something like that?   


> 
> 
> So, the transparent read/write separation is dependent on such an 
> environment. SQLalchemy tutorial provides code sample for it [1]. Besides, 
> Mike Bayer also provides a blog post for it [2].

So this thing with the “django-style routers”, the way that example is, it 
actually would work poorly with a Session that is not in “autocommit” mode, 
assuming you’re working with regular old databases that are doing some simple 
behind-the-scenes replication.   Because again, if you do a flush, those rows 
go to the master; if the transaction is still open, then reading from the 
slaves you won’t see the rows you just inserted.   So in reality, that example 
is kind of crappy: if you’re in a transaction (which we are) you’d really need 
to be doing session.using_bind(“master”) all over the place, and that is 
already way too verbose and hardcoded.   I’m wondering why I didn’t make a huge 
note of that in the post.  The point of that article was more to show that hey, 
you *can* control it at this level if you want to but you need to know what 
you’re doing.
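
For reference, the routing approach from that post looks approximately like this 
(a from-memory sketch; the engine URLs are stand-ins, and using_bind() is the 
custom method defined in the post, not a stock SQLAlchemy API):

import random

from sqlalchemy import create_engine
from sqlalchemy.orm import Session, sessionmaker

# stand-in URLs; in reality these point at the actual master / replica servers
engines = {
    "master": create_engine("sqlite:///master.db"),
    "slave1": create_engine("sqlite:///slave1.db"),
    "slave2": create_engine("sqlite:///slave2.db"),
}


class RoutingSession(Session):
    """Send flushes to the master, everything else to a random slave."""

    _name = None

    def get_bind(self, mapper=None, clause=None):
        if self._name:
            return engines[self._name]
        elif self._flushing:
            return engines["master"]
        else:
            return engines[random.choice(["slave1", "slave2"])]

    def using_bind(self, name):
        s = RoutingSession()
        vars(s).update(vars(self))
        s._name = name
        return s


SessionFactory = sessionmaker(class_=RoutingSession)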

Just to put it out there, this is what I think good high-level master/slave 
separation in the app level (reiterating: *if we want it in the app level at 
all*) should approximately look like:

@transaction.writer
def read_and_write_something(arg1, arg2, …):
    # …

@transaction.reader
def only_read_something(arg1, arg2, …):
    # …

that way there is no awareness of master/slave anything; the underlying system 
can decide what “reader” and “writer” means.   Do in-app switching between two 
databases, send out some magic signals to some commercial clustering service, 
have the “readers” work in “autocommit” mode, or do nothing, whatever.  The 
code doesn’t decide this imperatively.   But it isn’t 100% “transparent”; this 
small amount of declaration per-method is needed.
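
Roughly speaking, the decorators themselves can be as thin as recording the 
declared mode on a thread-local that whatever hands out sessions/engines then 
consults.  A sketch only; none of these names exist in oslo.db today:

import functools
import threading

_declared = threading.local()


def _declare(mode):
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            previous = getattr(_declared, "mode", None)
            _declared.mode = mode          # "reader" or "writer"
            try:
                return fn(*args, **kwargs)
            finally:
                _declared.mode = previous
        return wrapper
    return decorate


class transaction(object):
    """Namespace so usage reads @transaction.reader / @transaction.writer."""
    reader = staticmethod(_declare("reader"))
    writer = staticmethod(_declare("writer"))


def current_mode():
    """The session/engine factory consults this to pick a backend."""
    return getattr(_declared, "mode", "writer")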


> 
> What I did is to re-implement it in OpenStack DB API modules in my 
> development environment, using Galera cluster(causal-reads on). It has been 
> running perfectly for more than a week. The routing session manager works 
> well while maintaining data consistency.

OK so Galera would perhaps have some way to make this happen, and that’s great. 
   My understanding is that people are running Openstack already with Galera, 
that’s why we’re hitting issues with some of those SELECT..FOR UPDATEs that are 
being replaced with optimistic approaches as you mention. But beyond that 
this isn’t any kind of “change” to oslo.db or anything else.   Run Openstack 
with whatever database backend you want, ideally (that is my primary agenda, 
sorry MySQL vendors!).


> Finally, I think if we can integrate it into oslo.db, it is a perfect plus 
> for those who would like to deploy Galera (or other similar technology) as DB 
> backend.

this (the word “integrate”, and what does that mean) is really the only thing 
making me nervous.  If the integration here is the django blog post I have, 
it’s not going to work with transactions.   Either the system is magical enough 
that a single transaction can read/write from both sources midway and there is 
no “integration” needed, or the transaction has to be declared up front as 
reader or writer.  Or you don’t use transactions except for writers, which is 
essentially the same as “declaration up front”.

> 
> [1] 
> http://docs.sqlalchemy.org/en/rel_0_9/orm/session.html#custom-vertical-partitioning
> [2] 
> http://techspot.zzzeek.org/2012/01/11/django-style-database-routers-in-sqlalchemy/
> [3] Galera replication method: http://galeracluster.com/products/technology/
> 
> 

Re: [openstack-dev] [nova][core] Expectations of core reviewers

2014-08-13 Thread Mike Bayer

On Aug 13, 2014, at 1:44 PM, Russell Bryant  wrote:

> On 08/13/2014 01:09 PM, Dan Smith wrote:
>> Expecting cores to be at these sorts of things seems pretty reasonable
>> to me, given the usefulness (and gravity) of the discussions we've been
>> having so far. Companies with more cores will have to send more or make
>> some hard decisions, but I don't want to cut back on the meetings until
>> their value becomes unjustified.
> 
> I disagree.  IMO, *expecting* people to travel, potentially across the
> globe, 4 times a year is an unreasonable expectation, and quite
> uncharacteristic of open source projects.  If we can't figure out a way
> to have the most important conversations in a way that is inclusive of
> everyone, we're failing with our processes.
> 
> By all means, if a subset wants to meet up and make progress on some
> things, I think that's fine.  I don't think anyone think it's not
> useful.  However, discussions need to be summarized and taken back to
> the list for discussion before decisions are made.  That's not the way
> things are trending here, and I think that's a problem.

Count me in on the “not requiring travel” team here.   I have multiple issues 
with travel, including that it is very stressful and tends to ruin my 
productivity for weeks leading up to it, and lots of us also have family 
responsibilities that are difficult and potentially expensive to arrange for an 
absence, such as child care.

It’s difficult to compare OpenStack to other open source projects, in that it 
is on such a more massive and high velocity scale than almost any others 
(perhaps the Linux kernel is similar).   It is certainly true that F2F meetings 
encourage better communications and sparking of new ideas and directions that 
wouldn’t otherwise have occurred, but then again I will also suggest that the 
difference in collaborative productivities for different individuals between 
F2F and remote probably varies highly based on the individual, including their 
social proclivities, specific projects and focus, and working styles.In 
this sense I’m really voting for an “all of the above” approach, in that yes we 
should do what we can to facilitate travel, we should do what we can to 
facilitate remote meetings over conferences and I love the idea of telepresence 
meetups, and we should give room to those who are very productive remotely and 
have difficulties with regular travel. The telepresence idea in particular 
opens the door to people meeting up in a semi-F2F style many more than four 
times per year, in fact.  I wouldn’t mind at all going to an office every 
Friday to have our oslo.db meeting over a nice telepresence system.


[openstack-dev] [all] Acceptable methods for establishing per-test-suite behaviors

2014-08-22 Thread Mike Bayer
Hi all -

I’ve spent many weeks on a series of patches for which the primary goal is to 
provide very efficient patterns for tests that use databases and schemas within 
those databases, including compatibility with parallel tests, transactional 
testing, and scenario-driven testing (e.g. a test that runs multiple times 
against different databases).

To that end, the current two patches that achieve this behavior in a rudimentary 
fashion are part of oslo.db and are at: 
https://review.openstack.org/#/c/110486/ and 
https://review.openstack.org/#/c/113153/.   They have been in the queue for 
about four weeks now.  The general theory of operation is that within a 
particular Python process, a fixed database identifier is established 
(currently via an environment variable).   As tests request the services of 
databases, such as a Postgresql database or a MySQL database, the system will 
provision a database with that fixed identifier within that backend and return 
it.   The test can then request that it make use of a particular “schema” - for 
example, Nova’s tests may request that they are using the “nova schema”, which 
means that the schema for Nova’s model will be created within this database, 
and will then remain permanently across the span of many tests which use this 
same schema.  Only when a test requests that it wants a different schema, or no 
schema, will the tables be dropped.   To ensure the schema is “clean” for 
every test, the provisioning system ensures that each test runs within a 
transaction, which at test end is rolled back.   In order to accommodate tests 
that themselves need to roll back, the test additionally runs within the 
context of a SAVEPOINT.   This system is entirely working, and for those that 
are wondering, yes it works with SQLite as well (see 
https://review.openstack.org/#/c/113152/).
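
The transaction-plus-SAVEPOINT part of that, taken in isolation, is essentially 
the “join a Session into an external transaction” recipe from the SQLAlchemy 
docs.  A simplified sketch, not the patch itself (the SQLite engine here is 
only a stand-in for the provisioned per-process database):

import unittest

from sqlalchemy import create_engine
from sqlalchemy.orm import Session

engine = create_engine("sqlite://")   # stand-in for the provisioned backend


class TransactionalTestCase(unittest.TestCase):
    def setUp(self):
        super(TransactionalTestCase, self).setUp()
        self.connection = engine.connect()
        self.transaction = self.connection.begin()   # outermost transaction, never committed
        self.connection.begin_nested()               # SAVEPOINT, so the test itself may roll back
        self.session = Session(bind=self.connection)

    def tearDown(self):
        self.session.close()
        self.transaction.rollback()                  # discards everything the test did
        self.connection.close()
        super(TransactionalTestCase, self).tearDown()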

And as implied earlier, to ensure the operations upon this schema don’t 
conflict with parallel test runs, the whole thing is running within a database 
that is specific to the Python process.

So instead of the current behavior of generating the entire nova schema for 
every test and being hardcoded to Sqlite, a particular test will be able to run 
itself against any specific backend or all available backends in series, 
without needing to do a CREATE for the whole schema on every test.   It will 
greatly expand database coverage as well as allow database tests to run 
dramatically faster, using entirely consistent systems for setting up schemas 
and database connectivity.

The “transactional test” system is one I’ve used extensively in other projects. 
 SQLAlchemy itself now runs tests against a py.test-specific variant which runs 
under parallel testing and generates ad-hoc schemas per Python process.   The 
patches above achieve these patterns successfully and transparently in the 
context of Openstack tests, only the “scenarios” support for a single test to 
run repeatedly against multiple backends is still a todo.

However, the first patch has just been -1’ed by Robert Collins, the publisher 
of many of the various “testtools” libraries that are prevalent within 
Openstack projects.

Robert suggests that the approach integrate with the testresources library: 
https://pypi.python.org/pypi/testresources.   I’ve evaluated this system and 
after some initial resistance I can see that it would in fact work very nicely 
with the system I have, in that it provides the OptimisingTestSuite - a special 
unittest test suite that will take tests like the above which are marked as 
needing particular resources, and then sort them such that individual resources 
are set up and torn down a minimal number of times.   It has heavy algorithmic 
logic to accomplish this which is certainly far beyond what would be 
appropriate to home-roll within oslo.db.
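
To make that concrete, the shape of the integration is roughly as follows; a 
sketch only, where provision_schema() / drop_schema() stand in for what the 
oslo.db provisioning system actually does:

import testresources


def provision_schema():
    # stand-in: create the schema within the per-process database
    return "nova-schema"


def drop_schema(schema):
    # stand-in: drop the ad-hoc schema once no remaining test needs it
    pass


class SchemaResource(testresources.TestResourceManager):
    def make(self, dependency_resources):
        return provision_schema()

    def clean(self, resource):
        drop_schema(resource)


class SomeDBTest(testresources.ResourcedTestCase):
    # OptimisingTestSuite sorts tests by their declared resources so that
    # this resource is made and cleaned a minimal number of times
    resources = [("schema", SchemaResource())]

    def test_something(self):
        self.assertEqual("nova-schema", self.schema)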

I like the idea of integrating this optimization a lot; however, it runs into a 
particular issue which I also hit upon with my more simplistic approach.   

The issue is that being able to use a resource like a database schema across 
many tests requires that some kind of logic has access to the test run as a 
whole.   At the very least, a hook that indicates “the tests are done, let’s 
tear down these ad-hoc databases” is needed.

For my first iteration, I observed that Openstack tests are generally run 
either via testr, or via a shell script.  So to that end I expanded upon an 
approach that was already present in oslo.db, that is to use scripts which 
provision the names of databases to create, and then drop them at the end of 
all tests run.   For testr, I used the “instance_execute”, “instance_dispose”, 
and “instance_provision” hooks in testr.conf to call upon these sub-scripts:

instance_provision=${PYTHON:-python} -m oslo.db.sqlalchemy.provision echo $INSTANCE_COUNT
instance_dispose=${PYTHON:-python} -m oslo.db.sqlalchemy.provision drop --conditional $INSTANCE_IDS
instance_execute=OSLO_SCHEMA_TOKEN=$INSTANCE_ID $COMMAND

Re: [openstack-dev] [Openstack-stable-maint] [Neutron][stable] How to backport database schema fixes

2014-08-29 Thread Mike Bayer

On Aug 29, 2014, at 7:23 AM, Alan Pevec  wrote:

>> It seems that currently it's hard to backport any database schema fix to
>> Neutron [1] which uses alembic to manage db schema version. Nova has the
>> same issue before
>> and a workaround is to put some placeholder files before each release.
>> So first do we allow db schema fixes to be backport to stable for Neutron ?
> 
> DB schema backports was a topic at StableBranch session last design
> summit [*] and policy did not change: not allowed in general but
> exceptions could always be discussed on stable-maint list.
> 
>> If we do, then how about put some placeholder files similar to Nova at the
>> end of each release cycle? or we have some better solution for alembic.
> 
> AFAIK you can't have placeholders in alembic, there was an action item
> from design session for Mark to summarize his best practices for db
> backports.


Alembic doesn’t need “placeholder” files, if we’re referring to the practice 
with migrate to have empty migration files present so that new migrations can 
be spliced in.   Alembic migrations can be spliced anywhere in the series.  The 
only current limitation, which is on deck to be opened up, is that the 
migrations ultimately have to be arranged linearly in some way (e.g., the case 
where two different environments are the product of two branches and need to 
run the same series of migrations, but one needs to skip certain files and the 
other needs to skip others, so that only the migrations needed on each are 
applied; SQLAlchemy-migrate certainly has no capability for that either).   If 
this issue needs to be fast-tracked I can move my efforts there.




Re: [openstack-dev] Kilo Cycle Goals Exercise

2014-09-08 Thread Mike Bayer

On Sep 7, 2014, at 9:27 PM, Anita Kuno  wrote:

> On 09/07/2014 09:12 PM, Angus Salkeld wrote:
>> Lets prevent blogs like this: http://jimhconsulting.com/?p=673 by making
>> users happy.
> I don't understand why you would encourage writers of blog posts you
> disagree with by sending them traffic.

Silencing users who have issues with your project is a really bad idea.   If 
you want to create something great, you absolutely need to be obsessed with your 
detractors and the weight of what they have to say, because unless they are a 
competitor engaged in outright slander, there will be some truth in it.   
Ignore criticism at your peril.   Someone who takes the time to write out an 
even somewhat well-reasoned criticism is doing your project a service.

I found the above blog post very interesting as I’d like to get more data on 
what the large, perceived issues are.




> 
> Anita.
>> 
>> 1) Consistent/easy upgrading.
>> all projects should follow a consistent model to the way they approach
>> upgrading.
>> it should actually work.
>> - REST versioning
>> - RPC versioning
>> - db (data) migrations
>> - ordering of procedures and clear documentation of it.
>>[this has been begged for by operators, but not sure how we have
>> delivered]
>> 
>> 2) HA
>>  - ability to continue operations after been restated
>>  - functional tests to prove the above?
>> 
>> 3) Make it easier for small business to "give OpenStack a go"
>>  - produce standard docker images as part of ci with super simple
>> instructions on running them.
>> 
>> -Angus
>> 
>> 
>> 
>> On Thu, Sep 4, 2014 at 1:37 AM, Joe Gordon  wrote:
>> 
>>> As you all know, there has recently been several very active discussions
>>> around how to improve assorted aspects of our development process. One idea
>>> that was brought up is to come up with a list of cycle goals/project
>>> priorities for Kilo [0].
>>> 
>>> To that end, I would like to propose an exercise as discussed in the TC
>>> meeting yesterday [1]:
>>> Have anyone interested (especially TC members) come up with a list of what
>>> they think the project wide Kilo cycle goals should be and post them on
>>> this thread by end of day Wednesday, September 10th. After which time we
>>> can begin discussing the results.
>>> The goal of this exercise is to help us see if our individual world views
>>> align with the greater community, and to get the ball rolling on a larger
>>> discussion of where as a project we should be focusing more time.
>>> 
>>> 
>>> best,
>>> Joe Gordon
>>> 
>>> [0]
>>> http://lists.openstack.org/pipermail/openstack-dev/2014-August/041929.html
>>> [1]
>>> http://eavesdrop.openstack.org/meetings/tc/2014/tc.2014-09-02-20.04.log.html
>>> 
>>> ___
>>> OpenStack-dev mailing list
>>> OpenStack-dev@lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>> 
>>> 
>> 
>> 
>> 
>> ___
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> 
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




Re: [openstack-dev] Kilo Cycle Goals Exercise

2014-09-08 Thread Mike Bayer

On Sep 7, 2014, at 8:14 PM, Monty Taylor  wrote:

> 
> 
> 2. Less features, more win
> 
> In a perfect world, I'd argue that we should merge exactly zero new features 
> in all of kilo, and instead focus on making the ones we have work well. Some 
> of making the ones we have work well may wind up feeling just like writing 
> features, as I imagine some of our features are probably only half features 
> in the first place.
> 
> 3. Deleting things
> 
> We should delete a bunch of code. Deleting code is fun, and it makes your 
> life better, because it means you have less code. So we should start doing 
> it. In particular, we should look for places where we wrote something as part 
> of OpenStack because the python community did not have a thing already, but 
> now there is one. In those cases, we should delete ours and use theirs. Or we 
> should contribute to theirs if it's not quite good enough yet. Or we should 
> figure out how to make more of the oslo libraries things that can truly 
> target non-OpenStack things.
> 

I have to agree that “Deleting things” is the best, best thing.  Anytime you 
can refactor around things and delete more code, a weight is lifted, your code 
becomes easier to understand, maintain, and expand upon.   Simpler code then 
gives way to refactorings that you couldn’t even see earlier, and sometimes you 
can even get a big performance boost once a bunch of supporting code now 
reveals itself to be superfluous.  This is most critical for Openstack as 
Openstack is written in Python, and for as long as we have to stay on the 
cPython interpreter, the number of function calls is directly proportional to how 
slow something is.  Function calls are enormously expensive in Python.

Something that helps greatly with the goal of “Deleting things” is to reduce 
dependencies between systems. In SQLAlchemy, the kind of change I’m usually 
striving for is one where we take a module that does one Main Thing, but then 
has a bunch of code spread throughout it to do some Other Thing, that is really 
much less important, but complicates the Main Thing.   What we do is reorganize 
the crap out of it and get the Other Thing out of the core Main Thing, move it 
out to a totally optional “extension” module that bothers noone, and we 
essentially forget about it because nobody ever uses it (examples include 
http://docs.sqlalchemy.org/en/rel_0_9/changelog/migration_08.html#instrumentationmanager-and-alternate-class-instrumentation-is-now-an-extension,
 
http://docs.sqlalchemy.org/en/rel_0_9/changelog/migration_08.html#mutabletype). 
   When we make these kinds of changes, major performance enhancements come 
right in - the Main Thing no longer has to worry about those switches and left 
turns introduced by the Other Thing, and tons of superfluous logic can be 
thrown away.   SQLAlchemy’s architecture gains from these kinds of changes 
with every major release and 1.0 is no exception.

This is not quite the same as “Deleting things” but it has more or less the 
same effect; you isolate code that everyone uses from code that only some 
people occasionally use.   In SQLAlchemy specifically, we have the issue of 
individual database dialects that are still packaged along; e.g. there is 
sqlalchemy.dialects.mysql, sqlalchemy.dialects.postgresql, etc.  However, a few 
years back I went through a lot of effort to modernize the system by which 
users can provide their own database backends; not only can you provide 
your own custom backend using setuptools entry points, but I also made a major 
reorganization of SQLAlchemy’s test suite to produce the “dialect test suite”, 
so that when you write your custom dialect, you can actually run a large, 
pre-fabricated test suite out of SQLAlchemy’s core against your dialect, 
without the need for your dialect to be actually *in* SQLAlchemy.  There 
were many wins from this system, including that it forced me to write lots of 
tests that were very well focused on testing specifically what a dialect needs 
to do, in isolation of anything SQLAlchemy itself needs to do.   It allowed a 
whole batch of new third party dialects like that for Amazon Redshift, 
FoundationDB, MonetDB, and also was a huge boon to IBM’s DB2 driver, which I 
helped to get onto the new system.   And since then I’ve been able to go into 
SQLAlchemy and dump out lots of old dialects that are much better off being 
maintained separately, at a different level of velocity and hopefully by 
individual contributors who are interested in them, like MS Access, Informix, 
MaxDB, and Drizzle.   Having all these dialects in one big codebase only served 
as a weight on the project, and theoretically it wouldn’t be a bad idea for 
SQLA to have *all* dialects as separate projects, but we’re not there yet.
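
(For reference, the “external dialect” hook is just a setuptools entry point in 
the sqlalchemy.dialects group; the package and class names in this snippet are 
made up:)

# setup.py of a hypothetical third-party dialect
from setuptools import setup

setup(
    name="sqlalchemy-somedb",
    packages=["sqlalchemy_somedb"],
    entry_points={
        "sqlalchemy.dialects": [
            # makes create_engine("somedb://...") resolve to this class
            "somedb = sqlalchemy_somedb.dialect:SomeDBDialect",
        ],
    },
)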

The only reason I’m rambling on about SQLAlchemy’s Core/Dialect dichotomy is 
just that I was very much *reminded* of it by the thread regarding Nova and the 
various “virt” drivers.  I know nothi

Re: [openstack-dev] Kilo Cycle Goals Exercise

2014-09-08 Thread Mike Bayer

On Sep 8, 2014, at 11:30 AM, Anita Kuno  wrote:

> Wow, we are really taking liberties with my question today.
> 
> What part of any of my actions current or previous have led you to
> believe that I want to now or ever have silenced anyone? I am curious
> what led you to believe that silencing users was the motivation for my
> question of Angus.

I was only replying to your single message in isolation from the full 
conversation; the notion that one would not want to send traffic to a blog 
because they disagree with it, at face value seems like a bad idea.  Apparently 
that isn’t the meaning you wished to convey, so I apologize for missing the 
larger context.






Re: [openstack-dev] memory usage in devstack-gate (the oom-killer strikes again)

2014-09-08 Thread Mike Bayer
Hi All - 

Joe had me do some quick memory profiling on nova, just an FYI if anyone wants 
to play with this technique, I place a little bit of memory profiling code 
using Guppy into nova/api/__init__.py, or anywhere in your favorite app that 
will definitely get imported when the thing first runs:

from guppy import hpy
import signal
import datetime

def handler(signum, frame):
    print "guppy memory dump"

    fname = "/tmp/memory_%s.txt" % \
        datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    prof = hpy().heap()
    with open(fname, 'w') as handle:
        prof.dump(handle)
    del prof

signal.signal(signal.SIGUSR2, handler)



Then, run nova-api, run some API calls, then you hit the nova-api process with 
a SIGUSR2 signal, and it will dump a profile into /tmp/ like this:

http://paste.openstack.org/show/108536/

Now obviously everyone is like, oh boy, memory, let’s go beat up SQLAlchemy 
again… which is fine, I can take it.  In that particular profile, there’s a 
bunch of SQLAlchemy stuff, but that is all structural to the classes that are 
mapped in Nova API, e.g. 52 classes with a total of 656 attributes mapped.   
That stuff sets up once and doesn’t change.   If Nova used less ORM,  e.g. 
didn’t map everything, that would be less.  But in that profile there’s no 
“data” lying around.

But even if you don’t have that many objects resident, your Python process 
might still be using up a ton of memory.  The reason for this is that the 
cPython interpreter has a model where it will grab all the memory it needs to 
do something, a time consuming process by the way, but then it really doesn’t 
ever release it  (see 
http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm
 for the “classic” answer on this, things may have improved/modernized in 2.7 
but I think this is still the general idea).

So in terms of SQLAlchemy, a good way to suck up a ton of memory all at once 
that probably won’t get released is to do this:

1. fetching a full ORM object with all of its data

2. fetching lots of them all at once


So to avoid doing that, the answer isn’t necessarily that simple.   The quick 
wins to loading full objects are to …not load the whole thing!   E.g. assuming 
we can get Openstack onto 0.9 in requirements.txt, we can start using 
load_only():

session.query(MyObject).options(load_only("id", "name", "ip"))

or with any version, just load those columns - we should be using this as much 
as possible for any query that is row/time intensive and doesn’t need full ORM 
behaviors (like relationships, persistence):

session.query(MyObject.id, MyObject.name, MyObject.ip)

Another quick win, if we *really* need an ORM object, not a row, and we have to 
fetch a ton of them in one big result, is to fetch them using yield_per():

    for obj in session.query(MyObject).yield_per(100):
        # work with obj and then make sure to lose all references to it

yield_per() will dish out objects drawing from batches of the number you give 
it.   But it has two huge caveats: one is that it isn’t compatible with most 
forms of eager loading, except for many-to-one joined loads.  The other is that 
the DBAPI, e.g. like the MySQL driver, does *not* stream the rows; virtually 
all DBAPIs by default load a result set fully before you ever see the first 
row.  psycopg2 is one of the only DBAPIs that even offers a special mode to 
work around this (server side cursors).

Which means it’s even *better* to paginate result sets, so that you only ask the 
database for a chunk at a time, storing at most a subset of objects in 
memory at once.  Pagination itself is tricky: with a naive 
LIMIT/OFFSET approach, queries slow down considerably as the OFFSET grows large.  
It’s better to SELECT into windows of data, where you specify start and 
end criteria (against an indexed column, like a timestamp) for each window.
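
As a sketch of that windowing idea, reusing the MyObject placeholder from above 
and assuming it has an indexed created_at column:

import datetime

from sqlalchemy import func


def windowed_query(session, window=datetime.timedelta(hours=1)):
    """Yield MyObject rows one time-window at a time, so the full result
    set is never held in memory at once."""
    lower = session.query(func.min(MyObject.created_at)).scalar()
    upper = session.query(func.max(MyObject.created_at)).scalar()
    while lower is not None and lower <= upper:
        bound = lower + window
        q = (session.query(MyObject).
             filter(MyObject.created_at >= lower,
                    MyObject.created_at < bound).
             order_by(MyObject.created_at))
        for obj in q:
            yield obj
        lower = bound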

Then of course, using Core only is another level of fastness/low memory.  
Though querying for individual columns with ORM is not far off, and I’ve also 
made some major improvements to that in 1.0 so that query(*cols) is pretty 
competitive with straight Core (and Core is…well I’d say becoming visible in 
raw DBAPI’s rear view mirror, at least….).

What I’d suggest here is that we start to be mindful of memory/performance 
patterns and start to work out naive ORM use into more savvy patterns; being 
aware of what columns are needed, what rows, how many SQL queries we really 
need to emit, what the “worst case” number of rows will be for sections that 
really need to scale.  By far the hardest part is recognizing and 
reimplementing when something might have to deal with an arbitrarily large 
number of rows, which means organizing that code to deal with a “streaming” 
pattern where you never have all the rows in memory at once - on other projects 
I’ve had tasks that would normally take about a day, but in order to organize 
it to “scale”, took weeks - such as being a

Re: [openstack-dev] memory usage in devstack-gate (the oom-killer strikes again)

2014-09-09 Thread Mike Bayer
Yes.  Guppy seems to have some nicer string formatting for this dump as well, 
but I was unable to figure out how to get that string format to write to a 
file; it seems like the tool is very geared towards interactive console use.   
We should pick a nice memory formatter we like, there’s a bunch of them, and 
then add it to our standard toolset.


On Sep 9, 2014, at 10:35 AM, Doug Hellmann  wrote:

> 
> On Sep 8, 2014, at 8:12 PM, Mike Bayer  wrote:
> 
>> Hi All - 
>> 
>> Joe had me do some quick memory profiling on nova, just an FYI if anyone 
>> wants to play with this technique, I place a little bit of memory profiling 
>> code using Guppy into nova/api/__init__.py, or anywhere in your favorite app 
>> that will definitely get imported when the thing first runs:
>> 
>> from guppy import hpy
>> import signal
>> import datetime
>> 
>> def handler(signum, frame):
>>     print "guppy memory dump"
>> 
>>     fname = "/tmp/memory_%s.txt" % \
>>         datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
>>     prof = hpy().heap()
>>     with open(fname, 'w') as handle:
>>         prof.dump(handle)
>>     del prof
>> 
>> signal.signal(signal.SIGUSR2, handler)
> 
> This looks like something we could build into our standard service startup 
> code. Maybe in 
> http://git.openstack.org/cgit/openstack/oslo-incubator/tree/openstack/common/service.py
>  for example?
> 
> Doug
> 
>> 
>> 
>> 
>> Then, run nova-api, run some API calls, then you hit the nova-api process 
>> with a SIGUSR2 signal, and it will dump a profile into /tmp/ like this:
>> 
>> http://paste.openstack.org/show/108536/
>> 
>> Now obviously everyone is like, oh boy memory lets go beat up SQLAlchemy 
>> again…..which is fine I can take it.  In that particular profile, there’s a 
>> bunch of SQLAlchemy stuff, but that is all structural to the classes that 
>> are mapped in Nova API, e.g. 52 classes with a total of 656 attributes 
>> mapped.   That stuff sets up once and doesn’t change.   If Nova used less 
>> ORM,  e.g. didn’t map everything, that would be less.  But in that profile 
>> there’s no “data” lying around.
>> 
>> But even if you don’t have that many objects resident, your Python process 
>> might still be using up a ton of memory.  The reason for this is that the 
>> cPython interpreter has a model where it will grab all the memory it needs 
>> to do something, a time consuming process by the way, but then it really 
>> doesn’t ever release it  (see 
>> http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm
>>  for the “classic” answer on this, things may have improved/modernized in 
>> 2.7 but I think this is still the general idea).
>> 
>> So in terms of SQLAlchemy, a good way to suck up a ton of memory all at once 
>> that probably won’t get released is to do this:
>> 
>> 1. fetching a full ORM object with all of its data
>> 
>> 2. fetching lots of them all at once
>> 
>> 
>> So to avoid doing that, the answer isn’t necessarily that simple.   The 
>> quick wins to loading full objects are to …not load the whole thing!   E.g. 
>> assuming we can get Openstack onto 0.9 in requirements.txt, we can start 
>> using load_only():
>> 
>> session.query(MyObject).options(load_only(“id”, “name”, “ip”))
>> 
>> or with any version, just load those columns - we should be using this as 
>> much as possible for any query that is row/time intensive and doesn’t need 
>> full ORM behaviors (like relationships, persistence):
>> 
>> session.query(MyObject.id, MyObject.name, MyObject.ip)
>> 
>> Another quick win, if we *really* need an ORM object, not a row, and we have 
>> to fetch a ton of them in one big result, is to fetch them using yield_per():
>> 
>>    for obj in session.query(MyObject).yield_per(100):
>>        # work with obj and then make sure to lose all references to it
>> 
>> yield_per() will dish out objects drawing from batches of the number you 
>> give it.   But it has two huge caveats: one is that it isn’t compatible with 
>> most forms of eager loading, except for many-to-one joined loads.  The other 
>> is that the DBAPI, e.g. like the MySQL driver, does *not* stream the rows; 
>> virtually all DBAPIs by default load a result set fully before you ever see 
>> the first row.  psycopg2 is one of the only DBAPIs that even offers a 
>> special mode to work around this (server side cursors).
>> 
>> Which means its even *better* to paginate result sets

Re: [openstack-dev] [all] i need some help on this bug Bug #1365892

2014-09-10 Thread Mike Bayer

On Sep 10, 2014, at 4:11 AM, Li Tianqing  wrote:

> After some research, i find the reason for the cycle reference. In closure, 
> the _fix_paswords.func_closre reference the _fix_passwords. So the
> cycle reference happened. 
> And  in https://thp.io/2012/python-gc/python_gc_final_2012-01-22.pdf page 5, 
> it says that 
> We observe that Python implementations with distinct GCs behave differently: 
> CPython does not even try to get the order of finalizers right, and
> simply puts uncollectable objects into the global list of garbage for the 
> developer to deal with manually.
> So the gc can not handle cycle reference, then the memory leak happened. 

An object is only uncollectable in Python if it has a __del__ method and is 
part of an unreachable cycle.  Based on a grep, there is only one class in 
oslo.messaging that has a __del__ and that is ConnectionContext in 
_drivers/amqp.py.   

Removing the __del__ method from this object would be the best approach.
There’s nothing wrong with reference cycles in Python as long as we don’t use 
__del__, which IMHO has no legitimate use case.   Additionally, it is very 
difficult to be 100% vigilant against reference cycles reappearing in many 
cases, but it is extremely simple to be 100% vigilant about disallowing __del__.

In this case it appears to be a “safety” in case someone uses the 
ConnectionContext object outside of being a context manager.  I’d fix that and 
require that it be used as a context manager only.   Guessing that the user is 
going to mis-use an object and provide __del__ to protect the user is a bad 
idea - and if you genuinely need this pattern, you use a weakref callback 
instead:

import weakref


class MyCleanupThing(object):
    def __init__(self):
        self.connection = connection = "Some Connection"
        self._ref = weakref.ref(
            # key: the weakref callback *cannot* refer to self
            self, lambda ref: MyCleanupThing._cleanup(connection))

    @staticmethod
    def _cleanup(connection):
        print("Cleaning up %s!" % connection)


mc = MyCleanupThing()

print("about to clean up...")
del mc
print("should be clean!")

output:

about to clean up...
Cleaning up Some Connection!
should be clean!


 


> 
> If there is something wrong, please fix it. Thanks
> 
> --
> Best
> Li Tianqing
> 
> On 2014-09-10 11:52:28, "Li Tianqing" wrote:
> Hello,
> I use backdoor of eventlet to enable gc.DEBUG_LEAK, and after wait a few 
> minutes, i can sure that there will some objects that can not be collected by 
> gc.collect in gc.garbage. 
> Those looks like this (catched in ceilometer-collector)
> 
> ['_context_auth_token', 'auth_token', 'new_pass'],
>  (... cell and function object reprs were stripped by the list archive ...),
>  ... (the same pattern repeated two more times) ...
> 
> and i suspect those code in oslo.messaging
> 
> def _safe_log(log_func, msg, msg_data):
>     """Sanitizes the msg_data field before logging."""
>     SANITIZE = ['_context_auth_token', 'auth_token', 'new_pass']
> 
>     def _fix_passwords(d):
>         """Sanitizes the password fields in the dictionary."""
>         for k in d.iterkeys():
>             if k.lower().find('password') != -1:
>                 d[k] = ''
>             elif k.lower() in SANITIZE:
>                 d[k] = ''
>             elif isinstance(d[k], dict):
>                 _fix_passwords(d[k])
>         return d
> 
>     return log_func(msg, _fix_passwords(copy.deepcopy(msg_data)))
> 
> i can resolve this problem by add _fix_passwords = None before _safe_log 
> returns.
> 
> But i do not really understand why this problem happened, and in depth why 
> the gc can not collect those object. Although i can make those uncollectable 
> objects disappeared.
> But this is not good enough, because if you do not understand it you will 
> write out some code like this in future, and then also has memory leak too.
> 
> So can some one helps me give some detailed on recursive closure used like 
> the code above, and on why gc can not collect them.
> Thanks a lot lot ..
> 
> --
> Best
> Li Tianqing
> 
> 
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Re: [openstack-dev] [neutron][all] switch from mysqldb to another eventlet aware mysql client

2014-09-12 Thread Mike Bayer

On Sep 12, 2014, at 7:20 AM, Sean Dague  wrote:

> 
> Because we are in Feature Freeze. Now is the time for critical bug fixes
> only, as we start to stabalize the tree. Releasing dependent libraries
> that can cause breaks, for whatever reason, should be soundly avoided.
> 
> If this was August, fine. But it's feature freeze.

I agree with this, changing the MySQL driver now is not an option.   That 
train has left the station; I think it’s better we all take the whole Kilo 
cycle to get used to mysql-connector and its quirks before launching it on the 
world, as there will be many more.

However for Kilo, I think those “COMMIT” phrases should be removed and overall 
we need to make a very hard and fast rule that we *do not put multiple 
statements in an execute*.   I’ve seen a bunch of these come through so far, 
and for some of them (more the in-Python ones) it seems like the underlying 
reason is a lack of understanding of what exactly a SQLAlchemy “Engine” is and 
what features it supports.

So first, let me point folks to the documentation for this, which anyone 
writing code involving Engine objects should read first:

http://docs.sqlalchemy.org/en/rel_0_9/core/connections.html

Key to this is that while engine supports an “.execute()” method, in order to 
do anything that intends to work on a single connection and typically a single 
transaction, you procure a Connection and usually a Transaction from the 
Engine, most easily like this:

with engine.begin() as conn:
    conn.execute(statement 1)
    conn.execute(statement 2)
    conn.execute(statement 3)
    # ... etc


Now let me apologize for the reason this misunderstanding exists in the first 
place:  it’s because in 2005 I put the “.execute()” convenience method on the 
Engine itself (well in fact we didn’t have the Engine/Connection dichotomy back 
then), and I also thought that “implicit execution”, e.g. statement.execute(), 
would be a great idea.   Tons of other people still think it’s a great idea 
and even though I’ve buried this whole thing in the docs, they still use it 
like candy….until they have the need to control the scope of connectivity.  

*Huge* mistake, it’s my fault, but not something that can really be changed 
now.   Also, in 2005, Python didn’t have context managers.   So we have all 
kinds of klunky patterns like “trans = conn.begin()”, kind of J2EE style, etc., 
but these days, the above pattern is your best bet when you want to invoke 
multiple statements.   engine.execute() overall should just be avoided as it 
only leads to misunderstanding.   When we all move all of our migrate stuff to 
Alembic, there won’t be an Engine provided to a migration script, it will be a 
Connection to start with.






[openstack-dev] battling stale .pyc files

2014-09-12 Thread Mike Bayer
I’ve just found https://bugs.launchpad.net/nova/+bug/1368661, "Unit tests 
sometimes fail because of stale pyc files”.

The issue as stated in the report refers to the phenomenon of .pyc files that 
remain inappropriately when switching branches or deleting files.

Specifically, the kind of scenario that in my experience causes this looks like 
this.  One version of the code has a setup like this:

   mylibrary/mypackage/somemodule/__init__.py

Then, a different version we switch to changes it to this:

   mylibrary/mypackage/somemodule.py

But somemodule/__init__.pyc will still be sitting around, and then things break 
- the Python interpreter skips the module (or perhaps the other way around. I 
just ran a test by hand and it seems like packages trump modules in Python 2.7).

This is an issue for sure; however, the fix that is proposed I find alarming, 
which is to use the PYTHONDONTWRITEBYTECODE=1 flag written directly into the 
tox.ini file to disable *all* .pyc file writing, for all environments 
unconditionally, both human and automated.

I think that approach is a mistake.  .pyc files have a definite effect on the 
behavior of the interpreter.   They can, for example, be the factor that causes 
a dictionary to order its elements in one way versus another;  I’ve had many 
relying-on-dictionary-ordering issues (which make no mistake, are bugs) smoked 
out by the fact that a .pyc file would reveal the issue.   .pyc files also 
naturally have a profound effect on performance.   I’d hate for the Openstack 
community to just forget that .pyc files ever existed, our tox.ini’s safely 
protecting us from them, and then we start seeing profiling results getting 
published that forgot to run the Python interpreter in its normal state of 
operation.  If we put this flag into every tox.ini, it means the totality of 
openstack testing will not only run more slowly, it also means our code will 
never be run within the Python runtime environment that will actually be used 
when code is shipped.   The Python interpreter is incredibly stable and 
predictable and a small change like this is hardly something that we’d usually 
notice…until something worth noticing actually goes wrong, and automated 
testing is where that should be found, not after shipment.

The issue of the occasional unmatched .pyc file whose name happens to still be 
imported by the application is not that frequent, and can be solved by just 
making sure unmatched .pyc files are deleted ahead of time.   I’d favor a 
utility, perhaps in oslo.utils, which performs this simple step of finding all 
unmatched .pyc files and deleting them (taking care to be aware of __pycache__ / 
PEP 3147), and which can be invoked from tox.ini as a startup command.
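
That utility amounts to not much more than this (a sketch; the function name 
and its eventual home in oslo.utils are just a suggestion):

import os


def purge_orphaned_pyc(root):
    """Delete .pyc files whose corresponding .py source no longer exists."""
    for dirpath, dirnames, filenames in os.walk(root):
        in_pycache = os.path.basename(dirpath) == "__pycache__"
        # PEP 3147: __pycache__/foo.cpython-27.pyc pairs with ../foo.py
        source_dir = os.path.dirname(dirpath) if in_pycache else dirpath
        for fname in filenames:
            if not fname.endswith(".pyc"):
                continue
            source = os.path.join(source_dir, fname.split(".")[0] + ".py")
            if not os.path.exists(source):
                os.remove(os.path.join(dirpath, fname))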

But guess what - suppose you totally disagree and you really want to not have 
any .pyc files in your dev environment.   Simple!  Put 
PYTHONDONTWRITEBYTECODE=1 into *your* environment - it doesn’t need to be in 
tox.ini, just stick it in your .profile.   Let’s put it up on the wikis, let’s 
put it into the dev guides, let’s go nuts.   Banish .pyc files from your 
machine all you like.   But let’s *not* do this on our automated test 
environments, and not force it to happen in *my* environment. 

I also want to note that the issue of stale .pyc files should only apply 
within the library subject to testing, as it lives in its source directory.  
This has nothing to do with the packages that are installed under .tox as those 
are full packages, unless there’s some use case I’m not aware of (possible), we 
don’t checkout code into .tox nor do we manipulate files there as a matter of 
course.

Just my 2.5c on this issue as to the approach I think is best.   Leave the 
Python interpreter’s behavior as close to “normal” as possible in our default 
test environment.


Re: [openstack-dev] [all] PYTHONDONTWRITEBYTECODE=true in tox.ini

2014-09-12 Thread Mike Bayer

On Sep 12, 2014, at 7:39 AM, Sean Dague  wrote:

> I assume you, gentle OpenStack developers, often find yourself in a hair
> tearing out moment of frustration about why local unit tests are doing
> completely insane things. The code that it is stack tracing on is no
> where to be found, and yet it fails.
> 
> And then you realize that part of oslo doesn't exist any more
> except there are still pyc files laying around. Gah!
> 
> I've proposed the following to Nova and Python novaclient -
> https://review.openstack.org/#/c/121044/
> 
> Which sets PYTHONDONTWRITEBYTECODE=true in the unit tests.

My VPN was down so I didn’t get this thread until just now, but I am strongly -1 on 
this as added to tox.ini; my response is 
http://lists.openstack.org/pipermail/openstack-dev/2014-September/045873.html.

Short answer: if you want this feature, put PYTHONDONTWRITEBYTECODE into *your* 
environment.  Don’t force it on our automated tests or on my environment.   
.pyc files make a difference in behavior, and if we banish them from all 
testing, then our code is never tested within the environment that it will 
normally be run in after shipment.

I’d far prefer a simple script added to tox.ini which deletes orphaned .pyc 
files only, if a change to tox.ini must be made.






Re: [openstack-dev] [neutron][all] switch from mysqldb to another eventlet aware mysql client

2014-09-12 Thread Mike Bayer

On Sep 12, 2014, at 10:40 AM, Ihar Hrachyshka  wrote:

> Signed PGP part
> On 12/09/14 16:33, Mike Bayer wrote:
>> I agree with this, changing the MySQL driver now is not an option.
> 
> That was not the proposal. The proposal was to introduce support to
> run against something different from MySQLdb + a gate job for that
> alternative. The next cycle was supposed to do thorough regression
> testing, benchmarking, etc. to decide whether we're ok to recommend
> that alternative to users.

Ah, well that is a great idea.  But we can have that throughout Kilo anyway, 
why not?







Re: [openstack-dev] [all] PYTHONDONTWRITEBYTECODE=true in tox.ini

2014-09-12 Thread Mike Bayer

On Sep 12, 2014, at 11:24 AM, Sean Dague  wrote:

> On 09/12/2014 11:21 AM, Mike Bayer wrote:
>> 
>> On Sep 12, 2014, at 7:39 AM, Sean Dague  wrote:
>> 
>>> I assume you, gentle OpenStack developers, often find yourself in a hair
>>> tearing out moment of frustration about why local unit tests are doing
>>> completely insane things. The code that it is stack tracing on is no
>>> where to be found, and yet it fails.
>>> 
>>> And then you realize that part of oslo doesn't exist any more
>>> except there are still pyc files laying around. Gah!
>>> 
>>> I've proposed the following to Nova and Python novaclient -
>>> https://review.openstack.org/#/c/121044/
>>> 
>>> Which sets PYTHONDONTWRITEBYTECODE=true in the unit tests.
>> 
>> my VPN was down and I didn’t get this thread just now, but I am strongly -1 
>> on this as added to tox.ini, my response is 
>> http://lists.openstack.org/pipermail/openstack-dev/2014-September/045873.html.
>> 
>> Short answer: if you want this feature, put PYTHONDONTWRITEBYTECODE into 
>> *your* environment.  Don’t force it on our automated tests or on my 
>> environment.   .pyc files make a difference in behavior, and if we banish 
>> them from all testing, then our code is never tested within the environment 
>> that it will normally be run in after shipment.
>> 
>> I’d far prefer a simple script added to tox.ini which deletes orphaned .pyc 
>> files only, if a change to tox.ini must be made.
> 
> Your example in the other thread includes the random seed behavior,
> which is already addressed in new tox. So I don't see that as an issue.

Will these patches all be accompanied by corresponding PYTHONHASHSEED settings? 
  Also why don’t you want to place PYTHONDONTWRITEBYTECODE into your own 
environment?   I don’t want this flag on my machine.







Re: [openstack-dev] [all] PYTHONDONTWRITEBYTECODE=true in tox.ini

2014-09-12 Thread Mike Bayer

On Sep 12, 2014, at 11:33 AM, Mike Bayer  wrote:

> 
> On Sep 12, 2014, at 11:24 AM, Sean Dague  wrote:
> 
>> On 09/12/2014 11:21 AM, Mike Bayer wrote:
>>> 
>>> On Sep 12, 2014, at 7:39 AM, Sean Dague  wrote:
>>> 
>>>> I assume you, gentle OpenStack developers, often find yourself in a hair
>>>> tearing out moment of frustration about why local unit tests are doing
>>>> completely insane things. The code that it is stack tracing on is no
>>>> where to be found, and yet it fails.
>>>> 
>>>> And then you realize that part of oslo doesn't exist any more
>>>> except there are still pyc files laying around. Gah!
>>>> 
>>>> I've proposed the following to Nova and Python novaclient -
>>>> https://review.openstack.org/#/c/121044/
>>>> 
>>>> Which sets PYTHONDONTWRITEBYTECODE=true in the unit tests.
>>> 
>>> my VPN was down and I didn’t get this thread just now, but I am strongly -1 
>>> on this as added to tox.ini, my response is 
>>> http://lists.openstack.org/pipermail/openstack-dev/2014-September/045873.html.
>>> 
>>> Short answer: if you want this feature, put PYTHONDONTWRITEBYTECODE into 
>>> *your* environment.  Don’t force it on our automated tests or on my 
>>> environment.   .pyc files make a difference in behavior, and if we banish 
>>> them from all testing, then our code is never tested within the environment 
>>> that it will normally be run in after shipment.
>>> 
>>> I’d far prefer a simple script added to tox.ini which deletes orphaned .pyc 
>>> files only, if a change to tox.ini must be made.
>> 
>> Your example in the other thread includes the random seed behavior,
>> which is already addressed in new tox. So I don't see that as an issue.
> 
> Will these patches all be accompanied by corresponding PYTHONHASHSEED 
> settings?   Also why don’t you want to place PYTHONDONTWRITEBYTECODE into 
> your own environment?I don’t want this flag on my machine.

not to mention PYTHONHASHSEED only works on Python 3.  What is the issue in tox 
you’re referring to?






Re: [openstack-dev] [neutron][all] switch from mysqldb to another eventlet aware mysql client

2014-09-12 Thread Mike Bayer

On Sep 12, 2014, at 11:56 AM, Johannes Erdfelt  wrote:

> On Fri, Sep 12, 2014, Doug Hellmann  wrote:
>> I don’t think we will want to retroactively change the migration scripts
>> (that’s not something we generally like to do),
> 
> We don't allow semantic changes to migration scripts since people who
> have already run it won't get those changes. However, we haven't been
> shy about fixing bugs that prevent the migration script from running
> (which this change would probably fall into).

Fortunately, BEGIN / COMMIT are not semantic directives. The migrations 
semantically indicated by the script are unaffected in any way by these 
run-environment settings.


> 
>> so we should look at changes needed to make sqlalchemy-migrate deal with
>> them (by ignoring them, or working around the errors, or whatever).
> 
> That said, I agree that sqlalchemy-migrate shouldn't be changing in a
> non-backwards compatible way.

On the sqlalchemy-migrate side, the handling of its ill-conceived “SQL script” 
feature can be further mitigated here by parsing for the “COMMIT” line when it 
breaks out the SQL and ignoring it; I’d favor having it emit a warning as well.
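
(A rough sketch of that mitigation, assuming a plain line-based pass over the 
raw SQL; the function name is made up and this is not sqlalchemy-migrate code:)

    import warnings

    def strip_transaction_directives(sql_script):
        """Drop bare BEGIN/COMMIT lines from a raw SQL migration script."""
        kept = []
        for line in sql_script.splitlines():
            normalized = line.strip().rstrip(";").strip().upper()
            if normalized in ("BEGIN", "COMMIT"):
                # warn so the script author knows the directive was ignored
                warnings.warn(
                    "ignoring transaction directive in SQL script: %r" % line)
                continue
            kept.append(line)
        return "\n".join(kept)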
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] PYTHONDONTWRITEBYTECODE=true in tox.ini

2014-09-12 Thread Mike Bayer

On Sep 12, 2014, at 11:55 AM, Julien Danjou  wrote:

> On Fri, Sep 12 2014, Sean Dague wrote:
> 
>> Which sets PYTHONDONTWRITEBYTECODE=true in the unit tests.
>> 
>> This prevents pyc files from being writen in your git tree (win!). It
>> doesn't seem to impact what pip installs... and if anyone knows how to
>> prevent those pyc files from getting created, that would be great.
>> 
>> But it's something which hopefully causes less perceived developer
>> fragility of the system.
> 
> I understand it's generating .pyc could be something, but I don't really
> like that patch.
> 
> I guess the problem is more likely that testrepository loads the tests
> from the source directory, whereas maybe we could make it load them from
> what's installed into the venv?

We do this in oslo.db by doing an install within tox.ini and then making sure 
we don’t set usedevelop.  However, oslo.db does this because there are issues 
with using namespace packages (e.g. oslo/db, oslo/utils, etc.) when you mix 
regular installs with develop installations.   I hate it, and I’d like to 
someday solve that problem differently (locally I will often manually craft a 
test environment with PYTHONPATH just to avoid it).  I run different test 
runners based on what I’m trying to do, and I skip tox for running individual 
tests, so this behavior gets in my way constantly, whereas the .pyc file issue 
almost never does.

So IMO this approach, as a means to get around the infrequent .pyc annoyance, 
introduces a lot more inconvenience than it saves.
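
(For reference, the general tox.ini shape being described looks roughly like 
this; it is a sketch rather than oslo.db’s actual file, and the test command 
is illustrative only:)

    # sketch - with usedevelop left at false, tox builds an sdist and installs
    # it into the virtualenv, so the tests import the installed package rather
    # than the source checkout (and its stray .pyc files)
    [testenv]
    usedevelop = False
    deps = -r{toxinidir}/test-requirements.txt
    commands = python -m unittest discover {posargs}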












> 
> -- 
> Julien Danjou
> /* Free Software hacker
>   http://julien.danjou.info */


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] PYTHONDONTWRITEBYTECODE=true in tox.ini

2014-09-12 Thread Mike Bayer

On Sep 12, 2014, at 12:03 PM, Sean Dague  wrote:

> On 09/12/2014 11:33 AM, Mike Bayer wrote:
>> 
>> On Sep 12, 2014, at 11:24 AM, Sean Dague  wrote:
>> 
>>> On 09/12/2014 11:21 AM, Mike Bayer wrote:
>>>> 
>>>> On Sep 12, 2014, at 7:39 AM, Sean Dague  wrote:
>>>> 
>>>>> I assume you, gentle OpenStack developers, often find yourself in a hair
>>>>> tearing out moment of frustration about why local unit tests are doing
>>>>> completely insane things. The code that it is stack tracing on is no
>>>>> where to be found, and yet it fails.
>>>>> 
>>>>> And then you realize that part of oslo doesn't exist any more
>>>>> except there are still pyc files laying around. Gah!
>>>>> 
>>>>> I've proposed the following to Nova and Python novaclient -
>>>>> https://review.openstack.org/#/c/121044/
>>>>> 
>>>>> Which sets PYTHONDONTWRITEBYTECODE=true in the unit tests.
>>>> 
>>>> my VPN was down and I didn’t get this thread just now, but I am strongly 
>>>> -1 on this as added to tox.ini, my response is 
>>>> http://lists.openstack.org/pipermail/openstack-dev/2014-September/045873.html.
>>>> 
>>>> Short answer: if you want this feature, put PYTHONDONTWRITEBYTECODE into 
>>>> *your* environment.  Don’t force it on our automated tests or on my 
>>>> environment.   .pyc files make a difference in behavior, and if we banish 
>>>> them from all testing, then our code is never tested within the 
>>>> environment that it will normally be run in after shipment.
>>>> 
>>>> I’d far prefer a simple script added to tox.ini which deletes orphaned 
>>>> .pyc files only, if a change to tox.ini must be made.
>>> 
>>> Your example in the other thread includes the random seed behavior,
>>> which is already addressed in new tox. So I don't see that as an issue.
>> 
>> Will these patches all be accompanied by corresponding PYTHONHASHSEED 
>> settings?   Also why don’t you want to place PYTHONDONTWRITEBYTECODE into 
>> your own environment?I don’t want this flag on my machine.
> 
> This was the set of tox changes that went in in August.

corresponding to PYTHONHASHSEED, right?  That whole thing is Python 3 only.



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] PYTHONDONTWRITEBYTECODE=true in tox.ini

2014-09-12 Thread Mike Bayer

On Sep 12, 2014, at 12:13 PM, Jeremy Stanley  wrote:

> On 2014-09-12 11:36:20 -0400 (-0400), Mike Bayer wrote:
> [...]
>> not to mention PYTHONHASHSEED only works on Python 3.  What is the
>> issue in tox you’re referring to ?
> 
> Huh? The overrides we added to numerous projects' tox.ini files to
> stem the breakage in Python 2.x unit tests from hash seed
> randomization in newer tox releases would seem to contradict your
> assertion. Also documentation...
> 
> https://docs.python.org/2.7/using/cmdline.html#envvar-PYTHONHASHSEED
> 
> (New in version 2.6.8.)

Python 3’s documentation says “new in version 3.2.3”, so it’s confusing that 
they backported it to 2.6 at the same time, but Google searches tend to point 
you right here:

https://docs.python.org/3.3/using/cmdline.html#envvar-PYTHONHASHSEED



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] PYTHONDONTWRITEBYTECODE=true in tox.ini

2014-09-12 Thread Mike Bayer

On Sep 12, 2014, at 12:29 PM, Daniel P. Berrange  wrote:

> On Fri, Sep 12, 2014 at 04:23:09PM +, Jeremy Stanley wrote:
>> On 2014-09-12 17:16:11 +0100 (+0100), Daniel P. Berrange wrote:
>> [...]
>>> Agreed, the problem with stale .pyc files is that it never occurs to
>>> developers that .pyc files are causing the problem until after you've
>>> wasted (potentially hours of) time debugging the problem. Avoiding
>>> this pain for all developers out of the box is a clear win overall
>>> and makes openstack development less painful.
>> 
>> I've been bitten by similar issues often enough that I regularly git
>> clean -dfx my checkouts or at least pass -r to tox so that it will
>> recreate its virtualenvs from scratch. Yes it does add some extra
>> time to the next test run, but you can iterate fairly tightly after
>> that as long as you're not actively moving stuff around while you
>> troubleshoot (and coupled with a git hook like Doug described for
>> cleaning on topic branch changes would be a huge boon as well).
> 
> I'm not debating whether there are ways to clean up your env to avoid
> the problem /after/ it occurs. The point is to stop the problem occuring
> in the first place to avoid placing this uneccessary clean up burden
> on devs.  Intentionally leaving things setup so that contributors hit
> bugs like stale .pyc files is just user hostile.

If we’re going to start diluting the test environment to suit developer 
environments, then the CI builds should use a different tox target that does 
*not* specify this environment variable.



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Octavia] Which threading library?

2014-09-13 Thread Mike Bayer

On Sep 12, 2014, at 5:53 PM, Eichberger, German  
wrote:

> Hi,
>  
> I think the “standard” threading library for OpenStack is eventlet – however, 
> it seems that Oslo is spearheading efforts to get to a more compatible one 
> (see http://techs.enovance.com/6562/asyncio-openstack-python3) I am now 
> wondering since we are starting fresh if we should embrace (a potential) 
> future or stick with eventlet and all its flaws?

The so-called “flaws” of implicit async are up for debate.    A “flaw” of 
asyncio is that it requires a full rewrite of code that uses it, as well as of 
all libraries it consumes that also happen to do IO.    This includes the 
particularly salient point that none of the Python database APIs that exist, 
other than psycopg2, have any support for true non-blocking code at the IO 
level without using monkeypatching.  The standard Python DBAPI (PEP 249) has 
no support for explicit async, so any APIs that do provide it are ad-hoc (see 
http://initd.org/psycopg/docs/advanced.html#async-support for psycopg2’s very 
well done, but entirely non-standard, API in this regard).
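
(To give a sense of what that non-standard explicit-async usage looks like, 
here is a condensed sketch along the lines of the psycopg2 advanced-usage docs 
linked above; the connection parameters are placeholders:)

    import select

    import psycopg2
    import psycopg2.extensions

    def wait(conn):
        # explicit async: poll the connection and wait on its socket until the
        # pending operation finishes; this loop replaces PEP 249's blocking calls
        while True:
            state = conn.poll()
            if state == psycopg2.extensions.POLL_OK:
                return
            elif state == psycopg2.extensions.POLL_READ:
                select.select([conn.fileno()], [], [])
            elif state == psycopg2.extensions.POLL_WRITE:
                select.select([], [conn.fileno()], [])
            else:
                raise psycopg2.OperationalError("bad poll state: %r" % state)

    # async_=1 (spelled async=1 on older psycopg2) enables the non-standard
    # asynchronous mode; the DSN here is a placeholder
    conn = psycopg2.connect("dbname=test user=test", async_=1)
    wait(conn)
    cur = conn.cursor()
    cur.execute("SELECT 1")
    wait(conn)
    print(cur.fetchone())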

Once you’re on explicit async, simple imperative code that happens to make 
calls which imply IO is no longer possible.   This is kind of a blocker to 
end-to-end integration of asyncio in all of Openstack, rather than just making 
use of it in those areas where it is already directly applicable.

This has been discussed in depth at 
lists.openstack.org/pipermail/openstack-dev/2014-July/039291.html.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] battling stale .pyc files

2014-09-15 Thread Mike Bayer

On Sep 15, 2014, at 7:34 AM, Lucas Alvares Gomes  wrote:

> 
> So this ordering thing, I don't think that it's caused by the
> PYTHONDONTWRITEBYTECODE, I googled that but couldn't find anything
> relating this option to the way python hash things (please point me to
> a document/code if I'm wrong). Are you sure you're not confusing it
> with the PYTHONHASHSEED option?


Not at all, as I have many years of experience with this phenomenon, but the 
fact that PYTHONHASHSEED can be used means this particular point is probably 
moot.

> 
> 
> About the performance, this also doesn't seem to be true. I don't
> think .pyc affects the performance we run things at all, pyc are not
> meant to be an optimization in python. It DOES affect the startup of
> the application tho, because it will have to regenerate the bytecode
> all the time, see [4]:

It is true in the case where an application consumes Python modules in a 
non-standard way, such that the .py/.pyc files may be read more frequently than 
just once, depending on how the library is used.   I happen to be the author of 
two such libraries: one is Mako templates and the other is Alembic, which is 
used by many OpenStack projects.

If a test suite for example runs through all of its Alembic migration scripts 
on every setup/teardown of a test, the presence of .pyc files generated for 
migration scripts will have a definite effect on the speed of the tests.  
Alembic does not place the modules into sys.modules and loads them as they are 
processed for a migration operation.

Within Alembic, there are all kinds of features involving .pyc files, such as 
sourceless setups and non-pyc setups, so to that extent Alembic has behaviors 
that are explicitly linked to whether or not .pyc files are present.





___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Please do *NOT* use "vendorized" versions of anything (here: glanceclient using requests.packages.urllib3)

2014-09-17 Thread Mike Bayer

On Sep 17, 2014, at 2:46 PM, Clint Byrum  wrote:

> Excerpts from Davanum Srinivas's message of 2014-09-17 10:15:29 -0700:
>> I was trying request-ifying oslo.vmware and ran into this as well:
>> https://review.openstack.org/#/c/121956/
>> 
>> And we don't seem to have urllib3 in global-requirements either.
>> Should we do that first?
> 
> Honestly, after reading this:
> 
> https://github.com/kennethreitz/requests/pull/1812
> 
> I think we might want to consider requests a poor option. Its author
> clearly doesn't understand the role a _library_ plays in software
> development and considers requests an application, not a library.
> 
> For instance, why is requests exposing internal implementation details
> at all?  It should be wrapping any exceptions or objects to avoid
> forcing users to make this choice at all.

That link is horrifying.   I’m really surprised Requests does this, and that 
nobody has complained very loudly about it.   It’s wrong on every level, not the 
least of which is the huge security implications.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Please do *NOT* use "vendorized" versions of anything (here: glanceclient using requests.packages.urllib3)

2014-09-17 Thread Mike Bayer

On Sep 17, 2014, at 3:42 PM, Ian Cordasco  wrote:

> 
> Circling back to the issue of vendoring though: it’s a conscious decision
> to do this, and in the last two years there have been 2 CVEs reported for
> requests. There have been none for urllib3 and none for chardet. (Frankly
> I don’t think either urllib3 or chardet have had any CVEs reported against
> them, but let’s ignore that for now.) While security is typically the
> chief concern with vendoring, none of the libraries we use have had
> security issues rendering it a moot point in my opinion.

That’s just amazing.  Requests actually deals with security features 
*directly*: certificates, TLS connections, everything; an attitude of 
“well, there’ve been hardly any security issues in a *whole two years*, so 
I’m not so concerned” is really not one that is acceptable to serious 
development teams.

Wouldn’t it be a problem for *you* if Requests itself were vendored?   You fix 
a major security hole, but your consuming projects don’t respond; their 
developers are on vacation, sorry, so that hole just keeps right on going.   
People make sure to upgrade their Requests libraries locally, but all those 
poor saps who have *no idea* that widely used apps are bundling it silently 
remain totally open to vulnerabilities, and the black hats have Disneyland at 
their disposal.   The blame keeps going right back to you as well.  Is 
that really how things should be done?



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Please do *NOT* use "vendorized" versions of anything (here: glanceclient using requests.packages.urllib3)

2014-09-17 Thread Mike Bayer

On Sep 17, 2014, at 4:31 PM, Ian Cordasco  wrote:

> Project X pins a version of requests. Alice doesn’t know anything about
> requests and does pip install X. Until Alice takes a more active role in
> the development of Project X and looks into requests, she will never know
> she’s installed software that has exposures in it.

If a vulnerability is reported in urllib3 1.9.1, Alice, as well as me and 
everyone else who is not a novice, will at least know we need to run:

$ pip show urllib3 
---
Name: urllib3
Version: 1.9.1
Location: 
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages
Requires: 


and we know right there that we have to upgrade.  We upgrade, and we’re done.    If 
we see that some library is pinning it, we will know.  We will complain loudly 
to that library’s author and/or replace that library.   The tools are there to 
give us what we need to be aware of the problem and to escalate it.

When a library silently bundles the source code and bypasses any normal means 
of our knowing it’s present, short of reading the source code or scouring the 
documentation, we have no way to know we’re affected.    Some applications, 
particularly pip, have to do this; however, it should only be for technical 
reasons.  It should not be because you don’t want novice users to have to learn 
something, or because you’re angling to have lots of downloads on PyPI.


>> People make sure to upgrade their Requests libraries locally, but for all
>> those poor saps who have *no idea* they have widely used apps that are
>> bundling it silently, they remain totally open to vulnerabilities and the
>> black hats have disneyland at their disposal.
> 
> I think more applications bundle it than you realize. You’re likely using
> one daily that does it.


SQLAlchemy itself vendorizes Queue and some fragments of six, but that is of a 
much smaller scale, and is for technical reasons, rather than appeasing-newbie 
reasons.   But HTTP has a lot of security-critical surface area.   If I were to 
just bundle my own fork of an HMAC library with a few of my own special 
enhancements, that would be seen as a problem.


> And yeah, we’ll continue to take the blame for the mistake that was made
> for those two exposures. As for “Is that how things should be done?”
> that’s not for me to answer. More than enough projects do it and do it out
> of necessity. The reality is that by vendoring its dependencies, requests
> allows its users more flexibility than other projects.

I haven’t seen the technical reason for Requests doing this; I’ve only seen 
this one: “I want my users to be free to not use packaging if they don't 
won't to. They can just grab the tarball and go.”   If that’s really the only 
reason, then I fail to see how it has anything to do with flexibility, 
other than the flexibility to remain lazy and ignorant of basic computer 
programming skills - and Requests is a library *for programmers*; it doesn’t 
do anything without typing code.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron][db] Ensuring db isolation between tests

2014-09-18 Thread Mike Bayer
I’ve done a lot of work on this issue, and from my perspective the code is 
mostly ready to go; however, we’re in an extended phase of getting folks to sign 
off, and I’m also waiting for some last-minute fixup from Robert Collins. 
  Patch: [1].  Blueprint, which is to be moved to Kilo: [2].

A “nested” transaction can actually mean one of two things, either a real 
SAVEPOINT, or a logical transaction block.  However, either kind of block will 
be removed if a ROLLBACK is emitted, which also rolls back the actual 
transactional state all the way to the end.The patch above makes this work 
as the result of two fixes.  One is that I replaced the system by which we do 
transactions with the pysqlite driver so that SAVEPOINT actually works [3].  
The other is that I created a comprehensive fixture in [1] that will maintain a 
transaction + savepoint block at all times, working smoothly with tests that 
call commit or rollback any number of times.
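
(For readers who want the general shape of that fixture, here is a rough sketch 
adapted from the standard SQLAlchemy “join a Session into an external 
transaction” recipe; it is not the oslo.db fixture itself, and the helper names 
are made up:)

    import sqlalchemy as sa
    from sqlalchemy import event, orm

    engine = sa.create_engine("sqlite://")

    # pysqlite workaround so that SAVEPOINT actually works: stop the driver
    # from issuing BEGIN implicitly and emit it ourselves.
    @event.listens_for(engine, "connect")
    def _disable_pysqlite_begin(dbapi_connection, connection_record):
        dbapi_connection.isolation_level = None

    @event.listens_for(engine, "begin")
    def _emit_our_own_begin(conn):
        conn.execute("BEGIN")

    def setup_test():
        # enclosing transaction plus SAVEPOINT; test code may call commit()
        # or rollback() freely and only ever affects the SAVEPOINT
        connection = engine.connect()
        outer = connection.begin()
        session = orm.Session(bind=connection)
        session.begin_nested()

        @event.listens_for(session, "after_transaction_end")
        def _restart_savepoint(sess, trans):
            # when the SAVEPOINT ends, immediately open a new one so the test
            # keeps running inside a nested block at all times
            if trans.nested and not trans._parent.nested:
                sess.begin_nested()

        return connection, outer, session

    def teardown_test(connection, outer, session):
        # everything the test did disappears when the outer transaction rolls back
        session.close()
        outer.rollback()
        connection.close()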

From an isolation perspective, we create on-demand databases per process, so 
that each serial test process uses a distinct database per backend.   The 
entire process is managed by testresources which will organize the order of 
tests to most effectively leave a single schema in place with minimal 
teardown/setup.

I’m hoping that my patches can go right in at the top of Kilo and we can begin 
rolling it out in projects that are participating in oslo.db, with the hopes 
that consuming projects will be able to remove a lot of boilerplate 
setup/teardown code. 


1: https://review.openstack.org/#/c/120870/  
2: https://review.openstack.org/#/c/117335/ 
3: https://review.openstack.org/#/c/113152/


On Sep 18, 2014, at 5:59 AM, Salvatore Orlando  wrote:

> Nested commits in sqlalchemy should be seen as a single transaction on the 
> backend, shouldn't they?
> I don't know anything about this specific problem, but the fact that unit 
> tests use sqlite might be a reason, since it's not really a full DBMS...
> 
> I think that wrapping tests in transaction also will require some changes in 
> the architecture of the tests themselves, as many tests call the API router 
> or the plugin which then gets a db session and open a new transaction. 
> Furthermore, it will change the test behaviour possibly hiding errors; some 
> operations indeed perform several distinct transactions, which in this case 
> will be seen a single transaction.
> 
> What Kevin is doing here I think was the original way we used to do that in 
> Neutron (Folsom). Then at some point we realised that due to plugin schema 
> differences we were leaving tables behind and switched to drop_all and 
> rebuilding the schema using autogeneration at each test.
> 
> I think it should be ok to merge this patch. I will hold off the +A to give 
> other core reviewers a chance to look at it.
> 
> Salvatore
> 
> 
> On 18 September 2014 11:44, Maru Newby  wrote:
> For legacy reasons, the Neutron test suite creates and destroys a db for each 
> test.  There is a patch proposed to create the tables once and then ensure 
> the tables are wiped at the end of each test [1], providing a performance 
> improvement of ~10%.  I was wondering if this is the best way to provide 
> isolation, since I’ve heard that isolation via per-test transactions should 
> also work.  The patch author reported problems with this approach - 
> apparently nested commits were not being rolled back.  Is there some trick to 
> isolating with transactions that wouldn’t be immediately obvious?
> 
> Thanks,
> 
> 
> Maru
> 
> 1: https://review.openstack.org/#/c/122028/

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] oslo.db 1.1.0 released

2014-11-18 Thread Mike Bayer

> On Nov 18, 2014, at 11:47 AM, Sean Dague  wrote:
> 
> 
> Also can I request that when deprecating methods in oslo libraries we
> use a standard deprecation mechanism so that warnings are emitted when
> this method is used.

+1 for DeprecationWarnings; I noticed oslo.db doesn’t seem to use these.
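
(A minimal sketch of the kind of standard mechanism being asked for, using only 
the stdlib warnings module; the decorator name and signature are illustrative, 
not an existing oslo API:)

    import functools
    import warnings

    def deprecated(replacement=None):
        """Illustrative decorator: emit a DeprecationWarning on each call."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                msg = "%s() is deprecated" % func.__name__
                if replacement:
                    msg += "; use %s() instead" % replacement
                warnings.warn(msg, DeprecationWarning, stacklevel=2)
                return func(*args, **kwargs)
            return wrapper
        return decorator

    @deprecated(replacement="create_engine")
    def old_create_engine():
        return "engine"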





___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Mike Bayer

> On Nov 18, 2014, at 1:38 PM, Eugene Nikanorov  wrote:
> 
> Hi neutron folks,
> 
> There is an ongoing effort to refactor some neutron DB logic to be compatible 
> with galera/mysql which doesn't support locking (with_lockmode('update')).
> 
> Some code paths that used locking in the past were rewritten to retry the 
> operation if they detect that an object was modified concurrently.
> The problem here is that all DB operations (CRUD) are performed in the scope 
> of some transaction that makes complex operations to be executed in atomic 
> manner.
> For mysql the default transaction isolation level is 'REPEATABLE READ' which 
> means that once the code issue a query within a transaction, this query will 
> return the same result while in this transaction (e.g. the snapshot is taken 
> by the DB during the first query and then reused for the same query).
> In other words, the retry logic like the following will not work:
> 
> def allocate_obj():
> with session.begin(subtrans=True):
>  for i in xrange(n_retries):
>   obj = session.query(Model).filter_by(filters)
>   count = session.query(Model).filter_by(id=obj.id 
> ).update({'allocated': True})
>   if count:
>return obj
> 
> since usually methods like allocate_obj() is called from within another 
> transaction, we can't simply put transaction under 'for' loop to fix the 
> issue.

Has this been confirmed?  The point of systems like REPEATABLE READ is not just 
that you read the “old” data; it’s also to ensure that updates to that data 
either proceed or fail explicitly, and locking is also used to prevent concurrent 
access that can’t be reconciled.  A lower isolation level removes these advantages.  

I ran a simple test in two MySQL sessions as follows:

session 1:

mysql> create table some_table(data integer) engine=innodb;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into some_table(data) values (1);
Query OK, 1 row affected (0.00 sec)

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select data from some_table;
+------+
| data |
+------+
|    1 |
+------+
1 row in set (0.00 sec)


session 2:

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> update some_table set data=2 where data=1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

then back in session 1, I ran:

mysql> update some_table set data=3 where data=1;

this query blocked;  that’s because session 2 has placed a write lock on the 
table.  this is the effect of repeatable read isolation.

while it blocked, I went to session 2 and committed the in-progress transaction:

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

then session 1 unblocked, and it reported, correctly, that zero rows were 
affected:

Query OK, 0 rows affected (7.29 sec)
Rows matched: 0  Changed: 0  Warnings: 0

the update had not taken place, as was stated by “rows matched":

mysql> select * from some_table;
+------+
| data |
+------+
|    1 |
+------+
1 row in set (0.00 sec)

The code in question would do a retry at this point; it is checking the number 
of rows matched, and that number is accurate.

If our code did *not* block at the point of our UPDATE, then it would have 
proceeded, and the other transaction would have overwritten what we just did 
when it committed.   I don’t know that READ COMMITTED is necessarily any better 
here.

Now perhaps, with Galera, none of this works correctly.  That would be a 
different issue, in which case, sure, we should use whatever isolation level is 
recommended for Galera.  But I’d want to potentially peg that to whether or not 
Galera is actually in use.

I would also love to hear from Jay Pipes on this, since he literally wrote the 
book on MySQL! :)


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo.db][nova] NovaObject.save() needs its own DB transaction

2014-11-19 Thread Mike Bayer

> On Nov 19, 2014, at 11:46 AM, Matthew Booth  wrote:
> 
> We currently have a pattern in Nova where all database code lives in
> db/sqla/api.py[1]. Database transactions are only ever created or used
> in this module. This was an explicit design decision:
> https://blueprints.launchpad.net/nova/+spec/db-session-cleanup .
> 
> However, it presents a problem when we consider NovaObjects, and
> dependencies between them. For example, take Instance.save(). An
> Instance has relationships with several other object types, one of which
> is InstanceInfoCache. Consider the following code, which is amongst what
> happens in spawn():
> 
> instance = Instance.get_by_uuid(uuid)
> instance.vm_state = vm_states.ACTIVE
> instance.info_cache.network_info = new_nw_info
> instance.save()
> 
> instance.save() does (simplified):
>  self.info_cache.save()
>  self._db_save()
> 
> Both of these saves happen in separate db transactions.
> 

> I don't think we can reasonably remove the cascading save() above due to
> the deliberate design of objects. Objects don't correspond directly to
> their datamodels, so save() does more work than just calling out to the
> DB. We need a way to allow cascading object saves to happen within a
> single DB transaction.

So this is actually part of what https://review.openstack.org/#/c/125181/ aims 
to solve.If it isn’t going to achieve this (and I think I see what the 
problem is), we need to fix it.

> 
> Note that we also have a separate problem, which is that the DB api's
> internal use of transactions is wildly inconsistent. A single db api
> call can result in multiple concurrent db transactions from the same
> thread, and all the deadlocks that implies. This needs to be fixed, but
> it doesn't require changing our current assumption that DB transactions
> live only within the DB api.
> 
> Note that there is this recently approved oslo.db spec to make
> transactions more manageable:
> 
> https://review.openstack.org/#/c/125181/11/specs/kilo/make-enginefacade-a-facade.rst,cm
> 
> Again, while this will be a significant benefit to the DB api, it will
> not solve the problem of cascading object saves without allowing
> transaction management at the level of NovaObject.save(): we need to
> allow something to call a db api with an existing session, and we need
> to allow something to pass an existing db transaction to NovaObject.save().

OK, so here is why EngineFacade as described so far doesn’t work.  If it 
is like this:

def some_api_operation ->

novaobject1.save() ->

   @writer
   def do_some_db_thing()

novaobject2.save() ->

   @writer
   def do_some_other_db_thing()

then yes, those two @writer calls aren’t coordinated.   So yes, I think 
something that ultimately communicates the same meaning as @writer needs to be 
at the top:

@something_that_invokes_writer_without_exposing_db_stuff
def some_api_operation ->

# … etc

If my decorator is not clear enough, let me clarify: a decorator that is 
present at the API / nova objects layer will interact with the SQL layer through 
some form of dependency injection, and not through any kind of explicit import; 
that is, when the SQL layer is invoked, it registers some kind of state onto the 
@something_that_invokes_writer_without_exposing_db_stuff system that causes its 
“cleanup”, in this case the commit(), to occur at the end of that topmost 
decorator.
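
(A rough sketch of that dependency-injection idea, assuming a thread-local 
registry; the names are illustrative and this is not the actual EngineFacade 
implementation:)

    import functools
    import threading

    _context = threading.local()

    def writes_to_database(session_factory):
        # lives at the API / objects layer: "start unit of work, commit or
        # roll back at the end" is all it knows about the database
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                session = session_factory()
                _context.session = session
                try:
                    result = func(*args, **kwargs)
                    session.commit()
                    return result
                except Exception:
                    session.rollback()
                    raise
                finally:
                    _context.session = None
                    session.close()
            return wrapper
        return decorator

    def current_session():
        # called by the SQL layer; the session is "injected" via the registry
        # rather than passed explicitly through every method signature
        session = getattr(_context, "session", None)
        if session is None:
            raise RuntimeError("no database context is active")
        return session

With this shape, some_api_operation would carry the decorator, and both 
novaobject1.save() and novaobject2.save() would end up calling 
current_session() underneath, therefore sharing the one transaction.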


> I think the following pattern would solve it:
> 
> @remotable
> def save():
>session = 
>try:
>r = self._save(session)
>session.commit() (or reader/writer magic from oslo.db)
>return r
>except Exception:
>session.rollback() (or reader/writer magic from oslo.db)
>raise
> 
> @definitelynotremotable
> def _save(session):
>previous contents of save() move here
>session is explicitly passed to db api calls
>cascading saves call object._save(session)

So again, with the EngineFacade rewrite, the @definitelynotremotable system should 
also interact such that if @writer is invoked internally, an error is raised, 
just the same as when @writer is invoked within @reader.


> 
> Whether we wait for the oslo.db updates or not, we need something like
> the above. We could implement this today by exposing
> db.sqla.api.get_session().

EngineFacade is hoped to be ready for Kilo, and Nova is very much 
hoped to be my first customer for integration.     It would be great if folks 
want to step up and help implement it, or at least take hold of a prototype I 
can build relatively quickly, integration-test it, and/or work on a real Nova 
integration.

> 
> Thoughts?
> 
> Matt
> 
> [1] At a slight tangent, this looks like an artifact of some premature
> generalisation a few years ago. It seems unlikely that anybody is going
> to rewrite the db api using an ORM other than sqlalchemy, so we should
> probably ditch it and promote it to db/api.py.

funny you should menti

Re: [openstack-dev] [oslo.db][nova] NovaObject.save() needs its own DB transaction

2014-11-19 Thread Mike Bayer

> On Nov 19, 2014, at 12:59 PM, Boris Pavlovic  wrote:
> 
> Matthew, 
> 
> 
> LOL ORM on top of another ORM 
> 
> https://img.neoseeker.com/screenshots/TW92aWVzL0RyYW1h/inception_image33.png 
> 

I know where you stand on this, Boris, but I fail to see how this is a 
productive contribution to the discussion.  Leonardo DiCaprio isn’t going to solve 
our issue here, and I look forward to iterating on what we have today.




> 
> 
> 
> Best regards,
> Boris Pavlovic 
> 
> On Wed, Nov 19, 2014 at 8:46 PM, Matthew Booth  > wrote:
> We currently have a pattern in Nova where all database code lives in
> db/sqla/api.py[1]. Database transactions are only ever created or used
> in this module. This was an explicit design decision:
> https://blueprints.launchpad.net/nova/+spec/db-session-cleanup 
>  .
> 
> However, it presents a problem when we consider NovaObjects, and
> dependencies between them. For example, take Instance.save(). An
> Instance has relationships with several other object types, one of which
> is InstanceInfoCache. Consider the following code, which is amongst what
> happens in spawn():
> 
> instance = Instance.get_by_uuid(uuid)
> instance.vm_state = vm_states.ACTIVE
> instance.info_cache.network_info = new_nw_info
> instance.save()
> 
> instance.save() does (simplified):
>   self.info_cache.save()
>   self._db_save()
> 
> Both of these saves happen in separate db transactions. This has at
> least 2 undesirable effects:
> 
> 1. A failure can result in an inconsistent database. i.e. info_cache
> having been persisted, but instance.vm_state not having been persisted.
> 
> 2. Even in the absence of a failure, an external reader can see the new
> info_cache but the old instance.
> 
> This is one example, but there are lots. We might convince ourselves
> that the impact of this particular case is limited, but there will be
> others where it isn't. Confidently assuring ourselves of a limited
> impact also requires a large amount of context which not many
> maintainers will have. New features continue to add to the problem,
> including numa topology and pci requests.
> 
> I don't think we can reasonably remove the cascading save() above due to
> the deliberate design of objects. Objects don't correspond directly to
> their datamodels, so save() does more work than just calling out to the
> DB. We need a way to allow cascading object saves to happen within a
> single DB transaction. This will mean:
> 
> 1. A change will be persisted either entirely or not at all in the event
> of a failure.
> 
> 2. A reader will see either the whole change or none of it.
> 
> We are not talking about crossing an RPC boundary. The single database
> transaction only makes sense within the context of a single RPC call.
> This will always be the case when NovaObject.save() cascades to other
> object saves.
> 
> Note that we also have a separate problem, which is that the DB api's
> internal use of transactions is wildly inconsistent. A single db api
> call can result in multiple concurrent db transactions from the same
> thread, and all the deadlocks that implies. This needs to be fixed, but
> it doesn't require changing our current assumption that DB transactions
> live only within the DB api.
> 
> Note that there is this recently approved oslo.db spec to make
> transactions more manageable:
> 
> https://review.openstack.org/#/c/125181/11/specs/kilo/make-enginefacade-a-facade.rst,cm
>  
> 
> 
> Again, while this will be a significant benefit to the DB api, it will
> not solve the problem of cascading object saves without allowing
> transaction management at the level of NovaObject.save(): we need to
> allow something to call a db api with an existing session, and we need
> to allow something to pass an existing db transaction to NovaObject.save().
> 
> An obvious precursor to that is removing N309 from hacking, which
> specifically tests for db apis which accept a session argument. We then
> need to consider how NovaObject.save() should manage and propagate db
> transactions.
> 
> I think the following pattern would solve it:
> 
> @remotable
> def save():
> session = 
> try:
> r = self._save(session)
> session.commit() (or reader/writer magic from oslo.db)
> return r
> except Exception:
> session.rollback() (or reader/writer magic from oslo.db)
> raise
> 
> @definitelynotremotable
> def _save(session):
> previous contents of save() move here
> session is explicitly passed to db api calls
> cascading saves call object._save(session)
> 
> Whether we wait for the oslo.db updates or not, we need something like
> the above. We could implement this today by exposing
> db.sqla.api.get_session().
> 
> Thoughts?
> 
> Matt

Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Mike Bayer

> On Nov 19, 2014, at 1:49 PM, Ryan Moats  wrote:
> 
> I was waiting for this because I think I may have a slightly different (and 
> outside of the box) view on how to approach a solution to this.
> 
> Conceptually (at least in my mind) there isn't a whole lot of difference 
> between how the example below (i.e. updates from two concurrent threads) is 
> handled
> and how/if neutron wants to support a multi-master database scenario (which 
> in turn lurks in the background when one starts thinking/talking about 
> multi-region support).
> 
> If neutron wants to eventually support multi-master database scenarios, I see 
> two ways to go about it:
> 
> 1) Defer multi-master support to the database itself.
> 2) Take responsibility for managing the conflict resolution inherent in 
> multi-master scenarios itself.
> 
> The first approach is certainly simpler in the near term, but it has the down 
> side of restricting the choice of databases to those that have solved 
> multi-master and further, may lead to code bifurcation based on possibly 
> different solutions to the conflict resolution scenarios inherent in 
> multi-master.
> 
> The second approach is certainly more complex as neutron assumes more 
> responsibility for its own actions, but it has the advantage that (if done 
> right) would be transparent to the underlying databases (with all that 
> implies)
> 
Multi-master is a very advanced use case, so I don’t see why it would be 
unreasonable to require a multi-master vendor database.   Reinventing a complex 
system like that in the application layer is unnecessary.

As far as working across different conflict-resolution scenarios: while there 
may be differences across backends, these differences will be much less 
significant than the differences across the non-clustered backends on which 
we would be inventing our own multi-master solution.   I doubt a home-rolled 
solution would insulate us at all from “code bifurcation”, as this is already a 
fact of life in targeting different backends even without any implication of 
clustering.   Even with simple things like transaction isolation, we see that 
different databases have different behavior, and if you look at the logic in 
oslo.db inside of 
https://github.com/openstack/oslo.db/blob/master/oslo/db/sqlalchemy/exc_filters.py
 you can see an example of just how complex it is to do even the most 
rudimentary task of organizing exceptions into errors that mean the same thing.


> My reason for asking this question here is that if the community wants to 
> consider #2, then these problems are the place to start crafting that 
> solution - if we solve the conflicts inherent with the  two conncurrent 
> thread scenarios, then I think we will find that we've solved the 
> multi-master problem essentially "for free”.
> 

Maybe I’m missing something: if we learn how to write out a row such that a 
concurrent transaction against the same row doesn’t throw us off, where does the 
part where that data is atomically replicated to databases running concurrently 
on other IP addresses come out of that effort “for free”?   A 
home-rolled “multi-master” scenario would have to start with a system that has 
multiple create_engine() calls, since we need to communicate directly with 
multiple database servers. From there it gets really crazy.  Where’s all that?




> 
> Ryan Moats
> 
> Mike Bayer  wrote on 11/19/2014 12:05:35 PM:
> 
> > From: Mike Bayer 
> > To: "OpenStack Development Mailing List (not for usage questions)" 
> > 
> > Date: 11/19/2014 12:05 PM
> > Subject: Re: [openstack-dev] [Neutron] DB: transaction isolation and
> > related questions
> > 
> > On Nov 18, 2014, at 1:38 PM, Eugene Nikanorov  
> > wrote:
> > 
> > Hi neutron folks,
> > 
> > There is an ongoing effort to refactor some neutron DB logic to be 
> > compatible with galera/mysql which doesn't support locking 
> > (with_lockmode('update')).
> > 
> > Some code paths that used locking in the past were rewritten to 
> > retry the operation if they detect that an object was modified concurrently.
> > The problem here is that all DB operations (CRUD) are performed in 
> > the scope of some transaction that makes complex operations to be 
> > executed in atomic manner.
> > For mysql the default transaction isolation level is 'REPEATABLE 
> > READ' which means that once the code issue a query within a 
> > transaction, this query will return the same result while in this 
> > transaction (e.g. the snapshot is taken by the DB during the first 
> > query and then

Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Mike Bayer

> On Nov 19, 2014, at 2:58 PM, Jay Pipes  wrote:
> 
> 
>> In other words, the retry logic like the following will not work:
>> 
>> def allocate_obj():
>> with session.begin(subtrans=True):
>>  for i in xrange(n_retries):
>>   obj = session.query(Model).filter_by(filters)
>>   count = session.query(Model).filter_by(id=obj.id
>> ).update({'allocated': True})
>>   if count:
>>return obj
>> 
>> since usually methods like allocate_obj() is called from within another
>> transaction, we can't simply put transaction under 'for' loop to fix the
>> issue.
> 
> Exactly. The above code, from here:
> 
> https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/helpers.py#L98
> 
> has no chance of working at all under the existing default isolation levels 
> for either MySQL or PostgreSQL. If another session updates the same row in 
> between the time the first session began and the UPDATE statement in the 
> first session starts, then the first session will return 0 rows affected. It 
> will continue to return 0 rows affected for each loop, as long as the same 
> transaction/session is still in effect, which in the code above, is the case.

Oh, because it stays at zero, right.  Yeah, I didn’t understand that that was the 
failure case before.  I should have just pinged you on IRC to answer the question 
without wasting everyone’s time! :)

> 
> The design of the Neutron plugin code's interaction with the SQLAlchemy 
> session object is the main problem here. Instead of doing all of this within 
> a single transactional container, the code should instead be changed to 
> perform the SELECT statements in separate transactions/sessions.
> 
> That means not using the session parameter supplied to the 
> neutron.plugins.ml2.drivers.helpers.TypeDriverHelper.allocate_partially_specified_segment()
>  method, and instead performing the SQL statements in separate transactions.
> 
> Mike Bayer's EngineFacade blueprint work should hopefully unclutter the 
> current passing of a session object everywhere, but until that hits, it 
> should be easy enough to simply ensure that you don't use the same session 
> object over and over again, instead of changing the isolation level.

OK, but EngineFacade was all about unifying broken-up transactions into one big 
transaction.   I’ve never been partial to the “retry something inside of a 
transaction” approach; I usually prefer to have the API method raise and retry 
its whole series of operations all over again.  How do you propose to reconcile 
EngineFacade’s transaction-unifying behavior with 
separate-transaction-per-SELECT (and wouldn’t that need to include the UPDATE 
as well?)?  Did you see it as having the “one main transaction” with separate 
“ad-hoc, out-of-band” transactions as needed?




> 
> All the best,
> -jay
> 
>> Your feedback is appreciated.
>> 
>> Thanks,
>> Eugene.
>> 
>> 
>> 


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Mike Bayer

> On Nov 19, 2014, at 3:47 PM, Ryan Moats  wrote:
> 
> > 
> BTW, I view your examples from oslo as helping make my argument for
> me (and I don't think that was your intent :) )
> 

I disagree with that, as IMHO the differences in producing MM in the app layer 
against arbitrary backends (PostgreSQL vs. DB2 vs. MariaDB vs. ???)  will incur 
a lot more “bifurcation” than a system that targets only a handful of existing 
MM solutions.  The example I referred to in oslo.db is dealing with distinct, 
non-MM backends.   That level of DB-specific code, and more, is a given if we are 
building an MM system against multiple backends generically.

It’s not possible to say which approach would be better or worse at the level 
of “how much database-specific application logic do we need”, though in my 
experience, no matter what one is trying to do, the answer is always “tons”; 
we’re dealing not just with databases but also with Python drivers that have a vast 
amount of differences in behavior, at every level.    On top of all of that, 
hand-rolled MM adds just that much more application code to be developed and 
maintained, while also claiming it will do a better job than mature(-ish?) 
database systems designed to do the same job against a specific backend.



> 
> > > My reason for asking this question here is that if the community 
> > > wants to consider #2, then these problems are the place to start 
> > > crafting that solution - if we solve the conflicts inherent with the
> > > two conncurrent thread scenarios, then I think we will find that 
> > > we've solved the multi-master problem essentially "for free”.
> >  
> > Maybe I’m missing something, if we learn how to write out a row such
> > that a concurrent transaction against the same row doesn’t throw us 
> > off, where is the part where that data is replicated to databases 
> > running concurrently on other IP numbers in a way that is atomic 
> > come out of that effort “for free” ?   A home-rolled “multi master” 
> > scenario would have to start with a system that has multiple 
> > create_engine() calls, since we need to communicate directly to 
> > multiple database servers. From there it gets really crazy.  Where’sall 
> > that ?
> 
> Boiled down, what you are talking about here w.r.t. concurrent
> transactions is really conflict resolution, which is the hardest
> part of implementing multi-master (as a side note, using locking in
> this case is the equivalent of option #1).  
> 
> All I wished to point out is that there are other ways to solve the
> conflict resolution that could then be leveraged into a multi-master
> scenario.
> 
> As for the parts that I glossed over, once conflict resolution is
> separated out, replication turns into a much simpler problem with
> well understood patterns and so I view that part as coming
> "for free."
> 
> Ryan
> 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Mike Bayer

> On Nov 19, 2014, at 4:14 PM, Clint Byrum  wrote:
> 
> 
> One simply cannot rely on multi-statement transactions to always succeed.

Agreed, but the thing you want is that the transaction either succeeds or 
explicitly fails, the latter hopefully in such a way that a retry can be added 
which has a chance at succeeding, if needed.  We have transaction replay logic 
in place in Nova, for example, based on known failure conditions like concurrency 
exceptions, and this replay logic works because it starts a new transaction.   
In this specific case, since it’s looping within a transaction where the data 
won’t change, it’ll never succeed, and the retry mechanism is useless.   But 
the isolation mode change won’t really help here either, as pointed out by Jay; 
discrete transactions have to be used instead.
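
(A minimal sketch of that replay-at-the-API-method pattern; the exception type 
and decorator here are illustrative stand-ins, not the actual Nova/oslo.db retry 
helpers:)

    import functools
    import random
    import time

    class ConcurrentUpdateError(Exception):
        """Illustrative stand-in for a detected concurrency failure, e.g. an
        UPDATE that matched zero rows, or a deadlock reported by the driver."""

    def retry_on_conflict(max_retries=5, base_delay=0.05):
        # replay the *entire* operation in a brand new transaction on each
        # attempt, instead of looping inside one transaction that keeps seeing
        # the same snapshot and the same "0 rows matched" result
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                for attempt in range(max_retries):
                    try:
                        return func(*args, **kwargs)
                    except ConcurrentUpdateError:
                        if attempt == max_retries - 1:
                            raise
                        time.sleep(base_delay * (attempt + 1) +
                                   random.random() / 100)
            return wrapper
        return decorator

    @retry_on_conflict()
    def allocate_segment(segment_filters):
        # each call should open its own session/transaction; on conflict the
        # decorator re-invokes it with a fresh transaction and a fresh snapshot
        raise NotImplementedError("illustrative stub")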


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-21 Thread Mike Bayer
>> What this means is that for this *very particular* case, setting the
>> transaction isolation level to READ COMMITTED will work presumably
>> most of the time on MySQL, but it's not an appropriate solution for
>> the generalized problem domain of the SELECT FOR UPDATE. If you need
>> to issue a SELECT and an UPDATE in a retry loop, and you are
>> attempting to update the same row or rows (for instance, in the
>> quota reservation or resource allocation scenarios), this solution
>> will not work, even with READ COMMITTED. This is why I say it's not
>> really appropriate, and a better general solution is to use separate
>> transactions for each loop in the retry mechanic.
>> 
>> By saying 'this solution will not work' what issues do you mean what
>> exactly?
>> Btw, I agree on using separate transaction for each loop, the problem is
>> that transaction is usually not 'local' to the method where the retry
>> loop resides.
>> 
>> The issue is about doing the retry within a single transaction.
>> That's not what I recommend doing. I recommend instead doing short
>> separate transactions instead of long-lived, multi-statement
>> transactions and relying on the behaviour of the DB's isolation
>> level (default or otherwise) to "solve" the problem of reading
>> changes to a record that you intend to update.
>> 
>> "instead of long-lived, multi-statement transactions" - that's really
>> what would require quite large code redesign.
>> So far finding a way to bring retry logic upper to the stack of nesting
>> transactions seems more appropriate.
>> 
>> Thanks,
>> Eugene.
>> 
>> Cheers,
>> -jay
>> 
>> Also, thanks Clint for clarification about example scenario described by
>> Mike Bayer.
>> Initially the issue was discovered with concurrent tests on multi master
>> environment with galera as a DB backend.
>> 
>> Thanks,
>> Eugene

[openstack-dev] Alembic 0.7.0 - hitting Pypi potentially Sunday night

2014-11-21 Thread Mike Bayer
Hi all -

I’d like to announce / give a prominent heads-up that Alembic 0.7.0 is ready 
for release.  Over the past several weeks I’ve completed the initial 
implementations of two critical features, SQLite migrations and multiple 
branch/merge support, as well as merged over a dozen bug fixes and smaller 
features.    This release is probably the most dramatic set of changes Alembic 
has had since some early refactorings long before OpenStack was making use of 
it, and as such I think it’s appropriate to make sure that it is anticipated.

Given that I’ve seen some grumbling about other projects recently releasing on 
weekends, and assuming no new issues are found, I hope to release Alembic 0.7.0 on 
Sunday night or Monday.    I have run a subset of OpenStack-related tests as 
part of my usual continuous environment, including all of oslo.db (there’s one 
test fix for oslo.db pending) as well as Neutron’s 
“neutron.tests.unit.db.test_migration”, which if I understand it correctly 
exercises Alembic considerably.

Version 0.7.0 includes a lot of tweaks and enhancements in the usual 
“autogenerate” category as well as to some operational directives, but most 
prominently includes multiple branch mode, a series of new commands and some 
changes to existing commands, as well as “batch mode”, which is geared towards 
SQLite.     The SQLite batch mode enhancement exists as a new operational 
directive and a series of behaviors that are optional, so code which doesn’t make 
explicit use of it will not be exposed to the new code.  However, the multiple 
branch/merge support involves a fundamental rewrite of the versioning system 
from the ground up.  It was implemented in such a way that there should be no 
backwards-compatibility problems of any kind - existing environments, script 
templates, migration scripts, and alembic_version tables will continue to work 
in exactly the same manner as they did before, in the absence of the newer 
branch features, which introduce some mild changes to the workflow when used.   
But the system does run on a new set of algorithms for which the currently 
supported use case of a “single linear stream of revisions” is the so-called 
“degenerate” case (an odd term for me to use, as I am so not a math person).   
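
(For reference, a minimal sketch of what the optional batch-mode directive looks 
like inside a migration’s upgrade() step; the table and column names are made up:)

    import sqlalchemy as sa
    from alembic import op

    def upgrade():
        # the batch context recreates the table behind the scenes on SQLite,
        # where ALTER support is limited; other backends alter in place
        with op.batch_alter_table("user_account") as batch_op:
            batch_op.add_column(sa.Column("nickname", sa.String(50)))
            batch_op.drop_column("legacy_flag")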

The release does not yet include two new features that I’d like to get into a 
subsequent release, maybe even 0.7.1 if things go smoothly: Ann 
Kamyshnikova’s foreign key constraint autogeneration feature and a separate 
autogeneration feature for primary key constraints.

To those out there wondering what steps to take, here they are:

1. read about the new features, particularly the branch support, and please let 
me know of any red flags/concerns you might have over the coming 
implementation, at 
http://alembic.readthedocs.org/en/latest/tutorial.html#running-batch-migrations-for-sqlite-and-other-databases
 and 
http://alembic.readthedocs.org/en/latest/tutorial.html#working-with-branches.

2. if your project uses Alembic already (I know Neutron does but I’m not sure 
who else yet), fire up a tox environment and install Alembic from master at 
https://github.com/zzzeek/alembic/, run the tests and please alert me to any 
breakages.

3. Keep a lookout for the release, and

4. don’t panic!   I’ve really tried to test this to a huge extent and if there 
are problems, I can fix them quickly.

thanks all for reading!

0.7 changelog: 
http://alembic.readthedocs.org/en/latest/changelog.html#change-0.7.0



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Alembic 0.7.0 - hitting Pypi potentially Sunday night

2014-11-21 Thread Mike Bayer

> On Nov 21, 2014, at 7:35 PM, Kevin Benton  wrote:
> 
> This is great! I'm not sure if you have been following some of the discussion 
> about the separation of vendor drivers in Neutron, but one of the things we 
> decided was to leave the vendor data models in the main repo so we have a 
> nice linear migration.
> It looks like branching support may solve our problem. However, looking 
> through the docs I didn't notice anything about where the migration 
> definitions need to live. Can migrations be sourced from multiple locations 
> easily?

that’s another TODO, which is  
https://bitbucket.org/zzzeek/alembic/issue/124/multiple-versions-directories 
. 
  Skip down to the bottom comments, as for the longest time I wasn’t really 
getting how this would work because the multiple branch thing wasn’t in place.

Even without that issue, and even without the branching feature, you can, right 
now, have multiple entirely separate alembic environments, sharing only the 
alembic.ini file, in which you refer to each environment by name; that is what 
I was getting at earlier in that issue.   Within one database they can have 
separate alembic_version tables using the version_table option.  That means 
each environment has its own set of revisions, entirely independent of the 
others, so if there are cross-dependencies between these streams, this approach 
doesn’t work.
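
To make the "separate environments, separate version tables" idea concrete, 
here is a minimal env.py sketch for one such environment; the "plugin_x" naming 
and the database URL are hypothetical, and the version_table argument to 
context.configure() is what keeps this environment’s alembic_version rows apart 
from the others:

from alembic import context
from sqlalchemy import create_engine


def run_migrations_online():
    # each named environment (selected with "alembic -n plugin_x") builds
    # its own engine and migration context
    connectable = create_engine("mysql://scott:tiger@localhost/test")
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            version_table="plugin_x_alembic_version",
        )
        with context.begin_transaction():
            context.run_migrations()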

With multiple branches in one revision stream as is provided with the new 
feature, now you have the ability to have cross-dependencies between these 
streams, if needed.   But that means they need to share the env.py, since 
that’s the invocation point for them and it needs to be able to run any of 
those scripts.

Having a single env.py as we do now, and just being able to put the actual 
revision files in multiple places, is very easy, even for Sunday: we stick with 
the single base script_directory, one env.py and one home base, and just load 
the revision files from multiple locations.   This would be trivial.



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Alembic 0.7.0 - hitting Pypi potentially Sunday night

2014-11-22 Thread Mike Bayer

> On Nov 21, 2014, at 8:07 PM, Mike Bayer  wrote:
> 
> 
>> On Nov 21, 2014, at 7:35 PM, Kevin Benton > <mailto:blak...@gmail.com>> wrote:
>> 
>> This is great! I'm not sure if you have been following some of the 
>> discussion about the separation of vendor drivers in Neutron, but one of the 
>> things we decided was to leave the vendor data models in the main repo so we 
>> have a nice linear migration.
>> It looks like branching support may solve our problem. However, looking 
>> through the docs I didn't notice anything about where the migration 
>> definitions need to live. Can migrations be sourced from multiple locations 
>> easily?
> 
> that’s another TODO, which is  
> https://bitbucket.org/zzzeek/alembic/issue/124/multiple-versions-directories 
> <https://bitbucket.org/zzzeek/alembic/issue/124/multiple-versions-directories>.
>Skip down to the bottom comments as for the longest time I wasn’t really 
> getting how this would work b.c. the multiple branch thing wasn’t in place.

OK, after sleeping on this I thought more deeply about the “multiple, 
semi-independent roots” case and thought of a few more glitches, so I got fixes 
in for those today and I implemented #124 as well.  Another new concept, 
“depends_on”, is added to the system to accommodate the case where independent 
roots need to refer to each other as dependencies, but not as “merge points”.   
I think this might work really well, so far it seems that way.
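
As a sketch of how "depends_on" reads inside a revision file (all of the 
revision identifiers and names below are made up):

# revision script sketch; identifiers are made up
revision = "3ab8c2d91f10"
down_revision = "1f0e9b7c44aa"
# declare a dependency on a revision that lives in another branch/root,
# without turning this script into a merge point
depends_on = "27c6a30d7c24"

from alembic import op
import sqlalchemy as sa


def upgrade():
    op.add_column("widget", sa.Column("network_id", sa.Integer()))


def downgrade():
    op.drop_column("widget", "network_id")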

As the one-page docs have gotten super-long I’ve broken them out; a revised 
section that goes into how to use multiple roots and file directories now 
starts at 
http://alembic.readthedocs.org/en/latest/branches.html#working-with-multiple-bases.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] Add a new aiogreen executor for Oslo Messaging

2014-11-23 Thread Mike Bayer

> On Nov 23, 2014, at 6:13 PM, Robert Collins  wrote:
> 
> 
> So - the technical bits of the plan sound fine.

> 
> On WSGI - if we're in an asyncio world,

*looks around*, we are?   when did that happen?   Assuming we’re talking 
explicit async: rewriting all our code as verbose, “inside out” code, vast 
library incompatibility, and…some notion of “correctness” that somehow is 
supposed to be appropriate for a high-level scripting language and can’t be 
achieved through simple, automated means such as gevent.

> I don't think WSGI has any
> relevance today -

if you want async + wsgi, use gevent.wsgi.   It is of course not explicit 
async but if the whole world decides that we all have to explicitly turn all of 
our code inside out to appease the concept of “oh no, IO IS ABOUT TO HAPPEN! 
ARE WE READY! ”,  I am definitely quitting programming to become a cheese 
maker.   If you’re writing some high performance TCP server thing, fine 
(…but... why are you writing a high performance server in Python and not 
something more appropriate like Go?).  If we’re dealing with message queues as 
I know this thread is about, fine.
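
For the record, the gevent.wsgi suggestion above is about this much code; a 
minimal sketch only, with the app and port as placeholders, using gevent’s 
bundled pywsgi server:

from gevent import monkey
monkey.patch_all()

from gevent.pywsgi import WSGIServer


def app(environ, start_response):
    # ordinary, blocking-style WSGI code; gevent supplies the concurrency
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


if __name__ == "__main__":
    WSGIServer(("", 8000), app).serve_forever()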

But if you’re writing “receive a request, load some data, change some of it 
around, store it again, and return a result”, I don’t see why this has to be 
intentionally complicated.   Use implicit async that can interact with the 
explicit async messaging stuff appropriately.   That’s purportedly one of the 
goals of asyncIO (which Nick Coghlan had to lobby pretty hard for; source: 
http://python-notes.curiousefficiency.org/en/latest/pep_ideas/async_programming.html#gevent-and-pep-3156
  ).

> it has no async programming model.

neither do a *lot* of things, including all traditional ORMs.   I’m fine with 
Ceilometer dropping SQLAlchemy support as they prefer MongoDB and their 
relational database code is fairly wanting.   Per 
http://aiogreen.readthedocs.org/openstack.html, I’m not sure how else they will 
drop eventlet support throughout the entire app.   


> While is has
> incremental apis and supports generators, thats not close enough to
> the same thing: so we're going to have to port our glue code to
> whatever container we end up with. As you know I'm pushing on a revamp
> of WSGI right now, and I'd be delighted to help put together a
> WSGI-for-asyncio PEP, but I think its best thought of as a separate
> thing to WSGI per se.

given the push for explicit async, seems like lots of effort will need to be 
spent on this. 

> It might be a profile of WSGI2 though, since
> there is quite some interest in truely async models.
> 
> However I've a bigger picture concern. OpenStack only relatively
> recently switched away from an explicit async model (Twisted) to
> eventlet.

hooray.   efficient database access for explicit async code would be impossible 
otherwise as there are no explicit async APIs to MySQL, and only one for 
Postgresql which is extremely difficult to support.

> 
> I'm worried that this is switching back to something we switched away
> from (in that Twisted and asyncio have much more in common than either
> Twisted and eventlet w/magic, or asyncio and eventlet w/magic).

In the C programming world, when you want to do something as simple as create a 
list of records, it’s not so simple: you have to explicitly declare memory 
using malloc(), and organize your program skillfully and carefully such that 
this memory is ultimately freed using free().   It’s tedious and error prone.   
So in the scripting language world, these tedious, low level and entirely 
predictable steps are automated away for us; memory is declared automatically, 
and freed automatically.  Even reference cycles are cleaned out for us without 
us even being aware.  This is why we use “scripting languages” - they are 
intentionally automated to speed the pace of development and produce code that 
is far less verbose than low-level C code and much less prone to low-level 
errors, albeit considerably less efficient.   It’s the payoff we make; 
predictable bookkeeping of the system’s resources are automated away.
There’s a price: the Python interpreter uses a ton of memory and tends not to 
free it once large chunks have been used by the application.   Yet this 
tradeoff, Python’s clearly inefficient use of memory in exchange for automating 
its management away for us, is one which nobody seems to mind at all.   

But when it comes to IO, the implicit allocation of IO and deferment of 
execution done by gevent has no side effect anywhere near as harmful as the 
Python interpreter’s huge memory consumption.  Yet we are so afraid of it, so 
frightened that our code…written in a *high level scripting language*, might 
not be “correct”.  We might not know that IO is about to happen!   How is this 
different from the much more tangible and day-

Re: [openstack-dev] [oslo] Add a new aiogreen executor for Oslo Messaging

2014-11-23 Thread Mike Bayer

> On Nov 23, 2014, at 6:35 PM, Donald Stufft  wrote:
> 
> 
> For whatever it’s worth, I find explicit async io to be _way_ easier to
> understand for the same reason I find threaded code to be a rats nest.

web applications aren’t explicitly “threaded”.   You get a request, load some 
data, manipulate it, and return a response.   There are no threads to reason 
about, nothing is explicitly shared in any way.

> 
> The co-routine style of asyncio (or Twisted’s inlineCallbacks) solves
> almost all of the problems that I think most people have with explicit
> asyncio (namely the callback hell) while still getting the benefits.

coroutines are still “inside out” and still have all the issues discussed in 
http://python-notes.curiousefficiency.org/en/latest/pep_ideas/async_programming.html
 which I also refer to in 
http://stackoverflow.com/questions/16491564/how-to-make-sqlalchemy-in-tornado-to-be-async/16503103#16503103.

> 
> Glyph wrote a good post that mirrors my opinions on implicit vs explicit
> here: https://glyph.twistedmatrix.com/2014/02/unyielding.html.

this is the post that most makes me think about the garbage collector analogy, 
re: “gevent works perfectly fine, but sorry, it just isn’t “correct”.  It 
should be feared! ”.   Unfortunately Glyph has orders of magnitude more 
intellectual capabilities than I do, so I am ultimately not an effective 
advocate for my position; hence I have my fallback career as a cheese maker 
lined up for when the async agenda finally takes over all computer programming.



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] Add a new aiogreen executor for Oslo Messaging

2014-11-23 Thread Mike Bayer

> On Nov 23, 2014, at 7:30 PM, Donald Stufft  wrote:
> 
> 
>> On Nov 23, 2014, at 7:21 PM, Mike Bayer  wrote:
>> 
>> Given that, I’ve yet to understand why a system that implicitly defers CPU 
>> use when a routine encounters IO, deferring to other routines, is relegated 
>> to the realm of “magic”.   Is Python reference counting and garbage 
>> collection “magic”?How can I be sure that my program is only declaring 
>> memory, only as much as I expect, and then freeing it only when I absolutely 
>> say so, the way async advocates seem to be about IO?   Why would a high 
>> level scripting language enforce this level of low-level bookkeeping of IO 
>> calls as explicit, when it is 100% predictable and automatable ?
> 
> The difference is that in the many years of Python programming I’ve had to 
> think about garbage collection all of once. I’ve yet to write a non trivial 
> implicit IO application where the implicit context switch didn’t break 
> something and I had to think about adding explicit locks around things.

that’s your personal experience; how is that an argument?  I deal with the 
Python garbage collector, memory management, etc. *all the time*.   I have a 
whole test suite dedicated to ensuring that SQLAlchemy constructs tear 
themselves down appropriately in the face of gc and such: 
https://github.com/zzzeek/sqlalchemy/blob/master/test/aaa_profiling/test_memusage.py
 .   This is the product of tons of different observed and reported issues 
about this operation or that operation forming constructs that would take up 
too much memory, wouldn’t be garbage collected when expected, etc.  

Yet somehow I still value very much the work that implicit GC does for me and I 
understand well when it is going to happen.  I don’t decide that that whole 
world should be forced to never have GC again.  I’m sure you wouldn’t be happy 
if I got Guido to drop garbage collection from Python because I showed how 
sometimes it makes my life more difficult, therefore we should all be managing 
memory explicitly.

I’m sure my agenda here is pretty transparent.  If explicit async becomes the 
only way to go, SQLAlchemy basically closes down.   I’d have to rewrite it 
completely (after waiting for all the DBAPIs that don’t exist to be written, 
why doesn’t anyone ever seem to be concerned about that?), and it would run 
much less efficiently due to the massive amount of additional function call 
overhead incurred by the explicit coroutines.   It’s a pointless amount of 
verbosity within a scripting language.  

> 
> Really that’s what it comes down to. Either you need to enable explicit 
> context switches (via callbacks or yielding, or whatever) or you need to add 
> explicit locks. Neither solution allows you to pretend that context switching 
> isn’t going to happen nor prevents you from having to deal with it. The 
> reason I prefer explicit async is because the failure mode is better (if I 
> forget to yield I don’t get the actual value so my thing blows up in 
> development) and it ironically works more like blocking programming because I 
> won’t get an implicit context switch in the middle of a function. Compare 
> that to the implicit async where the failure mode is that at runtime 
> something weird happens.
> ---
> Donald Stufft
> PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] Add a new aiogreen executor for Oslo Messaging

2014-11-23 Thread Mike Bayer

> On Nov 23, 2014, at 8:23 PM, Donald Stufft  wrote:
> 
> I don’t really take performance issues that seriously for CPython. If you 
> care about performance you should be using PyPy. I like that argument though 
> because the same argument is used against the GCs which you like to use as an 
> example too.
> 
> The verbosity isn’t really pointless, you have to be verbose in either 
> situation, either explicit locks or explicit context switches. If you don’t 
> have explicit locks you just have buggy software instead.

The funny thing is that relational databases will lock on things whether or not the 
calling code is using an async system.  Locks are a necessary thing in many 
cases.  That lock-based concurrency code can’t be mathematically proven bug 
free doesn’t detract from its vast usefulness in situations that are not 
aeronautics or medical devices.



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] Add a new aiogreen executor for Oslo Messaging

2014-11-24 Thread Mike Bayer

> On Nov 23, 2014, at 9:24 PM, Donald Stufft  wrote:
> 
> 
> There’s a long history of implicit context switches causing buggy software 
> that breaks. As far as I can tell the only downsides to explicit context 
> switches that don’t stem from an inferior interpreter seem to be “some 
> particular API in my head isn’t as easy with it” and “I have to type more 
> letters”. The first one I’d just say that constraints make the system and 
> that there are lots of APIs which aren’t really possible or easy in Python 
> because of one design decision or another. For the second one I’d say that 
> Python isn’t a language which attempts to make code shorter, just easier to 
> understand what is going to happen when.
> 
> Throwing out hyperboles like “mathematically proven” isn’t a particular 
> valuable statement. It is *easier* to reason about what’s going to happen 
> with explicit context switches. Maybe you’re a better programmer than I am 
> and you’re able to keep in your head every place that might do an implicit 
> context switch in an implicit setup and you can look at a function and go “ah 
> yup, things are going to switch here and here”. I certainly can’t. I like my 
> software to maximize the ability to locally reason about a particular chunk 
> of code.

But this is a false choice.  There is a third way: use explicit async 
for those parts of an application where it is appropriate; when dealing with 
message queues and things where jobs and messages are sent off for any amount 
of time to come back at some indeterminate point later, all of us would 
absolutely benefit from an explicit model w/ coroutines.  If I was trying to 
write code that had to send off messages and then had to wait, but still has 
many more messages to send off, so that without async I’d need to be writing 
thread pools and all that, absolutely, async is a great programming model.

But when the code digs into functions that are oriented around business logic, 
functions that within themselves are doing nothing concurrency-wise against 
anything else within them, and merely need to run steps 1, 2, and 3, 
that don’t deal with messaging and instead talk to a single relational database 
connection, where explicit async would mean that a single business logic method 
would need to be exploded with literally many dozens of yields in it (with a 
real async DBAPI; every connection, every execute, every cursor close, every 
transaction start, every transaction end, etc.), it is completely cumbersome 
and unnecessary.  These methods should run in an implicit async context. 

To that degree, the resistance that explicit async advocates have to the 
concept that both approaches should be switchable, and that one may be more 
appropriate than the other in different cases, remains confusing to me.   We 
from the threading camp are asked to accept that *all* of our programming 
models must change completely, but our suggestion that both models be 
integrated is met with, “well that’s wrong, because in my experience (doing 
this specific kind of programming), your model *never* works”.   




> 
> ---
> Donald Stufft
> PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] Add a new aiogreen executor for Oslo Messaging

2014-11-24 Thread Mike Bayer

> On Nov 24, 2014, at 9:23 AM, Adam Young  wrote:
> 
> 
> 
> For pieces such as the Nova compute that talk almost exclusively on the 
> Queue, we should work to remove Monkey patching and use a clear programming 
> model.  If we can do that within the context of Eventlet, great.  If we need 
> to replace Eventlet with a different model, it will be painful, but should be 
> done.  What is most important is that we avoid doing hacks like we've had to 
> do with calls to Memcached and monkeypatching threading.

Nova compute does a lot of relational database access and I’ve yet to see an 
explicit-async-compatible DBAPI other than psycopg2’s and Twisted adbapi.   
Twisted adbapi appears just to throw regular DBAPIs into a thread pool in any 
case (see 
http://twistedmatrix.com/trac/browser/trunk/twisted/enterprise/adbapi.py), so 
given that awkwardness and lack of real async, if eventlet is dropped it would 
be best to use a thread pool for database-related methods directly.
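
As a sketch of what "thread pool for database-related methods" could look like 
(everything named here is hypothetical; Twisted’s deferToThread is just one way 
to spell the idea):

from twisted.internet import defer
from twisted.internet.threads import deferToThread


def _get_instance_blocking(instance_id):
    # ordinary blocking DBAPI / SQLAlchemy code goes here, unchanged
    return {"id": instance_id}


@defer.inlineCallbacks
def get_instance(instance_id):
    # the blocking call runs on a reactor thread-pool thread; this
    # function only waits on the resulting Deferred
    result = yield deferToThread(_get_instance_blocking, instance_id)
    defer.returnValue(result)
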
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] Add a new aiogreen executor for Oslo Messaging

2014-11-24 Thread Mike Bayer

> On Nov 24, 2014, at 12:40 PM, Doug Hellmann  wrote:
> 
> 
> This is a good point. I’m not sure we can say “we’ll only use 
> explicit/implicit async in certain cases" because most of our apps actually 
> mix the cases. We have WSGI apps that send RPC messages and we have other 
> apps that receive RPC messages and operate on the database. Can we mix 
> explicit and implicit operating models, or are we going to have to pick one 
> way? If we have to pick one, the implicit model we’re currently using seems 
> more compatible with all of the various libraries and services we depend on, 
> but maybe I’m wrong?

IMHO, in the ideal case, a single method shouldn’t be mixing calls to a set of 
database objects as well as calls to RPC APIs at the same time; there should be 
some kind of method boundary to cross.   There’s a lot of ways to achieve that.

What is really needed is some way that code can switch between explicit yields 
and implicit IO on a per-function basis.   Like a decorator for one or the 
other.
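
Purely as a sketch of the shape I mean (neither decorator exists anywhere, and 
every name here is made up), something like:

import asyncio
from concurrent.futures import ThreadPoolExecutor

_db_pool = ThreadPoolExecutor(max_workers=10)


def implicit_io(fn):
    """Mark a blocking, step-1-2-3 style function; explicit-async callers
    get back a coroutine that runs it in a thread pool."""
    @asyncio.coroutine
    def wrapper(*args, **kwargs):
        loop = asyncio.get_event_loop()
        result = yield from loop.run_in_executor(_db_pool, fn, *args, **kwargs)
        return result
    return wrapper


@implicit_io
def update_record(record_id, values):
    # plain sequential database logic, no yields anywhere in here
    return values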

The approach that Twisted takes of just using thread pools for those IO-bound 
elements that aren’t compatible with explicit yields is one way to do this. 
This might be the best way to go, if there are in fact issues with mixing in 
implicit async systems like eventlet.  I can imagine, vaguely, that the 
eventlet approach of monkey patching might get in the way of things in this 
more complicated setup.

Part of what makes this confusing for me is that there’s a lack of clarity over 
what benefits we’re trying to get from the async work.  If the idea is, the GIL 
is evil so we need to ban the use of all threads, and therefore must use defer 
for all IO, then that includes database IO which means we theoretically benefit 
from eventlet monkeypatching  - in the absence of truly async DBAPIs, this is 
the only way to have deferrable database IO.

If the idea instead is, the code we write that deals with messaging would be 
easier to produce, organize, and understand given an asyncio style approach, 
but otherwise we aren’t terribly concerned what highly sequential code like 
database code has to do, then a thread pool may be fine.



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-24 Thread Mike Bayer

> On Nov 24, 2014, at 5:20 PM, Michael Still  wrote:
> 
> Heya,
> 
> Review https://review.openstack.org/#/c/135644/4 proposes the addition
> of a new database for our improved implementation of cells in Nova.
> However, there's an outstanding question about how to handle soft
> delete of rows -- we believe that we need to soft delete for forensic
> purposes.

Every time I talk to people about the soft delete thing, I hear the usual 
refrain “we thought we needed it, but we didn’t and now it’s just overbuilt 
cruft we want to get rid of”.

Not saying you don’t have a need here, but are you sure you definitely have 
this need and aren’t just following the herd?   Soft delete makes things a lot 
less convenient.

> 
> This is a new database, so its our big chance to get this right. So,
> ideas welcome...
> 
> Some initial proposals:
> 
> - we do what we do in the current nova database -- we have a deleted
> column, and we set it to true when we delete the instance.
> 
> - we have shadow tables and we move delete rows to a shadow table.


Both approaches are viable, but as the soft-delete column is widespread, it 
would be thorny for this new app to use some totally different scheme, unless 
the notion is that all schemes should move to the audit table approach (which I 
wouldn’t mind, but it would be a big job).   FTR, the audit table approach is 
usually what I prefer for greenfield development, if all that’s needed is 
forensic capabilities at the database inspection level, and not as much active 
GUI-based “deleted” flags.   That is, if you really don’t need to query the 
history tables very often except when debugging an issue offline.  The reason 
its preferable is because those rows are still “deleted” from your main table, 
and they don’t get in the way of querying.   But if you need to refer to these 
history rows in context of the application, that means you need to get them 
mapped in such a way that they behave like the primary rows, which overall is a 
more difficult approach than just using the soft delete column.

That said, I have a lot of plans to send improvements down the way of the 
existing approach of “soft delete column” into projects, from the querying POV, 
so that criteria to filter out soft delete can be done in a much more robust 
fashion (see 
https://bitbucket.org/zzzeek/sqlalchemy/issue/3225/query-heuristic-inspector-event).
   But this is still more complex and less performant than if the rows are just 
gone totally, off in a history table somewhere (again, provided you really 
don’t need to look at those history rows in an application context, otherwise 
it gets all complicated again).



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-24 Thread Mike Bayer

> On Nov 24, 2014, at 7:32 PM, Michael Still  wrote:
> 
> Interesting. I hadn't seen consistency between the two databases as
> trumping doing this less horribly, but it sounds like its more of a
> thing that I thought.

it really depends on what you need to do.  if you need to get a result set of 
all entities, deleted or not, consider the difference between a SELECT for all 
rows from a single table, easy, vs. doing a UNION from primary table to history 
table, matching up all the columns that hopefully do in fact match up 
(awkward), and then dealing with joining out to related tables if you need that 
as well (very awkward from a UNION).
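
Roughly, the "all rows, deleted or not" query in the shadow-table world looks 
like this sketch (table definitions hypothetical and abbreviated), where every 
column has to be lined up by hand and any join to a related table has to be 
made against the UNION rather than against a real table:

from sqlalchemy import MetaData, Table, Column, Integer, String, select, union_all

metadata = MetaData()

instances = Table(
    "instances", metadata,
    Column("id", Integer, primary_key=True),
    Column("hostname", String(255)),
)

shadow_instances = Table(
    "shadow_instances", metadata,
    Column("id", Integer, primary_key=True),
    Column("hostname", String(255)),
)

all_rows = union_all(
    select([instances.c.id, instances.c.hostname]),
    select([shadow_instances.c.id, shadow_instances.c.hostname]),
)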

if you have any plans to consume these rows in the app i’d advise just doing it 
like all the other tables.  if we want to change that approach, we’d do it 
en-masse at some point and you’d get it for free.



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-25 Thread Mike Bayer

> On Nov 25, 2014, at 8:15 PM, Ahmed RAHAL  wrote:
> 
> Hi,
> 
> Le 2014-11-24 17:20, Michael Still a écrit :
>> Heya,
>> 
>> This is a new database, so its our big chance to get this right. So,
>> ideas welcome...
>> 
>> Some initial proposals:
>> 
>>  - we do what we do in the current nova database -- we have a deleted
>> column, and we set it to true when we delete the instance.
>> 
>>  - we have shadow tables and we move delete rows to a shadow table.
>> 
>>  - something else super clever I haven't thought of.
> 
> Some random thoughts that came to mind ...
> 
> 1/ as far as I remember, you rarely want to delete a row
> - it's usually a heavy DB operation (well, was back then)
> - it's destructive (but we may want that)
> - it creates fragmentation (less of a problem depending on db engine)
> - it can break foreign key relations if not done the proper way

deleting records with foreign key dependencies is a known quantity.  Those 
items are all related and being able to delete everything related is a 
well-solved problem, both via ON DELETE cascades as well as standard ORM 
features.


> 
> 2/ updating a row to 'deleted=1'
> - gives an opportunity to set a useful deletion time-stamp
> I would even argue that setting the deleted_at field would suffice to declare 
> a row 'deleted' (as in 'not NULL'). I know, "explicit is better than 
> implicit" …

the logic that’s used is that “deleted” is set to the primary key of the 
record; this is to allow UNIQUE constraints to be set up that serve on the 
non-deleted rows only (e.g. UNIQUE on “x” + “deleted” is possible when there 
are multiple “deleted” rows with “x”).
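
A sketch of that convention (model and names made up): live rows carry 
deleted=0, and a soft-deleted row carries its own primary key in "deleted", so 
the composite UNIQUE constraint only ever bites on live rows:

from sqlalchemy import Column, Integer, String, UniqueConstraint
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Widget(Base):
    __tablename__ = "widget"
    __table_args__ = (
        # "name" must be unique among live rows only; each deleted row has
        # a distinct nonzero value in "deleted", so old names never collide
        UniqueConstraint("name", "deleted"),
    )

    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    deleted = Column(Integer, nullable=False, default=0)


def soft_delete(session, widget):
    # the convention: "deleted" takes the row's primary key value
    widget.deleted = widget.id
    session.flush()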

> - the update operation is not destructive
> - an admin/DBA can decide when and how to purge/archive rows
> 
> 3/ moving the row at deletion
> - you want to avoid additional steps to complete an operation, thus avoid 
> creating a new record while deleting one
> - even if you wrap things into a transaction, not being able to create a row 
> somewhere can make your delete transaction fail
> - if I were to archive all deleted rows, at scale I'd probably move them to 
> another db server altogether

if you’re really “archiving”, I’d just dump out a log of what occurred to a 
textual log file, then you archive the files.  There’s no need for a pure 
“audit trail” to even be in the relational DB.


> Now, I for one would keep the current mark-as-deleted model.
> 
> I however perfectly get the problem of massive churn with instance 
> creation/deletion.

is there?   inserting and updating rows is a normal thing in relational DBs.


> So, let's be crazy, why not have a config option 'on_delete=mark_delete', 
> 'on_delete=purge' or 'on_delete=archive' and let the admin choose ? (is that 
> feasible ?)

I’m -1 on that.  The need for records to be soft-deleted or not, and whether 
those soft-deletes need to be accessible in the application, should be decided up 
front.  Adding a multiplicity of options just makes the code that much more 
complicated and fragments its behaviors and test coverage.   The suggestion 
basically tries to avoid making a decision and I think more thought should be 
put into what is truly needed.


> This would especially come handy if the admin decides the global cells 
> database may not need to keep track of deleted instances, the cell-local nova 
> database being the absolute reference for that.

why would an admin decide that this is, or is not, needed?   if the deleted 
data isn’t needed by the live app, it should just be dumped to an archive.  
admins can set how often that archive should be purged, but IMHO the “pipeline” 
of these records should be straight; there shouldn’t be junctions and switches 
that cause there to be multiple data paths.   It leads to too much complexity.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-26 Thread Mike Bayer

> 
> Precisely. Why is the RDBMS the thing that is used for archival/audit 
> logging? Why not a NoSQL store or a centralized log facility? All that would 
> be needed would be for us to standardize on the format of the archival 
> record, standardize on the things to provide with the archival record (for 
> instance system metadata, etc), and then write a simple module that would 
> write an archival record to some backend data store.
> 
> Then we could rid ourselves of the awfulness of the shadow tables and all of 
> the read_deleted=yes crap.


+1000 - if we’re really looking to “do this right”, as the original message 
suggested, this would be “right”.  If you don’t need these rows in the app (and 
it would be very nice if you didn’t), dump them out to an archive file / 
non-relational datastore.   As mentioned elsewhere, this is entirely acceptable 
for organizations that are “obliged” to store records for auditing purposes.   
Nova even already has a dictionary format for everything set up with nova 
objects, so dumping these dictionaries out as JSON would be the way to go.
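
i.e. something about as simple as this sketch, where the record contents and 
the archive path are made up:

import json
from datetime import datetime


def archive_deleted_record(record_dict, archive_path="/var/log/nova/deleted_records.json"):
    # append one JSON document per deleted record to a flat archive file
    entry = {
        "deleted_at": datetime.utcnow().isoformat(),
        "record": record_dict,
    }
    with open(archive_path, "a") as fh:
        fh.write(json.dumps(entry) + "\n")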





___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] sqlalchemy-migrate call for reviews

2014-11-30 Thread Mike Bayer
I’ve +2’ed it, it was caused by https://review.openstack.org/#/c/81955/.


> On Nov 29, 2014, at 9:54 PM, Davanum Srinivas  wrote:
> 
> Looks like there is a review in the queue -
> https://review.openstack.org/#/c/111485/
> 
> -- dims
> 
> On Sat, Nov 29, 2014 at 6:28 PM, Jeremy Stanley  wrote:
>> To anyone who reviews sqlalchemy-migrate changes, there are people
>> talking to themselves on GitHub about long-overdue bug fixes because
>> the Gerrit review queue for it is sluggish and they apparently don't
>> realize the SQLAM reviewers don't look at Google Code issues[1] and
>> GitHub pull request comments[2].
>> 
>> [1] https://code.google.com/p/sqlalchemy-migrate/issues/detail?id=171
>> [2] https://github.com/stackforge/sqlalchemy-migrate/pull/5
>> 
>> --
>> Jeremy Stanley
>> 
>> ___
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> 
> -- 
> Davanum Srinivas :: https://twitter.com/dims
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] sqlalchemy-migrate call for reviews

2014-12-01 Thread Mike Bayer

I can +2 whichever patches are needed by Openstack projects, or that are 
critically needed in general, that you can point me towards directly.
Overall I’m not the “maintainer” of sqlalchemy-migrate; I’ve only volunteered 
to have a +2 role for critically needed issues.  So, in the absence of someone 
willing to take on a real maintainer role (bug triage, etc.), for users outside 
of immediate Openstack use cases I’d prefer that they continue working towards 
moving to Alembic; the major features I’ve introduced in Alembic, including the 
SQLite support, are intended to make that transition much more feasible.


> On Dec 1, 2014, at 5:19 AM, Ihar Hrachyshka  wrote:
> 
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA512
> 
> Indeed, the review queue is non-responsive. There are other patches in
> the queue that bit rot there:
> 
> https://review.openstack.org/#/q/status:open+project:stackforge/sqlalchemy-migrate,n,z
> 
> I guess since no one with a +2 hammer systematically monitors patches
> there, users are on their own and better fork if blocked. Sad but true.
> 
> (btw technically monitoring is not that difficult: gerrit allows to
> subscribe to specific projects, and this one does not look like time
> consuming from reviewer perspective.)
> 
> /Ihar
> 
> On 30/11/14 00:28, Jeremy Stanley wrote:
>> To anyone who reviews sqlalchemy-migrate changes, there are people 
>> talking to themselves on GitHub about long-overdue bug fixes
>> because the Gerrit review queue for it is sluggish and they
>> apparently don't realize the SQLAM reviewers don't look at Google
>> Code issues[1] and GitHub pull request comments[2].
>> 
>> [1]
>> https://code.google.com/p/sqlalchemy-migrate/issues/detail?id=171 
>> [2] https://github.com/stackforge/sqlalchemy-migrate/pull/5
>> 
> -BEGIN PGP SIGNATURE-
> Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
> 
> iQEcBAEBCgAGBQJUfECeAAoJEC5aWaUY1u57cx8H/2d8urszdd3RIsU+3JyrnVg6
> I92WtoCS84HdOEE7DjM5m/tgFGjIp9Gh4lovEft5JYDcnHACfd4gdhUunt+PAvVO
> 2usFuPdR9IJvbKc28FJAqZeXJpvMc0KSMN4j8t1dtgu6Cv4TaFZEN77G6vrV9jem
> b56npPlmpIaDpGP49XtFBHMcbU0pVJ0AQCWUd0wOX+NQl4EfF0stlvxd/1LWn9xf
> rZCzatEqyRItlAB+ATpI0TlGSgvVv0PKqrV+TnoZ4OU/TZINNoCjZELB7NkmfDMz
> 9rJgviCmCHRyWs+VwsbEeGKDI3nBLjX7UEk5K2f93VsZQWYpW3q6Z2rrpmH977Y=
> =gT/r
> -END PGP SIGNATURE-
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [neutron] alembic 0.7.1 will break neutron's "heal" feature which assumes a fixed set of potential autogenerate types

2014-12-01 Thread Mike Bayer
hey neutron -

Just an FYI, I’ve added https://review.openstack.org/#/c/137989/ / 
https://launchpad.net/bugs/1397796 to refer to an issue in neutron’s “heal” 
script that is going to start failing when I put out Alembic 0.7.1, which is 
potentially later today / this week.

The issue is pretty straightforward: Alembic 0.7.1 is adding foreign key 
autogenerate (and really, could add more types of autogenerate at any time), 
and as these new commands are revealed within the execute_alembic_command(), 
they are not accounted for, so it fails.   I’d recommend folks try to push this 
one through or otherwise decide how this issue (which should be expected to 
occur many more times) should be handled.

Just a heads up in case you start seeing builds failing!

- mike



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] alembic 0.7.1 will break neutron's "heal" feature which assumes a fixed set of potential autogenerate types

2014-12-03 Thread Mike Bayer
So folks, I had to put Alembic 0.7.1 out as I realized that the “batch” mode 
was being turned on for autogenerate across the board in 0.7.0, and that was 
not the plan.

So it is now out, and the builds are failing due to 
https://launchpad.net/bugs/1397796.

There are some nits happening on the review 
https://review.openstack.org/#/c/137989/, so I’m hoping someone with some 
Neutron cred adjusts the patch to their liking and gets it merged.   I’m just 
the messenger on this. 

- mike



> On Dec 1, 2014, at 5:43 PM, Salvatore Orlando  wrote:
> 
> Thanks Mike!
> 
> I've left some comments on the patch.
> Just out of curiosity, since now alembic can autogenerate foreign keys, are 
> we be able to remove the logic for identifying foreign keys to add/remove [1]?
> 
> Salvatore
> 
> [1] 
> http://git.openstack.org/cgit/openstack/neutron/tree/neutron/db/migration/alembic_migrations/heal_script.py#n205
>  
> <http://git.openstack.org/cgit/openstack/neutron/tree/neutron/db/migration/alembic_migrations/heal_script.py#n205>
>  
> 
> On 1 December 2014 at 20:35, Mike Bayer  <mailto:mba...@redhat.com>> wrote:
> hey neutron -
> 
> Just an FYI, I’ve added https://review.openstack.org/#/c/137989/ 
> <https://review.openstack.org/#/c/137989/> / 
> https://launchpad.net/bugs/1397796 <https://launchpad.net/bugs/1397796> to 
> refer to an issue in neutron’s “heal” script that is going to start failing 
> when I put out Alembic 0.7.1, which is potentially later today / this week.
> 
> The issue is pretty straightforward,  Alembic 0.7.1 is adding foreign key 
> autogenerate (and really, could add more types of autogenerate at any time), 
> and as these new commands are revealed within the execute_alembic_command(), 
> they are not accounted for, so it fails.   I’d recommend folks try to push 
> this one through or otherwise decide how this issue (which should be expected 
> to occur many more times) should be handled.
> 
> Just a heads up in case you start seeing builds failing!
> 
> - mike
> 
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org <mailto:OpenStack-dev@lists.openstack.org>
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev 
> <http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev>
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] sqlalchemy-migrate vs alembic for new database

2014-12-05 Thread Mike Bayer

> On Dec 5, 2014, at 3:14 PM, Matt Riedemann  wrote:
> 
> 
> 
> On 12/5/2014 1:45 PM, Andrew Laski wrote:
>> The cells v2 effort is going to be introducing a new database into
>> Nova.  This has been an opportunity to rethink and approach a few things
>> differently, including how we should handle migrations. There have been
>> discussions for a long time now about switching over to alembic for
>> migrations so I want to ask, should we start using alembic from the
>> start for this new database?
>> 
>> The question was first raised by Dan Smith on
>> https://review.openstack.org/#/c/135424/
>> 
>> I do have some concern about having two databases managed in two
>> different ways, but if the details are well hidden behind a nova-manage
>> command I'm not sure it will actually matter in practice.
>> 
>> ___
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> 
> 
> I don't have experience with Alembic but I'd think we should use Alembic for 
> the new database unless there is a compelling reason not to. Maybe we need 
> Mike Bayer (or other oslo.db people) to give us an idea of what kinds of 
> problems we might have with managing two databases with two different 
> migration schemes.
> 
> But the last part you said is key for me, if we can abstract it well then 
> hopefully it's not very painful.

sqlalchemy-migrate doesn’t really have a dedicated maintainer anymore, AFAICT.  
It’s pretty much on stackforge life support.   So while the issue of merging 
together a project with migrate and alembic at the same time seems to be 
something for which there is some complexity and there are some competing ideas (I have 
one that’s pretty fancy, but I haven’t spec’ed or implemented it yet, so for 
now there are “wrappers” that run both), it sort of has to happen regardless.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [all] [ha] potential issue with implicit async-compatible mysql drivers

2014-12-05 Thread Mike Bayer
Hey list -

I’m posting this here just to get some ideas on what might be happening here, 
as it may or may not have some impact on Openstack if and when we move to MySQL 
drivers that are async-patchable, like MySQL-connector or PyMySQL.  I had a 
user post this issue a few days ago which I’ve since distilled into test cases 
for PyMySQL and MySQL-connector separately.   It uses gevent, not eventlet, so 
I’m not really sure if this applies.  But there are plenty of very smart people 
here so if anyone can shed some light on what is actually happening here, that 
would help.

The program essentially illustrates code that performs several steps upon a 
connection; however, if the greenlet is suddenly killed, the state from the 
connection, while damaged, is still being allowed to continue on in some way, 
and what’s super-catastrophic here is that you see a transaction actually being 
committed *without* all of its statements having proceeded. 

In my work with MySQL drivers, I’ve noted for years that they are all very, 
very bad at dealing with concurrency-related issues.  The whole “MySQL has gone 
away” and “commands out of sync” errors are ones that we’ve all just drowned 
in, and so often these are due to the driver getting mixed up due to concurrent 
use of a connection.  However this one seems more insidious.   Though at the 
same time, the script has some complexity happening (like a simplistic 
connection pool) and I’m not really sure where the core of the issue lies.

The script is at https://gist.github.com/zzzeek/d196fa91c40cb515365e and also 
below.  If you run it for a few seconds, go over to your MySQL command line and 
run this query:

SELECT * FROM table_b WHERE a_id not in (SELECT id FROM table_a) ORDER BY a_id 
DESC;

and what you’ll see is tons of rows in table_b where the “a_id” is zero 
(because cursor.lastrowid fails), but the *rows are committed*.   If you read 
the segment of code that does this, it should be impossible:

connection = pool.get()
rowid = execute_sql(
    connection,
    "INSERT INTO table_a (data) VALUES (%s)", ("a",)
)

gevent.sleep(random.random() * 0.2)

try:
    execute_sql(
        connection,
        "INSERT INTO table_b (a_id, data) VALUES (%s, %s)",
        (rowid, "b",)
    )

    connection.commit()

    pool.return_conn(connection)

except Exception:
    connection.rollback()
    pool.return_conn(connection)

so if the gevent.sleep() throws a timeout error, somehow we are getting thrown 
back in there, with the connection in an invalid state, but not invalid enough 
to commit.

If a simple check for “SELECT connection_id()” is added, this query fails and 
the whole issue is prevented.  Additionally, if you put a foreign key 
constraint on that b_table.a_id, then the issue is prevented, and you see that 
the constraint violation is happening all over the place within the commit() 
call.   The connection is being used such that its state just started after the 
gevent.sleep() call.  
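
The "simple check" is roughly this sketch, which would go right after the 
gevent.sleep() call in the block above; on a connection whose greenlet was 
killed mid-operation, the extra round trip raises instead of letting the 
partial transaction proceed:

def assert_connection_alive(connection):
    # cheap liveness check before doing any further work on the connection
    cursor = connection.cursor()
    try:
        cursor.execute("SELECT connection_id()")
        cursor.fetchall()
    finally:
        cursor.close()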

Now, there’s also a very rudimentary connection pool here.   That is also part 
of what’s going on.  If I try to run without the pool, the whole script just 
runs out of connections, fast, which suggests that this gevent timeout cleans 
itself up very, very badly.   However, SQLAlchemy’s pool works a lot like this 
one, so if folks here can tell me if the connection pool is doing something 
bad, then that’s key, because I need to make a comparable change in 
SQLAlchemy’s pool.   Otherwise I worry our eventlet use could have big problems 
under high load.





# -*- coding: utf-8 -*-
import gevent.monkey
gevent.monkey.patch_all()

import collections
import threading
import time
import random
import sys

import logging
logging.basicConfig()
log = logging.getLogger('foo')
log.setLevel(logging.DEBUG)

#import pymysql as dbapi
from mysql import connector as dbapi


class SimplePool(object):
    def __init__(self):
        self.checkedin = collections.deque([
            self._connect() for i in range(50)
        ])
        self.checkout_lock = threading.Lock()
        self.checkin_lock = threading.Lock()

    def _connect(self):
        return dbapi.connect(
            user="scott", passwd="tiger",
            host="localhost", db="test")

    def get(self):
        with self.checkout_lock:
            while not self.checkedin:
                time.sleep(.1)
            return self.checkedin.pop()

    def return_conn(self, conn):
        try:
            conn.rollback()
        except:
            log.error("Exception during rollback", exc_info=True)
        try:
            conn.close()
        except:
            log.error("Exception during close", exc_info=True)

        # recycle to a new connection
        conn = self._connect()
        with self.checkin_lock:
            self.checkedin.append(conn)

[openstack-dev] Mike Bayer 20141205

2014-12-05 Thread Mike Bayer
1. Alembic release - I worked through some regressions introduced by Alembic 
0.7.0 and the subsequent 0.7.1 with the Neutron folks.  This started on Monday 
with https://review.openstack.org/#/c/137989/, and by Wednesday I had 
identified enough small regressions in 0.7.0 that I had to put 0.7.1 out, so 
that review got expedited with https://review.openstack.org/#/c/138998/ 
following from Neutron devs to continue fixing.   Version 0.7.1 includes the 
foreign key autogenerate support first proposed by Ann Kamyshnikova.  Changelog 
at http://alembic.readthedocs.org/en/latest/changelog.html#change-0.7.1.

2. MySQL driver stuff.   I have a SQLAlchemy user who is running some kind of 
heavy load with gevent and PyMySQL.  While this user is not openstack-specific, 
the thing he is doing is a lot like what we might be doing if and when we move 
our MySQL drivers to MySQL-connector-Python, which is compatible with eventlet 
in that it is pure Python and can be monkeypatched.The issue observed by 
this user applies to both PyMySQL and MySQL-connector, and I can reproduce it 
*without* using SQLAlchemy, though it does use a very makeshift connection pool 
designed to approximate what SQLAlchemy’s does.   The issue is scary because it 
illustrates Python code that should have been killed being invoked on a 
database connection that should have been dead, calling commit(), and then 
actually *succeeding* in committing only *part* of the data.   This is not an 
issue that impacts Openstack right now but if the same thing applies to 
eventlet, then this would definitely be something we’d need to worry about if 
we start using MySQL-connector in a high load scenario (which has been the 
plan) so I’ve forwarded my findings onto openstack-dev to see if anyone can 
help me understand it.  The intro + test case for this issue starts at 
http://lists.openstack.org/pipermail/openstack-dev/2014-December/052344.html. 

3. enginefacade - The engine facade as I described in 
https://review.openstack.org/#/c/125181/, which we also talked about on the 
Nova compute call this week, is now built!  I spent monday and tuesday on the 
buildout for this, and that can be seen and reviewed here: 
https://review.openstack.org/#/c/138215/.  As of today I’m still nursing it 
through CI, as even with projects using the “legacy” APIs, they are still 
finding lots of little silly things that I keep having to fix (people calling 
the old EngineFacade with arguments I didn’t expect, people importing from 
oslo.db in an order I did not expect, etc).  While these consuming projects 
could be fixed to not have these little issues, for now I am trying to push 
everything to work as identically as possible to how it was earlier, when the 
new API is not explicitly invoked.   I’ll be continuing to get this to pass all 
tempest runs through next week.

For enginefacade I’d like the folks from the call to take a look, and in 
particular if Matthew Booth wants to look into it, this is ready to start being 
used for prototyping Nova with it.

4. Connectivity stuff - today I worked a bunch with Viktor Sergeyev who has 
been trying to fix an issue with MySQL OperationalErrors that are raised when 
the database is shut off entirely; in oslo.db we have logic that wraps all 
exceptions unconditionally, including that it identifies disconnect exceptions. 
 In the case where the DB throws a disconnect, and we loop around to “retry” 
this query in order to get it to reconnect, then if that reconnect continues to 
fail, the second run doesn’t get wrapped.   So today I’ve fixed both the 
upstream issue for SQLAlchemy 1.0, and also made a series of adjustments to 
oslo.db to accommodate SQLAlchemy 1.0’s system correctly as well as to work 
around the issue when SQLAlchemy < 1.0 is present.   That’s a three-series of 
patches that are unsurprisingly going to take some nursing to get through the 
gate, so I’ll be continuing with that next week.  This series starts at 
https://review.openstack.org/139725 https://review.openstack.org/139733 
https://review.openstack.org/139738 .

5. SQLA 1.0 stuff. - getting SQLAlchemy 1.0 close to release is becoming 
critical so I’ve been moving around issues and priorities to expedite this.  
There’s many stability enhancements oslo.db would benefit from as well as some 
major performance-related features that I’ve been planning all along to 
introduce to projects.   1.0 is very full of lots of changes that aren’t really 
being tested outside of my own CI, so getting something out the door on it is 
key, otherwise it will just be too different from 0.9 in order for people to 
have smooth upgrades.   I do run SQLA 1.0 in CI against a subset of Neutron, 
Nova, Keystone and Oslo tests so we should be in OK shape, but there is still a 
lot to go.  Work completed so far can be seen at 
http://docs.sqlalchemy.org/en/latest/changelog/migration_10.html.  


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Mike Bayer 20141205

2014-12-05 Thread Mike Bayer
this was sent to the wrong list!   please ignore.   (or if you find it 
interesting, then great!)


> On Dec 5, 2014, at 6:13 PM, Mike Bayer  wrote:
> 
> 1. Alembic release - I worked through some regressions introduced by Alembic 
> 0.7.0 and the subsequent 0.7.1 with the Neutron folks.  This started on 
> Monday with https://review.openstack.org/#/c/137989/, and by Wednesday I had 
> identified enough small regressions in 0.7.0 that I had to put 0.7.1 out, so 
> that review got expedited with https://review.openstack.org/#/c/138998/ 
> following from Neutron devs to continue fixing.   Version 0.7.1 includes the 
> foreign key autogenerate support first proposed by Ann Kamyshnikova.  
> Changelog at 
> http://alembic.readthedocs.org/en/latest/changelog.html#change-0.7.1.
> 
> 2. MySQL driver stuff.   I have a SQLAlchemy user who is running some kind of 
> heavy load with gevent and PyMySQL.  While this user is not 
> openstack-specific, the thing he is doing is a lot like what we might be 
> doing if and when we move our MySQL drivers to MySQL-connector-Python, which 
> is compatible with eventlet in that it is pure Python and can be 
> monkeypatched.The issue observed by this user applies to both PyMySQL and 
> MySQL-connector, and I can reproduce it *without* using SQLAlchemy, though it 
> does use a very makeshift connection pool designed to approximate what 
> SQLAlchemy’s does.   The issue is scary because it illustrates Python code 
> that should have been killed being invoked on a database connection that 
> should have been dead, calling commit(), and then actually *succeeding* in 
> committing only *part* of the data.   This is not an issue that impacts 
> Openstack right now but if the same thing applies to eventlet, then this 
> would definitely be something we’d need to worry about if we start using 
> MySQL-connector in a high load scenario (which has been the plan) so I’ve 
> forwarded my findings onto openstack-dev to see if anyone can help me 
> understand it.  The intro + test case for this issue starts at 
> http://lists.openstack.org/pipermail/openstack-dev/2014-December/052344.html. 
> 
> 3. enginefacade - The engine facade as I described in 
> https://review.openstack.org/#/c/125181/, which we also talked about on the 
> Nova compute call this week, is now built!  I spent monday and tuesday on the 
> buildout for this, and that can be seen and reviewed here: 
> https://review.openstack.org/#/c/138215/  As of today I’m still nursing it 
> through CI, as even with projects using the “legacy” APIs, they are still 
> finding lots of little silly things that I keep having to fix (people calling 
> the old EngineFacade with arguments I didn’t expect, people importing from 
> oslo.db in an order I did not expect, etc).  While these consuming projects 
> could be fixed to not have these little issues, for now I am trying to push 
> everything to work as identically as possible to how it was earlier, when the 
> new API is not explicitly invoked.   I’ll be continuing to get this to pass 
> all tempest runs through next week.
> 
> For enginefacade I’d like the folks from the call to take a look, and in 
> particular if Matthew Booth wants to look into it, this is ready to start 
> being used for prototyping Nova with it.
> 
> 4. Connectivity stuff - today I worked a bunch with Viktor Sergeyev who has 
> been trying to fix an issue with MySQL OperationalErrors that are raised when 
> the database is shut off entirely; in oslo.db we have logic that wraps all 
> exceptions unconditionally, including that it identifies disconnect 
> exceptions.  In the case where the DB throws a disconnect, and we loop around 
> to “retry” this query in order to get it to reconnect, then that reconnect 
> continues to fail, the second run doesn’t get wrapped.   So today I’ve fixed 
> both the upstream issue for SQLAlchemy 1.0, and also made a series of 
> adjustments to oslo.db to accommodate SQLAlchemy 1.0’s system correctly as 
> well as to work around the issue when SQLAlchemy < 1.0 is present.   That’s a 
> three-series of patches that are unsurprisingly going to take some nursing to 
> get through the gate, so I’ll be continuing with that next week.  This series 
> starts at https://review.openstack.org/139725 
> https://review.openstack.org/139733 https://review.openstack.org/139738 .
> 
> 5. SQLA 1.0 stuff. - getting SQLAlchemy 1.0 close to release is becoming 
> critical so I’ve been moving around issues and priorities to expedite this.  
> There’s many stability enhancements oslo.db would benefit from as well as 
> some major performance-related features that I’ve been planning all along to 
> introduce to projects.   1.0 is very full of lots of changes that aren’t 
> really being tested ou

[openstack-dev] [oslo.db] engine facade status, should reader transactions COMMIT or ROLLBACK?

2014-12-09 Thread Mike Bayer
Hi folks -

Just a reminder that the majority of the enginefacade implementation is up for 
review; see it at https://review.openstack.org/#/c/138215/.   It needs a lot 
more people looking at it.

Matthew Booth raised a good point which I also came across: transactions that 
are “read only”.   What is the opinion of openstack-dev for a transaction that 
is marked as “reader” and emits only SELECT statements: do we prefer that it 
COMMIT at the end or just ROLLBACK?   It doesn’t matter much to 
me.  In my own work I tend to use ROLLBACK, but the current design of most 
openstack database code I see is doing simple “with session.begin()”, which 
means it’s currently all COMMIT.
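
For concreteness, here is roughly what a “reader” method looks like under the 
proposed API (a sketch only, based on the decorator approach in the review 
above; the model and query are invented for illustration):

from oslo_db.sqlalchemy import enginefacade

@enginefacade.reader
def get_instances_by_host(context, host):
    # "models.Instance" is a stand-in for any mapped class; this block
    # emits only SELECTs, and the question is whether the facade should
    # end the transaction with COMMIT or ROLLBACK
    return context.session.query(models.Instance).\
        filter_by(host=host).all()

Since nothing is modified in such a block, the data that comes back is the same 
either way; the question is only what the database sees at the end.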

Thanks for your attention!

- mike




Re: [openstack-dev] [all] [ha] potential issue with implicit async-compatible mysql drivers

2014-12-12 Thread Mike Bayer

> On Dec 12, 2014, at 9:27 AM, Ihar Hrachyshka  wrote:
> 
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA512
> 
> Reading the latest comments at
> https://github.com/PyMySQL/PyMySQL/issues/275, it seems to me that the
> issue is not to be solved in drivers themselves but instead in
> libraries that arrange connections (sqlalchemy/oslo.db), correct?
> 
> Will the proposed connection reopening help?

disagree, this is absolutely a driver bug.  I’ve re-read that last comment and 
now I see that the developer is suggesting that this condition not be flagged 
in any way, so I’ve responded.  The connection should absolutely blow up and if 
it wants to refuse to be usable afterwards, that’s fine (it’s the same as 
MySQLdb “commands out of sync”).  It just has to *not* emit any further SQL as 
though nothing is wrong.

It doesn’t matter much for PyMySQL anyway, I don’t know that PyMySQL is up to 
par for openstack in any case (look at the entries in their changelog: 
https://github.com/PyMySQL/PyMySQL/blob/master/CHANGELOG "Several other bug 
fixes”, “Many bug fixes"- really?  is this an iphone app?)

We really should be looking to get this fixed in MySQL-connector, which seems 
to have a similar issue.   It’s just so difficult to get responses from 
MySQL-connector that the PyMySQL thread is at least informative.





> 
> /Ihar
> 
> On 05/12/14 23:43, Mike Bayer wrote:
>> Hey list -
>> 
>> I’m posting this here just to get some ideas on what might be
>> happening here, as it may or may not have some impact on Openstack
>> if and when we move to MySQL drivers that are async-patchable, like
>> MySQL-connector or PyMySQL.  I had a user post this issue a few
>> days ago which I’ve since distilled into test cases for PyMySQL and
>> MySQL-connector separately.   It uses gevent, not eventlet, so I’m
>> not really sure if this applies.  But there’s plenty of very smart
>> people here so if anyone can shed some light on what is actually
>> happening here, that would help.
>> 
>> The program essentially illustrates code that performs several
>> steps upon a connection, however if the greenlet is suddenly
>> killed, the state from the connection, while damaged, is still
>> being allowed to continue on in some way, and what’s
>> super-catastrophic here is that you see a transaction actually
>> being committed *without* all the statements proceeding on it.
>> 
>> In my work with MySQL drivers, I’ve noted for years that they are
>> all very, very bad at dealing with concurrency-related issues.  The
>> whole “MySQL has gone away” and “commands out of sync” errors are
>> ones that we’ve all just drowned in, and so often these are due to
>> the driver getting mixed up due to concurrent use of a connection.
>> However this one seems more insidious.   Though at the same time,
>> the script has some complexity happening (like a simplistic
>> connection pool) and I’m not really sure where the core of the
>> issue lies.
>> 
>> The script is at
>> https://gist.github.com/zzzeek/d196fa91c40cb515365e and also below.
>> If you run it for a few seconds, go over to your MySQL command line
>> and run this query:
>> 
>> SELECT * FROM table_b WHERE a_id not in (SELECT id FROM table_a)
>> ORDER BY a_id DESC;
>> 
>> and what you’ll see is tons of rows in table_b where the “a_id” is
>> zero (because cursor.lastrowid fails), but the *rows are
>> committed*.   If you read the segment of code that does this, it
>> should be impossible:
>> 
>> connection = pool.get()
>> rowid = execute_sql(
>>     connection, "INSERT INTO table_a (data) VALUES (%s)", ("a",))
>> 
>> gevent.sleep(random.random() * 0.2)
>> 
>> try:
>>     execute_sql(
>>         connection,
>>         "INSERT INTO table_b (a_id, data) VALUES (%s, %s)", (rowid, "b",))
>>     connection.commit()
>>     pool.return_conn(connection)
>> except Exception:
>>     connection.rollback()
>>     pool.return_conn(connection)
>> 
>> so if the gevent.sleep() throws a timeout error, somehow we are
>> getting thrown back in there, with the connection in an invalid
>> state, but not invalid enough to commit.
>> 
>> If a simple check for “SELECT connection_id()” is added, this query
>> fails and the whole issue is prevented.  Additionally, if you put a
>> foreign key constraint on that b_table.a_id, then the issue is
>> prevented, and you see that the constraint violation is happening
>> all over the place within the commit() call.   The connection is
>> being used such that its state just started after the
>> gevent.sleep() call.
>> 
>> Now, there’s also a very rudimen

Re: [openstack-dev] [all] [oslo] [ha] potential issue with implicit async-compatible mysql drivers

2014-12-13 Thread Mike Bayer

> On Dec 12, 2014, at 1:16 PM, Mike Bayer  wrote:
> 
> 
>> On Dec 12, 2014, at 9:27 AM, Ihar Hrachyshka  wrote:
>> 
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA512
>> 
>> Reading the latest comments at
>> https://github.com/PyMySQL/PyMySQL/issues/275, it seems to me that the
>> issue is not to be solved in drivers themselves but instead in
>> libraries that arrange connections (sqlalchemy/oslo.db), correct?
>> 
>> Will the proposed connection reopening help?
> 
> disagree, this is absolutely a driver bug.  I’ve re-read that last comment 
> and now I see that the developer is suggesting that this condition not be 
> flagged in any way, so I’ve responded.  The connection should absolutely blow 
> up and if it wants to refuse to be usable afterwards, that’s fine (it’s the 
> same as MySQLdb “commands out of sync”).  It just has to *not* emit any 
> further SQL as though nothing is wrong.
> 
> It doesn’t matter much for PyMySQL anyway, I don’t know that PyMySQL is up to 
> par for openstack in any case (look at the entries in their changelog: 
> https://github.com/PyMySQL/PyMySQL/blob/master/CHANGELOG "Several other bug 
> fixes”, “Many bug fixes"- really?  is this an iphone app?)
> 
> We really should be looking to get this fixed in MySQL-connector, which seems 
> to have a similar issue.   It’s just so difficult to get responses from 
> MySQL-connector that the PyMySQL thread is at least informative.

so I spent the rest of yesterday continuing to stare at that example case and 
also continued the thread on that list.

Where I think it’s at is this: the huge issue here lies in any one or all of:

1. a gevent-style “timeout” puts a monkeypatched socket in an entirely unknown 
state;

2. MySQL’s protocol doesn’t have any provision for matching an OK response to 
the request that it corresponds to;

3. the MySQL drivers we’re dealing with don’t have actual “async” APIs, which 
could then be easily tailored to work with eventlet/gevent safely (see 
https://github.com/zacharyvoase/gevent-psycopg2 and 
https://bitbucket.org/dvarrazzo/psycogreen for the PG examples of these; 
problem solved there).

At the moment I’m not fully confident the drivers are going to feasibly be able 
to provide a complete fix here.   MySQL sends a status message that is 
essentially “OK”, and there’s not really any way to tell that this “OK” is 
actually from a different statement.

What we need at the very basic level is that, if we call connection.rollback(), 
it either fails with an exception, or it succeeds.   Right now, the core of the 
test case is that we see connection.rollback() silently failing, which then 
causes the next statement (the INSERT) to also fail - then the connection 
rights itself and continues to be usable to complete the transaction.   There 
might be some other variants of this.

So in the interim I have added for SQLA 0.9.9 (which I can also make available 
as part of oslo.db.sqlalchemy.compat if we’d like) a session.invalidate() 
method that will just call connection.invalidate() on the currently bound 
connection(s); this can then be called from the block where we catch the 
eventlet/gevent “timeout”.

Within the oslo.db.sqlalchemy.enginefacade system, we can potentially add 
direct awareness of eventlet.Timeout 
(http://eventlet.net/doc/modules/timeout.html) as a distinct error condition 
within a transactional block, and invalidate the known connection(s) when this 
is caught.   This would insulate us from this particular issue regardless of 
driver, with the key assumption that it is in fact only a “timeout” condition 
under which this issue actually occurs.
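
In rough pseudocode, the shape of that would be something like the following (a 
sketch only, not the actual enginefacade change; session.invalidate() is the 
new method mentioned above, and run_in_transaction is just an illustrative 
helper name):

import eventlet


def run_in_transaction(session, work):
    try:
        with session.begin():
            return work(session)
    except eventlet.Timeout:
        # the underlying socket is in an unknown state; discard the DBAPI
        # connection entirely rather than returning it to the pool
        session.invalidate()
        raise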





[openstack-dev] [nova] unit test migration failure specific to MySQL/MariaDB - 'uuid': used in a foreign key constraint 'block_device_mapping_instance_uuid_fkey'

2015-01-06 Thread Mike Bayer
Hello -

Victor Sergeyev and I are both observing the following test failure which 
occurs with all the tests underneath 
nova.tests.unit.db.test_migrations.TestNovaMigrationsMySQL.   This is against 
master with a brand new tox environment and everything at the default.

It does not seem to be occurring on gates that run these tests and 
interestingly the tests seem to complete very quickly (under seven seconds) on 
the gate as well; the failures here take between 50-100 seconds to occur, not 
fully deterministically, and only on the MySQL backend; the Postgresql and 
SQLite versions of these tests pass.  I’m running against MariaDB server 
10.0.14 with Python 2.7.8 on Fedora 21.   

Below is the test just for test_walk_versions, but the warnings (not 
necessarily the failures themselves) here also occur for test_migration_267 as 
well as test_innodb_tables.

I’m still looking into what the cause of this is, I’d imagine it’s something 
related to newer MySQL versions or perhaps MariaDB vs. MySQL, I’m just putting 
it up here in case someone already knows what this is or has some clue to save 
me some time figuring it out.  I apologize if I’m just doing something dumb, 
I’ve only recently begun to run Nova’s test suite in full against all backends, 
so I haven’t yet put intelligent thought into this nor have I tried to yet look 
at the migration in question causing the problem.  Will do that next.


[mbayer@thinkpad nova]$ tox -e py27 -- 
nova.tests.unit.db.test_migrations.TestNovaMigrationsMySQL.test_walk_versions
py27 develop-inst-noop: /home/mbayer/dev/openstack/nova
py27 runtests: PYTHONHASHSEED='0'
py27 runtests: commands[0] | find . -type f -name *.pyc -delete
py27 runtests: commands[1] | bash tools/pretty_tox.sh 
nova.tests.unit.db.test_migrations.TestNovaMigrationsMySQL.test_walk_versions
running testr
running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \
${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./nova/tests} 
--list 
running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \
${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./nova/tests}  
--load-list /tmp/tmpw7zqhE

2015-01-06 18:28:12.913 32435 WARNING oslo.db.sqlalchemy.session 
[req-5cc6731f-00ef-43df-8aec-4914a44d12c5 ] MySQL SQL mode is '', consider 
enabling TRADITIONAL or STRICT_ALL_TABLES
{0} 
nova.tests.unit.db.test_migrations.TestNovaMigrationsMySQL.test_walk_versions 
[51.553131s] ... FAILED

Captured traceback:
~~~
Traceback (most recent call last):
  File "nova/tests/unit/db/test_migrations.py", line 151, in 
test_walk_versions
self.walk_versions(self.snake_walk, self.downgrade)
  File 
"/home/mbayer/dev/openstack/nova/.tox/py27/lib/python2.7/site-packages/oslo/db/sqlalchemy/test_migrations.py",
 line 193, in walk_versions
self.migrate_up(version, with_data=True)
  File "nova/tests/unit/db/test_migrations.py", line 148, in migrate_up
super(NovaMigrationsCheckers, self).migrate_up(version, with_data)
  File 
"/home/mbayer/dev/openstack/nova/.tox/py27/lib/python2.7/site-packages/oslo/db/sqlalchemy/test_migrations.py",
 line 263, in migrate_up
self.REPOSITORY, version)
  File 
"/home/mbayer/dev/openstack/nova/.tox/py27/lib/python2.7/site-packages/migrate/versioning/api.py",
 line 186, in upgrade
return _migrate(url, repository, version, upgrade=True, err=err, **opts)
  File "", line 2, in _migrate
  File 
"/home/mbayer/dev/openstack/nova/.tox/py27/lib/python2.7/site-packages/migrate/versioning/util/__init__.py",
 line 160, in with_engine
return f(*a, **kw)
  File 
"/home/mbayer/dev/openstack/nova/.tox/py27/lib/python2.7/site-packages/migrate/versioning/api.py",
 line 366, in _migrate
schema.runchange(ver, change, changeset.step)
  File 
"/home/mbayer/dev/openstack/nova/.tox/py27/lib/python2.7/site-packages/migrate/versioning/schema.py",
 line 93, in runchange
change.run(self.engine, step)
  File 
"/home/mbayer/dev/openstack/nova/.tox/py27/lib/python2.7/site-packages/migrate/versioning/script/py.py",
 line 148, in run
script_func(engine)
  File 
"/home/mbayer/dev/openstack/nova/nova/db/sqlalchemy/migrate_repo/versions/267_instance_uuid_non_nullable.py",
 line 103, in upgrade
process_null_records(meta, scan=False)
  File 
"/home/mbayer/dev/openstack/nova/nova/db/sqlalchemy/migrate_repo/versions/267_instance_uuid_non_nullable.py",
 line 89, in process_null_records
table.columns.uuid.alter(nullable=False)
  File 
"/home/mbayer/dev/openstack/nova/.tox/py27/lib/python2.7/site-packages/migrate/changeset/schema.py",
 line 534, in alter
return alter_column(self, *p, **k)
  File 
"/home/mbayer/dev/openstack/nova/.tox/py27/lib/python2.7/site-packages/migrate/changeset/schema.py",
 

Re: [openstack-dev] [nova] unit test migration failure specific to MySQL/MariaDB - 'uuid': used in a foreign key constraint 'block_device_mapping_instance_uuid_fkey'

2015-01-07 Thread Mike Bayer
working with sdague on IRC, the first thing I’m seeing is that my MariaDB 
server is disallowing a change in column that is UNIQUE and has an FK pointing 
to it, and this is distinctly different from a straight up MySQL server (see 
below).  

http://paste.openstack.org/raw/155896/


old school MySQL:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4840
Server version: 5.6.15 Homebrew

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create table foo (id int, blah int, primary key (id), unique key (blah)) 
engine=InnoDB;
Query OK, 0 rows affected (0.01 sec)

mysql> create table bar(id int, blah_fk int, primary key (id), foreign key 
(blah_fk) references foo(blah)) engine=InnoDB;
Query OK, 0 rows affected (0.01 sec)

mysql> alter table foo change column blah blah int not null;
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> 



MariaDB 10:

MariaDB [test]> create table foo (id int, blah int, primary key (id), unique 
key (blah));
Query OK, 0 rows affected (0.09 sec)

MariaDB [test]> create table bar(id int, blah_fk int, primary key (id), foreign 
key (blah_fk) references foo(blah));
Query OK, 0 rows affected (0.12 sec)

MariaDB [test]> alter table foo change column blah blah int not null;
ERROR 1833 (HY000): Cannot change column 'blah': used in a foreign key 
constraint 'bar_ibfk_1' of table 'test.bar'
MariaDB [test]> 

Matt Riedemann  wrote:

> 
> 
> On 1/6/2015 5:40 PM, Mike Bayer wrote:
>> Hello -
>> 
>> Victor Sergeyev and I are both observing the following test failure which 
>> occurs with all the tests underneath 
>> nova.tests.unit.db.test_migrations.TestNovaMigrationsMySQL.This is 
>> against master with a brand new tox environment and everything at the 
>> default.
>> 
>> It does not seem to be occurring on gates that run these tests and 
>> interestingly the tests seem to complete very quickly (under seven seconds) 
>> on the gate as well; the failures here take between 50-100 seconds to occur, 
>> not fully deterministically, and only on the MySQL backend; the Postgresql 
>> and SQLite versions of these tests pass.  I’m running against MariaDB server 
>> 10.0.14 with Python 2.7.8 on Fedora 21.
>> 
>> Below is the test just for test_walk_versions, but the warnings (not 
>> necessarily the failures themselves) here also occur for test_migration_267 
>> as well as test_innodb_tables.
>> 
>> I’m still looking into what the cause of this is, I’d imagine it’s something 
>> related to newer MySQL versions or perhaps MariaDB vs. MySQL, I’m just 
>> putting it up here in case someone already knows what this is or has some 
>> clue to save me some time figuring it out.  I apologize if I’m just doing 
>> something dumb, I’ve only recently begun to run Nova’s test suite in full 
>> against all backends, so I haven’t yet put intelligent thought into this nor 
>> have I tried to yet look at the migration in question causing the problem.  
>> Will do that next.
>> 
>> 
>> [mbayer@thinkpad nova]$ tox -e py27 -- 
>> nova.tests.unit.db.test_migrations.TestNovaMigrationsMySQL.test_walk_versions
>> py27 develop-inst-noop: /home/mbayer/dev/openstack/nova
>> py27 runtests: PYTHONHASHSEED='0'
>> py27 runtests: commands[0] | find . -type f -name *.pyc -delete
>> py27 runtests: commands[1] | bash tools/pretty_tox.sh 
>> nova.tests.unit.db.test_migrations.TestNovaMigrationsMySQL.test_walk_versions
>> running testr
>> running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
>> OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
>> OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \
>> ${PYTHON:-python} -m subunit.run discover -t ./ 
>> ${OS_TEST_PATH:-./nova/tests} --list
>> running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
>> OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
>> OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \
>> ${PYTHON:-python} -m subunit.run discover -t ./ 
>> ${OS_TEST_PATH:-./nova/tests}  --load-list /tmp/tmpw7zqhE
>> 
>> 2015-01-06 18:28:12.913 32435 WARNING oslo.db.sqlalchemy.session 
>> [req-5cc6731f-00ef-43df-8aec-4914a44d12c5 ] MySQL SQL mode is '', consider 
>> enabling TRADITIONAL or STRICT_ALL_TABLES
>> {0} 
>> nova.tests.unit.db.test_migrations.TestNovaMigrationsMySQL.test_walk_versions
>>  [51.553131s] ... FAILED
>> 
>> Captured traceback:
>> ~~

Re: [openstack-dev] [nova] unit test migration failure specific to MySQL/MariaDB - 'uuid': used in a foreign key constraint 'block_device_mapping_instance_uuid_fkey'

2015-01-07 Thread Mike Bayer
OK, so it’s looking like sql_mode=‘TRADITIONAL’ is what allows it to work.  So 
that is most of it.   My MariaDB has no default sql_mode, but oslo.db should be 
setting this; in any case this seems more like a local oslo.db connection 
configuration issue that I can track down myself, so most of the mystery is 
solved! (at least the part that I didn’t feel like getting into….which I did 
anyway).
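
For anyone who wants to poke at this locally, the quickest check I know of is 
just to set the mode on a connection and re-run the ALTER (plain SQLAlchemy 
below, with an illustrative URL; within oslo.db this corresponds to the 
mysql_sql_mode option, which as far as I recall defaults to TRADITIONAL):

from sqlalchemy import create_engine

engine = create_engine("mysql://scott:tiger@localhost/test")
with engine.connect() as conn:
    conn.execute("SET SESSION sql_mode = 'TRADITIONAL'")
    print(conn.execute("SELECT @@sql_mode").scalar())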


Mike Bayer  wrote:

> working with sdague on IRC, the first thing I’m seeing is that my MariaDB 
> server is disallowing a change in column that is UNIQUE and has an FK 
> pointing to it, and this is distinctly different from a straight up MySQL 
> server (see below).  
> 
> http://paste.openstack.org/raw/155896/
> 
> 
> old school MySQL:
> 
> Welcome to the MySQL monitor.  Commands end with ; or \g.
> Your MySQL connection id is 4840
> Server version: 5.6.15 Homebrew
> 
> Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
> 
> Oracle is a registered trademark of Oracle Corporation and/or its
> affiliates. Other names may be trademarks of their respective
> owners.
> 
> Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
> 
> mysql> create table foo (id int, blah int, primary key (id), unique key 
> (blah)) engine=InnoDB;
> Query OK, 0 rows affected (0.01 sec)
> 
> mysql> create table bar(id int, blah_fk int, primary key (id), foreign key 
> (blah_fk) references foo(blah)) engine=InnoDB;
> Query OK, 0 rows affected (0.01 sec)
> 
> mysql> alter table foo change column blah blah int not null;
> Query OK, 0 rows affected (0.02 sec)
> Records: 0  Duplicates: 0  Warnings: 0
> 
> mysql> 
> 
> 
> 
> MariaDB 10:
> 
> MariaDB [test]> create table foo (id int, blah int, primary key (id), unique 
> key (blah));
> Query OK, 0 rows affected (0.09 sec)
> 
> MariaDB [test]> create table bar(id int, blah_fk int, primary key (id), 
> foreign key (blah_fk) references foo(blah));
> Query OK, 0 rows affected (0.12 sec)
> 
> MariaDB [test]> alter table foo change column blah blah int not null;
> ERROR 1833 (HY000): Cannot change column 'blah': used in a foreign key 
> constraint 'bar_ibfk_1' of table 'test.bar'
> MariaDB [test]> 
> 
> Matt Riedemann  wrote:
> 
>> On 1/6/2015 5:40 PM, Mike Bayer wrote:
>>> Hello -
>>> 
>>> Victor Sergeyev and I are both observing the following test failure which 
>>> occurs with all the tests underneath 
>>> nova.tests.unit.db.test_migrations.TestNovaMigrationsMySQL.This is 
>>> against master with a brand new tox environment and everything at the 
>>> default.
>>> 
>>> It does not seem to be occurring on gates that run these tests and 
>>> interestingly the tests seem to complete very quickly (under seven seconds) 
>>> on the gate as well; the failures here take between 50-100 seconds to 
>>> occur, not fully deterministically, and only on the MySQL backend; the 
>>> Postgresql and SQLite versions of these tests pass.  I’m running against 
>>> MariaDB server 10.0.14 with Python 2.7.8 on Fedora 21.
>>> 
>>> Below is the test just for test_walk_versions, but the warnings (not 
>>> necessarily the failures themselves) here also occur for test_migration_267 
>>> as well as test_innodb_tables.
>>> 
>>> I’m still looking into what the cause of this is, I’d imagine it’s 
>>> something related to newer MySQL versions or perhaps MariaDB vs. MySQL, I’m 
>>> just putting it up here in case someone already knows what this is or has 
>>> some clue to save me some time figuring it out.  I apologize if I’m just 
>>> doing something dumb, I’ve only recently begun to run Nova’s test suite in 
>>> full against all backends, so I haven’t yet put intelligent thought into 
>>> this nor have I tried to yet look at the migration in question causing the 
>>> problem.  Will do that next.
>>> 
>>> 
>>> [mbayer@thinkpad nova]$ tox -e py27 -- 
>>> nova.tests.unit.db.test_migrations.TestNovaMigrationsMySQL.test_walk_versions
>>> py27 develop-inst-noop: /home/mbayer/dev/openstack/nova
>>> py27 runtests: PYTHONHASHSEED='0'
>>> py27 runtests: commands[0] | find . -type f -name *.pyc -delete
>>> py27 runtests: commands[1] | bash tools/pretty_tox.sh 
>>> nova.tests.unit.db.test_migrations.TestNovaMigrationsMySQL.test_walk_versions
>>> running testr
>>> running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
>>> OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1

[openstack-dev] [nova] [oslo] compare and swap progress

2015-01-15 Thread Mike Bayer
For those who haven’t seen it, I’d like to first share Jay Pipes’ unbelievably 
thorough blog post on Nova update concurrency, specifically as it relates to 
the issue of emitting an UPDATE on a "locked” row without using SELECT..FOR 
UPDATE (as well as why we *can’t* keep using SELECT..FOR UPDATE).  Go read it, 
I’ll wait here:  

http://www.joinfu.com/2015/01/understanding-reservations-concurrency-locking-in-nova/


Got all that? OK.   There’ve been two patches so far I’m aware of to 
implement this within a key area in nova.  We have Jay’s and Matt Booth’s:

https://review.openstack.org/109837
https://review.openstack.org/141115

So what I want to do with either of those (Matt’s seems to be a little further 
along) is factor out all that UPDATE stuff, and make a nice oslo.db function 
that will:

1. emit an UPDATE statement that matches a row on a full set of attributes 
present in a given object “specimen”;

2. if exactly one row matched, retrieve the primary key of that row using as 
efficient a means as possible given the backend database and schema design;

3. return a persistent version of the given “specimen” as though it was just 
SELECTed from the database.

I have that ready to go, which most likely can be of use in many more scenarios 
than just this one.  I invite folks to take a look:

https://review.openstack.org/#/c/146228/

Example:

specimen = MyModel(
    y='y9', z='z5', x=6,
    uuid='136254d5-3869-408f-9da7-190e0072641a'
)

result = session.query(MyModel).update_on_match(
    specimen,
    'uuid',
    values={'x': 9, 'z': 'z3'})

# result is now a persistent version of "specimen" (at the moment
# the same object) with all the new values.  the UPDATE statement is
# guaranteed to match all of x, y, z and uuid.
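#
# Roughly speaking (an illustration of intent, not the exact SQL string
# that gets rendered), this corresponds to:
#
#   UPDATE my_model SET x=9, z='z3'
#   WHERE uuid = '136254d5-3869-408f-9da7-190e0072641a'
#     AND x = 6 AND y = 'y9' AND z = 'z5'
#
# followed by a check that exactly one row was matched; matching zero rows
# or more than one row is treated as an error.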








[openstack-dev] [oslo] [nova] [all] potential enginefacade adjustment - can everyone live with this?

2015-01-22 Thread Mike Bayer
Hey all -

Concerning the enginefacade system, approved blueprint:

https://review.openstack.org/#/c/125181/

which will replace the use of oslo_db.sqlalchemy.EngineFacade ultimately across 
all projects that use it (which is, all of them that use a database).

We are struggling to find a solution for the issue of application-defined 
contexts that might do things that the facade needs to know about, namely 1. 
that the object might be copied using deepcopy() or 2. that the object might be 
sent into a new set of worker threads, where its attributes are accessed 
concurrently.

The above blueprint and the implementations so far have assumed that we receive 
this context object and use simple assignment, e.g. “context.session = 
the_session”, in order to provide its attributes.  To accommodate 1. and 2., 
I’ve had to add a significant amount of complexity (see patch set 28 at 
https://review.openstack.org/#/c/138215/).   It all works fine, but 
predictably, people are not comfortable with the extra few yards into the weeds 
it has to go to make all that happen.  In particular, in order to accommodate a 
RequestContext that is running in a different thread, the context has to be 
copied first, because we have no ability to make the “.session” or 
“.connection” attributes dynamic without access to the RequestContext class up 
front.

So, what’s the alternative?   It’s that enginefacade is given just a tiny bit 
of visibility into the *class* used to create your context (in Nova, the 
nova.context.RequestContext class), so that we can place dynamic descriptors 
on it before instantiation (or I suppose we could monkeypatch the class on the 
first RequestContext object we see, but that seems even less desirable).   The 
blueprint went out of its way to avoid this.   But with contexts being copied 
and thrown into threads, these are use cases I didn’t consider, and had I done 
so I’d probably have needed to do the BP differently.

So what does the change look like?   If you’re not Nova, imagine you’re 
cinder.context.RequestContext, heat.common.context.RequestContext, 
glance.context.RequestContext, etc.We throw a class decorator onto the 
class so that enginefacade can add some descriptors:

diff --git a/nova/context.py b/nova/context.py
index e78636c..205f926 100644
--- a/nova/context.py
+++ b/nova/context.py
@@ -22,6 +22,7 @@ import copy
from keystoneclient import auth
from keystoneclient import service_catalog
from oslo.utils import timeutils
+from oslo_db.sqlalchemy import enginefacade
import six

from nova import exception
@@ -61,6 +62,7 @@ class _ContextAuthPlugin(auth.BaseAuthPlugin):
region_name=region_name)


+@enginefacade.transaction_context_provider
class RequestContext(object):
"""Security context and request information.


The implementation of this one can be seen here: 
https://review.openstack.org/#/c/149289/.   In particular we can see all the 
lines of code removed from oslo’s approach, and in fact there are a lot more 
nasties I can take out once I get to work on that some more.
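
Conceptually, all the decorator does is attach descriptors so that “.session” 
and “.connection” are computed per-context rather than being plain assigned 
attributes.  A loose sketch of that idea (not the code in the review; 
_state_for() here is a made-up placeholder for the facade’s per-context 
bookkeeping):

class _transaction_attr(object):
    def __init__(self, name):
        self.name = name

    def __get__(self, context, owner):
        if context is None:
            return self
        # _state_for() stands in for looking up the transaction state
        # currently associated with this context
        return getattr(_state_for(context), self.name)


def transaction_context_provider(cls):
    cls.session = _transaction_attr('session')
    cls.connection = _transaction_attr('connection')
    return cls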

so what’s controversial about this?   It’s that there’s an “oslo_db.sqlalchemy” 
import up front in the XYZ/context.py module of every participating project, 
outside of where anything else “sqlalchemy” happens.  

There are potentially other ways to do this - subclasses of RequestContext that 
are generated by abstract factories, for one.   As I left my Java gigs years 
ago I’m hesitant to go there either :).   Perhaps projects can opt to run their 
RequestContext class through this decorator conditionally, wherever it is that 
it gets decided they are about to use their db/sqlalchemy/api.py module.

So can I please get +1 / -1 from the list on, “oslo_db.sqlalchemy wants an 
up-front patch on everyone’s RequestContext class”  ?  thanks!

- mike










Re: [openstack-dev] [oslo] [nova] [all] potential enginefacade adjustment - can everyone live with this?

2015-01-23 Thread Mike Bayer


Doug Hellmann  wrote:

> We put the new base class for RequestContext in its own library because
> both the logging and messaging code wanted to influence it's API. Would
> it make sense to do this database setup there, too?

whoa, where’s that? is this an oslo-wide RequestContext class ? that would
solve everything b.c. right now every project seems to implement
RequestContext themselves.




Re: [openstack-dev] [oslo] [nova] [all] potential enginefacade adjustment - can everyone live with this?

2015-01-23 Thread Mike Bayer


Ihar Hrachyshka  wrote:

> On 01/23/2015 05:38 PM, Mike Bayer wrote:
>> Doug Hellmann  wrote:
>> 
>>> We put the new base class for RequestContext in its own library because
>>> both the logging and messaging code wanted to influence it's API. Would
>>> it make sense to do this database setup there, too?
>> whoa, where’s that? is this an oslo-wide RequestContext class ? that would
>> solve everything b.c. right now every project seems to implement
>> RequestContext themselves.
> 
> https://github.com/openstack/oslo.context/blob/master/oslo_context/context.py#L35
> 
> Though not every project migrated to it yet.

WOW !!

OK!

Dear Openstack:

Can you all start using oslo_context/context.py for your RequestContext
base, as a condition of migrating off of legacy EngineFacade?




Re: [openstack-dev] [oslo] [nova] [all] potential enginefacade adjustment - can everyone live with this?

2015-01-23 Thread Mike Bayer


Mike Bayer  wrote:

> 
> 
> Ihar Hrachyshka  wrote:
> 
>> On 01/23/2015 05:38 PM, Mike Bayer wrote:
>>> Doug Hellmann  wrote:
>>> 
>>>> We put the new base class for RequestContext in its own library because
>>>> both the logging and messaging code wanted to influence it's API. Would
>>>> it make sense to do this database setup there, too?
>>> whoa, where’s that? is this an oslo-wide RequestContext class ? that would
>>> solve everything b.c. right now every project seems to implement
>>> RequestContext themselves.


so Doug -

How does this “influence of API” occur, would oslo.db import
oslo_context.context and patch onto RequestContext at that point? Or the
other way around? Or… ?


I’m almost joyful that this is here.   Assuming we can get everyone to use it, 
should be straightforward for that right?





Re: [openstack-dev] [oslo] [nova] [all] potential enginefacade adjustment - can everyone live with this?

2015-01-23 Thread Mike Bayer


Doug Hellmann  wrote:

> 
> 
> On Fri, Jan 23, 2015, at 12:49 PM, Mike Bayer wrote:
>> Mike Bayer  wrote:
>> 
>>> Ihar Hrachyshka  wrote:
>>> 
>>>> On 01/23/2015 05:38 PM, Mike Bayer wrote:
>>>>> Doug Hellmann  wrote:
>>>>> 
>>>>>> We put the new base class for RequestContext in its own library because
>>>>>> both the logging and messaging code wanted to influence it's API. Would
>>>>>> it make sense to do this database setup there, too?
>>>>> whoa, where’s that? is this an oslo-wide RequestContext class ? that would
>>>>> solve everything b.c. right now every project seems to implement
>>>>> RequestContext themselves.
>> 
>> 
>> so Doug -
>> 
>> How does this “influence of API” occur, would oslo.db import
>> oslo_context.context and patch onto RequestContext at that point? Or the
>> other way around? Or… ?
> 
> No, it's a social thing. I didn't want dependencies between
> oslo.messaging and oslo.log, but the API of the context needs to support
> use cases in both places.
> 
> Your case might be different, in that we might need to actually have
> oslo.context depend on oslo.db in order to call some setup code. We'll
> have to think about whether that makes sense and what other dependencies
> it might introduce between the existing users of oslo.context.

hey Doug -

for the moment, I have oslo_db.sqlalchemy.enginefacade applying its descriptors 
at import time onto oslo_context:

https://review.openstack.org/#/c/138215/30/oslo_db/sqlalchemy/enginefacade.py

https://review.openstack.org/gitweb?p=openstack/oslo.db.git;a=blob;f=oslo_db/sqlalchemy/enginefacade.py;h=3f76678a6c9788f62288c8fa5ef520db8dff2c0a;hb=bc33d20dc6db2f8e5f8cb02b4eb5f97d24dafb7a#l692

https://review.openstack.org/gitweb?p=openstack/oslo.db.git;a=blob;f=oslo_db/sqlalchemy/enginefacade.py;h=3f76678a6c9788f62288c8fa5ef520db8dff2c0a;hb=bc33d20dc6db2f8e5f8cb02b4eb5f97d24dafb7a#l498




> 
> Doug
> 
>> I’m almost joyful that this is here.   Assuming we can get everyone to
>> use it, should be straightforward for that right?
>> 
>> 
>> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] proposal for unwinding database usage from tests

2015-01-24 Thread Mike Bayer


Sean Dague  wrote:

> I've been looking at the following patch series -
> https://review.openstack.org/#/c/131691/13 for removing database
> requirements from some tests.
> 
> I whole heartedly support getting DB usage out of tests, but I'd like to
> make sure that we don't create new challenges in the process. The
> conditional create_db parameter in test functions adds a bit more
> internal test complexity than I think we should have.
> 
> I'd like to propose instead DB usage should be removed per test Class as
> an atomic unit. If that turns into too large a patch that probably means
> breaking apart the test class into smaller test classes first.
> 
> The other thing to be careful in understanding the results of timing
> tests. The way the database fixture works it caches the migration
> process -
> https://github.com/openstack/nova/blob/master/nova/tests/fixtures.py#L206
> 
> That actually means that the overhead of the db-migration sync is paid
> only once per testr worker (it's 1s on my fast workstation, might be 2s
> on gate nodes). The subsequence db setups for tests 2 -> N in the worker
> only take about 0.020s on my workstation (scale appropriately). So
> removing all the unneeded db setup code is probably only going to save
> ~30s over an entire test run.
> 
> Which doesn't mean it shouldn't be done, there are other safety reasons
> we shouldn't let every test randomly punch data into the db and still
> pass. But time savings should not be the primary motivator here, because
> it's actually not nearly as much gain as it looks like from running only
> a small number of tests.

I have a stalled patch for oslo.db that would provide for a new db fixture such 
that tests wouldn’t have to worry too much about when fixtures are set up or 
when per-test data is loaded; the system uses testresources to organize setup 
of schemas and teardown of data within longer-lived schemas in the most 
efficient way possible, across any number of backends:  
https://review.openstack.org/#/c/120870/.   Simplifying and bringing great 
consistency to the fixtures of participating projects and reducing the reliance 
upon SQLite in tests are the goals of this patch.  The patch is stalled not for 
any particular reason, except perhaps some lingering discomfort with the 
OptimisingTestSuite approach that necessitates patching into unittest internals 
in order to reorder whole groups of tests.

So yes, DB-specific tests and tests without DB absolutely have to remain broken 
up along class lines; migrating an individual DB test to a non-DB test should 
involve moving it to a different class that doesn’t use a DB-related fixture.
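
To make the split concrete, the organization I mean is simply along these lines 
(hypothetical test names; in Nova the distinction already exists between 
test.TestCase, which sets up the database fixture, and test.NoDBTestCase, which 
does not):

from nova import test


class InstanceHelpersTestCase(test.NoDBTestCase):
    """Pure in-Python behavior; no database fixture is set up."""

    def test_some_pure_python_thing(self):
        self.assertTrue(True)   # placeholder


class InstanceDBTestCase(test.TestCase):
    """Anything in here is allowed to hit the database."""

    def test_something_that_writes_rows(self):
        pass   # placeholder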







> 
>   -Sean
> 
> -- 
> Sean Dague
> http://dague.net
> 



Re: [openstack-dev] [Oslo] [Ironic] DB migration woes

2014-06-07 Thread Mike Bayer

On Jun 6, 2014, at 8:12 PM, Devananda van der Veen  
wrote:

> I think some things are broken in the oslo-incubator db migration code.
> 
> Ironic moved to this when Juno opened and things seemed fine, until recently 
> when Lucas tried to add a DB migration and noticed that it didn't run... So I 
> looked into it a bit today. Below are my findings.
> 
> Firstly, I filed this bug and proposed a fix, because I think that tests that 
> don't run any code should not report that they passed -- they should report 
> that they were skipped.
>   https://bugs.launchpad.net/oslo/+bug/1327397
>   "No notice given when db migrations are not run due to missing engine"
> 
> Then, I edited the test_migrations.conf file appropriately for my local mysql 
> service, ran the tests again, and verified that migration tests ran -- and 
> they passed. Great!
> 
> Now, a little background... Ironic's TestMigrations class inherits from 
> oslo's BaseMigrationTestCase, then "opportunistically" checks each back-end, 
> if it's available. This opportunistic checking was inherited from Nova so 
> that tests could pass on developer workstations where not all backends are 
> present (eg, I have mysql installed, but not postgres), and still 
> transparently run on all backends in the gate. I couldn't find such 
> opportunistic testing in the oslo db migration test code, unfortunately - but 
> maybe it's well hidden.
> 
> Anyhow. When I stopped the local mysql service (leaving the configuration 
> unchanged), I expected the tests to be skipped, but instead I got two 
> surprise failures:
> - test_mysql_opportunistically() failed because setUp() raises an exception 
> before the test code could call calling _have_mysql()
> - test_mysql_connect_fail() actually failed! Again, because setUp() raises an 
> exception before running the test itself
> 
> Unfortunately, there's one more problem... when I run the tests in parallel, 
> they fail randomly because sometimes two test threads run different migration 
> tests, and the setUp() for one thread (remember, it calls _reset_databases) 
> blows up the other test.
> 
> Out of 10 runs, it failed three times, each with different errors:
>   NoSuchTableError: `chassis`
>   ERROR 1007 (HY000) at line 1: Can't create database 'test_migrations'; 
> database exists
>   ProgrammingError: (ProgrammingError) (1146, "Table 
> 'test_migrations.alembic_version' doesn't exist")
> 
> As far as I can tell, this is all coming from:
>   
> https://github.com/openstack/oslo-incubator/blob/master/openstack/common/db/sqlalchemy/test_migrations.py#L86;L111

Hello -

Just an introduction, I’m Mike Bayer, the creator of SQLAlchemy and Alembic 
migrations. I’ve just joined on as a full time Openstack contributor, and 
trying to help improve processes such as these is my primary responsibility.

I’ve had several conversations already about how migrations are run within test 
suites in various openstack projects.   I’m kind of surprised by this approach 
of dropping and recreating the whole database for individual tests.   Running 
tests in parallel is obviously made very difficult by this style, but even 
beyond that, a lot of databases don’t respond well to lots of 
dropping/rebuilding of tables and/or databases in any case; while SQLite and 
MySQL are probably the most forgiving of this, a backend like Postgresql is 
going to lock tables from being dropped more aggressively, if any open 
transactions or result cursors against those tables remain, and on a backend 
like Oracle, the speed of schema operations starts to become prohibitively 
slow.   Dropping and creating tables is in general not a very speedy task on 
any backend, and on a test suite that runs many tests against a fixed schema, I 
don’t see why a full drop is necessary.

If you look at SQLAlchemy’s own tests, they do in fact create tables on each 
test, or just as often for a specific suite of tests.  However, this is due to 
the fact that SQLAlchemy tests are testing SQLAlchemy itself, so the database 
schemas used for these tests are typically built explicitly for small groups or 
individual tests, and there are ultimately thousands of small “mini schemas” 
built up and torn down for these tests.   A lot of framework code is involved 
within the test suite to keep more picky databases like Postgresql and Oracle 
happy when building up and dropping tables so frequently.

However, when testing an application that uses a fixed set of tables, as should 
be the case for the majority if not all Openstack apps, there’s no reason that 
these tables need to be recreated for every test.   Typically, the way I 
recommend is that the test suite includes a “per suite” activity which creates 
the test schema just once

Re: [openstack-dev] [Oslo] [Ironic] DB migration woes

2014-06-08 Thread Mike Bayer

On Jun 8, 2014, at 11:46 AM, Roman Podoliaka  wrote:

> 
> Overall, the approach with executing a test within a transaction and
> then emitting ROLLBACK worked quite well. The only problem I ran into
> were tests doing ROLLBACK on purpose. But you've updated the recipe
> since then and this can probably be solved by using of save points.

yup, I went and found the gist, that is here:

https://gist.github.com/zzzeek/8443477
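
For reference, the shape of that recipe is roughly as follows (condensed; the 
URL is illustrative and the full version is in the gist):

from sqlalchemy import create_engine, event
from sqlalchemy.orm import Session

engine = create_engine("postgresql://scott:tiger@localhost/test")

connection = engine.connect()
trans = connection.begin()          # outer transaction, never committed
session = Session(bind=connection)
session.begin_nested()              # SAVEPOINT

@event.listens_for(session, "after_transaction_end")
def restart_savepoint(sess, transaction):
    # each time the test rolls back the savepoint, start a new one
    if transaction.nested and not transaction._parent.nested:
        sess.begin_nested()

# ... run the test against `session` ...

session.close()
trans.rollback()                    # everything the test did disappears
connection.close()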





Re: [openstack-dev] [Neutron] One performance issue about VXLAN pool initiation

2014-06-08 Thread Mike Bayer

On Jun 7, 2014, at 4:38 PM, Eugene Nikanorov  wrote:

> Hi folks,
> 
> There was a small discussion about the better way of doing sql operations for 
> vni synchronization with the config.
> Initial proposal was to handle those in chunks. Carl also suggested to issue 
> a single sql query.
> I've did some testing with my sql and postgress.
> I've tested the following scenario: vxlan range is changed from 5:15 
> to 0:10 and vice versa.
> That involves adding and deleting 5 vni in each test.
> 
> Here are the numbers:
> 50k vnis to add/delete    Pg adding vnis   Pg deleting vnis   Pg total   Mysql adding vnis   Mysql deleting vnis   Mysql total
> non-chunked sql                 23                22             45             14                  20                34
> chunked in 100                  20                17             37             14                  17                31
> 
> I've done about 5 tries to get each number to minimize random floating factor 
> (due to swaps, disc or cpu activity or other factors)
> It might be surprising that issuing multiple sql statements instead of one big 
> one is a little bit more efficient, so I would appreciate it if someone could 
> reproduce those numbers.
> Also I'd like to note that part of code that iterates over vnis fetched from 
> db is taking 10 seconds both on mysql and postgress and is a part of 
> "deleting vnis" numbers.
> In other words, difference between multiple DELETE sql statements and single 
> one is even bigger (in percent) than these numbers show.
> 
> The code which I used to test is here: http://paste.openstack.org/show/83298/
> Right now the chunked version is commented out, so to switch between versions 
> some lines should be commented and some - uncommented.

I’ve taken a look at this, though I’m not at the point where I have things set 
up to run things like this within full context, and I don’t know that I have 
any definitive statements to make, but I do have some suggestions:

1. I do tend to chunk things a lot, selects, deletes, inserts, though the chunk 
size I work with is typically more like 1000, rather than 100.   When chunking, 
we’re looking to select a size that doesn’t tend to overload the things that 
are receiving the data (query buffers, structures internal to both SQLAlchemy 
as well as the DBAPI and the relational database), but at the same time doesn’t 
lead to too much repetition on the Python side (where of course there’s a lot 
of slowness).

2. Specifically regarding “WHERE x IN (…..)”, I always chunk those.  When we 
use IN with a list of values, we’re building an actual SQL string that becomes 
enormous.  This puts strain on the database’s query engine, which is not 
optimized for SQL strings that are hundreds of thousands of characters long, 
and on some backends this size is limited; on Oracle, there’s a limit of 1000 
items.   So I’d always chunk this kind of thing (a short sketch of what I mean 
follows these notes).

3. I’m not sure of the broader context of this code, but in fact placing a 
literal list of items in the IN in this case seems unnecessary; the 
“vmis_to_remove” list itself was just SELECTed two lines above.   There’s some 
in-Python filtering following it which does not seem necessary; the 
"alloc.vxlan_vni not in vxlan_vnis” phrase could just as well be a SQL “NOT IN” 
expression.  Not sure if determination of the “.allocated” flag can be done in 
SQL, if that’s a plain column, then certainly.Again not sure if this is 
just an artifact of how the test is done here, but if the goal is to optimize 
this code for speed, doing a DELETE…WHERE .. IN (SELECT ..) is probably better. 
  I see that the SELECT is using a lockmode, but it would seem that if just the 
rows we care to DELETE are inlined within the DELETE itself this wouldn’t be 
needed either.

It’s likely that everything in #3 is pretty obvious already and there’s reasons 
it’s the way it is, but I’m just learning all of these codebases so feel free 
to point out more of the background for me.   

4. The synchronize_session=“fetch” is certainly a huge part of the time spent 
here, and it seems unclear why this synchronize is necessary.  When I use 
query.delete() I never use “fetch”; I either have synchronization turned off, 
as the operation is not dealing with any set of objects already in play, or I 
use “evaluate” which here is not possible with the IN (though there is a 
SQLAlchemy ticket for many years to implement “evaluate” using "IN (values)" 
that is pretty easy to implement, but if the query became an "IN (SELECT …)” 
that again would not be feasible).

5. I don’t have a great theory on why chunking does better here on the INSERT.  
 My vague notion here is that as with the DELETE, the systems in play do better 
when they aren’t tasked with building up very large internal buffers for 
operations, but that’s not something I have the background to prove.  
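
As mentioned in point 2, here is the kind of chunking I mean for the IN case (a 
generic sketch; the names are made up):

def chunked_in_delete(session, model, column, values, chunk_size=1000):
    # DELETE rows whose `column` is among `values`, at most chunk_size
    # values per statement
    for i in range(0, len(values), chunk_size):
        batch = values[i:i + chunk_size]
        session.query(model).\
            filter(column.in_(batch)).\
            delete(synchronize_session=False)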

These are all just some impressions and as I’m totally new to this code base I 
may be way off, so please feel to help me get up to speed !

- mike



Re: [openstack-dev] [Oslo] [Ironic] DB migration woes

2014-06-09 Thread Mike Bayer

On Jun 9, 2014, at 12:50 PM, Devananda van der Veen  
wrote:

> There may be some problems with MySQL when testing parallel writes in
> different non-committing transactions, even in READ COMMITTED mode,
> due to InnoDB locking, if the queries use non-unique secondary indexes
> for UPDATE or SELECT..FOR UPDATE queries. This is done by the
> "with_lockmode('update')" SQLAlchemy phrase, and is used in ~10 places
> in Nova. So I would not recommend this approach, even though, in
> principle, I agree it would be a much more efficient way of testing
> database reads/writes.
> 
> More details here:
> http://dev.mysql.com/doc/refman/5.5/en/innodb-locks-set.html and
> http://dev.mysql.com/doc/refman/5.5/en/innodb-record-level-locks.html

OK, but just to clarify my understanding, what is the approach to testing 
writes in parallel right now, are we doing CREATE DATABASE for two entirely 
distinct databases with some kind of generated name for each one?  Otherwise, 
if the parallel tests are against the same database, this issue exists 
regardless (unless autocommit mode is used, is FOR UPDATE accepted under those 
conditions?)




> 
> On Sun, Jun 8, 2014 at 8:46 AM, Roman Podoliaka  
> wrote:
>> Hi Mike,
>> 
>>>>> However, when testing an application that uses a fixed set of tables, as 
>>>>> should be the case for the majority if not all Openstack apps, there’s no 
>>>>> reason that these tables need to be recreated for every test.
>> 
>> This is a very good point. I tried to use the recipe from SQLAlchemy
>> docs to run Nova DB API tests (yeah, I know, this might sound
>> confusing, but these are actually methods that access the database in
>> Nova) on production backends (MySQL and PostgreSQL). The abandoned
>> patch is here [1]. Julia Varlamova has been working on rebasing this
>> on master and should upload a new patch set soon.
>> 
>> Overall, the approach with executing a test within a transaction and
>> then emitting ROLLBACK worked quite well. The only problem I ran into
>> were tests doing ROLLBACK on purpose. But you've updated the recipe
>> since then and this can probably be solved by using of save points. I
>> used a separate DB per a test running process to prevent race
>> conditions, but we should definitely give READ COMMITTED approach a
>> try. If it works, that will awesome.
>> 
>> With a few tweaks of PostgreSQL config I was able to run Nova DB API
>> tests in 13-15 seconds, while SQLite in memory took about 7s.
>> 
>> Action items for me and Julia probably: [2] needs a spec with [1]
>> updated accordingly. Using of this 'test in a transaction' approach
>> seems to be a way to go for running all db related tests except the
>> ones using DDL statements (as any DDL statement commits the current
>> transaction implicitly on MySQL and SQLite AFAIK).
>> 
>> Thanks,
>> Roman
>> 
>> [1] https://review.openstack.org/#/c/33236/
>> [2] https://blueprints.launchpad.net/nova/+spec/db-api-tests-on-all-backends
>> 
>> On Sat, Jun 7, 2014 at 10:27 PM, Mike Bayer  wrote:
>>> 
>>> On Jun 6, 2014, at 8:12 PM, Devananda van der Veen 
>>> wrote:
>>> 
>>> I think some things are broken in the oslo-incubator db migration code.
>>> 
>>> Ironic moved to this when Juno opened and things seemed fine, until recently
>>> when Lucas tried to add a DB migration and noticed that it didn't run... So
>>> I looked into it a bit today. Below are my findings.
>>> 
>>> Firstly, I filed this bug and proposed a fix, because I think that tests
>>> that don't run any code should not report that they passed -- they should
>>> report that they were skipped.
>>>  https://bugs.launchpad.net/oslo/+bug/1327397
>>>  "No notice given when db migrations are not run due to missing engine"
>>> 
>>> Then, I edited the test_migrations.conf file appropriately for my local
>>> mysql service, ran the tests again, and verified that migration tests ran --
>>> and they passed. Great!
>>> 
>>> Now, a little background... Ironic's TestMigrations class inherits from
>>> oslo's BaseMigrationTestCase, then "opportunistically" checks each back-end,
>>> if it's available. This opportunistic checking was inherited from Nova so
>>> that tests could pass on developer workstations where not all backends are
>>> present (eg, I have mysql installed, but not postgres), and still
>>> transparently run on all backends in the gate. I couldn't find such
>>> o

Re: [openstack-dev] [Oslo] [Ironic] DB migration woes

2014-06-09 Thread Mike Bayer

On Jun 9, 2014, at 1:08 PM, Mike Bayer  wrote:

> 
> On Jun 9, 2014, at 12:50 PM, Devananda van der Veen  
> wrote:
> 
>> There may be some problems with MySQL when testing parallel writes in
>> different non-committing transactions, even in READ COMMITTED mode,
>> due to InnoDB locking, if the queries use non-unique secondary indexes
>> for UPDATE or SELECT..FOR UPDATE queries. This is done by the
>> "with_lockmode('update')" SQLAlchemy phrase, and is used in ~10 places
>> in Nova. So I would not recommend this approach, even though, in
>> principle, I agree it would be a much more efficient way of testing
>> database reads/writes.
>> 
>> More details here:
>> http://dev.mysql.com/doc/refman/5.5/en/innodb-locks-set.html and
>> http://dev.mysql.com/doc/refman/5.5/en/innodb-record-level-locks.html
> 
> OK, but just to clarify my understanding, what is the approach to testing 
> writes in parallel right now, are we doing CREATE DATABASE for two entirely 
> distinct databases with some kind of generated name for each one?  Otherwise, 
> if the parallel tests are against the same database, this issue exists 
> regardless (unless autocommit mode is used, is FOR UPDATE accepted under 
> those conditions?)

Took a look and this seems to be the case, from oslo.db:

def create_database(engine):
    """Provide temporary user and database for each particular test."""
    driver = engine.name

    auth = {
        'database': ''.join(random.choice(string.ascii_lowercase)
                            for i in moves.range(10)),
        # ...
    }

    sqls = [
        "drop database if exists %(database)s;",
        "create database %(database)s;"
    ]

Just thinking out loud here; I’ll move these ideas to a new wiki page after 
this post.   My idea now is: OK, we provide ad-hoc databases for tests, but 
look into the idea that we create N ad-hoc databases, corresponding to 
parallel test runs - e.g. if we are running five tests concurrently, we make 
five databases.   Tests that use a database will be dished out among this pool 
of available schemas.   In the *typical* case (meaning not the case where 
we’re testing actual migrations; that’s a special case) we build up the schema 
on each database via migrations or even create_all() just once, run tests 
within rolled-back transactions one-per-database, and then the DBs are torn 
down when the suite is finished.

Sorry for the thread hijack.





Re: [openstack-dev] Gate proposal - drop Postgresql configurations in the gate

2014-06-12 Thread Mike Bayer

On 6/12/14, 8:26 AM, Julien Danjou wrote:
> On Thu, Jun 12 2014, Sean Dague wrote:
>
>> That's not cacthable in unit or functional tests?
> Not in an accurate manner, no.
>
>> Keeping jobs alive based on the theory that they might one day be useful
>> is something we just don't have the liberty to do any more. We've not
>> seen an idle node in zuul in 2 days... and we're only at j-1. j-3 will
>> be at least +50% of this load.
> Sure, I'm not saying we don't have a problem. I'm just saying it's not a
> good solution to fix that problem IMHO.

Just my 2c, without having a full understanding of all of OpenStack's CI
environment: Postgresql is definitely different enough that MySQL
"strict mode" could still allow issues to slip through quite easily.  As
for capacity issues, this might be longer term, but I'm hoping to get
database-related tests to be a lot faster if we can move to a model that
spends much less time creating databases and schemas.



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] mysql/mysql-python license "contamination" into openstack?

2014-06-12 Thread Mike Bayer

On Thu Jun 12 14:13:05 2014, Chris Friesen wrote:
> Hi,
>
> I'm looking for the community viewpoint on whether there is any chance
> of license contamination between mysql and nova.  I realize that
> lawyers would need to be involved for a proper ruling, but I'm curious
> about the view of the developers on the list.
>
> Suppose someone creates a modified openstack and wishes to sell it to
> others.  They want to keep their changes private.  They also want to
> use the mysql database.
>
> The concern is this:
>
> nova is apache licensed
> sqlalchemy is MIT licensed
> mysql-python (aka mysqldb1) is GPLv2 licensed
> mysql is GPLv2 licensed
>
>
>
> The concern is that since nova/sqlalchemy/mysql-python are all
> essentially linked together, an argument could be made that the work
> as a whole is a derivative work of mysql-python, and thus all the
> source code must be made available to anyone using the binary.
>
> Does this argument have any merit?

the GPL is excepted in the case of MySQL and other MySQL products 
released by Oracle (can you imagine such a sentence being 
written?), see 
http://www.mysql.com/about/legal/licensing/foss-exception/.   If 
MySQL-Python itself were an issue, OpenStack could switch to another 
MySQL library, such as MySQL Connector/Python which is now MySQL's 
official Python driver: 
http://dev.mysql.com/doc/connector-python/en/index.html

> Has anyone tested any of the mysql DBAPIs with more permissive licenses?

I just mentioned other MySQL drivers the other day; MySQL 
Connector/Python, OurSQL and pymysql are well tested within SQLAlchemy 
and these drivers generally pass all tests.   There's some concern over 
compatibility with eventlet, however, I can't speak to that just yet.
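
Switching drivers is just a matter of the URL's driver qualifier on the
SQLAlchemy side (credentials and host below are placeholders):

from sqlalchemy import create_engine

engine = create_engine("mysql+mysqlconnector://user:pw@host/nova")  # MySQL Connector/Python
# engine = create_engine("mysql+oursql://user:pw@host/nova")        # OurSQL
# engine = create_engine("mysql+pymysql://user:pw@host/nova")       # pymysql
# engine = create_engine("mysql+mysqldb://user:pw@host/nova")       # MySQL-Python, the GPL driver at issue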

>
> Chris
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] [glance] python namespaces considered harmful to development, lets not introduce more of them

2014-09-23 Thread Mike Bayer

On Sep 23, 2014, at 7:03 PM, Robert Collins  wrote:

> On 29 August 2014 04:42, Sean Dague  wrote:
>> On 08/28/2014 12:22 PM, Doug Hellmann wrote:
> ...
>>> The problem is that the setuptools implementation of namespace packages 
>>> breaks in a way that is repeatable but difficult to debug when a common 
>>> OpenStack installation pattern is used. So the fix is “don’t do that” where 
>>> I thought “that” meant the installation pattern and Sean thought it meant 
>>> “use namespace packages”. :-)
>> 
>> Stupid english... be more specific!
>> 
>> Yeh, Doug provides the most concise statement of where we failed on
>> communication (I take a big chunk of that blame). Hopefully now it's a
>> lot clearer what's going on, and why it hurts if you do it.
> 
> So... FWIW I think I've got a cleaner implementation of namespaces
> *for our context* - it takes inspiration from the PEP-420 discussion
> and final design. It all started when Mike reported issues with testr
> to me.
> 
> https://bugs.launchpad.net/oslo.db/+bug/1366869

I think you’ve got the wrong bug linked here and in your review; seems like you 
meant this one:

https://bugs.launchpad.net/oslo.db/+bug/1372250



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][LBaaS] Migrations in feature branch

2014-09-25 Thread Mike Bayer

If Neutron is ready for more Alembic features I could in theory begin work on 
https://bitbucket.org/zzzeek/alembic/issue/167/multiple-heads-branch-resolution-support
 .Folks should ping me on IRC regarding this.


On Sep 24, 2014, at 5:30 AM, Salvatore Orlando  wrote:

> Relying again on automatic schema generation could be error-prone. It can 
> only be enabled globally, and does not work when models are altered if the 
> table for the model being altered already exists in the DB schema.
> 
> I don't think it would be a big problem to put these migrations in the main 
> sequence once the feature branch is merged back into master.
> Alembic unfortunately does not yet do a great job in maintaining multiple 
> timelines. Even if only a single migration branch is supported, in theory one 
> could have a separate alembic environment for the feature branch, but that in 
> my opinion just creates the additional problem of handling a new environment, 
> and does not solve the initial problem of re-sequencing migrations.
> 
> Re-sequencing at merge time is not going to be a problem in my opinion. 
> However, keeping all the lbaas migrations chained together will help. You can 
> also do as Henry suggests, but that option has the extra (possibly 
> negligible) cost of squashing all migrations for the whole feature branch at 
> merge time.
> 
> As an example:
> 
> MASTER  ---> X -> X+1 -> ... -> X+n
> \
> FEATURE  \-> Y -> Y+1 -> ... -> Y+m
> 
> At every rebase, the migration timeline for the feature branch could 
> be rearranged as follows:
> 
> MASTER  ---> X -> X+1 -> ... -> X+n --->
>  \
> FEATURE   \-> Y=X+n -> Y+1 -> ... -> Y+m = X+n+m
> 
> And therefore when the final merge in master comes, all the migrations in the 
> feature branch can be inserted in sequence on top of master's HEAD.
> I have not tried this, but I reckon that conceptually it should work.
> 
> Salvatore
> 
> 
> On 24 September 2014 08:16, Kevin Benton  wrote:
> If these are just feature branches and they aren't intended to be
> deployed for long life cycles, why don't we just skip the db migration
> and enable auto-schema generation inside of the feature branch? Then a
> migration can be created once it's time to actually merge into master.
> 
> On Tue, Sep 23, 2014 at 9:37 PM, Brandon Logan
>  wrote:
> > Well the problem with resequencing on a merge is that a code change for
> > the first migration must be added first and merged into the feature
> > branch before the merge is done.  Obviously this takes review time
> > unless someone of authority pushes it through.  We'll run into this same
> > problem on rebases too if we care about keeping the migration sequenced
> > correctly after rebases (which we don't have to, only on a merge do we
> > really need to care).  If we did what Henry suggested in that we only
> > keep one migration file for the entire feature, we'd still have to do
> > the same thing.  I'm not sure that buys us much other than keeping the
> > feature's migration all in one file.
> >
> > I'd also say that code in master should definitely NOT be dependent on
> > code in a feature branch, much less a migration.  This was a requirement
> > of the incubator as well.
> >
> > So yeah this sounds like a problem but one that really only needs to be
> > solved at merge time.  There will definitely need to be coordination
> > with the cores when merge time comes.  Then again, I'd be a bit worried
> > if there wasn't since a feature branch being merged into master is a
> > huge deal.  Unless I am missing something I don't see this as a big
> > problem, but I am highly capable of being blind to many things.
> >
> > Thanks,
> > Brandon
> >
> >
> > On Wed, 2014-09-24 at 01:38 +, Doug Wiegley wrote:
> >> Hi Eugene,
> >>
> >>
> >> Just my take, but I assumed that we’d re-sequence the migrations at
> >> merge time, if needed.  Feature branches aren’t meant to be optional
> >> add-on components (I think), nor are they meant to live that long.
> >>  Just a place to collaborate and work on a large chunk of code until
> >> it’s ready to merge.  Though exactly what those merge criteria are is
> >> also yet to be determined.
> >>
> >>
> >> I understand that you’re raising a general problem, but given lbaas
> >> v2’s state, I don’t expect this issue to cause many practical problems
> >> in this particular case.
> >>
> >>
> >> This is also an issue for the incubator, whenever it rolls around.
> >>
> >>
> >> Thanks,
> >> doug
> >>
> >>
> >>
> >>
> >> On September 23, 2014 at 6:59:44 PM, Eugene Nikanorov
> >> (enikano...@mirantis.com) wrote:
> >>
> >> >
> >> > Hi neutron and lbaas folks.
> >> >
> >> >
> >> > Recently I briefly looked at one of lbaas proposed into feature
> >> > branch.
> >> > I see migration IDs there are lined into a general migration
> >> > sequence.
> >> >
> >> >
> >> > I think something is definitely wrong with this approach as
> >> > feat

Re: [openstack-dev] [nova][cinder] (OperationalError) (1040, 'Too many connections') None None

2014-09-29 Thread Mike Bayer

On Sep 28, 2014, at 5:56 PM, Nader Lahouti  wrote:

> Hi All,
> 
> I am seeing 'Too many connections' error in nova api and cinder when when 
> installing openstack using the latest..
> The error happens when launching couple of VMs (in this test around 5 VMs).
> 
> Here are the logs when error happens:
> 
> (1) nova-api logs/traceback:
> http://paste.openstack.org/show/116414/
> 
> (2) cinder api logs/traceback:
> http://paste.openstack.org/show/hbaomc5IVS3mig8z2BWq/
> 
> (3) Stats of some connections:
> http://paste.openstack.org/show/116425/
> 
> As it shown in (3) the issue is not seen with icehouse release.
> 
> Can somebody please let me know if it is a known issue?

I’ve not been alerted to this before, the stats on (3) look pretty alarming.

anyone else seeing things like this?



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] (OperationalError) (1040, 'Too many connections') None None

2014-09-29 Thread Mike Bayer




On Sep 29, 2014, at 12:31 PM, Nader Lahouti  wrote:

> Hi Jay,
> 
> Thanks for your reply. 
> 
> I'm not able to use mysql command line.
> $ mysql
> ERROR 1040 (HY000): Too many connections
> $
> Is there any other way to collect the information?


you can try stopping everything, getting on the command line *first* and 
leaving it open, then rerunning your whole environment, so that you’ve reserved 
that spot at least to do testing queries.




> 
> 
> Thanks,
> Nader.
> 
> 
> On Mon, Sep 29, 2014 at 8:42 AM, Jay Pipes  wrote:
> On Sun, Sep 28, 2014 at 5:56 PM, Nader Lahouti  
> wrote:
> Hi All,
> 
> I am seeing 'Too many connections' error in nova api and cinder when when 
> installing openstack using the latest..
> The error happens when launching couple of VMs (in this test around 5 VMs).
> 
> Here are the logs when error happens:
> 
> (1) nova-api logs/traceback:
> http://paste.openstack.org/show/116414/
> 
> (2) cinder api logs/traceback:
> http://paste.openstack.org/show/hbaomc5IVS3mig8z2BWq/
> 
> (3) Stats of some connections:
> http://paste.openstack.org/show/116425/
> 
> As it shown in (3) the issue is not seen with icehouse release.
> 
> Can somebody please let me know if it is a known issue?
> 
> Hi Nader,
> 
> Would you mind pastebin'ing the contents of:
> 
>  SHOW FULL PROCESSLIST\G
> 
> when executed from the mysql command line client? 
> 
> That will help to show us which threads are stuck executing what in MySQL.
> 
> Best,
> -jay
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] (OperationalError) (1040, 'Too many connections') None None

2014-09-29 Thread Mike Bayer

what’s spooking me is the original paste at 
http://paste.openstack.org/show/116425/ which showed:

icehouse:

Fri Sep 26 17:00:50 PDT 2014
Number of open TCP:3306 - 58
Number of open TCP:3306 nova-api - 5
Number of open TCP:3306 mysqld - 29
Number of open TCP:8774 - 10
Number of nova-api - 14


fresh startup with juno:

Fri Sep 26 09:42:58 PDT 2014
Number of open TCP:3306 - 152
Number of open TCP:3306 nova-api - 7
Number of open TCP:3306 mysqld - 76
Number of open TCP:8774 - 66
Number of nova-api - 99


does that seem right that an upgrade from icehouse would cause there to be 99 
nova-api procs at startup where there were only 14 before?




On Sep 29, 2014, at 1:34 PM, Amrith Kumar  wrote:

> Yes, looks like MySQL was just configured with too low a max-connections 
> value.
> 
> -amrith
> 
> | -Original Message-
> | From: Jay Pipes [mailto:jaypi...@gmail.com]
> | Sent: Monday, September 29, 2014 12:58 PM
> | To: openstack-dev@lists.openstack.org
> | Subject: Re: [openstack-dev] [nova][cinder] (OperationalError) (1040, 'Too
> | many connections') None None
> | 
> | On 09/29/2014 12:48 PM, Nader Lahouti wrote:
> | > Hi Jay,
> | >
> | > I login first and the recreated the problem and here is the log:
> | > http://paste.openstack.org/show/116776/
> | 
> | OK. Looks like there isn't anything wrong with your setup. I'm guessing
> | you have set up Keystone to run in Apache with 10 worker processes, and
> | you have the workers config option setting in nova.conf, neutron.conf and
> | all the other project configuration files set to 0, which will trigger a
> | number of worker processes equal to the number of CPU cores on your box,
> | which I'm guessing from looking at your SHOW FULL PROCESSLIST is around
> | 24-32 cores.
> | 
> | Solution: either lower the workers configuration option from 0 to
> | something like 12, or increase the max_connections setting in your my.cnf
> | to something that can handle the worker processes from all the OpenStack
> | services (I'd recommend something like 2000).
> | 
> | Best,
> | -jay
> | 
> | ___
> | OpenStack-dev mailing list
> | OpenStack-dev@lists.openstack.org
> | http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Group-based Policy] Database migration chain

2014-10-04 Thread Mike Bayer


On Oct 4, 2014, at 1:10 AM, Kevin Benton  wrote:

> Does sqlalchemy have good support for cross-database foreign keys? I was 
> under the impression that they cannot be implemented with the normal syntax 
> and semantics of an intra-database foreign-key constraint. 

cross “database” is not typically portable, but cross “schema” is.   

different database vendors have different notions of “databases” or “schemas”.

if you can get the “other database” to be accessible from the target database 
via “otherdatabase.sometable”, then you’re in.

from SQLAlchemy’s perspective, it’s just a name with a dot.   It’s the database 
itself that has to support the foreign key at the scope you are shooting for.
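
To make that concrete (a sketch only, with made-up schema and table names), the
"other database" just shows up as the schema qualifier on the Table and in the
ForeignKey target:

from sqlalchemy import Column, ForeignKey, Integer, MetaData, Table

metadata = MetaData()

# lives in the other database/schema, addressed as otherdatabase.sometable
sometable = Table(
    'sometable', metadata,
    Column('id', Integer, primary_key=True),
    schema='otherdatabase',
)

# local table referring to it; whether the FK is actually enforceable at
# this scope is up to the backend, not SQLAlchemy
localtable = Table(
    'localtable', metadata,
    Column('id', Integer, primary_key=True),
    Column('some_id', Integer, ForeignKey('otherdatabase.sometable.id')),
)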



> 
> On Fri, Oct 3, 2014 at 5:25 PM, Ivar Lazzaro  wrote:
> Hi,
> 
> Following up the latest GBP team meeting [0][1]:
> 
> As we keep going with our Juno stackforge implementation [2], although the 
> service is effectively a Neutron extension, we should avoid breaking 
> Neutron's migration chain by adding our model on top of it (and subsequently 
> changing Neutron's HEAD [3]). If we do that, upgrading from Juno to Kilo will 
> be painful for those who have used GBP. 
> 
> There are roughly a couple of possibilities for reaching this goal:
> 
> 1) Using a separate DBs with separate migration chain;
> 2) Using the same DB with separate chain (and different alembic version 
> table).
> 
> Personally I prefer the option 1, moving to a completely different database 
> while leveraging cross database foreign keys.
> 
> Please let me know your preference, or alternative solutions! :)
> 
> Cheers,
> Ivar.
> 
> [0] 
> http://eavesdrop.openstack.org/meetings/networking_policy/2014/networking_policy.2014-09-25-18.02.log.html
> [1] 
> http://eavesdrop.openstack.org/meetings/networking_policy/2014/networking_policy.2014-10-02-18.01.log.html
> [2] https://github.com/stackforge/group-based-policy
> [3] https://review.openstack.org/#/c/123617/
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> 
> 
> -- 
> Kevin Benton
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Group-based Policy] Database migration chain

2014-10-04 Thread Mike Bayer

On Oct 4, 2014, at 11:24 AM, Clint Byrum  wrote:

> 
> Excerpts from Mike Bayer's message of 2014-10-04 08:10:38 -0700:
>> 
>> On Oct 4, 2014, at 1:10 AM, Kevin Benton  wrote:
>> 
>>> Does sqlalchemy have good support for cross-database foreign keys? I was 
>>> under the impression that they cannot be implemented with the normal syntax 
>>> and semantics of an intra-database foreign-key constraint. 
>> 
>> cross “database” is not typically portable, but cross “schema” is.   
>> 
>> different database vendors have different notions of “databases” or 
>> “schemas”.
>> 
>> if you can get the “other database” to be accessible from the target 
>> database via “otherdatabase.sometable”, then you’re in.
>> 
>> from SQLAlchemy’s perspective, it’s just a name with a dot.   It’s the 
>> database itself that has to support the foreign key at the scope you are 
>> shooting for.
>> 
> 
> All true, however, there are zero guarantees that databases will be
> hosted on the same server, and typically permissions are setup to prevent
> cross-schema joins.

the impression I’ve gotten so far is that they are looking to have different 
sets of database tables isolated into groups that can be migrated 
independently, and rather than using multiple alembic version tables, they’d go 
with this approach. This to me means that on a postgresql backend these would 
just be individual sub-schemas, and on a MySQL backend would be other databases 
that are limited to being on the same host (which is MySQL’s version of 
“schemas”, in that the “CREATE SCHEMA” command is a synonym for “CREATE 
DATABASE").

> 
> Typically we use the public API's when we want to access data in a
> different application. The database is a private implementation detail
> of each application.

I was just having a discussion on IRC the other day with a Neutron dev stating 
that they’d like to break out several parts of Neutron into “plugins”, which 
would have their own database tables but still linked to the database tables of 
the core Neutron application.   If this is the case, within an architecture 
like that you’d have different sub-applications cross-accessing databases.




> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][oslo][db][docs] RFC: drop support for libpq < 9.1

2014-10-06 Thread Mike Bayer

On Oct 6, 2014, at 9:56 AM, Ihar Hrachyshka  wrote:

> >
> > But we can do better. We should also enforce utf8 on client side,
> > so that there is no way to run with a different encoding, and so
> > that we may get rid of additional options in sql connection
> > strings. I've sent a patch for oslo.db [4] to do just that.

i would recommend that we definitely do *not* set explicit client encodings on 
all columns, and go with the most minimal approach for whatever the target 
database is.For example, with Postgresql 8.4, utf-8 is not an issue so long 
as client_encoding is set within postgresql.conf.   I’ve had this kind of 
discussion many times in the past with folks who are trying to “protect” some 
subset of users that don’t have this setting in their conf file, either because 
they installed from an OSX package or some other weird reason, and generally 
I’m not buying it.   There’s no need to build tremendous verbosity and 
fine-grained specificity throughout a system in order to appease very narrow 
edge cases like this.   It’s not just about potential performance problems, 
it’s also just a schema and code management issue, in that it is complexity 
that IMHO is just not needed.

For this reason I’m pretty ambivalent overall about any kind of utf-8 
hardcoding within the application connection logic.  IMHO this should be 
handled at the configurational layer as much as possible.  Though as long as 
it’s just an application time setting, and not something hardwired throughout 
the schema (implying we now have to *rely* upon UTF-8 encoding explicitly for 
all columns everywhere…and then what happens if we are on some target database 
that doesn’t quite do things the same way, e.g. DB2, oracle, others?), it’s not 
*too* big of a deal for me if it solves some problems right now.

short answer, there should be no need to drop PG8.X support for this reason.   
If the client encoding setting is something we want hardcoded in the app layer 
and it fails for those versions (which I’m not familiar with? what is the thing 
that is not actually supported prior to libpq 9.1 ?  psycopg2’s 
set_client_encoding, really?   this doesn’t sound familiar to me), we can 
detect that and forego the setting - sqlalchemy dialect has server_version_info 
for this reason.



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][oslo][db][docs] RFC: drop support for libpq < 9.1

2014-10-07 Thread Mike Bayer

On Oct 7, 2014, at 8:29 AM, Ihar Hrachyshka  wrote:

> 
> That said, I wonder how we're going to manage cases when those
> *global* settings for the whole server should be really limited to
> specific databases. Isn't it better to enforce utf8 on service side,
> since we already know that we always run against utf8 database?

I think whenever we do a “CREATE DATABASE”, we should make sure the desired 
encodings are set up at that level.  I’ve seen lots of migration scripts that 
are listing through tables and setting each table individually to utf-8, and 
that’s less than ideal, though I suppose that’s more of a retroactive fix.
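
As a sketch (the admin URL and database name here are hypothetical), that just
means issuing something like:

from sqlalchemy import create_engine

# character set declared once, at database creation time, rather than
# retrofitted table-by-table afterwards
admin_engine = create_engine("mysql://root:pw@localhost/mysql")
admin_engine.execute(
    "CREATE DATABASE nova CHARACTER SET utf8 COLLATE utf8_general_ci")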

> 
> Please let me clarify... Do you say that setting client encoding on
> oslo.db side is actually ok? I haven't suggested to enforce utf8 per
> column/table, though I guess we're already there.

The way we are setting client encoding now should be fine, if you could clarify 
what about that isn’t working for PG 8.4, that would be helpful.   IMHO 
especially on Postgresql we should be able to get away with not having it.   
MySQLdb is not as nicely implemented as far as encoding (including the 
use_unicode speed issues) so it may be more necessary there.

But yes what I really *dont* want is the encoding set up on every column 
definition, e.g. “VARCHAR(20) CHARACTER SET ‘utf-8’”, that’s been suggested 
before and that would be terrible.   I don’t think Postgresql has quite the 
same thing available (only collation per column).

> 
> Forgoing, again, means some users running with utf8 client encoding,
> others without it.
> I strive towards consistency here, that's why I try
> to find a way to set the setting for all clients we support (and the
> question is *which* of those clients we really support).
> 
> The thing that is not supported in psql < 9.1 is a connection option
> added to libpq as of:
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=02e14562a806a96f38120c96421d39dfa7394192

OK but that is just the connection parameter, when you pass client_encoding to 
SQLAlchemy’s psycopg2 dialect, the encoding is set using psycopg2’s 
set_client_encoding() method: 
http://initd.org/psycopg/docs/connection.html#connection.set_client_encoding.  
This ultimately emits “SET client_encoding TO ‘utf8’” on the connection:

conn_set_client_encoding -> 
https://github.com/psycopg/psycopg2/blob/master/psycopg/connection_int.c#L1188

pq_set_guc_locked -> 
https://github.com/psycopg/psycopg2/blob/master/psycopg/pqpath.c#L709

“SET client_encoding TO <encoding>” is supported in all Postgresql versions, 
here’s 8.0: http://www.postgresql.org/docs/8.0/static/multibyte.html
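
For completeness, a sketch of how that looks from the SQLAlchemy side (the URL
is a placeholder); passing client_encoding to the psycopg2 dialect goes through
set_client_encoding() as described above, not the libpq 9.1 connection option:

from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://user:pw@localhost/somedb",
    client_encoding='utf8',
)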

So there’s no issue with Postgresql 8.2 here as far as client encoding, the 
libpq feature isn’t used for that.   Also, it defaults to the encoding that is 
set for the database (e.g. server side), so if we make sure CREATE DATABASE is 
emitted with the encoding, then we wouldn’t even need to set it (and perhaps we 
shouldn’t if the database is set to a different encoding).





___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] Meeting time

2014-10-07 Thread Mike Bayer

On Oct 7, 2014, at 3:15 PM, Doug Hellmann  wrote:

> 
> I remember talking about that at some point, but I don’t remember why we 
> decided against it. Since we have several options for earlier in the week 
> using the existing meeting rooms I think it’s safe to pick one of those.
> 
> I have a slight preference for the Monday time slot, but all of them work for 
> me. Does anyone else have a preference or a hard conflict with any of these 
> times?

I’m fine with whatever.




> 
> Doug
> 
>> 
>> Please let me know what you think.
>> 
>> Thanks,
>> Roman
>> 
>> [0] 
>> https://www.google.com/calendar/ical/bj05mroquq28jhud58esggq...@group.calendar.google.com/public/basic.ics
>> 
>> [1] 
>> http://www.timeanddate.com/worldclock/meetingdetails.html?year=2014&month=11&day=3&hour=16&min=0&sec=0&p1=367&p2=195&p3=179&p4=224
>> 
>> [2] 
>> http://www.timeanddate.com/worldclock/meetingdetails.html?year=2014&month=11&day=6&hour=16&min=0&sec=0&p1=367&p2=195&p3=179&p4=224
>> 
>> [3] 
>> http://www.timeanddate.com/worldclock/meetingdetails.html?year=2014&month=11&day=6&hour=17&min=0&sec=0&p1=367&p2=195&p3=179&p4=224
>> 
>> [4] 
>> http://www.timeanddate.com/worldclock/meetingdetails.html?year=2014&month=11&day=7&hour=16&min=0&sec=0&p1=367&p2=195&p3=179&p4=224
>> 
>> ___
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] [oslo] Acceptable methods for establishing per-test-suite behaviors

2014-10-07 Thread Mike Bayer
Hi folks -

So just to update on the status here, for the transactional testing and all 
that, I’ve had a blueprint in the queue for quite awhile:

https://review.openstack.org/#/c/117335/

and for a long time have had a working (with one caveat) version of the whole 
thing that uses testresources:

https://review.openstack.org/120870 , which is also dependent on 
https://review.openstack.org/110170 as a first step.

to recap, the purpose here is to provide a system that 1. creates and drops 
anonymous databases in an efficient way, where they are held open for the span 
of many (all?) tests on a per-process basis,   2. provides utilities to create 
schemas within these databases if needed, which if used are also held open for 
the span of many tests 3. allows individual tests to run within transactions 
where the state of the transaction is unconditionally rolled back at the end of 
each test, but the schema remains  4. unifies and simplifies the system by 
which we connect to lots of different kinds of databases within tests  5. makes 
the whole system of running against multiple backends easier to expand upon for 
new backends.

This system works great, with the catch that as the original email describes 
here, to get something to happen “at the end of *all* tests”, e.g. drop those 
databases, is very challenging.The testresources way is to bundle all tests 
into a single suite called OptimisingTestSuite, which then provides its own 
run() method.OptimisingTestSuite can be rolled out on a per-module, or 
per-package, basis.   

What we’re shooting for here is to roll it out on a “per all tests completely” 
basis, by putting it in the top level __init__.py file where the tests start.   
Unfortunately there are some issues where depending on how tests are run, this 
top level file might not be loaded as a package, and then the load_tests() hook 
which this relies upon does not get invoked.  Robert Collins created 
http://bugs.python.org/issue22457 to check with the Python devs on how this 
issue can be resolved, and if/when someone ever replies to it, he will commit 
similar workarounds to testtools directly.   There is also some potential 
overlap with the fact that oslo.db itself runs tests without using the -d flag 
and instead points right at “tests/“, which would imply that we might need to 
move “tests/“ in oslo.db itself into more of a package like “oslodb/tests”, but 
I am not sure.   In any case, any other project can still put the special 
load_tests() hook into their top level __init__.py or even within individual 
modules.

So from my POV these patches are held up just so that load_tests() can work in 
all cases no matter where we put it, but in reality I think it’s fine that it 
would be placed in more database-test-specific locations.

I’d really like to get these patches in so please feel free to review the spec 
as well as the patches.

- mike



On Aug 22, 2014, at 3:35 PM, Mike Bayer  wrote:

> Hi all -
> 
> I’ve spent many weeks on a series of patches for which the primary goal is to 
> provide very efficient patterns for tests that use databases and schemas 
> within those databases, including compatibility with parallel tests, 
> transactional testing, and scenario-driven testing (e.g. a test that runs 
> multiple times against different databases).
> 
> To that end, the current two patches that achieve this behavior in a 
> rudimental fashion are part of oslo.db and are at: 
> https://review.openstack.org/#/c/110486/ and 
> https://review.openstack.org/#/c/113153/.They have been in the queue for 
> about four weeks now.  The general theory of operation is that within a 
> particular Python process, a fixed database identifier is established 
> (currently via an environment variable).   As tests request the services of 
> databases, such as a Postgresql database or a MySQL database, the system will 
> provision a database within that backend of that fixed identifier and return 
> it.   The test can then request that it make use of a particular “schema” - 
> for example, Nova’s tests may request that they are using the “nova schema”, 
> which means that the schema for Nova’s model will be created within this 
> database, and will them remain permanently across the span of many tests 
> which use this same schema.  Only when a test requests that it wants a 
> different schema, or no schema, will the tables be dropped.To ensure the 
> schema is “clean” for every test, the provisioning system ensures that each 
> test runs within a transaction, which at test end is rolled back.In order 
> to accommodate tests that themselves need to roll back, the test additionally 
> runs within the context of a SAVEPOINT.   This system is entirely working, 
> and for those that are wondering, yes it works with SQLite as well (see 
> https://review.openstack.org/#/c/113152/).
> 
> And 

[openstack-dev] [all] [oslo] Proposed database connectivity patterns

2014-10-08 Thread Mike Bayer
Hi all -

I’ve drafted up my next brilliant idea for how to get Openstack projects to use 
SQLAlchemy more effectively.   The proposal here establishes significant detail 
on what’s wrong with the current state of things, e.g. the way I see 
EngineFacade, get_session() and get_engine() being used, and proposes a new 
system that provides a true facade around a managed context.   But of course, 
it requires that you all change your code!  (a little bit).  Based on just a 
few tiny conversations on IRC so far, seems like this might be a hard sell.  
But please note, most projects are pretty much broken in how they use the 
database - this proposal is just a first step to making it all non-broken, if 
not as amazing and cool as some people wish it could be.  Unbreaking the code 
and removing boilerplate is the first step - with sane and declarative patterns 
in place, we can then build in more bells and whistles.

Hoping you’re all super curious now, here it is!  Jump in:  
https://review.openstack.org/#/c/125181/

- mike







___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] [oslo] Proposed database connectivity patterns

2014-10-09 Thread Mike Bayer
So so far, everyone seems really positive and psyched about the proposal.

It looks like providing some options for how to use would be best, that is 
provide decorators and context managers.

Now the thing with the context manager, it can be as I stated:

with sql.reader() as session:

or we can even have that accept the “context”:

with sql.reader(context) as session:

The latter again avoids having to use thread locals.

But can I get a feel for how comfortable we are with using thread local storage 
to implement this feature?   I had anticipated people wouldn’t like it because 
it’s kind of a “global” object, even though it will be hidden behind this 
facade (of course CONF is global as is sys.modules, and I think it is fine).
 If I just use a tlocal, this whole thing is pretty simple.

I’m going to do another pass that attempts to unify these three syntaxes - I’m 
proposing calling the context manager “using_” so that it can be differentiated 
from the decorator (e.g. so each function doesn’t need to inspect arguments):

@sql.reader
def my_api_method(context, …):
    context.session

def my_api_method(context, …):
    with sql.using_reader(context) as session:
        session, context.session

def my_api_method(…):
    with sql.using_reader() as session:
        session

all three will be fully interchangeable - meaning they will ultimately use 
thread local storage in any case.   For now I think if 
sql.using_reader(context) or @sql.reader is called with different context 
identities in a single call stack, it should raise an exception - not that we 
can’t support that, but whether it means we should push new state onto the 
“stack” or not is ambiguous at the moment so we should refuse to guess.
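
For anyone who wants to see roughly what I mean, here is a bare-bones sketch of
the context manager form (the SQLite URL and module-level engine are
placeholders; the real version would sit on top of oslo.db's facade and would
also do the "refuse to guess" check described above):

import contextlib
import threading

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

_engine = create_engine('sqlite://')
_maker = sessionmaker(bind=_engine)
_local = threading.local()


@contextlib.contextmanager
def using_reader(context=None):
    # fall back to a thread-local "context" when none is passed explicitly;
    # this is the thread local storage question raised earlier
    target = context if context is not None else _local
    session = getattr(target, 'session', None)
    outermost = session is None
    if outermost:
        session = _maker()
        target.session = session
    try:
        yield session
    finally:
        if outermost:
            session.rollback()   # reader block; nothing to commit
            session.close()
            target.session = None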




On Oct 8, 2014, at 5:07 PM, Mike Bayer  wrote:

> Hi all -
> 
> I’ve drafted up my next brilliant idea for how to get Openstack projects to 
> use SQLAlchemy more effectively.   The proposal here establishes significant 
> detail on what’s wrong with the current state of things, e.g. the way I see 
> EngineFacade, get_session() and get_engine() being used, and proposes a new 
> system that provides a true facade around a managed context.   But of course, 
> it requires that you all change your code!  (a little bit).  Based on just a 
> few tiny conversations on IRC so far, seems like this might be a hard sell.  
> But please note, most projects are pretty much broken in how they use the 
> database - this proposal is just a first step to making it all non-broken, if 
> not as amazing and cool as some people wish it could be.  Unbreaking the code 
> and removing boilerplate is the first step - with sane and declarative 
> patterns in place, we can then build in more bells and whistles.
> 
> Hoping you’re all super curious now, here it is!  Jump in:  
> https://review.openstack.org/#/c/125181/
> 
> - mike
> 
> 
> 
> 
> 
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] [oslo] Proposed database connectivity patterns

2014-10-10 Thread Mike Bayer

On Oct 10, 2014, at 6:13 AM, Ihar Hrachyshka  wrote:

> Signed PGP part
> On 09/10/14 21:29, Mike Bayer wrote:
> > So so far, everyone seems really positive and psyched about the
> > proposal.
> >
> > It looks like providing some options for how to use would be best,
> > that is provide decorators and context managers.
> >
> > Now the thing with the context manager, it can be as I stated:
> >
> > with sql.reader() as session:
> >
> > or we can even have that accept the “context”:
> >
> > with sql.reader(context) as session:
> >
> > The latter again avoids having to use thread locals.
> >
> > But can I get a feel for how comfortable we are with using thread
> > local storage to implement this feature?   I had anticipated people
> > wouldn’t like it because it’s kind of a “global” object, even
> > though it will be hidden behind this facade (of course CONF is
> > global as is sys.modules, and I think it is fine). If I just
> > use a tlocal, this whole thing is pretty simple.
> 
> Won't the approach conflict with eventlet consumers that for some
> reason do not patch thread module, or do not patch it early enough? I
> guess in that case we may end up with mixed contexts.

I’ve been asking a lot about “hey are people cool with thread locals?” and have 
been waiting for what the concerns are.

Since I wrote that email I’ve shifted, and I’ve been considering only:

@sql.reader
def my_api_method(context, …):
    context.session

def my_api_method(context, …):
    with sql.using_reader(context) as session:
        session, context.session

because in fact, if you *want* to use a thread local context, you can, 
explicitly with the above:

GLOBAL_CONTEXT = threading.local()

def my_api_method(…):
    with sql.using_reader(GLOBAL_CONTEXT) as session:
        session

I like that one the best.  But again, Keystone folks would need to accept this 
explicitness.

The challenge on my end is not technical in any way.  It’s getting every 
project to agree on a single approach and not getting bogged down with 
idealistics (like, “let’s build a dependency injection framework!”).   Because 
this “everyone does it their own way” thing is crazy and has to stop.





___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] [oslo] Proposed database connectivity patterns

2014-10-10 Thread Mike Bayer

On Oct 10, 2014, at 11:41 AM, Mike Bayer  wrote:

> I’ve been asking a lot about “hey are people cool with thread locals?” and 
> have been waiting for what the concerns are.
> 
> Since I wrote that email I’ve shifted, and I’ve been considering only:
> 
> @sql.reader
> def my_api_method(context, …):
>     context.session
> 
> def my_api_method(context, …):
>     with sql.using_reader(context) as session:
>         session, context.session
> 
> because in fact, if you *want* to use a thread local context, you can, 
> explicitly with the above:
> 
> GLOBAL_CONTEXT = threading.local()
> 
> def my_api_method(…):
>     with sql.using_reader(GLOBAL_CONTEXT) as session:
>         session
> 
> I like that one the best.  But again, Keystone folks would need to accept 
> this explicitness.
> 
> The challenge on my end is not technical in any way.  It’s getting every 
> project to agree on a single approach and not getting bogged down with 
> idealistics (like, “let’s build a dependency injection framework!”).
> Because this “everyone does it their own way” thing is crazy and has to stop.

I’ve now pushed these changes, as well as a summation of all the alternatives 
so far, to the latest release.  See https://review.openstack.org/#/c/125181/.



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [neutron] [oslo.db] model_query() future and neutron specifics

2014-10-20 Thread Mike Bayer
As I’ve established oslo.db blueprints which will roll out new SQLAlchemy 
connectivity patterns for consuming applications within both API [1] and tests 
[2], one of the next big areas I’m to focus on is that of querying.   If one 
looks at how SQLAlchemy ORM queries are composed across Openstack, the most 
prominent feature one finds is the prevalent use of the model_query() 
initiation function.This is a function that is implemented in a specific 
way for each consuming application; its purpose is to act as a factory for new 
Query objects, starting from the point of acquiring a Session, starting up the 
Query against a selected model, and then augmenting that Query right off with 
criteria derived from the given application context, typically oriented around 
the widespread use of so-called “soft-delete” columns, as well as a few other 
fixed criteria.
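
As a rough composite of the pattern (soft-delete and context conventions vary
from project to project; this is a generic sketch, not any one project's code):

def model_query(context, model, session, read_deleted='no'):
    """Start a Query for the given model, with the usual fixed criteria."""
    query = session.query(model)

    # the ubiquitous soft-delete filtering
    if read_deleted == 'no':
        query = query.filter_by(deleted=False)
    elif read_deleted == 'only':
        query = query.filter_by(deleted=True)
    # read_deleted == 'yes' applies no soft-delete criteria at all

    # plus criteria derived from the application context
    if getattr(context, 'project_id', None) and \
            not getattr(context, 'is_admin', False):
        query = query.filter_by(project_id=context.project_id)

    return query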

There’s a few issues with model_query() that I will be looking to solve, 
starting with the proposal of a new blueprint.   Key issues include that it 
will need some changes to interact with my new connectivity specification, it 
may need a big change in how it is invoked in order to work with some new 
querying features I also plan on proposing at some point (see 
https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Baked_Queries), and 
also its current form in some cases tends to slightly discourage the 
construction of appropriate queries.

In order to propose a new system for model_query(), I have to do a survey of 
how this function is implemented and used across projects.  Which is why we 
find me talking about Neutron today - Neutron’s model_query() system is a much 
more significant construct compared to that of all other projects.   It is 
interesting because it makes clear some use cases that SQLAlchemy may very well 
be able to help with.  It also seems to me that in its current form it leads to 
SQL queries that are poorly formed - as I see this, on one hand we can blame 
the structure of neutron’s model_query() for how this occurs, but on the other, 
we can blame SQLAlchemy for not providing more tools oriented towards what 
Neutron is trying to do.   The use case Neutron has here is very common 
throughout many Python applications, but as yet I’ve not had the opportunity to 
address this kind of pattern in a comprehensive way.   

I first sketched out my concerns on a Neutron issue 
https://bugs.launchpad.net/neutron/+bug/1380823, however I was encouraged to 
move it over to the mailing list.

Specifically with Neutron’s model_query(), we're talking here about the plugin 
architecture in neutron/db/common_db_mixin.py, where the 
register_model_query_hook() method presents a way of applying modifiers to 
queries. This system appears to be used by: db/external_net_db.py, 
plugins/ml2/plugin.py, db/portbindings_db.py, 
plugins/metaplugin/meta_neutron_plugin.py.

What these uses of the hook have in common is that a LEFT OUTER JOIN 
is applied to the Query early on, in anticipation of either the filter_hook or 
result_filters being applied to the query, but only *possibly*, and then even 
within those hooks as supplied, again only *possibly*. It's these two 
"*possiblies*" that leads to the use of LEFT OUTER JOIN - this extra table is 
present in the query's FROM clause, but if we decide we don't need to filter on 
it, the idea is that it's just a left outer join, which will not change the 
primary result if not added to what’s being filtered. And even, in the case of 
external_net_db.py, maybe we even add a criterion "WHERE <join column> IS 
NULL", that is doing a "not contains" off of this left outer join.

The result is that we can get a query like this:

SELECT a.* FROM a LEFT OUTER JOIN b ON a.id=b.aid WHERE b.id IS NOT NULL

this can happen for example if using External_net_db_mixin, the outerjoin to 
ExternalNetwork is created, _network_filter_hook applies 
"expr.or_(ExternalNetwork.network_id != expr.null())", and that's it.

The database will usually have a much easier time if this query is expressed 
correctly [3]:

   SELECT a.* FROM a INNER JOIN b ON a.id=b.aid

the reason this bugs me is because the SQL output is being compromised as a 
result of how the plugin system is organized. Preferable would be a system 
where the plugins are either organized into fewer functions that perform all 
the checking at once, or if the plugin system had more granularity to know that 
it needs to apply an optional JOIN or not.   My thoughts for new 
SQLAlchemy/oslo.db features are being driven largely by Neutron’s use case here.
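
In SQLAlchemy terms, the difference is roughly this (A and B below stand in for
the network and external-network models; names are made up):

from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)


class B(Base):
    __tablename__ = 'b'
    id = Column(Integer, primary_key=True)
    aid = Column(Integer, ForeignKey('a.id'))


def hook_style_query(session):
    # what the hooks end up producing: LEFT OUTER JOIN plus a non-NULL test
    return (session.query(A)
            .outerjoin(B, A.id == B.aid)
            .filter(B.id != None))  # noqa


def direct_query(session):
    # the semantically equivalent and usually cheaper form
    return session.query(A).join(B, A.id == B.aid)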

Towards my goal of proposing a better system of model_query(), along with 
Neutron’s heavy use of generically added criteria, I’ve put some thoughts down 
on a new SQLAlchemy feature which would also be backported to oslo.db. The 
initial sketch is at 
https://bitbucket.org/zzzeek/sqlalchemy/issue/3225/query-heuristic-inspector-event,
 and the main idea is that Query would include a system by which we can ask 
que

[openstack-dev] [oslo.db] Add long-lived-transactionalized-db-fixtures - can we comment ?

2014-10-21 Thread Mike Bayer
Hi all -

Thanks for all the responses I got on my "Make EngineFacade a Facade” spec - 
plenty of people have commented, pretty much all positively so I’m pretty 
confident we can start building the basic idea of that out into a new review. 

I want to point out that there is another, closely related spec that has been 
around several weeks longer, which is to overhaul the capabilities of our test 
runner system: "Add long-lived-transactionalized-db-fixtures” - 
https://review.openstack.org/#/c/117335/.I’ve talked about this spec 
several times before on this list, and it is still out there, and I 
additionally have most of the implementation working for several weeks now.
The spec and implementation has been mostly twisting in the wind, partially due 
to a little bit of waiting for some possible changes to namespace packages and 
test invocation, however I’d like to reiterate that A. the whole series works 
right now independently of those changes 
(https://review.openstack.org/#/c/117335/, 
https://review.openstack.org/#/c/120870/), and B. the spec describing the 
system has hardly been +1/-1’ed by anyone in any case (comments positive or 
negative are appreciated!   I redid the whole thing in response to previous 
comments many weeks ago).

Just to try to pitch this series, yet again, here’s what we get:

1. a solid and extensible system of building up and tearing down databases 
(provisioning), as well as a DROP ALL of database objects, in a 
database-specific way (e.g. drops special objects like PG ENUM objects and 
such).   This is completed, works right now.

2. the ability to produce “transactionalized” test fixtures, where you can run 
as many tests as you want against a single schema that remains in place, each 
test instead has all of its changes rolled back inside of a transaction.  In 
particular this will make it lots easier for large test suites like Nova’s DB 
api suite to run against many databases efficiently, as it won’t have to drop 
and rebuild the whole schema for each test.This mechanism is completed, 
works right now, as soon as it’s merged I can start doing a proof of concept 
for Nova’s test_db_api.py.

3. the ability to run non-transactionalized tests like we do now, which going 
forward would remain appropriate at least for migration tests, on a fixed 
database-per-subprocess, emitting an unconditional DROP of all objects 
remaining in the schema at the end of each test without actually dropping the 
whole database.   Completed and works right now, I’ve done a test against 
Neutron’s migration tests and the optimising suite system works.

4. An overhaul to how connectivity for multiple databases is set up, e.g. 
Postgresql, MySQL, others.   The usual system of “opportunistic” looking around 
for backends remains unchanged, but you can affect the specific URLs that will 
be queried, as well as limit the test run to any particular database URL using 
an environment variable. The system also supports other databases besides the 
three of SQLite, PG, and MySQL now whereas it had some issues before which 
would prevent that.   Completed and works right now!

5. The ability to have a single test suite run automatically for any number of 
backends, including future backends that might not be added to oslo.db yet, 
replacing the current system of subclassing MySQLOpportunisticTest and 
PostgresqlOpportunisticTest.   Not completed!   But is fairly trivial.

6. Once the EngineFacade overhaul is in place, the two systems will integrate 
together so that it will be very simple for projects whose test suite currently 
runs off of CONF + sqlite will be able to use the new system just by dropping 
in a mixin class.

The four “pillars” I’m trying to get through, hopefully by the end of Kilo are: 
  1. application connectivity and transaction control 2. test connectivity and 
transaction 3. query modernization and 4. getting ready for Alembic (where I 
add SQLite support and multiple branch support). We definitely need #1 and 
#2, so I’d like to get continued feedback on #2 so I can point that in the correct 
direction.

thanks all for your support!
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] [oslo.db] model_query() future and neutron specifics

2014-10-25 Thread Mike Bayer

> On Oct 23, 2014, at 11:27 AM, Kyle Mestery  wrote:
> 
> Mike, first, thanks for sending out this detailed analysis. I'm hoping
> that some of the DB experts from the Neutron side have read this.
> Would it make sense to add this to our weekly meeting [1] for next
> week and discuss it during there? At least we could give it some
> airtime. I'm also wondering if it makes sense to grab some time in
> Paris on Friday to discuss in person. Let me know your thoughts.
> 
> Thanks,
> Kyle
> 
> [1] https://wiki.openstack.org/wiki/Network/Meetings
> 

hey Kyle -

both good ideas though I’m missing the summit this year due to a new addition 
to our family, and overall not around too much the next couple of weeks.I 
think I’ll be able to circle back to this issue more fully after the summit.

- mike



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [oslo][all] Alembic migrations for SQLite

2014-11-10 Thread Mike Bayer
Bonjour openstackers -

While you were all sipping champagne on the Champs-Élysées, I took some time to 
tackle one of the two most critically wanted features in Alembic, which is that 
of being able to migrate tables on a SQLite database with some degree of 
sanity.   My immediate focus on Alembic was spurred on partially because some 
changes to the test suite pushed it into 0.7.0, and then we got a very large 
number of bug fixes in, so the urgency to get 0.7.0 out is relatively high; but 
what good is a pseudo-major point release without some big new features ?

The feature works similarly to that of SQLAlchemy-Migrate, but I’m hoping in a 
way that is more controllable and open-ended.   I would encourage all those 
interested to take a look at the basic mode of operation over at 
http://alembic.readthedocs.org/en/latest/tutorial.html#running-batch-migrations-for-sqlite-and-other-databases.
   Highlights include that several table operations can take place within one 
“move and copy” operation at once, and the system can also be applied to other 
databases if one so desired (not a common use case but some have expressed 
interest in this being possible…so it is!).   The format of a SQLite-compatible 
migration script will change slightly, though for the better as per-table 
operations are grouped together, and the scripts of course in the default case 
continue to work exactly as before on all other target databases.
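
A minimal sketch of what a batch-mode script body looks like (table and column
names below are made up for illustration):

from alembic import op
import sqlalchemy as sa


def upgrade():
    # on SQLite this recreates the table with the new definition and copies
    # the rows across; on other backends it falls through to plain ALTER TABLE
    with op.batch_alter_table('instances') as batch_op:
        batch_op.add_column(sa.Column('availability_zone', sa.String(255)))
        batch_op.drop_column('old_flag')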

I know that a handful of projects have moved or started with Alembic, and I’d 
like to continue pushing Alembic to be the single solution across all projects. 
 There’s some work in oslo.db to define a common environmental format as well 
(see https://review.openstack.org/#/c/112842/).  I would encourage projects to 
continue to evaluate moving their migrations over to Alembic at some point in 
the future, which should also include sending me emails/ircs/bug reports about 
what additional features/fixes are needed.

The next major feature for Alembic, which I will tentatively use this week to 
see if I can get it online, is the multiple heads/branch resolution feature 
(https://bitbucket.org/zzzeek/alembic/issue/167/multiple-heads-branch-resolution-support)
 which a *lot* of people are really asking for.   This feature would allow 
independent migration series to coexist simultaneously, as well “merge point” 
migrations that join disparate branches back into a single stream.   The risk 
level for this feature is significantly higher than that of the SQLite 
migration feature, as while the SQLite migration feature lives entirely within 
a new API that is otherwise unused, the multiple branch feature makes some 
fundamental changes to how versioning is performed.   So while I’d like to get 
this in 0.7.0 as well, if it gets into the weeds I may have to push 0.7.0 
without it as there’s really a crapload of other fixes to be pushed.

Thanks for reading!

- mike




___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Consistency, efficiency, and safety of NovaObject.save()

2014-11-12 Thread Mike Bayer

> On Nov 12, 2014, at 12:45 PM, Dan Smith  wrote:
> 
>> I personally favour having consistent behaviour across the board. How
>> about updating them all to auto-refresh by default for consistency,
>> but adding an additional option to save() to disable it for particular
>> calls?
> 
> I think these should be two patches: one to make them all auto-refresh,
> and another to make it conditional. That serves the purpose of (a)
> bisecting a regression to one or the other, and (b) we can bikeshed on
> the interface and appropriateness of the don't-refresh flag :)
> 
>> I also suggest a tactical fix to any object which fetches itself twice
>> on update (e.g. Aggregate).
> 
> I don't see that being anything other than an obvious win, unless there
> is some obscure reason for it. But yeah, seems like a good thing to do.

lets keep in mind my everyone-likes-it-so-far proposal for reader() and 
writer(): https://review.openstack.org/#/c/125181/   (this is where it’s going 
to go as nobody has -1’ed it, so in absence of any “no way!” votes I have to 
assume this is what we’re going with).

in this system, the span of session use is implicit within the context and/or 
decorator, and when writer() is specified, a commit() can be implicit as well.  
IMHO there should be no “.save()” at all, at least as far as database writing 
is concerned. SQLAlchemy doesn’t need boilerplate like that - just let the 
ORM work normally:

@sql.writer
def some_other_api_method(context):
    someobject = context.session.query(SomeObject)….one()
    someobject.change_some_state()

    # done!

if you want an explicit refresh, then just do so:

@sql.writer
def some_other_api_method(context):
    someobject = context.session.query(SomeObject)….one()
    someobject.change_some_state()

    context.session.flush()
    context.session.refresh(someobject)
    # do something with someobject

however, seeing as this is all one API method the only reason you’d want to 
refresh() is that you think something has happened between that flush() and the 
refresh() that would actually show up, I can’t imagine what that would be 
looking for, unless maybe some large amount of operations took up a lot of time 
between the flush() and the refresh().



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Consistency, efficiency, and safety of NovaObject.save()

2014-11-12 Thread Mike Bayer

> On Nov 12, 2014, at 10:56 AM, Matthew Booth  wrote:
> 
> For brevity, I have conflated what happens in object.save() with what
> happens in db.api. Where the code lives isn't relevant here: I'm only
> looking at what happens.
> 
> Specifically, the following objects refresh themselves on save:
> 
> Aggregate
> BlockDeviceMapping
> ComputeNode

> Excluding irrelevant complexity, the general model for objects which
> refresh on update is:
> 
> object = <select from the db>
> object.update()
> object.save()
> return <select from the db>
> 
> Some objects skip out the second select and return the freshly saved
> object. That is, a save involves an update + either 1 or 2 selects.

If I may inquire as to the irrelevant complexity, I’m trying to pinpoint where 
you see this happening.

When we talk about updating a ComputeNode, because I’m only slightly familiar 
with Nova’s codebase, I assume we are looking at “def compute_node_update()” on 
line 633 of nova/db/sqlalchemy/api.py ?

if that’s the full extent of it, I’m not seeing the second select:

def compute_node_update(context, compute_id, values):
    """Updates the ComputeNode record with the most recent data."""

    session = get_session()
    with session.begin():
        compute_ref = _compute_node_get(context, compute_id,
                                        session=session)
        values['updated_at'] = timeutils.utcnow()
        datetime_keys = ('created_at', 'deleted_at', 'updated_at')
        convert_objects_related_datetimes(values, *datetime_keys)
        compute_ref.update(values)

    return compute_ref

so “with session.begin()”, when that context ends, will emit the flush of the 
compute_ref, and then commit the transaction.  The Session by default has a 
behavior “expire_on_commit”, which means that when this compute_ref is returned 
to the outside world, the first thing that accesses anything on it *will* emit 
a SELECT for the row again.  However, as far as I can tell the expire_on_commit 
flag is turned off.   get_session() returns from oslo.db’s EngineFacade (the 
subject of my previously mentioned blueprint), and that passes through 
“expire_on_commit” of False by default.  It is definitely False when oslo.db 
does the sessionmaker and I see no code that is setting it to True anywhere.
The save() method is not used here either, but even if it is, NovaBase.save() 
calls into ModelBase.save(), which just calls a flush(), so it shouldn’t be emitting a 
SELECT either.
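
A quick way to see the expire_on_commit behavior in isolation (throwaway
in-memory model, names made up):

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class ComputeNode(Base):
    __tablename__ = 'compute_nodes'
    id = Column(Integer, primary_key=True)
    state = Column(String(50))


engine = create_engine('sqlite://', echo=True)
Base.metadata.create_all(engine)

# expire_on_commit=False is what oslo.db's EngineFacade passes; with it, the
# attribute access after commit() does NOT emit a new SELECT.  Flip it to
# True (the SQLAlchemy default) and the SELECT shows up in the echo output.
Session = sessionmaker(bind=engine, expire_on_commit=False)
session = Session()

node = ComputeNode(state='up')
session.add(node)
session.commit()
print(node.state)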

Let me know if a. I’m looking in the right place, b. if this second SELECT is 
actually observed; if it’s occurring I’d like to understand better what we’re 
looking at.




___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

