Really interesting thread... Thanks Massimo for the clear explanation of web2py's pros and cons... I also like the lookup-table cache trick...
Richard

On Sat, Dec 29, 2012 at 5:30 PM, Massimo Di Pierro <massimo.dipie...@gmail.com> wrote:

> The problem is that SQLA is, from this perspective, lower level. You can use scripts/extract_pgsql_models.py to introspect the database and "guess" the model, but while this information is sufficient for SQLA, it is not sufficient for web2py. Web2py needs to know more about a column than is known to the database. For example, a varchar field could be a hashed password, or a serialized list of references, etc. Introspection does not help you except in trivial cases.
>
> Massimo
>
> On Saturday, 29 December 2012 15:30:10 UTC-6, Michele Comitini wrote:
>
>> This thread reminds me of a feature I would like to see in the DAL.
>>
>> What the DAL is missing is the introspection feature SQLAlchemy has: legacy db introspection. With the DAL, the developer always needs to write the metadata describing the db schema.
>>
>> I faced that with a fairly complex sqlite schema. I hate writing things that are already there ;-) ... so I *had* to make a simple script that translates the DDL of a sqlite schema into the proper DAL data definition instructions; it can be found under the scripts directory in recent web2py versions. The DAL should have introspection, and eventually build in-memory cached table definitions on the fly. Optionally, the developer could decide to make those table definitions persistent, to reduce start-up times and to allow customizations.
>>
>> mic
>>
>> 2012/12/29 Massimo Di Pierro <massimo....@gmail.com>
>>
>>> To better illustrate my previous point, I built an example which may be useful to others who are concerned about performance:
>>>
>>> https://dl.dropbox.com/u/18065445/Tmp/a1000.zip
>>>
>>> It contains two apps, a1000_1 and a1000_2.
>>>
>>> Both apps define 1000 tables, each with 10 columns (no attributes, no validators, no widgets, no auth). Each has one controller which exposes a grid: http://..../a1000_1/default/index/t00001 gives you the grid for table t00001.
>>>
>>> a1000_1 uses normal web2py models (defined in db.py and db_tables.py) but is bytecode compiled, uses lazy_tables, and disables migrations. On my slow laptop it serves one page in 0.9 seconds. That means it takes less than 0.9 ms to define each lazy table.
>>>
>>> a1000_2 defines all the tables in modules/db_modules.py and imports db in models/db.py. Now it takes 80 ms/req, and more than 50% of that is due to the complex template and the cookie/session/language logic. Basically, there is no overhead in creating 1000 tables.
>>>
>>> While web2py defaults to a1000_1, other frameworks default to a1000_2, but web2py can do that too!
>>>
>>> What are the drawbacks of the second approach? There are two, and they are not web2py specific.
>>>
>>> 1) The web server occasionally creates new processes, and these need to reload the module. That takes time, and some users will experience the slowdown.
>>>
>>> 2) The db object is now shared by all requests (not the connection; they still get different db sessions/transactions). This means you cannot alter the attributes of models (db.t00001.f001.readable=False) because that would affect all concurrent and subsequent requests.
>>>
>>> Web2py makes you pay a price in order to have a new, clean db object at every request. You pay this price so that you do not have to worry about changing its state and affecting other requests. It costs ~1 ms/table on my laptop.
>>>
>>> With conditional models this is a non-issue, since you only pay the price for the tables you actually use in the action.
>>>
>>> Hope I am making some sense.
>>>
>>> Massimo
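A minimal sketch of the second (a1000_2-style) approach described above, for readers who want to try it. The connection string, the import path, and the loop bounds are assumptions for illustration; only the pattern (define tables once at import time in a module, reuse the shared db object, reconnect per request) comes from the thread:

    # modules/db_modules.py -- executed once per process, when first
    # imported, instead of at every request like files under models/.
    from gluon import DAL, Field  # import path is an assumption

    db = DAL('sqlite://storage.sqlite', migrate=False)

    # Define many tables once, mirroring the a1000_2 demo:
    # t00001 ... t01000, each with 10 string columns.
    for i in range(1, 1001):
        db.define_table('t%05d' % i,
                        *[Field('f%03d' % j) for j in range(1, 11)])

and then:

    # models/db.py -- still executed at every request, but it only
    # reuses the shared db object and opens a fresh connection and
    # transaction for the current request.
    from db_modules import db
    db.reconnect()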
>>> On Saturday, 29 December 2012 11:30:19 UTC-6, Massimo Di Pierro wrote:
>>>
>>>> TL;DR: if you can do it with other frameworks, you can do it with web2py, although the default behavior may be different.
>>>>
>>>> There are some important differences between the web2py DAL and SQLAlchemy: 1) SQLAlchemy has better support for non-integer primary keys (the DAL supports them too, but they make automatic form handling very difficult, so we want to discourage them); 2) SQLAlchemy allows multiple database sessions per thread over the same connection (in web2py, if you want multiple sessions in the same thread, you need multiple connections; I have never seen a use case for this); 3) frameworks which use SQLAlchemy define tables only once, when the program starts, while web2py defines tables at every request. Both are equally reliable and fast.
>>>>
>>>> Point 3) causes a performance penalty for web2py compared with other frameworks when a large number of tables is present. Yet this is not a DAL limitation; it is a consequence of how web2py chooses to use the DAL. We want different HTTP requests to be able to see changes in the schema. That is part of the price of the easy-to-use approach. Using conditional models goes a long way toward reducing the overhead. If you use the DAL from a Python script or from a background process of another framework (for example Tornado), you do not have the overhead problem at all. Even in web2py there are ways to avoid it completely, although they are not documented yet (define the tables at top level in a module, import db from the module, then call db.reconnect() in models).
>>>>
>>>> Let me put it another way. In web2py you can put logic in modules or in models. They are two types of files (one is imported, the other is executed). The more logic you put in models, the more agile it is. The more logic you put in modules, the faster it is. Other frameworks do not make this distinction (models are modules and are imported too) and therefore give you no option.
>>>>
>>>> Web2py can definitely be used to build a complex system with many tables. As with any framework, the bottleneck will always be database access, so you need lots of caching. Whether it is the right system to interface with an existing database, I cannot say; that depends on the existing database, and you will have issues with any framework you choose.
>>>>
>>>> Massimo
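The conditional models mentioned above are plain model files placed in subfolders of models/ named after a controller (and optionally an action); web2py executes them only for matching requests, so other actions never pay for those table definitions. A minimal sketch, with made-up table and field names (lazy_tables is the same flag the a1000_1 demo uses):

    # models/db.py -- always executed; with lazy_tables=True the table
    # metadata is recorded but each Table object is only built on
    # first access.
    db = DAL('sqlite://storage.sqlite', lazy_tables=True)

    # models/invoices/tables.py -- executed only for requests to the
    # hypothetical "invoices" controller (a conditional model).
    db.define_table('invoice',
                    Field('number'),
                    Field('amount', 'double'))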
>>>> On Saturday, December 29, 2012 11:02:34 AM UTC-6, Alex Glaros wrote:
>>>>
>>>>> Hi Lazaro and Massimo,
>>>>>
>>>>> The Fi$cal project is not my project; I don't know how many tables it has, but it is an example of highly relational financial software, chosen so that web2py members could understand my question clearly. Most projects will be in strict 3rd Normal Form.
>>>>>
>>>>> Let me be more transparent about my goals.
>>>>>
>>>>> I run a nonprofit government improvement organization <http://www.gov-ideas.com/> and part of our mission is to recommend software architecture to government. I am evaluating web2py as a candidate for prototyping applications. I'm very excited that Massimo's vision of lowering the average person's programming entry barrier also works to improve government by enabling rapid prototyping.
>>>>>
>>>>> The next step is to assess how best to migrate prototypes to production. There may be no further steps necessary; or, for large projects like Fi$cal, the consensus may be to move the entire project to something like Python/SQLAlchemy, which would provide the most stability while still allowing some web2py components to be copied for reuse.
>>>>>
>>>>> It is a significant responsibility to make these recommendations, and I need many diverse advisors to select products that reduce risk in government software development.
>>>>>
>>>>> 1. At this time, is there a consensus regarding the top candidates for open-source, business-oriented, highly relational software tools?
>>>>> 2. What would the top 10 look like?
>>>>> 3. What are the reasons each one was selected?
>>>>> 4. At what points, or in which areas, would other products surpass web2py?
>>>>>
>>>>> Any comments would be much appreciated,
>>>>>
>>>>> Alex
>>>>>
>>>>> On Saturday, December 29, 2012 8:47:42 AM UTC-8, Massimo Di Pierro wrote:
>>>>>
>>>>>> I should add that in the system I looked at, almost all tables were de-normalized. Every table storing a key would also store the value corresponding to the code. This is for two reasons: 1) if you have thousands of tables, you cannot join everything all the time; 2) for auditing purposes, it should be possible to change a value in a lookup table without changing the corresponding values in records created before.
>>>>>>
>>>>>> This means that while I needed the lookup tables for the IS_IN_SET() and IS_IN_DB() validators, I never needed more than two joins.
>>>>>>
>>>>>> Again, the whole design seems crazy to many of us, but it has its plus side from an auditing point of view.
>>>>>>
>>>>>> Massimo
>>>>>>
>>>>>> On Friday, 28 December 2012 22:39:51 UTC-6, Massimo Di Pierro wrote:
>>>>>>
>>>>>>> I have some experience with a large PeopleSoft system that claims thousands of tables. It turned out that almost everything I needed was in fewer than 10 tables.
>>>>>>>
>>>>>>> Basically, most of the tables were lookup tables used simply as a key:value store for the possible values of a field, for example a table GENDER {'M':'Male','F':'Female','O':'Other'}.
>>>>>>>
>>>>>>> So 10 tables with 100 columns each, plus one lookup table per column, already give you more than 1,000 tables; scale that up a little and you quickly reach 10,000. I suspect that is your case too.
>>>>>>>
>>>>>>> All systems that claim an insane number of tables belong to this category.
>>>>>>>
>>>>>>> The way to handle it is to load all the lookup tables into cache and use the cache, instead of database access, to convert key<->value. In practice, the values in lookup tables almost never change.
>>>>>>>
>>>>>>> Before you embark on this venture, I suggest the following exercise: make a list of all table names; for each table, list its fields and count its records (more or less).
>>>>>>>
>>>>>>> You will find many tables with fewer than 100 records and fewer than 10 columns, and a few tables with more than 10 columns and more than 100,000 records. You need to find out how many tables belong to one category and how many to the other.
>>>>>>>
>>>>>>> If this is the case, as I suspect, then you can use web2py, but you will need to set up some clever caching to handle the lookup tables. It would be the same with any other framework, since you don't want to join everything all the time or your database will grind to a halt.
>>>>>>>
>>>>>>> It is also possible that I am completely wrong in my assumption.
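A minimal sketch of the lookup-table caching idea described above, written as a web2py model file (so db, cache, Field, and the validators are already in scope). The table and field names (gender, code, descr) and the one-day expiry are made up for illustration:

    # Assume lookup tables like this already exist in the legacy db:
    db.define_table('gender', Field('code'), Field('descr'), migrate=False)

    def lookup(name):
        # Load {code: description} for one lookup table and keep it in
        # RAM; since lookup values almost never change, this trades a
        # join on every request for one select per day.
        return cache.ram('lookup_' + name,
                         lambda: dict((r.code, r.descr)
                                      for r in db().select(db[name].ALL)),
                         time_expire=86400)

    GENDER = lookup('gender')

    # Validate and display against the cached dict instead of joining:
    db.define_table('person',
        Field('name'),
        Field('gender',
              requires=IS_IN_SET(GENDER),
              represent=lambda code, row: GENDER.get(code, code)))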
>>>>>>> In the case of the PeopleSoft system I studied, they were also storing all past versions of each record in the same table as the current record. Basically, every record had two dates (valid_from, valid_until). Current records had valid_until set to 2999-12-31, and records were never modified in place. The process for modifying a record consisted of creating a copy of the current record, editing the copy, setting valid_until=now on the previously current record, and updating all references pointing to the record. Of course, all tables used the same versioning mechanism, which made the update process very slow and cumbersome and all tables unnecessarily large. Yet it simplifies auditing, because you can go back to any moment in time simply by filtering records in a query.
>>>>>>>
>>>>>>> The reason I am explaining all of this is that you will probably have to deal with something like it. The problem is not web2py vs. other frameworks. The problem will be that you need special logic to handle those tables, and that logic is foreign to web2py and to many modern frameworks, which simply assume more modern database design practices.
>>>>>>>
>>>>>>> My suggestion is to start small and see what happens. Find out who your primary target users are, find which tables they need to access, and create a web interface for those tables. You will probably be able to factor the interaction with the database into many small apps.
>>>>>>>
>>>>>>> Massimo
>>>>>>>
>>>>>>> On Friday, 28 December 2012 09:38:25 UTC-6, Alex Glaros wrote:
>>>>>>>
>>>>>>>> Can web2py be used for highly complex relational databases in large fiscal projects? Example: California's Fi$cal project - http://www.fiscal.ca.gov/ - with roughly 10,000 tables and many complex joins.
>>>>>>>>
>>>>>>>> What components of web2py would start to get slow, or not work well, with so many tables?
>>>>>>>>
>>>>>>>> If web2py would instead be better used just to prototype the Fi$cal system, what would be good production-version candidates to migrate to? Pure Python using SQLAlchemy? Java? Anything that would make migration easier, such as Python-based frameworks?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Alex Glaros
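For concreteness, the valid_from/valid_until versioning scheme Massimo describes above might look like the following with the DAL. This is a sketch under assumptions: the account table and its fields are invented, and a real system would also update references and record who made each change, as described in the thread:

    import datetime

    FOREVER = datetime.date(2999, 12, 31)

    db.define_table('account',
        Field('code'),
        Field('balance', 'double'),
        Field('valid_from', 'date', default=datetime.date.today),
        Field('valid_until', 'date', default=FOREVER))

    def as_of(table, when):
        # Point-in-time query: rows exactly as they were on date `when`.
        return db((table.valid_from <= when) &
                  (table.valid_until > when)).select()

    def amend(record, **changes):
        # Never edit in place: retire the current version and insert an
        # edited copy, which becomes the new current record.
        today = datetime.date.today()
        values = dict(record.as_dict(), **changes)
        del values['id']  # let the db assign a new id to the copy
        values.update(valid_from=today, valid_until=FOREVER)
        record.update_record(valid_until=today)
        return db.account.insert(**values)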