On Sat, 04 Dec 2010 16:42:36 -0600, Jorge Biquez wrote:

> Hello all.
>
> Newbie question. Sorry.
>
> As part of my process to learn python I am working on two personal
> applications. Both will do fine with a simple structure of data
> stored in files. I know there are lots of databases around I can use but I
> would like to know your advice on what other options you would consider
> for the job (it is training so no pressure on performance). One
> application will run as a desktop one, under Windows, Linux, Macintosh,
> being able to update data, not much, not complex, not many records. The
> second application, running behind web pages, will do the same, I mean,
> process simple data, updating and showing data. Not much info, not complex.
> As an exercise it is more than enough I guess and will let me learn
> what I need for now. Talking with a friend about what he would do (he uses
> C only), he suggested taking a look at the dBase file format since it is a
> stable format, fast, and the index structure will be fine, or maybe going
> with the BDB (Berkeley DB) database file format (I hope I understood this one
> correctly). Plain files are not an option since I would like to have the
> option to do rapid searches.
>
> What would you suggest I take a look at? If possible, available under
> the 3 platforms.
>
> Thanks in advance for your comments.
>
> Jorge Biquez

Well, two NoSQL databases that I have some experience with are MongoDB and CouchDB. The choice between them depends on your application.

CouchDB is extremely simple to set up and is all about the web interface. As a matter of fact, it communicates with the outside world over HTTP, returning JSON objects, and you can configure it using curl. It is also extremely fast, but it doesn't allow you to run ad hoc queries. You have to create something called a "view", which is akin to what people in the RDBMS world call a "materialized view". Views are created by running a JavaScript function on every document in the database. The results are stored in a B-tree index and then updated as documents are inserted, updated or deleted. CouchDB is completely schema free: there are no tables, collections or "shards". The primary language for programming Couch is JavaScript.

JavaScript is also the primary language for MongoDB, which is equally fast but does allow ad hoc queries and offers quite a few ways to run them. It allows you to do the same kind of querying as RDBMS software, with the exception of joins. No joins. It also allows map/reduce queries written in JavaScript, and it is not completely schema free. Databases have sub-objects called "collections", which can be indexed or partitioned across several machines ("sharding"), an excellent thing for building shared-nothing clusters. Collections can also be aggregated using JavaScript and map/reduce in the style popularized by Google. Scripting languages like Python are very well supported through native drivers that talk to MongoDB directly, which tends to be faster than communicating over HTTP. I find MongoDB well suited for what is traditionally known as data warehousing.

Of course, traditional RDBMS specimens like MySQL, PostgreSQL, Firebird, Oracle, MS SQL Server or DB2 still reign supreme, and most of the MVC frameworks like Django or TurboGears are made for RDBMS schemas: they can read things like the primary or foreign keys and build that into the application.
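
To make the "view" idea a bit more concrete, here is a rough sketch in plain Python of how such a materialized view behaves: a map function runs over each document, the emitted rows are kept sorted by key, and only the rows of a changed document are refreshed. This is a toy under stated assumptions, not CouchDB's actual API: real CouchDB views are JavaScript functions stored on the server, and the names ViewSketch and by_city are made up for illustration. A sorted list with binary search stands in for the B-tree.

```python
import bisect


class ViewSketch:
    """Toy stand-in for a CouchDB-style view (hypothetical, not a real API)."""

    def __init__(self, map_fn):
        self.map_fn = map_fn  # doc -> iterable of (key, value) pairs
        self.rows = []        # sorted list of (key, doc_id, value) tuples

    def save(self, doc_id, doc):
        """Insert or update a document, refreshing only that document's rows."""
        self.delete(doc_id)
        for key, value in self.map_fn(doc):
            bisect.insort(self.rows, (key, doc_id, value))

    def delete(self, doc_id):
        """Drop every row that was emitted for this document."""
        self.rows = [row for row in self.rows if row[1] != doc_id]

    def query(self, key):
        """Return the values emitted under `key`, found by binary search."""
        start = bisect.bisect_left(self.rows, (key,))
        values = []
        for row_key, _doc_id, value in self.rows[start:]:
            if row_key != key:
                break
            values.append(value)
        return values


def by_city(doc):
    """Example map function: emit the person's name under their city."""
    if "city" in doc:
        yield doc["city"], doc["name"]


view = ViewSketch(by_city)
view.save("1", {"name": "Ana", "city": "Lima"})
view.save("2", {"name": "Bob", "city": "Oslo"})
view.save("3", {"name": "Cy", "city": "Lima"})
lima_people = view.query("Lima")
```

The point of the design is that a lookup never scans the documents themselves: the work is done once, when a document is saved, which is why Couch is fast on views but cannot answer a question you did not define a view for.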

In short, there is no universal answer to your question. If price is a consideration, Couch, Mongo, MySQL, PostgreSQL, Firebird and SQLite 3 all cost about the same: $0. You will have to learn significantly less to get started with a NoSQL database, but if you need to create a serious application fast, an RDBMS is still the right answer. You may want to look at this YouTube clip entitled "MongoDB is web scale": http://www.youtube.com/watch?v=b2F-DItXtZs

-- 
I don't think, therefore I am not.
-- 
http://mail.python.org/mailman/listinfo/python-list