Is there any chance that the PostgreSQL developers would accept a new cluster wide configuration option to control how the system handles symbol case folding?
Currently PostgreSQL folds all un-quoted symbols to lower case. We would like to add a global configuration option with a name like symbol_casefold with settings to allow folding symbols to lc - lower case (default) uc - upper case none - no case folding USE CASE 1 The option we actually want for our own use is "none". Currently we have several large bodies of code that work with an SQL Anywhere backend. This server preserves case and - by a configuration option - performs case insensitive lookup on all symbols. We are moving to PostgreSQL for internal applications. We are also considering using PostgreSQL for the storage backend in a product. We use camel case for symbol names in all of our application and backend code (largely written in C++, Object Pascal, and Python). In a language like Pascal that ignores case, we still maintain consistent use of case in symbol names for the sake of human readers of the code. Where we make a connection from application code to a storage backend we would like to use exactly the same symbol name - including case - for the symbol in the backend as we use in the corresponding symbol in application code. For example, if we intend to read a value into a variable called FirstName we would like the corresponding field in the database to also be FirstName. The main reason we want this exact match is so that a human reader sees exactly the same thing in the two places. This makes it easier to see the connection between the two bodies of code. However, in places we also automate the connection between client symbols and server symbols, and in such a case it is useful, though certainly not necessary, to have an exact match. While we do not do so at present, where we use a database purely through code, we can likely add double quotes around all symbols in our generated SQL, which means we can have an exact match as we want under PostgreSQL as it currently behaves. However, in many cases we also access the same database backend through third party tools and ad hoc queries. If we double quote all of our symbols, which is what we have tried to do in our initial tests, then all such tools and manually written queries must also double quote all symbols. We use several tools and libraries that do not appear to have any way to properly quote symbols when they are obtained automatically from the schema. So, for example, if our reporting tool retrieves a field called FirstName from the database schema, it happily uses that name in its internally generated SQL. And, on PostgreSQL, of course this fails because FirstName in script is firstname at the server, which is not defined if we have double quoted the field name in the table definition. Since we do not have the code, we cannot work around this in any easy way. For ad hoc queries we can double quote all symbol references. But, this makes the queries noticeably harder for a person to read. And, I do not believe this is just a matter of "getting used to it". The quotes clutter the script, and that clutter makes the script fundamentally harder to understand. Maybe this is small, but anything that makes comprehension harder is a bad thing. Queries can be hard enough to understand without extra syntactic clutter. So, these considerations leave us the option of never quoting symbols when using PostgreSQL. This works everywhere we have tried it. But, it is pretty unattractive from the point of view of looking at the symbol names in the backend if they are to match exactly the symbol names we use in code. Using a different naming convention than we use everywhere else in our code (underscore separated all lower case symbol names, say) is not appealing either. We are very consistent in our symbol name handling in our code, and breaking our conventions in some relatively large section of our code is very unattractive. In addition, in many places our tools and code use schema supplied field names to form column titles in a table or in a caption on an edit box (etc.). The mixed case names are much nicer for this purpose than the folded names. For our purposes, these are our arguments for wanting control of how the server folds case. USE CASE 2 Even though we have no use for it, I have included the option "uc" because, in trying to determine whether PostgreSQL could support our desired behavior, I found a fairly large number of people who are coming from a different backend, such as Oracle (from what I see on the internet - no personal experience), that case fold similarly to PostgreSQL. However, instead of folding to lower case, it appears a number of other database servers fold to upper case. This leaves people who are moving from these other database systems with problems in their own code that they have to patch up to be able to make the port successfully. With an option to case fold to upper case, such users would have an easier time moving to PostgreSQL. IMPLEMENTATION We have not studied the PostgreSQL code enough to know for sure that the implementation of this proposed configuration option is simple. However, an initial review indicates that symbol case folding is handled in a single place in the function downcase_identifier(). If this analysis is correct, and there are no interactions with keyword case insensitivity handling, then the implementation should be quite simple. And, the implementation - since it would be isolated to a small change in a single function - would also be relatively risk free for messing up the default behavior of converting symbols to lower case. Therefore, we would not be likely to corrupt the main use case of the default behavior. The reason we are proposing a cluster wide option, rather than a per database or per schema case folding configuration option, is that the current PostgreSQL symbol case conversion processing does not appear to have any context. Therefore, it appears it would take a quite difficult - and therefore correspondingly risky - restructuring to provide context for the symbol case conversion. Before looking at the code, our original idea was to propose a locale based configuration for controlling symbol case conversions. We could then define a locale with no case conversion for our use. However, looking at downcase_identifier(), it appears that the code performs an ASCII case conversion, and the notes indicate that there is good reason for this. Therefore, we dropped this idea. TIME FRAME It is not completely clear that we gain enough to justify implementing this proposal. At the moment, for example, we have no idea how to build the PostgreSQL code for testing. And, I do not have any understanding of what testing requirements you would have on any patch we might submit. So, I am not completely sure we can dedicate enough time to the work to be able to do the job. However, we are interested in looking at the task seriously enough to determine whether we are up to the implementation. If we were to pursue an implementation, we would likely have something within the next six months or so as we would do it as part of currently active development work to port our systems to PostgreSQL. Before pursuing this, I would like to have some idea whether the PostgreSQL developers would be likely to accept our patch if we implemented it well. We are not interested in basing our systems on our own branch of PostgreSQL. The gain to us is insufficient to justify the maintenance requirements of keeping up with PostgreSQL development. And, we do not want to end up stuck on our own old version of the tool. Therefore, if this is not a proposal that the developers are likely to entertain, we will work with the PostgreSQL system as it stands. Ian Lewis www.mstarlabs.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers