Skip to page content or skip to Accesskey List.
Search evolt.org
evolt.org login: or register

Work

Main Page Content

Effectively Implementing SQL Phrasebooks

Rated 2.54 (Ratings: 7) (Add your rating)

Log in to add a comment
(9 comments so far)

Want more?

  • More articles in Code
  • More articles by tag
 
Picture of tag

Scott McCoy

Member info | Full bio

User since: May 10, 2003

Last login: February 12, 2011

Articles written: 1

This article describes effective and portable implementations of SQL Phrasebooks, using formatted constants as opposed to the currently accepted methods which call for XML Configuration files and dynamic general SQL Query generation routines. Formatting is a very old and standardized method of making strings dynamic, and although there are currently numerous implementations that vary depending on language, the formatted strings we will use are completely portable and standardized among all popular languages today.

Why not use XML? Well, there is an argument (taken from The Phrasebook Design Pattern) that mixing languages is bad. My feeling is that mixing SQL and XML, with library specific scalars intended to be parsed by the current application, is hardly better than mixing SQL with Perl, PHP, or any other programming language. So our response is to isolate the SQL. This keeps us from having a statement for any one table more than once, ever. We do this by only using completely constant values where we don't need variations of the same query, and otherwise using strict prototyping. This gives three distinct advantages when writing Phrasebooks as libraries.

The first is the obvious speed advantage, resulting from not having to parse the XML file. The second is that string formatting is extremely standard among today's modern languages, which means our formatted queries are quite portable, and porting the Phrasebooks themselves to different languages can be done with minimal effort.

The third advantage is that it helps reduce repetition. Reducing repetition is the main focus behind Phrasebook design. Repeating yourself in the code is quite possibly the worst thing you can do, especially when your repetition is large chunks of an alternate language within your program. Why worry about repetition? Well, repetition makes maintaining code quite difficult, especially with SQL, since if you change your database, you have to remember to search all your programs that access the specific tables you have changed, for whatever unanticipated reason you changed them, and change the queries contained in them. If you have all of your queries in a single location, and even then only once, this makes this process much less painful.

When I first began discussing this in a public forum, the first question I received was, "Well, then won't we need a bunch of extra sprintfs? I think that is kind of annoying." Of course, my response was "Yes, but you don't have to see them." The way we get around this is by writing our own extremely mild wrappers to whichever DBI we are using. The wrappers are quite simple; for instance, in PHP we simply make a class to wrap around the long, ugly mysql_* functions that PHP's mysql DBI provides:

/* Simple DBI Wrapper */
class EasyMySQL {
var $Result;
var $Link;

function EasyMySQL ($User,$Pass,$Data,$Host = "localhost") {
    $this->Link = mysql_connect($Host,$User,$Pass);
    mysql_select_db($Data,$this->Link);
}
function rqueryf () {
    $args = func_get_args();
    return mysql_fetch_object(mysql_query
        ( call_user_func_array( 'sprintf', $args ), $this->Link));
}
function queryf () {
    $args = func_get_args();
    return mysql_query(call_user_func_array( 'sprintf', $args ));
}
function rquery ($Query) {
    return mysql_fetch_object(mysql_query($Query, $this->Link));
}
function query ($Query) {
    return $this->Result = mysql_query($Query, $this->Link);
}
function next ($Handle="") {
    return mysql_fetch_object($Handle ? $Handle : $this->Result);
}

/* Destruction Function [ php has no destructor, call explicitly ] */
function close () {
    mysql_close($this->Link);
    return NULL;
}
}

This specific class may leave you with a few questions. It provides more than just a simple wrapper around sprintf and the query at hand. It also provides slightly enhanced syntax for iterations, formatted and nonformatted functions, and an internal Link and Result reference. I used these because in the particularprogram I wrote this class for, my module API left the global mysql link available to extensions, without interference. Why did we fetch our rows as objects, instead of arrays or associative arrays? That's a matter of personal preference. You can use whichever methods you want.

So, when does the Phrasebook come in? It comes in now. Personally, my favorite way to design Phrasebooks in PHP is as a class that inherits my DBI wrapper. Here is a simple example:

class Phrasebook extends EasyMySQL {
function Q ($Query) {
    switch ($Query):
        case ACCOUNT:
return <<<SQL
SELECT  id,
    name,
    address,
    city,
    state,
    zip
FROM    accounts
WHERE   id = %d
SQL;
        case HITS:
return <<<SQL
SELECT  sum(raw) as raw,
    sum(uni) as uni,
    date
FROM    stats
WHERE   id = %d
AND date >= %d
AND date <= %d
SQL;
    endswitch;
}
}

Here is an example of how we would use this:

/* Was like this... */
mysql_connect($User, $Pass);
mysql_select_db($Database);

$result = mysql_query("SELECT id, name, address, city, state, zip
                    FROM accounts WHERE id = $id");
while ($Data = mysql_fetch_object($result)) {
...
}

/* Now it's like this */
$Database = new Phrasebook ($User, $Pass, $Database);

$Database->queryf($Database->Q("ACCOUNTS"), $id);
while ($Data = $Database->next()) {
...
}

Obviously, the end result is a lot less code, especially for very large queries, and looks much nicer, with all of the advantages discussed earlier.

So what about Perl? Well, Perl actually has specific advantages in Phrasebooks over PHP, mostly, that we can contain our SQL Phrasebook in a package with constant, null prototyped definitions (when possible) and this increases the speed of the Phrasebook access exponentially (not that it isn't already very fast in the example above). One rule to keep in mind with Perl is that if you only use a table once, you should use a null prototype for that query. Why? Well if we're not doing anything dynamic, we don't need to pass parameters. If we use a null prototype, the Perl compiler flags that method as a constant, and compiles it as if it were a single quoted string into our code. So it has absolutely no runtime loss over single quoted, static string formatting. That's pretty impressive, abstracting the code without slowing it down whatsoever.

Here is an example Phrasebook package in Perl that executes the same queries as the above PHP example:

package Phrase;

sub ACCOUNT () { 
<<SQL;
SELECT  id,
    name,
    address,
    city,
    state,
    zip
FROM    accounts
WHERE   id = %d
SQL
}

sub HITS    () {
<<SQL;
SELECT  sum(raw) as raw,
    sum(uni) as uni,
    date
FROM    stats
WHERE   id = %d
AND date >= %d
AND date <= %d
SQL
}

Here is an example of how we would use this:

# What was this
$sth = $dbh->prepare
("SELECT id, name, address, city, state, zip FROM accounts WHERE id = $id");

# Now becomes this
$sth = $dbh->prepare(sprintf(Phrase->ACCOUNT, $id));

This is obviously not only more pleasant to have in your code if you're not currently worrying about the SQL, but also has all the advantages mentioned earlier.

As you can see here, we didn't exercise the database abstraction as in the PHP example mentioned, because the robustness of Perl's DBI doesn't really require it. If you wanted, however, to rid your code of the extra sprintfs, considering that Perl's DBI already offers the other conveniences our PHP wrapper had, write a simple object to inherit a DBI object and call prepare with a sprintf. Here is an extremely simple example:

package FormatDBI;
use DBI;
use Data::Dumper;

@FormatDBI::ISA = qw(DBI);

package FormatDBI::db;
@FormatDBI::db::ISA = qw(DBI::db);

sub preparef {
return $_[0]->prepare(sprintf($_[1], @_[2..$#_]));
}

package FormatDBI::st;
@FormatDBI::st::ISA = qw(DBI::st);

package main;
use strict;
use warnings FATAL => qw( all );

use Phrase;

my $dsn = "DBI:mysql:database=aff;host=localhost";
my $dbh = FormatDBI->connect($dsn, "username", "password");

my ($Data,$sth);

$sth = $dbh->preparef(Phrase->ACCOUNT, 40);
...etc...

Hope you find this useful.

=~ /JA[PAC]?H/

You can find me at http://www.blisted.org

How to write code nobody can maintain...

Submitted by tapir on May 22, 2003 - 05:18.

This is mostly a bad idea, because the next person who looks at your code is going to find his brain twisted in knots. He's going to look at the place where you do the SQL query, wonder what the heck Phrase->ACCOUNT is, and spend a lot of time looking for it.

Later on, when this person wants to add a query in a hurry, he's just going to say "screw this" and just write the query into the code.

If you were using the same queries again and again in different places, then this may have some merit. On the other hand, you can get the same benefit in a much more conventional way by putting commonly used queries in a subroutine.

login or register to post comments

re: bad idea

Submitted by foo bar on May 22, 2003 - 23:39.

hardly - i'd question the competance of someone who would have trouble with the above example.

login or register to post comments

I am an experienced programmer...

Submitted by tapir on May 23, 2003 - 04:21.

For every project I've been in, I've had the experience of working with a less skilled programmer or having a less skilled programmer take over the maintainance of something. For instance, I'd work for some small .com and the guys would think I'm charging too much money and get some people from the third world to take over -- a few months later the project would be a mess and they'd send me the check that they hadn't sent me for a few months and they'd want me to get them out of their mess.

Any scheme like the one above has tradeoffs associated with it. The advantage of the above scheme is that SQL statements are centralized. This could be a boon if you want to port the project from one database to another database, because you'd have all the SQL statements in one place. The flip side, however, is that you're introducing nonlocality and another thing that people have to learn. For instance, a programmer looking at the code snippet

Phrase->ACCOUNT

isn't immediately going to know what it does -- access all of an account, part of an account. You can't even tell if it's a SELECT, UPDATE, INSERT or DELETE. What happens when your database gets more complicated and the complete state of an account is distributed between twenty tables and it takes more than one SELECT to extract all the information out of an account? To understand what the code does, the maintainer has to look at the immediate context and also has to go searching for the place where the Phrasebook is defined. Every single time that the maintainer wants to understand what a SQL query does he has to look in more than one place.

The PHP example does the particularly bad thing of using a Case statement to switch between phrases. This is particularly bad. If you have battle scars from maintaining other people's codes, you should feel violently ill when you see case statements and if-then-else ladders. Imagine that the Phrasebook grows to include 100 phrases -- that subroutine is really starting to get insane. In a case like this, an associative array is a lot more appropriate. In an awful lot of other cases, OO techniques such as polymorphism are a much better way to handle cases. If-then-else and case statements have a way of mutating over time into twisted, unmaintainable forms.

A more conventional, and better, way to accomplish what's being done above is to encapsulate the whole query into subroutines. You could write something like

$account=$db->getAccount($id)
echo "Name = $account->name";

which is simple and clear. A subroutine can not only contain a SQL query, but it can also do all the cursor handling and it can do multiple queries. For instance, there are some things you could write as one SQL statement in Oracle with subselects that you'd need to write as multiple queries in MySQL 4.0. Subroutines can also do locking and concurrency control connected with update statements.

login or register to post comments

Re: Paticularly Bad?

Submitted by tag on May 24, 2003 - 15:29.

Being an expierenced developer you should understand the importance of comments, then. Comments, as expressed in a v ariety of books and papers, should explain not what you are doing but why you are doing it. Explaining why we're doi ng $sth->prepare(Phrasebook->Whatever); can be done very simply and very efficently with the following statemen t

# A library of queries to keep our code clean
use Phrasebook;

In my own defense, I believe this would be *NO* less confusing to a inexperienced programmer as placeholders or other more advanced DBI techniques. If you're really, really worried about the inexperienced being able to read and under stand your code at first glance, your code will be inefficently and ineffectively written. Phrasebooks are by no mea ns a bad idea and are already widely accepted as a development tactic. Personally I feel this paticular method is co nsiderably less confusing than something like Class::Phrasebook::SQL's API to accessing its XML phrasebook libraries. Compare yourself,

$statement = $sql->get("INSERT_INTO_CONFIG_ROW",
{ id => 88,
parent => 77,
level => 5 });

as taken directly from the Synopsis.

As for the PHP example, I can agree with your concern to a certain extent. A 500 line switch can be confusing if it is poorly maintained and especially when mixed with ternary operators and non terminating heredoc ;). The logic behind using a large switch to encapsulate the queries was that with alternative syntax, it clearly stands out as a library and not in fact a part of the program. Also, in all actuality, it would take an extremely sloppy programmer to make a mess of a switch in which each case only makes a single function call. If you would prefer to separate your queries through other means I believe an object cast would be a more proper and more portable way to accomplish this. "Isolating" your queries by placing them in large methods/subroutines along with the database connectivity does not in fact solve the problem but creates another one in its own that we tried to solve. It mixes your queries up in your code. The logic behind entirely segragating our database connectivity from our queries is that it keeps everything in one place and everything easy to get to, which is in fact the ideology behind object oriented programming to begin with. But lets just play along with that switches might take a bit of effort to keep clean and easily maintainable. In this case, I would believe a better and more polymorphic method would be to use the previously mentioned object cast, as follows:

$Phrasebook = (object) array
( "One" => <<<SQL
SELECT this,
      that,
      the,
      other,
      thing
FROM   table
WHERE   field = %s
SQL;
  "Two" => ... etc );

This would result in, quite obviously, your queries being called as object variables. $this->queryf($Phrasebook->Query); Which is much more measured and clean than failing to properly separate your SQL from your PHP, and your queries from your database functions. Especially since aside from the object being named 'db', your example doesn't clearly show "This is a query" or "This is an iteration" or...who knows. Of course, it does not show any example of how you would impliment such a method, either. And lastly, using the same method for your initial query that you use for iteration is strangely unclear of what you're doing, and I would definately not suggest it. Our database object in the example above leaves a clear separation between the initial query and the following iteration, for this reason.

Simply stated, your example doesn't accomplish whats being done above, and actually could be used in the implimentation of your example.

login or register to post comments

Followup...

Submitted by tag on May 24, 2003 - 15:36.

I should also include, I suppose, a polymorphic example of proper use of the object cast. In this, we would make our phrasebook returned as a object from a method of our database class.

class DatabaseJunk {
  function phrases () {
    return (object) array ( ... our phrasebook array ..);
  }
}

I probably should also mention that placeholders were not used in the perl DBI example because they're not portable, you can use a stringformatted phrasebook such as this anywhere from an awk associative array, to a C struct, and was the one of the intents behind the string formatted phrasebook design, consisency is good...so using the same method everywhere in various parts of your application that may use multiple languages, atleast helps the maintainer keep his head on straight since he can expect the same type of implimentation reguardless of language.

login or register to post comments

Object implementation versus object design

Submitted by tupholme on June 3, 2003 - 08:39.

The main value of this article is in pointing up the importance of separating distinct parts of your application, which is a good thing to do in general because it gives you flexibility and ease of maintenance.

What I don't necessarily agree with is HOW it's being done here. In general I don't have too much of a problem with the phrasebook concept, but I agree with tapir in that for more complex queries this purely data-driven approach is too limited. Even if you're only doing retrievals, you may often want to to include or exclude columns, ordering or joins based on users' preferences. Using this approach, you either have to start putting SQL fragments back into your function parameters, or you end up with all-inclusive query templates that deliver worst-case performance every time. The database is still the bottleneck for most applications so it is better to get the code to work on optimising the query, rather than trying to standardise your queries and suffering a speed penalty.

On the subject of objects, what's being talked about here is an object implementation (i.e., creating classes in PHP and Perl), but not an object design. A true object design aims to encapsulate data and behaviour within a class. This is a different kind of separation - keeping your application clustered around real-world objects in the business domain. A true class would be called 'account' or 'customer' rather than 'phrasebook' or 'database'.

With this approach, we're generally trying to keep all our business logic in one place - in the class. The phrasebook works against us. On the other hand, business objects give us a good way to accommodate complexity in our queries (for example, building queries on the fly) without either ending up with one huge IF statement, or ending up with SQL code scattered randomly through our application. The code is located in the definition of the class that deals specifically with that part of the business; it is easy to find and easy to change without impacting the rest of the application.

Which approach you chose really comes down to the complexity of the application and the rate of change in the environment. If you think you will move to a new database, seek separation of your query language from your application code and use a phrasebook. If the business you are working for is itself constantly changing but has a stable technical architecture, use an object oriented approach to make your application logic easier to update.

login or register to post comments

SQL code in your app is a bad thing

Submitted by caderoux on June 10, 2003 - 10:56.

To start with, I think you've got a problem any time you find raw ad-hoc SQL in your app - regardless of whether it's within an abstraction layer. Except for a necessarily ad hoc query (which can be eliminate in almost all cases), we try never to put SQL in the app. A compiled stored procedure is better for performance and maintenance and allows you to optimize db performance at that level without impacting readability at the script level. Hide any optimization and tweaks in the database - that way if someone else is messing around in your database, they'll see the stored procedures and know more about how the data is actually used before messing something up. You should always used parametrized stored procedures.

Using an object level abstraction for application code _is_ far more readable, and I agree with tapir. I've been using .NET a lot and building a robust object hierarchy which supports your application domain is easily extensible, and isolates database access to the objects themselves, and can be very maintainable, with the object connecting to the database and calling appropriate SPs.

login or register to post comments

SQL Phrasebook != Good Object-oriented design

Submitted by sephiroth on December 2, 2003 - 11:10.

I agree with tupholme that such a phrasebook is bad OO design. A far better way to do this sort of thing is to isolate your value objects (Account, for example, should hold only information for an account), from your model objects (AccountRepository).

To use the examples given above, the best way to do this for a larger system that needs to be truly maintainable would be to develop an Account class that is essentially a typed grouping of data. It has a succint purpose: it is an account. Next, one would create an interface for a Repository (or Storage, or Cabinet, or whatever), and extend it to be an AccountRepository. You'd then create various implementations of that interface (which might rely on something like a GenericRepository to store their information) for the various different databases.

Finely-grained objects like this create both maintainability and stability in a system. When someone gets an AccountRepository, runs repo.storeAccount(foo) or repo.getAccount(idNumber), it makes perfect sense, and data-access details are isolated from actual business logic.

Another way to do this is to simply make a Storable interface and have Account implement it. Then, you make an ObjectStorage (preferably using an OODBMS ... I still don't understand people's fascination with relational databases) that works for Accounts, Customers, and anything else you care to store.

Clearly, there are several ways to tackle the issue, but I must say that creating a "phrasebook" based on sprintf and dodgy replacements is not the way to go about it. That might work for a 14-year-old's guestbook; I do not think it would be an effective method on any production site.

login or register to post comments

A wonderful embarassment

Submitted by tag on February 12, 2011 - 22:40.

This article is a beautiful failure on many levels. It was written by a younger version of myself, approaching a decade ago, who had decided he ultimately wanted to be a writer. My writing skill level at the time was approximately 8th grade, and although I had been programming for nearly a decade I had only two years experience with either production web applications or databases. I was thoroughly unqualified to provide any sort of advice on the matter.

I wrote this article during a project I was working on where I had become fed-up with embedded SQL appearing riddled throughout application logic. I had not yet heard of the traditional designs for data-access abstractions, was not aware stored procedures were available, and barely understood object oriented programming. It is of no coincidence that in alignment with the commentary above, I discovered the practice attempted here was an outright bad idea, no later than the end of the project where I concocted this half-baked design. This is not to mention the insecurity of the examples above, which I will humbly leave in tact, from their being subject to SQL injection.

It is worth mentioning that upon the discovery of the shortcomings of this approach and after reviewing the general response to this article, the younger self who was the author designed and developed an intermediate programming language for generating data-access layers. This experiment worked markedly better, and was actually somewhat analogous to an ORM with the exception that rather than generating the SQL, it generated the data transfer objects and data access interfaces. Quaint as it is, this concept as well has since been abandoned because it is an excessively complicated answer to a very trivial problem. If anyone cares to review it, (http://www.blisted.org/browser/Phrasebook/trunk) remains archived on my old website.

In this day, I currently propose using a stored procedure API, and a simple data-access abstraction. One should always use place-holders for parameters to any queries or procedures so the driver may quote the input. If you are going to use a phrasebook-type approach, just simply place your queries in some kind of flexible configuration format (such as XML), statically load these into a simple hash-table as a part of the initialization process. Always hedge your business logic from knowing anything about this, by using a data-access interface who's implementation encapsulates all notions of data-access, and never follow these examples.

login or register to post comments

The access keys for this page are: ALT (Control on a Mac) plus:

evolt.orgEvolt.org is an all-volunteer resource for web developers made up of a discussion list, a browser archive, and member-submitted articles. This article is the property of its author, please do not redistribute or use elsewhere without checking with the author.