Melbourne Linux Users Group Presentation (22/08/2003)
August 22nd, 2003 By Jason Godden
Sourced From: http://members.optusnet.com.au/jasongodden/pg2.html

Summary: This presentation describes the features available in a modern RDBMS with a practical approach to their application.


Practical Use of SQL and
Modern Relational Database Management Systems

Author: Jason Godden

1. Introduction
This document describes some of the basics of SQL and it's use in a modern RDBMS. The standard features of a modern RDBMS and the concept of an RDBMS as a development environment for backend applications is also touched on.

Part I . The Basics

1. A Modern RDBMS

Any modern RDBMS provides a framework for developing backend elements of an application. Whilst MS Access and FileMaker Pro are certainly RDBMS' they do not provide the flexibility and power of a true client server model and so are not regarded as such in this docment. Recommended RDBMS are Oracle, Microsoft SQL Server, PostgreSQL, MySQL, SyBase, Informix and IBM DB2.

A modern RDBMS usually provides a basic object heirarchy of the form:

Object Tree Definition
DATA CLUSTER
DATABASES
SCHEMAS
TABLES
TRIGGERS
RULES
INDEXES
SEQUENCES
TYPES
DOMAINS
STORED PROCEDURES
VIEWS
ROLES/GROUPS
USERS
Dedicated On Disk Storage Area
Application Container
Application Seperation
Object Classes/Data
Object Events
Event Enhancement
Look Up References/Data Integrity
System Number Generators
Data Types stored in Tables/Classes
Fine Grained Typing/Type Inheritance/Extensibility
Specialised Functions
Predefined Data Reports
Role Grouping for System Users
System User Accounts

Always chose an RDBMS appropriate for the problem domain. If you require unparrelled reliability and data consistency Oracle, Informix, Sybase and PostgreSQL are good choices. If you require raw speed MySQL is a good choice. If you require strict adherence to a particular devleopment environment (ie - Windows) then SQL Server may be a good choice. With the exception of SQL Server, all databases mentioned run quite well under a modern Unix such as GNU/Linux, *BSD and Solaris and are all available freely to developers and/or the general public.

2. TABLES: Object Classes Made Easy

Tables should always be regarded as classes/objects on disk and not simply as a storage mechanism. A class usually provides object storage, row or object instantiation (as types) and methods including methods that govern object storage, retrieval and specialised functions (stored procedures).

Tables model an object or relationship containing datum. Each item of data has associated types, modifiers (constraints), checks and defualts.

To create a table one uses the the SQL CREATE TABLE statement. If we wanted to express some products we may define each product as having a product code, name, description and cost. We might also decide to place the number of products in stock in the definition as well (although this isn't strictly normalised).

So to create our products table:

CREATE TABLE products (
productcode INT4 NOT NULL PRIMARY KEY,
productname VARCHAR(40) NOT NULL,
productdesc VARCHAR(200),
numstock INT4 NOT NULL DEFAULT 0,
unitprice NUMERIC(8,2) NOT NULL
CHECK (unitprice > CAST(0 AS NUMERIC(8,2)))
);
Column Data Type Size Constraints Modifiers
productcode INTEGER 4 Bytes NOT NULL
PRIMARY KEY
 
productname CHARACTER VARYING No more than 40 characters NOT NULL  
productdesc CHARACTER VARYING No more than 200 characters    
numstock INTEGER 4 bytes NOT NULL DEFAULT = 0
unitprice NUMERIC 8 numerical characters, 2 after the decimal point NOT NULL
unitprice > 0
 

Hint: ALWAYS use NUMERIC or DECIMAL types for any monetary values. FLOAT types are a little unpredictable and the size of the float values in a NUMERIC type forces two digits after the decimal point.

There are many data types in defined in the ANSI SQL standard and most databases have their own extended types or an extensible type system so the types for your system may vary.

3. SQL: Structured Queries Made Easy

SQL stands for Structured Query Language. It provides an easy conceptual interface to the underlying data that is stored within a database. A query can generally be broken down into several parts.

COMMAND TYPE

(SELECT,INSERT,UPDATE,DELETE)

RANGE TABLE

FROM clause usualy in SELECT and DELETE. Special forms of INSERT and UPDATE can also use a Range Table although these are usually extensions to ANSI SQL.

RESULT RELATION

Specific user created relation (ie: customers,products) where an action (determined by the COMMAND TYPE) is to take place. Found as part of an INSERT, UPDATE or DELETE query.

TARGET LIST

Defines which columns should be acted upon and the values of those columns. Only exists in a SELECT, INSERT or UPDATE query. In a SELECT query it would be defined as the keys between the SELECT and FROM keywords.

QUALIFICATION

Essentially the WHERE clause of a statement and must ultimately return a boolean value relative to the current row or row to be inserted. Essentially gives the thumbs up or down to the database as to whether all is good.

JOIN TREE

(I'm not going to go into JOIN's in this document)

OTHERS

GROUP BY (Use in AGGREGATE queries) [HAVING]
ORDER BY [ASC|DESC]
LIMIT etc...

Given our earlier products table the query:

SELECT productcode,productname,numstock,unitprice
FROM products WHERE unitprice < 200;

...can be broken down into the following tree:

COMMAND TYPE		->SELECT
TARGET LIST		->productcode,productname,numstock,unitprice
RANGE TABLE		->FROM products
QUALIFICATION		->WHERE unitprice < 200;

4. Accessing Tables

SELECT -> Retrieve data from a table based on qualification

SELECT productcode,productname,numstock,unitprice
FROM products WHERE unitprice < 200;

INSERT -> Insert data into a table (with no qualification)

INSERT INTO products (productcode,productname,numstock,unitprice) VALUES
(1234,'Linux in a nutshell',10,20);

UPDATE -> Update data in a table based on qualification

UPDATE products SET numstock = 9, unitprice = 19.95 WHERE productcode = 1234;

DELETE -> Delete data from a table based on qualification

DELETE FROM products WHERE productcode = 1234;

Part II . Using RDBMS Features

1. Constraints

1.1 Check Constraints

Use check constraints and database error messages to facilitate your application. The database can already take care of all these things for you so don't try and re-invent the wheel in your client app unless you have to.

CREATE TABLE checkdemo (
productcode INT4 NOT NULL PRIMARY KEY,
availablefrom DATE NOT NULL DEFAULT current_date,
availableto DATE NOT NULL DEFAULT current_date+1
CHECK (availableto > availablefrom),
nextorder DATE NOT NULL
CHECK (nextorder BETWEEN availablefrom AND availableto)
);

Hint: Migrating from MySQL's non standard ENUM to another DB:
Use CHECK constraints, ie:

CREATE TABLE checkdemo2 (
productcode INT4 NOT NULL PRIMARY KEY,
producttype VARCHAR(20) NOT NULL
CHECK (producttype in ('BOOK','DVD','CD','SOFTWARE PACKAGE'))
);

1.2 Not Null Constraints

Use NOT NULL constraints to specify that a column cannot be null or empty.

1.3 Unique Constraints

Use UNIQUE constraints (as indexes) to specify that a column must be unique. Note that NULL is always unique because each NULL is unknown ergo different and so you may need to specify the column with NOT NULL also.

CREATE TABLE products (
productcode SERIAL NOT NULL PRIMARY KEY,
productname VARCHAR(20) NOT NULL UNIQUE
);

INSERT INTO products (productname) VALUES ('Linux in a nutshell');
INSERT 0 1
INSERT INTO products (productname) VALUES ('IPTABLES Firewalls');
INSERT 0 1
INSERT INTO products (productname) VALUES ('Linux in a nutshell');
ERROR: Cannot insert a duplicate key into unique index products_productname_key

1.4 Foreign Keys

Use FOREIGN KEYS to enforce your business rules and data relationships. They can also save you SQL statements by implemented updates for you and usually do so inside a transaction so data is guaranteed to be consistent.

CREATE TABLE fkdemo1 (
suppliername VARCHAR(20) NOT NULL PRIMARY KEY,
supplieraddress VARCHAR(40) NOT NULL
);

CREATE TABLE fkdemo2 (
productcode SERIAL NOT NULL PRIMARY KEY,
productname VARCHAR(40) NOT NULL,
suppliername VARCHAR(20) NOT NULL,
CONSTRAINT suppliername_fk FOREIGN KEY (suppliername)
REFERENCES fkdemo1 (suppliername) ON UPDATE CASCADE ON DELETE CASCADE
);

Events are UPDATE or DELETE
Possible actions are SET NULL, CASCADE, RESTRICT, DO NOTHING

1.5 Indexes

Indexes create special look up tables and additional constraints for your database objects. The most common index type is a primary key although there are academic regards for the primary key having super index properties that exclude it from begin classified as an index. We have already dealt with some indexes however for example:

CREATE TABLE products (
productcode SERIAL NOT NULL,
productname VARCHAR(40) NOT NULL,
suppliername VARCHAR(20) NOT NULL
);

--Creating a primary key 
ALTER TABLE products ADD CONSTRAINT products_pkey PRIMARY KEY (productcode); 

--Creating a unique index 
ALTER TABLE products ADD CONSTRAINT productname_unq UNIQUE (productname); 

--Creating a generic index 
CREATE INDEX suppliername_idx ON products (suppliername); 

--Creating a multi-column unique index 
CREATE INDEX suppliername_productname_unq ON products 
(productname,suppliername); 

--Creating a generic index on certain row values only 
CREATE INDEX suppliername_acme ON products (suppliername) WHERE suppliername 
LIKE 'ACME%'; 

--Creating a generic functional index 
CREATE INDEX suppliername_acme2 ON products (UPPER(suppliername)); 

2. Views

Views are invaluable as security and predefined sql mechanisms to access data. You may have a situation requiring particular output formatting or table joins that you don't wish to re-code in each client application.

CREATE VIEW vw_suppliers_products AS
(SELECT fkdemo1.suppliername, fkdemo1.supplieraddress, productcode, productname
FROM fkdemo1 INNER JOIN fkdemo2 ON fkdemo1.suppliername = fkdemo2.suppliername);

3. Rules

Rules are an integral part of a databases VIEW system. Essentially a rule instructs the query parser in a database to rewrite the query so that it performs a different action. When dealing with rules remember that a rule operates at STATEMENT level and as such it doesn't have the same powers as a trigger however it can be useful for various situtions:

BEGIN;

CREATE TABLE shipments (
containernumber SERIAL NOT NULL PRIMARY KEY,
productcount INT4 NOT NULL,
arrivaldate DATE NOT NULL DEFAULT current_date,
deleted BOOL NOT NULL DEFAULT FALSE
);

REVOKE ALL ON TABLE shipments FROM public; 

CREATE VIEW tvw_shipments AS (SELECT containernumber,productcount,arrivaldate 
FROM shipments WHERE deleted = FALSE); 

CREATE RULE shipments_insert AS ON INSERT TO tvw_shipments DO INSTEAD 
(INSERT INTO shipments VALUES (DEFAULT,NEW.productcount, 
NEW.arrivaldate,DEFAULT)); 

CREATE RULE shipments_update AS ON UPDATE TO tvw_shipments DO INSTEAD 
(UPDATE shipments SET containernumber = NEW.containernumber, productcount = 
NEW.productcount, arrivaldate = NEW.arrivaldate WHERE containernumber = 
OLD.containernumber); 

CREATE RULE shipments_delete AS ON DELETE TO tvw_shipments DO INSTEAD 
(UPDATE shipments SET deleted = TRUE WHERE containernumber = 
OLD.containernumber); 

GRANT SELECT ON VIEW tvw_shipments TO public; 

COMMIT;

Note: We wrap these statements in a transaction to ensure that nobody else can edit data in the shipments table whilst we are creating it.

4. Domains

Domains provide a means of extending and fine graining types. Most of us are now familiar with the VARCHAR type which can be used to store text. If we have a common requirement in our database to store object names as VARCHAR (40) NOT NULL we can use domains to ease our programming load and provide some basic documentation about our methods:

CREATE DOMAIN objectname AS VARCHAR(40) DEFAULT 'New Object' NOT NULL;

Now we can simply use the domain to standardise our tables:

CREATE TABLE products (
productcode SERIAL NOT NULL,
productname objectname
);

So productname field is defined as VARCHAR(40), NOT NULL and with a default value of 'New Object'.

5. Triggers

These are some of the most powerful objects in a RDBMS and many developers cannot understand their power until they've used them in a project.

Essentially a trigger is an action that occurs when a particualr event happens in your database. One of the most transparent examples is logging. Why program multiple inserts for logging on your client application or backend overnight jobs to look at changed fields when you can have the database do it all in the middle of the transaction and without the overhead of client<->server communication?

Triggers can be programmacticlaly instructed to fire after or before inserts, selects, updates and deletes on a table and are usually written in a procedural language (we will touch on that next).

If we wanted to log every action (except selects) on our products table from Part 1, point 2 we must first create our log table and then our trigger procedure:

CREATE TABLE products (
productcode SERIAL NOT NULL PRIMARY KEY,
productname VARCHAR(40) NOT NULL,
productdesc VARCHAR(200),
numstock INT4 NOT NULL DEFAULT 0,
unitprice NUMERIC(8,2) NOT NULL
CHECK (unitprice > CAST(0 AS NUMERIC(8,2)))
);

CREATE TABLE products_log (
actionwhen TIMESTAMP NOT NULL DEFAULT current_timestamp,
actionwho VARCHAR(20) NOT NULL DEFAULT current_user,
actiontype VARCHAR(6) NOT NULL,
logmsg VARCHAR(200) NOT NULL
);

--This is PostgreSQL specific. Oracle functions are similar but other databases
may have different syntax (ie - MS SQL Server). 
CREATE OR REPLACE FUNCTION trig_products_log() RETURNS TRIGGER AS' 
DECLARE 
	v_logmessage VARCHAR; 
BEGIN 
	IF TG_OP = ''INSERT'' THEN 
		v_logmessage := ''Inserted '' || NEW.productname || ''.''; 
	ELSE 
		v_logmessage := INITCAP(LOWER(TG_OP)) || ''d '' || OLD.productname || ''.''; 
	END IF; 
	INSERT INTO products_log (actiontype,logmsg) VALUES (TG_OP,v_logmessage); 
	IF TG_OP = ''INSERT'' THEN 
		RETURN NEW; 
	ELSE 
		RETURN OLD; 
	END IF; 
END; 
' language 'plpgsql';

Finally we connect the trigger function to the table as an event listener to fire AFTER an action has taken place. We could specify BEFORE here however actions that do not change data should go AFTER an action:

CREATE TRIGGER trig_log_access AFTER INSERT OR UPDATE OR DELETE ON products FOR
EACH ROW EXECUTE PROCEDURE trig_products_log();

Note that we only connected the insert, update or delete events on the table to the trigger as these are the only ones we're interested in. On a fecascious spelling note our trigger would handle selects incorrectly anyhow because of the IF statement.

Of course triggers can be used to do a lot more than just logging. You could also use a trigger to perform cascading updates of other fields or adjust account balances following a purchase etc...
The atomicty of updates for finicial applications is kept safe with most databases because the trigger is performed as part of the transaction. If the trigger part fails the whole transaction will roll back.

6. Stored Procedures

(The previous trigger example gave an example of Stored Procedure syntax in pl/pgsql which is similar to Oracles pl/sql. I am not going to go into Stored Procedures in any more depth in this document.)

7. Transactions

Transactions are a crucial part of any modern database. Essentially everything that takes place within a transaction must succeed. If even one item within that transaction fails then the entire action is rolled back to how it was before the transaction took place. Some databases like Oracle even have nested transactions providing very intricate data integrity.

Each transaction has what is known as an isolation level. This level determines how much isolation must exist between the current transaction by one user and the transactions of another. The two most well known transaction isolation levels are READ COMMITED and SERIALIZABLE. READ COMMITED transactions mean that other users (even those within transaction) can view results that have been commited (and perhaps flushed to disk). SERIALIZABLE means that each transaction is executed serially, ie ñ one after the other, and increases the consistency of changes within a single transactions. SERIALIZABLE is a mode best used for consistent SELECT's however it also provides the most data accuracy because each block has to be committed serially. A serial transaction will error if any version of the underlying dataset that is being acted upon is modified. This gives the client a chance to refresh the new data and alert the user. The best way to demonstrate is via example:

CREATE TABLE transtest (
id INT4 NOT NULL PRIMARY KEY,
name VARCHAR(30)
) WITHOUT OIDS;

Read Commited:

User Action
USER1
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
INSERT INTO transtest VALUES (1,'Jim');
INSERT INTO transtest VALUES (2,'John');
SELECT * FROM transtest ;
USER2
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT * FROM transtest;
USER1
COMMIT;
USER2
SELECT * FROM transtest;
--Returns User1s committed rows.
USER1
BEGIN;
SELECT * FROM transtest;
UPDATE transtest SET NAME = 'Peter' WHERE NAME = 'Jim';
COMMIT;
--Note that if we don't COMMIT here and user2 proceeds with their next
statement then User2's statement will appear to 'hang'. It is actually
waiting on the outcome of User1's statement to see who will 'win'. If user1
issues a ROLLBACK now then user2's command will be successful. This is a VERY
different concept to record locks - here it is our data driving the supposed
'lock', not a lock on a physical object on disk.
USER2
UPDATE transtest SET NAME = 'Paul' WHERE NAME = 'Jim';
--Returns UPDATE 0
COMMIT;
--We didn't actually get a chance to update Jim's record because another user
was able to get there first however we weren't told that our update failed
either!

Serializable:

User Action
USER1
DELETE FROM transtest;
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO transtest VALUES (1,'Jim');
INSERT INTO transtest VALUES (2,'John');
SELECT * FROM transtest ;
USER2
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM transtest;
USER1
COMMIT;
USER2
SELECT * FROM transtest;
--Does NOT return User1s committed rows. This is because they were committed
AFTER our serial transaction began. Note the difference with read committed
transactions.
USER1
BEGIN;
SELECT * FROM transtest;
UPDATE transtest SET NAME = 'Peter' WHERE NAME = 'Jim';
COMMIT;
--Note that if we don't COMMIT here and user2 proceeds with their next
statement then User2's statement will appear to 'hang'. It is actually
waiting on the outcome of User1's statement to see who will 'win'. If user1
issues a ROLLBACK now then user2's command will be successful.
USER2
UPDATE transtest SET NAME = 'Paul' WHERE NAME = 'Jim';
--Returns an error! Note the difference with read committed transactions.
ROLLBACK;
--We are forced to rollback here because the consistency of our data has been
modified by another user before we got a chance to update Jim's record.
Difference here is that we WERE told that our transaction failed. This is
where Concurrency Control with transactions over locks has a HUGE advantage.
We are always creating a new version of a row rather than updating the current
version.

Some Recommended Resources (freely available RDBMS)

http://www.oracle.com/
http://www.postgresql.org/
http://www.mysql.com/
http://www.firebird.org/

Whilst researching this topic I made extensive use of http://www.postgresql.org/ and many of the articles found there. Whilst concepts may match those found in documents at www.postgresql.org, no part of this document uses verbatim or derived text from any of those sources.

http://www.mikeskinner.net/