|
Melbourne Linux Users Group Presentation (22/08/2003)
August 22nd, 2003 By Jason Godden Sourced From: http://members.optusnet.com.au/jasongodden/pg2.html Summary: This presentation describes the features available in a modern RDBMS with a practical approach to their application. Practical Use of SQL and Author: Jason Godden 1. Introduction
Part I . The Basics 1. A Modern RDBMS Any modern RDBMS provides a framework for developing backend elements of an application. Whilst MS Access and FileMaker Pro are certainly RDBMS' they do not provide the flexibility and power of a true client server model and so are not regarded as such in this docment. Recommended RDBMS are Oracle, Microsoft SQL Server, PostgreSQL, MySQL, SyBase, Informix and IBM DB2. A modern RDBMS usually provides a basic object heirarchy of the form:
Always chose an RDBMS appropriate for the problem domain. If you require unparrelled reliability and data consistency Oracle, Informix, Sybase and PostgreSQL are good choices. If you require raw speed MySQL is a good choice. If you require strict adherence to a particular devleopment environment (ie - Windows) then SQL Server may be a good choice. With the exception of SQL Server, all databases mentioned run quite well under a modern Unix such as GNU/Linux, *BSD and Solaris and are all available freely to developers and/or the general public. 2. TABLES: Object Classes Made Easy Tables should always be regarded as classes/objects on disk and not simply as a storage mechanism. A class usually provides object storage, row or object instantiation (as types) and methods including methods that govern object storage, retrieval and specialised functions (stored procedures). Tables model an object or relationship containing datum. Each item of data has associated types, modifiers (constraints), checks and defualts. To create a table one uses the the SQL CREATE TABLE statement. If we wanted to express some products we may define each product as having a product code, name, description and cost. We might also decide to place the number of products in stock in the definition as well (although this isn't strictly normalised). So to create our products table: CREATE TABLE products ( productcode INT4 NOT NULL PRIMARY KEY, productname VARCHAR(40) NOT NULL, productdesc VARCHAR(200), numstock INT4 NOT NULL DEFAULT 0, unitprice NUMERIC(8,2) NOT NULL CHECK (unitprice > CAST(0 AS NUMERIC(8,2))) );
Hint: ALWAYS use NUMERIC or DECIMAL types for any monetary values. FLOAT types are a little unpredictable and the size of the float values in a NUMERIC type forces two digits after the decimal point. There are many data types in defined in the ANSI SQL standard and most databases have their own extended types or an extensible type system so the types for your system may vary. 3. SQL: Structured Queries Made Easy SQL stands for Structured Query Language. It provides an easy conceptual interface to the underlying data that is stored within a database. A query can generally be broken down into several parts. COMMAND TYPE RANGE TABLE RESULT RELATION TARGET LIST QUALIFICATION JOIN TREE OTHERS ORDER BY [ASC|DESC] LIMIT etc... Given our earlier products table the query: SELECT productcode,productname,numstock,unitprice FROM products WHERE unitprice < 200; ...can be broken down into the following tree: COMMAND TYPE ->SELECT TARGET LIST ->productcode,productname,numstock,unitprice RANGE TABLE ->FROM products QUALIFICATION ->WHERE unitprice < 200; 4. Accessing Tables SELECT -> Retrieve data from a table based on qualification SELECT productcode,productname,numstock,unitprice FROM products WHERE unitprice < 200; INSERT -> Insert data into a table (with no qualification) INSERT INTO products (productcode,productname,numstock,unitprice) VALUES (1234,'Linux in a nutshell',10,20); UPDATE -> Update data in a table based on qualification UPDATE products SET numstock = 9, unitprice = 19.95 WHERE productcode = 1234; DELETE -> Delete data from a table based on qualification DELETE FROM products WHERE productcode = 1234; Part II . Using RDBMS Features 1. Constraints 1.1 Check Constraints Use check constraints and database error messages to facilitate your application. The database can already take care of all these things for you so don't try and re-invent the wheel in your client app unless you have to. CREATE TABLE checkdemo ( productcode INT4 NOT NULL PRIMARY KEY, availablefrom DATE NOT NULL DEFAULT current_date, availableto DATE NOT NULL DEFAULT current_date+1 CHECK (availableto > availablefrom), nextorder DATE NOT NULL CHECK (nextorder BETWEEN availablefrom AND availableto) ); Hint: Migrating from MySQL's non standard ENUM to another DB: CREATE TABLE checkdemo2 (
productcode INT4 NOT NULL PRIMARY KEY,
producttype VARCHAR(20) NOT NULL
CHECK (producttype in ('BOOK','DVD','CD','SOFTWARE PACKAGE'))
);
1.2 Not Null Constraints Use NOT NULL constraints to specify that a column cannot be null or empty. 1.3 Unique Constraints Use UNIQUE constraints (as indexes) to specify that a column must be unique. Note that NULL is always unique because each NULL is unknown ergo different and so you may need to specify the column with NOT NULL also. CREATE TABLE products (
productcode SERIAL NOT NULL PRIMARY KEY,
productname VARCHAR(20) NOT NULL UNIQUE
);
INSERT INTO products (productname) VALUES ('Linux in a nutshell');
INSERT 0 1
INSERT INTO products (productname) VALUES ('IPTABLES Firewalls');
INSERT 0 1
INSERT INTO products (productname) VALUES ('Linux in a nutshell');
ERROR: Cannot insert a duplicate key into unique index products_productname_key
1.4 Foreign Keys Use FOREIGN KEYS to enforce your business rules and data relationships. They can also save you SQL statements by implemented updates for you and usually do so inside a transaction so data is guaranteed to be consistent. CREATE TABLE fkdemo1 ( suppliername VARCHAR(20) NOT NULL PRIMARY KEY, supplieraddress VARCHAR(40) NOT NULL ); CREATE TABLE fkdemo2 ( productcode SERIAL NOT NULL PRIMARY KEY, productname VARCHAR(40) NOT NULL, suppliername VARCHAR(20) NOT NULL, CONSTRAINT suppliername_fk FOREIGN KEY (suppliername) REFERENCES fkdemo1 (suppliername) ON UPDATE CASCADE ON DELETE CASCADE ); Events are UPDATE or DELETE 1.5 Indexes Indexes create special look up tables and additional constraints for your database objects. The most common index type is a primary key although there are academic regards for the primary key having super index properties that exclude it from begin classified as an index. We have already dealt with some indexes however for example: CREATE TABLE products ( productcode SERIAL NOT NULL, productname VARCHAR(40) NOT NULL, suppliername VARCHAR(20) NOT NULL ); --Creating a primary key ALTER TABLE products ADD CONSTRAINT products_pkey PRIMARY KEY (productcode); --Creating a unique index ALTER TABLE products ADD CONSTRAINT productname_unq UNIQUE (productname); --Creating a generic index CREATE INDEX suppliername_idx ON products (suppliername); --Creating a multi-column unique index CREATE INDEX suppliername_productname_unq ON products (productname,suppliername); --Creating a generic index on certain row values only CREATE INDEX suppliername_acme ON products (suppliername) WHERE suppliername LIKE 'ACME%'; --Creating a generic functional index CREATE INDEX suppliername_acme2 ON products (UPPER(suppliername)); 2. Views Views are invaluable as security and predefined sql mechanisms to access data. You may have a situation requiring particular output formatting or table joins that you don't wish to re-code in each client application. CREATE VIEW vw_suppliers_products AS (SELECT fkdemo1.suppliername, fkdemo1.supplieraddress, productcode, productname FROM fkdemo1 INNER JOIN fkdemo2 ON fkdemo1.suppliername = fkdemo2.suppliername); 3. Rules Rules are an integral part of a databases VIEW system. Essentially a rule instructs the query parser in a database to rewrite the query so that it performs a different action. When dealing with rules remember that a rule operates at STATEMENT level and as such it doesn't have the same powers as a trigger however it can be useful for various situtions: BEGIN; CREATE TABLE shipments ( containernumber SERIAL NOT NULL PRIMARY KEY, productcount INT4 NOT NULL, arrivaldate DATE NOT NULL DEFAULT current_date, deleted BOOL NOT NULL DEFAULT FALSE ); REVOKE ALL ON TABLE shipments FROM public; CREATE VIEW tvw_shipments AS (SELECT containernumber,productcount,arrivaldate FROM shipments WHERE deleted = FALSE); CREATE RULE shipments_insert AS ON INSERT TO tvw_shipments DO INSTEAD (INSERT INTO shipments VALUES (DEFAULT,NEW.productcount, NEW.arrivaldate,DEFAULT)); CREATE RULE shipments_update AS ON UPDATE TO tvw_shipments DO INSTEAD (UPDATE shipments SET containernumber = NEW.containernumber, productcount = NEW.productcount, arrivaldate = NEW.arrivaldate WHERE containernumber = OLD.containernumber); CREATE RULE shipments_delete AS ON DELETE TO tvw_shipments DO INSTEAD (UPDATE shipments SET deleted = TRUE WHERE containernumber = OLD.containernumber); GRANT SELECT ON VIEW tvw_shipments TO public; COMMIT; Note: We wrap these statements in a transaction to ensure that nobody else can edit data in the shipments table whilst we are creating it. 4. Domains Domains provide a means of extending and fine graining types. Most of us are now familiar with the VARCHAR type which can be used to store text. If we have a common requirement in our database to store object names as VARCHAR (40) NOT NULL we can use domains to ease our programming load and provide some basic documentation about our methods: CREATE DOMAIN objectname AS VARCHAR(40) DEFAULT 'New Object' NOT NULL; Now we can simply use the domain to standardise our tables: CREATE TABLE products ( productcode SERIAL NOT NULL, productname objectname ); So productname field is defined as VARCHAR(40), NOT NULL and with a default value of 'New Object'. 5. Triggers These are some of the most powerful objects in a RDBMS and many developers cannot understand their power until they've used them in a project. Essentially a trigger is an action that occurs when a particualr event happens in your database. One of the most transparent examples is logging. Why program multiple inserts for logging on your client application or backend overnight jobs to look at changed fields when you can have the database do it all in the middle of the transaction and without the overhead of client<->server communication? Triggers can be programmacticlaly instructed to fire after or before inserts, selects, updates and deletes on a table and are usually written in a procedural language (we will touch on that next). If we wanted to log every action (except selects) on our products table from Part 1, point 2 we must first create our log table and then our trigger procedure: CREATE TABLE products ( productcode SERIAL NOT NULL PRIMARY KEY, productname VARCHAR(40) NOT NULL, productdesc VARCHAR(200), numstock INT4 NOT NULL DEFAULT 0, unitprice NUMERIC(8,2) NOT NULL CHECK (unitprice > CAST(0 AS NUMERIC(8,2))) ); CREATE TABLE products_log ( actionwhen TIMESTAMP NOT NULL DEFAULT current_timestamp, actionwho VARCHAR(20) NOT NULL DEFAULT current_user, actiontype VARCHAR(6) NOT NULL, logmsg VARCHAR(200) NOT NULL ); --This is PostgreSQL specific. Oracle functions are similar but other databases may have different syntax (ie - MS SQL Server). CREATE OR REPLACE FUNCTION trig_products_log() RETURNS TRIGGER AS' DECLARE v_logmessage VARCHAR; BEGIN IF TG_OP = ''INSERT'' THEN v_logmessage := ''Inserted '' || NEW.productname || ''.''; ELSE v_logmessage := INITCAP(LOWER(TG_OP)) || ''d '' || OLD.productname || ''.''; END IF; INSERT INTO products_log (actiontype,logmsg) VALUES (TG_OP,v_logmessage); IF TG_OP = ''INSERT'' THEN RETURN NEW; ELSE RETURN OLD; END IF; END; ' language 'plpgsql'; Finally we connect the trigger function to the table as an event listener to fire AFTER an action has taken place. We could specify BEFORE here however actions that do not change data should go AFTER an action: CREATE TRIGGER trig_log_access AFTER INSERT OR UPDATE OR DELETE ON products FOR EACH ROW EXECUTE PROCEDURE trig_products_log(); Note that we only connected the insert, update or delete events on the table to the trigger as these are the only ones we're interested in. On a fecascious spelling note our trigger would handle selects incorrectly anyhow because of the IF statement. Of course triggers can be used to do a lot more than just logging. You could also use a trigger to perform cascading updates of other fields or adjust account balances following a purchase etc... 6. Stored Procedures (The previous trigger example gave an example of Stored Procedure syntax in pl/pgsql which is similar to Oracles pl/sql. I am not going to go into Stored Procedures in any more depth in this document.) 7. Transactions Transactions are a crucial part of any modern database. Essentially everything that takes place within a transaction must succeed. If even one item within that transaction fails then the entire action is rolled back to how it was before the transaction took place. Some databases like Oracle even have nested transactions providing very intricate data integrity. Each transaction has what is known as an isolation level. This level determines how much isolation must exist between the current transaction by one user and the transactions of another. The two most well known transaction isolation levels are READ COMMITED and SERIALIZABLE. READ COMMITED transactions mean that other users (even those within transaction) can view results that have been commited (and perhaps flushed to disk). SERIALIZABLE means that each transaction is executed serially, ie ñ one after the other, and increases the consistency of changes within a single transactions. SERIALIZABLE is a mode best used for consistent SELECT's however it also provides the most data accuracy because each block has to be committed serially. A serial transaction will error if any version of the underlying dataset that is being acted upon is modified. This gives the client a chance to refresh the new data and alert the user. The best way to demonstrate is via example: CREATE TABLE transtest ( id INT4 NOT NULL PRIMARY KEY, name VARCHAR(30) ) WITHOUT OIDS; Read Commited:
Serializable:
Some Recommended Resources (freely available RDBMS) http://www.oracle.com/http://www.postgresql.org/ http://www.mysql.com/ http://www.firebird.org/ Whilst researching this topic I made extensive use of http://www.postgresql.org/ and many of the articles found there. Whilst concepts may match those found in documents at www.postgresql.org, no part of this document uses verbatim or derived text from any of those sources. |