Visual Basic and databases

 

Database concepts review

Record - Field - table relationship
Indices
 

Creating a table in SQL - some examples

create table contact(
  ID integer,
  LName varchar,
  FName varChar,
  Relationship varChar,
  HomeID integer
);

Filling the table, with SQL commands

insert into contact values (1,'Harris', 'Andy', 'me', 1);
insert into contact values (2,'Harris', 'Heather', 'wife', 1);
insert into contact values (3,'Harris', 'Elizabeth', 'daughter', 1);
insert into contact values (4,'Harris', 'Matthew', 'son', 1);
insert into contact values (5,'Moon', 'Judy', 'Mother in law', 2);
insert into contact values (6,'Moon', 'Chris', 'Brother in law', 2);

Querying the table
SELECT * FROM contact;

grabbing a subset of fields
SELECT Fname, relationship FROM contact;

grabbing a subset of records
SELECT * from contact WHERE LName = 'Harris';

combining field, record selection
SELECT FName, LName FROM contact
WHERE LName = 'Harris';

Using Query by Example to get same result:

 
 

Relational databases

Why go relational?

The flat file database approach is easy to understand, and it is handy for simple databases, but it does not really reflect the way larger datasets work in the real world.  The flat file model reflects a world where everything is of the same type, and there's no interaction between things that are not of the same record definition.

Here's what our contact database OUGHT to look like:

Notice that all the Harris' live at one address, and the Moons live at another address. This of course could be done in a flat file database, but it would have some problems.  There are several people at each address, so there is a lot of data being duplicated.  In addition, the typing load on the person entering the data is filled with repetition of the same data.  There's lots of room for error here.  If the Harris family moved, we will have to find and change four different addresses, and type the new address correctly.

Many databases contain this kind of information, that is very repetitive.  States, account types, and lots of similar kinds of information could be pre-defined.  Take a look at our original database again, and see how the address is configured:

The home address is stored as an INTEGER!?!?!

There's only two addresses, so it would be much better if we entered each address in the system only ONE time, then determined a way to incorporate it into the tables automatically.  This concept is called Normalization.

Let's build a new table of Homes.  It looks like this:

The table is very simple.  It contains only an ID, which is of course the key field, for this table, and a text field containing the actual address.  This kind of index - value table is often called a lookup table or a data dictionary.  It gives the database designer a way to store data that will be repeated and preserver referential integrity.

Now you can see the relationship between the tables.  In the contact table, there is a homeID Field.  This field contains nothing more than an index to the value of the address in the Home table.  Contact.homeID is an example of a foreign key.  A foreign key is a field who's purpose is to contain values that are the keys of other tables. In order for the rule of referential integrity to be upheld, we must provide an element in the Home table for every value that appears in the HOMEID field of the contact database.  Each table in a relational database will have a primary key.  Some will also have a foreign key (or maybe several) which will point to the primary keys of OTHER tables in the system.

If you were to look back at our goal:

You see that it looks a lot like the contact database, except that we have replaced the HomeID field with the appropriate field from the Home table.
 

Joins.

Somehow we must have a mechanism for joining two different tables together.  We have such a mechanism, and it is called the JOIN.  Take a look at the following SQL:
SELECT contact.FName,contact.LName, contact.relationship, Home.address
FROM contact , home

It looks promising.  Let's see what it produces:

Yikes!

There's something really wrong here.
If you look closely, you will notice that there are two entries for each element in the contact table, or 6 entries for each element in the home table.  In other words, if there are n elements in one table, and m elements in another, we will have m x n possible values here.

That kind of join is incredibly crude.  It is called a union, because it works like a union operation from mathematics.  Remember in set theory, a union of two sets includes all the elements of both sets.  That is not what we want here.  What we are looking for is more like an intersection.

In database terms, we are looking for something called an inner join.

Here's the syntax in SQL:
SELECT contact.FName, contact.LName, contact.relationship, Home.address
FROM contact INNER JOIN home ON contact.homeID = home.id
ORDER BY contact.ID;

The main thing that has changed is the FROM line.  Rather than simply telling the system that we want to grab from two tables, we need to inform it that the tables are related. (thus, it is a relational database!)  We need to give it some help figuring out how the relationship works.

contact INNER JOIN home  informs us of the general type of relationship.  If these two tables had a field with the same name, the rest of the relationship might be automatically generated, but you should not rely on it. The rest of the line:
ON contact.homeID = home.id
tells us how to pair up the records.  It is an inner join because we are looking only for the records where there is a matching value in these two fields.

Once we generate an inner join relationship, the relationship will often be invisible to the end user.

Generating relationships in GUI DBMS environments
In Access and similar modern DBMS environments, there is often a tool which allows you to set up relationships between tables.

Here's how it looks in access with the database we've been playing with:

What we want to do is define a relationship between contact.homeID and home.ID.  What you do is select one of the fields, then drag until you are on the other.  I'll drag from contact.homeID to home.ID.  Here's the results:

Now you will see a little line between the HomeID field of contact and the ID field of home.  This is a standard notation for a relationship.  In fact, this is an example of a one to many relationship.  Each ID element in home will potentially have many HomeIDs in contact referring to it.This is often denoted with an infinity symbol near the 'many' field in the relationship.

Once you have defined a join, things are a little different in the query engine.

Notice that the tables are now shown with the join, and, when we run the query, we automatically get an inner join.  Just for fun, let's look in SQL view and see what we get...

SELECT contact.FName, home.address
FROM contact INNER JOIN home ON contact.HomeID = home.ID;

Haw!!  Just like when we did it by hand.  When GUI interfaces allow you to play with relationships, they are just designing joins.

Had we done exactly the same QBE on the tables without the relationship, we would have gotten the union of the tables, rather than the join.
 

So, how do we use this in VB?

Our users could not care less about referential integrity, inner joins, and the like.  They just want a reliable database that works well.  Our job is to find a way to make it reliable and to shield them from these details.  There are two major strategies here.  You can predefine a query that accounts for the join(s), then save that query.  The query will look to the end user like a table.  They will not need to know that it is something else.  Set the recordsource property of the data control to the query rather than the actual tables.

You can also generate the appropriate select statement by hand in your code.  In our example, if they ask for the address, you will need to generate the results as a join query.  If not, no join will be needed.  Once you have generated the query, you attach it to the recordsource of the data control.

In both of these cases, the results will be READ ONLY.  A query is a VIRTUAL table.  You cannot write to it directly, only read from it.  If they want to edit the values, we will need some other controls.  We'll look at them next time.
 

Assignment.

Your assignment is to generate a simple relational database with at least one inner join.  Create two tables and a query using the visual data manager packed with VB.  You should have a master form that allows users to go to several other forms:  An editor for each table (with read / write access, so they can add new records), and a querying tool.  The querying tool should have a place where a user who knows SQL can type it in to get a result, and should also have some kind of assisted query building for people who do not know any SQL.  Here's an example screen shot using my database...