18 April 2010

Information Retrieval - Understanding Databases

LESSON #32: INFORMATION RETRIEVAL - UNDERSTANDING DATABASES

WHAT YOU WILL LEARN
This lesson will introduce you to the fundamentals of databases and how they can be applied to your business.

DATABASES
A database is defined as a collection of structured similar data organized for rapid search and retrieval.

Let's say that you want to create and maintain a list of all the products you can sell on commission. Let's say you also want to create and maintain yet another list of all potential customers (prospects) you have for these products. Your first inclination may be to use a word processor like Microsoft Word or Corel WordPerfect to create and maintain these lists. Even though the products change from time to time and your prospects and their information change from time to time, word processor files are easy to edit, so there should be no problem...But, let's say that you have need to organize your products according to the affiliate program through which you can sell them. Most of your products are SFI products, but you also have some books, that closely match the subject matter of your Website, that you sell through the Amazon.com affiliate program. Say you also have some traffic generation products that you use to build your business that have their own affiliate program as well. If you used a word processor, you would have to create another list, organized by affiliate program, separate from your original list. You now have two lists instead of one.

Let's also say that you want to organize your products by category. Some of your products are books, some are nutritional products, some are online information products, some are leads and traffic generation services, some are telecommunication products, some are cleaning products - and you also sell coffee...Now you have to create yet another list.

If you wanted to organize your products alphabetically under each affiliate program and organize them alphabetically under each product category, you have to create two more lists.

As you can see, using word processor files to maintain this data soon becomes quite cumbersome. If information on a product changes, you have to change that information on several lists. If information about a prospect changes, you have to change that information on several lists. You wind up with a lot of files and spending a lot of time editing them to keep all your information current.

Plus, the more you have to type the same information in, the more likely you are to make a mistake. When, due to a typing error, you wind up with two different spellings for the same prospect or product, how do you know which one is right?

So, you need a better application to maintain and manipulate your data. The answer - a database.

Using a database, you list all your product information in one place and you list all your prospects' information in one place. When the information changes, you only have to change it once. If you make a typo, you only have to correct it in one place. When you want a list of your products organized by affiliate program, you simply create a report to pull the information organized by affiliate program. When you want a list of products organized by category, you simply create a report to pull the information organized by category. If you want a list of products organized by both category and affiliate program - no problem - you just set up your report to do that.

TABLE STRUCTURE
Most database applications use a table structure (also called a two-dimensional grid). Imagine a table to record your products that looks like this:
PRODUCT NAME PRODUCT CATEGORY AFFILIATE PROGRAM
IAHBE Subscriptions Online Informational SFI
Veriuni Wireless Telecommunications SFI
Internet Income, Vol I Book Book SFI
Ultimate Weight Solution Book Amazon

Each horizontal row contains information about a single product. The rows are divided into vertical columns. Each column contains a characteristic of the product. In database terminology, each row is called a record and each column is called a field. The first column in our example contains only the names of the products. The second column contains only the category to which you have assigned the product. The third column identifies the affiliate program through which the product is available for you to sell.

Any list of information that can be broken into records containing similar characteristics, which can be defined by specific fields, can be made into a database.

A collection of data, like the one we have above for our products, is called a table. Again, the individual rows in the table are called records (although often just referred to as rows). The individual columns are called fields (although often just referred to as columns). A collection of tables is called a database.

There can be more than one table in a database. A given database may contain a table for products, another table for prospects, and yet another table for active customers. As we will discuss below, different tables in a database can be related to each other through common fields. When a prospect becomes a customer by purchasing something, you can just pull the information (address, phone number, etc.) about that person from the prospect table, where it was first entered, rather than re-entering all the information in the customer table.

DATABASE PROGRAMS AND STANDARDS
There are many different database programs (aka, applications) available for you to use to organize your information. Database applications included in common office suites are Microsoft's Access and Corel's Paradox. A unique and user friendly database program that has become popular in certain segments is the AskSam software package.

It is also important to be familiar with database standards and languages. SQL (short for Structure Query Language) is an industry standard language specifically designed for databases. You can use this language to work with databases across different applications.

The need for sharing database formatted information across networks and the Internet has skyrocketed in recent years. Incompatibility of different software applications has been a major obstacle. For example, if you had data stored in an old dbase format and needed to share that data with a network running Microsoft Access; it could not be done without hours of effort to convert the data. The solution to this problem is to use ODBC and SQL to share the data. ODBC (Open Database Connectivity) is a standard interface between the data and the database application that works with that data. SQL is the standard language used to manipulate the data accessed through an ODBC connection.

That is, with an ODBC connection to a database, you can use the SQL language to create tables, add or modify data, and retrieve information; regardless of the database application with which the data is created and maintained on its source computer.

Also important are the applications that integrate database data with the World Wide Web. More technically stated, we refer here to the applications which convert database information into the hypertext transfer protocol (http) used by the World Wide Web. This can be done by scripting languages or API's, such as Java, MacroMedia's Cold Fusion, or the open source PHP. These applications integrate and display database information within web pages. They also allow you to manipulate database information from a web interface.

Many of the forms you fill out on the Web will automatically enter the information submitted into a database on the server network. Many of the Web forms used to correct or update your information will automatically modify the information about you on the database. (The alternative procedure is for the form to generate an e-mail to the administrator who will then make or modify the database entries using the main database application.)

When you view your Powerline Genealogy at the SFI Resource Center, you are viewing database information integrated with a Web page. When I log into the SFIMG Resource Center with my SFI ID and password, it knows to display my Powerline when I select the PTL Genealogy Report. When you log in with your SFI ID and password, it knows to display your Powerline when you select the PTL Genealogy Report. The affiliates in both of our Powerlines are stored in the same SFI database. The code used at the SFIMG Resource Center knows how to sort them out to display those in my powerline to me and those in your powerline to you.

DATA TYPES
Each column in a database table contains a single piece of data, part of the complete record stored in that row. When you create a table in a database you have to define each column by setting its name, its size, and the data type it can accept.

In our example table above, the name of the first column is "Product Name." The name of the second column is "Product Category." The name of the third column is "Affiliate Program." These column names identify the information that will be stored in each column for each record.

In earlier days, setting the appropriate size limitations for each column was very important to conserve resources. Database programs tend to set aside disk space to hold data for each column that is created. If you didn't limit the size, you would run out of disk space. This is not so much of an issue any more as database programs have become better at dynamically assigning space and computers have much more disk space available now. Size restrictions can also be useful, however, for certain formatted data such as phone numbers or zip codes. Within a particular country, there will be limits to the length of phone numbers and postal codes. By limiting the size of your column for these entries to the standard limitations, you can prevent incorrect entries due to inattention, while at the same time conserving disk space. In other words, if a key sticks when someone is typing in a phone number and it turns out to be 25 numbers long, you know this is incorrect for a U.S. customer. You may not notice the key sticking while typing in the data, but it will be brought to your attention with the error message that will result from attempting to enter too many characters in the form that will populate this column of the customer record.

Data type limitations, the other thing you can limit when creating a new column, are also very useful. Some data types allow the entry of free-form alphanumeric data, while others restrict data entry to just numbers or dates, or "true/false" switches. Common data types are: character (aka, "text"), numeric, date, money, boolean, and binary. These refer, respectively, to: letters; numbers; date formats such as "7/15/04"; numbers to 2 decimal points with dollar signs; yes/no flags; and non-textual data such as pictures, sound, or video files.

If you have a form that allows someone to submit their picture, they shouldn't be typing text into this field of the form. The reverse is even more important for security reasons - you definitely do not want anyone uploading a potentially harmful binary file (such as a virus) where they should just be typing their name. Data type limitations are used to make sure the right type of data is entered into each field.

Different data types are also useful in controlling how data is sorted for reports. Data in a text field is sorted alphabetically, one character at a time, from left to right. Numbers come first, then letters. Because each character is evaluated individually, the number 10 comes after the number 1, but before 2. Thus, you would not want to store quantitites in a text field because they would not sort in the correct order. Quantities need to be stored in a numeric field in order to sort out in numerical order, rather than alphabetical order.

Different database applications use different terminology to define the various data types. When setting up data types, make sure you use the terminology and the conventions of the application you are using.

RELATIONAL DATABASES
Let's return to our example of keeping a list of prospects in a database table. Let's say you have expanded that table to include last name, first name, street address, city, state or province, country, and phone number. You also add a field (column) to your table to indicate how you became aware of each prospect. Now, you want to keep up with your history of contacts with each prospect. It's beginning to get pretty complicated when you think of all the ways you can sort out the information. For example, you can think of four ways that you can follow up with a prospect:

1) You can phone the prospect,

2) You can e-mail the prospect,

3) You can send a snailmail letter through the postal service, and

4) You can chat with the prospect online with an instant messaging service such as ICQ.

There are also many types of outcomes you can have for each contact attempt:

1) You can receive no response,

2) The prospect can express indifference or disinterest,

3) The prospect can request to be removed from your contact list,

4) The prospect can express interest in further contact, or

5) The prospect can purchase a product (the outcome you want!).

Even more troublesome is the realization that you will need to record many contacts with each prospect. How many columns can you create for each prospect and retain manageability? If you used this approach, you would wind up with columns like 'first contact', 'second contact', 'third contact' and so on. Then you would need 'first contact date', first contact type' and 'first contact outcome', 'second contact date', 'second contact type', 'second contact outcome' and so on. As you can see, if you tried to keep all this information in a single table, it would become way too cumbersome.

The best way to keep up with this contact history is to create another table. You can create columns in this new table for 'type of contact method', date of contact, and 'outcome'. Each row will be tied to a specific prospect. It is not necessary to reenter all the name and address information for each prospect in this table because you already have it in the other table. You just need to tie the rows in this table to the specific prospects in your original prospects table. You do this by using primary and foreign keys.

Every row in every table of your database should have one field that uniquely identifies that row. In your original prospects table, you might be tempted to use the name fields to uniquely identify your prospects. This will not work, however. Names may be duplicated. You may have two people both named John Smith on your prospect list. The rule for primary keys is that they can never be duplicated! If they are duplicated, they can no longer serve as unique identifiers. Most database applications will not allow entry of duplicate data in the primary key field. Thus, the second John Smith could not even be entered.

The way to handle this is to create a special id number for each prospect as they are entered into your database table. No other prospect will ever be assigned this same number. (In fact, most database applications prevent the number from ever being used again even after the original record using that primary key number has been deleted.) The unique id number assigned to each prospect in the original prospect table is then used as a 'foreign key' to identify that prospect in the contact history table.

There can be more than one row using the same foreign key in the contact history table. The foreign key in the secondary table is not a primary key in that table. It is only a primary key in the original table. Thus, it can be duplicated in the contact history table. And, this is good because you will want to make several entries for each prospect as you contact them over time.

To make this relationship work as we have described, you need to create a new column in your original prospect table and name it something like "prospect_id". Then, when you set up your contact history table, you also need to create a column named 'prospect_id'. In the original prospects table you set the prospect_id field as a primary key. You do not set it as a primary key, however, in the contact history table. It serves as a foreign key in that table - not a primary key. Every entry into the contact history table will contain a prospect_id which matches a unique value in the primary table (and thus matches a particular prospect). When you generate reports, you can pull the name and other information about the prospect from the primary table and include it with the contact history information from the secondary table. Notice, however, that no information from the primary table has to be repeated in the secondary table except for the prospect_id.

Whenever your primary table is becoming too cumbersome due to repeating information, consider creating another table for the additional data and relating that new table back to existing table through primary and secondary keys.

FORMS AND REPORTS
To put information into a database, we use forms. These can be forms created within the database application or forms in some other format but tied to the database through ODBC and SQL. As we mentioned above, many of the forms you encounter on the web are tied to a database somewhere.

To get information out of a database, we use reports. If you wanted the contact history for a particular prospect, you would create a report that would look up the prospect_id of that prospect from the primary prospect table and then retrieve every entry with that prospect_id as a foreign key in the contact history table. The design of your report would determine what information was displayed and in what order.

It may be that you want to know every prospect that you contacted last weekend. You would design your report to query the database by date and retrieve all of the entries in the contact history table that have a date of last Friday, Saturday, or Sunday. Then your query would match those entries with the prospects' information by searching the primary table using the prospect_id's resulting from the contact history table search. Your output could then be sorted by prospects. That is, all of the contacts with each prospect would be listed together under that prospect's information - unless you wanted them listed chronologically - in which case your report could do that.

Perhaps you want a list of all prospects who have not been contacted in over thirty days. This query would identify the prospect_id's for all contacts from the last thirty days. It would then search the primary contacts table for all of the prospects except the ones you have contacted within the last thirty days.

Once you have identified the information that you want, you can then design your report to sort the information in the fashion that you need. There are numerous ways you can sort the information. Prospects that you have not contacted within the last thirty days can be sorted by country or by area code. Or, they can be sorted first by country and then by area code. They can be sorted by date of first contact or they can be sorted by date of last contact. It's up to you. There are so many different ways to query and organize your data when database technology is used!

GARBAGE IN / GARBAGE OUT
An old concept dealing with computers and databases is worth repeating here as we begin to wrap up our discussion of databases. The accuracy of your reports depends upon the accuracy of your information input. If you put the information in wrong, it will come out wrong in your reports. Since with databases you only have to enter the information once, you can take the time to enter it correctly. Since you only have to make changes in one place, you can keep up with the changes and make sure they are accurate. Thus, your reports are more likely to be accurate.

(One big problem with computers is that they tend to perpetuate errors. If something is entered wrong, it is difficult for many organizations to get it corrected. Take the time to make corrections as you learn of them to avoid this problem.)

CONCLUSION
The more efficiently we process information, the more successful we will be. Database applications are much more efficient than word processing applications for dealing with certain types of information. Database applications are very useful tools to store, organize, process and retrieve structured similar information (such as records of our products or records of our prospects). Using ODBC and SQL, we can share database information across other networks. Additional applications such as ColdFusion or PHP allow us to share and work with database information on the Internet. As we determine our information processing needs, we can design tables to record the characteristics that are important for each item in the records that we need to keep. When necessary to avoid duplication of data, we can create additional tables that relate back to records in our primary table by using primary and foreign keys. We use forms to input data and reports to retrieve and sort the data. For our reports to be accurate, we must input the data accurately into the database. Our reports can retrieve and sort data in numerous ways depending upon our needs at the time. Database applications can make us much more efficient in running our businesses.

WHAT'S COMING NEXT
In our next lesson we will discuss information retrieval science as it relates to searching and retrieving textual information.

Author Bio
Article by George Little.
For more information on the Internet Income Course and other works and courses by George Little, see www.profitpropulsion.com.
For Web Hosting services specially designed for SFI affiliates, see www.profitpropulsion.com.

No comments:

Post a Comment