Terminologies

Database
A very large collection (of files) of organized and related data.
DBMS
A software package/system that can be used to store, manage and retrieve data from databases that persist for long periods of time!
Database System
DBMS + data [+ applications]
Data Model
A framework for describing data, data relationships, data semantics, and data constraints.
Schema
The structure of the database using a particular model.

Data Models

It has two layers.

  • Conceptual or Logical Level: describes data stored in database, and their relationships among the data.
  • Physical Level: describes how files and records are stored.

We have multiple choices for data model [wiki], [wiki].

Relational Model
It is a database model based on first-order predicate logic. Its core idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values.
We will be working mostly around Relational Model.
Entity Relationship Model
In software engineering, an entity–relationship model (ER model) is a data model for describing the data or information aspects of a business domain or its process requirements, in an abstract way that lends itself to ultimately being implemented in a database such as a relational database.
Hierarchical Model
In this model data is organized into a tree-like structure, implying a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list.

Relational Databases

Relational Database
A set of relations
Relation
Made up of 2 parts: Schema and Instance
Schema
specifies name of relation, plus name and type of each column
Instance
the actual data at a given time. Cardinality: number of rows, Degree/Arity: number of fields
Super Key
A set of fields is a superkey if no two distinct tuples can have same values in all key fields.
Key
A set of fields is a key for a relation if it is a superkey and no subset of the fields is a superkey. One such key is chosen as Primary Key, others (if exist) are called Candidate Keys.
Foreign Key
A set of fields in a tuple in one relation that refer to a tuple in another relation. It reference to primary key of the other relation.

SQLite

The SQLite project provides a simple command-line utility named sqlite3 (or sqlite3.exe on windows) that allows the user to manually enter and execute SQL statements against an SQLite database. [ref]

We will use this for SQL query exercise.

  1. Download the CLI from http://www.sqlite.org/download.html, select precompiled binaries for your operating system.
  2. Download movie.sqlite, our database to run queries upon. [from Dave]

Exercises

Here, we are working on a IMDB database, containing information about movies, actor/actress, directors, Oscar awards info, etc,. In our database, the Movie relation has the following attributes:

Attribute Name Data Type Description
id CHAR(7) a unique id assigned to the film
name VARCHAR(64) the name of the film
year INTEGER the year the film was released
rating VARCHAR(5) the MPAA rating of the film; if the film is unrated or has a non-standard rating, this value is NULL.
runtime INTEGER the running time of the film in minutes
genre VARCHAR(16) the genre(s) of the film, formed by concatenating one-letter genre codes. no assumptions should be made about the order in which these codes appear in a given genre string
earnings_rank INTEGER the earnings rank of the film (only included for the top 200; NULL otherwise)
Genre: action (A), adventure (V), animation (N), comedy (C), drama (D), family (F), fantasy (Y), horror (H), mystery (M), musical (L), romance (R), sci-fi (S), thriller (T), war (W), western (E);

And part of the actual table looks like this.

id name year rating runtime genre earnings_rank
1300854 Iron Man 3 2013 PG-13 130 A 13
1690953 Despicable Me 2 2013 PG 98 N 24
0770828 Man of Steel 2013 PG-13 143 A 54
1453405 Monsters University 2013   104 N 64
1905041 Furious 6 2013 PG-13 130 A 86
1623205 Oz the Great and Powerful 2013 PG 130 V 90
1408101 Star Trek Into Darkness 2013 PG-13 132 A 96
0816711 World War Z 2013 PG-13 123 A 135
0481499 The Croods 2013 PG 98 N 152
0848228 The Avengers 2012 PG-13 143 A 3

This is actually generated by the following SQL query, FYI.

select * from movie order by year desc limit 10;
  1. Write down the schema of this table according to the description.
  2. Explain the meaning of those datatype keywords, e.g. varchar(64), char(7), integer.
  3. Discussion: Why do we choose these datatypes? Can we choose other types instead?
  4. Based on these 10 example records, what field or set of fields must not be a super key?
  5. Based on these 10 example records, what field or set of fields might be a super key?
  6. Based on these 10 example records, what field or set of fields might be a candidate key?
  7. Based on our precise definition, what’s our primary key? What about candidate key?
  8. Try to execute the example query in SQLite.
    SQL Language Reference (SQLite Dialect): https://www.sqlite.org/lang.html

    Please open the movie.sqlite database using the following command, bash ./sqlite3 movie.sqlite In SQLite command line interface, you can type .help for help information.

  9. With the help information, try to display all the tabls schemas in the database. And then try to display only the schema info for movie table.
  10. Run an query to show the name and earnings rank of top 30 grossing movies in 2013.
  11. Show name and year of animations who runs over one hour and a half.

Solution

  1. Write down the schema of this table according to the description.

    sql movie (id char(7), name varchar(64), year integer, rating varchar(5), runtime integer, genre varchar(16), earnings_rank integer)

  2. Explain the meaning of those datatype keywords, e.g. varchar(64), char(7), integer. http://www.w3schools.com/sql/sql_datatypes_general.asp

  3. Run an query to show the name and earnings rank of top 30 grossing movies in 2013.

    select name, earnings_rank from movie where year = 2013 and earnings_rank < 30;
    
  4. Show name and year of animations who runs over one hour and a half.

    select name, year from movie where genre like "%N%" and runtime > 90;
    

Extra

SQL Dialect: http://en.wikibooks.org/wiki/SQL_Dialects_Reference