Link Tables Together¶
So far, we have worked with a single table table to hold readings taken from balloon flights.
What if we wanted to hold details of the flights themselves?
First, let's connect to our database:
import sqlalchemy as sa
import pandas as pd
engine = sa.create_engine('sqlite:///flight.db')
connection = engine.connect()
And now, let's create a table to hold a name for each flight along with the country code, latitude and longitude of where that flight took place:
sql = """
CREATE TABLE flights (
name VARCHAR(10) NOT NULL,
country_code VARCHAR(2) NOT NULL,
latitude REAL NOT NULL DEFAULT 0.0,
longitude REAL NOT NULL DEFAULT 0.0,
CONSTRAINT flights_pk PRIMARY KEY (name),
CONSTRAINT lat_ck CHECK (latitude BETWEEN -90 AND 90),
CONSTRAINT long_ck CHECK (longitude BETWEEN -180 AND 180)
)
"""
connection.execute(sql)
<sqlalchemy.engine.result.ResultProxy at 0x10639b4a8>
We dropped our readings table earlier, so it no longer exists.
We'll need to re-create it but this time we'll add an extra 'Foreign Key' constraint.
This will ensure that any values in the 'flight' column must be valid entries in our new 'flights' table:
CREATE TABLE readings (
flight VARCHAR(10) NOT NULL,
ts TIMESTAMP NOT NULL,
temp NUMERIC(3,1) NOT NULL,
pressure NUMERIC(4,0) NOT NULL,
humidity NUMERIC(3,0) NOT NULL,
accel_x REAL DEFAULT 0 NOT NULL,
accel_y REAL DEFAULT 0 NOT NULL,
accel_z REAL DEFAULT 0 NOT NULL,
CONSTRAINT readings_pk PRIMARY KEY (flight, ts),
CONSTRAINT temp_ck CHECK (temp BETWEEN -70 AND 70),
CONSTRAINT pres_ck CHECK (pressure BETWEEN 0 AND 2000),
CONSTRAINT hum_ck CHECK (humidity BETWEEN 0 AND 100),
CONSTRAINT flights_fk FOREIGN KEY (flight) REFERENCES flights(name)
CONSTRAINT flights_fk FOREIGN KEY (flight) REFERENCES flights(name)
can be read as:
- Create a constraint and name it 'flights_fk'
- It's a foreign key constraint on the 'flight' column in this table
- Any value in the flight column must refer to a value in the 'name' column on the 'flights' table
But what if we have readings for a flight which we then delete? That would break the constraint we just defined.
We can specify what should happen in the readings table when a record in the flights table is updated or deleted.
We'll say that any deletions should be cascaded to our readings table meaning that all readings for that flight should also be deleted. (You can read more detail about the other possible options in the SQLite documentation):
sql = """
CREATE TABLE readings (
flight VARCHAR(10) NOT NULL,
ts TIMESTAMP NOT NULL,
temp NUMERIC(3,1) NOT NULL,
pressure NUMERIC(4,0) NOT NULL,
humidity NUMERIC(3,0) NOT NULL,
accel_x REAL DEFAULT 0 NOT NULL,
accel_y REAL DEFAULT 0 NOT NULL,
accel_z REAL DEFAULT 0 NOT NULL,
CONSTRAINT readings_pk PRIMARY KEY (flight, ts),
CONSTRAINT temp_ck CHECK (temp BETWEEN -70 AND 70),
CONSTRAINT pres_ck CHECK (pressure BETWEEN 0 AND 2000),
CONSTRAINT hum_ck CHECK (humidity BETWEEN 0 AND 100),
CONSTRAINT flights_fk FOREIGN KEY (flight)
REFERENCES flights(name) ON DELETE CASCADE
)
"""
connection.execute(sql)
<sqlalchemy.engine.result.ResultProxy at 0x10d1bb9e8>
For reasons of backwards compatibility, foreign keys are not enabled by default in SQLite and we need to issue a command to switch them on:
connection.execute("PRAGMA foreign_keys=on")
<sqlalchemy.engine.result.ResultProxy at 0x10d1bbe10>
Let's try to re-create the readings records from chapter 3:
sql = """
INSERT INTO readings(flight, ts, temp, pressure, humidity)
VALUES
('hab1', '2015-01-01 09:00:00', 25.5, 1020, 40),
('hab1', '2015-01-01 09:01:00', 25.5, 1019, 40),
('hab1', '2015-01-01 09:02:00', 25.5, 1019, 41)
"""
try:
connection.execute(sql)
except Exception as e:
print(e)
(sqlite3.IntegrityError) FOREIGN KEY constraint failed [SQL: "\n INSERT INTO readings(flight, ts, temp, pressure, humidity)\n VALUES \n ('hab1', '2015-01-01 09:00:00', 25.5, 1020, 40),\n ('hab1', '2015-01-01 09:01:00', 25.5, 1019, 40),\n ('hab1', '2015-01-01 09:02:00', 25.5, 1019, 41)\n"] (Background on this error at: http://sqlalche.me/e/gkpj)
SQLAlchemy raises an Integrity error because we tried to create readings for flight 'hab1' but no such record exists in our 'flights' table.
Let's put that right and try again. We'll create a record for the 'hab' flight but leave it with default values for latitude and longitude:
connection.execute("INSERT INTO flights(name, country_code) VALUES ('hab1', 'GB')")
pd.read_sql('flights', connection)
name | country_code | latitude | longitude | |
---|---|---|---|---|
0 | hab1 | GB | 0.0 | 0.0 |
Now, we should be able to create our readings:
connection.execute(sql)
pd.read_sql('readings', connection)
flight | ts | temp | pressure | humidity | accel_x | accel_y | accel_z | |
---|---|---|---|---|---|---|---|---|
0 | hab1 | 2015-01-01 09:00:00 | 25.5 | 1020 | 40 | 0.0 | 0.0 | 0.0 |
1 | hab1 | 2015-01-01 09:01:00 | 25.5 | 1019 | 40 | 0.0 | 0.0 | 0.0 |
2 | hab1 | 2015-01-01 09:02:00 | 25.5 | 1019 | 41 | 0.0 | 0.0 | 0.0 |
Let's create a second flight:
connection.execute("INSERT INTO flights(name, country_code) VALUES ('hab2', 'GB')")
pd.read_sql('flights', connection)
name | country_code | latitude | longitude | |
---|---|---|---|---|
0 | hab1 | GB | 0.0 | 0.0 |
1 | hab2 | GB | 0.0 | 0.0 |
And insert some readings for that second flight:
sql = """
INSERT INTO readings(flight, ts, temp, pressure, humidity)
VALUES
('hab2', '2015-01-01 09:00:00', 27.5, 1020, 40),
('hab2', '2015-01-01 09:01:00', 25.0, 1019, 40),
('hab2', '2015-01-01 09:02:00', 26.2, 1021, 41)
"""
connection.execute(sql)
pd.read_sql('readings', connection)
flight | ts | temp | pressure | humidity | accel_x | accel_y | accel_z | |
---|---|---|---|---|---|---|---|---|
0 | hab1 | 2015-01-01 09:00:00 | 25.5 | 1020 | 40 | 0.0 | 0.0 | 0.0 |
1 | hab1 | 2015-01-01 09:01:00 | 25.5 | 1019 | 40 | 0.0 | 0.0 | 0.0 |
2 | hab1 | 2015-01-01 09:02:00 | 25.5 | 1019 | 41 | 0.0 | 0.0 | 0.0 |
3 | hab2 | 2015-01-01 09:00:00 | 27.5 | 1020 | 40 | 0.0 | 0.0 | 0.0 |
4 | hab2 | 2015-01-01 09:01:00 | 25.0 | 1019 | 40 | 0.0 | 0.0 | 0.0 |
5 | hab2 | 2015-01-01 09:02:00 | 26.2 | 1021 | 41 | 0.0 | 0.0 | 0.0 |
We can now see our 'ON CASCADE DELETE' in action. We'll delete flight 'hab2' from the 'flights' table:
connection.execute("DELETE FROM flights where name = 'hab2'")
pd.read_sql('flights', connection)
name | country_code | latitude | longitude | |
---|---|---|---|---|
0 | hab1 | GB | 0.0 | 0.0 |
We should see that the readings for hab2 were also deleted:
pd.read_sql('readings', connection)
flight | ts | temp | pressure | humidity | accel_x | accel_y | accel_z | |
---|---|---|---|---|---|---|---|---|
0 | hab1 | 2015-01-01 09:00:00 | 25.5 | 1020 | 40 | 0.0 | 0.0 | 0.0 |
1 | hab1 | 2015-01-01 09:01:00 | 25.5 | 1019 | 40 | 0.0 | 0.0 | 0.0 |
2 | hab1 | 2015-01-01 09:02:00 | 25.5 | 1019 | 41 | 0.0 | 0.0 | 0.0 |