
# Understanding Database Normalization in PostgreSQL

Database normalization is a technique used to design a database schema such that it reduces redundancy and improves data integrity. The process typically involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. This blog post explores the core concepts of normalization in PostgreSQL with practical examples.

## What is Normalization?

Normalization involves organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependencies.

Normalization is typically done in several stages, each corresponding to a specific normal form (NF). The most commonly used normal forms are:

  • **First Normal Form (1NF):** Ensures that every column holds only atomic (indivisible) values and that each row/column intersection contains a single value (no repeating groups or multi-valued fields).

  • **Second Normal Form (2NF):** Requires 1NF and ensures that every non-key attribute depends on the whole primary key, not just part of a composite key (no partial dependencies).

  • **Third Normal Form (3NF):** Requires 2NF and ensures that non-key attributes depend only on the primary key, not on other non-key attributes (no transitive dependencies).

Let’s break these down with examples.

## Examples of Normalization

### 0. Unnormalized Table

Consider a table that records information about sales in a bookstore:

| SaleID | BookTitle        | Author    | Price | CustomerName |
|--------|------------------|-----------|-------|--------------|
| 1      | PostgreSQL Intro | A. Author | 20    | John Doe     |
| 2      | PostgreSQL Intro | A. Author | 20    | Jane Smith   |
| 3      | SQL Queries      | B. Writer | 25    | John Doe     |

This table is not normalized: the book's title, author, and price are repeated on every sale of that book, so a price change would have to be applied to many rows and could easily leave the data inconsistent.
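In PostgreSQL, the unnormalized design might be sketched as follows (table and column names here are illustrative, not taken from a real schema):

```sql
-- Unnormalized: book details are duplicated on every sale row
CREATE TABLE sales_unnormalized (
    sale_id       integer PRIMARY KEY,
    book_title    text,
    author        text,
    price         numeric(10, 2),
    customer_name text
);
```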

### 1. First Normal Form (1NF)

Objective: Eliminate duplicate columns from the same table. Create separate tables for each group of related data and identify each row with a unique column (the primary key).

Implementation:

Divide the table into two:

Books:

| BookID | Title            | Author    | Price |
|--------|------------------|-----------|-------|
| 1      | PostgreSQL Intro | A. Author | 20    |
| 2      | SQL Queries      | B. Writer | 25    |

Sales:

| SaleID | BookID | CustomerName |
|--------|--------|--------------|
| 1      | 1      | John Doe     |
| 2      | 1      | Jane Smith   |
| 3      | 2      | John Doe     |
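As PostgreSQL DDL, this split could look like the sketch below (names and types are illustrative). The foreign key makes the relationship between the two tables explicit:

```sql
-- Each book is stored once; sales reference it by key
CREATE TABLE books (
    book_id integer PRIMARY KEY,
    title   text NOT NULL,
    author  text NOT NULL,
    price   numeric(10, 2) NOT NULL
);

CREATE TABLE sales (
    sale_id       integer PRIMARY KEY,
    book_id       integer NOT NULL REFERENCES books (book_id),
    customer_name text NOT NULL
);
```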

### 2. Second Normal Form (2NF)

Objective: Ensure that all non-key attributes are fully functionally dependent on the entire primary key.

Since neither the **Books** nor the **Sales** table has a composite primary key, no attribute can depend on only part of a key. Both tables are therefore already in 2NF, and no further changes are needed.

### 3. Third Normal Form (3NF)

Objective: Remove columns that depend on other non-key columns rather than directly on the primary key (transitive dependencies).

Suppose in our **Books** table, the `Price` is always determined by the Author. To move to 3NF, we need to separate this dependency into another table.

Authors:

| AuthorID | Author    | StandardPrice |
|----------|-----------|---------------|
| 1        | A. Author | 20            |
| 2        | B. Writer | 25            |

Books (revised):

| BookID | Title            | AuthorID |
|--------|------------------|----------|
| 1      | PostgreSQL Intro | 1        |
| 2      | SQL Queries      | 2        |

This design ensures that changes in the price of books by a specific author require updates in just one place, thereby enhancing data integrity.
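A sketch of the 3NF schema in PostgreSQL DDL, under the same assumption that price is determined by the author (the table and column names are illustrative). The final query shows the trade-off mentioned below: retrieving a book's price now requires a join:

```sql
-- Price lives only in authors; books refer to it via author_id
CREATE TABLE authors (
    author_id      integer PRIMARY KEY,
    author         text NOT NULL,
    standard_price numeric(10, 2) NOT NULL
);

CREATE TABLE books_3nf (
    book_id   integer PRIMARY KEY,
    title     text NOT NULL,
    author_id integer NOT NULL REFERENCES authors (author_id)
);

-- Recovering a book's price now requires a join
SELECT b.title, a.standard_price AS price
FROM books_3nf AS b
JOIN authors AS a USING (author_id);
```

Updating an author's standard price is now a single-row `UPDATE` on `authors`, instead of touching every matching row in the books table.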

## Conclusion

Normalization is a powerful technique for structuring a database efficiently. By following the rules of each normal form, you can minimize redundancy and maximize data integrity in your PostgreSQL database. Remember, normalization often comes at the expense of more complex queries and sometimes slower response times due to the need to join tables. Therefore, understanding the trade-offs between normalization and performance is crucial, especially when designing large-scale databases.