Understanding Database Concurrency Problems with Simple Examples
See how everyday transactions in the real world illustrate the critical database problems.
Imagine a busy library where multiple librarians are updating the same catalog system simultaneously. Without proper coordination, chaos would ensue! This is exactly what happens in databases when multiple transactions run at the same time without proper controls.
In this article, we’ll explore common concurrency problems in database systems using simple, real-world examples that anyone can understand.
What is Concurrency in Databases?
Concurrency refers to multiple transactions accessing and modifying data simultaneously. While this improves efficiency, it can lead to inconsistencies if not properly managed.
Think of a database like a shared document that multiple people are editing at once. Without rules, people might overwrite each other’s changes or see incomplete information.
The Five Common Concurrency Problems
1. Dirty Read (Temporary Update Problem)
Real-world example: Imagine checking your bank account balance online while a transaction is being processed. Your balance shows $500, so you decide to make a $400 purchase. However, the original transaction fails and your actual balance is only $100. You’ve just experienced a dirty read!
-- Transaction 1: Transfer $400 to savings (might fail)
BEGIN;
UPDATE accounts SET balance = balance - 400 WHERE account_id = 'checking';
-- Transaction fails before completion
-- Transaction 2: Read the temporary, uncommitted value
SELECT balance FROM accounts WHERE account_id = 'checking';
-- Shows $500 (incorrect temporary value) instead of $900
2. Incorrect Summary Problem
Real-world example: A store manager runs a sales report while cashiers are still processing the day’s final transactions. The report shows $9,000 in sales, but actually counts some transactions before updates and others after, resulting in an incorrect total.
-- Transaction 1: Update sales records
BEGIN;
UPDATE sales SET amount = 100 WHERE id = 1;
-- Some delay
UPDATE sales SET amount = 200 WHERE id = 2;
COMMIT;
-- Transaction 2: Calculate total (runs during Transaction 1)
SELECT SUM(amount) FROM sales;
-- Might include the updated amount for id=1 but old amount for id=2
3. Lost Update Problem
Real-world example: Two customer service representatives update a customer’s address at the same time. Rep A changes it to “123 Main St”, while Rep B changes it to “456 Oak Ave”. If Rep A’s change completes after Rep B’s, the final address will be “123 Main St” and Rep B’s update is completely lost.
-- Transaction 1
BEGIN;
SELECT address FROM customers WHERE id = 100; -- Gets "789 Pine St"
-- Rep decides to update to "123 Main St"
-- Some delay happens here
UPDATE customers SET address = '123 Main St' WHERE id = 100;
COMMIT;
-- Transaction 2 (runs concurrently)
BEGIN;
SELECT address FROM customers WHERE id = 100; -- Also gets "789 Pine St"
-- Rep decides to update to "456 Oak Ave"
UPDATE customers SET address = '456 Oak Ave' WHERE id = 100;
COMMIT;
-- Final address will be "123 Main St" (Rep B's update is lost)
4. Unrepeatable Read Problem
Real-world example: You’re booking a flight and check the price ($300). When you decide to purchase 5 minutes later, the price has suddenly changed to $400. The same query returned different results within your booking session.
-- Transaction 1: Check flight price
BEGIN;
SELECT price FROM flights WHERE flight_id = 'FL789'; -- Shows $300
-- Some time passes
-- Meanwhile, another transaction updates the price
UPDATE flights SET price = 400 WHERE flight_id = 'FL789';
-- Back to Transaction 1: Check price again
SELECT price FROM flights WHERE flight_id = 'FL789'; -- Now shows $400
COMMIT;
5. Phantom Read Problem
Real-world example: A housing agent searches for available properties in a neighborhood and finds 10 listings. When preparing the client paperwork a minute later, the agent runs the same search and finds 12 listings because another agent just added new properties.
-- Transaction 1: Count available properties
BEGIN;
SELECT COUNT(*) FROM properties WHERE status = 'available'; -- Returns 10
-- Some time passes
-- Meanwhile, another transaction adds new properties
INSERT INTO properties (property_id, status) VALUES ('P101', 'available');
INSERT INTO properties (property_id, status) VALUES ('P102', 'available');
-- Back to Transaction 1: Count available properties again
SELECT COUNT(*) FROM properties WHERE status = 'available'; -- Now returns 12
COMMIT;
Solutions to Concurrency Problems
1. Transaction Isolation Levels
Databases provide different isolation levels to control how transactions interact:
-- Set isolation level for a transaction
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- Your transaction operations here
COMMIT;
The main isolation levels (from least to most strict):
- Read Uncommitted: Allows dirty reads (least isolation)
- Read Committed: Prevents dirty reads
- Repeatable Read: Prevents dirty and non-repeatable reads
- Serializable: Prevents all concurrency problems (most isolation)
2. Locking Mechanisms
Databases use locks to control access to data:
-- Explicit row lock example
BEGIN;
SELECT * FROM products WHERE product_id = 123 FOR UPDATE;
-- Now other transactions cannot update this row until this transaction completes
UPDATE products SET stock = stock - 1 WHERE product_id = 123;
COMMIT;
Types of locks:
- Shared (Read) Locks: Multiple transactions can read data simultaneously
- Exclusive (Write) Locks: Only one transaction can modify data at a time
3. Optimistic Concurrency Control
Instead of locking, we can check if data has changed before updating:
BEGIN;
SELECT price, version FROM products WHERE id = 456;
-- User decides to update price from $10 to $12
-- Check if data changed before our update
UPDATE products
SET price = 12, version = version + 1
WHERE id = 456 AND version = 1;
-- If 0 rows updated, someone else modified the data first
COMMIT;
Which Solution Should You Use?
The right approach depends on your specific needs:
- High throughput, can tolerate some inconsistency: Lower isolation levels
- Financial applications, accuracy critical: Serializable isolation
- Somewhere in between: Consider row-level locking or optimistic concurrency
Conclusion
Understanding concurrency problems is essential for building reliable database applications. By recognizing these issues and applying appropriate solutions, you can ensure your system maintains data consistency even under heavy concurrent usage.
Remember the golden rule: the stricter the concurrency control, the more consistent your data will be — but potentially with lower performance. Always balance these trade-offs based on your application’s specific requirements.
If you’re interested in diving deeper into system design and backend development, be sure to follow me for more insights, tips, and practical examples. Together, we can explore the intricacies of creating efficient systems, optimizing database performance, and mastering the tools that drive modern applications. Join me on this journey to enhance your skills and stay updated on the latest trends in the tech world! 🚀
Read the system design in bahasa on iniakunhuda.com