Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
In this article, we’ll analyze the differences between PostgreSQL and MySQL, the benefits they offer, and the problems they can introduce.
Data is all around us: events, dates, thoughts, ideas, and so on (plus the awesome articles we publish on our blog). Nowadays, more and more aspects of our lives have been digitized: the things we buy and the things we enjoy can be quantified, accounted for, and stored in a database. Therefore, databases — vessels for storing these precious assets — are crucial; managing and working with databases is an essential skill for many developers.
The world of relational (i.e. SQL-based) databases is rich with various database solutions. In this article, however, we’ll focus on two of them: PostgreSQL and MySQL. Although the topic of this article follows the “Technology A vs. Technology B” structure, we’ll try and stay objective. Without declaring that either PostgreSQL or MySQL reign supreme, we’ll instead analyze their differences, benefits they offer, and the problems they can introduce. After this article, do make sure to check our SQL interview questions out! 🙂
Craig Kerstiens, head of the Cloud team at a Microsoft-owned company, provides a great write-up on the reasons why SQL is such an essential skill for almost every developer. The article is titled SQL: One of the Most Valuable Skills and details the language’s importance in three key points:
Database architecture is always easy to criticize because companies are usually reluctant to change it. Just like many websites are still running outdated technologies like JQuery, many companies are still using outdated MySQL versions. This problem has been highlighted by an SQL expert Markus Winand who often starts his conference speeches with a (somewhat) humorous question: If you’re still using SQL-92, why aren’t you using Windows 3.1? It was released in 1992 as well!
Our experience tells us that this problem of “database conservativeness” is rooted in the costs. Switching to a different database system (or upgrading to a new version, for that matter) is too costly, so business/users are stuck with older software. Of course, this is a more typical scenario for businesses rather than individual users — the IT departments often cannot explain the benefits of investing X thousand dollars and Y man-hours into upgrading, and so all in-house developers of this company are stuck with, say, MySQL 3.0.
In this environment, it’s easy to miss all the changes that happened to MySQL over the years. A good example is the check
constraints which are ignored in the create table
statement. MySQL has always been criticized for this odd behavior — after all, other relational databases (Db2, Oracle DB, PostgreSQL, SQLite, etc.) don’t have this problem. Here’s the good news: MySQL fixed it! However, it only happened in April 2019, so it’s easy for most developers to end up out of the loop.
Upon acquisition by Oracle, MySQL has improved tremendously. The positive effects of this acquisition is outlined by Markus Winand in his blog post titled One Giant Leap For SQL: MySQL 8.0 Released:
A major new feature of modern MySQL is window functions. Its functionality is crucial because it allows you, among other things, to find the best N rows per group, move averages, and group consecutive events. While other databases embraced window functions a long time ago, MySQL only implemented it in version 8 (released in 2018).
But wait — MySQL’s implementation actually turned out to be the best one! Markus provides a neat visualization of how relational databases adopted these features:
Although MySQL is arguably the most popular database (at least out of free databases), its functionality has been lagging behind. The improvements made by Oracle show that the company is determined to prove that MySQL’s popularity is earned.
The same logic we outlined in the previous section applies to PostgreSQL: 2019 saw a release of PostgreSQL 11. The first feature we should focus on is the over
clause, which determines the rows’ visibility to the given window function. PostgreSQL was the first database to support this clause; in version 11, it added support for the frame unit group
and frame exclusion. As of October 2019, PostgreSQL’s implementation remains the most up-to-date.
Speaking of frame exclusion, let’s see what this feature can offer. We can use it to delete selected rows from the given frame; the frames we delete must be related to the current row. This function provides several options to fine-tune its behavior:
EXCLUDE CURRENT ROW
excludes the row from the frame.EXCLUDE GROUP
removes the current row along with all its peers from the frame.EXCLUDE TIES
removes the peers of the current row, but not the current row itself.The database’s performance is arguably the most crucial factor — as explained in the “Database as a Bottleneck” section below, inefficient queries can stall the performance of the whole project. A simpler answer would be “PostgreSQL performs better, what’s there to discuss?” This would be an oversimplification — and it can be easily countered by the fact that Facebook uses MySQL (albeit a modified version of it called MyRocks) to handle petabytes of data.
To sum it up, the database’s performance doesn’t really depend on the specific database itself — both PostgreSQL and MySQL can handle “heavy data” well. It all comes down to the rather simple optimization practices — be ready to invest some time into implementing them:
In the IT world, one of the key metrics we like to use is performance — how fast does the software run? How well is it optimized? How can it be optimized even further? The drive to make the software perform well often turns into an unhealthy obsession, forcing the developers to compare various technologies solely on the basis of its performance.
Therefore, when discussing a topic like “Python vs. Java”, people ignore all other aspects that make the given programming language worth learning or using. “Python is slow! It’s a subpar choice for web development! Or any computation-heavy development, for that matter! Use C!”
If we take web development as an example, the not-so-obvious truth is this: Python is indeed not the fastest language, but the real bottleneck frequently lies in the database. This is why we often see the following scenario in numerous tech companies: instead of changing, say, their main programming language, they invest lots and lots of resources into switching to a different database. Let’s use Uber as an example: they made the switch in 2016 and explained it in their blog post titled Why Uber Engineering Switched from Postgres to MySQL.
When an article’s called “Technology A vs. Technology B”, it’s usually expected that the authors provide a definitive answer. Is Technology A ultimately better? Does Technology B actually reign supreme? In case of PostgreSQL and MySQL, there is no clear winner — before MySQL 8, we’d probably choose PostgreSQL for its rich functionality. Nowadays, however, MySQL is starting to catch up and even outpace other relational databases in some regards.
Another question you may be asking is “Which one should I learn?” To answer it, let’s look at these databases in perspective:
It’s tempting to try to guess which database will be better in, say, 5 years, but stating “Learn Database B — it’ll outpace Database A in just two years” would simply be fortune-telling. Our advice, therefore, is as follows: pick any of the two, learn it, and have no regrets.