Kedare's Technical Blog

Exploring the latest in backend development, DevOps and more.

Introduction

In this article we are going to see how to use the Keycloak IdP as an identity broker.

Keycloak is one of the major open source identity providers.

It is used by many large institutions; for example, France Connect, the official French identity system for public services, is built on top of it.

Initial setup

We are going to start from a simple Angular SPA template with an ASP.NET API (the React template would also work, as we are not going to change any frontend code).

Start by creating your application (skip this if you already have one you want to use):

dotnet new angular

It should have most of the code needed to work on this project.

You can already check that everything works by running it and making sure that you can access the local website, create a local user, log in, and browse protected pages (the Fetch Data one, for example).

Once you have made sure everything works fine, we can start with the Keycloak setup (if you have one already, skip this step).

Keycloak initial setup

You will need to download Keycloak from the official website (it can also be run with Docker, but I did not test that) and run it directly.

wget https://github.com/keycloak/keycloak/releases/download/22.0.1/keycloak-22.0.1.tar.gz
tar xvf keycloak-22.0.1.tar.gz
cd keycloak-22.0.1
./bin/kc.sh start-dev

This is not a production setup; to make your Keycloak usable in production, please follow the official documentation.

You can now connect to the local interface and initialize it by browsing to the Keycloak root page.

You will be asked to configure an initial user on the master realm, which is the administrative realm that allows you to configure Keycloak and all the other realms.

Once you have created your initial user and connected to the administration interface, you will need to create a new realm for your applications.

Let's go to the expanding menu on the top left and select Create Realm to create a new one (external, for example). You should now see your 2 realms in the menu; make sure you select the external one.

Realms list

Make sure you have at least 1 user you can use for this article: check the Users tab on the left and create one if you don't have any (just set a username and email, then after saving, set its password in the Credentials tab).

OIDC configuration

Now we are going to configure the OpenID Connect protocol between our ASP.NET application and our Keycloak setup.

The good thing with OIDC is that it provides more than OAuth2 out of the box, so you will have less to configure than with plain OAuth2 (getting the user profile, for example).
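As a concrete illustration of what OIDC adds: everything starts from a single discovery document that the provider publishes at a well-known path, from which client libraries resolve every endpoint automatically. A minimal sketch (the realm URL matches the one we configure later; the helper name is mine):

```python
def discovery_url(issuer):
    """Build the standard OIDC discovery document URL for an issuer."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"

# For the Keycloak realm created in this article, the ASP.NET middleware
# will fetch this document and read the authorization, token and userinfo
# endpoints from it, so we never configure them by hand:
print(discovery_url("http://localhost:8080/realms/external"))
# http://localhost:8080/realms/external/.well-known/openid-configuration
```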

First thing, we are going to create the client on the Keycloak server.

The client is not the user but the application that will request the authentication and receive the final token from the provider; in OIDC terminology, the user is usually called the Resource Owner.

To create the client, go to Clients in the left menu, then Create client in the client list (you already have a few clients used internally by Keycloak and the various internal apps; you can ignore them).

Make sure the Client type is set to OpenID Connect, then you can fill the form with your data, for example:

Form Field Form value
Client ID my-aspnet-app
Name My ASP.NET App

Leave the rest at the defaults, then click Next.

Here you get the more important fields to configure:

Form Field Form value
Client Authentication On
Authorization On
Authentication flow Standard flow

You may need more authentication flows enabled depending on the use case, but for what is covered by this article, this is the only flow needed.
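For reference, the Standard flow is OIDC's authorization code flow: the client first redirects the browser to Keycloak's authorization endpoint with a handful of query parameters. A rough sketch of that first redirect (the endpoint path follows Keycloak's usual /realms/&lt;realm&gt;/protocol/openid-connect/auth layout; the helper and parameter values are illustrative):

```python
from urllib.parse import urlencode

def authorization_request(authorize_endpoint, client_id, redirect_uri, state):
    """Build the browser redirect that starts the authorization code flow."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",  # "Standard flow" means the code flow
        "scope": "openid profile email",
        "state": state,  # anti-CSRF value the client checks on return
    }
    return authorize_endpoint + "?" + urlencode(params)

url = authorization_request(
    "http://localhost:8080/realms/external/protocol/openid-connect/auth",
    "my-aspnet-app",
    "https://localhost:44418/.well-known/openid-callback",
    "xyz123",
)
print(url)
```

The ASP.NET OpenID Connect middleware builds this URL for us; the sketch only shows what travels over the wire.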

Then, on the last configuration page, you have a few URLs to configure.

Here we assume the root URL of our ASP.NET application is https://localhost:44418/

Form Field Form value
Root URL https://localhost:44418/
Home URL https://localhost:44418/
Valid redirect URIs https://localhost:44418/.well-known/openid-callback
Web origins https://localhost:44418/*

We are using the https://localhost:44418/.well-known/openid-callback URL instead of, for example, https://localhost:44418/oidc/callback, as I could not find an easy way to configure the ASP.NET SPA proxy to forward this URL to the backend rather than the frontend (if anyone has an idea, let me know).
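Note that Keycloak validates the redirect_uri sent by the client against the registered Valid redirect URIs, either exactly or via a trailing * wildcard. A toy matcher to show the idea (this is an illustration, not Keycloak's actual implementation):

```python
from fnmatch import fnmatch

def redirect_allowed(redirect_uri, registered_patterns):
    """Toy version of redirect URI validation: exact match, or glob-style
    '*' wildcard as Keycloak permits in Valid redirect URIs."""
    return any(fnmatch(redirect_uri, p) for p in registered_patterns)

registered = ["https://localhost:44418/.well-known/openid-callback"]
print(redirect_allowed("https://localhost:44418/.well-known/openid-callback", registered))  # True
print(redirect_allowed("https://localhost:44418/evil", registered))  # False
```

This is why the CallbackPath configured on the ASP.NET side has to match the URI registered here exactly.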

Then click Save

The Keycloak configuration is now done; you just need to get the client secret from the Credentials tab, as you will use it in the ASP.NET configuration next.

ASP.NET Identity configuration

Now let's go back to our application code that was working fine.

We need to update the Program.cs file that contains the service configuration.

Locate the part of the code related to your identity configuration and make sure it matches the configuration required for Keycloak; you normally only have to add the AddOpenIdConnect part.

builder.Services.AddAuthentication()
    .AddIdentityServerJwt()
    .AddOpenIdConnect("keycloak", "Keycloak", options =>
    {
        options.CallbackPath = "/.well-known/openid-callback";
        options.Authority = "http://localhost:8080/realms/external"; // Use https in production
        options.ClientId = "my-aspnet-app";
        options.ClientSecret = "Jmqezm95ZG3JJCVRxypZBafst5VjSivK";
        options.ResponseType = "code";
        options.GetClaimsFromUserInfoEndpoint = true;
        options.RequireHttpsMetadata = false; // Don't use this for production
    });

This is all you need to do. Now compile the application, run it, and you should see a new option on the login page:

ASP.NET identity login page with Keycloak option

You can click on the Keycloak button to initiate the login with the user you previously created.

Once you are logged in, you should be redirected back to the ASP.NET application. You may have to confirm your email before the account is created; once everything has been validated, you will see your email in the top menu bar on the right.

You can now test the authentication by accessing a page that calls an authenticated endpoint, like the "Fetch Data" tab of the app.

There's more

This is all we will cover in this article, as an introduction to using Keycloak with ASP.NET Identity.

Do not use this Keycloak setup in production, as it is neither reliable nor secure enough for that :)

There are many more features available in both Keycloak and the OpenID Connect protocol if you are curious.

One advantage of Keycloak is that you can use it as an identity broker, meaning you can configure Keycloak to let users connect using external identity providers.

This is useful if, for example, you want to keep your application code simple, or if you have many different applications and don't want to integrate each of them with all the external providers.

You can, for example, integrate Keycloak with "Sign in with Apple" to provide seamless SSO, allowing all your applications (both web and native apps) to take advantage of it via Keycloak.

This also allows you to have multiple social networks associated with a single Keycloak account that your applications will use. If you have more apps to integrate in the future, just connect them to Keycloak and they will automatically get access to all the identity providers connected to it (and likewise if you want to add a new identity provider to all your existing applications).

I recently decided to get a new laptop, as my 2013 MacBook Pro was starting to fail on all sides (broken speaker, intermittently working screen backlight, getting really slow). I wanted the following features in my new laptop:

  • Good battery life: Because it's a laptop.
  • Touch screen: Because for some workflows it's much better than the mouse; having a 2-in-1 is also a big plus, but optional.
  • Small-ish screen with a good resolution: I don't like bulky laptops, but I want a good screen on it (read: high DPI)
  • Robust hardware: Because I want it to last for a few years.
  • Developer friendly: Because I'm mostly using it for development and SRE stuff.

These are the tasks I want to be able to do on my laptop:

  • Web browsing
  • Media playback (Netflix, Youtube, Spotify, etc...)
  • Web / Go / Python development
  • SRE tools (Remote SSH, kubectl, etc..)
  • SDR (Software Defined Radio)

I considered the following options:

  • Getting a new MacBook: I don't like their unreliable keyboards, and I didn't really like the latest versions of macOS (much less reliable and more buggy than a few years ago); they are also quite expensive, and this would mean no touch screen.
  • Getting a new ThinkPad: That was the best other option: a first-class Linux citizen with OK-to-good battery life (Windows is still much better optimized than Linux there, thanks to better hardware support), but I didn't really find anything good within my budget.
  • Getting an old ThinkPad: I really liked the X230 in the past, but today it's quite hard to find good batteries for them, the display is really bad by today's standards, and the trackpad is really obsolete.
  • Getting a Chromebook: This was the last option. I was not completely sure at the beginning because I didn't know much about Chromebooks and ChromeOS, and there is the cliché that Chromebooks can only run web apps and nothing local.

So I started looking at Chromebooks, and the best option I found was the Asus Chromebook C434. I wanted at least 8GB of memory, a good and robust design, and a backlit keyboard, so this one was the perfect match.

Asus C434

So I decided to order it from the US, as it's a mess in France to get a QWERTY keyboard. I was very surprised by the build quality once I received it; it feels very solid (even more so than recent MacBooks).

ChromeOS

ChromeOS Architecture

ChromeOS itself is a Linux distribution, but you will not get access to the root account or even to a full shell (unless you switch to developer mode, but that basically disables most of the security systems, and that's not what we want).

You basically have 3 ways of running applications on ChromeOS:

  • ChromeOS web applications: you install them directly in ChromeOS; these are basically SPAs or Chrome extensions (with a few extra capabilities when running in ChromeOS)
  • Android applications: you can install any Android application on your ChromeOS; I use this for Google Keep or Spotify, for example.
  • Linux applications: you have a Linux VM available through Crostini that gives you full access to install whatever you want (this is the one I cover below)

As ChromeOS's top priority is security, each of those subsystems is sandboxed from the others; this is even more true for the Linux mode, as you get 2 levels of isolation (LXD and KVM).

In this article I will talk about the third one, as it is the interesting one in our case.

First, let's quote a short text from the Google documentation:

Crostini is the umbrella term for making Linux application support easy to use and integrating well with Chrome OS. It largely focuses on getting you a Terminal with a container with easy access to installing whatever developer-focused tools you might want. It's the default first-party experience.

The Terminal app is the first entry point to that environment. It‘s basically just crosh. It takes care of kicking off everything else in the system that you’ll interact with.

crosvm is a custom virtual machine monitor that takes care of managing KVM, the guest VM, and facilitating the low-level (virtio-based) communication.

Termina is a VM image with a stripped-down Chrome OS linux kernel and userland tools. Its only goal is to boot up as quickly as possible and start running containers. Many of the programs/tools are custom here. In hindsight, we might not have named it one letter off from “Terminal”, but so it goes.

Maitred is our init and service/container manager inside of the VM, and is responsible for communicating with concierge (which runs outside of the VM). Concierge sends it requests and Maitred is responsible for carrying those out.

Garcon runs inside the container and provides integration with Concierge/Chrome for more convenient/natural behavior. For example, if the container wants to open a URL, Garcon takes care of plumbing that request back out.

Sommelier is a Wayland proxy compositor that runs inside the container. Sommelier provides seamless forwarding of contents, input events, clipboard data, etc... between applications inside the container and Chrome. Chrome does not run an X server or otherwise support the X protocol; it only supports Wayland clients. So Sommelier is also responsible for translating the X protocol inside the container into the Wayland protocol that Chrome can understand.

Source: https://chromium.googlesource.com/chromiumos/docs/+/8c8ac04aed5d45bb6a14605c422dbbd01eeadf15/containers_and_vms.md

So we already have a lot of information. And you know what? As all of this is part of the Chromium project, everything is open source and available for us to look at: https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/vm_tools

When you enable Linux mode (called Crostini) in the ChromeOS settings, ChromeOS will download a Linux image and start it with KVM; this container host is called Termina.

You also get a Terminal application installed, though I would not recommend using it much, as it's quite limited. You will quickly see that the installed container is in fact a Debian (9 at the time of writing, but it's easy to upgrade with the usual procedure), and of course you get full root access.

You can install graphical applications like Visual Studio Code or another terminal application without issue (I use gnome-terminal in my case), and as Google does things correctly, you even have a GTK theme matching Material Design installed by default.

However, be careful when using heavy GUI applications: GPU acceleration is still in beta (like the whole of Crostini) and not available everywhere yet.

From here you have your usual Linux shell and can set up everything you want while still having great battery life.

So now let's say you want something other than Debian: as this is standard LXD, you can use any image, coming for example from https://us.images.linuxcontainers.org/

To do so, you need to start crosh, the native ChromeOS limited shell: open Google Chrome and press Ctrl+Alt+T, and you will get a terminal tab inside your web browser:

[Pro Tip] Use 'Open as Window' or 'Fullscreen' to prevent Ctrl-W from closing your terminal!
[Pro Tip] See [crosh] for more information.

Welcome to crosh, the Chrome OS developer shell.

If you got here by mistake, don't panic!  Just close this tab and carry on.

Type 'help' for a list of commands.

If you want to customize the look/behavior, you can use the options page.
Load it by using the Ctrl+Shift+P keyboard shortcut.

crosh> 

From there, we need to jump inside the Termina VM, which is the LXD container host, and we can already start using the standard lxc CLI to check our default container (the one you get with the Terminal app by default):

crosh> vsh termina
(termina) chronos@localhost ~ $ lxc list
+---------|---------|-----------------------|------|------------|-----------+
|  NAME   |  STATE  |         IPV4          | IPV6 |    TYPE    | SNAPSHOTS |
+---------|---------|-----------------------|------|------------|-----------+
| penguin | RUNNING | 100.115.92.195 (eth0) |      | PERSISTENT | 0         |
+---------|---------|-----------------------|------|------------|-----------+

One thing you may want is to interact with LXD without having to start crosh and then jump to termina. Good thing: LXD allows remote control, so let's set it up so we can control it from our penguin container. We will need to spawn a temporary Ubuntu instance to get a compatible LXC client:

(termina) chronos@localhost ~ $ lxc config set core.https_address :8443
(termina) chronos@localhost ~ $ lxc config set core.trust_password wowsupersecret
(termina) chronos@localhost ~ $ lxc launch images:ubuntu/18.04 ubuntu
(termina) chronos@localhost ~ $ lxc exec ubuntu -- apt install -y lxd-client
(termina) chronos@localhost ~ $ lxc file pull ubuntu/usr/bin/lxc /tmp/lxc
(termina) chronos@localhost ~ $ lxc file push /tmp/lxc penguin/usr/local/bin/lxc
(termina) chronos@localhost ~ $ lxc stop --force ubuntu
(termina) chronos@localhost ~ $ lxc delete ubuntu

Then on your penguin container:

> $ ip -4 route show
default via 100.115.92.193 dev eth0 
100.115.92.192/28 dev eth0 proto kernel scope link src 100.115.92.195
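The default via address in that output is the Termina host, which is the address we need for the next command. If you want to script this, here is a small sketch to extract it from the ip -4 route output (the helper name is mine):

```python
def default_gateway(route_output):
    """Extract the gateway of the 'default' route from `ip -4 route show`
    output (assumes the standard iproute2 text format)."""
    for line in route_output.splitlines():
        parts = line.split()
        if parts[:2] == ["default", "via"]:
            return parts[2]
    return None

routes = """default via 100.115.92.193 dev eth0
100.115.92.192/28 dev eth0 proto kernel scope link src 100.115.92.195"""
print(default_gateway(routes))  # 100.115.92.193
```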

> $ lxc remote add crostini 100.115.92.193
Admin password for crostini:
Client certificate stored at server:  crostini
> $ lxc remote set-default crostini
> $ lxc list                                                                                                                                      
+---------|---------|-----------------------|------|------------|-----------+                                                                     
|  NAME   |  STATE  |         IPV4          | IPV6 |    TYPE    | SNAPSHOTS |                                                                     
+---------|---------|-----------------------|------|------------|-----------+                                                                     
| penguin | RUNNING | 100.115.92.195 (eth0) |      | PERSISTENT | 0         |                                                                     
+---------|---------|-----------------------|------|------------|-----------+

From there, you should be able to control everything from your penguin container.

So far I haven't needed any container other than the basic one (I do my full development workflow inside penguin), but I understand that sometimes you may want something other than Debian.

Be aware that by default you won't get the same level of integration with other containers, such as automatic integration of desktop apps (they appear in ChromeOS once installed, as if it were a GNOME desktop) and shared folders between ChromeOS and your container; most of those features become available on other containers after some configuration.

So far everything is working fine. I do see occasional issues, typically with JetBrains IDEs: I get some random freezes of the whole ChromeOS, which should not happen, and I reported the issue to the Chromium OS issue tracker.

Also, as you know, Chromebooks are fanless and low-power computers, so you should expect thermal throttling during intense compute (so if you plan to use Rust or Java... you'll understand).

The fact that, for now, you can't expose sockets directly to the outside world (only to the ChromeOS host) can be problematic.

The USB limitation also makes it impossible to use ChromeOS for SDR (Software Defined Radio), as the container will not see connected USB devices. (EDIT: see the end of the article; there is a flag that allows exposing USB devices to the Crostini containers, but I could not make it work with everything.)

Other than that, everything works perfectly fine. Could ChromeOS finally be the reason Linux arrives on the desktop? Having this separate environment also makes it mostly impossible to crash your whole computer because developers like to tweak everything (and you get snapshots for free on your containers, by the way).

So, in summary, about ChromeOS for developers/SRE:

  • The design is secure by default; that's a good thing, as it's very hard to break into (don't enable developer mode), but it can be limiting for some specific cases (see below)
  • Not having raw USB access can be an issue: no way to connect specific external devices like USB-to-serial adapters or SDR systems (EDIT: there is a flag (#crostini-usb-allow-unsupported) that allows connecting any USB device to your Linux environment. After testing it, I can confirm I could use my HackRF SDR in the Linux environment like on native Linux, but no luck with my USB-to-serial adapter so far (PL2303 chipset))
  • Your containers are isolated from the network: you don't have bridged networking to your interfaces and can't expose sockets to the outside
  • Being able to run Android applications is cool
  • Battery life is excellent (10 hours)
  • Linux GUI applications work fine, but be careful, as there is no GPU acceleration for them

But let's remember that Crostini is still in beta, so hopefully most of those limitations will be fixed (in a secure way).

I hope you liked this article; feel free to comment, and hopefully I will post more about ChromeOS in the future.


Here is one of the most complicated and debated choices in the open source world: selecting your relational database system.

I will speak here only of MySQL and PostgreSQL, as they are the main open source database systems.

Raw Data

Let's start with some raw data before going into details (and let's include SQL Server too, for comparison):

Information                  MySQL                                  PostgreSQL                                       Microsoft SQL Server
Licence                      GNU GPL or proprietary                 PostgreSQL Licence                               Proprietary
Year of creation             1995                                   1996 (based on Ingres, 1982)                     1989
Main contributors            Oracle, Google, Percona, Facebook      EnterpriseDB, 2ndQuadrant                        Microsoft
Projects/Companies using it  Facebook, Github, LinkedIn, Flickr,    Openstreetmap, Disqus, Yahoo, Reddit,            Microsoft, StackOverflow
                             Wikipedia, Twitter, Digg               Skype, Github
Slogan                       The most popular open source database  The world's most advanced open source database   None?
Last version                 5.7                                    9.6                                              2016

Take into consideration that the projects and companies using them are not exclusive; for example, LinkedIn also uses Apache Cassandra, and probably more :)

Clichés and misconceptions

http://www.myfrenchlife.org/2013/11/28/guilty-french-cliches-stereotypes/

Credits to myfrenchlife.org

Something to know is that both MySQL and PostgreSQL have their clichés; I will try to explain them and tell you whether they are true or not.

PostgreSQL

PostgreSQL is slow !

WRONG !

This was true in early versions of PostgreSQL and before, but since PostgreSQL 8.0 there have been a LOT of improvements in terms of speed. PostgreSQL now has performance similar to MySQL's.

PostgreSQL is complicated !

WRONG !

This is maybe one of the things you will see most in forums (especially from MySQL fanboys), but PostgreSQL is not more complicated than MySQL. What usually scares people is that you will not have all those nonstandard SHOW commands that make your life easier. In PostgreSQL, we prefer to use standard SQL queries on tables or views to get the equivalent, and there are also meta commands available in the psql shell. For many everyday tasks, PostgreSQL is actually far easier than MySQL (excluding specific things like advanced replication with failover, or PITR). Here is a simple example: want to know the size of a database?

postgres=# \l+
	                                                                    List of databases
	           Name           |  Owner  | Encoding | Collate | Ctype | Access privileges |  Size   | Tablespace |                Description
	--------------------------+---------+----------+---------+-------+-------------------+---------+------------+--------------------------------------------
	 django                   | django  | UTF8     | C       | C     |                   | 9513 kB | pg_default |
	 postgres                 | kedare  | UTF8     | C       | C     |                   | 7151 kB | pg_default | default administrative connection database
	 spring                   | spring  | UTF8     | C       | C     |                   | 7065 kB | pg_default |
	 template0                | kedare  | UTF8     | C       | C     | =c/kedare        +| 7041 kB | pg_default | unmodifiable empty database
	                          |         |          |         |       | kedare=CTc/kedare |         |            |
	 template1                | kedare  | UTF8     | C       | C     | =c/kedare        +| 7041 kB | pg_default | default template for new databases
	                          |         |          |         |       | kedare=CTc/kedare |         |            |
	(7 rows)

vs MySQL:

mysql> SELECT table_schema "Data Base Name",
    -> sum( data_length + index_length ) / 1024 / 1024 "Data Base Size in MB",
    -> sum( data_free )/ 1024 / 1024 "Free Space in MB"
    -> FROM information_schema.TABLES
    -> GROUP BY table_schema ;
+--------------------+----------------------+------------------+
| Data Base Name     | Data Base Size in MB | Free Space in MB |
+--------------------+----------------------+------------------+
| information_schema |           0.15625000 |      80.00000000 |
| mysql              |           2.42471313 |       4.00000000 |
| performance_schema |           0.00000000 |       0.00000000 |
| sys                |           0.01562500 |       0.00000000 |
+--------------------+----------------------+------------------+
4 rows in set (0.07 sec)

This is, of course, without installing any tools other than the bundled ones. You can do a similar query with PostgreSQL; in fact, that is what this meta command does in the background. You can see it in the Postgres process output if, for example, you run it in a terminal with the -d3 option (which is what I do for development):

SELECT d.datname as "Name",
       pg_catalog.pg_get_userbyid(d.datdba) as "Owner",
       pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding",
       d.datcollate as "Collate",
       d.datctype as "Ctype",
       pg_catalog.array_to_string(d.datacl, E'\n') AS "Access privileges",
       CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
            THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
            ELSE 'No Access'
       END as "Size",
       t.spcname as "Tablespace",
       pg_catalog.shobj_description(d.oid, 'pg_database') as "Description"
FROM pg_catalog.pg_database d
  JOIN pg_catalog.pg_tablespace t on d.dattablespace = t.oid
ORDER BY 1;

Another example: want to know which configuration file the server process is using? PostgreSQL:

postgres=# SHOW config_file;
                         config_file
--------------------------------------------------------------
 /Users/kedare/Documents/Databases/Postgresql/postgresql.conf
(1 row)

MySQL? Well... you know what? There is no way to know which configuration file is loaded by your server instance. If you are playful, you can restart the process with strace attached; you can also get a list of the default configuration files that would be loaded by running mysqld --verbose --help, but nothing about the actually loaded file.

PostgreSQL sucks at scaling out

Kind of

Well... this is, I think, the worst point about PostgreSQL.

If you want simple master-slave replication, it can be done in a few minutes (thanks, pg_basebackup -R). If you want something similar to SQL Server Database Mirroring, meaning master-slave replication with automatic failover and master election... well, good luck :)

There are some ways to do it. I had the occasion to try to set this up using PostgreSQL 9.6, PGPool2 and repmgr, and it was hell; after many days trying to get it working, I gave up.

There are other solutions, requiring more knowledge, that I didn't try, like Stolon, which makes this kind of setup more or less automatic but requires knowledge of Kubernetes. In my case, the deployment was for production systems and I don't know a lot about Kubernetes, so that was a big no (but Stolon itself looks great).

So for this point I would say: it really depends on what you are looking for and on the time you want to spend setting it up.

MySQL

MySQL is not ACID compliant

WRONG!

This was true some 15 years ago, in the MyISAM era. Today, everyone should use InnoDB, or any equivalent storage engine that is completely ACID compliant.
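As a reminder of what ACID buys you, here is atomicity in action, using SQLite purely to illustrate the property an ACID engine like InnoDB guarantees:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.commit()

try:
    with conn:  # transaction: commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 150 "
                     "WHERE name = 'alice'")
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # triggers the rollback
except ValueError:
    pass

(balance,) = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
print(balance)  # 100: the partial update left no trace
```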

MySQL doesn't care about your data

WRONG!

MySQL cares about your data as much as you tell it to. You want it to take care of your data? Use STRICT mode and InnoDB. Fortunately, if you are using a recent version of MySQL, this should be the default.

MySQL doesn't have hot backups

WRONG!

Again, this is from the MyISAM era, but you will still see a lot of people who think MySQL cannot do a hot backup without locking. OF COURSE IT CAN, thanks to transactions and InnoDB.

Presenting themselves

Let's check some of the "official" presentations of both of them.

MySQL Community Edition

MySQL is a database management system.

A database is a structured collection of data. It may be anything from a simple shopping list to a picture gallery or the vast amounts of information in a corporate network. To add, access, and process data stored in a computer database, you need a database management system such as MySQL Server. Since computers are very good at handling large amounts of data, database management systems play a central role in computing, as standalone utilities, or as parts of other applications.

Obviously.

MySQL databases are relational.

A relational database stores data in separate tables rather than putting all the data in one big storeroom. The database structures are organized into physical files optimized for speed. The logical model, with objects such as databases, tables, views, rows, and columns, offers a flexible programming environment. You set up rules governing the relationships between different data fields, such as one-to-one, one-to-many, unique, required or optional, and “pointers” between different tables. The database enforces these rules, so that with a well-designed database, your application never sees inconsistent, duplicate, orphan, out-of-date, or missing data.

The SQL part of “MySQL” stands for “Structured Query Language”. SQL is the most common standardized language used to access databases. Depending on your programming environment, you might enter SQL directly (for example, to generate reports), embed SQL statements into code written in another language, or use a language-specific API that hides the SQL syntax.

SQL is defined by the ANSI/ISO SQL Standard. The SQL standard has been evolving since 1986 and several versions exist. In this manual, “SQL-92” refers to the standard released in 1992, “SQL:1999” refers to the standard released in 1999, and “SQL:2003” refers to the current version of the standard. We use the phrase “the SQL standard” to mean the current version of the SQL Standard at any time.

Yes, MySQL does use SQL, but it's not the most standard SQL you could find (PostgreSQL and MSSQL are both better on this point).

MySQL software is Open Source.

Open Source means that it is possible for anyone to use and modify the software. Anybody can download the MySQL software from the Internet and use it without paying anything. If you wish, you may study the source code and change it to suit your needs. The MySQL software uses the GPL (GNU General Public License), http://www.fsf.org/licenses/, to define what you may and may not do with the software in different situations. If you feel uncomfortable with the GPL or need to embed MySQL code into a commercial application, you can buy a commercially licensed version from us. See the MySQL Licensing Overview for more information (http://www.mysql.com/company/legal/licensing/).

Yes, it is open source, but not completely: MySQL has both the community edition and several proprietary editions that are closed source and paid. Oracle being the "owner" of MySQL (well, of the brand at least), some features are not included in the open source edition, for example a thread pool that increases performance, some backup tools (which have open source alternatives), or the MySQL Enterprise Firewall that allows blacklisting/whitelisting query patterns.

PostgreSQL

PostgreSQL is a powerful, open source object-relational database system. It has more than 15 years of active development and a proven architecture that has earned it a strong reputation for reliability, data integrity, and correctness. It runs on all major operating systems, including Linux, UNIX (AIX, BSD, HP-UX, SGI IRIX, Mac OS X, Solaris, Tru64), and Windows. It is fully ACID compliant, has full support for foreign keys, joins, views, triggers, and stored procedures (in multiple languages). It includes most SQL:2008 data types, including INTEGER, NUMERIC, BOOLEAN, CHAR, VARCHAR, DATE, INTERVAL, and TIMESTAMP. It also supports storage of binary large objects, including pictures, sounds, or video. It has native programming interfaces for C/C++, Java, .Net, Perl, Python, Ruby, Tcl, ODBC, among others, and exceptional documentation.

We will talk later about the "object-relational database system" thing.

Technical overview

Let's get more technical.

Here are the questions you need to ask yourself when you have to choose between these database systems.

Do you need complex data types? (Object database or GIS)

This is one of the big strengths of PostgreSQL: it's not just an RDBMS that thinks in tables and simple data types, it's an ORDBMS with advanced types. Here are some examples:

Network related types

This may be useful to you if you come from the telecom world: PostgreSQL natively supports IPv4 and IPv6 addresses, subnets, and MAC addresses, and thanks to the object-oriented type system, it has built-in operators for them that let you do this kind of thing:

postgres=# SELECT '2001:4f8:3:ba::32'::inet <<= '2001:4f8:3:ba::/64'::inet;
 ?column?
----------
 t
(1 row)

postgres=# SELECT '192.168.1.42'::inet <<= '192.168.1.0/24'::inet;
 ?column?
----------
 t
(1 row)

Let's explain those queries.

::inet: the :: tells PostgreSQL to cast the value (character varying) to the inet type, which represents either an IP address or a subnet. As everything is open in PostgreSQL, I can, for example, open PgAdmin and explore the pg_catalog catalog (catalogs are like namespaces, more or less like a database inside the database).

When checking the catalog, I can see the type definitions and the operators (as the inet type is an internal one, there is not much to see, since it's basically coded in C in the PostgreSQL core), but here are some examples of what you can do.

Let's say I want to add a < operator for the point data type, plus an = operator that calls the internal point_eq(point, point) function.

First, I create the = operator (backed by the built-in point_eq function), then the point_lt function and the < operator built on it:

create operator = (leftarg = point, rightarg = point, procedure = point_eq, commutator = =);

create function point_lt(point, point)
returns boolean language sql immutable as $$
    select $1[0] < $2[0] or $1[0] = $2[0] and $1[1] < $2[1]
$$;

create operator < (leftarg = point, rightarg = point, procedure = point_lt, commutator = >);

Now, to make those operators usable by indexes, I have to create an operator class. But first, the operator class needs a support function that lets the database engine compare two values and sort them:

create function btpointcmp(point, point)
returns integer language sql immutable as $$
    select case 
        when $1 = $2 then 0
        when $1 < $2 then -1
        else 1
    end
$$;

Then the operator class:

create operator class point_ops
    default for type point using btree as
        operator 1 <,
        operator 2 <=,
        operator 3 =,
        operator 4 >=,
        operator 5 >,
        function 1 btpointcmp(point, point);

Examples from this StackOverflow post

Of course, you can have far more complex operators, as you can define more or less any operator you want, with support for negation and so on.

Something to know about this: for any table you create, PostgreSQL creates an equivalent composite type. Here is a small demo:

test=# create table inet_allocation(id serial, network inet unique, description character varying);
CREATE TABLE
test=# select ((0, '192.168.0.0/24'::inet, 'Main office network')::inet_allocation).description;
     description
---------------------
 Main office network
(1 row)

Then you can use this type anywhere in your database (so you basically have objects with operators and attributes, also usable in your stored procedures).
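For instance, building on the inet_allocation table from the demo above, here is a sketch of a function that takes the table's row type as an argument (allocation_label is a hypothetical name chosen for this example):

```sql
-- A function whose argument is the composite type generated for the table.
-- Parentheses around the parameter are needed for field access in SQL functions.
CREATE FUNCTION allocation_label(a inet_allocation)
RETURNS text LANGUAGE sql IMMUTABLE AS $$
    SELECT (a).description || ' (' || (a).network::text || ')'
$$;

-- Call it on whole rows of the table:
SELECT allocation_label(i) FROM inet_allocation i;
```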

Unfortunately, MySQL doesn't offer anything for custom data types, operator overloading or object orientation. So if you plan to use those features (you probably won't if you are using an ORM), PostgreSQL is a good candidate.

As for GIS, PostgreSQL is probably the most advanced database of all. I'm not an expert in GIS, but I found many articles saying so. Also, OpenStreetMap is using it (it used MySQL before).

MySQL has limited GIS support (geometry only, no geography, limited to SRID 0: "Infinite flat Cartesian plane with no units assigned to its axes").

Do you need advanced JSON features?

Here is another domain where PostgreSQL shines; it is sometimes called a "better MongoDB than MongoDB". PostgreSQL has excellent JSON support.

You can store JSON-like data in most databases, but in PostgreSQL you can also put indexes and constraints on your JSON fields, and thanks again to the object system, there are operators to do most of what you need.

For this part, I will redirect you to this article that has all the examples I would like to show you: https://www.compose.com/articles/is-postgresql-your-next-json-database/
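To give you a taste of the kind of thing that article covers, here is a small sketch (the events table and its payload are made up for this example): a jsonb column with a constraint on a JSON key, a GIN index, and a containment query.

```sql
CREATE TABLE events (
    id serial PRIMARY KEY,
    payload jsonb NOT NULL,
    -- Constraint on a JSON field: the payload must contain a "type" key
    CONSTRAINT payload_has_type CHECK (payload ? 'type')
);

-- GIN index to speed up containment (@>) and key-existence (?) queries
CREATE INDEX idx_events_payload ON events USING gin (payload);

-- Containment operator: find events whose payload contains this sub-document
SELECT * FROM events WHERE payload @> '{"type": "login"}';
```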

MySQL has a rather limited JSON feature set; yes, it has JSON support, but it's FAR from being as nice as PostgreSQL's.

You can forget indexes or constraints like in PostgreSQL; you have to rely on generated columns, which is far less convenient. You also need a fairly recent version of MySQL, at least 5.7.6. Here is what it looks like:

mysql> CREATE TABLE jemp (
    ->     c JSON,
    ->     g INT GENERATED ALWAYS AS (c->"$.id"),
    ->     INDEX i (g)
    -> );
Query OK, 0 rows affected (0.28 sec)

mysql> INSERT INTO jemp (c) VALUES
     >   ('{"id": "1", "name": "Fred"}'), ('{"id": "2", "name": "Wilma"}'),
     >   ('{"id": "3", "name": "Barney"}'), ('{"id": "4", "name": "Betty"}');
Query OK, 4 rows affected (0.04 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> SELECT c->>"$.name" AS name
     >     FROM jemp WHERE g > 2;
+--------+
| name   |
+--------+
| Barney |
| Betty  |
+--------+
2 rows in set (0.00 sec)

mysql> EXPLAIN SELECT c->>"$.name" AS name
     >    FROM jemp WHERE g > 2\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: jemp
   partitions: NULL
         type: range
possible_keys: i
          key: i
      key_len: 5
          ref: NULL
         rows: 2
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select json_unquote(json_extract(`test`.`jemp`.`c`,'$.name'))
AS `name` from `test`.`jemp` where (`test`.`jemp`.`g` > 2)
1 row in set (0.00 sec)

Let's hope that MySQL will get something better in the future. (Most of the limitations are related to the lack of an extensible type system like PostgreSQL's.)

Do you need CTEs?

I'll be short here, MySQL doesn't have CTE support (yet).

This is a planned feature for MySQL 8.0.
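For reference, here is what a recursive CTE looks like in PostgreSQL, using a hypothetical employees table (id, manager_id, name) to walk a management hierarchy:

```sql
-- Recursive CTE: collect an employee and all of their subordinates,
-- however deep the hierarchy goes.
WITH RECURSIVE subordinates AS (
    SELECT id, manager_id, name
    FROM employees
    WHERE id = 1                       -- start from the root employee
    UNION ALL
    SELECT e.id, e.manager_id, e.name
    FROM employees e
    JOIN subordinates s ON e.manager_id = s.id
)
SELECT * FROM subordinates;
```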

Do you need advanced indexes?

Here are some kinds of indexes supported by PostgreSQL and not by MySQL (yet?): partial indexes, expression (functional) indexes, and the GIN, GiST and BRIN index types.
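For example, partial and expression indexes look like this (the users and orders tables here are hypothetical):

```sql
-- Expression index: speeds up case-insensitive lookups on email,
-- e.g. WHERE lower(email) = 'foo@example.com'
CREATE INDEX idx_users_lower_email ON users (lower(email));

-- Partial index: only indexes the rows you actually query,
-- keeping the index small and cheap to maintain.
CREATE INDEX idx_orders_pending ON orders (created_at)
    WHERE status = 'pending';
```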

Do you plan to scale out easily?

Here is where MySQL shines, especially in the latest versions: you have two official solutions to reach real H.A., and both are really easy to set up.

MySQL Group Replication

Ever heard of MongoDB replica sets? Here you have something similar: a master-slave replication system, managed automatically; new members can join without any manual action, and failover is handled automatically using a quorum.

https://dev.mysql.com/doc/refman/5.7/en/group-replication.html

Here is a typical architecture with MySQL Group Replication:

MySQL Group Replication Schematics

MySQL Cluster

Here is another H.A. system for MySQL; this one is master-master.

It doesn't work like a standard MySQL, and it requires more nodes than group replication. Basically, you have NDB nodes that host the data itself (which needs to fit in memory), SQL nodes that are essentially MySQL instances querying your NDB tables from the NDB hosts, and other nodes used for management.

This mode has limitations and will probably require some changes to a standard MySQL application, so before deploying it, READ the documentation:

https://dev.mysql.com/doc/refman/5.7/en/mysql-cluster.html

Here is the typical architecture with MySQL Cluster:

MySQL Cluster Schematics

If you want to play with MySQL Cluster, I invite you to read this article so you can get your cluster ready in a few minutes: http://mikaelronstrom.blogspot.com.es/2017/01/mysql-cluster-up-and-running-in-less.html

PostgreSQL

PostgreSQL doesn't have any official solution for this kind of high availability: the PostgreSQL core offers replication, but nothing that provides automatic failover.

There are some projects trying to achieve high availability. I can't say much about them, as I have either only read about them (Postgres-XL) or played with them very briefly (Stolon on Kubernetes).

PostgreSQL Stolon

https://github.com/sorintlab/stolon

stolon is a cloud native PostgreSQL manager for PostgreSQL high availability. It's cloud native because it'll let you keep an high available PostgreSQL inside your containers (kubernetes integration) but also on every other kind of infrastructure (cloud IaaS, old style infrastructures etc...)

PostgreSQL Stolon Components

It can be deployed in a few minutes on a Kubernetes cluster. From my tests it worked quite well, but I didn't have the knowledge required to put a Kubernetes cluster in production, so I haven't explored this solution further yet; I preferred to learn Kubernetes first (not just for this, but in general).

Postgres-XL

http://www.postgres-xl.org/

Postgres-XL is a horizontally scalable open source SQL database cluster, flexible enough to handle varying database workloads:

  • OLTP write-intensive workloads
  • Business Intelligence requiring MPP parallelism
  • Operational data store
  • Key-value store
  • GIS Geospatial
  • Mixed-workload environments
  • Multi-tenant provider hosted environments

Postgres-XL Components

This is a fork of PostgreSQL made for scaling out. I haven't had the occasion to experiment with it; from what I've read, it is probably overkill for most deployments.

Also, check the What's Coming section as some features will be merged to PostgreSQL.

Differences in the permission system between MySQL and PostgreSQL

Something interesting to know is the difference between the permission management and ownership systems of MySQL and PostgreSQL.

In PostgreSQL, everything has an owner, even a database. When you create a database, you have to specify an owner that will have full rights on it (the connected user by default).

In MySQL, there is no concept of owner; you just specify the database or table pattern when you GRANT a permission.

To me, the PostgreSQL way is better, as it allows simple scenarios like letting a user create databases and automatically get full rights on them. To do the equivalent in MySQL, you basically have to grant the user rights on all databases starting with a common prefix, like this:

GRANT ALL PRIVILEGES ON 'testuser\_%'.* TO 'testuser'@'%';

In PostgreSQL, just do:

ALTER USER testuser WITH CREATEDB
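The whole flow then looks like this (testuser and testuser_app are example names):

```sql
-- As a superuser: allow testuser to create databases.
ALTER USER testuser WITH CREATEDB;

-- Later, connected as testuser:
CREATE DATABASE testuser_app;
-- testuser is automatically the owner of testuser_app
-- and has full rights on it, no extra GRANT needed.
```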

What's coming?

PostgreSQL 10

Here is a summary of the information I found on the coming features and improvements in PostgreSQL 10:

  • Logical replication: replicate individual tables using a publish/subscribe model
  • Declarative table partitioning: native partitioning syntax, no more inheritance plus triggers
  • Improved parallel query: parallel index scans and merge joins
  • Quorum-based synchronous replication
  • SCRAM-SHA-256 authentication

MySQL 8

  • InnoDB system catalog: no more MyISAM-based mysql.* schema, and no more .FRM files
  • Roles: you can now define roles for user accounts and inherit permissions or parameters
  • InnoDB temporary tables are now in a separate tablespace
  • InnoDB tablespace encryption
  • Invisible indexes: lets you make an index invisible to the optimizer, so you can disable and re-enable it without the overhead of dropping and recreating the index
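The invisible index feature looks like this in MySQL 8 (the orders table and idx_customer index are made-up names):

```sql
-- Hide the index from the optimizer; it is still maintained on writes,
-- so making it visible again is instantaneous.
ALTER TABLE orders ALTER INDEX idx_customer INVISIBLE;

-- Check query plans without it, then bring it back:
ALTER TABLE orders ALTER INDEX idx_customer VISIBLE;
```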

Summary

Here is a little summary of the go and no-go points for MySQL and PostgreSQL:

Subject                                MySQL   PostgreSQL
I want easy H.A.                       YES     NO
I want easy PITR                       YES     NO
I want good management GUIs            YES     NO
I want GIS                             NO      YES
I want commercial support              YES     YES
I'm an open source extremist           NO      YES
I want transactional DDL               NO      YES
I want integration with LDAP           NO      YES
I want to write advanced procedures    NO      YES
I want functional indexes              NO      YES
I want partitioning                    YES     NO

Bonus

Cool Projects

PostgreSQL

MySQL

Last words

I hope this article helped you in your decision or with your curiosity.

Feel free to leave comments on this article or on Twitter.