UPSERT in PostgreSQL 9.5
One of the new features of PostgreSQL 9.5 is UPSERT.
UPSERT is short hand for
UPDATE or INSERT and deals with conflicts of constraints within a database table. There are times, when inserting new rows into a database table, that you may be hitting a unique value constraint or another conflict. This is generally caused by the presence of a primary key.
UPSERT provides an easy way to insert values, while providing a merge strategy in the case of issues with constraints. It’s also faster than the traditional way of doing a SELECT then an INSERT or UPDATE, as you only need to send through one query.
ON CONFLICT Clause
While the feature is called UPSERT, it’s implemented within SQL as the
ON CONFLICT clause.
ON CONFLICT clause normally takes two arguments:
conflict_target: the column/constraint to which the constraint belongs
DO NOTHING. If you specify
DO UPDATEyou need to include details of what to do by specifying the columns to update with the normal
DO NOTHING will simply avoid inserting an extra row, which means that it’s more like a no-op, rather than an exception.
DO UPDATE, you have access to a special table called
excluded which contains all the values within the INSERT statement. To put it another way: it’s the pre-committed data.
Full documentation can be read here: https://www.postgresql.org/docs/9.5/static/sql-insert.html
Users Table Example
Say you have a table which is quite simple:
CREATE TABLE users ( id bigserial primary key, email varchar(255) unique, first_name varchar(255), last_name varchar(255) )
This table allows us to have a list of users, and have the email unique, so that we can prevent any duplications.
We can insert a new row easily:
test=> INSERT INTO users (email, first_name, last_name) VALUES ('email@example.com', 'Example', 'User'); INSERT 0 1
But if we try and insert the same row again:
test=> INSERT INTO users (email, first_name, last_name) VALUES ('firstname.lastname@example.org', 'Example', 'User'); ERROR: duplicate key value violates unique constraint "users_email_key" DETAIL: Key (email)=(email@example.com) already exists.
We get an ERROR. We have two options with UPSERT to change this behaviour.
Sometimes it might be appropriate to simply discard the insert, as there is already a row there. To do this, we simply add
ON CONFLICT DO NOTHING to the end of the SQL query:
test=> INSERT INTO users (email, first_name, last_name) VALUES ('firstname.lastname@example.org', 'Example', 'User') ON CONFLICT DO NOTHING; INSERT 0 0
We do get information back that we’re not going to be adding an extra row, but it’s not a total error in this case.
What if, instead of doing nothing, we wanted to update the first name and last name with our newly inserted row. We can do this by providing a merge strategy.
Firstly, there may be multiple conflicts, so we need to address which conflict we want to address. In this case, we want to execute our merge strategy when email is constrained:
ON CONFLICT (email) DO UPDATE
Secondly, we want to say what happens when we update. In our case, we want the first name and last name to be overwritten:
SET first_name = excluded.first_name, last_name = excluded.last_name
What this is saying is set
last_name to the values in the
excluded table. The
excluded table is a special table which contains the values from our insert query. This statement will replace both values with our new ones.
Running this query returns results pretty similar to the initial insert:
test=> INSERT into USERS (email, first_name, last_name) VALUES ('email@example.com', 'Updated', 'Name') ON CONFLICT (email) DO UPDATE SET first_name = excluded.first_name, last_name = excluded.last_name; INSERT 0 1
Doing a select we can see that our columns have been updated with our new values:
test=> select * from users; id | email | first_name | last_name ----+------------------+------------+----------- 1 | firstname.lastname@example.org | Updated | Name
Camel Upsert Component
With the advent of UPSERT we’ve created a Camel Component that makes use of this feature heavily. This component provides a simple way of generating tables for data warehouses and populating data from Camel exchanges.
What this will do is allow you to create database tables from a variety of sources without having to hand craft SQL.
One of the harder things is to automatically create and manage database schema. As upsert was designed for use in data-warehousing, it abstracts some of the more complex schema management away, allowing simple tables to be created.
A table is created by a yaml definition file, which indicates the table name, the column and the column’s data types.
As an example, a table definition for documents within alfresco is as follows:
documents: uuid: uuid name: string created: date modified: date creator: string modifier: string path: string size: long mimetype: string
As documents are tracked by UUID, we can use UPSERT to simply update the table if there are any changes to the UUID
Sometimes some systems will provide types of objects which don’t match the database schema. Upsert will, where possible, automatically convert into the appropriate table type.
Most of the time translation from one system to another can be provided by a simple map lookup, mapping fields to values. Upsert provides a mechanism within camel’s configuration to enable a simple mapping of values to database fields.
File system example
Utilising Apache camel’s filesystem component, you can easily index a file system
Given a table such as:
files: name: string modified: date path: string size: long
You can use a camel configuration like so:
<?xml version="1.0" encoding="UTF-8"?> <blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:cm="http://aries.apache.org/blueprint/xmlns/blueprint-cm/v1.1.0" xsi:schemaLocation=" http://www.osgi.org/xmlns/blueprint/v1.0.0 http://www.osgi.org/xmlns/blueprint/v1.0.0/blueprint.xsd http://camel.apache.org/schema/blueprint http://camel.apache.org/schema/blueprint/camel-blueprint.xsd http://aries.apache.org/blueprint/xmlns/blueprint-cm/v1.1.0 "> <bean id="fsUpsert" class="com.parashift.upsert.UpsertHeaderToBody"> <property name="table" value="files" /> <property name="translations"> <map> <entry key="CamelFileName" value="name" /> <entry key="CamelFileLastModified" value="modified" /> <entry key="CamelFileParent" value="path" /> <entry key="CamelFileLength" value="size" /> </map> </property> </bean> <camelContext id="testRoute" xmlns="http://camel.apache.org/schema/blueprint"> <route id="upsert"> <from uri="file:///Users/cetra/Desktop?noop=true&recursive=true" /> <bean ref="fsUpsert" /> <to uri="upsert://jdbc:postgresql://localhost/test?username=test&password=test&modelFile=filesystem.yaml" /> </route> </camelContext> </blueprint>