Category Archives: SSIS

Upserts in SSIS Using SQL Server Business Intelligence Studio

SQL Server Business Intelligence Studio provides an excellent graphical interface for creating some very complex ETL packages, but there are a few simple operations that are surprisingly absent. One of these operations is the simple upsert operation, where you want to either insert or update a record in the target from the source depending on whether or not it already exists in the target.

If you’ve already tried looking for an upsert transformation in the tool bar, you can forget about it – it simply isn’t there. There are two common ways to set up and upsert, one being the clean-but-slow way (using Lookup transformations), and the other being the fast-but-dirty way (using insert failures).

Method 1 – Clean and Slow

  1. Perform the requisite boilerplate steps by adding a new Data Flow Task, then add and configure your source to pull whatever data you need, including whatever key (or surrogate key) you will need to check if the record exists in the target.
  2. Add a Lookup transformation and connect the source’s output to it. Double-click the Lookup transformation, and ensure you set the ‘Specify how to handle rows with no matching entries’ dropdown to ‘Redirect rows to no match output’
  3. In the Lookup’s “Connection” menu, specify the connection for the data target (since you will be looking up the source data’s key in the target).
  4. In the Lookup’s “Columns” menu, link up the source key column with the target key column. Note that if the data types of the two columns don’t match, you’ll need to set up a Data Conversion Transformation between your source and the lookup. This is shown in the screenshot below.
  5. Drag a destination object (e.g. OLE DB Destination) onto the workspace and connect your lookup’s Match Output to it. You’ll point the object’s connection to the target and peform column mappings as needed. This path handles the ‘insert’ part of the upsert.
  6. Drag an OLE DB Command object to the workspace and connect your lookup’s No Match Output to it. You’ll set up an update command here, setting the columns in your source to map to the parameters specified in the SQL statement. This path handles the ‘update’ part of the upsert.
  7. That’s it! Below is a screenshot of the final product

This method is nice and clean, but the lookup can drag performance on large datasets. If performance is at a premium and you are a.) confident in the integrity of your database schema (i.e. you have good primary keys set up on your target table), and b.) are fairly certain that no errors exist in your data – like those pesky truncation or foreign key violations - then you may opt for a more simple version.

Method 2 – Quick and Dirty

  1. Perform the requisite boilerplate steps by adding a new Data Flow Task, then add and configure your source to pull whatever data you need, including whatever key (or surrogate key) you will need to check if the record exists in the target.
  2. Set up a data conversion task if you need one.
  3. Add your destination object, configuring it to redirect rows on error. Map your columns as needed, making sure that you map the keys of the two tables together.
  4. Add an OLE DB Command transformation, and connect the error output of the destination object to it. Configure your command object to update the target table where the key field equals the value of the key from the source.
  5. That’s it! Below is a screenshot of the final product:

The performance on this method is much faster, but this operation is a whole heck of a lot more dangerous. Any and all errors coming out of the destination object will be “swallowed” and you’ll never know what they are. Use this method only if you really know what you’re doing.

There are, of course, other ways to perform upserts (e.g. using Slowly Changing Dimensions), but I’ve found these methods fairly cumbersome. The two methods above get the run-of-the-mill upsert job done for me every time.

More questions about upserts or SSIS/BIS/ETL in general? Email the author at cnewman@cpointe-inc.com.

Follow

Get every new post delivered to your Inbox.