Niall's Data Blog

A Data Engineer / Architect writing about Tech, Data and the Community

Build & Deployment Warnings in Azure DevOps

It's all in the details...

I use Azure DevOps with clients a lot, mostly for deploying data platforms. Sometimes we have a task within our pipeline that fails, where the failure is a genuine problem but not one that should stop a build or a deployment. Some examples of this situation might be: deploying to an environment and configuring a user or group to have access to a table when the user or group does not exist; sending an alert from a build or deployment when the code to send the alert fails; and catching specific known errors and dropping them down to a warning.
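As a rough illustration of dropping a known error down to a warning, here is a minimal Python sketch. It is not necessarily the approach the post takes. It relies on the Azure DevOps logging commands `##vso[task.logissue type=warning]` and `##vso[task.complete result=SucceededWithIssues;]`, which the agent reads from a task's standard output. The helper names (`warning_command`, `complete_with_issues_command`, `run_step`) and the choice of `LookupError` as the "known" error are hypothetical.

```python
def warning_command(message: str) -> str:
    """Build the logging command that records a warning against the task."""
    # Assumes a single-line message; the agent parses one command per output line.
    return f"##vso[task.logissue type=warning]{message}"


def complete_with_issues_command() -> str:
    """Build the logging command that marks the task as 'succeeded with issues'."""
    return "##vso[task.complete result=SucceededWithIssues;]"


def run_step(action, known_errors=(LookupError,)) -> bool:
    """Run `action`, downgrading known errors to a pipeline warning.

    Returns True on a clean run and False when a known error was downgraded.
    Any other exception propagates and fails the task as usual.
    """
    try:
        action()
        return True
    except known_errors as exc:
        # Printing to stdout is how a script hands logging commands to the agent.
        print(warning_command(f"Non-fatal failure: {exc}"))
        print(complete_with_issues_command())
        return False
```

Called as `run_step(grant_access)`, where `grant_access` is whatever hypothetical function might raise because a user or group is missing, the task would show a warning while later steps still run.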

This is my New Blog!

Migrating from WordPress to Hugo

I’ve had a blog for years at sqlsmarts.com (no link, it’s dead now) that I very occasionally wrote on. I was on WordPress, and every time I wanted to just write a post, there seemed to be a hundred things that needed fixing or updating. Most of the time it honestly felt like it was more effort than it was worth. My blog had always been hosted by an old colleague, and when that was coming to an end it was move-it-or-lose-it time.

Associative Grouping using Spark - Part 3

This is part of a series of posts about associative grouping:
Part 1 - Associative Grouping using tSQL Recursive CTE’s
Part 2 - Associative Grouping using tSQL Graph
In the first two parts of this series we looked at how we could use recursive CTEs and SQL Server’s graph functionality to find overlapping groups in two columns in a table, in order to put them into a new super group of associated groups.
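To make the idea of a "super group" concrete, here is a small sketch in plain Python. It is not the Spark approach this post covers, nor the tSQL from the earlier parts. It uses a union-find (disjoint set) structure, under the assumption that each table row is a pair of group values, one from each column, and that rows sharing a value in either column belong together. The function name `super_groups` and the `"A"`/`"B"` column tags are hypothetical.

```python
from collections import defaultdict


def super_groups(pairs):
    """Merge rows of (column_a_value, column_b_value) into associated super groups.

    Returns a list of sets; each set holds ("A", value) / ("B", value) nodes
    that are connected through shared values.
    """
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root,
        # halving the path as we go so later lookups are shorter.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(left, right):
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    for a_value, b_value in pairs:
        # Tag values with their column so equal values in different columns stay distinct.
        union(("A", a_value), ("B", b_value))

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


rows = [(1, "x"), (2, "x"), (2, "y"), (3, "z")]
result = super_groups(rows)
```

With these rows, groups 1 and 2 are linked through "x", and "y" joins them through group 2, so they form one super group, while 3 and "z" form another.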

Introducing AzureDataPipelineTools

A few months ago my friend Richard Swinbank posted a blog, More Get Metadata in ADF, about the limitations of using the Get Metadata activity in ADF to get information about files in a data lake. This led to a Twitter conversation, as a bunch of other data engineers had been building the same tools for different companies. Due to "popular" demand I've released the definition of my #Azure #DataFactory pipeline to Get Metadata recursively https://t.

Azure Data Factory: Dev Mode vs Published Code

I’ve worked with quite a few people new to Azure Data Factory, and one thing that seems to confuse new users is the difference between the developer sandbox where we build pipelines and the published/deployed code. Understanding this is key to working with Git, using CI/CD pipelines to deploy your code, and getting other Azure services to integrate nicely and call your pipelines.

Connecting to ADF

A good place to start is to understand the different ways we can interact with a data factory.