Niall's Data Blog

A Data Engineer / Architect writing about Tech, Data and the Community

Associative Grouping using Spark - Part 3

This is part of a series of posts about associative grouping:
Part 1 - Associative Grouping using tSQL Recursive CTE’s
Part 2 - Associative Grouping using tSQL Graph
In the first two parts of this series we looked at how we could use recursive CTEs and SQL Server’s graph functionality to find overlapping groups in two columns of a table, in order to put them into a new super group of associated groups.
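Underneath, this is a connected-components problem: every value in either column is a node, every row is an edge joining two nodes, and each super group is one component. As a rough in-memory illustration (a union-find sketch, not the recursive CTE, graph or Spark approach used in this series), it might look like the Python below. It assumes the table arrives as a list of (column A, column B) pairs and that equal values in different columns are treated as distinct nodes; the function name `associative_groups` is hypothetical.

```python
def associative_groups(rows):
    """Return one super-group number per (a, b) row.

    Rows end up in the same group when they are linked through any chain
    of shared column-A or column-B values (hypothetical helper, in-memory only).
    """
    parent = {}

    def find(node):
        # Locate the root of this node's set.
        parent.setdefault(node, node)
        root = node
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path straight at the root.
        while node != root:
            next_node = parent[node]
            parent[node] = root
            node = next_node
        return root

    def union(x, y):
        # Merge the two sets if they are not already the same one.
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    for a, b in rows:
        # Assumption: tag each value with its column so that the same value
        # appearing in column A and column B is not treated as one node.
        union(("a", a), ("b", b))

    group_ids = {}
    result = []
    for a, _ in rows:
        root = find(("a", a))
        # Number super groups in order of first appearance.
        group_ids.setdefault(root, len(group_ids) + 1)
        result.append(group_ids[root])
    return result
```

For example, `associative_groups([(1, "x"), (2, "x"), (2, "y"), (3, "z")])` returns `[1, 1, 1, 2]`: the first three rows are linked through `x` and `2`, while the last row shares nothing with them.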

Introducing AzureDataPipelineTools

A few months ago my friend Richard Swinbank posted a blog post, More Get Metadata in ADF, about the limitations of using the Get Metadata activity in ADF to get information about files in a data lake. This led to a Twitter conversation, as a number of other data engineers had been building the same tools for different companies. Due to "popular" demand I've released the definition of my #Azure #DataFactory pipeline to Get Metadata recursively https://t.

Azure Data Factory: Making Non-Dynamic Linked Services Dynamic

(Image: Linked Service Options Using the UI)
Note: The example here is the Salesforce linked service, but this technique also works for other linked services where the UI does not support adding parameterised properties. One of my clients has been adding data from multiple Salesforce instances to their data platform this week. One of their developers asked me if the Salesforce linked service could be made dynamic, as there is no place in the GUI to add parameters or dynamic values for the URL, user name or credentials.
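One common way to get around a UI like this is to edit the linked service's JSON definition directly: declare parameters on the linked service and reference them with `@{linkedService().<name>}` expressions. The Python below is a minimal sketch that builds such a definition as a dictionary and prints it as JSON. It is an assumption about how this could be done, not necessarily the approach taken in this post; the parameter names (`EnvironmentUrl`, `UserName`, `PasswordSecretName`), the linked service name and the Key Vault linked service name are all hypothetical.

```python
import json


def dynamic_salesforce_linked_service(name, key_vault_linked_service):
    """Build a parameterised Salesforce linked service definition (hypothetical names)."""
    return {
        "name": name,
        "properties": {
            "type": "Salesforce",
            # Parameters declared on the linked service itself; datasets using
            # this linked service supply values for them.
            "parameters": {
                "EnvironmentUrl": {"type": "String"},
                "UserName": {"type": "String"},
                "PasswordSecretName": {"type": "String"},
            },
            "typeProperties": {
                # linkedService().<param> reads a linked service parameter at runtime.
                "environmentUrl": "@{linkedService().EnvironmentUrl}",
                "username": "@{linkedService().UserName}",
                # The password is pulled from Azure Key Vault, with the secret
                # name itself supplied as a parameter.
                "password": {
                    "type": "AzureKeyVaultSecret",
                    "store": {
                        "referenceName": key_vault_linked_service,
                        "type": "LinkedServiceReference",
                    },
                    "secretName": "@{linkedService().PasswordSecretName}",
                },
            },
        },
    }


if __name__ == "__main__":
    definition = dynamic_salesforce_linked_service("SalesforceDynamic", "KeyVaultLS")
    print(json.dumps(definition, indent=2))
```

The same pattern (a `parameters` block plus `linkedService()` expressions in `typeProperties`) is what would carry over to other linked services whose UI lacks parameter fields, assuming the service accepts expressions in those properties.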

Azure Data Factory Lookup: First Row Only & Empty Result Sets

When using the Lookup activity in Azure Data Factory V2 (ADFv2), we have the option to retrieve either multiple rows into an array or just the first row of the result set, by ticking a box in the UI.
(Image: The 'First Row Only' checkbox at the bottom)
This allows us either to use the lookup as a source for the ForEach activity, or to look up some static or configuration data.
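The two settings change the shape of the activity's output, which is where empty result sets can catch you out. The Python below is a small model of those shapes, not ADF code: with 'First Row Only' ticked the row is exposed as `output.firstRow`, otherwise the rows come back as `output.value` with an `output.count`. The behaviour for an empty result (no `firstRow` property at all) is an assumption based on commonly reported behaviour, and the helper names `lookup_output` and `first_row_or_default` are hypothetical.

```python
def lookup_output(rows, first_row_only=True):
    """Model the output shape of an ADF Lookup activity (illustrative only)."""
    if first_row_only:
        # Assumption: for an empty result set, firstRow is simply absent,
        # so an expression that reads output.firstRow would fail.
        return {"firstRow": rows[0]} if rows else {}
    # Without 'First Row Only', all rows come back as an array plus a count.
    return {"count": len(rows), "value": list(rows)}


def first_row_or_default(output, default=None):
    # Guard against the empty-result case instead of assuming firstRow exists.
    return output.get("firstRow", default)


print(lookup_output([{"id": 1}, {"id": 2}]))
print(lookup_output([{"id": 1}], first_row_only=False))
print(first_row_or_default(lookup_output([]), default={"id": None}))
```

Modelled this way, the multi-row shape always has a `count` to test, while the first-row shape has nothing to test on an empty result, which is why a default or an explicit check is needed downstream.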