Azure Data Factory Lookup: First Row Only & Empty Result Sets

Posted by Niall's Data Blog on Saturday, March 23, 2019

When using the lookup activity in Azure Data Factory V2 (ADFv2), we have the option to retrieve either a multiple rows into an array, or just the first row of the result set by ticking a box in the UI.

The ‘First Row Only’ Checkbox at the bottom
The 'First Row Only' Checkbox at the bottom

This allows us to either use the lookup as a source when using the foreach activity, or to lookup some static or configuration data. Recently when connecting to a source system to retrieve configuration data, I discovered that the development environment did not always return a row. As we did not have access to fix the problem in the source system, I had to provide a default value.

Diving into the details

Let’s have a look at what the lookup activity returns in each of these situations. First up, the JSON for the activity output when retrieving multiple rows from a demo table.

{
    "count": 5,
    "value": [
        {
            "TableName": "SalesLT.Customer",
            "NullValue": null
        },
        {
            "TableName": "SalesLT.ProductModel",
            "NullValue": null
        },
        {
            "TableName": "SalesLT.ProductDescription",
            "NullValue": null
        },
        {
            "TableName": "SalesLT.Product",
            "NullValue": null
        },
        {
            "TableName": "SalesLT.ProductModelProductDescription",
            "NullValue": null
        }
    ],
    "effectiveIntegrationRuntime": "DefaultIntegrationRuntime (UK South)"
}

We can see the returned JSON is an object with a count property, and a value property which is the array of objects, each with properties for each of the columns in the result set. We can use the count to check if rows have been returned. If no rows are returned the count property is 0, and we have an empty array of objects.

When we tick the First row only checkbox on the lookup activity, the JSON output changes. We now have an object with a property firstRow, which has properties for the columns of the first row returned.

{
    "firstRow": {
        "TableName": "SalesLT.Customer",
        "NullValue": null
    },
    "effectiveIntegrationRuntime": "DefaultIntegrationRuntime (UK South)"
}

This gives us a nice way to access the properties of the single row we looked up in expressions in subsequent activities, where the action the lookup took is clearer than if we used a property indexer.

So What’s the Problem

So what happens in the case where no rows are returned and the First row only checkbox is ticked?

{
    "effectiveIntegrationRuntime": "DefaultIntegrationRuntime (UK South)"
}

This time the JSON returned does not contain the firstRow property, and if we try and reference it in an expression in a subsequent activity, we get a runtime error.

Here’s the error
Here's the error

How Can We Fix It?

So how can we work around this limitation. We could use an If Condition activity to check the property exists using the contains() function. At the time of writing the documentation for this states:

Returns true if dictionary contains a key, list contains value, or string contains substring.

I don’t find this particularly clear, as JSON objects behave like dictionaries, so we can check for a property existing like so.

Check if a property exists on a lookup result set
Check if a property exists on a lookup result set

The if condition can then use the returned values if they exist, or some default values otherwise.

Better error handling
Better error handling

This approach can work, but the subsequent activities are then hidden within the If Condition activity. We also need to consider that in ADFv2 we are charged on activity executions, so maybe we can do better.

Instead of thinking of how we can use an activity to validate that the lookup returned a row, maybe we can force the query to return default values if there is no row returned.

SELECT ISNULL( t.fiscalPeriod, '2019Q1' )
FROM etl.TableMetadata AS t
-- Join to a dummy table to guarantee a row is returned
RIGHT JOIN (SELECT TOP 1 object_id FROM sys.objects) AS d ON 1=1

We altered the query used by the lookup activity to add a right join to a dummy row. This forces the query to always return a single row, and the output JSON always contains the firstRow property. We then coalesce the values to be the default values we want to use. This pattern simplifies the logic in the pipeline, and adds resilience when the lookup does not return a single row.