1. Source Configuration

You can find an overview of this section here

1.1. Introduction

This may seem optional, but we start our model implementation by specifying our data source configuratins in a .yml file, coinciding with our model directory.

This is great for long-term reference & configurations.

1.1.1. Why should I Care?

Shrug

1.1.1.1. My thoughts

Well, there are cases where you have a reference to table "CuteDB"."NiceSchema"."Customers" in several locations of your Data Model. However, what happense when you migrate to a new databse due to poor naming conventions from "CuteDB"."NiceSchema"."Customers" to "AwesomeDB"."NiceSchema"."Customers".

You can do a mass replacement in your repository, depending on your Code editor. Or, you can make the changes all in one data reference Before starting our model,

1.1.1.2. Quoting dbt Docs:

Sources make it possible to name and describe the data loaded into your warehouse by your Extract and Load tools.

By declaring these tables as sources in dbt, you can then:

  • Use {{ source() }} function, helping define the lineage of your data

  • Test your assumptions about your source data

  • Calculate the freshness of your source data

1.2. Using Sources

We configure our yml file for specifying source table references, in our upcoming modeling. This includes:

  • Snowflake specific table references

  • aliases

Note: We can add more to this file, like built-in test & documentation functionality, but we’ll cover that in a later topic

Now, let’s see how we can use Sources to specify our source path configurations

Here is a snippet of what I implemented. Everything in comments will be in #’s

Sources Configuration

Here’s a snippet of what we implement for our source configuration

version: 2

sources:
  - name: customers # Customer alias for our table references in {{ source() }}
    database: SNOWFLAKE_SAMPLE_DATA
    schema: TPCDS_SF100TCL
    loaded_at_field: _loaded_at
    tables:
      - name: CUSTOMER
      - name: CUSTOMER_DEMOGRAPHICS
      - name: CATALOG_SALES

1.3. Tips

1.3.1. Errors

Why do I receive an error:

  • The selector 'snowflake_demo_catalog_sales' does not match any nodes and will be ignored?

    • Ans: This is because either our profiles.yml file does not have the right configuration, or our models have not been explicitly mentiones

Try out a prefix of dbt --warn-error run in your dbt run (Available in 0.17+)

1.4. Other Questions

  • Why is my schema like dev_{name}_tablename?

    • Recall our created profiles.yml file. We specified our default target to be dev. In the dev configuration, we specify our table & view creation under a prefix dev_{name}

    E.g. I had schema: dev_raulingaverage

  • How does dbt know to find my tablename in the sources file?

    • Ans: We have our alias customer equivalent to {DATABASE}.{SCHEMA}

  • How can it access such table reference?

    • Ans: In this tutorial, I am running on SYSADMIN in Snowflake. However, in a production environment, you may have to coordinate with your Snowflake admin to provide permissions to your particular table reference :/