Skip to main content

Identifiers

What are the identifiers in the data source yaml?#

Identifiers are the join key columns in your data source that can be used to join to other data sources. The identifier types in Transform are primary, foreign, or unique. Identifiers can be single identifiers or composite identifiers. Additionally, identifiers can be referenced directly by column name from the data source or they can be altered using an expression. Identifiers can also be used as a dimension allowing you to aggregate a metric to the granularity of that identifier. Identifiers need to be unique in the same data source, but do not have to be unique in different data sources.

Identifier Types#

Transform uses the identifier's type to determine how to join data sources. The join logic depends on the type of identifier. Check out our join logic page for more info on how Transform joins data sources.

Primary: A primary key has one and only one record for each row in the table and it is also inclusive of every record in the data warehouse.

Unique: A unique key has one and only one record for each row in the table, but it may only have a subset of every record in the data warehouse. They can also have nulls.

Foreign: A foreign key can have zero, one or many of the same records. They can also have nulls.

Example:

identifiers:  - name: transaction    type: primary    expr: id_transaction  - name: order    type: foreign    expr: id_order  - name: user    type: foreign    expr: SUBSTRING(id_order FROM 2)

Composite Keys#

Transform's framework supports configuring data sources that contain composite keys. In database design, a composite key is a key that consists of two or more attributes (table columns) that together uniquely identify an entity occurrence or a table row instead of a key that uses a single column or attribute. Composite keys can be used in foreign or primary identifiers, but not with unique identifiers. Identifiers with composite keys can be defined in the identifiers section of the data source like any other identifiers.

Examples

A table has a composite key made up of two fields: user_id and message_id. Additionally, for this example, assume the user_id field cannot be joined on its own and is only joinable when tied to a particular message

Example 1:

To configure a composite key

  • Name the Identifier using the same name field as when naming any other Identifier. In this example we'll define a Identifier called user_message
  • Specify the identifier type via type field. Here it's primary .
  • Use the identifiers field to define the components of the composite key.
  • You can specify new identifiers directly in the composite key like user_id or use the ref field to reference identifiers you've already defined i.e ref: message
data source: users  identifiers:    - name: message      expr: message_id      type: foreign      #Here we define the compostite key by refrecing the message identifier    - name: user_message      type: primary #Composite keys can also be a foreign       identifiers:        - name: user          expr: user_id        - ref: message

Transform will never implicitly create a composite key. If a data source happens to have team_id and user_id, we won't assume that data source can be joined with user_team unless explicitly defined.

Note: If keys that make up an identifier cannot stand alone or be joined to other data sources we recommend that you define those fields directly in the composite statement as we did with user. Notice in this example that we only defined one field before the composite key definition (message_id) and defined the other (user_id) directly in the composite key. You can still choose to explicitly define all identifiers that make up your composite key. Our recommended approach is to consider whether the field can stand alone or whether it is only usable as part of a composite key.

Example 2:

You can optionally define all fields of a composite key ahead of the definition of the composite primary key.

data_source:      name: users  description: Users and relationships to the teams they are on  owners:     - owner@company.com   sql_table: people.users  identifiers:    - name: team_id      type: foreign    - name: id      type: foreign    - name: user_team      type: primary      identifiers:        - ref: team_id        - ref: id

Example 3:

You can define any number of columns as your composite key. Here we have three components in a composite key. In this case, the fields are team_id, id, and team_region, respectively.

data_source:      name: users  description: Users and relationships to the teams they are on  owners:     - owner@company.com   sql_table: people.users  identifiers:    - name: team_id      type: foreign    - name: id      type: foreign    - name: team_region       type: foreign    - name: user_team      type: primary      identifiers:        - ref: team_id        - ref: id        - ref: team_region

Querying Composite Keys in MQL#

You can query using a where clause in MQL by specifying each specific field that makes up your composite key. For example, if your composite key is made up of user_id and message_id, you can run a query such as.

mql query --metrics messages --dimensions ds --where "user_id = 10 and team_id = 7"

Derived Identifiers#

A common design pattern you may have in your data warehouse is to have differently prioritized dimensions. For example, a user_id from one table and person_id from a different table may refer to the same user. One field may be preferred to use in a query, with the other only being used if the first is null.

You can use a derived_identifier to create the coalesce in Transform! See an example below.

derived_identifier:  name: c_user_id  owners:    - support@transformdata.io  expr: COALESCE(user_id, person_id)  expr_elements:    - person_id    - user_id

This identifier example is derived from the following two datasources (summarized):

---data_source:  name: user_core_source  table: user_core  identifiers:    - name: user_id      type: primary...---data_source:  name: bookings  table: fct_bookings  identifiers:   - name: person_id     type: foreign

After defined, you can reference the identifier in any mql query :

mql query --metrics bookings --dimensions c_user_id