Identifiers
#
What are the identifiers in the data source yaml?Identifiers are the join key columns in your data source that can be used to join to other data sources. The identifier types in Transform are primary, foreign, or unique. Identifiers can be single identifiers or composite identifiers. Additionally, identifiers can be referenced directly by column name from the data source or they can be altered using an expression. Identifiers can also be used as a dimension allowing you to aggregate a metric to the granularity of that identifier. Identifiers need to be unique in the same data source, but do not have to be unique in different data sources.
#
Identifier TypesTransform uses the identifier's type to determine how to join data sources. The join logic depends on the type of identifier. Check out our join logic page for more info on how Transform joins data sources.
Primary: A primary key has one and only one record for each row in the table and it is also inclusive of every record in the data warehouse.
Unique: A unique key has one and only one record for each row in the table, but it may only have a subset of every record in the data warehouse. They can also have nulls.
Foreign: A foreign key can have zero, one or many of the same records. They can also have nulls.
Example:
identifiers: - name: transaction type: primary expr: id_transaction - name: order type: foreign expr: id_order - name: user type: foreign expr: SUBSTRING(id_order FROM 2)
#
Composite KeysTransform's framework supports configuring data sources that contain composite keys. In database design, a composite key is a key that consists of two or more attributes (table columns) that together uniquely identify an entity occurrence or a table row instead of a key that uses a single column or attribute. Composite keys can be used in foreign or primary identifiers, but not with unique identifiers. Identifiers with composite keys can be defined in the identifiers section of the data source like any other identifiers.
Examples
A table has a composite key made up of two fields: user_id
and message_id
. Additionally, for this example, assume the user_id
field cannot be joined on its own and is only joinable when tied to a particular message
Example 1:
To configure a composite key
- Name the Identifier using the same
name
field as when naming any other Identifier. In this example we'll define a Identifier calleduser_message
- Specify the identifier type via
type
field. Here it'sprimary
. - Use the
identifiers
field to define the components of the composite key. - You can specify new identifiers directly in the composite key like
user_id
or use theref
field to reference identifiers you've already defined i.eref: message
data source: users identifiers: - name: message expr: message_id type: foreign #Here we define the compostite key by refrecing the message identifier - name: user_message type: primary #Composite keys can also be a foreign identifiers: - name: user expr: user_id - ref: message
Transform will never implicitly create a composite key. If a data source happens to have team_id and user_id, we won't assume that data source can be joined with user_team unless explicitly defined.
Note: If keys that make up an identifier cannot stand alone or be joined to other data sources we recommend that you define those fields directly in the composite statement as we did with user
. Notice in this example that we only defined one field before the composite key definition (message_id
) and defined the other (user_id
) directly in the composite key. You can still choose to explicitly define all identifiers that make up your composite key. Our recommended approach is to consider whether the field can stand alone or whether it is only usable as part of a composite key.
Example 2:
You can optionally define all fields of a composite key ahead of the definition of the composite primary key.
data_source: name: users description: Users and relationships to the teams they are on owners: - owner@company.com sql_table: people.users identifiers: - name: team_id type: foreign - name: id type: foreign - name: user_team type: primary identifiers: - ref: team_id - ref: id
Example 3:
You can define any number of columns as your composite key. Here we have three components in a composite key. In this case, the fields are team_id
, id
, and team_region
, respectively.
data_source: name: users description: Users and relationships to the teams they are on owners: - owner@company.com sql_table: people.users identifiers: - name: team_id type: foreign - name: id type: foreign - name: team_region type: foreign - name: user_team type: primary identifiers: - ref: team_id - ref: id - ref: team_region
#
Querying Composite Keys in MQLYou can query using a where clause in MQL by specifying each specific field that makes up your composite key. For example, if your composite key is made up of user_id
and message_id
, you can run a query such as.
mql query --metrics messages --dimensions ds --where "user_id = 10 and team_id = 7"
#
Derived IdentifiersA common design pattern you may have in your data warehouse is to have differently prioritized dimensions. For example, a user_id
from one table and person_id
from a different table may refer to the same user. One field may be preferred to use in a query, with the other only being used if the first is null.
You can use a derived_identifier
to create the coalesce in Transform! See an example below.
derived_identifier: name: c_user_id owners: - support@transformdata.io expr: COALESCE(user_id, person_id) expr_elements: - person_id - user_id
This identifier example is derived from the following two datasources (summarized):
---data_source: name: user_core_source table: user_core identifiers: - name: user_id type: primary...---data_source: name: bookings table: fct_bookings identifiers: - name: person_id type: foreign
After defined, you can reference the identifier in any mql query
:
mql query --metrics bookings --dimensions c_user_id