Join Logic
Transform uses the identifiers
to act as the join keys between tables. The Metrics Framework Framework constructs the appropriate joins with other tables based on the identifier types (primary, foreign, or unique).
When creating a data source it's important to think about which other data sources will be able to use the data source to combine metrics and dimensions. The Metrics Framework avoids fan-out and chasm joins by avoiding foreign to foreign, primary to foreign, and unique to foreign joins.
See the table below for a quick reference to what join types the Metrics Framework constructs based on the identifiers.
Identifier Type - Table A | Identifier Type - Table B | Join Type |
---|---|---|
Primary | Primary | Left |
Primary | Unique | Left |
Primary | Foreign | Fanout (Not Allowed) |
Unique | Primary | Left |
Unique | Unique | Left |
Unique | Foreign | Fanout (Not Allowed) |
Foreign | Primary | Left |
Foreign | Unique | Left |
Foreign | Foreign | Fanout (Not Allowed) |
See a quick overview of SQL Joins below!
#
Multi-Hop JoinsTransform allows users to join measures and dimensions across a graph of identifiers. We call this a 'multi-hop join' as a user can 'hop' across one table to another.
For example: given the schema below, it would be possible in Transform to create the metric 'Average Purchase Price by Country'.
Using Multi-Hop Joins
Assuming that the identifiers are set to join the tables together, our Framework can perform multi-hop joins under the hood by creating a graph using the Data Sources as nodes and the join paths as edges.
When you are initiating a query to do a multi-hop join via Transform's MQL Command Line Interface, you need to describe the join path and nodes in the dimension name. To query dimensions without a multi-hop join involved, you use the dimension prefixed with a dunder (double underscore) and identifier. The same paradigm applies for dimensions retrieved by a multi-hop join, except you may have multiple sets of dundered identifiers, since you are hopping across multiple data sources.
An example query in the CLI using this structure would be as follows. Notice how the dimension selection includes two sets of dunders -- one for the dimension user_id__country
and one for the dimension country_id__country
:
mql query --metrics average_purchase_price dimensions --ds --dimensions user_id__country_id__country_name
In this example - Transform joins the user_signup
to sales
on user_id
, then joins country_dim
to user_signup
on country_id
.
Note: If you have multiple dimensions with the same name, it would be possible for an ambiguous join path to be created. One key way to avoid this is to use a dimension from the list-metrics
feature in the CLI, which includes the prefixed identifier that's meant to disambiguate where the dimension comes from.