Skip to main content

Dimensions

Dimensions Overview#

Dimensions are non-aggregatable expressions that are used to define the level of aggregation that a Metrics Framework user would like a metric to be aggregated to. Dimensions cannot be aggregated, so they are then considered to be a property of the primary or unique identifiers of the table.

Dimensions are defined within data sources along with measures and identifiers. You can think of dimensions as the columns in your data table or query that are non-aggregatable โ€” they provide categorical or time-based context to enrich queries, and are usually included in the GROUP BY clause of your SQL query.

Consider the following Transform data source as the basis for the examples to follow.

data_source:  name: transactions  description: A record for every transaction that takes place. Carts are considered multiple transactions for each SKU.   owners: support@transform.co  sql_table: schema.transactions
# --- IDENTIFIERS ---  identifiers:     - name: transaction_id      type: primary    - name: customer_id      type: foreign    - name: store_id      type: foreign    - name: product_id      type: foreign
# --- MEASURES ---   measures:     - name: revenue      description:        expr: price * quantity      agg: sum
# --- DIMENSIONS ---  dimensions:    - name: ds      type: time      expr: date_trunc('day', ts)      type_params:        is_primary: true        time_format: YYYY-MM-DD        time_granularity: day    - name: is_bulk_transaction      type: categorical      expr: case when quantity > 10 then true else false end

All dimensions require a name, type and in most cases, an expr parameter.

name refers to the name for the dimension, which will be exposed to the end user. This works effectively as an alias, if the underlying column or SQL query reference is a different string. The input should be a valid string.

type refers to the type of dimension created in the data source, which will be used to group queries. The type can reflect the underlying data type of the underlying warehouse column, or the new type to be assigned if a custom SQL query is written into the expr parameter. Current types include:

  • categorical: usually a way to group rows within a table, such as geography, product type, color, etc.
  • time: points to underlying date field in the warehouse

expr refers to the underlying column or SQL query to define the dimension. Valid inputs are the simple column name (no need to scope to schema as it is included in the sql_table reference) or a valid SQL expression, such as case when quantity > 10 then true else false end. If a dimension is created without an expr parameter, Transform will look for a column that matches the name parameter.

Dimensions Types#

Dimensions as Identifiers#

Dimensions can act as the join keys between tables and using their types (primary, foreign, or unique) the Metrics Framework constructs the appropriate joins with other tables. They also have all the properties of a dimension in that they can act as a granularity for aggregation.

There are three key types in the Metrics Framework: primary, unique and foreign. and then 9 potential join pairs.

By defining these identifiers appropriately for every data source, the Metrics Framework is able to avoid fan-out and chasm joins by avoiding foreign to foreign, primary to foreign, and unique to foreign joins.

In the above data source, transaction_id is defined as a primary identifier for the transactions data source.

Category Dimensions#

Category dimensions allow metrics to be grouped by different "category" slices and dices, such as product type, color, or geographical area. These may point to existing columns in the data table, or be calculated using a SQL expression.

As different business functional areas may have different ways to categorize the data, it can be very valuable to create new dimensions to capture those bespoke groupings.

Examples:

In the following example, the is_bulk_transaction dimension creates a new category for which data can be described by. If a row in the quantity column exceeds 10, the transaction will be considered a bulk transaction.

    - name: is_bulk_transaction      type: categorical      expr: case when quantity > 10 then true else false end

is_bulk_transaction will then be available to the end user to filter or group by in the UI.

Time Dimensions#

Time dimensions are used to aggregate metrics against different levels of time granularity. When defining a data source with a primary time dimension, it is now possible to use other granularities other than day, which include week, month, quarter, and year. This will appropriately constrict the options possible when working with a metric (e.g. can't query at a DAY granularity if the metric is defined at a WEEK granularity), and also properly aggregate when querying metrics with different time granularities (e.g. querying two metrics, one with a DAY granularity and another with a MONTH granularity will return a result with a MONTH granularity by default). In addition to having appropriate metadata, the specified granularity will allow graphs in the UI to render correctly (e.g. a DAY granularity can't be specified for a MONTHLY metric).

To add a primary time dimension, add the is_primary: true parameter to the time dimensions type_params. It is best practice to ensure that the primary time dimension across different data sources are named the same. For data sources that contain a measure, a primary time dimension is required.

tip

Time dimension columns must be datetime data type if you are using BigQuery as your Data Warehouse. If they are stored as timestamp or another data type, you can cast the dimensions to datatime in the expr property. i.e expr: cast(timestamp as datetime)

- name: ds  type: time  expr: date_trunc('day', ts)  type_params:    is_primary: true    time_format: YYYY-MM-DD    time_granularity: day

To assert that a dimension exists over a specific time window(e.g. a date-partitioned dimensional table), add the is_partition: true parameter to the dimension. The framework will add this date to any joins on this table to ensure that the correct dimensional values are joined to measures in other tables.

- name: ds  type: time  expr: date_trunc('day', ts)  is_partition: true  type_params:    is_primary: true    time_format: YYYY-MM-DD    time_granularity: day