Intro to cohort analysis

Cohort analysis is a method to track how different groups of users behave over time. A cohort represents a group of users with shared characteristics, for instance:

  • Users who began using your product in a specific month
  • Users from a particular country or segment

While a cohort can also refer to (for example) a group of products, it's usually about users.

Cohort analysis involves examining how these groups of users behave over time. For example, you might observe how many of them return each month, or how much they spend over time.

Some scenarios where you might use cohort analysis:

  • Activation analysis - Group users by their start date and monitor their activation progress over time.
  • Retention analysis - Group users by their start or activation date and observe how many continue to use the product over time.
  • Revenue analysis and customer lifetime value - Group users by their start or first purchase date to identify trends in revenue and customer lifetime value.

Exploring cohorts

Explore cohorts table

How to read a cohorts table

  • Row by row - Each row gives you insights into the behavior of user groups over time. For instance, you can see how long different user groups remain active and at what point you start to lose them.
  • Comparing rows - Comparing rows gives insights about your product's lifetime. You can compare the behavior of user groups who began using your product early with those who started recently. This can give you insights on things like product updates or marketing efforts.
  • Diagonals - Any diagonal pattern in your cohorts table can provide insights into seasonality or indicate that something significant happened at a specific time. This is because calendar periods, for example 'May', will always be represented diagonally in a cohorts table.

Breakdown

Choose how to break down your cohorts, grouping user cohorts by a start date or a dimension. You can also adjust the granularity of both rows and columns.

Filters

Your cohorts table can be filtered like any other visualization.

Colors

Select from two color themes

  • monochrome
  • red-yellow-green

Customize color thresholds to find patterns or highlighting insights within your cohorts table.

Color cohorts table

Defining cohorts

To add cohorts to a metric, use the Cohorts section in the metric definition. As a workspace administrator, navigate to the metric and select 'Edit'. Once defined, cohorts are visualized as a table.

Define cohorts table

There are two ways of defining cohorts in Steep:

  • Standard cohorts – Easily define cohorts from your existing tables, dynamically calculated for quick insights. No adjustments to your data structure required.
  • Advanced cohorts – Built from tables explicitly created for cohorts, supporting more complex analyses like advanced retention cohorts.

Admins only

Only workspace administrators can define cohorts

Standard cohorts

Standard cohorts are defined from general detailed tables, for example orders or transactions, and dynamically calculated. They are flexible and provide a great option for most basic use cases.

Setting up a standard cohort

  1. Define your metric basics and add Cohorts using the + button
  2. Choose a Cohort time - this defines how cohorts will be grouped
  3. Label your cohorts
  4. Decide if you want to break down by calendar period or rolling period (e.g., 7, 30, 90, or 365 days).
  5. Let the % of initial value be toggled off (this is used for standard retention cohorts)

Setting up a standard retention cohort

  1. Set it up like a standard cohort. Use a join if necessary.
  2. Toggle on % of initial value to create a breakdown where the current value for each column is divided by the cohort's initial value.
Standard retention cohort example

Suppose you want to measure retention by customers with orders by setting up a metric for Active customers based on a join by the tables customers and orders. An active customer is defined as a customer with orders. We set the cohort time to be registration_date to create user cohorts grouped by the date of the registration.

By enabling % of initial value , the cohort breakdown will show:

Current value (customers with orders this period) / Initial value (customers with orders in period 0) = Retention rate.

Example: Data tables

Table: customers

idregistration_datecountry...
abc1232024-01-01SE...
abc1242024-01-02FR...

Table: orders

customer_idorder_dateamount...
abc1232024-03-01100...
abc 1232024-04-05100...

Example: Define a standard retention metric

Define standard retention metric

Example: Explore a standard retention metric

Explore standard retention metric

Dimensions

Any defined dimension can be used to group cohorts.

Rolling period or calendar period

By default your cohorts table will breakdown your cohort over calendar periods, but you can choose to change this to rolling periods (7, 30, 90 or 365 days)

Granularity

For standard cohorts, all workspace members can automatically explore all granularities of the cohort table, for both rows and columns. If needed, it is possible to limit the granularity options, for the cohort table, in the metric definition.

Adjusting the granularity will impact both the rows and columns.

Please note

The rows in the cohorts tables (cohorts) will be determined by the cohorts’ granularity. The columns in the cohorts table (breakdown over time) will be determined by the overlap of time grains between the metric time grain and the cohorts’ granularity.

Cohort only

Enable the Cohorts Only option for metrics that are only used for cohort analysis. An example of this could be a user retention metric. This option disables all other visualization types (line, bar, rank and pivot table) for the metric.

Advanced cohorts

Advanced cohorts are built from a pre-aggregated table created specifically for the cohorts you want to set up. They offer less flexibility but support more advanced use cases, such as more advanced retention cohorts.

Granularity

Decide on one specific granularity when pre-aggregating a cohorts table, for example daily or monthly. Use the Granularity option to set Steep cohort definition to the same granularity. Adjusting the granularity will impact both the rows and columns of your cohorts table.

Please note

The rows in the cohorts tables (cohorts) will be determined by the cohorts’ granularity. The columns in the cohorts table (breakdown over time) will be determined by the overlap of time grains between the metric time grain and the cohorts’ granularity.

Setting up advanced retention cohorts

  1. Prepare a new cohorts table by combining data from other tables and pre-calculating cohorts values. You have full control over how to calculate your numbers; for example, you can make them cumulative or remove deleted accounts from the relevant periods, among other options.
  2. Define your metric in Steep by referencing columns in your prepared cohorts table.
  3. Define what granularity your prepared data is stored in.
Advanced retention cohort example

You could create an advanced retention cohort for orders by pre-aggregating numbers from the users and orders tables. This allows us to group user cohorts based on when the user was created, and also include users that haven’t yet made their first order.

We set it up as a ratio metric and refer to the appropriate columns in our newly created cohorts table. The breakdown over time would be 'active users' divided by 'created users'. By pre-aggregating the data, we have full control and can, for example, remove deleted users from the time they were deleted, or decide to calculate cumulative numbers.

In this example the granularity is monthly

Advanced cohorts table (example)

Table: monthly_customer_retention

customer_registration_dateorder_monthno_of_ordering_customersno_of_registered_customers
2024-01-012024-01-01116118
2024-01-012024-02-01103118
............
2024-02-012024-02-01107109

Advanced retention cohorts definition (example)

Define advanced retention metric

Explore the metric (example)

Explore advanced retention metric

Cohort only

Enable the Cohorts Only option for metrics that are only used for cohort analysis. An example of this could be a user retention metric. This option disables all other visualization types (line, bar, rank and pivot table) for the metric.

Data source support

Cohorts support in dbt cloud and Cube

Support for cohorts in metrics from a semantic layer integration is planned and will be released later this year.

Supported

Cohorts functionality is currently supported using any of the following data sources:

  • Athena
  • BigQuery
  • Databricks
  • PostgreSQL
  • Redshift
  • Snowflake

Not currently supported

The following data sources are currently not supported for defining cohorts:

  • MySQL
  • Microsoft SQL Server
  • Synapse SQL