Learn how Neon compares to Aurora Serverless v2 - TL;DR: faster cold starts, responsive autoscaling, 80% lower costs

Postgres lead() window function

Use lead() to access values from subsequent rows in a result set

The lead() function in Postgres is a window function that allows you to access values from subsequent rows in a result set without the need for a self-join.

It's useful for comparing values between the current row and a later row, for example, when calculating the time until the next event, determining the next event in a sequence, or analyzing trends in time series data.

Try it on Neon!

Neon is Serverless Postgres built for the cloud. Explore Postgres features and functions in our user-friendly SQL editor. Sign up for a free account to get started.

Sign Up

Function signature

The lead() function has the following forms:

lead(value any [, offset integer [, default any ]]) over (...)
  • value: The value to return from the subsequent row. This can be a column, expression, or subquery.
  • offset (optional): The number of rows ahead of the current row to retrieve the value from. If omitted, it defaults to 1. Must be a non-negative integer.
  • default (optional): The value to return when the offset goes beyond the scope of the window. If omitted, it defaults to null.
  • over (...): The OVER clause defines the window frame for the function. It can be an empty OVER (), or it can include a PARTITION BY and/or ORDER BY clause.

Example usage

Consider a table shipments that contains information about product shipments. We can use lead() to determine the next scheduled shipment date for each product.

WITH shipments AS (
  SELECT 1 AS product_id, date '2023-01-01' AS ship_date
  UNION ALL
  SELECT 1 AS product_id, date '2023-01-15' AS ship_date
  UNION ALL
  SELECT 2 AS product_id, date '2023-01-05' AS ship_date
  UNION ALL
  SELECT 1 AS product_id, date '2023-02-01' AS ship_date
  UNION ALL
  SELECT 2 AS product_id, date '2023-01-20' AS ship_date
)
SELECT
  product_id,
  ship_date,
  lead(ship_date) OVER (PARTITION BY product_id ORDER BY ship_date) AS next_ship_date,
  lead(ship_date) OVER (PARTITION BY product_id ORDER BY ship_date) - ship_date AS days_until_next_shipment
FROM shipments;

This query calculates the next shipment date (next_ship_date) and the number of days until the next shipment (days_until_next_shipment) for each product. The OVER clause partitions the data by product_id and orders it by ship_date within each partition.

product_id | ship_date  | next_ship_date | days_until_next_shipment
------------+------------+----------------+--------------------------
          1 | 2023-01-01 | 2023-01-15     |                       14
          1 | 2023-01-15 | 2023-02-01     |                       17
          1 | 2023-02-01 |                |
          2 | 2023-01-05 | 2023-01-20     |                       15
          2 | 2023-01-20 |                |
(5 rows)

You can also use lead() to access values from rows further ahead by specifying an offset. For example, to compute the net return on investment for a stock ticker over each 2-year period:

WITH stock_prices AS (
  SELECT 'AAPL' AS ticker, date '2018-01-01' AS price_date, 41.54 AS price
  UNION ALL
  SELECT 'AAPL' AS ticker, date '2019-01-01' AS price_date, 39.48 AS price
  UNION ALL
  SELECT 'AAPL' AS ticker, date '2020-01-01' AS price_date, 74.60 AS price
  UNION ALL
  SELECT 'AAPL' AS ticker, date '2021-01-01' AS price_date, 131.96 AS price
  UNION ALL
  SELECT 'AAPL' AS ticker, date '2022-01-01' AS price_date, 182.01 AS price
  UNION ALL
  SELECT 'AAPL' AS ticker, date '2023-01-01' AS price_date, 129.93 AS price
)
SELECT
  ticker,
  price_date,
  price,
  lead(price, 2) OVER (PARTITION BY ticker ORDER BY price_date) AS price_2_years_later,
  round(100.0 * (lead(price, 2) OVER (PARTITION BY ticker ORDER BY price_date) - price) / price, 2) AS two_year_return_pct
FROM stock_prices;

This query calculates the price of each stock ticker 2 years later (price_2_years_later) and the percentage return on investment (two_year_return_pct) for each ticker. The OVER clause partitions the data by ticker and orders it by price_date within each partition.

ticker | price_date | price  | price_2_years_later | two_year_return_pct
--------+------------+--------+---------------------+---------------------
 AAPL   | 2018-01-01 |  41.54 |               74.60 |               79.59
 AAPL   | 2019-01-01 |  39.48 |              131.96 |              234.25
 AAPL   | 2020-01-01 |  74.60 |              182.01 |              143.98
 AAPL   | 2021-01-01 | 131.96 |              129.93 |               -1.54
 AAPL   | 2022-01-01 | 182.01 |                     |
 AAPL   | 2023-01-01 | 129.93 |                     |
 (6 rows)

Advanced examples

Using lead() with a default value

When the offset in lead() goes beyond the end of the window frame, it returns null by default. You can specify a default value to use instead, so the resulting column does not contain nulls.

WITH tasks AS (
  SELECT 1 AS project_id, 1 AS task_id, date '2023-01-01' AS start_date, date '2023-01-05' AS end_date
  UNION ALL
  SELECT 1 AS project_id, 2 AS task_id, date '2023-01-07' AS start_date, date '2023-01-10' AS end_date
  UNION ALL
  SELECT 1 AS project_id, 3 AS task_id, date '2023-01-10' AS start_date, date '2023-01-15' AS end_date
  UNION ALL
  SELECT 2 AS project_id, 1 AS task_id, date '2023-01-01' AS start_date, date '2023-01-10' AS end_date
  UNION ALL
  SELECT 2 AS project_id, 2 AS task_id, date '2023-01-11' AS start_date, date '2023-01-20' AS end_date
)
SELECT
  project_id,
  task_id,
  start_date,
  end_date,
  lead(start_date, 1, end_date) OVER (PARTITION BY project_id ORDER BY start_date) AS next_start_date
FROM tasks;

This query determines the start date of the next task in each project. For the last task in each project, where there is no next start date, it uses the current task's end date as the default value.

project_id | task_id | start_date |  end_date  | next_start_date
------------+---------+------------+------------+-----------------
          1 |       1 | 2023-01-01 | 2023-01-05 | 2023-01-07
          1 |       2 | 2023-01-07 | 2023-01-10 | 2023-01-10
          1 |       3 | 2023-01-10 | 2023-01-15 | 2023-01-15
          2 |       1 | 2023-01-01 | 2023-01-10 | 2023-01-11
          2 |       2 | 2023-01-11 | 2023-01-20 | 2023-01-20
(5 rows)

Using lead() with multiple partitions

You can use lead() with multiple partitions to perform calculations within different groups of rows simultaneously.

WITH readings AS (
  SELECT 1 AS device_id, date '2023-01-01' AS reading_date, 25.5 AS temperature
  UNION ALL
  SELECT 1 AS device_id, date '2023-01-02' AS reading_date, 26.0 AS temperature
  UNION ALL
  SELECT 2 AS device_id, date '2023-01-01' AS reading_date, 22.1 AS temperature
  UNION ALL
  SELECT 1 AS device_id, date '2023-01-03' AS reading_date, 25.8 AS temperature
  UNION ALL
  SELECT 2 AS device_id, date '2023-01-02' AS reading_date, 21.9 AS temperature
)
SELECT
  device_id,
  reading_date,
  temperature,
  lead(temperature) OVER (PARTITION BY device_id ORDER BY reading_date) AS next_temperature,
  lead(temperature) OVER (PARTITION BY device_id ORDER BY reading_date) - temperature AS temperature_change
FROM readings;

This query calculates the next temperature reading (next_temperature) and the change in temperature (temperature_change) for each device. The OVER clause partitions the data by device_id and orders it by reading_date within each partition, allowing the analysis to be performed separately for each device.

device_id | reading_date | temperature | next_temperature | temperature_change
-----------+--------------+-------------+------------------+--------------------
         1 | 2023-01-01   |        25.5 |             26.0 |                0.5
         1 | 2023-01-02   |        26.0 |             25.8 |               -0.2
         1 | 2023-01-03   |        25.8 |                  |
         2 | 2023-01-01   |        22.1 |             21.9 |               -0.2
         2 | 2023-01-02   |        21.9 |                  |
(5 rows)

Additional considerations

Correctness

The lead() function relates each row in the result set to a subsequent row in the same window frame. If the window frame is not explicitly defined, the default frame is the entire partition or result set. Make sure to specify the correct ORDER BY and PARTITION BY clauses to ensure the desired behavior.

Performance implications

Window functions like lead() perform calculations across a set of rows defined by the OVER clause. This can be computationally expensive, especially for large datasets or complex window definitions.

To optimize performance, make sure to:

  • Include an ORDER BY clause in the OVER clause to avoid sorting the entire dataset.
  • Use partitioning (PARTITION BY) to divide the data into smaller chunks when possible.
  • Create appropriate indexes on the columns used in the OVER clause.

Alternative functions

  • lag - Access values from previous rows in a result set. Similar to lead() but looks behind in the partition instead of ahead.
  • first_value() - Get the first value within a window frame.
  • last_value() - Get the last value within a window frame.

Resources

Last updated on

Was this page helpful?