Retry resilience strategy

About


The retry reactive resilience strategy re-executes the same callback method if its execution fails. Failure can be either an Exception or a result object indicating unsuccessful processing. Between the retry attempts the retry strategy waits a specified amount of time. You have fine-grained control over how to calculate the next delay. The retry strategy stops invoking the same callback when it reaches the maximum allowed number of retry attempts or an unhandled exception is thrown / result object indicating a failure is returned.

Usage

// Retry using the default options.
// See https://www.pollydocs.org/strategies/retry#defaults for defaults.
var optionsDefaults = new RetryStrategyOptions();

// For instant retries with no delay
var optionsNoDelay = new RetryStrategyOptions
{
    Delay = TimeSpan.Zero
};

// For advanced control over the retry behavior, including the number of attempts,
// delay between retries, and the types of exceptions to handle.
var optionsComplex = new RetryStrategyOptions
{
    ShouldHandle = new PredicateBuilder().Handle<SomeExceptionType>(),
    BackoffType = DelayBackoffType.Exponential,
    UseJitter = true,  // Adds a random factor to the delay
    MaxRetryAttempts = 4,
    Delay = TimeSpan.FromSeconds(3),
};

// To use a custom function to generate the delay for retries
var optionsDelayGenerator = new RetryStrategyOptions
{
    MaxRetryAttempts = 2,
    DelayGenerator = static args =>
    {
        var delay = args.AttemptNumber switch
        {
            0 => TimeSpan.Zero,
            1 => TimeSpan.FromSeconds(1),
            _ => TimeSpan.FromSeconds(5)
        };

        // This example uses a synchronous delay generator,
        // but the API also supports asynchronous implementations.
        return new ValueTask<TimeSpan?>(delay);
    }
};

// To extract the delay from the result object
var optionsExtractDelay = new RetryStrategyOptions<HttpResponseMessage>
{
    DelayGenerator = static args =>
    {
        if (args.Outcome.Result is HttpResponseMessage responseMessage &&
            TryGetDelay(responseMessage, out TimeSpan delay))
        {
            return new ValueTask<TimeSpan?>(delay);
        }

        // Returning null means the retry strategy will use its internal delay for this attempt.
        return new ValueTask<TimeSpan?>((TimeSpan?)null);
    }
};

// To get notifications when a retry is performed
var optionsOnRetry = new RetryStrategyOptions
{
    MaxRetryAttempts = 2,
    OnRetry = static args =>
    {
        Console.WriteLine("OnRetry, Attempt: {0}", args.AttemptNumber);

        // Event handlers can be asynchronous; here, we return an empty ValueTask.
        return default;
    }
};

// To keep retrying indefinitely or until success use int.MaxValue.
var optionsIndefiniteRetry = new RetryStrategyOptions
{
    MaxRetryAttempts = int.MaxValue,
};

// Add a retry strategy with a RetryStrategyOptions{<TResult>} instance to the pipeline
new ResiliencePipelineBuilder().AddRetry(optionsDefaults);
new ResiliencePipelineBuilder<HttpResponseMessage>().AddRetry(optionsExtractDelay);

Defaults

Property Default Value Description
ShouldHandle Any exceptions other than OperationCanceledException. Defines a predicate to determine what results and/or exceptions are handled by the retry strategy.
MaxRetryAttempts 3 The maximum number of retry attempts to use, in addition to the original call.
BackoffType Constant The back-off algorithm type to generate the delay(s) between retry attempts.
Delay 2 seconds The base delay between retry attempts. See the next section for more details.
MaxDelay null If provided then the strategy caps the calculated retry delay to this value.
UseJitter False If set to true, a jitter (random value) is added to retry delays. See the next section for more details.
DelayGenerator null This optional delegate allows you to dynamically calculate the retry delay by utilizing information that is only available at runtime (like the attempt number).
OnRetry null If provided then it will be invoked before the strategy delays the next attempt.

Telemetry

The retry strategy reports the following telemetry events:

Event Name Event Severity When?
ExecutionAttempt Information / Warning Just before the strategy calculates the next delay
OnRetry Warning Just before the strategy calls the OnRetry delegate

Here are some sample events:

Unhandled case

If the retry strategy does not perform any retries then the reported telemetry events' severity will be Information:

Execution attempt. Source: 'MyPipeline/MyPipelineInstance/MyRetryStrategy', Operation Key: 'MyRetryableOperation', Result: '1', Handled: 'False', Attempt: '0', Execution Time: 110.952ms

Execution attempt. Source: 'MyPipeline/MyPipelineInstance/MyRetryStrategy', Operation Key: 'MyRetryableOperation', Result: 'Failed', Handled: 'False', Attempt: '0', Execution Time: 5.2194ms
    System.Exception: Failed
        at Program.<>c.<Main>b__0_1(ResilienceContext ctx)
        ...
        at Polly.ResiliencePipeline.<>c.<<ExecuteAsync>b__1_0>d.MoveNext() in /_/src/Polly.Core/ResiliencePipeline.Async.cs:line 67

Handled case

If the retry strategy performs some retries then the reported telemetry events' severity will be Warning:

Execution attempt. Source: 'MyPipeline/MyPipelineInstance/MyRetryStrategy', Operation Key: 'MyRetryableOperation', Result: 'Failed', Handled: 'True', Attempt: '0', Execution Time: 5.0397ms
      System.Exception: Failed
         at Program.<>c.<Main>b__0_1(ResilienceContext ctx)
         ...
         at Polly.ResiliencePipeline.<>c.<<ExecuteAsync>b__1_0>d.MoveNext() in /_/src/Polly.Core/ResiliencePipeline.Async.cs:line 67

Resilience event occurred. EventName: 'OnRetry', Source: 'MyPipeline/MyPipelineInstance/MyRetryStrategy', Operation Key: 'MyRetryableOperation', Result: 'Failed'
    System.Exception: Failed
        at Program.<>c.<Main>b__0_1(ResilienceContext ctx)
        ...
        at Polly.ResiliencePipeline.<>c.<<ExecuteAsync>b__1_0>d.MoveNext() in /_/src/Polly.Core/ResiliencePipeline.Async.cs:line 67


Execution attempt. Source: 'MyPipeline/MyPipelineInstance/MyRetryStrategy', Operation Key: 'MyRetryableOperation', Result: 'Failed', Handled: 'True', Attempt: '1', Execution Time: 0.1159ms
      System.Exception: Failed
         at Program.<>c.<Main>b__0_1(ResilienceContext ctx)
         ...
         at Polly.ResiliencePipeline.<>c.<<ExecuteAsync>b__1_0>d.MoveNext() in /_/src/Polly.Core/ResiliencePipeline.Async.cs:line 67
Note

Please note that the OnRetry telemetry event will be reported only if the retry strategy performs any retry attempts.

On the other hand the Execution attempt event will be always reported regardless whether the strategy has to perform any retries.

Also remember that Attempt: '0' relates to the original execution attempt.

For further information please check out the telemetry page.

Calculation of the next delay

If the ShouldHandle predicate returns true and the next attempt number is not greater than MaxRetryAttempts then the retry strategy calculates the next delay.

There are many properties that may contribute to this calculation:

  • BackoffType: Specifies which calculation algorithm should run.
  • Delay: If only this property is specified then it will be used as-is. If others are also specified then this will be used as a base delay.
  • DelayGenerator: If specified, overrides other property-based calculations, except if it returns null or a negative TimeSpan, in which case the other property-based calculations are used.
  • MaxDelay: If specified, caps the delay if the calculated delay is greater than this value, except if DelayGenerator is used, where no capping is applied.
  • UseJitter: If enabled, adds a random value between -25% and +25% of the calculated Delay, except if BackoffType is Exponential, where a DecorrelatedJitterBackoffV2 formula is used for jitter calculation.
Important

The summarized description below is an implementation detail. It may change in the future without notice.

The BackoffType property's data type is the DelayBackoffType enumeration. This primarily controls how the calculation is done.

Constant

stateDiagram-v2

    state if_state_step1 <<choice>>
    state if_state_step2 <<choice>>
    state if_state_step3 <<choice>>

    constant: Delay
    constantWJitter: Delay + Random
    compare: MaxDelay < BaseDelay
    setBase: Set BaseDelay
    setNormalized: Set NormalizedDelay
    setNext: Set NextDelay

    UseJitter --> if_state_step1
    if_state_step1 --> constantWJitter:true
    if_state_step1 --> constant: false
    constantWJitter --> setBase
    constant --> setBase

    setBase --> compare
    compare --> if_state_step2
    if_state_step2 --> MaxDelay: true
    if_state_step2 --> BaseDelay: false
    MaxDelay --> setNormalized
    BaseDelay --> setNormalized

    setNormalized --> DelayGenerator
    DelayGenerator --> if_state_step3
    if_state_step3 --> GeneratedDelay: positive
    if_state_step3 --> NormalizedDelay: null or negative
    GeneratedDelay --> setNext
    NormalizedDelay --> setNext
    setNext --> [*]

Constant examples

The delays column contains an example series of five values to depict the patterns.

Settings Delays in milliseconds
Delay: 1sec [ 1000, 1000, 1000, 1000, 1000 ]
Delay: 1sec, UseJitter: true [ 986, 912, 842, 972, 1007 ]
Delay: 1sec, UseJitter: true, MaxDelay: 1100ms [ 1100, 978, 1100, 1041, 916 ]

Linear

stateDiagram-v2

    state if_state_step1 <<choice>>
    state if_state_step2 <<choice>>
    state if_state_step3 <<choice>>

    linear: Delay * AttemptNumber
    linearWJitter: (Delay * AttemptNumber) + Random
    compare: MaxDelay < BaseDelay
    setBase: Set BaseDelay
    setNormalized: Set NormalizedDelay
    setNext: Set NextDelay

    UseJitter --> if_state_step1
    if_state_step1 --> linearWJitter:true
    if_state_step1 --> linear: false
    linearWJitter --> setBase
    linear --> setBase

    setBase --> compare
    compare --> if_state_step2
    if_state_step2 --> MaxDelay: true
    if_state_step2 --> BaseDelay: false
    MaxDelay --> setNormalized
    BaseDelay --> setNormalized

    setNormalized --> DelayGenerator
    DelayGenerator --> if_state_step3
    if_state_step3 --> GeneratedDelay: positive
    if_state_step3 --> NormalizedDelay: null or negative
    GeneratedDelay --> setNext
    NormalizedDelay --> setNext
    setNext --> [*]

Linear examples

The delays column contains an example series of five values to depict the patterns.

Note

Because the jitter calculation is based on the newly calculated delay, the new delay could be less than the previous value.

Settings Delays in milliseconds
Delay: 1sec [ 1000, 2000, 3000, 4000, 5000 ]
Delay: 1sec, UseJitter: true [ 1129, 2147, 2334, 4894, 4102 ]
Delay: 1sec, UseJitter: true, MaxDelay: 4500ms [ 907, 2199, 2869, 4500, 4500 ]

Exponential

stateDiagram-v2

    state if_state_step1 <<choice>>
    state if_state_step2 <<choice>>
    state if_state_step3 <<choice>>

    exponential: Delay * 2^AttemptNumber
    exponentialWJitter: Decorrelated Jitter Backoff V2
    compare: MaxDelay < BaseDelay
    setBase: Set BaseDelay
    setNormalized: Set NormalizedDelay
    setNext: Set NextDelay

    UseJitter --> if_state_step1
    if_state_step1 --> exponentialWJitter:true
    if_state_step1 --> exponential: false
    exponentialWJitter --> setBase
    exponential --> setBase

    setBase --> compare
    compare --> if_state_step2
    if_state_step2 --> MaxDelay: true
    if_state_step2 --> BaseDelay: false
    MaxDelay --> setNormalized
    BaseDelay --> setNormalized

    setNormalized --> DelayGenerator
    DelayGenerator --> if_state_step3
    if_state_step3 --> GeneratedDelay: positive
    if_state_step3 --> NormalizedDelay: null or negative
    GeneratedDelay --> setNext
    NormalizedDelay --> setNext
    setNext --> [*]

Exponential examples

The delays column contains an example series of five values to depict the patterns.

Note

Because the jitter calculation is based on the newly calculated delay, the new delay could be less than the previous value.

Settings Delays in milliseconds
Delay: 1sec [ 1000, 2000, 4000, 8000, 16000 ]
Delay: 1sec, UseJitter: true [ 393, 1453, 4235, 5369, 16849 ]
Delay: 1sec, UseJitter: true, MaxDelay: 15000ms [ 477, 793, 2227, 5651, 15000 ]

Tip

For more details please check out the RetryHelper and the RetryResilienceStrategy classes.

Diagrams

Let's suppose we have a retry strategy with MaxRetryAttempts: 2.

Happy path sequence diagram

sequenceDiagram
    actor C as Caller
    participant P as Pipeline
    participant R as Retry
    participant D as DecoratedUserCallback

    C->>P: Calls ExecuteAsync
    P->>R: Calls ExecuteCore
    Note over R,D: Initial attempt
    R->>+D: Invokes
    D->>-R: Fails
    R-->>R: Sleeps
    Note over R,D: 1st retry attempt
    R->>+D: Invokes
    D->>-R: Returns result
    R->>P: Returns result
    P->>C: Returns result

Unhappy path sequence diagram

sequenceDiagram
    actor C as Caller
    participant P as Pipeline
    participant R as Retry
    participant D as DecoratedUserCallback

    C->>P: Calls ExecuteAsync
    P->>R: Calls ExecuteCore
    Note over R,D: Initial attempt
    R->>+D: Invokes
    D->>-R: Fails
    R-->>R: Sleeps
    Note over R,D: 1st retry attempt
    R->>+D: Invokes
    D->>-R: Fails
    R-->>R: Sleeps
    Note over R,D: 2nd retry attempt
    R->>+D: Invokes
    D->>-R: Fails
    R->>P: Propagates failure
    P->>C: Propagates failure

Patterns

Limiting the maximum delay

In some cases, you might want to set a limit on the calculated delay. This is beneficial when multiple retries are anticipated, and you wish to prevent excessive wait times between these retries.

Consider the following example of a long-running background job:

ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
    .AddRetry(new()
    {
        Delay = TimeSpan.FromSeconds(2),
        MaxRetryAttempts = int.MaxValue,
        BackoffType = DelayBackoffType.Exponential,

        // Initially, we aim for an exponential backoff, but after a certain number of retries, we set a maximum delay of 15 minutes.
        MaxDelay = TimeSpan.FromMinutes(15),
        UseJitter = true
    })
    .Build();

// Background processing
while (!cancellationToken.IsCancellationRequested)
{
    await pipeline.ExecuteAsync(async token =>
    {
        // In the event of a prolonged service outage, we can afford to wait for a successful retry since this is a background task.
        await SynchronizeDataAsync(token);
    },
    cancellationToken);

    await Task.Delay(TimeSpan.FromMinutes(30)); // The sync runs every 30 minutes.
}

Anti-patterns

Over the years, many developers have used Polly in various ways. Some of these recurring patterns may not be ideal. The sections below highlight anti-patterns to avoid.

Overusing builder methods

❌ DON'T

Overuse Handle/HandleResult:

var retry = new ResiliencePipelineBuilder()
    .AddRetry(new()
    {
        ShouldHandle = new PredicateBuilder()
        .Handle<HttpRequestException>()
        .Handle<BrokenCircuitException>()
        .Handle<TimeoutRejectedException>()
        .Handle<SocketException>()
        .Handle<RateLimitRejectedException>(),
        MaxRetryAttempts = 3,
    })
    .Build();

Reasoning:

Using multiple Handle/HandleResult methods is redundant. Instead of specifying to retry if the decorated code throws a certain exception repeatedly, it's more efficient to state that retries should occur if any of the retryable exceptions are thrown.

✅ DO

Use collections and simple predicate functions:

ImmutableArray<Type> networkExceptions = new[]
{
    typeof(SocketException),
    typeof(HttpRequestException),
}.ToImmutableArray();

ImmutableArray<Type> strategyExceptions = new[]
{
    typeof(TimeoutRejectedException),
    typeof(BrokenCircuitException),
    typeof(RateLimitRejectedException),
}.ToImmutableArray();

ImmutableArray<Type> retryableExceptions = networkExceptions
    .Union(strategyExceptions)
    .ToImmutableArray();

var retry = new ResiliencePipelineBuilder()
    .AddRetry(new()
    {
        ShouldHandle = ex => new ValueTask<bool>(retryableExceptions.Contains(ex.GetType())),
        MaxRetryAttempts = 3,
    })
    .Build();

Reasoning:

Grouping exceptions simplifies the configuration and improves reusability. For example, the networkExceptions array can be reused in various strategies such as retry, circuit breaker, and more.

Using retry for periodic execution

❌ DON'T

Use a retry strategy to run indefinitely at a specified interval:

var retry = new ResiliencePipelineBuilder()
    .AddRetry(new()
    {
        ShouldHandle = _ => ValueTask.FromResult(true),
        Delay = TimeSpan.FromHours(24),
    })
    .Build();

Reasoning:

The waiting period can be either blocking or non-blocking, based on the defined strategy/pipeline. Even when used not used in a blocking manner, it unnecessarily consumes memory that can't be reclaimed by the garbage collector.

✅ DO

Use a suitable tool to schedule recurring tasks, such as Quartz.Net, Hangfire, or others.

Reasoning:

  • Polly was not designed to support this scenario; its primary purpose is to help manage brief transient failures.
  • Specialized job scheduling tools are more memory-efficient and can be set up to withstand machine failures by using persistent storage.

Combining multiple sleep duration strategies

❌ DON'T

Mix increasing values with constant ones:

var retry = new ResiliencePipelineBuilder()
    .AddRetry(new()
    {
        DelayGenerator = args =>
        {
            var delay = args.AttemptNumber switch
            {
                <= 5 => TimeSpan.FromSeconds(Math.Pow(2, args.AttemptNumber)),
                _ => TimeSpan.FromMinutes(3)
            };
            return new ValueTask<TimeSpan?>(delay);
        }
    })
    .Build();

Reasoning:

Using this approach essentially turns the logic into a state machine. Although this offers a concise way to express sleep durations, it has several disadvantages:

  • It doesn't support reusability (for instance, you can't use only the quick retries).
  • The sleep duration logic is closely tied to the AttemptNumber.
  • Testing becomes more challenging.

✅ DO

Use two distinct retry strategy options and combine them:

var slowRetries = new RetryStrategyOptions
{
    MaxRetryAttempts = 5,
    Delay = TimeSpan.FromMinutes(3),
    BackoffType = DelayBackoffType.Constant
};

var quickRetries = new RetryStrategyOptions
{
    MaxRetryAttempts = 5,
    Delay = TimeSpan.FromSeconds(1),
    UseJitter = true,
    BackoffType = DelayBackoffType.Exponential
};

var retry = new ResiliencePipelineBuilder()
    .AddRetry(slowRetries)
    .AddRetry(quickRetries)
    .Build();

Reasoning:

  • While this method may appear more verbose than the first, it offers greater flexibility.
  • Retry strategies can be arranged in any order (either slower first and then quicker, or the other way around).
  • Different triggers can be defined for the retry strategies, allowing for switches between them based on exceptions or results. The order isn't fixed, so quick and slow retries can alternate.

Branching retry logic based on request URL

Suppose you have an HttpClient and you want to add a retry only for specific endpoints.

❌ DON'T

Use ResiliencePipeline.Empty and the ?: operator:

var retry =
    IsRetryable(request.RequestUri)
        ? new ResiliencePipelineBuilder<HttpResponseMessage>().AddRetry(new()).Build()
        : ResiliencePipeline<HttpResponseMessage>.Empty;

Reasoning:

The triggering conditions and logic are spread across different sections. This design is not ideal for extensibility since adding more conditions can make the code less readable.

✅ DO

Use the ShouldHandle clause to define the triggering logic:

var retry = new ResiliencePipelineBuilder<HttpResponseMessage>()
    .AddRetry(new()
    {
        ShouldHandle = _ => ValueTask.FromResult(IsRetryable(request.RequestUri))
    })
    .Build();

Reasoning:

  • The conditions for triggering are consolidated in a familiar and easily accessible location.
  • You don't need to specify actions for scenarios when the strategy shouldn't be triggered.

Calling a method before/after each retry attempt

❌ DON'T

Call a specific method before Execute/ExecuteAsync:

var retry = new ResiliencePipelineBuilder()
    .AddRetry(new()
    {
        OnRetry = args =>
        {
            BeforeEachAttempt();
            return ValueTask.CompletedTask;
        },
    })
    .Build();

BeforeEachAttempt();
await retry.ExecuteAsync(DoSomething);

Reasoning:

  • The OnRetry function is triggered before each retry attempt, but it doesn't activate before the initial attempt since it's not considered a retry.
  • Using this method across various parts can lead to accidentally omitting the BeforeEachAttempt call before every Execute.
  • Even though the naming here is straightforward, in real-world scenarios, your method might not start with 'Before', leading to potential misuse by calling it after the Execute.

✅ DO

Group the two method calls:

var retry = new ResiliencePipelineBuilder()
    .AddRetry(new())
    .Build();

await retry.ExecuteAsync(ct =>
{
    BeforeEachAttempt();
    return DoSomething(ct);
});

Reasoning:

If DoSomething and BeforeEachAttempt are interdependent, group them or declare a simple wrapper to invoke them in the correct sequence.

Having a single strategy for multiple failures

Suppose we have an HttpClient that issues a request and then we try to parse a large JSON response.

❌ DON'T

Use a single strategy for everything:

var builder = new ResiliencePipelineBuilder()
    .AddRetry(new()
    {
        ShouldHandle = new PredicateBuilder().Handle<HttpRequestException>(),
        MaxRetryAttempts = 3
    });

builder.AddTimeout(TimeSpan.FromMinutes(1));

var pipeline = builder.Build();
await pipeline.ExecuteAsync(static async (httpClient, ct) =>
{
    var stream = await httpClient.GetStreamAsync(new Uri("endpoint"), ct);
    var foo = await JsonSerializer.DeserializeAsync<Foo>(stream, cancellationToken: ct);
},
httpClient);

Reasoning:

Previously, it was suggested that you should combine X and Y only if they are part of the same failure domain. In simpler terms, a pipeline should address only one type of failure.

✅ DO

Define a strategy for each failure domain:

var retry = new ResiliencePipelineBuilder()
    .AddRetry(new()
    {
        ShouldHandle = new PredicateBuilder().Handle<HttpRequestException>(),
        MaxRetryAttempts = 3
    })
    .Build();

var stream = await retry.ExecuteAsync(
    static async (httpClient, ct) =>
        await httpClient.GetStreamAsync(new Uri("endpoint"), ct),
    httpClient);

var timeout = new ResiliencePipelineBuilder<Foo>()
    .AddTimeout(TimeSpan.FromMinutes(1))
    .Build();

var foo = await timeout.ExecuteAsync((ct) => JsonSerializer.DeserializeAsync<Foo>(stream, cancellationToken: ct));

Reasoning:

The failure domain of a network call is different from that of deserialization. Using dedicated strategies makes the application more resilient to various transient failures.

Cancelling retry for specific exceptions

If you encounter a TimeoutException, you may not want to retry the operation.

❌ DON'T

Embed cancellation logic within OnRetry:

var ctsKey = new ResiliencePropertyKey<CancellationTokenSource>("cts");
var retry = new ResiliencePipelineBuilder()
    .AddRetry(new()
    {
        OnRetry = async args =>
        {
            if (args.Outcome.Exception is TimeoutException)
            {
                if (args.Context.Properties.TryGetValue(ctsKey, out var cts))
                {
                    await cts.CancelAsync();
                }
            }
        }
    })
    .Build();

Reasoning:

Conditions for triggering retries should be located in ShouldHandle. Bypassing the strategy from within a user-defined delegate—either through an Exception or a CancellationToken—unnecessarily complicates the control flow.

✅ DO

Set the condition for retry within ShouldHandle:

var retry = new ResiliencePipelineBuilder()
    .AddRetry(new()
    {
        ShouldHandle = args => ValueTask.FromResult(args.Outcome.Exception is not TimeoutException)
    })
    .Build();

Reasoning:

As previously mentioned, always use the designated area to define retry conditions. Re-frame your original exit conditions to specify when a retry should be initiated.