Art2link ESB v2.02 LTS

Retry

Most delivery failures are temporary: a timeout, a brief outage, a throttled request. A send port handles them by retrying at a fixed interval, pausing for the same short time between attempts so the endpoint has room to recover instead of being hammered.

This is port-owned resilience at its simplest. You configure a retry count and the interval to wait between attempts; the port does the rest. The pause matters because retrying instantly against a struggling service just piles on load, whereas a steady gap gives a transient problem time to clear. The interval is fixed: Art2link does not widen the gap from one attempt to the next.

Figure: retry timeline. Try at 0 s (fail), +5 s (fail), +5 s (fail), +5 s (success → ack). The interval between attempts is always the same; on reaching the limit, the message is dead-lettered.
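The behaviour described above can be sketched in a few lines of Python. This illustrates fixed-interval retry in general; it is not Art2link's implementation, where the same effect comes from configuration. The names `send_with_fixed_retry` and `DeliveryFailed`, the parameter names, and the default values are all hypothetical, and the `sleep` argument exists only so the pause can be swapped out in tests.

```python
import time
from typing import Callable


class DeliveryFailed(Exception):
    """Raised when the first attempt and every retry have failed (hypothetical name)."""


def send_with_fixed_retry(
    send: Callable[[], str],
    retries: int = 3,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send(); on failure wait interval_s and try again, up to `retries` times."""
    if retries < 0:
        raise ValueError("retries must be zero or more")
    attempts = retries + 1  # one initial attempt plus the configured retries
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except Exception as exc:
            if attempt == attempts:
                # Limit reached: hand the failure to whatever handles dead letters.
                raise DeliveryFailed(f"gave up after {attempts} attempts") from exc
            sleep(interval_s)  # same pause every time; the gap never widens
    raise AssertionError("unreachable")
```

With `retries=3` and `interval_s=5.0`, a call that fails three times and then succeeds reproduces the timeline in the figure: three pauses of five seconds, then an acknowledgement.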

Retry is the first line of defence, but it does not stand alone. When failures persist past the limit, the message is dead-lettered; when a whole endpoint is clearly down, a circuit breaker stops wasting attempts; and because a retried delivery can be applied twice, receivers should be idempotent.
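To make the idempotency point concrete, here is a minimal sketch of a receiver that ignores a message it has already applied. It assumes every message carries a stable identifier; the class name `IdempotentReceiver`, the `message_id` field and the in-memory set are hypothetical, and a real receiver would keep the seen identifiers in durable storage.

```python
class IdempotentReceiver:
    """Applies each message at most once, keyed on a message identifier."""

    def __init__(self) -> None:
        self._seen: set[str] = set()  # in-memory only; assumed durable in practice
        self.balance = 0

    def handle(self, message_id: str, amount: int) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self._seen:
            return False  # a retried delivery: acknowledge it, but do not re-apply
        self.balance += amount
        self._seen.add(message_id)
        return True
```

If the sender retries because an acknowledgement was lost, the second `handle` call with the same `message_id` returns `False` and leaves the balance untouched.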

In Art2link this first line of defence sits at the adapter level. Most send adapters (the SQL Caller, for instance) expose two settings: Retries, how many times to re-attempt a failed call, and Retry interval, how long to wait between attempts. Set them on the port’s adapter and the runtime retries the delivery for you. It is configuration, not code.

When the retries are exhausted and the adapter gives up, a more managed recovery takes over if Exception is enabled on the port: the message is re-published under the configured exception message type, payload intact. From there it can be routed and parked for a later, longer-horizon attempt — store-and-forward — or handled some other way, rather than simply failing. So the per-attempt retries live on the adapter; the escalation and dead-letter policy live in the exception path.
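The hand-off from adapter retries to the exception path can be sketched as follows. This is a conceptual model only: `Message`, `deliver`, the `publish` callback and the example type names are hypothetical stand-ins, not Art2link APIs, and the wait between attempts is left out to keep the focus on the escalation step.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Message:
    message_type: str
    payload: bytes


def deliver(
    message: Message,
    send: Callable[[Message], None],
    publish: Callable[[Message], None],
    retries: int,
    exception_type: Optional[str],
) -> str:
    """Try the send; once retries are exhausted, escalate if an exception type is set."""
    if retries < 0:
        raise ValueError("retries must be zero or more")
    last_error: Optional[Exception] = None
    for _ in range(retries + 1):  # adapter level: the per-attempt retries
        try:
            send(message)
            return "delivered"
        except Exception as exc:
            last_error = exc
    if exception_type is None:
        # Exception handling not enabled: the delivery simply fails.
        assert last_error is not None
        raise last_error
    # Exception path: same payload, re-published under the exception message type.
    publish(replace(message, message_type=exception_type))
    return "escalated"
```

Whatever subscribes to the exception message type then decides what happens next, for example parking the message for a later store-and-forward attempt.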

Separate transient from permanent. Retrying a timeout is sensible; retrying a “400 bad request” just delays the inevitable. Where you can tell them apart, send permanent failures straight to dead-letter instead of burning the retry budget.
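One way to make that distinction for an HTTP endpoint is to classify the response status before deciding what to do. The sketch below is an assumption-laden example: the set of status codes treated as transient is a common choice rather than a rule, and `TRANSIENT_STATUS`, `is_transient` and `next_step` are hypothetical names.

```python
# Status codes treated as transient here: request timeout, throttling,
# and gateway or availability errors. Adjust per endpoint.
TRANSIENT_STATUS = frozenset({408, 429, 502, 503, 504})


def is_transient(status: int) -> bool:
    """True if the failure is likely to clear on its own."""
    return status in TRANSIENT_STATUS


def next_step(status: int, attempts_left: int) -> str:
    """Decide between acknowledging, retrying, and dead-lettering."""
    if 200 <= status < 300:
        return "ack"
    if is_transient(status) and attempts_left > 0:
        return "retry"
    # Permanent failure, or retry budget spent: stop trying.
    return "dead-letter"
```

Under these assumptions a 400 goes straight to dead-letter even with retries remaining, while a 503 is retried until the budget runs out.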