Distributed Transactions: Part 1

5 min read · Dec 1, 2023

In this series, we’ll look at two important parts of service-oriented architecture. First, we’ll learn about how micro-services talk to each other, looking at things like sync vs. async communication and choosing between orchestration and choreography. Then, we’ll discuss different ways to handle distributed transactions in micro-services.

Communication:

The common view is that micro-service communication is straightforward with an event-driven architecture. However, this decision depends on factors like the number of error conditions, atomic vs. eventual consistency, orchestration vs. choreography, and so on. If there are many error conditions, leading to complex back-and-forth communication, an event-driven approach might not be ideal, whereas orchestration could simplify the workflow significantly.

Another significant issue with queues and event-driven architecture is their inability to manage load effectively (load shedding). In synchronous communication, services under heavy load can start rejecting requests, thus controlling the traffic. However, with queues, requests are not rejected but accumulate, leading to slowed services. This slowdown can have a cascading effect, potentially impairing the performance of downstream services as well.
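To make the difference concrete, here is a small simulation sketch in Python. It is not taken from the project; the function name `simulate`, the tick-based model, and the numbers are assumptions chosen purely for illustration. With no capacity limit (queue-like behaviour) the backlog keeps growing, while a capacity limit (a service that rejects requests when busy) keeps the backlog bounded at the cost of rejected requests.

```python
def simulate(arrivals_per_tick, served_per_tick, ticks, capacity=None):
    """Return (backlog, rejected) after running a toy workload.

    capacity=None models an unbounded queue: nothing is ever rejected.
    capacity=N models a service that sheds load once N requests are waiting.
    """
    backlog = 0
    rejected = 0
    for _ in range(ticks):
        for _ in range(arrivals_per_tick):
            if capacity is not None and backlog >= capacity:
                rejected += 1  # load shedding: caller gets an immediate error
            else:
                backlog += 1   # request is accepted and waits its turn
        # The service works through as much of the backlog as it can.
        backlog -= min(backlog, served_per_tick)
    return backlog, rejected


if __name__ == "__main__":
    print("unbounded queue:", simulate(10, 6, 5))
    print("load shedding:  ", simulate(10, 6, 5, capacity=8))
```

In the unbounded case every request is eventually served, but each new request waits behind a growing backlog, which is the slowdown described above.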

Transactions

What about handling transactions in micro-services? How will you manage errors? Will you use compensating transactions or rely on workflow states and customer support? Do you need atomic transactions, or is eventual consistency good enough? If your business needs atomic rollbacks, but you’re using event-driven architecture, consider how this affects you. Who will handle retries: domain services or a central orchestrator?

Also, consider how you’ll create a current snapshot of an entity. Is it okay to gather data from multiple services, or do you prefer it all in one place? If you want it in one place, does this mean you have to use orchestration instead of just choreography? These decisions impact how your system works and aligns with your business needs.

What are we building

We will develop a system for a town’s network of parking facilities. This system will manage parking sessions, including initiating and ending sessions, calculating parking fees, and processing payments.

Workflow:

A vehicle owner starts a parking session upon entering a parking facility.
The Session Management Service records the session details, including the spot number and entry time.
When the session ends (the vehicle leaves), the Pricing Service calculates the fee based on the total time parked.
The owner is charged when the session ends.

First Implementation

For our first implementation, we will start simple. Simple here means synchronous communication between services (REST API calls), atomic transactions, and choreography for coordination.

We are going to develop backend services for this. There are primarily three services involved.

Services:

Session Management Service: Manages the start and end of parking sessions, tracking which vehicle is using which parking spot.
Pricing Service: Calculates the parking fee based on factors like duration of parking and the location of the parking spot.
Payment Service: Handles payment transactions after the parking session, supporting basic payment methods like credit cards and digital wallets.

In our first version, the system will have the following characteristics:

Communication: Synchronous
Consistency: Atomic
Coordination: Choreography

So the flow would look like this:

Saga state machine

Let’s also look at the state transitions.

The session starts in the NOT_STARTED state and transitions to ACTIVE when it begins.

An ACTIVE session transitions to ENDED upon completion.
An ENDED session can either become PRICED if pricing is successful or roll back to ACTIVE if pricing fails.
A PRICED session moves to PAID upon successful payment or rolls back to ACTIVE if payment fails.
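The transitions above can be written down as a small lookup table. This is a minimal Python sketch rather than code from the repo: the state names come from the list above, while the event names (`start`, `end`, `pricing_succeeded`, and so on) and the `next_state` helper are hypothetical.

```python
from enum import Enum


class SessionState(Enum):
    NOT_STARTED = "NOT_STARTED"
    ACTIVE = "ACTIVE"
    ENDED = "ENDED"
    PRICED = "PRICED"
    PAID = "PAID"


# (current state, event) -> next state. Event names are assumptions.
TRANSITIONS = {
    (SessionState.NOT_STARTED, "start"): SessionState.ACTIVE,
    (SessionState.ACTIVE, "end"): SessionState.ENDED,
    (SessionState.ENDED, "pricing_succeeded"): SessionState.PRICED,
    (SessionState.ENDED, "pricing_failed"): SessionState.ACTIVE,
    (SessionState.PRICED, "payment_succeeded"): SessionState.PAID,
    (SessionState.PRICED, "payment_failed"): SessionState.ACTIVE,
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.name} on {event!r}") from None


if __name__ == "__main__":
    state = SessionState.NOT_STARTED
    for event in ["start", "end", "pricing_succeeded", "payment_failed"]:
        state = next_state(state, event)
        print(event, "->", state.name)
```

Keeping the table in one place makes illegal moves (for example, paying for a session that was never priced) fail loudly instead of silently corrupting the session.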

This model focuses on the core states of the session lifecycle, simplifying the handling of pricing and payment outcomes.

It’s generally beneficial to create a table for your system that outlines different states. Some states are central to your business domain, while others are part of saga or workflow management. In this architecture, we’re not using a Saga state machine; instead, we rely on rollbacks. Each service can roll back to its initial state, eliminating the need for state management. If you’re aiming for atomic transactions across all services, state is usually not a concern. This is because all transactions either execute successfully together, or they all roll back together.
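As a rough illustration of the rollback idea, here is an in-memory Python sketch. It does not mirror the repo: the class names, the call chain (sessions calls pricing, pricing calls payments), and the flat `RATE_PER_HOUR` are all assumptions, and plain method calls stand in for the REST calls. Each service undoes its own local write when the call it made downstream fails, then passes the failure upstream.

```python
class PaymentFailed(Exception):
    pass


class PaymentService:
    def __init__(self, fail=False):
        self.fail = fail      # toggle to simulate a declined payment
        self.charges = {}

    def charge(self, session_id, amount):
        if self.fail:
            raise PaymentFailed(session_id)
        self.charges[session_id] = amount


class PricingService:
    RATE_PER_HOUR = 2.0  # hypothetical flat rate

    def __init__(self, payments):
        self.payments = payments
        self.prices = {}

    def price_and_charge(self, session_id, hours):
        amount = hours * self.RATE_PER_HOUR
        self.prices[session_id] = amount
        try:
            self.payments.charge(session_id, amount)
        except Exception:
            del self.prices[session_id]  # roll back the local write
            raise
        return amount


class SessionService:
    def __init__(self, pricing):
        self.pricing = pricing
        self.sessions = {}

    def start(self, session_id):
        self.sessions[session_id] = "ACTIVE"

    def end(self, session_id, hours):
        self.sessions[session_id] = "ENDED"
        try:
            self.pricing.price_and_charge(session_id, hours)
        except Exception:
            self.sessions[session_id] = "ACTIVE"  # roll back to the prior state
            return False
        self.sessions[session_id] = "PAID"
        return True


if __name__ == "__main__":
    sessions = SessionService(PricingService(PaymentService(fail=True)))
    sessions.start("s1")
    print(sessions.end("s1", hours=3), sessions.sessions["s1"])
```

Because every step either completes or is undone before the caller gets a response, nothing needs to remember which step the workflow had reached.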

In many situations, atomic distributed transactions aren’t the best approach. I suggest reading the blog “Starbucks Doesn’t Use Two-Phase Commit (2PC)”. If your system can handle being eventually consistent, you can achieve greater scalability and fault tolerance with saga states. Take the example mentioned earlier: your payment service could be eventually consistent, allowing for retries in processing payments. Using atomic constraints imposes strict limitations on the system’s flow, which can be a drawback in certain scenarios.
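For contrast, a retrying payment step under eventual consistency might look roughly like the Python sketch below. The helper name `charge_with_retries`, the choice of `ConnectionError` as the retryable failure, and the exponential backoff are assumptions made for illustration, not something the repo prescribes.

```python
import time


def charge_with_retries(charge, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `charge()` until it succeeds or `attempts` is exhausted.

    Only ConnectionError is treated as retryable here (an assumption);
    any other exception propagates immediately.
    """
    for attempt in range(1, attempts + 1):
        try:
            return charge()
        except ConnectionError:
            if attempt == attempts:
                raise  # give up: the saga would record a failed payment
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_charge():
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("payment provider unreachable")
        return "charged"

    print(charge_with_retries(flaky_charge, base_delay=0.0), "after", calls["n"], "calls")
```

While the retries are in flight the session sits in an intermediate state, which is exactly why this style needs saga states and the atomic version does not.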

Show me the Code:

Here is the GitHub repo for this implementation:
https://github.com/naveen-negi/PhoneTagSaga/tree/0380f664d215c6b139566261dac930859d79d542

Please go through the project README to run it locally.

Project Setup:

Our docker-compose.yml file orchestrates multiple services, essential for the seamless operation of our application. Let's break down each service and its function:

1. Sessions API (`sessions-api`):

Build Context: The Dockerfile is located at src/Sessions/Sessions.API. The service is accessible on port 5056.

2. Product Pricing API (`productpricing-api`):

Build Context: This service is built using the Dockerfile found in src/ProductPricing/ProductPricing.API. You can access it on port 5055.

3. Payments API (`payments-api`):

Build Context: Constructed from the Dockerfile in src/Payments/Payments.API. It’s available on port 9011.

4. Database Services:

Multiple Instances: We have three PostgreSQL instances, namely sessions-db, payments-db, and productpricing-db. These databases are accessible on ports 5436 (sessions-db), 5435 (payments-db), and 5437 (productpricing-db).

5. Jaeger Tracing (`jaeger`):

Purpose: Jaeger is for tracing and monitoring services. Its UI is accessible on port 16686, allowing for real-time monitoring and analysis.
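Once the stack is up, a quick way to see which of these ports are actually listening is a small check script. This Python sketch is not part of the repo: the port numbers are the ones listed above, while the helper names and the assumption that everything is published on `localhost` are mine.

```python
import socket

# Host ports as listed above; the service names mirror the compose services.
PORTS = {
    "sessions-api": 5056,
    "productpricing-api": 5055,
    "payments-api": 9011,
    "sessions-db": 5436,
    "payments-db": 5435,
    "productpricing-db": 5437,
    "jaeger": 16686,
}


def is_listening(port, host="localhost", timeout=0.5):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_stack(ports=PORTS):
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, up in check_stack().items():
        print(f"{name:<20} {'up' if up else 'down'}")
```

An open port only shows that something is listening there; it does not prove the service behind it is healthy.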

In the next post, we will change the consistency from atomic to eventual and the coordination from choreography to orchestration. We will keep the communication synchronous. This pattern leads to fairly low complexity but high coupling. Apparently, it is also a popular choice among architects.