Distributed Transactions: Part 1
In this series, we’ll look at two important parts of service-oriented architecture. First, we’ll learn about how micro-services talk to each other, looking at things like sync vs. async communication and choosing between orchestration and choreography. Then, we’ll discuss different ways to handle distributed transactions in micro-services.
Communication:
The common view is that micro-service communication is straightforward with an event-driven architecture. However, this decision depends on factors like the number of error conditions, atomic vs. eventual consistency, orchestration vs. choreography, and so on. If there are many error conditions, leading to complex back-and-forth communication, an event-driven approach might not be ideal; orchestration could simplify the workflow significantly.
Another significant issue with queues and event-driven architecture is their inability to manage load effectively (load shedding). In synchronous communication, services under heavy load can start rejecting requests, thus controlling the traffic. However, with queues, requests are not rejected but accumulate, leading to slowed services. This slowdown can have a cascading effect, potentially impairing the performance of downstream services as well.
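To make the load-shedding contrast concrete, here is a minimal sketch. The class names, the in-flight limit, and the burst size are all hypothetical and not taken from the project; the simulation also assumes no request completes during the burst, which is the worst case.

```python
from __future__ import annotations

from collections import deque


class SyncService:
    """Rejects work beyond a fixed number of in-flight requests (load shedding)."""

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def handle(self, request: str) -> bool:
        # Under heavy load the caller gets an immediate rejection
        # (for example an HTTP 503) and can back off.
        if self.in_flight >= self.max_in_flight:
            return False
        self.in_flight += 1
        return True


class QueuedService:
    """Accepts everything; pending work piles up in an unbounded queue."""

    def __init__(self) -> None:
        self.backlog: deque[str] = deque()

    def handle(self, request: str) -> bool:
        # Nothing is rejected, so the producer never learns the consumer is behind.
        self.backlog.append(request)
        return True


def simulate(burst: int, capacity: int) -> tuple[int, int]:
    """Send `burst` requests to both services; return (rejected, backlog size)."""
    sync_service = SyncService(max_in_flight=capacity)
    queued_service = QueuedService()
    rejected = 0
    for i in range(burst):
        request = f"req-{i}"
        if not sync_service.handle(request):
            rejected += 1
        queued_service.handle(request)
    return rejected, len(queued_service.backlog)
```

With a burst of 10 requests and a capacity of 3, the synchronous service turns away 7 callers right away, while the queued service silently holds all 10 and falls further behind.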
Transactions
What about handling transactions in micro-services? How will you manage errors? Will you use compensating transactions or rely on workflow states and customer support? Do you need atomic transactions, or is eventual consistency good enough? If your business needs atomic rollbacks but you’re using an event-driven architecture, consider how this affects you. Who will handle retries: domain services or a central orchestrator?
Also, consider how you’ll create a current snapshot of an entity. Is it okay to gather data from multiple services, or do you prefer it all in one place? If you want it in one place, does this mean you have to use orchestration instead of just choreography? These decisions impact how your system works and aligns with your business needs.
What are we building
We will develop a system for a town’s network of parking facilities. This system will manage parking sessions, including initiating and ending sessions, calculating parking fees, and processing payments.
Workflow:
- A vehicle owner starts a parking session upon entering a parking facility.
- The Session Management Service records the session details, including the spot number and entry time.
- When the session ends (the vehicle leaves), the Pricing Service calculates the fee based on the total time parked.
- The owner is charged when the session ends.
First Implementation
As our first implementation, we will start simple. Simple here means synchronous communication between services (REST API calls), atomic transactions, and choreography for coordination.
We are going to develop backend services for this. There are three main services involved.
Services:
- Session Management Service: Manages the start and end of parking sessions, tracking which vehicle is using which parking spot.
- Pricing Service: Calculates the parking fee based on factors like duration of parking and the location of the parking spot.
- Payment Service: Handles payment transactions after the parking session, supporting basic payment methods like credit cards and digital wallets.
In our first version, our system will have the following features.
- Communication: Synchronous
- Consistency: Atomic
- Coordination: Choreography
So the flow would look like the one below.
Saga state machine
Let’s also look at the state transitions.
The session starts in the NOT_STARTED state and transitions to ACTIVE when it begins.
- An ACTIVE session transitions to ENDED upon completion.
- An ENDED session can either become PRICED if pricing is successful or roll back to ACTIVE if pricing fails.
- A PRICED session moves to PAID upon successful payment or rolls back to ACTIVE if payment fails.
This model focuses on the core states of the session lifecycle, simplifying the handling of pricing and payment outcomes.
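The transitions listed above can be written down as a small lookup table. This is only an illustrative sketch; the `SessionState` enum and the `transition` helper are hypothetical names and are not taken from the repository.

```python
from enum import Enum


class SessionState(Enum):
    NOT_STARTED = "NOT_STARTED"
    ACTIVE = "ACTIVE"
    ENDED = "ENDED"
    PRICED = "PRICED"
    PAID = "PAID"


# Allowed transitions, copied from the list of state changes above.
TRANSITIONS = {
    SessionState.NOT_STARTED: {SessionState.ACTIVE},
    SessionState.ACTIVE: {SessionState.ENDED},
    # Pricing succeeded, or pricing failed and the session rolls back.
    SessionState.ENDED: {SessionState.PRICED, SessionState.ACTIVE},
    # Payment succeeded, or payment failed and the session rolls back.
    SessionState.PRICED: {SessionState.PAID, SessionState.ACTIVE},
    SessionState.PAID: set(),
}


def transition(current: SessionState, target: SessionState) -> SessionState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Keeping the table in one place makes an illegal move, such as going from ACTIVE straight to PAID, fail loudly instead of silently corrupting the session.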
It’s generally beneficial to create a table for your system that outlines the different states. Some states are central to your business domain, while others are part of saga or workflow management. In this architecture, we’re not using a saga state machine; instead, we rely on rollbacks. Each service can roll back to its initial state, eliminating the need for state management. If you’re aiming for atomic transactions across all services, state is usually not a concern, because all transactions either succeed together or roll back together.
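As a rough in-memory sketch of the rollback idea, assume each service calls the next one synchronously and undoes its own write when the downstream call fails. The class names, the nested call chain, and the hourly rate are assumptions for illustration only; the real services communicate over REST and persist to their own databases.

```python
from __future__ import annotations


class PaymentFailed(Exception):
    """Raised when the (hypothetical) payment provider declines a charge."""


class PaymentService:
    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.charges: dict[str, float] = {}

    def charge(self, session_id: str, amount: float) -> None:
        if self.fail:
            raise PaymentFailed(session_id)
        self.charges[session_id] = amount


class PricingService:
    HOURLY_RATE = 2.0  # hypothetical rate, not from the project

    def __init__(self, payments: PaymentService) -> None:
        self.payments = payments
        self.prices: dict[str, float] = {}

    def price_and_charge(self, session_id: str, hours: float) -> float:
        amount = hours * self.HOURLY_RATE
        self.prices[session_id] = amount
        try:
            self.payments.charge(session_id, amount)
        except PaymentFailed:
            # Roll back the local write, then let the caller roll back too.
            del self.prices[session_id]
            raise
        return amount


class SessionService:
    def __init__(self, pricing: PricingService) -> None:
        self.pricing = pricing
        self.states: dict[str, str] = {}

    def start(self, session_id: str) -> None:
        self.states[session_id] = "ACTIVE"

    def end(self, session_id: str, hours: float) -> str:
        self.states[session_id] = "ENDED"
        try:
            self.pricing.price_and_charge(session_id, hours)
        except PaymentFailed:
            # Everything downstream has already been undone; restore our state.
            self.states[session_id] = "ACTIVE"
        else:
            # Intermediate PRICED state is skipped in this simplified sketch.
            self.states[session_id] = "PAID"
        return self.states[session_id]
```

Either every service keeps its write and the session ends up PAID, or every service undoes its write and the session is ACTIVE again, which is why no intermediate workflow state has to be stored.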
In many situations, atomic distributed transactions aren’t the best approach. I suggest reading the blog “Starbucks Doesn’t Use Two-Phase Commit (2PC)”. If your system can handle being eventually consistent, you can achieve greater scalability and fault tolerance with saga states. Take the example mentioned earlier: your payment service could be eventually consistent, allowing for retries in processing payments. Using atomic constraints imposes strict limitations on the system’s flow, which can be a drawback in certain scenarios.
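For contrast, here is a minimal sketch of what retrying a payment under eventual consistency could look like: the session stays in PRICED while payment is pending instead of rolling back. The function names and the attempt limit are hypothetical, and a real system would also wait (back off) between attempts, which is left out here.

```python
from __future__ import annotations

from typing import Callable


def charge_with_retries(charge: Callable[[], bool], max_attempts: int) -> tuple[str, int]:
    """Try `charge` up to `max_attempts` times; return (state, attempts made).

    The session remains PRICED (payment pending) if every attempt fails,
    rather than being rolled back to ACTIVE.
    """
    for attempt in range(1, max_attempts + 1):
        if charge():
            return "PAID", attempt
    return "PRICED", max_attempts


def flaky_charge(failures_before_success: int) -> Callable[[], bool]:
    """Build a fake charge call that fails a fixed number of times first."""
    remaining = failures_before_success

    def charge() -> bool:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            return False
        return True

    return charge
```

A charge that fails twice and then succeeds ends in PAID on the third attempt; one that keeps failing leaves the session in PRICED for a later retry or for customer support to pick up.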
Show me the Code:
Here is the GitHub repo for this implementation:
https://github.com/naveen-negi/PhoneTagSaga/tree/0380f664d215c6b139566261dac930859d79d542
Please go through the project README to run it locally.
Project Setup:
Our docker-compose.yml file orchestrates multiple services, essential for the seamless operation of our application. Let's break down each service and its function:
1. Sessions API (sessions-api):
- Build Context: The Dockerfile located at src/Sessions/Sessions.API. It is accessible on port 5056.
2. Product Pricing API (productpricing-api):
- Build Context: This service is built using the Dockerfile found in src/ProductPricing/ProductPricing.API. You can access it on port 5055.
3. Payments API (payments-api):
- Build Context: Constructed from the Dockerfile in src/Payments/Payments.API. It’s available on port 9011.
4. Database Services:
- Multiple Instances: We have three PostgreSQL instances, namely sessions-db, payments-db, and productpricing-db. These databases are accessible on ports 5436 (sessions-db), 5435 (payments-db), and 5437 (productpricing-db).
5. Jaeger Tracing (jaeger):
- Purpose: Jaeger is for tracing and monitoring services. Its UI is accessible on port 16686, allowing for real-time monitoring and analysis.
In the next post, we will change the consistency from atomic to eventual and the coordination from choreography to orchestration, while keeping the communication synchronous. This pattern leads to fairly low complexity but high coupling. Apparently, it is also a popular choice among architects.