Building responsive, scalable and fault-tolerant microservices: An unconventional approach with replicated cache

Naveen Negi
Jan 10, 2024

Most developers primarily view caching as a means to enhance performance. Yet, caching can also serve as a potent method for accessing and sharing data in distributed systems. This approach utilizes replicated in-memory caching, ensuring that data required by various services is readily accessible to each, eliminating the need for explicit requests. Unlike other caching models, a replicated in-memory cache stores data directly within each service. It continuously synchronizes this data, guaranteeing that every service has consistent and up-to-date information at all times.

Replicated cache mode is ideal for scenarios where cache reads are a lot more frequent than cache writes, and data sets are small. If your system does cache lookups over 80% of the time, then you should consider using the REPLICATED cache mode.

Options for Replicated cache

It is important to note that not all caching products support an in-memory replicated cache. In the process of building my solution, I evaluated the options below:
1. Hazelcast: It offers robust support for replicated in-memory caching through its ReplicatedMap, in which every node holds a full copy of the data (its default map, by contrast, distributes data evenly across the cluster, with each node holding only a portion).

2. Apache Ignite: Apache Ignite supports various caching modes, including a replicated cache. In this mode, all nodes in the cluster hold the same data, ensuring its availability to all services. Apache Ignite is particularly powerful due to its ability to process large volumes of data with in-memory speed.

3. NCache: NCache also supports a replicated cache topology where data is replicated across multiple nodes in the cluster.

4. Redis: Redis can be configured for replicated caching using Redis replication. While Redis is often used as a primary-replica model, you can set up multiple replicas that keep in sync with the primary.

5. Microsoft Orleans: Orleans uses a slightly different approach, focusing more on stateful grains (microservices). However, it can be configured to use external caching solutions like Redis or its own in-built grain state storage for replication.

6. Memcached: Memcached has no built-in replication; it can be added through forks or proxy tooling to keep data available across multiple nodes, but the result is basic compared to Hazelcast or Apache Ignite in terms of data consistency and synchronization features.

I decided to go with Apache Ignite: NCache didn’t have a Docker image that I could run locally (on a Mac), Redis didn’t provide an in-memory replicated cache, and Hazelcast’s .NET version didn’t provide embedded/in-memory support.

Calculating Cache size: How much is too much?

Let’s calculate the cache size of the object below for 5 million records.

public class ParkingSession
{
    public string SessionId { get; set; }
    public string VehicleNumber { get; set; }
    public int SpotNumber { get; set; }
    public DateTime EntryTime { get; set; }
    public DateTime? ExitTime { get; set; }
    public double ParkingFee { get; set; }
}
  1. string SessionId: Strings in .NET are UTF-16, meaning each character takes 2 bytes. For a GUID-formatted ID (36 characters including hyphens), that is 36 characters * 2 bytes/character = 72 bytes.
  2. string VehicleNumber: This varies based on the format. Assuming a typical vehicle number is around 10 characters, it would be 10 characters * 2 bytes/character = 20 bytes.
  3. int SpotNumber: An integer in .NET is 4 bytes.
  4. DateTime EntryTime and DateTime? ExitTime: A DateTime in .NET is 8 bytes. Nullable DateTime (DateTime?) might add a little overhead for storing the null state, but it's usually negligible. So, roughly 8 bytes + 8 bytes = 16 bytes.
  5. double ParkingFee: A double in .NET is 8 bytes.
  6. Object Overhead: Every .NET object has an overhead (for system type information, sync block index, etc.). This is typically 16–24 bytes per object.

Adding these up: 72 + 20 + 4 + 16 + 8 + 24 (assuming higher overhead) = 144 bytes per ParkingSession object.

For 5 million objects: 5,000,000 objects * 144 bytes/object = 720,000,000 bytes ≈ 687 MB.
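As a sanity check, here is the same back-of-the-envelope arithmetic in code. This is a minimal sketch: the per-field sizes mirror the list above and deliberately ignore string object headers, references, and Ignite’s own serialization overhead, so treat the result as a lower bound.

// Rough per-field sizes from the breakdown above (in bytes).
const int SessionIdBytes = 36 * 2;       // UTF-16 GUID string, 36 chars
const int VehicleNumberBytes = 10 * 2;   // ~10-character plate
const int SpotNumberBytes = 4;           // int
const int EntryAndExitTimeBytes = 8 + 8; // DateTime + DateTime?
const int ParkingFeeBytes = 8;           // double
const int ObjectOverheadBytes = 24;      // per-object header, upper bound

const int BytesPerSession = SessionIdBytes + VehicleNumberBytes + SpotNumberBytes
                            + EntryAndExitTimeBytes + ParkingFeeBytes + ObjectOverheadBytes; // 144

const long Records = 5_000_000;
Console.WriteLine($"{Records * BytesPerSession / (1024.0 * 1024.0):F1} MiB"); // ≈ 686.6 MiB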

What are we building

We will develop a system for a town’s network of parking facilities. This system will manage parking sessions, including initiating and ending sessions, calculating parking fees, and processing payments.

Workflow:

  • A vehicle owner starts a parking session upon entering a parking facility.
  • The Session Management Service records the session details, including the spot number and entry time.
  • When the session ends (the vehicle leaves), the Pricing Service calculates the fee based on the total time parked.
  • The owner is charged when the session ends.
  • All of the above is orchestrated by the Session Orchestration Service (a sketch of the event shapes follows below).
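To make the moving parts concrete, here is a minimal sketch of the messages these services might exchange. The type and property names are illustrative assumptions, not taken from the actual codebase.

// Hypothetical event shapes for the workflow above (names are assumptions).
public record SessionStarted(string SessionId, string LicensePlate, int SpotNumber, DateTime EntryTime);

public record SessionEnded(string SessionId, DateTime ExitTime);

// Emitted by the Pricing Service once the fee for the session is calculated.
public record FeeCalculated(string SessionId, double ParkingFee);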

Synchronising user data

We will be using an in-memory replicated cache to synchronize user data across services.

In our parking system, when a vehicle enters a parking location, an event is triggered by a camera, which informs the backend about the vehicle (e.g. license plate, entry time and so on).

The job of the replicated cache in our case is to help the session service resolve the license-plate-to-user mapping. Since the camera payload only contains the license plate, we need to figure out who the user is.

For this reason, the User service will own the cache and write to it, while the session service will only read from it. This preserves data ownership.

With this approach, the session service no longer needs to make a call to the User service. All the data is in-memory, within the session service. When the User service updates users, Apache Ignite updates the cache in the Session service to keep the data consistent.
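As a concrete sketch of this split, assuming a cache keyed by license plate and a hypothetical UserInfo value type (the GetOrCreateCache/Put/TryGet calls are the standard Ignite.NET ICache API; everything else here is illustrative):

using Apache.Ignite.Core;
using Apache.Ignite.Core.Cache;

// Hypothetical value type for this sketch.
public class UserInfo
{
    public string UserId { get; set; }
    public string Name { get; set; }
}

public static class LicensePlateCache
{
    private const string CacheName = "ReplicatedCache";

    // User service (cache owner): write whenever a user or vehicle changes.
    public static void OnVehicleRegistered(IIgnite ignite, string licensePlate, UserInfo user)
    {
        ICache<string, UserInfo> cache = ignite.GetOrCreateCache<string, UserInfo>(CacheName);
        cache.Put(licensePlate, user); // Ignite replicates the entry to every node
    }

    // Session service (reader): resolve plate -> user from local memory,
    // without any network call to the User service.
    public static UserInfo ResolveUser(IIgnite ignite, string licensePlate)
    {
        var cache = ignite.GetOrCreateCache<string, UserInfo>(CacheName);
        return cache.TryGet(licensePlate, out var user) ? user : null;
    }
}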

This caching strategy enhances the session service’s responsiveness, fault tolerance, and scalability. Eliminating direct inter-service communication means data is immediately accessible in-memory, ensuring the quickest data access.

Fault tolerance is robustly addressed. If the User service becomes unavailable, the session service continues operating seamlessly. Upon the User service’s restoration, caches reconnect without interrupting the session service. Furthermore, this approach allows the session service to scale independently from the User service.

Disadvantages

Conversely, this approach requires the User service, as the cache owner and data provider, to be operational during the initial startup of the session service.

Another consideration is the volume of data. If it surpasses a certain threshold, like 500 MB, the practicality of this pattern significantly decreases.

A third trade-off involves data synchronization. In a replicated caching model, keeping data fully in sync between services becomes challenging if the rate of data updates is excessively high.

Implementation:

I am using Docker Compose to set up the replicated cache across all services. Apart from the API port, services using the cache (sessions, users and product pricing) also expose ports for cache replication (for example, session-service exposes “47700:10800”).

version: '3.9'

services:
  sessions-api:
    build:
      context: .
      dockerfile: src/Sessions/Sessions.API/Dockerfile
    ports:
      - "5056:5290"
      - "47700:10800"
    volumes:
      - ./src/Sessions/Sessions.API:/build/src/Sessions/Sessions.API
    environment:
      ConnectionStrings__SessionsDB: "Host=sessions-db; Database=sessions_db; Port=5432; Username=sessions_dev; Password=password"
      ProductPricingService__baseUrl: "http://productpricing-api:5260"
      SessionOrchestrator__baseUrl: "http://orchestrator-api:5118"

  .......

Another important thing to note is that Apache Ignite is a Java project (I couldn’t find any .NET implementation which had an in-memory replicated cache and also provided a Docker image; NCache provides an in-memory cache but it didn’t have a Docker image that could run on a Mac). So, I need to install Java in the Docker container as below.

# Use the Microsoft's official .NET Core image.
FROM mcr.microsoft.com/dotnet/aspnet:7.0 AS base
WORKDIR /app
EXPOSE 80

# Use SDK image to build the project
FROM mcr.microsoft.com/dotnet/sdk:7.0 AS build
WORKDIR /build

RUN apt-get update && \
    apt-get install -y openjdk-11-jdk && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get clean

# Set JAVA_HOME environment variable
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
ENV PATH="${JAVA_HOME}/bin:${PATH}"

# Copy the solution-level files
COPY ["Directory.Build.props", "./"]
COPY ["Directory.Packages.props", "./"]

# Copy csproj and restore as distinct layers
COPY ["src/Sessions/Sessions.API/Sessions.API.csproj", "src/Sessions/Sessions.API/"]
RUN dotnet restore "src/Sessions/Sessions.API/Sessions.API.csproj"

# Copy the project files and build
COPY ["src/Sessions/Sessions.API/", "src/Sessions/Sessions.API/"]
WORKDIR "/build/src/Sessions/Sessions.API"
ENV ASPNETCORE_ENVIRONMENT=Development
ENV DOTNET_WATCH_SUPPRESS_PROMPTS=1

ENTRYPOINT ["dotnet", "watch", "run", "--project", "Sessions.API.csproj"]

To make sure that replication across services works, I need to form a cluster when each service starts, as shown below (notice the Endpoints configuration).

var ignite = Ignition.Start(new IgniteConfiguration
{
    CacheConfiguration = new[]
    {
        new CacheConfiguration
        {
            // REPLICATED mode: every node in the cluster keeps a full copy.
            Name = "ReplicatedCache",
            CacheMode = CacheMode.Replicated,
        }
    },
    DiscoverySpi = new TcpDiscoverySpi
    {
        // Static discovery: each service's Ignite node is listed explicitly.
        IpFinder = new TcpDiscoveryStaticIpFinder
        {
            Endpoints = new[] { "users-api:47500", "productpricing-api:47600", "sessions-api:47700" }
        }
    }
});

builder.Services.AddSingleton(ignite);

With the above configuration in each service, the cache will now be replicated across the cluster.
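A quick way to sanity-check that the cluster actually formed, assuming the setup above: put a value through one service’s node and read it back from another (the key and value here are placeholders):

// In users-api:
ignite.GetOrCreateCache<string, string>("ReplicatedCache").Put("health-check", "ok");

// In sessions-api: the same entry is served from local memory once replicated.
var status = ignite.GetOrCreateCache<string, string>("ReplicatedCache").Get("health-check"); // "ok"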

Repeating the caveat

This mode is ideal for scenarios where cache reads are a lot more frequent than cache writes, and data sets are small. If your system does cache lookups over 80% of the time, then you should consider using the REPLICATED cache mode.

You can find the full CODE here.
