
The Case Against Dependency Injection

2025-06-11T00:00:00+00:00

I first encountered dependency injection when I onboarded onto a large backend project that used Scala and the Play framework. Over time, I convinced myself that dependency injection was a good way of managing dependencies, but recently I have come to the conclusion that, most of the time, it hurts more than it helps.

1 - Interfaces, Objects and Classes {#1interfaces-objects-and-classes}

One argument dependency injection frameworks make is that your implementation is decoupled from the interface. Let me ask you: how many of your interfaces have had multiple implementations? And for how many of those did you actually need to abstract away which implementation was being passed in? I have seen many occasions where developers pre-emptively created interfaces that provided no value, simply because dependency injection is the standard way of doing things and interfaces are how you decouple the implementation. Hand on heart, did that interface achieve anything real, or is it just a pattern we have been following without much thought, because these frameworks were advertised as a way to "separate concerns" by injecting interfaces?

interface IBookRepository {
  fun getBooks(): List<Book>
}

class BookRepository : IBookRepository {
  override fun getBooks() = Books.selectAll().toList()
}

Do you really need the interface IBookRepository when you only have one data source that holds books? Even if you had multiple sources, why would you inject different implementations in your code? One possibility is choosing a different implementation for local, testing and production environments. However, I think that just makes testing less effective, as you now have different behavior in different environments.

Let's remember, objects are also still a cool alternative to @Singleton injection.

object BookRepository {
  fun getBooks() = Books.selectAll().toList()
}

object BookController {
  fun getBooks() = BookRepository.getBooks()
}

It isn't clear to me why this is less acceptable than the dependency-injected implementation. Moreover, dependency injection spreads like a plague, because you can no longer easily access the BookRepository instance from a class or object that was not created by the dependency injection framework. So anything that depends on something dependency injected needs to be dependency injected itself.

2 - Testing {#2testing}

Testing is not easier if you have a constructor with a bunch of unrelated dependencies. Some argue that seeing dependencies explicitly forces you not to miss them while writing tests, so the behavior under test is fully intended. I don't see it. On the contrary, I would argue they shift the focus away from the thing that is actually being tested. You write a bunch of boilerplate you did not really need: to test a method that depends on a single dependency, you still initialize the class with 10+ dependencies.

class BookController(
  private val authenticator: Authenticator,
  private val bookRepository: BookRepository,
  private val userRepository: UserRepository,
  private val libraryRepository: LibraryRepository,
) {
  fun getPublicBooks() = bookRepository.getPublicBooks()

  ...
}

// While testing

class BookControllerTest {

  @Test
  fun `should get public books`() {

    val mockBookRepository = mockk<BookRepository>()

    // Initialization gets longer and longer over time
    val sut = BookController(
        authenticator = mockk(),
        bookRepository = mockBookRepository,
        userRepository = mockk(),
        libraryRepository = mockk(),
    )

    every { mockBookRepository.getPublicBooks() } returns listOfBooks

    sut.getPublicBooks() shouldBe listOfBooks
  }
}

When it comes to controllers or services, their responsibilities grow quickly over time. Their constructors bloat, and developers end up juggling a bunch of test code to make things work. One way out is to use property injection rather than constructor injection, which is why I prefer it, especially for complicated classes with multiple responsibilities (yes, I think it is perfectly normal to have them in real life). The alternative is to skip injection altogether: mocking libraries can handle the testing aspect pretty well, if your language supports it.

class BookControllerTest {

  @Test
  fun `should get public books`() {
    mockkObject(BookRepository)

    every { BookRepository.getPublicBooks() } returns listOfBooks

    // BookController is an object here, so there is nothing to construct
    BookController.getPublicBooks() shouldBe listOfBooks
  }
}

I am not sure why this is much worse than spinning up the DI framework or initializing classes through their constructors during a test run. Is it a real advantage that specifying all dependencies manually ensures there is no unintended behavior?

From personal experience: we deliberately created instances of those classes using the framework-provided builders rather than calling the constructor by hand, because doing it by hand created such a huge overhead while writing tests. As a result, our tests never called the constructor directly. We deliberately gave up that feature because it was such a pain to manage those long dependency lists by hand.

3 - Named Injection {#3named-injection}

Named injection is even worse. Why are you undermining your statically typed language by identifying classes with strings? If you have multiple implementations, just use the desired implementation with a proper upcast to the interface; don't use named injection to pull a specific implementation.

val RestClient = named<IClient>("rest")
val GrpcClient = named<IClient>("grpc")

// Instead...

val RestClient: IClient = RestClientImpl
val GrpcClient: IClient = GrpcClientImpl

I can't think of an example that requires multiple instances of the same class (not a singleton), but you can easily create multiple instances like so:

object ClientPool {
  val client1 = Client()
  val client2 = Client()
}

Or maybe all you need is an ObjectPool to begin with. Alternatively, leverage your language's type features and just extend the base interface with no modifications to save yourself from some headaches.

interface Logger {
    fun log(text: String)
}

interface PrettyLogger : Logger
interface RegularLogger : Logger

object PrettyLoggerImpl : PrettyLogger { ... }
object RegularLoggerImpl : RegularLogger { ... }

4 - Cross Compatibility {#4cross-compatability}

If you have ever imported a library that uses a dependency injection framework and tried to adopt it into your own dependency injection system, good luck with that. You are bringing in a bunch of dependencies whose inner workings you don't really understand, and now you have to make them work properly with your own dependency injection system. That system is an abstraction meant to save you from managing dependencies yourself, yet to your surprise you now have to know how both DI systems work under the hood and how they interact.

So if you ever create a reusable library, please don't use the modern DI frameworks; rely on your language features as much as possible. If you ever feel stuck, build something that works standalone rather than as part of a DI system such as Spring or Dagger.

object LoggerProvider {
  val logger: Logger = when(LoggerConfig.type) {
    "pretty" -> PrettyLoggerImpl
    else -> RegularLoggerImpl
  }
}

Don't be afraid of creating your own abstractions to fit your own needs. I think it is perfectly normal, and it is a common pattern in many different libraries.

5 - Final Remarks {#5final-remarks}

I'm not against dependency injection, but I think we make excuses for ourselves to believe it is the best way of managing dependencies and instances. I wanted to show you how it creates "self-fulfilling prophecies", when it makes sense and when it doesn't. I can see dependency injection making sense where implementations change quickly or differ from platform to platform, environment to environment, and so on. However, it is important to understand when we really need it versus when it just looks cool.

If you are new to dependency injection, this article may not make a lot of sense. Therefore I have moved this "mini-introduction" to the end of the article, as a footnote for those readers.

0 - Types of Dependency Injection {#0types-of-dependency-injection}

The default approach to dependency injection is constructor injection. This type of injection ensures your dependencies are not lazily evaluated, so your class can be created if and only if its dependencies have already been initialized successfully, unlike with property injection. A constructor-injected class might look like this:

public class BookController(
  private val authenticator: Authenticator,
  private val bookRepository: BookRepository,
) {
  fun getBooks() = authenticator.protect {
    bookRepository.getBooks()
  }
}

Whereas, a property injection might look like so,

public class BookController {
  lateinit var authenticator: Authenticator
  lateinit var bookRepository: BookRepository

  ...
}

Since you create an instance of BookController without providing the dependencies and set them later, there are no guardrails that prevent you from calling a method such as getBooks before bookRepository is set. Therefore it is seen as a less desirable way of doing dependency injection. However, it provides some flexibility that is useful during testing, and it lets the application initialize in a partial state, which in some cases is preferable to a total blackout.
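To make the missing guardrail concrete, here is a minimal sketch in Java rather than Kotlin. The class names mirror the article's, but the bodies and the book title are hypothetical stand-ins, and the explicit null check is just one way to surface the problem early.

```java
import java.util.List;

class PropertyInjectionDemo {
    // Hypothetical stand-in for the article's BookRepository.
    static class BookRepository {
        List<String> getBooks() {
            return List.of("Dune");
        }
    }

    // Property injection: the dependency is assigned after construction.
    static class BookController {
        BookRepository bookRepository; // stays null until someone sets it

        List<String> getBooks() {
            if (bookRepository == null) {
                throw new IllegalStateException("bookRepository was never injected");
            }
            return bookRepository.getBooks();
        }
    }

    /** Returns true if calling getBooks() before injection fails. */
    static boolean failsBeforeInjection() {
        BookController controller = new BookController();
        try {
            controller.getBooks();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }
}
```

Nothing stops a caller from constructing the controller and using it immediately; the failure only shows up at call time. With constructor injection, the same mistake would be rejected by the compiler.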

In some frameworks such as Koin, you can use language-specific features such as lateinit or lazy initialization in Kotlin, together with methods provided by the Koin framework, to initialize properties automatically.

public class BookController : KoinComponent {
  private val authenticator : Authenticator by inject()
  private val bookRepository : BookRepository by inject()
}

However, this couples your classes directly to Koin, so it should be avoided in shared code if possible; otherwise you force users to adopt Koin to ensure classes are initialized properly.

Rethinking Modern Asynchronous Paradigms

2025-05-14T00:00:00+00:00

Most developers deal with some sort of asynchronous operation day to day. For most of us, it is I/O (input and output). A web developer makes network calls and a systems developer might do file operations; both are based on a submit-and-wait model, where the program waits until some operation is completed. Different programming languages provide different ways to write asynchronous code, because developers want to utilize the processor during the "wait" phase, either by doing more work or by yielding CPU cycles back to the host until the async operation finishes, so other processes can continue running.

It takes time for a request to reach the server, be processed, and for the response to arrive back at the client.

For reference, if you have a 4 GHz CPU and the fastest NVMe SSDs, it takes about 0.01 milliseconds of latency to read something from the disk. That is a wait of about 40,000 CPU cycles, just to read something from a disk that is in your own computer. Moreover, if you live in New York City and the servers are located in Chicago, a round trip takes around 20 milliseconds without any additional operations, which is about 80,000,000 spare cycles.
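The arithmetic behind those two figures is just clock rate multiplied by latency. A small sketch in Java, where the class and method names are made up for illustration:

```java
class CycleMath {
    /**
     * CPU cycles that elapse at clockHz during latencyMicros microseconds.
     * Uses whole cycles per microsecond, so it assumes a clock rate that is
     * a multiple of 1 MHz.
     */
    static long cyclesDuring(long clockHz, long latencyMicros) {
        long cyclesPerMicro = clockHz / 1_000_000;
        return cyclesPerMicro * latencyMicros;
    }
}
```

With a 4 GHz clock, `CycleMath.cyclesDuring(4_000_000_000L, 10)` covers the 0.01 ms disk read and `CycleMath.cyclesDuring(4_000_000_000L, 20_000)` covers the 20 ms round trip.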

If your code is running on an operating system, the code you write normally runs sequentially on the main thread of a process. The OS handles concurrent operations by switching between threads super fast. If your CPU has only 1 core, it can only run 1 thread at a time. From a user's perspective this doesn't sound right, as you can run multiple programs at the same time on your OS while using your keyboard and mouse. This magical effect is achieved by pausing and unpausing threads so quickly that the user can't notice the micro pauses.

From an application developer's perspective, how do you know your code is waiting for something to finish? Let's start with an explicit wait, Thread.sleep(milliseconds). Assume you are sending some notifications, but you don't want to annoy the user by sending them too quickly. So let's wait 2 seconds after each notification is sent. Assume for now that sending a notification is instantaneous.

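void sendNotifications(List<Notification> notifications) {
  for (Notification notification : notifications) {
    notification.send();
    try {
      Thread.sleep(2000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // keep the interrupt flag and stop sending
      return;
    }
  }
}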

When you call Thread.sleep(2000), your program notifies the OS that the current thread doesn't want to run for the next 2000 milliseconds. The thread is therefore blocked for the next 2 seconds and doesn't run any other code. The OS will suspend that thread until the given time has passed, and in the meantime it will run other important work, such as rendering things on screen or processing background messages.

A non-blocked thread can pick up other work while free

If you instead wrote some dumb code like

long now = System.currentTimeMillis();
while (System.currentTimeMillis() <= now + 2000) {}

You will keep wasting CPU cycles, even though you are not doing any valuable calculation. The OS will probably still pause your thread and do other work in the background, but it might struggle to schedule it efficiently, so background tasks might run slower and your computer might feel less responsive, since you are not leaving any spare CPU cycles.

In this scenario we looked at only one thread, but in most applications we spawn more threads, called "background threads", to run things concurrently inside our application.

Let's say you receive some messages from an outside source. You have a web application and you are constantly receiving messages from users and you need to send notifications to the respective target. In this case, you need a background thread that helps you receive those messages. And when you receive a message, you can send those notifications in a separate thread, so you don't block any other notification from being received and processed.

Thread worker = new Thread(() -> {
    while (!Thread.currentThread().isInterrupted()) {
        List<Message> messages = pollMessages();
        messages.forEach((message) -> {
            Thread sender = new Thread(() -> {
                sendNotifications(message.notifications);
            });
            // Start sending but don't wait until it finishes
            sender.start();
        });
        // Rate limit poll messages to prevent self DDoS
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag so the loop exits
        }
    }
});

// Start the thread
worker.start();

// Wait until Thread exits (until OS interrupts)
worker.join();

At first glance, this looks fine: we are creating a separate thread for each send operation, so the operating system handles concurrency for us. However, creating a thread is not cheap. It allocates a lot of OS-level resources, so it is a relatively slow operation.

So another idea is to use thread pools, where we initialize the threads beforehand, so we can avoid the expensive resource and time cost of creating threads on demand.

ExecutorService notificationPool = Executors.newFixedThreadPool(10);

Thread worker = new Thread(() -> {
    System.out.println("Background listener thread started.");
    while (!Thread.currentThread().isInterrupted()) {
        List<Message> messages = pollMessages();
        messages.forEach((message) -> {
            notificationPool.submit(() -> sendNotifications(message.notifications));
        });

        // Rate limit poll messages to prevent self DDoS
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag so the loop exits
        }
    }
});

// Start the thread
worker.start();

// Wait until Thread exits (until OS interrupts)
worker.join();

// Shutdown thread pool after use
notificationPool.shutdown();
notificationPool.awaitTermination(30, TimeUnit.SECONDS);

Here, we have set a size for the thread pool. This thread pool size is basically our maximum concurrency limit. We can't send notifications concurrently to more than 10 users with this setup. So let's think how we can handle this.

The core issue with a concurrency of 10 users is the amount of time each send notification call takes. If sending a notification took only a couple of CPU cycles, running 10 threads would be more than enough! But that assumption is wrong. In reality, those send notification calls usually happen over the network and take a long time, as we discussed. During those network calls, our threads would be blocked.

Note: If you want to run with minimal overhead, you could set the number of threads to 2 times the number of physical CPU cores. Modern CPUs usually expose 2 logical cores per physical core, so they can run two threads at a time per core.
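As a rough sketch of that sizing rule in Java: Runtime.availableProcessors() reports the processors available to the JVM, which on most systems counts logical cores, so it may already be the "two per physical core" figure. That is an assumption worth checking on your platform. The class and method names below are made up for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolSizing {
    /** Pool size based on the processors the JVM reports, never below 1. */
    static int suggestedPoolSize() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    /** A fixed pool sized for CPU-bound work; the caller is responsible for shutting it down. */
    static ExecutorService newCpuSizedPool() {
        return Executors.newFixedThreadPool(suggestedPoolSize());
    }
}
```

This sizing only helps when the tasks keep the CPU busy. For tasks that spend most of their time blocked on the network, as in the notification example, the pool size becomes a concurrency limit rather than a performance setting.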

So how can we make sending notifications run only instructions that are wait-free? It is important that we move everything related to waiting outside this thread pool. Why? Because anything that waits occupies your thread and blocks it from executing other code, even though it is technically doing nothing. Here comes the idea of event loops, where we run code that performs only non-blocking operations, which means the thread is never blocked on a wait operation or on something super CPU intensive, such as a cryptographic calculation. On this loop, we poll and emit events, which signal some other code to be executed, potentially on another thread. For example, anything that performs a blocking operation can run on a different thread pool, one with a bunch of spare threads and a lower OS priority, which prevents it from interrupting the precious event loop as it executes low-latency code.

Let's think about how we can achieve sleeps and waits. Calling Thread.sleep delegates scheduling to the operating system by blocking the thread until the given time has passed. Instead of blocking a thread, let's build an event-loop system: rather than calling Thread.sleep, we submit a job to a queue with a given delay. We will be creating a pub-sub model, where jobs are scheduled by a publisher thread and then consumed and executed, when their time comes, on a consumer thread.

Schedule schedule = new Schedule();

Thread publisher = new Thread(() -> {
    while (!Thread.currentThread().isInterrupted()) {
        List<Message> messages = pollMessages();
        messages.forEach((message) -> {
          schedule.queue(message::sendNotification, 2000);
        });

        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag so the loop exits
        }
    }
});

Thread consumer = new Thread(() -> {
    long lastRunAt = System.currentTimeMillis();
    while (!Thread.currentThread().isInterrupted()) {
        // Read the clock once so no job falls between two separate readings
        long now = System.currentTimeMillis();
        List<Job> jobs = schedule.getJobBetween(lastRunAt, now);

        jobs.forEach((job) -> job.run());

        lastRunAt = now;
        try {
            Thread.sleep(1); // 1 millisecond precision
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
});

This is better now, as we are running only 2 threads and no major blocking code that affects our performance. Of course, it is possible to improve this with OS-level calls, which can use the hardware to trigger events based on a timer or on hardware-level interrupts. However, I wanted to show you how we can achieve something similar without relying on OS internals. This logic is actually similar to how asynchronous frameworks such as Netty are built. A key distinction is their use of asynchronous triggers and low-level parking mechanisms instead of Thread.sleep, which allows more efficient CPU utilization and better responsiveness. Also, in this example our Schedule object acts similarly to a message queue, which is the more popular choice for event loops, where different messages are passed around to perform different actions.
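The Java standard library ships a ready-made version of this "submit a job with a delay" idea in ScheduledExecutorService. The sketch below is not the Schedule class from the example above. It is a stand-in that spaces out notifications without any thread calling Thread.sleep, with strings standing in for real notifications and a hypothetical ScheduledSender class.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class ScheduledSender {
    /** Schedules one task per notification, spaced by gapMillis, and returns what was sent. */
    static List<String> sendSpaced(List<String> notifications, long gapMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        List<String> sent = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(notifications.size());
        for (int i = 0; i < notifications.size(); i++) {
            String notification = notifications.get(i);
            // schedule() returns immediately; the scheduler thread runs the task once the delay expires.
            scheduler.schedule(() -> {
                sent.add(notification); // stand-in for notification.send()
                done.countDown();
            }, i * gapMillis, TimeUnit.MILLISECONDS);
        }
        try {
            done.await(); // only here so the demo can return a result
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdown();
        }
        return sent;
    }
}
```

Calling `ScheduledSender.sendSpaced(List.of("a", "b", "c"), 2000)` would send the three items two seconds apart on a single scheduler thread. The final await exists only to make the demo return a value; a real event loop would not block on it.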

Inside this event loop, we currently call getJobBetween constantly to check whether a new job has arrived. This is not very efficient. Instead, we could use something like epoll_wait, a system call that blocks the thread until something happens on one of the given file descriptors, or a newer kernel interface such as io_uring. Alternatively, if you are waiting for messages to arrive in your message queue, you can use pthread_cond_wait together with pthread_cond_signal, which lets a thread wait until a signal is given. In this case, our event loop can wait once all messages are processed, and when adding a message to the queue we call signal to wake up the event loop. These primitives wait efficiently, so you are not wasting CPU cycles in the meantime.
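On the JVM, the closest analogue to pthread_cond_wait and pthread_cond_signal is a Condition obtained from a ReentrantLock. The following is a minimal sketch of the wait-and-signal queue described above; SignalingQueue and SignalingQueueDemo are hypothetical names, and in practice a ready-made BlockingQueue would do the same job.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class SignalingQueue<T> {
    private final Queue<T> items = new ArrayDeque<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();

    /** Adds an item and wakes up one waiting consumer. */
    void put(T item) {
        lock.lock();
        try {
            items.add(item);
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    /** Parks the calling thread until an item is available, then returns it. */
    T take() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) { // loop guards against spurious wakeups
                notEmpty.await();
            }
            return items.remove();
        } finally {
            lock.unlock();
        }
    }
}

class SignalingQueueDemo {
    /** Starts a consumer that waits for one message, then publishes it from the caller. */
    static String roundTrip(String message) {
        SignalingQueue<String> queue = new SignalingQueue<>();
        String[] received = new String[1];
        Thread consumer = new Thread(() -> {
            try {
                received[0] = queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        queue.put(message);
        try {
            consumer.join(); // join makes the consumer's write visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received[0];
    }
}
```

The consumer thread consumes no CPU while the queue is empty. It is parked inside await() and only resumes when put() signals the condition.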

So far we have only considered a fixed-duration blocking call, sleep(...). However, most of the blocking calls we typically use are I/O related, for example network I/O, where you send a request and wait for a response to come back. To write fully non-blocking code, you have to spin up a thread for each step that has blocking logic (a wait). You also need to write schedulers and coordinators to manage those jobs and make sure they run with high concurrency and low latency. So the developers of Java said: concurrency is really hard to manage manually, let's invent a construct that allows developers to write asynchronous code. That's how Future was born.

Java's Future

With a Future, the developer doesn't have to worry about blocking calls as often, because a Future is basically a chain of callbacks. When you construct a future, you register callbacks in your event loop. Whenever the executed code inside the Future has finished, the event loop calls your registered callback. This paradigm decouples the task submission from thread management.

CompletableFuture<List<Message>> messagesF = pollMessages();
List<Message> messages = messagesF.join();

A simple example to convert a future to a blocking call

A Future is a wrapper whose value can be filled in later by another source. For example, when you call .join(), your current thread waits until the result inside the Future object is available. The result is usually set from another thread. So you can pass those Future objects around safely in your code without blocking your current thread.

CompletableFuture<Object> future = new CompletableFuture<>();

// Spawn a thread to do calculation in the background
new Thread(() -> {
  Object result = longRunningCalculation();
  future.complete(result);
}).start();

// Wait until complete() is called and the result is available.
future.join();

Moreover, you can transform and chain Futures together to do more complex operations such as,

CompletableFuture.supplyAsync(() -> calculateString())
            .thenApply(String::toUpperCase)
            .thenApply(s -> s + " world")
            .thenAccept(System.out::println);

Futures can also be composed, so that one future's execution depends on another's result.

CompletableFuture<String> f1 = CompletableFuture.supplyAsync(() -> "hello");
f1.thenCompose(s -> CompletableFuture.supplyAsync(() -> s + " world"));

As you can see, using a Future is something a developer needs to get used to: you can't write code sequentially as before, and you have to rewrite it using a special syntax. For example, blocking code for polling and sending notifications can be written as:

List<Message> messages = pollMessages();
messages.forEach((message) -> {
  Result result = sendNotification(message.notification);
  persistResult(result);
});

But polling, sending and persisting are usually waiting operations. To prevent blocking calls, we first modify pollMessages, sendNotification and persistResult to return futures, and then write our code in the following way.

pollMessages()
    .thenComposeAsync(messages -> {
        List<CompletableFuture<Void>> futures = messages.stream()
            .map(message ->
                sendNotification(message.notification)
                    .thenComposeAsync(result -> persistResult(result), executor)
            )
            .toList();

        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
    }, executor);

As you can see, simple sequential code became something obscure pretty quickly. And we are not even doing any tricks to run things in parallel; we just want to run asynchronous operations without blocking.

Scala's Way of Sequentialism

By using futures, we have the flexibility to keep running more async code without waiting for each piece to finish. However, application code is usually written sequentially, so that operations happen back to back. Therefore, futures usually end up composed in a nested way. This nesting creates readability and maintainability issues. So Scala came up with a clever way to manage the nesting: the for comprehension.

for {
  messages <- pollMessages()
  result <- sendNotifications(messages.notifications)
  _ <- persistResult(result)
} yield (result)

This approach tries to create a sequential syntax for writing asynchronous code unlike Java's traditional Future chaining. However it comes with several limitations,

You still need to write code using a special syntax.
Early returns are not possible.
Error handling is still nested.
Iterative code doesn't translate directly.

Those limitations also apply to Java's Future, but demonstrating them there would require a different syntax. I found Scala's syntax to be slightly more friendly, but I will show you why it is still limiting. For example, you can't run code conditionally without nesting.

for {
  result <- sendNotifications(messages.notifications)

  // This is not a valid syntax
  if (result == Result.ERROR) {
    _ <- reportErrors(result)
    return false
  }

  _ <- reportSuccess(result)

} yield (true)

You have to write it using nested for comprehensions, so each decision point in your comprehension tree needs to branch out.

for {
  result <- sendNotifications(messages.notifications)
  innerResult <- result match {
    case Result.ERROR => for {
      _ <- reportErrors(result)
    } yield false

    // For comprehension is not recommended for single futures.
    case _ => reportSuccess(result).map(_ => true)
  }
} yield innerResult

For error handling, you similarly have to write recover blocks; you can't use your everyday tool of try { .. } catch { .. }.

for {
  result <- sendNotifications(messages.notifications).recoverWith { case err =>
    for {
      _ <- reportError(err)
      _ <- rollback(messages.notifications)
    } yield false
  }
} yield result

Nesting also forces you to unify the type of the result. In sequential code, you could assign the error result to a different variable and return early to stop the rest of the code from running. Moreover, all of these limitations apply to Java's traditional Futures as well.
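For comparison, here is roughly what the same two limitations look like with Java's CompletableFuture. This is a sketch with hypothetical stand-ins for sendNotifications, reportErrors and reportSuccess, not code from a real service.

```java
import java.util.concurrent.CompletableFuture;

class FutureBranching {
    enum Result { OK, ERROR }

    // Hypothetical stand-ins for the article's asynchronous calls.
    static CompletableFuture<Result> sendNotifications(boolean fail) {
        return CompletableFuture.supplyAsync(() -> fail ? Result.ERROR : Result.OK);
    }

    static CompletableFuture<Void> reportErrors(Result result) {
        return CompletableFuture.completedFuture(null);
    }

    static CompletableFuture<Void> reportSuccess(Result result) {
        return CompletableFuture.completedFuture(null);
    }

    // No early return: each branch has to produce a future of the same type.
    static CompletableFuture<Boolean> sendAndReport(boolean fail) {
        return sendNotifications(fail).thenCompose(result ->
            result == Result.ERROR
                ? reportErrors(result).thenApply(ignored -> false)
                : reportSuccess(result).thenApply(ignored -> true));
    }

    // No try/catch: a failure is handled by a recovery callback instead.
    static CompletableFuture<Boolean> sendWithRecovery() {
        CompletableFuture<Boolean> failing = CompletableFuture.supplyAsync(() -> {
            throw new IllegalStateException("network down");
        });
        return failing.exceptionally(err -> false);
    }
}
```

Both branches in sendAndReport must end in a CompletableFuture<Boolean>, which is the same type unification the nested for comprehension required, and exceptionally plays the role of recoverWith.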

Sequentialism as First Class Citizen

We now know why Java's Futures exist and how Scala's for comprehension syntax tries to solve some fundamental issues with them. However, it is easy to see that Java wasn't designed with first-class support for asynchronous programming, and Scala tried to patch some of its inherent issues. Scala never tried to replace Java, but rather to extend it. For comprehensions have been a big deal, but Scala also brought a lot of other benefits. Kotlin, on the other hand, directly targeted Java as its contender and tries to replace it. One of Kotlin's distinctive features is coroutines.

Instead of relying on threads, which are expensive operating-system-level constructs, Kotlin introduces coroutines, which are lightweight runtime-level constructs. Coroutines still run on threads, but their execution is not strictly tied to a single thread, so they can switch threads at runtime. This flexibility makes them lightweight, similar to the jobs submitted to thread pools that we showed earlier. However, Kotlin has first-class support for coroutines through its language features, most importantly suspend.

Unlike threads, coroutines are not paused randomly to let other coroutines run. Note that the thread running the coroutines can still be paused randomly by the OS, which is impossible to prevent, but the coroutine scheduler itself doesn't pause coroutines. On the contrary, coroutines take a cooperative approach. They yield the current execution whenever possible. Most importantly, they yield during asynchronous operations, while they wait for an operation's result. Therefore underlying libraries should expose those asynchronous operations as suspend functions, so callers can benefit from Kotlin's coroutine features.

Also, the best thing about suspend functions is that they are written in the traditional sequential way. Sequential asynchronism is the first-class citizen, while controlled asynchronism is also available through other interfaces, including Future or Kotlin's Deferred construct.

suspend fun processMessages() {
  val messages = pollMessages()
  messages.forEach { message ->
    sendNotification(message.notification)
    delay(2000)
  }
}

Wait, that must be blocking, right? No, there is no blocking code here! The methods pollMessages, sendNotification and delay are actually suspend methods. For example, when you are polling messages, the work happens asynchronously and the coroutine yields during the polling process, so it doesn't block the running thread. The same goes for send and delay. The delay is a native implementation, where a scheduler stops the coroutine in the background and resumes it when the given time has arrived. So we were able to benefit from an event loop without writing nested futures and executors. If you are curious about how event loops are implemented, check the C++ Worker implementation for Kotlin.

Kotlin's suspend language feature solves almost all of our pain points as developers writing asynchronous code. Most importantly, it lets us write code that does asynchronous work without introducing any parallelism. A developer doesn't necessarily care how those futures are chained and handled, especially when writing data-intensive applications. If a developer needs explicit parallelism, they can use Kotlin's Deferred values.

val messagesD: Deferred<List<Message>> = async { pollMessages() }
val messages = messagesD.await() // calling await is "suspend"
sendNotifications(messages)

Moreover, a user might dispatch the given suspend call in a different coroutine context, or thread pool. This is specifically important if an old-school blocking code needs to be executed inside a suspend function.

val messages = withContext(Dispatchers.IO) {
    pollMessages()
}
sendNotifications(messages)

Implicit Parallelism: Know Where to Go

A step forward from sequential asynchronism can be thought of as implicit parallelism, where the execution of code happens sequentially and asynchronously at the same time. How? It is only possible with the programming language's support. Let's assume when you call,

val messages = pollMessages()
val users = fetchUsers()

the code fetchUsers() is executed before pollMessages() has finished, because the two calls are independent of each other. This can traditionally be done using a futures approach.

val messagesF = pollMessages()
val usersF = fetchUsers()

val (messages, users) = awaitAll(messagesF, usersF)
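In Java, the explicit version of this "start both, then wait for both" pattern could be sketched with CompletableFuture.thenCombine. The two fetch methods below are hypothetical stand-ins that return canned data.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

class ParallelFetch {
    // Hypothetical stand-ins for the article's pollMessages() and fetchUsers().
    static CompletableFuture<List<String>> pollMessages() {
        return CompletableFuture.supplyAsync(() -> List.of("m1", "m2"));
    }

    static CompletableFuture<List<String>> fetchUsers() {
        return CompletableFuture.supplyAsync(() -> List.of("u1"));
    }

    /** Starts both calls before waiting on either, then combines the two results. */
    static CompletableFuture<Integer> countBoth() {
        CompletableFuture<List<String>> messagesF = pollMessages(); // already running
        CompletableFuture<List<String>> usersF = fetchUsers();      // starts without waiting for messagesF
        return messagesF.thenCombine(usersF, (messages, users) -> messages.size() + users.size());
    }
}
```

The parallelism here is visible in the code: both futures are created before anything waits on them, and the combining step states that it needs both results.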

Having this as a native construct of the programming language can help users write performant code, but it can also make it easy to write buggy code, as the default assumption is sequentialism. Therefore, I think a paradigm with implicit parallelism is possible, but it should be used very carefully, as there is no way to prevent unintentional race conditions without formal verification. Even at runtime you might see flakiness, as you are starting to build an environment that is distributed by default. We already know it is hard to ensure the correctness of distributed systems without formal verification, and here we are pushing that complexity into our code.

That's why I think Kotlin deserves some praise for how it handles parallelism: it is explicit, and it is easy to shift between paradigms.

val messages = async { pollMessages() }
val users = async { fetchUsers() }

sendNotifications(messages.await(), users.await())

I hope to see some language features where calling a second await is unnecessary because it is already awaited in the past, similar to smart casting, where a nullable type can be cast to be a not-null type automatically if some check has been performed.

Final Remarks

There is still a lot to talk about. There are a bunch of other languages and frameworks that handle asynchronous execution in various ways, such as Go's goroutines, JavaScript's async/await, Python's asyncio, Rust's tokio, etc. There is still more to cover in Java (Future, Mono, Flux), Scala (execution contexts, Cats, Akka) and Kotlin (coroutine contexts, dispatchers, Flows, Channels), and many more if you are interested in reading about them.

We see how programming languages have evolved to catch up with developers' needs. Our hardware has improved and our CPUs have many spare cycles, so now we usually spend a larger share of our time waiting on tasks such as disk or network I/O. Initially we wrote code sequentially; later we built Futures, executors and event loops. Finally, we have seen how syntax evolved to support asynchronous programming in an easier and more readable way. I do believe asynchronous programming is still open to improvement: the frameworks and languages we use will keep improving, and sequential asynchronism will grow in popularity.

Start with a clean slate: Integration testing with PostgreSQL

2025-04-22T00:00:00+00:00

We have been using PostgreSQL as our primary database in production for over 4 years. Over time, as our database grew bigger and reached over 500 tables in a single monolithic application, we had to come up with smart ways to manage it. PostgreSQL is capable of handling hundreds of tables and billions of rows, but that doesn't necessarily mean it will be easy to develop applications in such a setting. In this post, I am going to write down how I tackled some bottlenecks in the integration testing pipeline at Carbon Health by speeding it up and increasing the isolation between tests. The solution has powered our CI/CD pipelines for the last 2 years.

This blog post’s topic is my upcoming presentation at PGDay Chicago 2025. Conference slides accessible at https://pgday.dogac.dev/.

Link to the tool: github.com/Dogacel/pg_test_table_track

Problem

A short anecdote on monoliths: Microservices are something we often hear about, but they are usually a distant reality for many of us. Monoliths (monos: single/one, lithos: stone) still work pretty great in many real-world settings, and they only bear a subset of the management problems microservices have. One of the core problems with a monolith is its huge codebase and slow build times. In a monolith, you don't need to open 3 pull requests just to do some CRUD operations on a basic database table, and jump back a couple of PRs later because you forgot to add a field to your proto definitions and you have to open 3 more PRs to add it. That sounds neat. However, most likely the total CI/CD runtime of those 6 PRs will still be less than a single PR check in the monolith, where you wait 45 minutes just to see your linter fail because you forgot to define a constant for a magic number. Yikes. If you want to have a productive and effective development environment with your monolith, you have to do some optimizations in your CI/CD and testing environment.

Background: So a little background about our company before we start,

Our tech-stack consists of a monolithic server supported by 30+ micro-services.
We host our services on cloud, our primary choice of database is PostgreSQL.
We have over 500 tables serving more than 10TBs of data.
We have about 6 distinct development teams.

As you might have guessed, those 500 tables are causing big trouble in our CI/CD pipelines. More than half of the 9000+ tests in our monolith are integration tests, meaning they use a PostgreSQL instance to run queries. Over time, our pipeline became painfully slow and annoying to work with, which led me to come up with a solution.

Integration tests

Integration testing checks whether different parts of a system work together correctly as a whole. Unit testing focuses on testing individual components.

Even though unit tests are much superior in terms of isolation and speed, they are not as good at covering end-to-end flows and detecting real-life failures. That's why we have written integration tests extensively to ensure our monolith is tested well before release. Based on our experience, setting up scenarios and running the actual DB queries in tests really helps catch bugs early on.

So, what's the catch? Writing integration tests is painfully hard, as your data dependencies, such as foreign key constraints, make initialization a hassle for developers. Moreover, your database keeps state, so you need to ensure it doesn't leak between tests. So let's explore our options for achieving a fast and isolated environment.

Wrapping every test with transactions

At first, it sounds like a good idea. In reality, it is a terrible idea. PostgreSQL supports a form of nested transactions through SAVEPOINTs. However, a failure inside a transaction aborts the rest of that transaction. Therefore, it is not possible to truly wrap every test inside a transaction, as some errors might cause the parent transaction to abort. Moreover, wrapping tests in additional transactions would alter their runtime behavior. This is not something we want, as it might result in hard-to-debug errors that only occur during tests, as well as behavioral differences from the actual production environment, which might keep some bugs from being caught early on.

Fresh DB for each start

If you want to maximize isolation, go ahead and create a fresh DB instance for each of your tests. This worked fine in our microservices, where the number of tables and tests was lower. However, in monoliths you will quickly realize this is a slow process. We have thousands of migration files, but we can always use a schema dump. In our case, we used rake:schema:dump. I highly encourage readers to experiment with TEMPLATE databases as well. However, initialization takes around 400 milliseconds per database, which adds up to a little over 1 hour of DB initialization time alone for our 9000+ integration tests.

A very simple implementation of a DB provider for running isolated tests.
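As a rough sketch of the TEMPLATE database idea mentioned above (this is not the provider used in our codebase), the snippet below only builds the SQL statements a test harness could run against PostgreSQL. The function name, the `test_db` prefix and the `app_template` database name are assumptions made for illustration.

```python
import uuid

def fresh_database_statements(template: str, prefix: str = "test_db"):
    """Return (db_name, create_sql, drop_sql) for one isolated test database."""
    # A random suffix keeps parallel test runs from colliding on the name.
    db_name = f"{prefix}_{uuid.uuid4().hex[:8]}"
    # CREATE DATABASE ... TEMPLATE copies the schema (and data) of the template.
    create_sql = f'CREATE DATABASE "{db_name}" TEMPLATE "{template}";'
    drop_sql = f'DROP DATABASE IF EXISTS "{db_name}";'
    return db_name, create_sql, drop_sql

name, create_sql, drop_sql = fresh_database_statements("app_template")
print(create_sql)
print(drop_sql)
```

One thing to keep in mind when experimenting with this: PostgreSQL refuses to copy a template database while other sessions are connected to it, so the template should be left untouched during the test run.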

Cleaning all tables

This was the initial approach in our codebase: maintaining a hand-curated list of DELETE FROM queries. However, it has some drawbacks,

Order of deletions matters as there are foreign keys.
Sometimes tables were missing from the list, resulting in flakiness.
Sequences and Materialized Views require special attention.

For number 3, our codebase doesn't really rely on either, so we didn't care. However, this approach was still too slow and maintaining the list was super annoying. Adding a new table to the list was very hard: you would see weird foreign key errors, random test failures and so on. Also, there is no guarantee that your hand-crafted list contains all the necessary tables in the right order. Therefore, a developer might randomly encounter flakiness while writing a test without knowing it is caused by artifacts left over from earlier tests.

Actually, I have had this issue once, and it was super annoying to fix. Updating our build tools changed the execution order of tests, which ultimately led to flakiness. It took me an enormous amount of time to figure out that the test order had changed and that the bug was caused by a state leak between tests.

We also experimented with TRUNCATE instead of DELETE. However, it slowed down our pipelines even more. I think this is because our test tables held only small amounts of data, so the fixed overhead of TRUNCATE outweighed its benefits.
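To see why the order of deletions is so fragile in a hand-curated list, here is a small sketch that derives a safe order from the foreign key graph. The four tables and their references are a hypothetical schema, not ours.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the tables it references via foreign keys.
references = {
    "users": set(),
    "books": set(),
    "loans": {"users", "books"},  # loans.user_id -> users, loans.book_id -> books
    "fines": {"loans"},           # fines.loan_id -> loans
}

# static_order() yields a table only after every table it references,
# which is a valid order for INSERTs.
insert_order = list(TopologicalSorter(references).static_order())

# Deleting has to go the other way around: children first, parents last.
delete_order = list(reversed(insert_order))
print(delete_order)
```

Every new table or foreign key changes this graph, which is why a list maintained by hand drifts out of date so easily.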

Final Solution

So I have decided, our final goal should be

Make each table fresh before each test
Clean the state as fast as possible
Get rid of hand-crafted lists as entropy always wins

So I built a solution that uses PL/pgSQL to automatically clean all tables that are used in-between tests.

Storing Access

If there are a bunch of tables, trying to clean up all of them would generate a big overhead. So instead, what about only cleaning the ones that contain data? To do that, we need to store the tables that are used during testing somewhere.

CREATE TABLE IF NOT EXISTS test_access(table_name varchar(256) not null primary key);

Later, create a function / trigger that adds a given table name to the list.

CREATE OR REPLACE FUNCTION add_table_to_accessed_list() RETURNS TRIGGER AS $$
BEGIN
  --- Assuming that the table name is passed as the first argument to the function.
  INSERT INTO test_access VALUES (TG_ARGV[0]) ON CONFLICT DO NOTHING;
  RETURN NEW;
  END $$ LANGUAGE PLPGSQL;

Spying on tables

In order to spy on tables that are modified, we can use triggers. This trigger will be executed before every insert, which ensures we capture all tables that are altered during the test run.

CREATE OR REPLACE FUNCTION setup_access_triggers(schemas text[]) RETURNS int AS $$
DECLARE tables CURSOR FOR
  SELECT table_name, table_schema FROM information_schema.tables
    WHERE table_schema = ANY(schemas)
      AND table_type = 'BASE TABLE' --- Exclude views.
      AND table_name NOT IN ('test_access', 'schema_migrations');
      --- Prevent recursion when an insertion happens to 'test_access' table.
BEGIN
  --- Create a table to store the list of tables that have been accessed.
  EXECUTE 'CREATE TABLE IF NOT EXISTS test_access(table_name varchar(256) not null primary key);';
  FOR stmt IN tables LOOP
    --- If the trigger exists, first drop it so we can re-create.
    EXECUTE 'DROP TRIGGER IF EXISTS "' || stmt.table_name || '_access_trigger" ON "' ||
          stmt.table_schema || '"."'|| stmt.table_name || '"';
    --- Create the on insert trigger.
    --- This calls `add_table_to_accessed_list` everytime a row is inserted into the table with table name.
    --- The table name also includes the table schema.
    EXECUTE 'CREATE TRIGGER "' || stmt.table_name || '_access_trigger"' ||
            ' BEFORE INSERT ON "' || stmt.table_schema ||'"."'|| stmt.table_name || '"' ||
            ' FOR EACH STATEMENT ' ||
            ' EXECUTE PROCEDURE public.add_table_to_accessed_list (''"'||
            stmt.table_schema ||'"."'|| stmt.table_name ||'"'')';
  END LOOP;
RETURN 0;
END $$ LANGUAGE plpgsql;

Cleaning the tables

As a last step, we need to create a function that allows us to clean all tables that are accessed during the last test execution cycle. We disable foreign keys before deleting to ensure deletion order doesn't matter as our final goal is to clean all tables.

CREATE OR REPLACE FUNCTION delete_from_accessed_tables() RETURNS int AS $$
DECLARE tables CURSOR FOR
  SELECT table_name FROM test_access;
BEGIN
--- Disable foreign key constraints temporarily. Without this, we need to clear tables in a specific order.
--- But it is very hard to find this order and this trick makes the process even faster.
--- Because we clear every table, we don't care about any foreign key constraints.
EXECUTE 'SET session_replication_role = ''replica'';';
--- Clear all tables that have been accessed.
FOR stmt IN tables LOOP
  BEGIN
    EXECUTE 'DELETE FROM '|| stmt.table_name;
    --- If we accessed a table that has since been dropped, an exception will occur. This ignores the exception.
    EXCEPTION WHEN OTHERS THEN
  END;
END LOOP;
--- Clear the list of accessed tables because those tables are now empty.
EXECUTE 'DELETE FROM test_access';
--- Turn foreign key constraints back on.
EXECUTE 'SET session_replication_role = ''origin'';';
RETURN 0;
END $$ LANGUAGE plpgsql;
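The whole track-and-clean cycle can be tried out without a PostgreSQL instance. The sketch below swaps in SQLite (through Python's built-in sqlite3 module) purely so that it is self-contained. It mirrors the function names above, but the `books` and `users` tables are hypothetical and the foreign key handling is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE test_access (table_name TEXT NOT NULL PRIMARY KEY);
""")

def setup_access_triggers(conn):
    # One trigger per user table; the tracking table itself is skipped.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name != 'test_access' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        conn.execute(
            f'CREATE TRIGGER IF NOT EXISTS "{table}_access_trigger" '
            f'AFTER INSERT ON "{table}" BEGIN '
            f"INSERT OR IGNORE INTO test_access VALUES ('{table}'); END"
        )

def delete_from_accessed_tables(conn):
    # Only tables recorded by a trigger are cleaned; untouched ones are skipped.
    accessed = [row[0] for row in conn.execute("SELECT table_name FROM test_access")]
    for table in accessed:
        conn.execute(f'DELETE FROM "{table}"')
    conn.execute("DELETE FROM test_access")
    return accessed

setup_access_triggers(conn)
conn.execute("INSERT INTO books (title) VALUES ('Dune')")
touched = [row[0] for row in conn.execute("SELECT table_name FROM test_access")]
cleaned = delete_from_accessed_tables(conn)
remaining = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
print(touched, cleaned, remaining)
```

Only `books` is recorded and cleaned here; `users` was never written to, so it is never touched, which is where the time saving comes from.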

Embedding into Tests

We have developed an interface / trait called CleanDBBetweenTests, and every integration test in our system extends this trait. Inside this trait, we have set up some before- and after-test hooks to ensure our tables are cleaned.

def clearAccessedTables(): Unit = {
  finishOperation(sql"""SELECT public.delete_from_accessed_tables()""".as[Int])
}

def setupTestTriggers(): Unit = {
  finishOperation(sql"""SELECT public.setup_access_triggers(array['test_schema'])""".as[Int])
}

trait CleanDBBetweenTests extends BeforeAndAfterEach with BeforeAndAfterAll { this: Suite =>
  override def beforeAll(): Unit = {
    setupTestTriggers()
    clearAccessedTables()
  }
  override def beforeEach(): Unit = {
    clearAccessedTables()
  }
  override def afterAll(): Unit = {
    clearAccessedTables()
  }
}

Results

Using this approach, we were able to cut our CI/CD times by 30%. The speed increase and better isolation greatly improved our developer experience. We have never had issues with our table cleaning approach since we first rolled out this tool. As our codebase keeps growing, without this change our CI runtime would be more than 1.5 hours by now. Speeding up our CI didn't only decrease our bills, it also motivated people to write more code and tests, as the PR feedback cycle became much quicker.

Future Work: Exploring strategies to support constant rows that would stay during all execution cycles, as well as setting up scenarios. Moreover, UNLOGGED tables can potentially speed up the execution even further.

Last words… I have decided that I should open-source this tool so everyone can benefit from it. Your feedback is very valuable, please let me know what you think.

Behind the 6-digit code: Building HOTP and TOTP from scratch

2025-04-11T00:00:00+00:00

A while ago, I started working on authorization and authentication at work. This taught me a lot about how modern authentication systems work. However, I have always thought One-Time Password logins are the most mystical ones: a six-digit code that changes every time and can be used to verify your identity. How does the server know the newly generated one, and how is it really secure? In this post, I will explain what HOTP and TOTP are and how they work by sharing my own implementation from scratch.

What Are OTPs?

One-Time Passwords (OTPs) are a widely-used form of authentication. You’ve likely encountered them when using a “Secure Login” app like Google Authenticator, or during a “Forgot Password” flow where a temporary code is sent to your email or phone.

Unlike traditional passwords, OTPs are only valid for a single use or a limited time window. This greatly reduces the risk of password replay attacks, where someone captures the password used to login and tries to reuse it.

Passwords can be used repeatedly. When leaked, malicious actors can impersonate the user and access critical information.

Like the traditional password authentication approach, the user and the authority (server) still need to agree on a common secret key. During regular password authentication, this secret key is communicated directly to the authority. There are many ways of doing this safely, such as hashing the password or sending it over an encrypted network. However, the risk still exists because the password itself never changes: as long as we use our devices to type our passwords, there is some way for malicious actors to watch and capture that information before it reaches the network.

So instead of using a constant secret key, we can use something dynamic that changes over time. As a simple example, assume when those two people first met, they have set their secretly hidden clocks to a random time together.

Using secret clocks as a basic OTP implementation

Also, in some cases such as password recovery, we can use a secret clock too. This secret clock is not shared with the user directly; rather, the one-time password generated by the server is sent to the user via a trusted medium, such as email.

*Edit: Several readers have warned me it is much easier to generate random numbers instead. The server has to store number of attempts to make sure it is not brute forced as well.*

Obviously a clock on its own is not secure, as in this example Plankton could have predicted the time-shift of the secret clock based on the real time. However for the sake of this example, I wanted to show how copying the "password" is not enough on its own. Let's take a look at some strategies to build this "secret clock" and make sure it is not possible to predict time just by knowing a single code in some point in time.

There are two common types of OTP algorithms:

HOTP (HMAC-based One-Time Password) – based on a counter that increments every time an OTP is requested.
TOTP (Time-based One-Time Password) – based on the current time, typically using 30-second intervals.

These methods are standardized in RFC 4226 (for HOTP) and RFC 6238 (for TOTP), and are used in many modern 2FA (two-factor authentication) implementations.

A counter-based password method is easier to understand. Imagine two people met and generated a totally random series of numbers. They both start from count 0, and in each attempt the user needs to send the server the secret number at the current index. However, this comes with several problems,

Clients need to keep their counter in sync with the server; if there is a skew, they might get temporarily locked out.
Malicious actors can collect upcoming login codes by phishing the user and those codes can be used for a long time.

Therefore, instead of storing a counter, we can use the current time as the counter. That's how TOTP works. Using time makes synchronization easier, as many modern machines already use technologies such as NTP to sync their clocks. It also prevents malicious actors from harvesting codes, as a code will only be valid for the next 30 seconds or so, not for a long sequence of future login attempts.

How to Generate TOTPs?

The analogy of two people meeting and deciding on a totally random series of numbers is partially realistic. However, it is not feasible to keep such a huge list: you would potentially need millions of secret numbers to support OTPs for a reasonable time. Therefore, we should use a cryptographically secure algorithm that generates values based on a secret key. It is important that this algorithm is deterministic, not random, as both the user and the authority hold a copy of the secret key and they should be able to generate the same value given the same time.

We introduced HOTP first because TOTP is actually implemented on top of HOTP. Instead of using a stored counter, TOTP uses the time as the current counter. We can write the following formula to find the counter at any given time,

\[c(t) = \left\lfloor \frac{t - t_0}{X} \right\rfloor\]

Here $t_0$ is the starting time, in most systems this is the default UNIX epoch timestamp, 1 January 1970. $X$ is the period you want the code to rotate. For example, if you want the login code to change every 30 seconds, X should be 30 seconds.
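As a quick worked example of the formula, here is a minimal sketch assuming $t_0 = 0$ (the UNIX epoch) and $X = 30$ seconds. The function name `time_counter` is mine, not part of any standard.

```python
def time_counter(t: int, t0: int = 0, period: int = 30) -> int:
    """c(t) = floor((t - t0) / X), with X = period and everything in seconds."""
    return (t - t0) // period

print(time_counter(29))  # 0, still inside the first 30-second window
print(time_counter(59))  # 1
print(time_counter(60))  # 2, a new window starts and the code rotates
```

Every timestamp inside the same 30-second window maps to the same counter, which is why the code stays stable for that long.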

How to Actually Generate HOTPs?

In order to generate an HOTP, you need to decide on three things:

A secret key
A hash function
Number of digits you will output

First, we need to bring our secret key to the block size of the hash function. For example, if we have chosen SHA-1 as our hashing algorithm, the block size is 64 bytes (while the digest is only 20 bytes). If the secret key is not longer than 64 bytes, we can just pad it with zeroes. Otherwise, given $K$ is our secret key and $H$ is our hashing algorithm, we hash the key first and then zero-pad the digest to 64 bytes,

\[K_{pad} = H(K)\]

Next, we XOR the padded key with two pre-defined magic constants, $I_{pad}$ and $O_{pad}$.

\[\begin{align} I_{pad} &= [\texttt{0x36}, \dots] \\ O_{pad} &= [\texttt{0x5c}, \dots] \end{align}\]

Those values were chosen by the HMAC designers, and any pair where $I_{pad} \neq O_{pad}$ could have been chosen. Each constant is the given byte repeated 64 times, matching our hashing algorithm's block length. Next, we define the famous $\text{HMAC}$ (Hash-based Message Authentication Code) function as in RFC 2104. It outputs a cryptographic hash calculated using the given key and message.

\[\text{HMAC}(K, M) = H\big((K_{pad} \oplus O_{pad}) + H((K_{pad} \oplus I_{pad}) + M)\big)\]

Here $+$ denotes concatenation. HMAC is designed so that an attacker can't infer the secret key $K_{pad}$ even if they know $M$ and the resulting hash.
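To make the construction concrete, here is a minimal sketch of HMAC with SHA-1, written out by hand and checked against Python's standard hmac module. The key and message are arbitrary example values.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-1 block size in bytes

def hmac_sha1(key: bytes, message: bytes) -> bytes:
    # Keys longer than one block are hashed first; the result is zero-padded.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha1(key).digest()
    k_pad = key.ljust(BLOCK_SIZE, b"\x00")
    inner_key = bytes(b ^ 0x36 for b in k_pad)  # K_pad XOR I_pad
    outer_key = bytes(b ^ 0x5C for b in k_pad)  # K_pad XOR O_pad
    inner = hashlib.sha1(inner_key + message).digest()
    return hashlib.sha1(outer_key + inner).digest()

key, message = b"12345678901234567890", b"hello"
expected = hmac.new(key, message, hashlib.sha1).digest()
print(hmac_sha1(key, message) == expected)  # True
```

In real code you would call the library version directly; the hand-written one is only there to show where each term of the formula goes.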

Later we will define a new function to generate a 4-byte result. Here is the definition of DT from the original RFC,

    DT(String) // String = String[0]...String[19]
     Let OffsetBits be the low-order 4 bits of String[19]
     Offset = StToNum(OffsetBits) // 0 <= OffSet <= 15
     Let P = String[OffSet]...String[OffSet+3]
     Return the Last 31 bits of P

This function allows us to shrink our 20-byte input to 4 bytes dynamically: the low-order 4 bits of the last byte give an offset, and we take the 4 bytes starting at that offset, dropping the top bit. The outputs of DT on distinct counter inputs are uniformly and independently distributed.
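Here is a small sketch of DT in Python, run on the example digest given in RFC 4226. The function name `dynamic_truncate` is my own.

```python
def dynamic_truncate(digest: bytes) -> int:
    # The low-order 4 bits of the last byte select the offset (0..15).
    offset = digest[-1] & 0x0F
    chunk = digest[offset:offset + 4]
    # Keep the last 31 bits, i.e. clear the top bit of the 4-byte value.
    return int.from_bytes(chunk, "big") & 0x7FFFFFFF

digest = bytes.fromhex("1f8698690e02ca16618550ef7f19da8e945b555a")
print(dynamic_truncate(digest))          # 1357872921
print(dynamic_truncate(digest) % 10**6)  # 872921
```

The last byte is 0x5a, so the offset is 10 and the selected bytes are 0x50ef7f19.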

Finally, we can define our HOTP function as,

\[\text{HOTP}(K,C) = \text{DT}(\text{HMAC}(K,C)) \bmod 10^{\text{digits}}\]

Here we can replace our counter $C$ with $c(t)$ to get a TOTP code.
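Putting the pieces together, a compact sketch of both functions could look like the following. It uses Python's hmac module with SHA-1 and encodes the counter as an 8-byte big-endian value, as RFC 4226 specifies. The key is the ASCII test secret from the RFC test vectors, and the defaults (6 digits, 30-second period) are assumptions for illustration.

```python
import hashlib
import hmac

def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    # The counter is hashed as an 8-byte big-endian integer.
    message = counter.to_bytes(8, "big")
    digest = hmac.new(key, message, hashlib.sha1).digest()
    # Dynamic truncation: 4 bytes at the offset given by the last nibble.
    offset = digest[-1] & 0x0F
    code = int.from_bytes(digest[offset:offset + 4], "big") & 0x7FFFFFFF
    # Left-pad with zeroes so the code always has the requested length.
    return str(code % 10**digits).zfill(digits)

def totp(key: bytes, t: int, period: int = 30, t0: int = 0, digits: int = 6) -> str:
    # TOTP is HOTP with the counter replaced by c(t).
    return hotp(key, (t - t0) // period, digits)

key = b"12345678901234567890"
print(hotp(key, 0))             # 755224
print(hotp(key, 1))             # 287082
print(totp(key, 59, digits=8))  # 94287082
```

The printed values match the published test vectors in RFC 4226 and RFC 6238, which is a handy way to check any implementation.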

Final Remarks

There are many online resources on TOTP and HOTP; however, I struggled to find a website that helped me check my implementation, as their secret-key representations were not standardized. Thus, I have published my own short demo app to showcase it.

I have published this app on my website and also on GitHub, the implementation uses Kotlin.

Link to the app https://otp.dogac.dev/
Link to the GitHub repository: github.com/Dogacel/otp-server

To recap: We’ve looked at how HOTP and TOTP work, explored how they're derived from HMAC, and saw how the server and client can generate matching codes without ever transmitting the password itself.

Working on this project helped me understand how OTPs work at a much deeper level. What once felt like magic now feels like elegant design.

On Decidability of Our Jobs and AI Replacing Software Engineers

2025-04-03T00:00:00+00:00

Among all the occupations AI could replace, why are we focusing so much on Engineering jobs that require such expertise? I'm well aware of quality of the code AI writes, it is beyond useful but I don't see a world where that piece of code can find its way into the real world without the help of a software engineer. First, I would like to talk about the kind of jobs that I think AI will replace first and how AI can't replace Software Engineers any time soon.

I would like to use the Turing-completeness analogy (a Turing-complete system is one that can simulate any computation) to describe jobs. I think of jobs as falling into two categories, decidable and undecidable. In the traditional sense, a decidable problem can be solved by a well-defined algorithm that always halts with a correct answer. Translating this to the world of work, a decidable job has well-defined inputs and a finite set of outputs, so it can be fully automated or scripted. Jobs that are mostly decidable are the most prone to being replaced by AI. On the other hand, an undecidable job is open-ended: given a problem, there is no guaranteed algorithm that always gives you a solution, or even tells you if a solution exists. An example of a decidable job could be a customer support agent. Even though the input set is not well-defined, as it is usually text written by a human (probably the only reason this job still exists today), the possible actions are all documented. On the other hand, an engineering job can fairly be described as undecidable: build a system that scales and supports X features under Y constraints. Arguably, most of the job is figuring out how and planning the process rather than the actual implementation. Take construction engineers, for example: their primary duty is to come up with a plan rather than to carry the material required to build it.

One might say, in this context every job can be defined as either decidable or undecidable based on the job description. It's also fair to say each job has some decidable factor and some undecidable factor. For example, a customer support agent might have creating new workflows as a part of their duty rather than only using pre-existing procedures, which makes it less decidable. On the other hand, an engineer can have a job where the only expectation is to transform some data from one format to another. Therefore it is not always possible to classify an occupation entirely as decidable or undecidable. Here is a key take-away, the definition of a job ultimately determines its decidability. We create jobs to solve problems and there are infinitely many ways to define those jobs. If we are able to define those jobs with a clear separation of decidable and undecidable, we can easily replace the decidable part with AI.

However, this will give birth to new jobs where the person's primary function is to split a job into a decidable and an undecidable factor. We can think of them as AI-integration engineers. Their primary function is to extract the decidable factor from the undecidable one. Since the process of extracting the decidable from the undecidable is itself undecidable (the classic halting problem), it is fair to say their jobs are secure. I do believe software engineering overlaps quite a lot with this definition of extracting the decidable. Not just software engineers, but most engineering jobs have this function, where the engineer's primary role is to create individual units of work, each of which can be executed autonomously. It's almost like we have defined what engineering is…

As programming languages have a formal spec, their syntax is decidable. I think this is one of the pitfalls that make people think software engineering is going to vanish. However, as we have discussed, the function of a software engineer is absolutely not decidable. Furthermore, it is often argued that only around one third of a software engineer's duties consist of writing code. So, it is fair to say most software engineers' jobs are safe. Only a small portion of their job can be replaced by AI: the syntax of their programming languages. Note that it is not possible to generalize this to all software engineers, as some engineers might find their jobs to be more decidable than others. But you don't really need AI to replace those types of software engineers; software can replace developers on its own. We have been seeing this shift for a while, from "punch card punchers" being replaced by terminal emulators to static website generators allowing non-programmers to create websites. However, this didn't end web developers' jobs but rather pushed them towards building more advanced frameworks and tackling harder problems. So ultimately, my takeaway is that AI will help us eliminate the decidable part of our jobs faster than ever, which is usually the most boring and uninspiring part anyway. It will allow us to spend more time on tinkering and building more advanced tools.

Final Remarks

I have intentionally tried to keep this article short, as there is much more to say about software engineering. The article "AI Over-Hype: A Dangerous Threat (and How to Fix It)" motivated me to write this post, as it urges professionals to rally against remarks such as "AI will write all the code" (another shoutout to Anthropic's CEO). It dives much deeper into the topic of software and AI and supports its arguments with empirical data. Also, a great blog post from Alperen, "Verifiability is the Limit", dives much deeper into software engineering. It discusses the pitfalls of AI regarding correctness and verifiability in software engineering, which inspired me to come up with the analogy of Turing completeness in terms of job functions. Finally, David Graeber's "Bullshit Jobs" is a must-read for the broader context. It's not meaningful to discuss which jobs can be replaced and which can't without really understanding their functions and why they exist in the first place.

Supercharge Your Home Cluster Using Cloudflare Tunnel

2025-03-29T00:00:00+00:00

I'm a big fan of self-hosting and DIY. Since writing my previous blog post about my self-hosting journey, I have learned some exciting new things that I want to share with you. First, I’ll explain my initial server setup. Then, I’ll discuss why I looked for an alternative, and finally, I’ll show how Cloudflare Tunnels helped me achieve my goals.

The Problem

If you are like me and you are hosting your website on your own home-cluster, there is some configuration you have to do to ensure you are not exposing your devices to the internet insecurely. Secondly, you will soon realize your home-cluster is not accessible in the same way inside your home (private network) as someone outside your network.

Initial Setup

My home-server runs on Proxmox and it exposes a lightweight Alpine LXC (Linux Container) to handle external traffic. I’ve deliberately disabled SSH on this container for extra security; I can only connect to its TTY via the Proxmox console. Instead of port-forwarding the services I want to expose one-by-one, I deliberately placed that container behind a DMZ, so I don't need to configure a port-forward every time I need one (shout-out to Xfinity for making port forwards extra difficult).

I currently host many things in my home-cluster. Some of the applications run on a Kubernetes cluster and some of them run as standalone Docker containers because I was too lazy to move them. I have my blog, a FreshRSS instance to manage my RSS subscriptions, a general-purpose PostgreSQL instance to collect data from experiment runs for research projects, a Minecraft server to play with my friends, a Grafana dashboard to visualize different kinds of data and set various alerts (such as the SSL certificate expiration of my website), an InfluxDB instance to collect sensor data from my house and many more.

My main Kubernetes cluster runs on MicroK8s. It has MetalLB in front, and I can set up ingress rules to forward traffic to different applications. However, this is not enough on its own, because there are other applications outside Kubernetes that I need to expose. Therefore, I have decided to put everything behind HAProxy.

defaults
  mode http
  timeout client 60s
  timeout connect 30s
  timeout server 60s
  timeout http-request 60s

frontend .dogac.dev
  mode http
  bind :443 ssl crt /root/haproxy/all.pem

  acl is_freshrss hdr(host) -i freshrss.dogac.dev
  use_backend fresh-rss if is_freshrss

  acl is_blog hdr(host) -i blog.dogac.dev
  use_backend ghost-blog if is_blog

  acl is_grafana hdr(host) -i grafana.dogac.dev
  use_backend main-cluster if is_grafana

  acl is_healthcheck hdr(host) -i health.dogac.dev
  use_backend main-cluster if is_healthcheck

  acl is_otp hdr(host) -i otp.dogac.dev
  use_backend main-cluster if is_otp

  default_backend ghost-blog


backend fresh-rss
  mode http
  option forwardfor
  http-request add-header X-Forwarded-Proto https if { ssl_fc }
  server container-master 10.0.0.X:X

backend ghost-blog
  mode http
  option forwardfor
  http-request add-header X-Forwarded-Proto https if { ssl_fc }
  server container-master 10.0.0.X:X

backend main-cluster
  mode http
  option forwardfor
  http-request add-header X-Forwarded-Proto https if { ssl_fc }
  server main-cluster 10.0.0.X:X ssl verify none

frontend psql-fe
  mode tcp
  bind :X
  default_backend psql-be

backend psql-be
  mode tcp
  server psql 10.0.0.X:X

frontend minecraft-server
  mode tcp
  bind :X
  default_backend minecraft-sv

backend minecraft-sv
  mode tcp
  server game-server 10.0.0.X:X

By using this configuration, I am able to forward traffic coming from different sub-domains to different applications and potentially to kubernetes. Just to make everything extra secure, I did not forward unknown domains / subdomains to kubernetes, so I wouldn’t accidentally expose something.

I still had a couple of additional issues,

I don't own a static IP address
SSL certificates rotate every 3 months
I expose my home IP address directly to the domain provider

For number 1, I used DDClient to automatically update my IP address to PorkBun domain provider regularly. As I don't provide any availability SLAs for my personal website and my IP address doesn't change often, it works fine.

For number 2, I have created a small script that automatically updates my HAProxy certificates from Porkbun, and I want to share it with you. I run this script a week before my SSL certs expire (my Grafana instance reminds me). Alternatively, I could have scheduled it as a weekly job.

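#!/bin/ash

# Exit on the first failing command, including failures inside a pipeline.
set -eo pipefail

apikey=X
secretapikey=X
domainname=X

# Fetch the current certificate bundle for the domain from Porkbun.
resp=$(curl -s -X POST "https://api.porkbun.com/api/json/v3/ssl/retrieve/$domainname" -d "{\"secretapikey\": \"$secretapikey\", \"apikey\": \"$apikey\"}" | jq)

result=$(echo "$resp" | jq -r '.status')

if [[ "SUCCESS" != "$result" ]]; then
    echo "Not successful result: $resp"
    exit 1
fi

chain=$(echo "$resp" | jq -r '.certificatechain')
privatekey=$(echo "$resp" | jq -r '.privatekey')

# Keep the previous bundle as a backup; ignore the error if there is none yet.
mv all.pem old.pem 2>/dev/null || true

# HAProxy expects the certificate chain and the private key in a single file.
echo "$chain" >> all.pem
echo "$privatekey" >> all.pem

echo "Done!"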

For number 3, I did not have many options with my current setup. As far as I know, Porkbun doesn't have a direct way to hide my IP address, and I don't want to pay monthly for a proxy server.

Access from Home

My ISP and router don't allow NAT Loopback, meaning that I can't reach my own network through its external IP while I am connected to the internal network. You might ask, why do I need that? For example, when I try to visit my website while I am at home, I can't access it because its domain name resolves to my external IP and my router doesn't allow it. There are a couple of ways around it, but none of them are perfect:

Change your router to support NAT Loopback. Firstly it is not guaranteed that it will work and secondly it requires additional maintenance cost and money.
Update /etc/hosts. This technique works locally but you have to remember to update your hosts file every time you connect to an external network / internal network. Also, it needs to be configured per device. I am not sure if there is an equivalent way for my iPhone for example. Also you might face SSL certificate issues.
Update Router's DNS records. As I have stated before, I don't think it is possible in my case and I don't want to deal with the complexity of an additional DNS server.
Use server IP directly. My LB forwards traffic based on domain names and some services are configured to only listen to those domains. Also I have to switch to the domain name when I am on an external network. I can't change the server address each time for every application I use, such as my RSS reader, NetNewsWire.
Use a Proxy server. My motivation is to not pay for an additional server and maintain it. However, Cloudflare provides a free solution that you can use. Let's explore it.

Where Cloudflare Tunnels Shine

In the previous sections, I explained why a proxy server increases the security of your home cluster and helps you federate access from internal and external networks. So I started searching for a free proxy alternative; however, you shouldn't really trust a free proxy server. Previously, I had set up my own proxy using Squid on AWS Lightsail, but I wasn't happy with its performance. Even if you don't mind paying for an additional server, you still have to maintain it.

After a careful investigation, I have found Cloudflare Tunnel. From Cloudflare's website,

Cloudflare Tunnel provides you with a secure way to connect your resources to Cloudflare without a publicly routable IP address. With Tunnel, you do not send traffic to an external IP — instead, a lightweight daemon in your infrastructure (cloudflared) creates outbound-only connections to Cloudflare's global network

This seemed like the perfect solution for me. First of all, I followed their docs to move my DNS nameservers from Porkbun to Cloudflare to start using it.

curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -o /usr/bin/cloudflared
chmod +x /usr/bin/cloudflared
cloudflared --help

Later, I created a cloudflared tunnel and installed it as a service. Cloudflare now takes care of my certificates, and because cloudflared only opens outbound connections to Cloudflare, the tunnel keeps working even when my external IP changes.

cloudflared tunnel create home-server
cloudflared tunnel route dns home-server '*.dogac.dev'
cloudflared tunnel route dns home-server dogac.dev
cloudflared service install

Initially, I thought about replacing HAProxy with cloudflared's ingress configuration. However, I concluded that it would be a lot of effort and a step backward from my current setup. So instead, I decided to forward all traffic coming to my domain directly to HAProxy without any additional configuration:

tunnel: home-server
credentials-file: /root/.cloudflared/XXX.json

ingress:
  - service: http://localhost:8443

I also had to change my HAProxy config. I removed the SSL certificate from the bind line, as cloudflared handles SSL automatically, and switched from port 443 to 8443 so the Cloudflare tunnel can use the standard port 443 for HTTPS.

frontend .dogac.dev
  mode http
  bind :8443

  ...

Also, when you run dig queries on my domain now, you will see that my home IP address is hidden and Cloudflare's IP addresses are shown instead. Now I can also benefit from Cloudflare's analytics on top of Google Analytics.

And most importantly, I am able to visit my website directly from its domain, dogac.dev without needing any extra configuration. This allowed me to configure all my devices to directly use the domain address no matter which network I am connected to.

Conclusion

Cloudflare Tunnel is a free and secure solution for hosting your home server. The setup was pretty straightforward, and it helped me secure my home cluster while providing federated access from both my internal network and external networks. Cloudflare has many other free features that I haven't had a chance to explore yet. I recommend it for any hobbyist who has a home cluster. Let me know what you think about this post, and feel free to share any recommendations for my home cluster.

I Like Self-Hosting

2024-06-10T00:00:00+00:00

As I mentioned in my previous posts, I love open-source software. How do I prefer my open-source projects? Of course, self-hosted. I love self-hosting not because I save a ton of money on hosting or SaaS fees, but because it offers a fun and educational experience. However, in my professional life, I usually avoid self-hosting due to the responsibilities it entails and the total cost that accumulates over time. I understand it depends on individual circumstances, so it’s not right to generalize, but this has been my general experience.

Where It All Began

So, where did it all start? Probably, many of us have gone through a similar journey. I was just learning how to code and created a basic website using Bootstrap (thank you for teaching me what responsive design is). I had the webpage ready and needed to share it with my friends and family. "OK, here you go C:/Users/Dogac Eldenk/Desktop/awesome_website.html". You're saying it doesn't work? I installed something called Apache, now try localhost:8000. It still doesn't work? Oh, there are internal and external IP addresses. I found mine; here you go: 100.0.0.255. It doesn't work either. Let's learn about NAT, firewalls and port forwarding. Oh, I have to pay my ISP just to forward my port? Fine, a couple bucks per month must be worth it (it was really hard to convince my dad back then). Finally, my website was online!

But who could remember those digits to visit my website? I wanted a cool domain name, so I paid for one. OK, I finally got everything in place, I thought. My server goes offline when I turn off my computer? Do people really keep their computers on all day? I don't think so. I thought the magical answer to this question was CPanel hosting. It was super cheap to host my own website on CPanel; however, it had several limitations. It was constrained to PHP, HTML and MySQL in my case. So, what is the alternative? I found Digital Ocean. This website allowed me to host a VPS (Virtual Private Server) for $5 per month. As you might have guessed, I was broke because I paid yearly for hosting and a domain name, and I could never afford a proper VPS for more than a month.

Of course, things have changed quite fast in the last 10 years. When I went through this, I knew basically nothing about how servers work. I was routing myself towards the shortest path to achieve hosting a website online to get only 3 clicks per month. Why am I telling this story? Because I have learned so many things just to showcase my website to my friends. I know this could have been a screenshot or screen recording, but where is the fun in that?

What About My Own Server?

So, at this point, I had a brief idea about how websites work. I also questioned how game servers worked when I wanted to spin up my own Minecraft server locally to play with my friends. Then I found out about SBCs, Single Board Computers! I bought my first Raspberry Pi, plugged it in and started tinkering. I learned so much when I first used it: Linux, Python, hardware, networking etc… I previously had an Arduino and an ESP-8266; I wrote C code to blink some LEDs and display some text on an old-school display. However, none of them were as capable as real computers. So, when I met the Raspberry Pi, I realized I could connect everything together and connect it to the internet.

My first significant project was creating a weather station. I had some electronic components that measured temperature and humidity. I also had an RF radio module which communicated with the Raspberry Pi I set up in my room. Using this setup, I was running my weather station on an AA battery, which I prematurely optimized to run for years on a single battery even though I was going to use it for only a week and leave it. The station was reporting to my Raspberry Pi using RF and the Pi was hosting the data on my website online!

Of course, this was a fun little project that taught me a lot. I was still unable to do more complex tasks, for example running a Minecraft server. Also, the ARM architecture was not so popular back then, so occasionally I hit a hard wall of "x86 only" applications. So, I never actually had a server that I could use generally; it was always limited in some way.

When the COVID pandemic began, I had to suspend school and go back to live with my parents. I was bored for months; the only fun I had was playing Counter-Strike with my friends. So as a fun project, I spun up my own game server to play with my friends. To do this, I used my old laptop, which wasn't being used. I kept it open all day. However, it only lasted until summer because my room would get super hot, and the fan of that computer was giving me a hard time sleeping.

Half a year later, I returned to college and started working part-time. With my first month's salary, I upgraded my desktop computer. During my internship in the UK, one of my colleagues, Aaron, showed me his home setup. He had a separate computer running Unraid. So I got inspired and thought I could use the remaining parts to create a home server like he did. I bought a wooden crate to put my computer parts in because I did not have an extra case.

I did not use Unraid for some reason. I installed an Ubuntu-based distribution (Pop!_OS) on my computer and occasionally used the HDMI output to connect to my TV. For the first time ever, I had a computer at home that was capable of almost anything I could imagine, except for machine learning training for my coursework. I used this computer to host our final year project, a game server, a media station and even a personal cloud. I was so happy to be using this computer in my daily life.

What About Now?

Rolling forward for two and a half years, I moved to the US in the meantime. I had to leave a lot of stuff I owned back in my home in Turkey. During this period, I learned a lot about the cloud. However, I was still pretty much excited about having my own hardware. My first months in the US were rough, so I bought an Orange Pi 5 Pro and Raspberry Pi 5 to spend some time with. I probably should have only bought the Raspberry Pi; however I wanted the 16GB of RAM on my server and thought 8GB was not enough.

It felt pretty nostalgic to be working with an SBC again after 10 years. The community was much bigger than before, and I knew much more about computers and software. I used those two SBCs to learn about Kubernetes and created my own Kubernetes cluster with them. I still have some fun projects in mind involving Pi Hole, personal Grafana dashboards, CasaOS and so on.

Later on, I was frustrated by the fact that the projects in the tutorials I followed still did not publish Docker images for the ARM architecture. I wanted to do something more professional with x86, something I could use more generally, something with more memory and, more importantly, more disk space.

I surfed through eBay to find refurbished computers. Then I found the Dell OptiPlex 3060. This computer offers so much for only $135: specs are i5 8500T, 32GB RAM and 512GB SSD. It had twice the performance, double the memory, much much more disk capacity and speed.

This time, I did not want to run bare Linux on my server, so I installed Proxmox. Because I had so much memory and disk space, I could create multiple VMs for various tasks. First, I created 3 Ubuntu Server virtual machines. I used those virtual machines to learn about Kubernetes and created a cluster using MicroK8s.

One day at work, I had to do some testing with an external integration and I needed a public HTTP server. Unfortunately, I am not able to port-forward to devices based on IP addresses because the Xfinity app sucks. I went ahead and created a lightweight container using Proxmox, without any VM, and set up HAProxy. This was an opportunity for me to learn about HAProxy. I put this container in the DMZ and let it handle the traffic and forward it to my laptop. I also configured a Minecraft server and put it behind the load balancer.

Moving forward, I decided to start a blog, the one that you are reading right now. I created this blog using Ghost, an open source blogging platform. I am self-hosting this using the Dell machine inside a virtual machine. As always, it is a fun experience to go through the hassle of initial setup with software. You always learn something during the process.

Getting Serious

Now I want to get serious with my blog. As I post more stuff, I hope to get more traffic (more than 2 people per day). As I do this, availability is a concern. For example, yesterday, the electricity went off and my server shut down. The website was inaccessible for a whole day, and I did not realize it.

I was at home, so I was able to recover my blog. However, I am planning to travel this summer to visit home. So I won't be able to do any emergency recovery in that case. Also, I am not backing up my blog yet. So I am thinking about moving my blog temporarily during summer and observing how it goes.

So I have a couple of alternatives; either to keep self-hosting on the cloud with higher availability or use a hosted version of Ghost. And I chose, of course, to keep self hosting. I have done some cost analysis. To keep things simple, I have used AWS as the baseline. The server I need should be fairly minimal with a couple of gigs of RAM.

ECS: Too expensive, not even near EC2 for the smallest instance. 1 vCPU + 1 GB RAM for $30 per month.
EC2: My choice of instance is t4g.micro with 2 vCPU and 1GB RAM. It costs about $7 for a public IP, and I choose spot instances, so $3, adding up to about $10 per month.
Amazon Lightsail: This is the most traditional approach, a VPS. The pricing for this class is also much more predictable as there aren't many moving pieces around. The same setup with 2 vCPU and 1GB RAM is only $5 per month using IPv6.

Currently, my choice is Amazon Lightsail. My website is super lightweight; I only need a couple of features. I am not even thinking about using AWS managed MySQL to manage my data. I am hosting Ghost and MySQL under the same instance. Note that MySQL can run on instances with 512MB RAM out of the box.

For backing up and alerting, I am planning to set up some alerts that would send me an email regarding downtime on my server. I am also planning to back up my MySQL database to S3 daily or weekly. This seems like the cheapest and easiest option.

Conclusion

Self-hosting is fun; it teaches you a lot. It doesn't matter if the thing you do is the most efficient or productive way possible. What matters is how you get there and what values it brings. I feel like trying to do stuff on my own is an important part of my early life that taught me lots of stuff that I currently know. In this post, I have focused on servers; however, it doesn't have to be just servers. It can be writing your own X, where X is already a solved problem such as writing your own serialization format, implementing compression with Huffman Encoding, writing a Chess Engine, implementing Neural Networks from scratch, a custom JSON parser, and so on.

Building an Authorization Framework with Armeria - a Case Study

2024-06-03T00:00:00+00:00

I was introduced to Armeria two years ago, in 2022. Since then, it has been my go-to framework for JVM-based projects. Recently, I had the chance at work to build some shared authorization code in our system, and I wanted to share how we built our authorization framework using Armeria by applying it to a theoretical scenario.

Case Study: Blog Application

Let's start by describing a theoretical scenario. We have a blog website with members and authors. Members can subscribe to authors. Authors can write blog posts. Authors can change the visibility of each blog post to public, members-only or subscribers-only.

First issue, authentication. By today's standards, OAuth2 tokens are a pretty common way to authenticate. Let's assume our application uses OAuth2 JWT tokens. Armeria allows us to decorate our services using decorators. Let's create a decorator that requires a valid OAuth2 token.

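import com.linecorp.armeria.common.HttpRequest
import com.linecorp.armeria.common.HttpResponse
import com.linecorp.armeria.common.HttpStatus
import com.linecorp.armeria.server.DecoratingHttpServiceFunction
import com.linecorp.armeria.server.HttpService
import com.linecorp.armeria.server.ServiceRequestContext
import io.netty.util.AttributeKey

val ACCESS_TOKEN_KEY: AttributeKey<Token> = AttributeKey.valueOf("access_token")

class RequireAccessToken : DecoratingHttpServiceFunction {
    override fun serve(
        delegate: HttpService,
        ctx: ServiceRequestContext,
        req: HttpRequest,
    ): HttpResponse {
        // The header may be missing, in which case the request is rejected right away.
        val token: String = ctx.request()
                                .headers()
                                .get("Authorization")
                                ?.removePrefix("Bearer ")
                                ?: return HttpResponse.of(HttpStatus.UNAUTHORIZED)

        val claims = MyJWTVerifier.validate(token)

        return if (claims.isValid) {
            // Make the verified claims available to the services further down the chain.
            ctx.setAttr(ACCESS_TOKEN_KEY, claims)
            delegate.serve(ctx, req)
        } else {
            HttpResponse.of(HttpStatus.UNAUTHORIZED)
        }
    }
}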

A decorator that mandates an access token

This decorator does two things:

1. Ensure there is a valid JWT token issued. (Implementation of MyJWTVerifier is up to you.)
2. Inject claims parsed from the JWT token into the request context.

Number 1 is obviously required to make an endpoint protected. Number 2 will be used to authorize using actors and relations (ABAC, RBAC…) in the upcoming sections. So, let's go ahead and apply this decorator.

@Decorator(RequireAccessToken::class)
class BlogPostController {

    @Get("/blog_posts")
    suspend fun listBlogPosts(): List<BlogPost> { ... }

    @Post("/blog_posts")
    suspend fun createBlogPost(body: CreateBlogPostBody): BlogPost { ... }
}

Now, using the access token decorator, we require every request coming to our controller to carry a valid JWT token. Note that this is an annotated service; however, the same decorator works for other HTTP services, including gRPC services.

In this implementation, some requirements are not met. For example, there are blog posts that are publicly visible. To fix it, we need a graceful way to inject token metadata into the request context.

class MaybeAccessToken : DecoratingHttpServiceFunction {
    override fun serve(
        delegate: HttpService,
        ctx: ServiceRequestContext,
        req: HttpRequest,
    ): HttpResponse {
        // Null when the request carries no Authorization header at all.
        val token: String? = ctx.request()
                                .headers()
                                .get("Authorization")
                                ?.removePrefix("Bearer ")

        if (token != null) {
            val claims = MyJWTVerifier.validate(token)

            if (claims.isValid) {
                ctx.setAttr(ACCESS_TOKEN_KEY, claims)
            }
        }

        // Anonymous requests still reach the service, just without the attribute.
        return delegate.serve(ctx, req)
    }
}

This slightly modified decorator does not return an Unauthorized response when there is no valid token present. Let's modify our controller to accommodate this change.

class BlogPostController {

    @Get("/blog_posts")
    @Decorator(MaybeAccessToken::class)
    suspend fun listBlogPosts(): List<BlogPost> {
        // attr() returns null when the decorator did not set the attribute.
        val token = ServiceRequestContext.current().attr(ACCESS_TOKEN_KEY)

        // Anonymous callers only see public posts
        if (token == null) {
            return BlogRepository.listPublicBlogPosts()
        }

        val subscribedAuthors = Subscriptions.getForUser(token.userId).map { it.authorId }

        return BlogRepository.listAllBlogPosts(subscribedAuthors)
    }

    @Post("/blog_posts")
    @Decorator(RequireAccessToken::class)
    suspend fun createBlogPost(body: CreateBlogPostBody): BlogPost { ... }
}

Now, let's add a slight twist to this scenario. Let's assume this Blog website was created long ago and it also has a mobile app. In the mobile app, instead of using JWT tokens, we were using username and password header (Yikes!). Even though this is not desired, some real world applications might need to support their legacy code for different reasons. In this example, the application was created long ago and it did not have OTA updates. So even if we migrate to JWT in the mobile app, to keep serving our old users, we need to keep supporting their way of authorizing.

Let's modify the decorator to take the username, password header into account.

val USER_ID_KEY: AttributeKey<UUID> = AttributeKey.valueOf("user_id")

class MaybeUser : DecoratingHttpServiceFunction {
    override fun serve(
        delegate: HttpService,
        ctx: ServiceRequestContext,
        req: HttpRequest,
    ): HttpResponse {
        // The header may carry either scheme, or be missing entirely.
        val authorization: String = ctx.request()
                                        .headers()
                                        .get("Authorization")
                                        ?: return delegate.serve(ctx, req)

        if (authorization.startsWith("Bearer ")) {
            val claims = MyJWTVerifier.validate(authorization.removePrefix("Bearer "))
            if (claims.isValid) {
                ctx.setAttr(USER_ID_KEY, claims.userId)
            }
        }

        if (authorization.startsWith("Basic ")) {
            // limit = 2 keeps passwords that contain a colon intact
            val userNameAndPassword = authorization.removePrefix("Basic ").split(":", limit = 2)
            if (userNameAndPassword.size == 2) {
                val userId: UUID? = Users.check(userNameAndPassword[0], userNameAndPassword[1])
                if (userId != null) {
                    ctx.setAttr(USER_ID_KEY, userId)
                }
            }
        }

        return delegate.serve(ctx, req)
    }
}

Now, with this new decorator, we have moved the business logic for finding out which user made the call out of the controller. We basically wrap our controller with a single annotation and it magically injects the calling user into the context.
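One caveat about the legacy branch: under standard HTTP Basic authentication (RFC 7617), the credentials are sent as the Base64 encoding of username:password, not as plain text. If the legacy mobile app follows the standard, the header has to be decoded before splitting. Here is a minimal sketch of that step; parseBasicCredentials is a hypothetical helper of mine, not something Armeria provides.

```kotlin
import java.util.Base64

// Hypothetical helper: parses a standard
// "Basic <base64(username:password)>" Authorization header value.
// Returns null when the header is missing, uses another scheme, or is malformed.
fun parseBasicCredentials(header: String?): Pair<String, String>? {
    if (header == null || !header.startsWith("Basic ")) return null

    val decoded = try {
        String(Base64.getDecoder().decode(header.removePrefix("Basic ").trim()), Charsets.UTF_8)
    } catch (e: IllegalArgumentException) {
        return null // the payload is not valid Base64
    }

    // Split on the first colon only, because passwords may contain colons.
    val separator = decoded.indexOf(':')
    if (separator < 0) return null

    return decoded.substring(0, separator) to decoded.substring(separator + 1)
}
```

Inside the decorator, the returned pair could be passed to the user lookup, and a null result can be treated the same way as a missing header.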

Suspend Calls in Decorators

We most likely need to make suspending calls from decorators to do certain checks such as database calls, network calls etc. This includes user login check and maybe JWT verification. As you might have noticed, it is currently not possible to create suspend decorators (issue to track). So to achieve this, we can use the event loop as our dispatcher.

val future = CoroutineScope(ctx.eventLoop().asCoroutineDispatcher()).future {
    // Can call suspend functions here
    UserRepository.login(...)
    HttpResponse.of(HttpStatus.OK)
}

return HttpResponse.of(future)

Note that when you use the event loop, you should ensure your suspend functions follow best practices and can be safely called from the event loop thread without blocking it. Otherwise, you should use some other dispatcher, e.g. the blocking task executor or Dispatchers.IO.

Handling Dependency Injection

As you might have noticed, our repositories were assumed to be objects for simplicity in the first examples. However, in real-world applications, dependency injection frameworks such as Koin are widely adopted. For example, with Koin, we can mark a decorator as a KoinComponent.

class MaybeAccessToken : DecoratingHttpServiceFunction, KoinComponent {
    private val jwtVerifier by inject<JWTVerifier>()

    override fun serve(
        delegate: HttpService,
        ctx: ServiceRequestContext,
        req: HttpRequest,
    ): HttpResponse { ... }
}

Custom Annotations and Parameters

Sometimes a decorator might be generic and need to take parameters. For example, let's say the MaybeUser annotation can be constrained to only certain types of users, such as subscriber, member or visitor. We want something like the following:

@RequireUser(allow = ["subscriber", "member"])
@Post("/blog_post/{id}/like")
fun likeBlogPost(@Param id: String) { ... }

To achieve this functionality, we can't use the @Decorator(...) approach because it does not accept parameters. Instead, we should use @DecoratorFactory together with a DecoratorFactoryFunction.

@DecoratorFactory(RequireUserDecoratorFactory::class)
annotation class RequireUser(val allow: Array<String> = [])

class RequireUserDecorator(
    delegate: HttpService,
    // Stored as a property so that serve() can read it
    private val allow: Array<String>,
) : SimpleDecoratingHttpService(delegate) {
    override fun serve(
        ctx: ServiceRequestContext,
        req: HttpRequest,
    ): HttpResponse {
        val token = MyJWTVerifier.verify(ctx.request()
                                            .headers()
                                            .get("Authorization"))

        // containsAll expects a Collection, so the array is converted first
        if (token == null || token.groups.containsAll(allow.toList()).not()) {
            return HttpResponse.of(HttpStatus.UNAUTHORIZED)
        }

        return unwrap().serve(ctx, req)
    }
}

class RequireUserDecoratorFactory: DecoratorFactoryFunction<RequireUser> {
    override fun newDecorator(parameter: RequireUser): Function<in HttpService, out HttpService> {
        return Function { RequireUserDecorator(it, parameter.allow) }
    }
}

By including this, Armeria automatically detects whenever the @RequireUser annotation is applied to a controller / service and it automatically decorates it with RequireUserDecorator.
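One detail worth deciding explicitly is what allow means when it lists several groups: must the user belong to every listed group, or is any one of them enough? The sketch below contrasts the two readings in plain Kotlin. The Token class is a hypothetical stand-in for whatever your verifier returns, and both function names are made up for illustration.

```kotlin
// Hypothetical stand-in for the verified token; only `groups` matters here.
data class Token(val groups: Set<String>)

// Strict reading: the user must belong to every group listed in `allow`.
fun isInAllGroups(token: Token, allow: Array<String>): Boolean =
    token.groups.containsAll(allow.toList())

// Lenient reading: belonging to any one listed group is enough.
fun isInAnyGroup(token: Token, allow: Array<String>): Boolean =
    allow.any { it in token.groups }
```

With allow = ["subscriber", "member"], a user who is only a subscriber passes the lenient check but fails the strict one. Also note the edge case of an empty allow array: the strict check lets everyone through, while the lenient check rejects everyone.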

Authorized By Default

Let's add authorization-by-default semantics to our application. Requiring auth by default ensures sensitive applications do not leak data by mistake. The challenge with this approach is that the authorization decorator will be the topmost decorator, and overriding its behavior at the method level is tough. So, we should slightly modify our decorators to be more aware of each other. Let's define our syntax as the following:

@NeedsAuthentication
class MembersController {

    @PublicEndpoint
    fun getMemberCount(): Int { ... }

    @Get("/members")
    fun getMembers(): List<...> { ... }
}

// Or alternatively...

Server.builder().decorator(NeedsAuthentication.newDecorator())

Here, the decorator @NeedsAuthentication will be applied first. However, we should be able to override its behavior using @PublicEndpoint. So, let's define our annotations.

@DecoratorFactory(NeedsAuthenticationDecoratorFactory::class)
@Target(AnnotationTarget.FUNCTION, AnnotationTarget.CLASS)
annotation class NeedsAuthentication

@DecoratorFactory(PublicEndpointDecoratorFactory::class)
@Target(AnnotationTarget.FUNCTION, AnnotationTarget.CLASS)
annotation class PublicEndpoint

Let's define our services. Here, public endpoint service is only a dummy service used as a marker.

class NeedsAuthService(delegate: HttpService): SimpleDecoratingHttpService(delegate) {
    override fun serve(
        ctx: ServiceRequestContext,
        req: HttpRequest,
    ): HttpResponse {
        // ServiceRequestContextAuthChecker is a placeholder for your own token lookup
        val token = ServiceRequestContextAuthChecker.getAccessToken()

        if (token == null) {
            return HttpResponse.of(HttpStatus.UNAUTHORIZED)
        }

        return unwrap().serve(ctx, req)
    }
}

// A dummy service used as a marker
class PublicEndpointService(delegate: HttpService): SimpleDecoratingHttpService(delegate) {
    override fun serve(
        ctx: ServiceRequestContext,
        req: HttpRequest,
    ): HttpResponse {
        return unwrap().serve(ctx, req)
    }
}

So, why the marker? We basically need a way to figure out whether a service is annotated with @PublicEndpoint. If so, we should not apply the auth decorator. The factory function also eliminates duplicate auth checks by trying to downcast the delegate to NeedsAuthService as well.

class NeedsAuthenticationDecoratorFactory : DecoratorFactoryFunction<NeedsAuthentication> {
    override fun newDecorator(parameter: NeedsAuthentication): Function<in HttpService, out HttpService> {
        return Function { delegate ->
            // `as` looks through the decorator chain for a service of the given type
            val maybePublic: PublicEndpointService? = delegate.`as`(PublicEndpointService::class.java)
            val maybeAuthenticated: NeedsAuthService? = delegate.`as`(NeedsAuthService::class.java)

            // Skip wrapping when the endpoint is public or is already protected
            if (maybePublic != null || maybeAuthenticated != null) {
                return@Function delegate
            }

            NeedsAuthService(delegate)
        }
    }
}

class PublicEndpointDecoratorFactory : DecoratorFactoryFunction<PublicEndpoint> {
    override fun newDecorator(parameter: PublicEndpoint): Function<in HttpService, out HttpService> {
        return Function { delegate -> PublicEndpointService(delegate) }
    }
}
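
To see the marker idea in isolation, here is a small framework-free model of it. None of these types come from Armeria: Service, Wrapper and unwrapAs are simplified stand-ins I made up for HttpService, decorating services and the `as` lookup, and a request is just a string.

```kotlin
// Simplified stand-in for a service that can be searched by type.
interface Service {
    fun serve(request: String): String

    // Returns this service if it has the given type, otherwise null.
    fun <T : Any> unwrapAs(type: Class<T>): T? =
        if (type.isInstance(this)) type.cast(this) else null
}

// A decorator: forwards calls and lets the type lookup continue down the chain.
open class Wrapper(private val delegate: Service) : Service {
    override fun serve(request: String): String = delegate.serve(request)

    override fun <T : Any> unwrapAs(type: Class<T>): T? =
        if (type.isInstance(this)) type.cast(this) else delegate.unwrapAs(type)
}

class Handler : Service {
    override fun serve(request: String): String = "200 OK"
}

// Marker: adds no behavior, it only makes "public" discoverable in the chain.
class PublicMarker(delegate: Service) : Wrapper(delegate)

// Toy auth check: a request counts as authenticated when it starts with "token:".
class AuthCheck(delegate: Service) : Wrapper(delegate) {
    override fun serve(request: String): String =
        if (request.startsWith("token:")) super.serve(request) else "401 Unauthorized"
}

// Wraps with AuthCheck unless the chain is already public or already checked.
fun withAuthByDefault(delegate: Service): Service {
    val isPublic = delegate.unwrapAs(PublicMarker::class.java) != null
    val isChecked = delegate.unwrapAs(AuthCheck::class.java) != null
    return if (isPublic || isChecked) delegate else AuthCheck(delegate)
}
```

In this model, withAuthByDefault(Handler()) rejects an anonymous request, withAuthByDefault(PublicMarker(Handler())) lets it through, and applying withAuthByDefault twice returns the same instance instead of stacking a second check.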

Bonus: Open Policy Agent (OPA)

As a bonus, let's use the popular policy engine OPA to authorize our system. The recommended way to authorize using OPA is an Envoy sidecar with an external authorization filter. However, this setup might not be sufficient, or not available at all if you are not using Envoy.

Source: https://www.openpolicyagent.org/docs/latest/envoy-introduction/

So we can create a decorator that intercepts all requests coming to our service at the top level and checks access.

class Authorize : DecoratingHttpServiceFunction {
    override fun serve(
        delegate: HttpService,
        ctx: ServiceRequestContext,
        req: HttpRequest,
    ): HttpResponse {
        val path = ctx.routingContext().path()
        val method = ctx.routingContext().method()
        // Null when the request carries no Authorization header
        val token = req.headers().get("Authorization")?.removePrefix("Bearer ")

        // More contextual data can be added as desired.
        // OPAClient is a placeholder for your own client that queries OPA.
        val result = OPAClient.check(mapOf("path" to path, "method" to method, "bearer_token" to token))

        if (result.authorized) {
            return delegate.serve(ctx, req)
        }

        return HttpResponse.of(HttpStatus.UNAUTHORIZED)
    }
}
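
The OPAClient above is left to you. As a rough idea of what it could do, here is a minimal sketch that talks to OPA's REST Data API, which accepts a POST to /v1/data/<policy path> with a JSON body of the form {"input": ...} and answers with {"result": ...}. The URL, the policy path blog/authz/allow and all function names are assumptions of mine, and the hand-rolled JSON handling is only there to keep the sketch dependency-free; a real project would use a JSON library.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Encodes a value as a JSON string literal.
fun jsonString(value: String): String = buildString {
    append('"')
    for (c in value) {
        when {
            c == '"' -> append("\\\"")
            c == '\\' -> append("\\\\")
            c < ' ' -> append("\\u%04x".format(c.code))
            else -> append(c)
        }
    }
    append('"')
}

// Builds the request body expected by OPA's Data API: {"input": {...}}.
fun opaInputBody(path: String, method: String, bearerToken: String): String =
    """{"input":{"path":${jsonString(path)},"method":${jsonString(method)},"bearer_token":${jsonString(bearerToken)}}}"""

// Sends the input to OPA and reports whether the policy evaluated to true.
// The default URL and policy path are assumptions for this sketch.
fun isAuthorized(
    path: String,
    method: String,
    bearerToken: String,
    opaUrl: String = "http://localhost:8181/v1/data/blog/authz/allow",
): Boolean {
    val request = HttpRequest.newBuilder(URI.create(opaUrl))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(opaInputBody(path, method, bearerToken)))
        .build()

    // Blocking call: do not run this on the event loop.
    val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())

    // OPA answers {"result": true} when the rule holds; an undefined rule has no "result".
    return response.statusCode() == 200 &&
        Regex("\"result\"\\s*:\\s*true").containsMatchIn(response.body())
}
```

Because send() blocks, a decorator calling this should hand the work to a blocking executor or a suitable coroutine dispatcher, as discussed in the section on suspend calls.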

Wrap Up

In this blog post, I have covered how decorators can be useful building blocks for your application's authorization framework. They are flexible, customizable and allow you to separate concerns such as authentication and authorization into a different layer. I hope this was an inspiration for you to use decorators. Please let me know if you liked this article and, if so, please subscribe to be notified about future articles.