Realtime analytics with Go + NATS
Published by Ray Barrera on May 15, 2025
Introduction
In the Brighter platform, the student-facing games and activities generate several kinds of events: general interaction events, game events that describe progression (or lack thereof) in a particular game, and engagement events. Individually, these events lack the full context of what the user experienced; in aggregate, they are too noisy to paint a useful picture. We needed a way to turn these events into insights for our parent and educator monitoring tools and, most importantly, into a feedback loop that lets the learning platform adapt in real time to the student's progression. Here's a high-level overview of how we solved this problem.
Initial Findings
In the most generic sense, this is an analytics-shaped problem, so logic would dictate that you turn to off-the-shelf analytics solutions. There are several (Firebase Analytics, for example), but these are generally designed for business-intelligence use cases. Some do offer segmentation and cohorting tools to drive dynamic variables or A/B tests in your applications, but they generally don't provide a way to expose the metrics to your end users, so that rules these solutions out for our use case.
The next logical place to look would be a more general-purpose data platform, like Snowflake or Google Analytics + BigQuery, for example. You can then store, transform, and deliver the data via some flavor of SQL to a visualization tool, like Streamlit. For 99.99% of use cases, this amount of abstraction and complexity (let's not even talk about cost) is overkill. There is a broader philosophical argument to be had here about highly specialized analytics and data disciplines doing routine application work, but we can save that for another time. For now, let's rule these out on technical merits:
- Our use case has relatively low cardinality: the number of dimensions and distinct values we're dealing with is small.
- We have full control over the entire data pipeline, including the shape and contents of the data.
- Our application needs real-time access to the data, so we need to couple a synchronous process to an asynchronous one.
Looking at our needs through this lens and ruling out more complex or abstracted options allowed us to focus on thinking outside the box to arrive at a potential solution. Enter NATS and NATS JetStream.
Our Solution
You can get all the details about NATS on their website, but the tl;dr is: NATS is a lightweight message bus. JetStream builds on top of NATS to add persistence, scalable data storage, and processing capabilities, including a key-value store. I'll let you read the manual for the details, but for now, let's see how NATS features map onto our use case and the data lifecycle more broadly.
Event Sourcing and Ingestion: This is the most basic part of NATS, being a message bus and all. To abstract this away from our applications (the event sources), we deploy a "producer" app, which exposes an HTTP endpoint to send events. There are some basic metadata fields that are required, but generally speaking, the API is simple. The server receives an event, then pipes it into NATS.
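To make that concrete, here is a minimal sketch of what such a producer could look like in Go using the nats.go client. The endpoint path, subject name, and the absence of metadata validation are assumptions for illustration, not the real Brighter API.

```go
// Minimal "producer" sketch: accept an event over HTTP and publish it to NATS.
package main

import (
	"io"
	"log"
	"net/http"

	"github.com/nats-io/nats.go"
)

const eventsSubject = "events.ingest" // hypothetical subject name

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	http.HandleFunc("/events", func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		// Pipe the raw event into NATS. A real producer would validate
		// the required metadata fields before publishing.
		if err := nc.Publish(eventsSubject, body); err != nil {
			http.Error(w, "publish failed", http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusAccepted)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```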
Extract-Transform-Load (ETL) and ELT: At present, we deploy a single processor app written in Go that embeds NATS, so it can subscribe to the incoming events. NATS's filtering capabilities let you mux or demux the events as needed and/or process them on the spot. Because JetStream gives us a key-value store, this is the stage where we load the processed events into time-bucketed keys for aggregation.
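Here is a rough sketch of that processor stage, assuming the legacy JetStream API in nats.go: subscribe to a filtered subject and fold each event into a per-minute counter stored under a time-bucketed KV key. The subject, bucket name, and key layout are illustrative; a production version would also use a compare-and-swap update (kv.Update with the entry's revision) instead of this plain read-modify-write.

```go
// Processor sketch: filtered JetStream subscription that rolls events up
// into a time-bucketed counter in a KV bucket.
package main

import (
	"log"
	"strconv"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// KV bucket that holds the rolled-up metrics (created if missing).
	kv, err := js.CreateKeyValue(&nats.KeyValueConfig{Bucket: "metrics"})
	if err != nil {
		log.Fatal(err)
	}

	// Filtered subscription, e.g. only game progression events.
	// Assumes a JetStream stream already captures events.game.> subjects.
	_, err = js.Subscribe("events.game.>", func(m *nats.Msg) {
		// Time-bucketed key, e.g. "game_events.2025-05-15-10-04" (one per minute).
		key := "game_events." + time.Now().UTC().Format("2006-01-02-15-04")

		count := 0
		if entry, err := kv.Get(key); err == nil {
			count, _ = strconv.Atoi(string(entry.Value()))
		}
		if _, err := kv.Put(key, []byte(strconv.Itoa(count+1))); err != nil {
			log.Printf("put failed: %v", err)
		}
		m.Ack()
	}, nats.Durable("processor"), nats.ManualAck())
	if err != nil {
		log.Fatal(err)
	}

	select {} // block forever
}
```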
Because NATS is a distributed system, you have some flexibility here. You can scale horizontally, adding more processors to keep your system boundaries clean, or you can scale vertically. Generally speaking, you'll achieve durability via replication, but you can decouple that need from the logical processing of data. And because we've chosen Go, whose runtime is concurrency-friendly, a single node benefits greatly from additional cores.
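For instance, the durability knob lives on the stream itself, so it can be tuned independently of how many processor instances you run. A hypothetical stream configuration might look like the following; the stream name, subjects, and retention window are made up for the sketch.

```go
// Sketch: declare a replicated, file-backed stream for the event subjects.
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Durability is a property of the stream, independent of how many
	// processors consume from it.
	if _, err := js.AddStream(&nats.StreamConfig{
		Name:     "EVENTS",
		Subjects: []string{"events.>"},
		Replicas: 3,                   // replicate across three JetStream nodes
		Storage:  nats.FileStorage,    // persist to disk on each replica
		MaxAge:   30 * 24 * time.Hour, // hypothetical retention window
	}); err != nil {
		log.Fatal(err)
	}
}
```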
Queries and Analysis: This is perhaps the biggest paradigm shift compared to the status quo. I happen to share Uncle Bob's general disdain for SQL. Because we're using a key-value store, you can just bolt on an API to get the events you care about. I'm partial to the use of the repository pattern, but that's an implementation detail. For use cases with low cardinality, you'd be best served by adding another processor to generate metrics, but if you really need to dig into data across all its dimensions in a very ad-hoc way, then you can always pipe the results or raw events to a columnar database for analysis. In that sense, NATS still provides a ton of value, even if only as a simple distributed event bus.
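As a sketch of what "bolt on an API" can mean in practice, here's a small repository type that hides the KV store behind a single query method. The bucket name, key layout, and method shape are assumptions for illustration and mirror the processor sketch above.

```go
// Repository sketch: read pre-aggregated, time-bucketed counters from KV.
package main

import (
	"errors"
	"fmt"
	"log"
	"strconv"
	"time"

	"github.com/nats-io/nats.go"
)

// MetricsRepository hides the JetStream KV bucket behind a query interface.
type MetricsRepository struct {
	kv nats.KeyValue
}

// CountInBucket returns the counter stored for a metric in a given minute,
// treating a missing key as zero.
func (r *MetricsRepository) CountInBucket(metric string, t time.Time) (int, error) {
	key := metric + "." + t.UTC().Format("2006-01-02-15-04")
	entry, err := r.kv.Get(key)
	if errors.Is(err, nats.ErrKeyNotFound) {
		return 0, nil
	}
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(string(entry.Value()))
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}
	kv, err := js.KeyValue("metrics") // same bucket the processor writes to
	if err != nil {
		log.Fatal(err)
	}

	repo := &MetricsRepository{kv: kv}
	n, err := repo.CountInBucket("game_events", time.Now())
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("game events this minute:", n)
}
```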
Prior and ongoing testing shows that we can comfortably process thousands of events per second in real time on a very modest compute instance. Some work will be needed to manage storage and backups, but this solution is so cost-efficient that we can easily defer that work until it becomes a problem. The time-bucketing of the metrics gives us an escape hatch to distribute storage if need be.
Conclusion
This solution balances leaning on battle-tested technology (NATS) with a bit of elbow grease in hosting your own compute to create a solution that is not only cheaper but ultimately performs better within the constraints of the project. This won't be a catch-all for every use case. You may sometimes deal with disparate data sources you don't control or have to do analytics on high-cardinality datasets. This solution would be fairly unpleasant to make work once you get into the hundreds of thousands of columns. So, is this something you should try? Maybe! Working with NATS is the easy part—you'll need to decide what your tolerance is for managing your own infrastructure. Even then, you may be willing to use Synadia Cloud (Synadia is the primary maintainer of the NATS project) and let them handle the NATS side of things. If you want to learn more about NATS, check out the Synadia YouTube channel—they have fantastic learning resources. As of this writing, Brighter isn't quite ready for launch, but if you're curious to follow the product, keep an eye on the Brighter website. That's all for now ✌️
Less Is Powerful
Published by Ray Barrera on August 15, 2024
I'm not just saying "Less is more". I'm saying Less is sometimes just less, and that's OK.
Sometimes, Less gives you the clarity to see what's important.
Less can help shine a light on the hidden gems.
At worst, Less can help you truly understand what you need more of.