Taking the new Kafka-compatible streaming platform WarpStream for a spin with Kadeck
A few weeks ago, WarpStream Labs announced WarpStream, a new data streaming platform that is compatible with the Apache Kafka® protocol and runs directly on top of Amazon S3. In an Apache Kafka cluster, brokers must constantly replicate data among themselves to ensure that nothing is lost if a broker fails. This replication generates significant network traffic, which incurs costs. WarpStream claims that by moving data storage from individual brokers to S3, inter-zone network costs on AWS are eliminated. According to WarpStream, these costs can account for 80% of the infrastructure costs of a large Kafka deployment.
WarpStream thus makes Apache Kafka more affordable and reduces complexity. Data streaming, and Apache Kafka in particular, is a foundational technology that is enabling new business models, business processes, and an unprecedentedly connected society - topics I write about in my LinkedIn channel or on X.
However, data streaming technology is still not mass-market ready, so any new vendor that makes it more affordable or, like Kadeck, easier and more accessible, is welcome.
To learn more about WarpStream, I contacted WarpStream co-founder Richard Artoul on the WarpStream Labs Community Slack channel.
Q: What inspired you to develop WarpStream?
“When we were at Datadog, we built Husky, a columnar store purpose-built for observability data that ran directly on top of object storage. When we were done, we had this magical auto-scaling data lake that was super easy to operate and never ran out of disk space. It was also a lot more cost effective than the system it replaced. By comparison, our Apache Kafka clusters suddenly looked downright ancient. We were spending a fortune on inter-zone networking, and while we had built up a lot of good tooling and automation for administering the clusters, it paled in comparison to S3.
Almost all modern data lake technology is built directly on top of object storage, and we felt like data streaming technology should be the same. Our goal with WarpStream is to make stream processing as cost effective and easy as processing files in S3!”
Q: Is WarpStream a transformational technology?
“We think so. When we explain it to people, they’re usually like: ‘ah yeah, that makes sense’. It feels like such an obvious idea, like ‘yeah obviously data streaming technology should separate storage from compute, leverage object storage, be 10x cheaper, and have almost zero operations’. But it’s actually a really tricky problem to solve.
I’ve talked about this before, but at this point building a data lake on top of object storage is a “solved” problem. Many people know how to do it and there are tons of competing implementations out there. Dump files into object storage, track them in a highly scalable metadata store, point a query engine at the whole thing. Compact periodically. Add caching to taste.
On the other hand, figuring out how to provide the same semantics as Apache Kafka without introducing any local disks into the system and whilst keeping end-to-end latency low… that problem is a lot less well understood.
We think the fact that WarpStream can provide the same API and semantics as Kafka, while reducing costs by 80% and without introducing any local disks for users to manage, is really transformational and is going to significantly increase the spectrum of use cases where data streaming can be used.”
Q: Why is compatibility with Kadeck important to you and your users?
“Kadeck has been very helpful to us during the development of WarpStream. The visibility it provides helps us troubleshoot development and support issues, and it's also helped us test our Kafka compatibility. And for our users, I think WarpStream and Kadeck give companies an edge in the real-time innovation race by making it easier and more affordable to put streaming data to work in more ways.”
WarpStream is a Kafka protocol-compatible system, and Kadeck can connect to any Kafka-compatible system, so we took them for a quick spin. The good news is that they work well together. Here’s how you can try them out:
Get Kadeck and WarpStream (both offer “free-forever”)
As of October 2023, WarpStream is still in developer preview, but you can try out its “free forever” tier completely self-serve. Kadeck, of course, can also be used for free.
Connect to WarpStream from Kadeck
Kadeck supports Apache Kafka as well as Kafka protocol-compatible systems like WarpStream. Connecting to the WarpStream demo cluster from Kadeck was straightforward:
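Under the hood, Kadeck connects to WarpStream the way any Kafka client does: you point it at the cluster's bootstrap servers. As a rough sketch, the same connection settings expressed as a plain Kafka client configuration (in the key/value style used by librdkafka-based clients) might look like the following; the broker address and client id here are hypothetical placeholders, not WarpStream's actual endpoints:

```python
# Minimal Kafka client connection settings for a Kafka-compatible cluster,
# expressed as a config dict in the librdkafka key/value style.
# The broker address is a placeholder -- substitute the bootstrap URL
# shown in your own WarpStream console or local agent.
def kafka_client_config(bootstrap: str, client_id: str = "kadeck-demo") -> dict:
    """Build a basic, unauthenticated Kafka client configuration."""
    return {
        "bootstrap.servers": bootstrap,  # e.g. "localhost:9092" for a local agent
        "client.id": client_id,
        # WarpStream speaks the Kafka wire protocol, so no vendor-specific
        # settings are needed for a basic connection.
    }

config = kafka_client_config("localhost:9092")
print(config["bootstrap.servers"])
```

The same two pieces of information (bootstrap address and a client name) are all Kadeck's connection dialog needs for an unauthenticated cluster.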
Exploring streaming data in WarpStream with Kadeck
After connecting to WarpStream, I started exploring topics and consumers in the cluster. I began in the Kadeck Data Catalog, which enables people to organize, browse, and search topics. I found several topics, as shown below:
I drilled into one of the topics to see the data within it. The Topic Overview dashboard showed all the details I expected to see, like the topic’s size, owner, documentation, and recent data records:
The Topic Overview dashboard also showed a histogram of data stream size over time:
Clicking the Kadeck Flow View shows how data flows from producer to consumer groups for this topic. The Flow View makes it easy to see how records are sourced and how they flow to different consumers:
I drilled down to the Consumer Group Details dashboard, which shows offsets, lag, and member status for each of the consumer groups associated with my topic:
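For reference, the lag a dashboard like this reports is simply the gap between each partition's latest (end) offset and the offset the consumer group has committed. A minimal sketch of that arithmetic, with made-up offset values for illustration:

```python
# Consumer lag per partition: end offset minus the group's committed offset.
# All offset values below are illustrative, not real cluster data.
def consumer_lag(end_offsets: dict, committed: dict) -> dict:
    """Map each partition to its lag; a partition with no committed offset
    is treated as fully lagging from offset 0."""
    return {
        partition: end - committed.get(partition, 0)
        for partition, end in end_offsets.items()
    }

end = {0: 1500, 1: 980, 2: 2042}   # latest offsets per partition
done = {0: 1500, 1: 950}           # committed offsets (partition 2: none yet)
print(consumer_lag(end, done))     # {0: 0, 1: 30, 2: 2042}
```

A fully caught-up partition shows lag 0; a steadily growing lag is the usual sign that consumers can't keep up with producers.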
Overall, Kadeck and WarpStream integrate well; WarpStream has done a good job of implementing the Kafka protocol. We are committed to supporting WarpStream and look forward to working with them to ensure compatibility going forward.
If you need help using Kadeck and WarpStream together or have feedback for us, just reach out on the WarpStream community Slack channel. Enjoy!