Correcting Data Delivery Issues in Apache Kafka

Learn an easier way to correct and reprocess messages in a Kafka dead letter queue

From time to time, your topic consumers might receive malformed data. This is often due to improper message formatting or incorrect serialization/deserialization. In these cases, you can program the consumer to halt, to ignore the data, or to save the data for correction and reprocessing.

In this post, I’ll demonstrate an easy way to correct and reprocess erroneous data using Kadeck’s visual topic management dashboard (get Kadeck for free).

Best Practice: Send bad data to a Kafka Dead Letter Queue (DLQ)

The first task is to store bad data so it can be analyzed, corrected, and reprocessed. The recommended way to do this is by creating a dead letter topic. The consuming application writes messages it cannot process to this topic so someone can examine them, make corrections, and likely make a code change somewhere in the data pipeline to prevent the error from recurring. It’s also good practice to facilitate troubleshooting by adding an error message attribute to the record as you store it in the DLQ.
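As a rough sketch of this pattern, the consumer can attempt deserialization and, on failure, wrap the raw payload with an error-message header destined for the DLQ. The topic names, field names, and the `error.message` header key below are illustrative assumptions, not a fixed API:

```javascript
// Sketch of consumer-side DLQ routing. Topic names and the
// 'error.message' header key are assumptions for this example.
const CLEAN_TOPIC = 'transactions';
const DLQ_TOPIC = 'transactions.dlq';

// Try to deserialize a raw record; on failure, return a DLQ envelope
// that preserves the original payload and records why it failed.
function routeRecord(raw) {
  try {
    const value = JSON.parse(raw);
    return { topic: CLEAN_TOPIC, value };
  } catch (err) {
    return {
      topic: DLQ_TOPIC,
      value: raw, // keep the original payload untouched for later analysis
      headers: { 'error.message': err.message },
    };
  }
}
```

With a client library such as KafkaJS, you would call a function like this inside the consumer's message handler and produce the resulting envelope to its `topic`.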

Let’s assume a system streaming financial transactions has a dead letter queue in place, and you’ve been asked to correct the erroneous data. Having a visual user interface (UI) for your Kafka system and data makes this easy; here’s how.

Find the dead letter queue!

The erroneous financial transactions are in a dead letter topic. In an enterprise setting, you might have hundreds of topics, which would make finding the failed messages difficult. Kadeck solves this problem by organizing your topics and custom views of topic data in a Data Catalog.

Finding the dead letter queue

You can search the Data Catalog by data source, data owner, custom labels, or keywords. Since we are correcting financial transaction data, I can choose the “finance” label, and my catalog view is narrowed to the finance topics. The error queue is easy to spot. I’ll click it to explore the data within the dead letter queue.

Analyze the messages in the DLQ

In the Topic Browser dashboard, we can see the erroneous messages in this dead letter queue. You can filter records within the view and even modify them individually via an editor or in bulk using JavaScript.

Analyzing dead letter queue records in the Kadeck Topic Browser

In our example, each financial transaction in the DLQ has an error message attribute, which you can click to add as a column in the view. Upon doing so, it is clear that all messages in the topic have the same error: a malformed account ID. If there were multiple errors, we could filter the messages for each specific error and save the filter as a new view, making it easy to focus on correcting each issue separately.
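Filtering by error message boils down to a simple predicate over the records' error attribute. The sketch below assumes the error text lives in an `error.message` header, matching the DLQ convention described earlier; your records may store it elsewhere:

```javascript
// Keep only the DLQ records whose error message mentions a given
// error text, so each failure mode can be triaged separately.
// The 'error.message' header key is an assumption for this example.
function filterByError(records, errorText) {
  return records.filter(
    (r) => ((r.headers && r.headers['error.message']) || '').includes(errorText)
  );
}
```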

We can save this view, and then switch to the topic containing the clean financial transactions. There we see that correctly formed account IDs begin with a 2-character country code, which is missing from the records in the dead letter queue. Let’s return to the DLQ in the Kadeck Topic Browser and correct the errors.

Fix the data using the Quick Processor

Now that we know why the data could not be processed, it is time to fix it. To do this, we use the Quick Processor tool at the top of the Topic Browser. Through the Quick Processor you can easily add, delete, or change record attributes using JavaScript. In this case, we use it to prepend the country code to the account ID field.
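The transformation itself is a one-liner at heart. The sketch below shows the logic in plain JavaScript; the field name `accountId` and the `'DE'` country code are assumptions for this example, and Kadeck's actual Quick Processor environment may expose records differently:

```javascript
// Prepend a missing country code to a record's account ID.
// The field name `accountId` is an assumption for this example.
function fixAccountId(record, countryCode) {
  // Only prepend when the 2-letter prefix is actually missing,
  // so running the fix twice leaves the record unchanged (idempotent).
  if (/^[A-Z]{2}/.test(record.accountId)) {
    return record;
  }
  return { ...record, accountId: countryCode + record.accountId };
}
```

Making the fix idempotent is a deliberate choice: if some records in the DLQ were only partially malformed, or the script is accidentally run twice, already-correct account IDs are left alone.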

Correcting dead letter queue messages using JavaScript

Reprocess the corrected records

After executing the JavaScript in the Quick Processor, we see that the DLQ messages now have correctly formatted account ID attributes. Reprocessing them is easy: just select the corrected messages in the view and click “Produce.” We are prompted for a target topic for the corrected records, and we choose the original financial transaction stream.

Reprocessing corrected dead letter queue records

After reprocessing, if the corrected records are no longer needed, they can be deleted by right-clicking the last entry and selecting “Delete up to here” from the context menu.

Share the view with team members

Now that the data has been corrected and reprocessed, Kadeck makes it easy to find the team members responsible for the financial transaction producer and inform them of the issue. They can then modify the producer and eliminate the cause of the errors.

The Flow tab at the top of the Topic Browser shows the name of the system that produced the erroneous messages and the people responsible for its development. You can share the Topic Browser view with them to facilitate the software change process.

Explore the Kadeck Topic Browser for free

Fixing data delivery issues is extremely quick and easy with Kadeck. There are many uses for the Kadeck Topic Browser and Quick Processor, and we invite you to explore them. The Kadeck visual management and collaboration tooling can take your Apache Kafka, Redpanda, and Amazon Kinesis development, troubleshooting, and monitoring to the next level.

Get started with Kadeck - free forever!

Software architect and engineer with a passion for data streaming and distributed systems.