Learn how to find stalled or lagging consumers and reset Kafka consumer offsets
If the pipeline stops because of a failed test or faulty records, you may need to reset the offsets of a consumer or consumer group. In this article, you will learn how to find stalled or lagging consumers and, if necessary, reset Kafka consumer offsets.
Be it a failed test case, faulty data records, or an error in the consuming application itself: if the consumer does not continue, the data pipeline stops, and all downstream processes stop receiving data. In such a case, there is really only one option: the data records that cause the problem must be skipped. Doing so requires you to reset the consumer offset.
A Kafka consumer offset is a position marker, tracked per consumer group and partition, that records where the not-yet-fetched records in the data stream begin. When a consumer fetches data records from the stream, it starts reading at that offset. The offset can be changed to rewind (and refetch data records) or to fast-forward (and skip data records).
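The process for resetting a Kafka consumer offset consists of three steps: find the affected consumer or consumer group, shut down all applications in that consumer group, and then change the offset and restart the group.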
The first step is to find out which consumers might need their offsets reset. This is done via the Consumer Groups dashboard in the Kafka GUI client. There are several ways to find faulty or stalled consumers:
The Consumer Groups dashboard makes it easy to spot consumers that are in trouble. As you can see on the right side of the dashboard below, consumers that are experiencing very high lag are colored red:
If you are running Apache Kafka version 2.4.0 or higher, you can view a list of all consumer groups and their lag via the console application "kafka-consumer-groups". The command for this is:
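A minimal sketch of such a call, assuming a broker reachable at `localhost:9092`. The helper name `describe_all_groups` is hypothetical and only keeps the broker address in one place. The Apache Kafka distribution ships the tool as `bin/kafka-consumer-groups.sh`, while some packages install it without the `.sh` suffix, so adjust the command name to your installation.

```shell
# Assumption: a broker is reachable at this address; adjust for your cluster.
BOOTSTRAP_SERVER="localhost:9092"

# Hypothetical helper: describe every consumer group, including the current
# offset, log end offset, and lag per partition (requires Kafka 2.4.0 or later).
describe_all_groups() {
  kafka-consumer-groups --bootstrap-server "$BOOTSTRAP_SERVER" --describe --all-groups
}
```

Running `describe_all_groups` prints one row per group, topic, and partition; the `LAG` column shows how far the group is behind the end of the log.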
If you are running a version older than 2.4.0, the "--all-groups" CLI parameter will not be available to you, and if you do not have access to the console application, you cannot run this command at all. In either case, use the Kafka GUI client approach described above (download the free Kafka GUI client).
If you already know the consumer group, you can slightly modify the above command to retrieve the lag of a known consumer group (in the example: "myConsumerGroup"):
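As a sketch under the same assumptions (broker at `localhost:9092`, tool installed as `kafka-consumer-groups`), the call for a single group could look like this. The helper name `describe_group` is hypothetical.

```shell
# Assumption: a broker is reachable at this address; adjust for your cluster.
BOOTSTRAP_SERVER="localhost:9092"

# Hypothetical helper: describe one consumer group. $1 = consumer group name.
describe_group() {
  kafka-consumer-groups --bootstrap-server "$BOOTSTRAP_SERVER" --describe --group "$1"
}
```

For the group from the example, call `describe_group myConsumerGroup`.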
Once you have found the consumer or consumer group that you want to reset, you must make sure that the consumer group is not active: all applications in the consumer group must therefore be shut down. Otherwise, the offsets of a consumer or consumer group cannot be changed.
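If you work from the console, one way to check that the group is really inactive is to ask the tool for the group's state. This is a sketch, assuming a broker at `localhost:9092`; the helper name `group_state` is hypothetical.

```shell
# Assumption: a broker is reachable at this address; adjust for your cluster.
BOOTSTRAP_SERVER="localhost:9092"

# Hypothetical helper: print the state and member count of one consumer group.
# $1 = consumer group name.
group_state() {
  kafka-consumer-groups --bootstrap-server "$BOOTSTRAP_SERVER" --describe --group "$1" --state
}
```

After all applications in the group have been stopped, `group_state myConsumerGroup` should report zero members, typically with the state `Empty`.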
Next, in the Kafka GUI client, click the edit (pencil) icon next to the offset value of the affected consumer. Then change the offset to the desired value and restart the consumer group.
Using the console application "kafka-consumer-groups", reset the offsets as follows:
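The following is a minimal sketch of such a reset, not a definitive recipe. The broker address `localhost:9092`, the topic name `myTopic`, and the helper name `reset_to_earliest` are assumptions for illustration. The helper defaults to `--dry-run`, which only prints the offsets that would be set; nothing changes until you pass `--execute`.

```shell
# Assumption: a broker is reachable at this address; adjust for your cluster.
BOOTSTRAP_SERVER="localhost:9092"

# Hypothetical helper: reset a group's offsets for one topic to the earliest
# available offset.
# $1 = consumer group, $2 = topic, $3 = optional "--execute" to apply the change.
reset_to_earliest() {
  mode="--dry-run"   # default: only preview the new offsets
  if [ "$3" = "--execute" ]; then mode="--execute"; fi
  kafka-consumer-groups --bootstrap-server "$BOOTSTRAP_SERVER" \
    --group "$1" --topic "$2" \
    --reset-offsets --to-earliest "$mode"
}
```

A cautious sequence is to run `reset_to_earliest myConsumerGroup myTopic` first, review the printed offsets, and then repeat the call with `--execute` as the third argument.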
Instead of "--to-earliest", which sets the Kafka consumer back to the beginning of the stream, you can use "--to-latest" to set the consumer to the end of the stream, so that all messages not consumed so far are skipped. Alternatively, you can reset the consumer offset to a specific point in time:
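A sketch of a time-based reset, using the `--to-datetime` option with a timestamp in the format `YYYY-MM-DDTHH:mm:SS.sss`. The broker address `localhost:9092`, the topic name `myTopic`, the example timestamp, and the helper name `reset_to_datetime` are all assumptions for illustration.

```shell
# Assumption: a broker is reachable at this address; adjust for your cluster.
BOOTSTRAP_SERVER="localhost:9092"

# Hypothetical helper: reset a group's offsets for one topic to a point in time.
# $1 = consumer group, $2 = topic, $3 = timestamp (YYYY-MM-DDTHH:mm:SS.sss),
# $4 = optional "--execute" to apply the change.
reset_to_datetime() {
  mode="--dry-run"   # default: only preview the new offsets
  if [ "$4" = "--execute" ]; then mode="--execute"; fi
  kafka-consumer-groups --bootstrap-server "$BOOTSTRAP_SERVER" \
    --group "$1" --topic "$2" \
    --reset-offsets --to-datetime "$3" "$mode"
}
```

For example, `reset_to_datetime myConsumerGroup myTopic 2020-12-20T00:00:00.000` previews the offsets for that timestamp; adding `--execute` as the fourth argument applies them.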
The ability to reset consumer offsets is a very useful troubleshooting feature in Apache Kafka. I strongly suggest using a Kafka GUI client to monitor consumer health and to make it easier to reset consumer offsets.
If you found this article helpful, you might also enjoy this article on Controlling Access to Kafka Consumers.
Feel free to let me know if you have questions (Twitter: @benjaminbuick or the Xeotek team via @xeotekgmbh)!
Ben