Introduction

The Terra network is secured by a set of proof-of-stake validator nodes. Validators are judged on their uptime and on the quality of their validation: any deviation from expectations can result in slashing and a loss of funds for the validator operator. This post explores uptime and downtime patterns across the Terra validator set.

How Often Do Validators Go Down?

How often do validators that have been online at least once in the past 6 months turn off? It turns out that, on average, it happens pretty often: once every 1.6 hours per validator (see the table below). This simple statistic doesn't tell the whole story, though: the average is made up of some very reliable validators and some rather unreliable ones.

We took a dataset of all validators who collected rewards over the last 6 months and looked for the liveness events recorded on chain. The protocol collects these events and uses them in slashing calculations for nodes that haven't met their validation requirements; each event records how many blocks a validator missed validating.
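
As a rough illustration of the workflow, here is a minimal pandas sketch for loading such a dataset. The file name and column names (validator, event_time, missed_blocks) are assumptions made for illustration, not the actual Terra event schema.

```python
import pandas as pd

# Hypothetical export of the on-chain liveness events. One row per
# downtime event; column names are illustrative assumptions.
events = pd.read_csv("liveness_events.csv", parse_dates=["event_time"])

# Restrict to the 6-month analysis window.
cutoff = events["event_time"].max() - pd.Timedelta(days=182)
events = events[events["event_time"] >= cutoff]
```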

By counting how many liveness events each validator has had over this period (adjusting for how long the validator has been running), we can calculate an average time between downtime events for each validator. This metric is better known as Mean Time Between Failures (MTBF), a common metric in asset management. We can plot the MTBF for each validator and see what the distribution looks like below.

Average MTBF across all validators: 1.6 hours
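
Continuing the hypothetical schema above, a minimal sketch of the per-validator MTBF calculation might look like the following. Approximating each validator's operating window by its first and last recorded events is our simplification; the original analysis may have adjusted for running time differently.

```python
# MTBF per validator = observed operating time / number of downtime events.
per_val = events.groupby("validator").agg(
    first_seen=("event_time", "min"),
    last_seen=("event_time", "max"),
    n_events=("event_time", "count"),
)
operating_hours = (
    per_val["last_seen"] - per_val["first_seen"]
).dt.total_seconds() / 3600
per_val["mtbf_hours"] = operating_hours / per_val["n_events"]

# Fleet-wide average (the post reports roughly 1.6 hours).
print(per_val["mtbf_hours"].mean())
```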

We see large variation between validators, pointing towards the conclusion that the network hosts a number of very professional outfits alongside some less skilled or less well-resourced node operators. Four validators operate with an MTBF of greater than 60 hours (one downtime event every 2-3 days), while nearly 50 go less than 2 hours between downtime events.

Change in MTBF Over Time

To see how the MTBF metric has changed over the last 6 months, we plot the per-validator MTBF over time below. There has been a marked decrease in MTBF over the 6 months: early weeks averaged as much as 6 hours between downtime events, trending down to less than 1 hour between events. This indicates a decline in the overall reliability of the validator set over this period.
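
One plausible way to produce such a weekly trend, again using the hypothetical events table from earlier, is to bucket events by week and invert the weekly event count:

```python
# Count downtime events per validator per week, then convert counts
# into an hours-between-events figure (168 hours in a week).
weekly = (
    events.assign(week=events["event_time"].dt.to_period("W"))
          .groupby(["week", "validator"])
          .size()
          .rename("n_events")
          .reset_index()
)
weekly["mtbf_hours"] = 168 / weekly["n_events"]

# Fleet-wide weekly average: one way to produce the trend line described above.
trend = weekly.groupby("week")["mtbf_hours"].mean()
```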

High-Performing Validators

We saw earlier that there was a distinct difference between the best and worst performing validators. The graph below segments the data to just the top 10 best performing validators (by number of downtime events relative to their operating time) over the last 6 months. Here we also see a downward trend: early on, MTBF ranged in the hundreds of hours between failures, but this has trended down to around 30 hours between downtime events. Notice that this is still more than an order of magnitude better than the current all-validator average of around 1 hour.
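
Sticking with the same hypothetical tables, the segmentation could be sketched as follows; swapping the best segment for the worst one yields the view used in the next section.

```python
# Rank validators by MTBF (downtime events relative to operating time)
# and take the extremes of the distribution.
best_10 = per_val.nlargest(10, "mtbf_hours")
worst_10 = per_val.nsmallest(10, "mtbf_hours")

# Weekly trend restricted to the top segment, reusing `weekly` from above.
best_trend = (
    weekly[weekly["validator"].isin(best_10.index)]
          .groupby("week")["mtbf_hours"]
          .mean()
)
```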

Low-Performing Validators

Performing the same analysis on the worst 10 validators (by downtime events), we can see why the overall average is so low. For the most recent week, the average MTBF for this group is around 8 minutes, a distinct downtrend from 6 months ago, when the figure was as high as 1 hour. This mirrors what we have seen in the rest of the dataset: reliability is trending downwards.

Conclusions

The data above has explored the downtime events among Terra validators. We have seen that, averaged over the entire validator set, a validator experiences a downtime event roughly once every 1.6 hours. There is a vast difference between validators, however: the best 10 currently go down about once every 30 hours, while the worst 10 go down about once every 8 minutes. Across all segments of the dataset, reliability has been trending downwards compared with 6 months ago.