Validators receive staking rewards based on their performance. That's why it's important to be aware of your validator's health and to check up on it regularly, so you don't miss out. Validators are rewarded if, at the end of their staking period, they are judged by their peers (other network validators) to have been online and correct for more than 60% of the validation time.
To check on your validator, you can use explorers like Avascan and VScout, as well as monitoring services like AllNodes. However, those services can go offline, show incomplete data, or lag behind in updating, so it's good to be able to check for yourself.
There are two main components of a node's health you should check:
Connectivity refers to the ability of other nodes to connect to your node's public IP on port 9651, which is used for peer-to-peer communication. Without it, other nodes may not be able to reach your node and may judge it to be offline. Online tools can check this for you; one of them is https://ismyportopen.com/. Open it in your browser, enter your public IP and port 9651, and press the Check button. If you don't see a green checkmark and the message 'port is open', you need to look into your node's networking setup and correct it.
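If you would rather script the port check than use a website, a plain TCP connection attempt gives roughly the same answer. The following is a minimal sketch, not an official tool: the function name is_port_open is made up for this example, and the IP address in the comment is a documentation placeholder. It only tells you something useful when run from a machine outside your node's own network, because a connection from the inside does not pass through your router or firewall the way a peer's connection would.

```python
import socket


def is_port_open(host: str, port: int = 9651, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    9651 is the peer-to-peer port named in the text above.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call (203.0.113.10 is a placeholder; use your node's public IP):
#   is_port_open("203.0.113.10")
```

A result of False can mean either a closed port or a firewall silently dropping packets; in both cases the networking setup needs attention.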
Internal health refers to correct node operation. There is a health API call that can be used to check on it. You can issue API calls through the command line, or using the Postman Avalanche collection, which is much easier to use once set up. After executing the call, examine the results.
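As an alternative to the command line or Postman, the health call can also be issued from a short script. The sketch below assumes the node serves its HTTP API on the default local address 127.0.0.1:9650 and exposes the health endpoint at /ext/health with the JSON-RPC method health.health; check both against the API reference for your node version. The helper names build_health_request and fetch_health are made up for this example.

```python
import json
import urllib.error
import urllib.request

# Assumption: default local API address and health endpoint of the node.
HEALTH_URL = "http://127.0.0.1:9650/ext/health"


def build_health_request(url: str = HEALTH_URL) -> urllib.request.Request:
    """Build the JSON-RPC POST request for the health call."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": "health.health"}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_health(url: str = HEALTH_URL, timeout: float = 10.0) -> dict:
    """Send the health call and return the "result" object of the reply."""
    try:
        with urllib.request.urlopen(build_health_request(url), timeout=timeout) as resp:
            body = resp.read()
    except urllib.error.HTTPError as err:
        # Assumption: an unhealthy node might answer with a non-2xx status;
        # the JSON body is still worth reading in that case.
        body = err.read()
    return json.loads(body)["result"]
```

Run it on the machine hosting the node, or point the url argument at wherever the node's API is reachable; fetch_health()["healthy"] then holds the overall verdict.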
First, check that the "healthy" attribute returns true. If so, your node is healthy and you're done. If it returns false, you need to examine the rest of the output under the "checks" attribute for more information, going through its parts as follows:
router: if you see failures in this block, that is usually an indication of a connectivity issue. Check that you have a stable connection of at least 5Mbps, especially in the upstream (upload) direction.
network: failures in this block can also be caused by connectivity issues.
isBootstrapped: a failure here indicates that the node has not finished bootstrapping yet. Your node will not be able to participate in validation decisions until the bootstrap process is over. Note that bootstrapping can take up to three days, depending on your hardware configuration as well as your internet connection speed.
"X"/"P"/"C": failures in these blocks indicate problems with particular chains of the Primary Network. The cause might be a connectivity issue, but also a saturated CPU or a disk drive that is too slow, either of which makes your node lag behind in message processing.
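Reading through the "checks" output by hand gets tedious if you do it often, so the walk-through above can be approximated in a few lines of code. This is a rough sketch under stated assumptions: it expects the "result" object of the health reply as a dictionary, and it assumes a failing check carries a non-empty "error" field or a non-zero "contiguousFailures" counter. Those two field names may differ between node versions, so compare them with the actual output of your node. The function names are made up for this example.

```python
def failing_checks(result: dict) -> dict:
    """Map the name of each failing check to a short description.

    Assumption: a failing check has a non-empty "error" field or a
    non-zero "contiguousFailures" counter.
    """
    failures = {}
    for name, check in result.get("checks", {}).items():
        error = check.get("error")
        streak = check.get("contiguousFailures", 0)
        if error or streak:
            failures[name] = error or f"{streak} contiguous failures"
    return failures


def summarize(result: dict) -> str:
    """Return a one-line verdict: healthy, or the names of failing checks."""
    if result.get("healthy"):
        return "healthy"
    failures = failing_checks(result)
    if not failures:
        return "unhealthy (no failing check reported)"
    return "unhealthy: " + ", ".join(sorted(failures))
```

The names returned by failing_checks correspond to the blocks described above (router, network, isBootstrapped, and the individual chains), so they tell you which of those explanations to look at first.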
Beyond the basic networking checks and using OS tools to make sure the CPU is not overwhelmed, troubleshooting your node can be made much easier by installing the node monitoring toolset. It provides deep and continuous insight into your node's operation and health by constantly recording the node metrics and displaying them in functional dashboards. If you really want to know what is going on with your node, that is the best possible solution.