Introducing the Cosmos Operator

Nov 21, 2022

David Nix

Strangelove is excited to announce the release of a new open-source product, the Cosmos Operator.

This product is a Kubernetes Operator that should work for the majority of blockchains built with the Cosmos SDK.

The Cosmos Operator lets you create highly available, fault-tolerant RPC node deployments quickly and easily within a Kubernetes cluster. Long-term, we plan to support validator sentries, persistent peers, and seed nodes as well.

Through a single abstraction, the Cosmos Operator unifies and simplifies:

  • Setting reasonable configuration defaults

  • TOML wrangling

  • p2p identity

  • Downloading genesis files

  • Restoring from a snapshot

For example, this is all you need to deploy a highly available RPC node for the Cosmos Hub:

apiVersion: cosmos.strange.love/v1
kind: CosmosFullNode
metadata:
  name: cosmoshub
spec:
  replicas: 3
  chain:
    network: mainnet
    chainID: cosmoshub-4
    binary: gaiad
    genesisURL: "https://github.com/cosmos/mainnet/raw/master/genesis.cosmoshub-4.json.gz"
    snapshotURL: "https://snapshots1.polkachu.com/snapshots/cosmos/cosmos_11701512.tar.lz4"
    config:
      seeds: "cfd785a4224c7940e9a10f6c1ab24c343e923bec@164.68.107.188:26656,bf8328b66dceb4987e5cd94430af66045e59899f@public-seed.cosmos.vitwit.com:26656,d72b3011ed46d783e369fdf8ae2055b99a1e5074@173.249.50.25:26656"
    app:
      minGasPrice: "0.0025uatom"
  podTemplate:
    image: "ghcr.io/strangelove-ventures/heighliner/gaia"
  volumeClaimTemplate:
    storageClassName: "premium-rwo"
    resources:
      requests:
        storage: 500Gi

Note: The snapshot URL above is outdated; snapshot URLs change daily. You will need to update it to point to a recent, valid snapshot.

Why Kubernetes?

Strangelove uses Kubernetes for almost all of our infrastructure. Kubernetes provides well-known, battle-tested DevOps patterns and abstractions, which cuts down on the "reinventing the wheel" that is so common in DevOps.

Additionally, the Operator Pattern allows mixing business logic with infrastructure, which offers powerful leverage in maintaining many different deployments while minimizing human intervention.

The long-term vision is "configure it and forget it."

The Flagship CRD: CosmosFullNode

Kubernetes allows you to extend its API, defining your own objects via a Custom Resource Definition, often abbreviated to CRD.

The primary CRD for the Cosmos Operator is the CosmosFullNode.

Once the Cosmos Operator is installed in your cluster, creating RPC nodes is as simple as:

kubectl apply -f path/to/fullnode.yaml
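After applying the manifest, it can help to look at what the operator actually created. The helper below is a minimal sketch, not part of the operator: the function name `show_fullnode` is made up for illustration, and it assumes the resource is named `cosmoshub` (as in the manifest above) and lives in your current namespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list a CosmosFullNode and the objects around it.
# Assumes kubectl already points at the right cluster and namespace.
show_fullnode() {
  local name="${1:-cosmoshub}"

  # The custom resource itself.
  kubectl get cosmosfullnode "$name"

  # Everything in the current namespace that the operator may have created.
  # This lists all pods, services, and volume claims, not only the operator's.
  kubectl get pods,services,persistentvolumeclaims
}
```

Call it as `show_fullnode cosmoshub` once the operator has had a moment to reconcile.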

Examples:

CosmosFullNode is stable with API version v1. Any changes will not break backward compatibility, but we may occasionally deprecate fields.

Strangelove has been successfully running CosmosFullNode in production for many weeks.

Tame the TOML

Configuring Cosmos nodes is difficult and error-prone. (Lots of nuanced TOML across several files!)

You never have to endure the madness of sed again:

# Never again
sed -i '/\[api\]/,+3 s/enable = false/enable = true/' app.toml

CosmosFullNode exposes common configuration in a more natural hierarchy, and updates the correct TOML files for you.

Take pruning as an example:

apiVersion: cosmos.strange.love/v1
kind: CosmosFullNode
# ...
spec:
  chain:
    app:
      pruning:
        strategy: "custom"
        interval: 17
        keepEvery: 1000
        keepRecent: 5000
        minRetainBlocks: 10000

Or tweaking configs not exposed by the operator:

apiVersion: cosmos.strange.love/v1
kind: CosmosFullNode
# ...
spec:
  chain:
    app:
      overrides: |-
        # Add valid toml here for app.toml
        [tx_index]
        indexer = "null"

Installing Cosmos Operator

Clone the repository locally and run `make deploy` against your cluster.

git clone https://github.com/strangelove-ventures/cosmos-operator.git
cd cosmos-operator
# Switch to the appropriate Kubernetes cluster.
make deploy IMG="ghcr.io/strangelove-ventures/cosmos-operator:$(git describe --tags --abbrev=0)"
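To confirm the install took effect, one option is to check that the cluster now knows about the custom resource. This is a rough sketch: the function name `verify_operator_install` is hypothetical, and the CRD name assumes the usual `<plural>.<group>` convention applied to the `cosmos.strange.love` group and `CosmosFullNode` kind used throughout this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that the CosmosFullNode CRD is registered.
verify_operator_install() {
  # Assumed CRD name: plural of the kind, followed by the API group.
  kubectl get crd cosmosfullnodes.cosmos.strange.love

  # List every resource type served under the operator's API group.
  kubectl api-resources --api-group=cosmos.strange.love
}
```

If `verify_operator_install` reports the CRD as not found, the deploy step most likely ran against a different cluster context.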

In the future, we plan to include a Helm chart so that you will not need to clone the repo.

What about Ingress?

An RPC node isn't much use if no one can reach it on the public internet.

There are a variety of ways to configure Ingress within Kubernetes. Therefore, the operator cannot configure Ingress for you. However, it creates a Kubernetes Service and allows customizing that service to ease Ingress configuration.

The Service exposes all RPCs and APIs. However, you control whether or not Ingress maps to a port.
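Before wiring up Ingress, you can sanity-check the Service from your own machine with a port-forward. The sketch below is illustrative only: `check_rpc` is a made-up name, the Service name `cosmoshub-rpc` and port name `rpc` are taken from the Ingress example that follows, and local port 26657 is simply the conventional Tendermint RPC port.

```shell
#!/usr/bin/env bash
# Hypothetical helper: query a node's RPC through the Service, no Ingress needed.
check_rpc() {
  local service="${1:-cosmoshub-rpc}"
  local local_port="${2:-26657}"

  # Forward a local port to the Service's named "rpc" port, in the background.
  kubectl port-forward "service/${service}" "${local_port}:rpc" >/dev/null &
  local pf_pid=$!
  sleep 2  # give the tunnel a moment to come up

  # /status is the standard Tendermint RPC status endpoint.
  curl --silent --fail "http://localhost:${local_port}/status"
  local rc=$?

  # Tear the tunnel down again and report curl's result.
  kill "$pf_pid" 2>/dev/null
  return "$rc"
}
```

Running `check_rpc` with no arguments uses the defaults above. A JSON response means the Service is routing to a healthy pod, so any remaining problem is on the Ingress side.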

Here's an example for Google Cloud's Ingress controller on GKE to expose the RPC and LCD endpoints.

apiVersion: cosmos.strange.love/v1
kind: CosmosFullNode
# ...
spec:
  service:
    rpcTemplate:
      metadata:
        annotations:
          cloud.google.com/backend-config: '{"default": "cosmoshub-lb-backend"}'
          cloud.google.com/neg: '{"ingress": true}'
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: gce
  name: cosmoshub-ingress
spec:
  rules:
    - host: rpc.cosmoshub.strange.love
      http:
        paths:
          - backend:
              service:
                name: cosmoshub-rpc
                port:
                  name: rpc
            path: /
            pathType: Prefix
    - host: api.cosmoshub.strange.love
      http:
        paths:
          - backend:
              service:
                name: cosmoshub-rpc
                port:
                  name: api
            path: /
            pathType: Prefix

Conclusion

We hope the Cosmos Operator encourages other Kubernetes aficionados to deploy nodes, thus growing and securing the network.

You can find additional information in the official README, including a rough roadmap.
