Introducing the Cosmos Operator
Nov 21, 2022
David Nix
Strangelove is excited to announce the release of a new open-source product, the Cosmos Operator.
This product is a Kubernetes Operator that should work for the majority of blockchains built with the Cosmos SDK.
The Cosmos Operator allows you to create highly-available, fault-tolerant RPC node deployments quickly and easily within a Kubernetes cluster. Long-term, we plan to support validator sentries, persistent peers, and seed nodes as well.
Through a single abstraction, the Cosmos Operator unifies and simplifies:
Setting reasonable configuration defaults
TOML wrangling
p2p identity
Downloading genesis files
Restoring from a snapshot
For example, this is all you need to deploy a highly available RPC node for the Cosmos Hub:
apiVersion: cosmos.strange.love/v1
kind: CosmosFullNode
metadata:
  name: cosmoshub
spec:
  replicas: 3
  chain:
    network: mainnet
    chainID: cosmoshub-4
    binary: gaiad
    genesisURL: "https://github.com/cosmos/mainnet/raw/master/genesis.cosmoshub-4.json.gz"
    snapshotURL: "https://snapshots1.polkachu.com/snapshots/cosmos/cosmos_11701512.tar.lz4"
    config:
      seeds: "cfd785a4224c7940e9a10f6c1ab24c343e923bec@164.68.107.188:26656,bf8328b66dceb4987e5cd94430af66045e59899f@public-seed.cosmos.vitwit.com:26656,d72b3011ed46d783e369fdf8ae2055b99a1e5074@173.249.50.25:26656"
    app:
      minGasPrice: "0.0025uatom"
  podTemplate:
    image: "ghcr.io/strangelove-ventures/heighliner/gaia"
  volumeClaimTemplate:
    storageClassName: "premium-rwo"
    resources:
      requests:
        storage: 500Gi
Note: The snapshot URL above is outdated; snapshot URLs change daily. Before deploying, update it to point to a recent, valid snapshot.
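Long peer strings like the `seeds` value above are easy to mistype. As a rough pre-flight check before running `kubectl apply`, the sketch below flags entries that do not look like `nodeid@host:port`. The function name `invalid_peers` and the pattern are illustrative and not part of the operator. The check assumes 40-character hex node IDs and hostname or IPv4 addresses, so IPv6 peers would need a different pattern.

```python
import re
from typing import List

# Tendermint node IDs are 40 lowercase hex characters; peers are written id@host:port.
# This pattern checks format only: it does not resolve hosts or range-check ports.
PEER_PATTERN = re.compile(r"^[0-9a-f]{40}@[A-Za-z0-9.\-]+:\d{1,5}$")


def invalid_peers(peers: str) -> List[str]:
    """Return the entries of a comma-separated peer string that look malformed."""
    entries = [entry.strip() for entry in peers.split(",") if entry.strip()]
    return [entry for entry in entries if not PEER_PATTERN.match(entry)]
```

An empty list means every entry matched the expected shape; it says nothing about whether the peers are actually online.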
Why Kubernetes?
Strangelove runs almost all of its infrastructure on Kubernetes. Kubernetes provides well-known, battle-tested DevOps patterns and abstractions, which reduces the "reinventing the wheel" that is common in DevOps.
Additionally, the Operator Pattern allows mixing business logic with infrastructure, which offers powerful leverage in maintaining many different deployments while minimizing human intervention.
The long-term vision is "configure it and forget it."
The Flagship CRD: CosmosFullNode
Kubernetes allows you to extend its API, defining your own objects via a Custom Resource Definition, often abbreviated to CRD.
The primary CRD for the Cosmos Operator is the CosmosFullNode.
Once the Cosmos Operator is installed in your cluster, creating RPC nodes is as simple as:
kubectl apply -f path/to/fullnode.yaml
Examples:
CosmosFullNode is stable with API version v1. Any changes will not break backward compatibility, but we may occasionally deprecate fields.
Strangelove has been successfully running CosmosFullNode in production for many weeks.
Tame the TOML
Configuring Cosmos nodes is difficult and error-prone. (Lots of nuanced TOML across several files!)
You never have to endure the madness of sed again:
# Never again
sed -i '/\[api\]/,+3 s/enable = false/enable = true/' app.toml
CosmosFullNode exposes common configuration in a more natural hierarchy, and updates the correct TOML files for you.
Take pruning as an example:
apiVersion: cosmos.strange.love/v1
kind: CosmosFullNode
# ...
spec:
  chain:
    app:
      pruning:
        strategy: "custom"
        interval: 17
        keepEvery: 1000
        keepRecent: 5000
        minRetainBlocks: 10000
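To make the translation concrete, here is a minimal sketch of how pruning fields like these could map onto the flat pruning keys that the Cosmos SDK reads from app.toml. The mapping table and `render_pruning` are hypothetical illustrations, not the operator's code, and the exact output the operator writes may differ.

```python
# Hypothetical mapping from the CRD's pruning fields to Cosmos SDK app.toml keys.
PRUNING_KEYS = {
    "strategy": "pruning",
    "interval": "pruning-interval",
    "keepEvery": "pruning-keep-every",
    "keepRecent": "pruning-keep-recent",
    "minRetainBlocks": "min-retain-blocks",
}


def render_pruning(pruning: dict) -> str:
    """Render the pruning fields that are present as app.toml lines."""
    lines = []
    for field, key in PRUNING_KEYS.items():
        if field not in pruning:
            continue
        value = pruning[field]
        if field == "minRetainBlocks":
            lines.append(f"{key} = {value}")  # written as an integer
        else:
            lines.append(f'{key} = "{value}"')  # the pruning-* keys are strings
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_pruning({
        "strategy": "custom",
        "interval": 17,
        "keepEvery": 1000,
        "keepRecent": 5000,
        "minRetainBlocks": 10000,
    }))
```

The point of the sketch is the shape of the problem: one nested, typed object on the Kubernetes side versus several loosely related string keys on the TOML side.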
Or tweaking configs not exposed by the operator:
apiVersion: cosmos.strange.love/v1
kind: CosmosFullNode
# ...
spec:
  chain:
    app:
      overrides: |-
        # Add valid toml here for app.toml
        [tx_index]
        indexer = "null"
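One way to picture overrides is as a second layer applied on top of the generated configuration. The sketch below assumes that override values win key by key and that nested TOML tables are merged rather than replaced wholesale. That is an assumption made for illustration; the operator's actual merge rules are not described in this post. `apply_overrides` is a hypothetical name, and the dictionaries stand in for already-parsed TOML.

```python
from typing import Any, Dict


def apply_overrides(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new mapping with overrides layered on top of base.

    Nested tables are merged recursively; any other value replaces the
    base value. The base mapping itself is not modified.
    """
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged
```

For example, calling `apply_overrides(generated, {"tx_index": {"indexer": "null"}})` on a parsed config would change only the `indexer` key and leave every other table as generated.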
Installing Cosmos Operator
Clone the repository locally and run `make deploy`:
git clone https://github.com/strangelove-ventures/cosmos-operator.git
cd cosmos-operator
# Switch to the appropriate Kubernetes cluster.
make deploy IMG="ghcr.io/strangelove-ventures/cosmos-operator:$(git describe --tags --abbrev=0)"
In the future, we plan to include a Helm chart so that you will not need to clone the repo.
What about Ingress?
An RPC node isn't much use if no one can reach it on the public internet.
There are a variety of ways to configure Ingress within Kubernetes. Therefore, the operator cannot configure Ingress for you. However, it creates a Kubernetes Service and allows customizing that service to ease Ingress configuration.
The Service exposes all RPCs and APIs. However, you control whether or not Ingress maps to a port.
Here's an example for Google Cloud's Ingress controller on GKE to expose the RPC and LCD endpoints.
apiVersion: cosmos.strange.love/v1
kind: CosmosFullNode
# ...
spec:
  service:
    rpcTemplate:
      metadata:
        annotations:
          cloud.google.com/backend-config: '{"default": "cosmoshub-lb-backend"}'
          cloud.google.com/neg: '{"ingress": true}'
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: gce
  name: cosmoshub-ingress
spec:
  rules:
    - host: rpc.cosmoshub.strange.love
      http:
        paths:
          - backend:
              service:
                name: cosmoshub-rpc
                port:
                  name: rpc
            path: /
            pathType: Prefix
    - host: api.cosmoshub.strange.love
      http:
        paths:
          - backend:
              service:
                name: cosmoshub-rpc
                port:
                  name: api
            path: /
            pathType: Prefix
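Once Ingress is in place, a quick way to confirm that the RPC endpoint is reachable and serving current data is to query the standard Tendermint RPC `/status` route and read `sync_info.catching_up`. The sketch below is one possible check, not something the operator ships. The helper names are illustrative, and the base URL you pass in depends on your own Ingress hosts.

```python
import json
import urllib.request


def is_caught_up(status_body: str) -> bool:
    """Report whether a Tendermint RPC /status response describes a synced node."""
    sync_info = json.loads(status_body)["result"]["sync_info"]
    return not sync_info["catching_up"]


def fetch_status(base_url: str, timeout: float = 10.0) -> str:
    """Fetch the raw /status response body from an RPC endpoint."""
    with urllib.request.urlopen(f"{base_url}/status", timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the Ingress example above, the call would look like `is_caught_up(fetch_status("https://rpc.cosmoshub.strange.love"))`, assuming that host resolves to your deployment. A node that is still restoring from a snapshot or syncing reports `catching_up` as true.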
Conclusion
We hope the Cosmos Operator encourages other Kubernetes aficionados to deploy nodes, thus growing and securing the network.
You can find additional information in the official README, including a rough roadmap.