Minimal Deployment
This document guides you through setting up a local Crater environment with Kind. Crater is a distributed training platform built on Kubernetes. The guide covers the complete process, from creating a Kind cluster to deploying every component Crater needs.
1. Environment Preparation
1.1 Installing Kind
Reference documentation: https://kind.sigs.k8s.io/docs/user/quick-start/
# Install Kind
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind
If you encounter permission issues, run the commands with sudo, or move Kind to a directory on your PATH that you have write access to.
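The download command above is hardcoded for linux-amd64. On another OS or CPU, the asset name changes. The following is a minimal sketch, assuming the Kind release assets follow the kind-<os>-<arch> naming pattern; kind_download_url is a hypothetical helper name, not part of Kind.

```shell
# Hypothetical helper: build the Kind download URL for a version, OS, and machine type.
# Usage: kind_download_url VERSION [OS] [MACHINE]
kind_download_url() {
  version=$1
  os=${2:-$(uname -s)}
  machine=${3:-$(uname -m)}

  # Asset names use lowercase OS names (Linux -> linux, Darwin -> darwin).
  os=$(printf '%s' "$os" | tr '[:upper:]' '[:lower:]')

  # Map `uname -m` output to the architecture suffix used in asset names.
  case "$machine" in
    x86_64|amd64)  arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported machine type: $machine" >&2; return 1 ;;
  esac

  printf 'https://kind.sigs.k8s.io/dl/%s/kind-%s-%s\n' "$version" "$os" "$arch"
}

# Example: curl -Lo ./kind "$(kind_download_url v0.20.0)"
```

With no OS or machine argument, the helper falls back to the values reported by uname on the current host.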
1.2 Installing kubectl and Helm
# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/
# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
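Before continuing, it can help to confirm that all three tools are actually on your PATH. The snippet below is a small sketch of such a check; missing_tools is a hypothetical helper name.

```shell
# Print the name of every listed command that is not on PATH.
# Returns 1 if at least one command is missing, 0 otherwise.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Example: missing_tools kind kubectl helm || echo "install the tools listed above first"
```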
2. Creating a Kind Cluster
2.1 Creating a Cluster Configuration File
Reference documentation: https://kind.sigs.k8s.io/docs/user/ingress/
# kind-cluster.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
  extraPortMappings:
  - containerPort: 80
    hostPort: 8080
    protocol: TCP
  - containerPort: 443
    hostPort: 8443
    protocol: TCP
Port mapping configuration allows accessing services within the cluster from the host, which is especially important for the Ingress controller.
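Cluster creation fails if the mapped host ports are already taken by another process. The following is a rough pre-flight sketch for the two host ports used in kind-cluster.yaml. It relies on bash's /dev/tcp redirection (an assumption about your shell; other shells will always report the port as free) and only probes 127.0.0.1. The function names are hypothetical.

```shell
# Return 0 if something accepts TCP connections on HOST:PORT, 1 otherwise.
# Relies on bash's /dev/tcp pseudo-device; the probe runs in a subshell so
# the temporary file descriptor is closed again immediately.
port_in_use() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    return 0
  fi
  return 1
}

# Report the state of the two host ports mapped in kind-cluster.yaml.
check_kind_ports() {
  for port in 8080 8443; do
    if port_in_use 127.0.0.1 "$port"; then
      echo "port $port is already in use"
    else
      echo "port $port is free"
    fi
  done
}

# Example: check_kind_ports
```

If a port is reported as in use, either stop the conflicting process or choose different hostPort values in kind-cluster.yaml.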
2.2 Creating the Cluster
kind create cluster --config kind-cluster.yaml
Expected Output:
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.27.3) 🖼
✓ Preparing nodes 📦 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
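The cluster is usable once both nodes report Ready. A minimal sketch of a check, assuming the default cluster name "kind" (and therefore the kind-kind context shown above); wait_for_nodes is a hypothetical helper name.

```shell
# Block until every node reports Ready (default timeout 120s), then list the nodes.
wait_for_nodes() {
  kubectl --context kind-kind wait --for=condition=Ready nodes --all --timeout="${1:-120s}" || return 1
  kubectl --context kind-kind get nodes
}

# Example: wait_for_nodes 180s
```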
3. Deploying Ingress-Nginx for the Kind Cluster
Reference documentation: https://kind.sigs.k8s.io/docs/user/ingress/
kubectl apply -f https://kind.sigs.k8s.io/examples/ingress/deploy-ingress-nginx.yaml
Verify the installation:
kubectl wait --namespace ingress-nginx \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/component=controller \
  --timeout=90s
4. Deploying PostgreSQL Database
Reference documentation: https://artifacthub.io/packages/helm/bitnami/postgresql
# Create namespace
kubectl create namespace crater-system
# Set current namespace context
kubectl config set-context --current --namespace=crater-system
# Add Bitnami Helm repository
helm repo add bitnami https://charts.bitnami.com/bitnami
# Install PostgreSQL
helm install crater-postgresql bitnami/postgresql -n crater-system
Get the database password:
export POSTGRES_PASSWORD=$(kubectl get secret --namespace crater-system crater-postgresql -o jsonpath="{.data.postgres-password}" | base64 -d)
echo "Database password: $POSTGRES_PASSWORD"
Keep the database password secure; you will need it when configuring Crater in section 7.2.
5. Deploying Volcano Scheduler
Reference documentation: https://volcano.sh/en/docs/v1.11.0/installation/
# Add Volcano Helm repository
helm repo add volcano-sh https://volcano-sh.github.io/helm-charts
# Create namespace and install Volcano
helm install volcano volcano-sh/volcano -n volcano-system --create-namespace
Verify the installation:
kubectl get pods -n volcano-system
Expected Output:
NAME READY STATUS RESTARTS AGE
volcano-admission-xxxxxxxxx-xxxxx 1/1 Running 0 1m
volcano-controllers-xxxxxxxx-xxxxx 1/1 Running 0 1m
volcano-scheduler-xxxxxxxx-xxxxx 1/1 Running 0 1m
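Instead of polling the pod list by hand, you can wait for each Volcano component to finish rolling out. This is a sketch only: the Deployment names are an assumption inferred from the pod names in the expected output above, and wait_for_volcano is a hypothetical helper name.

```shell
# Wait for the three Volcano Deployments to become available.
# Deployment names are assumed from the pod name prefixes shown above.
wait_for_volcano() {
  for deploy in volcano-admission volcano-controllers volcano-scheduler; do
    kubectl rollout status "deployment/$deploy" -n volcano-system --timeout="${1:-180s}" || return 1
  done
}

# Example: wait_for_volcano 300s
```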
6. Deploying NFS Storage
Reference documentation: https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner
# Add NFS Provisioner Helm repository
helm repo add nfs-ganesha-server-and-external-provisioner https://kubernetes-sigs.github.io/nfs-ganesha-server-and-external-provisioner/
# Install NFS Server Provisioner
helm install nfs-provisioner nfs-ganesha-server-and-external-provisioner/nfs-server-provisioner -n nfs-system --create-namespace
Verify StorageClass:
kubectl get storageclass
Expected Output:
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs cluster.local/nfs-provisioner-nfs-server-provisioner Delete Immediate true 1m
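To confirm that the StorageClass actually provisions volumes, you can create a throwaway claim against it. The sketch below prints a PersistentVolumeClaim manifest; the claim name nfs-test-claim, the 100Mi size, and the helper name nfs_test_pvc are arbitrary illustration values, not something Crater requires.

```shell
# Print a PersistentVolumeClaim manifest that requests storage from the
# given StorageClass (default: nfs). Name and size are illustration values.
nfs_test_pvc() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-test-claim
spec:
  storageClassName: ${1:-nfs}
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Mi
EOF
}

# Example:
#   nfs_test_pvc | kubectl apply -n crater-system -f -
#   kubectl get pvc nfs-test-claim -n crater-system
#   kubectl delete pvc nfs-test-claim -n crater-system
```

Because the StorageClass uses the Immediate volume binding mode, the claim should reach the Bound status shortly after it is created, without needing a Pod to consume it.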
7. Deploying Crater
7.1 Get Crater Helm Chart Values File
helm show values oci://ghcr.io/raids-lab/crater --version 0.1.0 > values.yaml
7.2 Configure Database Connection
Edit the values.yaml file to configure the database connection information:
postgres:
  host: crater-postgresql.crater-system.svc.cluster.local
  port: 5432
  dbname: postgres
  user: postgres
  password: "your-database-password" # Replace with your actual password
  sslmode: disable
  TimeZone: Asia/Shanghai
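As an alternative to pasting the password into values.yaml by hand, you can keep it in a small override file and pass both files to Helm; when several -f files are given, values from later files take precedence. This is a sketch under one assumption: that the postgres block sits at the top level of values.yaml, as shown above. The file name values-db.yaml and the helper name write_db_override are hypothetical.

```shell
# Write an override file that contains only the database password.
# Assumes the chart reads the password from the top-level key postgres.password.
write_db_override() {
  password=$1
  outfile=${2:-values-db.yaml}
  # In a YAML single-quoted scalar, the only character that needs escaping
  # is the single quote itself, which is written as two single quotes.
  escaped=$(printf '%s' "$password" | sed "s/'/''/g")
  printf "postgres:\n  password: '%s'\n" "$escaped" > "$outfile"
  # Restrict access, since the file holds a credential.
  chmod 600 "$outfile"
}

# Example (POSTGRES_PASSWORD as exported in section 4):
#   write_db_override "$POSTGRES_PASSWORD"
#   helm install crater oci://ghcr.io/raids-lab/crater --version 0.1.0 \
#     -n crater-system -f values.yaml -f values-db.yaml
```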
7.3 Install Crater
helm install crater oci://ghcr.io/raids-lab/crater --version 0.1.0 -n crater-system -f values.yaml
Verify the installation:
kubectl get pods -n crater-system
If all Pods are in the Running state, the Crater environment has been set up successfully.
8. Accessing Crater
8.1 Get Access Address
kubectl get ingress -n crater-system
8.2 Configure Local hosts (if needed)
If you are using a local Kind cluster, you may need to add a mapping in /etc/hosts:
127.0.0.1 crater.example.com
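Appending the line blindly can leave duplicate entries if you repeat the setup. The following sketch prints the mapping only when the hostname is not already listed, so it can be piped safely into the hosts file; hosts_line_if_missing is a hypothetical helper name.

```shell
# Print "IP HOST" only if HOST is not already listed in the hosts file.
# Usage: hosts_line_if_missing IP HOST [FILE]
hosts_line_if_missing() {
  ip=$1
  host=$2
  file=${3:-/etc/hosts}
  # awk exits 0 when HOST appears as a name (field 2 or later) on a
  # non-comment line, and 1 otherwise.
  if ! awk -v h="$host" '
      $1 ~ /^#/ { next }
      { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
      END { exit !found }
    ' "$file"; then
    printf '%s %s\n' "$ip" "$host"
  fi
}

# Example: hosts_line_if_missing 127.0.0.1 crater.example.com | sudo tee -a /etc/hosts
```

When the entry already exists, the helper prints nothing and the hosts file is left unchanged.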
8.3 Access the Web Interface
Open your browser and visit http://crater.example.com:8080 (or the address given by your Ingress configuration).
9. Troubleshooting
9.1 Common Issues
Issue: Pods fail to start or keep restarting
Solution:
# Check Pod logs
kubectl logs <pod-name> -n crater-system
# Check Pod details
kubectl describe pod <pod-name> -n crater-system
Issue: Database connection failure
Solution:
# Check database service status
kubectl get svc -n crater-system | grep postgres
# Test database connection
kubectl run postgres-test --rm -it --image=postgres:13 --restart=Never -- \
  psql -h crater-postgresql.crater-system.svc.cluster.local -U postgres
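The interactive test above prompts for the password. For a non-interactive check, the password can be supplied through the PGPASSWORD environment variable, which psql reads. This is a sketch, assuming POSTGRES_PASSWORD is still exported from section 4; test_db_connection is a hypothetical helper name. Note that the password becomes visible in the temporary pod's spec, so use this only in a local test cluster.

```shell
# Run a one-off psql client pod that executes "SELECT 1" and is removed afterwards.
# Assumes POSTGRES_PASSWORD is exported as in section 4.
test_db_connection() {
  kubectl run postgres-test --rm -i --restart=Never \
    --image=postgres:13 -n crater-system \
    --env="PGPASSWORD=$POSTGRES_PASSWORD" -- \
    psql -h crater-postgresql.crater-system.svc.cluster.local -U postgres -c 'SELECT 1'
}

# Example: test_db_connection && echo "database reachable"
```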
10. Cleanup Environment
# Delete Kind cluster
kind delete cluster
# Or delete specific resources
helm uninstall crater -n crater-system
helm uninstall crater-postgresql -n crater-system
helm uninstall volcano -n volcano-system
helm uninstall nfs-provisioner -n nfs-system
Deleting the cluster erases all data. Make sure any important data has been backed up first.
Summary
Through this guide, you have successfully deployed a complete Crater environment on a Kind cluster, including:
- ✅ Kind Kubernetes cluster
- ✅ Ingress-Nginx controller
- ✅ PostgreSQL database
- ✅ Volcano scheduler
- ✅ NFS storage provisioner
- ✅ Crater training platform
Now you can start using Crater for distributed training tasks! If you encounter any issues, please refer to the official documentation for each component or check the logs for troubleshooting.