Crater

Configuration Guide

A university-developed cluster management platform for intelligent cluster scheduling and monitoring.

Version: 0.1.0 | Type: application | AppVersion: 1.0.0

A comprehensive AI development platform for Kubernetes that provides GPU resource management, containerized development environments, and workflow orchestration.

Homepage: https://github.com/raids-lab/crater

Maintainers

| Name | Email | Url |
| --- | --- | --- |
| RAIDS Lab | | https://github.com/raids-lab |

Source Code

Values

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| affinity | object | `{"nodeAffinity":{"preferredDuringSchedulingIgnoredDuringExecution":[{"preference":{"matchExpressions":[{"key":"nvidia.com/gpu.present","operator":"NotIn","values":["true"]}]},"weight":100}]}}` | Pod affinity configuration |
| backendConfig | object | `{"auth":{"accessTokenSecret":"<MASKED>","refreshTokenSecret":"<MASKED>"},"enableLeaderElection":false,"port":":8088","postgres":{"TimeZone":"Asia/Shanghai","dbname":"crater","host":"192.168.0.1","password":"<MASKED>","port":6432,"sslmode":"disable","user":"postgres"},"prometheusAPI":"http://192.168.0.1:12345","registry":{"buildTools":{"proxyConfig":{"httpProxy":null,"httpsProxy":null,"noProxy":null}},"enable":true,"harbor":{"password":"<MASKED>","server":"harbor.example.com","user":"admin"}},"secrets":{"imagePullSecretName":"","tlsForwardSecretName":"crater-tls-forward-secret","tlsSecretName":"crater-tls-secret"},"smtp":{"enable":true,"host":"mail.example.com","notify":"example@example.com","password":"<MASKED>","port":25,"user":"example"},"storage":{"prefix":{"account":"accounts","public":"public","user":"users"},"pvc":{"readOnlyMany":null,"readWriteMany":"crater-rw-storage"}}}` | Backend configuration |
| backendConfig.auth | object | `{"accessTokenSecret":"<MASKED>","refreshTokenSecret":"<MASKED>"}` | Authentication token configuration for JWT-based authentication (Required). Both token secrets must be specified for secure authentication. |
| backendConfig.auth.accessTokenSecret | string | `"<MASKED>"` | Secret key used to sign JWT access tokens (Required). Must be a secure, randomly generated string. |
| backendConfig.auth.refreshTokenSecret | string | `"<MASKED>"` | Secret key used to sign JWT refresh tokens (Required). Must be a secure, randomly generated string. |
| backendConfig.enableLeaderElection | bool | `false` | Enable leader election for controller manager to ensure high availability. Defaults to false if not specified. |
| backendConfig.port | string | `":8088"` | Network port that the server endpoint will listen on (Required). Must be specified for the server to start. |
| backendConfig.postgres | object | `{"TimeZone":"Asia/Shanghai","dbname":"crater","host":"192.168.0.1","password":"<MASKED>","port":6432,"sslmode":"disable","user":"postgres"}` | PostgreSQL database connection configuration (Required). All fields must be specified for database connectivity. |
| backendConfig.postgres.TimeZone | string | `"Asia/Shanghai"` | Time zone for database connections. Defaults to system time zone if not specified. |
| backendConfig.postgres.dbname | string | `"crater"` | Name of the database to connect to (Required). Database must exist and be accessible. |
| backendConfig.postgres.host | string | `"192.168.0.1"` | PostgreSQL server hostname or IP address (Required). Must be reachable from the application. |
| backendConfig.postgres.password | string | `"<MASKED>"` | Database password for authentication (Required). Must match the specified user's password. |
| backendConfig.postgres.port | int | `6432` | PostgreSQL server port number (Required). Typically 5432 for PostgreSQL. |
| backendConfig.postgres.sslmode | string | `"disable"` | SSL/TLS mode for database connection. Defaults to "disable" if not specified. |
| backendConfig.postgres.user | string | `"postgres"` | Database username for authentication (Required). User must have appropriate permissions. |
| backendConfig.prometheusAPI | string | `"http://192.168.0.1:12345"` | Endpoint URL for Prometheus API used for metrics and monitoring. If not specified, Prometheus integration will be disabled. |
| backendConfig.registry | object | `{"buildTools":{"proxyConfig":{"httpProxy":null,"httpsProxy":null,"noProxy":null}},"enable":true,"harbor":{"password":"<MASKED>","server":"harbor.example.com","user":"admin"}}` | Container registry configuration for image storage and building. If Enable is false, registry functionality will be disabled. |
| backendConfig.registry.buildTools | object | `{"proxyConfig":{"httpProxy":null,"httpsProxy":null,"noProxy":null}}` | Configuration for container image building tools and proxies. Required if Registry.Enable is true. |
| backendConfig.registry.buildTools.proxyConfig | object | `{"httpProxy":null,"httpsProxy":null,"noProxy":null}` | HTTP proxy settings for build environments. If not specified, no proxy will be configured for builds. |
| backendConfig.registry.buildTools.proxyConfig.httpProxy | string | `nil` | HTTP proxy URL for build environments. If not specified, HTTP traffic will not be proxied. |
| backendConfig.registry.buildTools.proxyConfig.httpsProxy | string | `nil` | HTTPS proxy URL for build environments. If not specified, HTTPS traffic will not be proxied. |
| backendConfig.registry.buildTools.proxyConfig.noProxy | string | `nil` | Comma-separated list of domains that should not be proxied. If not specified, all traffic will go through the proxy. |
| backendConfig.registry.enable | bool | `true` | Enable container registry integration. Defaults to false if not specified. |
| backendConfig.registry.harbor | object | `{"password":"<MASKED>","server":"harbor.example.com","user":"admin"}` | Configuration for Harbor container registry integration. Required if Registry.Enable is true: all Harbor fields must be specified. |
| backendConfig.registry.harbor.password | string | `"<MASKED>"` | Admin password for Harbor authentication (Required). Must match the specified user's password. |
| backendConfig.registry.harbor.server | string | `"harbor.example.com"` | Harbor registry server URL (Required). Must be a valid Harbor instance URL. |
| backendConfig.registry.harbor.user | string | `"admin"` | Admin username for Harbor authentication (Required). User must have appropriate permissions in Harbor. |
| backendConfig.secrets | object | `{"imagePullSecretName":"","tlsForwardSecretName":"crater-tls-forward-secret","tlsSecretName":"crater-tls-secret"}` | Kubernetes secret names for various security components (Required). All secret names must correspond to existing Kubernetes secrets. |
| backendConfig.secrets.imagePullSecretName | string | `""` | Name of the Kubernetes secret for pulling container images from private registries. If not specified, no image pull secret will be used. |
| backendConfig.secrets.tlsForwardSecretName | string | `"crater-tls-forward-secret"` | Name of the Kubernetes secret for TLS forwarding configuration (Required). Secret must contain appropriate forwarding certificates. |
| backendConfig.secrets.tlsSecretName | string | `"crater-tls-secret"` | Name of the Kubernetes secret containing TLS certificates for HTTPS (Required). Secret must contain 'tls.crt' and 'tls.key' keys. |
| backendConfig.smtp | object | `{"enable":true,"host":"mail.example.com","notify":"example@example.com","password":"<MASKED>","port":25,"user":"example"}` | Configuration for email notifications via SMTP. If Enable is false, email notifications will be disabled. |
| backendConfig.smtp.enable | bool | `true` | Enable SMTP email functionality. Defaults to false if not specified. |
| backendConfig.smtp.host | string | `"mail.example.com"` | SMTP server hostname or IP address (Required if Enable is true). Must be a valid SMTP server. |
| backendConfig.smtp.notify | string | `"example@example.com"` | Default email address for system notifications (Required if Enable is true). Must be a valid email address. |
| backendConfig.smtp.password | string | `"<MASKED>"` | Password for SMTP authentication (Required if Enable is true). Must match the specified user's password. |
| backendConfig.smtp.port | int | `25` | SMTP server port number (Required if Enable is true). Typically 25, 465, or 587. |
| backendConfig.smtp.user | string | `"example"` | Username for SMTP authentication (Required if Enable is true). Must be a valid SMTP user. |
| backendConfig.storage | object | `{"prefix":{"account":"accounts","public":"public","user":"users"},"pvc":{"readOnlyMany":null,"readWriteMany":"crater-rw-storage"}}` | Persistent volume claim and path prefix configurations (Required). All PVC names and prefix paths must be specified. |
| backendConfig.storage.prefix | object | `{"account":"accounts","public":"public","user":"users"}` | Path prefixes for different types of storage locations (Required). All prefix paths must be specified. |
| backendConfig.storage.prefix.account | string | `"accounts"` | Account prefix for account-related storage paths (Required). Must be a valid path within the storage system. |
| backendConfig.storage.prefix.public | string | `"public"` | Public prefix for publicly accessible storage paths (Required). Must be a valid path within the storage system. |
| backendConfig.storage.prefix.user | string | `"users"` | User prefix for user-specific storage paths (Required). Must be a valid path within the storage system. |
| backendConfig.storage.pvc.readOnlyMany | string | `nil` | Name of the ReadOnlyMany Persistent Volume Claim for datasets and models. It should be a link to the same underlying storage as ReadWriteMany. If not specified, datasets and models will be mounted as read-write. |
| backendConfig.storage.pvc.readWriteMany | string | `"crater-rw-storage"` | Name of the ReadWriteMany Persistent Volume Claim for shared storage (Required). PVC must exist in the cluster with ReadWriteMany access mode. |
| buildkitConfig | object | `{"amdConfig":{"cache":{"maxUsedSpace":"400GB","minFreeSpace":"50GB","reservedSpace":"50GB","storageClass":"openebs-hostpath","storageSize":"400Gi"},"enabled":true,"replicas":3},"armConfig":{"cache":{"maxUsedSpace":"80GB","minFreeSpace":"10GB","reservedSpace":"10GB","storageClass":"openebs-hostpath","storageSize":"80Gi"},"enabled":false,"replicas":2},"generalConfig":{"resources":{"limits":{"cpu":16,"memory":"48Gi"},"requests":{"cpu":8,"memory":"24Gi"}}}}` | Image building pipeline configuration. Only fully available when you have self-hosted image registries like Harbor. |
| buildkitConfig.amdConfig | object | `{"cache":{"maxUsedSpace":"400GB","minFreeSpace":"50GB","reservedSpace":"50GB","storageClass":"openebs-hostpath","storageSize":"400Gi"},"enabled":true,"replicas":3}` | AMD architecture configuration |
| buildkitConfig.amdConfig.cache | object | `{"maxUsedSpace":"400GB","minFreeSpace":"50GB","reservedSpace":"50GB","storageClass":"openebs-hostpath","storageSize":"400Gi"}` | Cache configuration for AMD builds |
| buildkitConfig.amdConfig.cache.maxUsedSpace | string | `"400GB"` | Maximum used space for AMD build cache |
| buildkitConfig.amdConfig.cache.minFreeSpace | string | `"50GB"` | Minimum free space for AMD build cache |
| buildkitConfig.amdConfig.cache.reservedSpace | string | `"50GB"` | Reserved space for AMD build cache |
| buildkitConfig.amdConfig.cache.storageClass | string | `"openebs-hostpath"` | Storage class for AMD build cache |
| buildkitConfig.amdConfig.cache.storageSize | string | `"400Gi"` | Storage size for AMD build cache |
| buildkitConfig.amdConfig.enabled | bool | `true` | Enable AMD architecture builds |
| buildkitConfig.amdConfig.replicas | int | `3` | Number of AMD build replicas |
| buildkitConfig.armConfig | object | `{"cache":{"maxUsedSpace":"80GB","minFreeSpace":"10GB","reservedSpace":"10GB","storageClass":"openebs-hostpath","storageSize":"80Gi"},"enabled":false,"replicas":2}` | ARM architecture configuration |
| buildkitConfig.armConfig.cache | object | `{"maxUsedSpace":"80GB","minFreeSpace":"10GB","reservedSpace":"10GB","storageClass":"openebs-hostpath","storageSize":"80Gi"}` | Cache configuration for ARM builds |
| buildkitConfig.armConfig.cache.maxUsedSpace | string | `"80GB"` | Maximum used space for ARM build cache |
| buildkitConfig.armConfig.cache.minFreeSpace | string | `"10GB"` | Minimum free space for ARM build cache |
| buildkitConfig.armConfig.cache.reservedSpace | string | `"10GB"` | Reserved space for ARM build cache |
| buildkitConfig.armConfig.cache.storageClass | string | `"openebs-hostpath"` | Storage class for ARM build cache |
| buildkitConfig.armConfig.cache.storageSize | string | `"80Gi"` | Storage size for ARM build cache |
| buildkitConfig.armConfig.enabled | bool | `false` | Enable ARM architecture builds (can only be true when ARM nodes exist) |
| buildkitConfig.armConfig.replicas | int | `2` | Number of ARM build replicas |
| buildkitConfig.generalConfig | object | `{"resources":{"limits":{"cpu":16,"memory":"48Gi"},"requests":{"cpu":8,"memory":"24Gi"}}}` | General configuration for all architectures |
| buildkitConfig.generalConfig.resources | object | `{"limits":{"cpu":16,"memory":"48Gi"},"requests":{"cpu":8,"memory":"24Gi"}}` | Resource configuration |
| buildkitConfig.generalConfig.resources.limits.cpu | int | `16` | CPU limit |
| buildkitConfig.generalConfig.resources.limits.memory | string | `"48Gi"` | Memory limit |
| buildkitConfig.generalConfig.resources.requests.cpu | int | `8` | CPU request |
| buildkitConfig.generalConfig.resources.requests.memory | string | `"24Gi"` | Memory request |
| cronjobConfig | object | `{"jobs":{"longTime":{"BATCH_DAYS":"4","INTERACTIVE_DAYS":"4","schedule":"*/5 * * * *"},"lowGPUUtil":{"TIME_RANGE":"90","UTIL":"0","WAIT_TIME":"30","schedule":"*/5 * * * *"},"waitingJupyter":{"JUPYTER_WAIT_MINUTES":"5","schedule":"*/5 * * * *"}}}` | Cronjob management strategy configuration. Job scheduling management strategies such as low-utilization email reminders and cleanup, long-time usage email reminders and cleanup, etc. |
| cronjobConfig.jobs | object | `{"longTime":{"BATCH_DAYS":"4","INTERACTIVE_DAYS":"4","schedule":"*/5 * * * *"},"lowGPUUtil":{"TIME_RANGE":"90","UTIL":"0","WAIT_TIME":"30","schedule":"*/5 * * * *"},"waitingJupyter":{"JUPYTER_WAIT_MINUTES":"5","schedule":"*/5 * * * *"}}` | Job management tasks configuration |
| cronjobConfig.jobs.longTime.BATCH_DAYS | string | `"4"` | Batch job maximum days |
| cronjobConfig.jobs.longTime.INTERACTIVE_DAYS | string | `"4"` | Interactive job maximum days |
| cronjobConfig.jobs.longTime.schedule | string | `"*/5 * * * *"` | Schedule for long-time usage check |
| cronjobConfig.jobs.lowGPUUtil.TIME_RANGE | string | `"90"` | Time range for monitoring (minutes) |
| cronjobConfig.jobs.lowGPUUtil.UTIL | string | `"0"` | GPU utilization threshold |
| cronjobConfig.jobs.lowGPUUtil.WAIT_TIME | string | `"30"` | Wait time before action (minutes) |
| cronjobConfig.jobs.lowGPUUtil.schedule | string | `"*/5 * * * *"` | Schedule for low GPU utilization check |
| cronjobConfig.jobs.waitingJupyter.JUPYTER_WAIT_MINUTES | string | `"5"` | Jupyter wait time in minutes |
| cronjobConfig.jobs.waitingJupyter.schedule | string | `"*/5 * * * *"` | Schedule for waiting Jupyter check |
| firstUser | object | `{"password":"<MASKED>","username":"crater-admin"}` | First user configuration. When connecting to the database for the first time, creates the first user account with administrator privileges. |
| firstUser.password | string | `"<MASKED>"` | Password for the first administrator user (please reset this password) |
| firstUser.username | string | `"crater-admin"` | Username for the first administrator user |
| frontendConfig | object | `{"grafana":{"job":{"basic":"/d/R4ZPFfyIz/crater-job-basic-dashboard","nvidia":"/d/2CDE0AC7/crater-job-nvidia-dashboard","pod":"/d/MhnFUFLSz/crater-pod-dashboard"},"node":{"basic":"/d/k8s_views_nodes/crater-node-basic-dashboard","nvidia":"/d/nvidia-dcgm-dashboard/crater-node-nvidia-dashboard"},"overview":{"main":"/d/f33ade9f-821d-4e96-a7f2-eb16c8b9c447/838ffad","network":"/d/8b7a8b326d7a6f1f04y7fh66368c67af/networking","schedule":"/d/be9oh7yk24jy8f/crater-gpu-e8b083-e5baa6-e58f82-e88083"},"user":{"nvidia":"/d/user-nvidia-dcgm-dashboard/crater-user-nvidia-dashboard"}},"url":{"apiPrefix":"/api/v1","document":"https://raids-lab.github.io/crater/zh"},"version":"1.0.0"}` | Frontend configuration |
| frontendConfig.grafana | object | `{"job":{"basic":"/d/R4ZPFfyIz/crater-job-basic-dashboard","nvidia":"/d/2CDE0AC7/crater-job-nvidia-dashboard","pod":"/d/MhnFUFLSz/crater-pod-dashboard"},"node":{"basic":"/d/k8s_views_nodes/crater-node-basic-dashboard","nvidia":"/d/nvidia-dcgm-dashboard/crater-node-nvidia-dashboard"},"overview":{"main":"/d/f33ade9f-821d-4e96-a7f2-eb16c8b9c447/838ffad","network":"/d/8b7a8b326d7a6f1f04y7fh66368c67af/networking","schedule":"/d/be9oh7yk24jy8f/crater-gpu-e8b083-e5baa6-e58f82-e88083"},"user":{"nvidia":"/d/user-nvidia-dcgm-dashboard/crater-user-nvidia-dashboard"}}` | Grafana dashboard configurations. References: https://github.com/raids-lab/crater/tree/main/grafana-dashboards |
| frontendConfig.grafana.job.basic | string | `"/d/R4ZPFfyIz/crater-job-basic-dashboard"` | Basic job dashboard URL |
| frontendConfig.grafana.job.nvidia | string | `"/d/2CDE0AC7/crater-job-nvidia-dashboard"` | NVIDIA job dashboard URL |
| frontendConfig.grafana.job.pod | string | `"/d/MhnFUFLSz/crater-pod-dashboard"` | Pod dashboard URL |
| frontendConfig.grafana.node.basic | string | `"/d/k8s_views_nodes/crater-node-basic-dashboard"` | Basic node dashboard URL |
| frontendConfig.grafana.node.nvidia | string | `"/d/nvidia-dcgm-dashboard/crater-node-nvidia-dashboard"` | NVIDIA node dashboard URL |
| frontendConfig.grafana.overview.main | string | `"/d/f33ade9f-821d-4e96-a7f2-eb16c8b9c447/838ffad"` | Main overview dashboard URL |
| frontendConfig.grafana.overview.network | string | `"/d/8b7a8b326d7a6f1f04y7fh66368c67af/networking"` | Network dashboard URL |
| frontendConfig.grafana.overview.schedule | string | `"/d/be9oh7yk24jy8f/crater-gpu-e8b083-e5baa6-e58f82-e88083"` | Schedule dashboard URL |
| frontendConfig.grafana.user.nvidia | string | `"/d/user-nvidia-dcgm-dashboard/crater-user-nvidia-dashboard"` | User NVIDIA dashboard URL |
| frontendConfig.url.apiPrefix | string | `"/api/v1"` | Backend API prefix (not modifiable currently) |
| frontendConfig.url.document | string | `"https://raids-lab.github.io/crater/zh"` | Documentation base URL |
| frontendConfig.version | string | `"1.0.0"` | Frontend version |
| grafanaProxy | object | `{"address":"http://prometheus-grafana.monitoring","enable":true,"host":"gpu-grafana.<your-domain>.com","token":"<MASKED>"}` | Grafana proxy configuration. Only Grafana Pro has a password-free login feature. We use an Nginx proxy to support password-free login for iframes. |
| grafanaProxy.address | string | `"http://prometheus-grafana.monitoring"` | Grafana service address in cluster |
| grafanaProxy.enable | bool | `true` | Whether to enable Grafana proxy |
| grafanaProxy.host | string | `"gpu-grafana.<your-domain>.com"` | Domain name for exposing Grafana via Ingress |
| grafanaProxy.token | string | `"<MASKED>"` | Grafana access token (masked, please apply for read-only token in Grafana) |
| host | string | `"crater.<your-domain>.com"` | Domain name or IP address that the server will bind to (Required). Must be specified for the server to start. |
| imagePullPolicy | string | `"Always"` | Image pull policy ("IfNotPresent" |
| imagePullSecrets | list | `[]` | Image pull secrets |
| images | object | `{"backend":{"repository":"ghcr.io/raids-lab/crater-backend","tag":"latest"},"buildkit":{"repository":"docker.io/moby/buildkit","tag":"v0.23.1"},"buildx":{"repository":"ghcr.io/raids-lab/buildx-client","tag":"latest"},"cronjob":{"repository":"docker.io/badouralix/curl-jq","tag":"latest"},"envd":{"repository":"ghcr.io/raids-lab/envd-client","tag":"latest"},"frontend":{"repository":"ghcr.io/raids-lab/crater-frontend","tag":"latest"},"grafanaProxy":{"repository":"docker.io/library/nginx","tag":"1.27.3-bookworm"},"nerdctl":{"repository":"ghcr.io/raids-lab/nerdctl-client","tag":"latest"},"storage":{"repository":"ghcr.io/raids-lab/storage-server","tag":"latest"}}` | Container images configuration |
| images.backend.repository | string | `"ghcr.io/raids-lab/crater-backend"` | Backend service image repository |
| images.backend.tag | string | `"latest"` | Backend service image tag |
| images.buildkit.repository | string | `"docker.io/moby/buildkit"` | Buildkit image repository for containerd-based builds |
| images.buildkit.tag | string | `"v0.23.1"` | Buildkit image tag |
| images.buildx.repository | string | `"ghcr.io/raids-lab/buildx-client"` | Buildx image repository for Docker Buildx multi-platform builds |
| images.buildx.tag | string | `"latest"` | Buildx image tag |
| images.cronjob.repository | string | `"docker.io/badouralix/curl-jq"` | Cronjob image repository |
| images.cronjob.tag | string | `"latest"` | Cronjob image tag |
| images.envd.repository | string | `"ghcr.io/raids-lab/envd-client"` | Envd image repository for environment-based development builds |
| images.envd.tag | string | `"latest"` | Envd image tag |
| images.frontend.repository | string | `"ghcr.io/raids-lab/crater-frontend"` | Frontend service image repository |
| images.frontend.tag | string | `"latest"` | Frontend service image tag |
| images.grafanaProxy.repository | string | `"docker.io/library/nginx"` | Grafana proxy image repository |
| images.grafanaProxy.tag | string | `"1.27.3-bookworm"` | Grafana proxy image tag |
| images.nerdctl.repository | string | `"ghcr.io/raids-lab/nerdctl-client"` | Nerdctl image repository for containerd-based builds |
| images.nerdctl.tag | string | `"latest"` | Nerdctl image tag |
| images.storage.repository | string | `"ghcr.io/raids-lab/storage-server"` | Storage server image repository |
| images.storage.tag | string | `"latest"` | Storage server image tag |
| namespaces | object | `{"create":true,"image":"crater-images","job":"crater-workspace"}` | Namespace configuration for crater components. By default, crater components run in crater namespace, while jobs and images are in separate namespaces. |
| namespaces.create | bool | `true` | Whether to create namespaces along with the deployment |
| namespaces.image | string | `"crater-images"` | Namespace for building images |
| namespaces.job | string | `"crater-workspace"` | Namespace for running jobs |
| nodeSelector | object | `{"node-role.kubernetes.io/control-plane":""}` | Node selector for all Deployments. Prevents control components from being scheduled to GPU worker nodes. |
| protocol | string | `"https"` | Protocol for server communication |
| storage | object | `{"create":true,"pvcName":"crater-rw-storage","request":"2Ti","storageClass":"ceph-fs"}` | Persistent Volume Claim configuration |
| storage.create | bool | `true` | Whether to create PVC |
| storage.pvcName | string | `"crater-rw-storage"` | PVC name (also used in backendConfig) |
| storage.request | string | `"2Ti"` | Storage request size |
| storage.storageClass | string | `"ceph-fs"` | Storage class name (e.g. cephfs, nfs, must support ReadWriteMany) |
| tls | object | `{"base":{"cert":"<MASKED>","create":false,"key":"<MASKED>"},"forward":{"cert":"<MASKED>","create":false,"key":"<MASKED>"}}` | TLS certificate configuration for exposing services via Ingress (cert-manager configuration variables) |
| tls.base | object | `{"cert":"<MASKED>","create":false,"key":"<MASKED>"}` | Base certificate configuration (Standard mode, e.g., crater.example.com certificate) |
| tls.base.cert | string | `"<MASKED>"` | Base certificate content (masked) |
| tls.base.create | bool | `false` | Whether to create base certificate |
| tls.base.key | string | `"<MASKED>"` | Base certificate private key (masked) |
| tls.forward | object | `{"cert":"<MASKED>","create":false,"key":"<MASKED>"}` | Forward certificate configuration (Subdomain mode, e.g., xxx.crater.example.com certificate for exposing internal job services externally) |
| tls.forward.cert | string | `"<MASKED>"` | Forward certificate content (masked) |
| tls.forward.create | bool | `false` | Whether to create forward certificate |
| tls.forward.key | string | `"<MASKED>"` | Forward certificate private key (masked) |
| tolerations | list | `[{"effect":"NoSchedule","key":"node-role.kubernetes.io/control-plane","operator":"Exists"}]` | Pod tolerations |
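
As a rough illustration of how the values listed above fit together, here is a minimal override file sketch. It uses only keys from the table. The file name `values.override.yaml` and every bracketed placeholder are hypothetical and must be replaced with values for your own cluster. Helm merges nested maps key by key, so any key not listed here keeps its chart default.

```yaml
# values.override.yaml (hypothetical file name; all <...> values are placeholders)

host: crater.<your-domain>.com     # domain the server binds to (Required)
protocol: https

firstUser:
  username: crater-admin
  password: "<initial-admin-password>"   # reset this after first login

backendConfig:
  port: ":8088"
  auth:
    accessTokenSecret: "<random-string-1>"    # secure, randomly generated
    refreshTokenSecret: "<random-string-2>"   # secure, randomly generated
  postgres:
    host: "<postgres-host>"
    port: 6432                     # chart default; PostgreSQL typically uses 5432
    dbname: crater
    user: postgres
    password: "<postgres-password>"
    sslmode: disable
    TimeZone: Asia/Shanghai
  registry:
    enable: false                  # no Harbor available: registry features are disabled
  smtp:
    enable: false                  # email notifications are disabled
  storage:
    pvc:
      readWriteMany: crater-rw-storage   # assumed to match storage.pvcName below

storage:
  create: true
  pvcName: crater-rw-storage       # also referenced by backendConfig
  request: 2Ti
  storageClass: "<rwx-storage-class>"    # must support ReadWriteMany

buildkitConfig:
  armConfig:
    enabled: false                 # enable only when ARM nodes exist
```

Such a file would typically be passed to Helm with the `-f` flag, for example `helm install <release-name> <chart-reference> -n crater -f values.override.yaml`, where the release name and chart reference are placeholders not given in this document. When `backendConfig.registry.enable` or `backendConfig.smtp.enable` is set to `true`, the corresponding `harbor` and `smtp` fields marked "Required if Enable is true" in the table need to be filled in as well.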