Prometheus Features
Multi-dimensional data model (time series identified by metric name and key/value label pairs)
Flexible query language (PromQL)
Supports both local and remote storage, with no dependency on distributed storage
Pull-based collection: metrics are fetched from targets over HTTP
Supports a Pushgateway for push-style use cases
Supports multiple graphing modes and dashboards
Prometheus Core Components
Prometheus Server: the main service, responsible for scraping and storing time series data
Client libraries: used to instrument application code so it exposes metrics over HTTP for the Prometheus Server to scrape
Pushgateway: accepts short-lived and batch-job metrics pushed by clients; Prometheus then scrapes them from the gateway on its regular schedule (a push example follows this list)
Exporter: collects metrics from a target and converts them into a format Prometheus understands; each exporter covers a different service and is conventionally named xxx_exporter
Alertmanager: manages alert notifications (grouping, routing, and delivery)
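As a quick illustration of the Pushgateway flow mentioned above, a batch job can push a sample with nothing more than curl; the gateway address (pushgateway.example.org:9115 is not part of this setup), metric name, and job name below are placeholders:

```bash
# Push one sample for batch job "some_job"; Prometheus later
# scrapes it from the Pushgateway rather than from the job itself.
echo "some_metric 3.14" | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/some_job
```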
Prometheus Architecture
Prometheus vs. InfluxDB
InfluxDB: only a database; it passively accepts client writes and query requests (push-based)
Prometheus: a complete monitoring system that scrapes data, queries it, and raises alerts (pull-based)
The main differences between push and pull are which side initiates the transfer and the resulting logical architecture
PromQL
Each piece of monitoring data returned by Node Exporter's /metrics endpoint is called a sample in Prometheus. A collected sample consists of the following three parts (an example follows the list):
Metric (metric name plus labels)
Timestamp
Value (the sample value)
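For instance, the raw exposition text a node exporter returns looks like the following illustrative excerpt. The metric name and labels form the metric part and the number is the value; the timestamp is normally assigned by Prometheus at scrape time unless the exposition explicitly includes one:

```
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 362812.75
```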
Metric Types
Counter
rate(http_requests_total[5m])
Gauge
node_memory_MemFree
predict_linear(node_filesystem_free{job="kubernetes-node-exporter"}[1h], 4 * 3600)
Histogram (a query sketch follows this list)
Summary
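Unlike Counter and Gauge above, Histogram has no sample query in the original, so here is a hedged one: it computes the 95th-percentile request latency from a histogram's buckets. The metric name http_request_duration_seconds is a conventional example, not one exported by this setup:

```
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
```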
Environment
Ubuntu 16.04 LTS
Docker 1.10+
Kubernetes v1.9.6
Node_exporter v0.15+
Prometheus v2.0.0
Grafana 5.1+
Installing Prometheus
Create a namespace:
```bash
kubectl create namespace kube-ops
```
Create the node exporter:
```bash
kubectl create -f node-exporter.yaml
```
node-exporter.yaml
```yaml
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-ops
  labels:
    k8s-app: node-exporter
spec:
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      containers:
      - image: prom/node-exporter
        name: node-exporter
        ports:
        - containerPort: 9100
          protocol: TCP
          name: http
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: kube-ops
spec:
  ports:
  - name: http
    port: 9100
    nodePort: 31672
    protocol: TCP
  type: NodePort
  selector:
    k8s-app: node-exporter
```
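A quick check that the DaemonSet is running and that the fixed NodePort from the manifest answers; the node IP below is a placeholder for any node in your cluster:

```bash
kubectl -n kube-ops get daemonset node-exporter
# The exporter should answer on the NodePort declared above
curl -s http://<node-ip>:31672/metrics | head
```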
Create the ServiceAccount, ClusterRole, and ClusterRoleBinding:
```bash
kubectl create -f prometheus-sa.yaml
```
prometheus-sa.yaml
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-ops
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
  namespace: kube-ops
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
  namespace: kube-ops
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-ops
```
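To sanity-check that the binding took effect, kubectl can evaluate permissions as the new ServiceAccount; the names below match the manifest above:

```bash
# Both should print "yes" once the ClusterRoleBinding is in place
kubectl auth can-i list nodes --as=system:serviceaccount:kube-ops:prometheus
kubectl auth can-i watch pods --as=system:serviceaccount:kube-ops:prometheus
```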
Create a ConfigMap:
```bash
kubectl create -f prometheus-cm.yaml
```
prometheus-cm.yaml
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-ops
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
      scrape_timeout: 30s
    rule_files:
    - /etc/prometheus/rules.yml
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["localhost:9093"]
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-nodes'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
    - job_name: 'kubernetes-cadvisor'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-node-exporter'
      scheme: http
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - source_labels: [__meta_kubernetes_role]
        action: replace
        target_label: kubernetes_role
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:31672'
        target_label: __address__
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
    - job_name: 'kubernetes-services'
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name
  rules.yml: |
    groups:
    - name: test-rule
      rules:
      - alert: NodeFilesystemUsage
        expr: (node_filesystem_size{device="rootfs"} - node_filesystem_free{device="rootfs"}) / node_filesystem_size{device="rootfs"} * 100 > 80
        for: 2m
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}}: High Filesystem usage detected"
          description: "{{$labels.instance}}: Filesystem usage is above 80% (current value is: {{ $value }})"
      - alert: NodeMemoryUsage
        expr: (node_memory_MemTotal - (node_memory_MemFree + node_memory_Buffers + node_memory_Cached)) / node_memory_MemTotal * 100 > 80
        for: 2m
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}}: High Memory usage detected"
          description: "{{$labels.instance}}: Memory usage is above 80% (current value is: {{ $value }})"
      - alert: NodeCPUUsage
        expr: (100 - (avg by (instance) (irate(node_cpu{job="kubernetes-node-exporter",mode="idle"}[5m])) * 100)) > 80
        for: 2m
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}}: High CPU usage detected"
          description: "{{$labels.instance}}: CPU usage is above 80% (current value is: {{ $value }})"
      - alert: test
        expr: (100 - (avg by (instance) (irate(node_cpu_seconds_total{job="kubernetes-node-exporter",mode="idle"}[5m])) * 100)) > 1
        for: 2m
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}}: High CPU usage detected"
          description: "{{$labels.instance}}: CPU usage is above 1% (current value is: {{ $value }})"
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: kube-ops
data:
  config.yml: |-
    global:
      resolve_timeout: 5m
    route:
      receiver: webhook
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      group_by: [alertname]
      routes:
      - receiver: webhook
        group_wait: 10s
        match:
          team: node
    receivers:
    - name: webhook
      webhook_configs:
      - url: 'http://apollo/hooks/dingtalk/'
        send_resolved: true
      - url: 'http://apollo/hooks/prome/'
        send_resolved: true
```
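With a configuration this long, it is worth validating it before deploying. A minimal sketch, assuming you have saved prometheus.yml and rules.yml from the ConfigMap into the current directory and have promtool from a Prometheus 2.x release on your PATH:

```bash
# Validate the scrape configuration and the rule file it references
promtool check config prometheus.yml
promtool check rules rules.yml
```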
Create the Prometheus Deployment:
```bash
kubectl create -f prometheus-deploy.yaml
```
prometheus-deploy.yaml
```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    k8s-app: prometheus
  name: prometheus
  namespace: kube-ops
spec:
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - image: prom/prometheus:v2.0.0-rc.3
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=24h"
        ports:
        - containerPort: 9090
          protocol: TCP
          name: http
        volumeMounts:
        - mountPath: "/prometheus"
          name: data
        - mountPath: "/etc/prometheus"
          name: config-volume
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 200m
            memory: 1Gi
      - image: quay.io/prometheus/alertmanager:v0.12.0
        name: alertmanager
        args:
        - "-config.file=/etc/alertmanager/config.yml"
        - "-storage.path=/alertmanager"
        ports:
        - containerPort: 9093
          protocol: TCP
          name: http
        volumeMounts:
        - name: alertmanager-config-volume
          mountPath: /etc/alertmanager
        resources:
          requests:
            cpu: 50m
            memory: 50Mi
          limits:
            cpu: 200m
            memory: 200Mi
      volumes:
      - name: data
        emptyDir: {}
      - configMap:
          name: prometheus-config
        name: config-volume
      - name: alertmanager-config-volume
        configMap:
          name: alertmanager
```
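After editing the prometheus-config ConfigMap later on, the running Prometheus must re-read its files. A hedged sketch: the pod name is a placeholder, and depending on the Prometheus version the reload endpoint may only be available when the server was started with --web.enable-lifecycle:

```bash
# Forward the Prometheus port locally, then ask it to reload its config
kubectl -n kube-ops port-forward <prometheus-pod> 9090 &
curl -X POST http://localhost:9090/-/reload
```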
Expose Prometheus as a Service:
```bash
kubectl create -f prometheus-svc.yaml
```
prometheus-svc.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: kube-ops
  labels:
    k8s-app: prometheus
spec:
  selector:
    k8s-app: prometheus
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: http
```
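Because the Service requests type NodePort without pinning a port, Kubernetes assigns one; look it up and browse to it on any node (the node IP is a placeholder):

```bash
kubectl -n kube-ops get svc prometheus
# Then open http://<node-ip>:<assigned-nodeport> in a browser
```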
Create the Grafana Deployment:
```bash
kubectl create -f grafana-deployment.yaml
```
grafana-deployment.yaml
```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: grafana
  namespace: kube-ops
  labels:
    app: grafana
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - image: grafana/grafana:latest
        securityContext:
          runAsUser: 0
        name: grafana
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 100Mi
        env:
        - name: GF_AUTH_BASIC_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "false"
        readinessProbe:
          httpGet:
            path: /login
            port: 3000
        volumeMounts:
        - name: grafana-persistent-storage
          mountPath: /var/lib/grafana
      volumes:
      - name: grafana-persistent-storage
        emptyDir: {}
```
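Before exposing the UI, confirm the rollout completed; the manifest's /login readiness probe gates this:

```bash
kubectl -n kube-ops rollout status deployment grafana
```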
Expose the Grafana web UI:
```bash
kubectl create -f grafana-svc.yaml
```
grafana-svc.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: kube-ops
  labels:
    app: grafana
spec:
  type: NodePort
  ports:
  - port: 3000
  selector:
    app: grafana
```
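As with Prometheus, look up the assigned NodePort to reach the UI; Grafana's default credentials are admin/admin unless overridden:

```bash
kubectl -n kube-ops get svc grafana
# Then open http://<node-ip>:<assigned-nodeport> and log in
```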
Prometheus UI
Browse to the Prometheus NodePort found above and check the Targets and Alerts pages to confirm the scrape jobs and rules loaded.
Grafana Setup
In Grafana, add a Prometheus data source pointing at the in-cluster service, then build or import dashboards against it; a scripted sketch follows.
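A hedged, scripted alternative to clicking through the UI, using Grafana's data source HTTP API; the credentials, NodePort placeholder, and in-cluster Prometheus URL are assumptions derived from the manifests above:

```bash
# Register the in-cluster Prometheus Service as a Grafana data source
curl -u admin:admin -H "Content-Type: application/json" \
  -X POST http://<node-ip>:<grafana-nodeport>/api/datasources \
  -d '{"name":"prometheus","type":"prometheus","url":"http://prometheus.kube-ops.svc:9090","access":"proxy"}'
```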