Chapter 6: Grafana

Introduction to Grafana

Installation and Deployment

cat > grafana.yml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: prom
spec:
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      volumes:
      - name: storage
        hostPath:
          path: /data/k8s/grafana/
      nodeSelector:
        kubernetes.io/hostname: node2
      securityContext:
        runAsUser: 0
      containers:
      - name: grafana
        image: grafana/grafana:7.4.3
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          name: grafana
        env:
        - name: GF_SECURITY_ADMIN_USER
          value: admin
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: admin
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        resources:
          limits:
            cpu: 150m
            memory: 512Mi
          requests:
            cpu: 150m
            memory: 512Mi
        volumeMounts:
        - mountPath: /var/lib/grafana
          name: storage
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: prom
spec:
  ports:
  - port: 3000
  selector:
    app: grafana
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: prom
  labels:
    app: grafana
spec:
  ingressClassName: nginx
  rules:
  - host: grafana.k8s.com
    http:
      paths:
      - path: /
        pathType: ImplementationSpecific
        backend:
          service:
            name: grafana
            port:
              number: 3000
EOF

Apply the resource manifest:

[root@node1 prom]# kubectl apply -f grafana.yml
deployment.apps/grafana created
service/grafana created
ingress.networking.k8s.io/grafana created
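
Before opening the browser, it can help to confirm that the rollout actually finished. The following is a minimal sketch, assuming the names used in grafana.yml (namespace prom, Deployment grafana, pod label app=grafana); the function name check_grafana is hypothetical.

```shell
# check_grafana is a hypothetical helper name; the namespace, Deployment name
# and pod label are the ones used in grafana.yml.
check_grafana() {
  local ns="${1:-prom}"
  # Block until the Deployment reports its replicas as ready (or time out).
  kubectl -n "$ns" rollout status deployment/grafana --timeout=180s || return 1
  # Show the pod selected by the Deployment and the node it landed on.
  kubectl -n "$ns" get pods -l app=grafana -o wide
  # Show the Service and Ingress created by the same manifest.
  kubectl -n "$ns" get service/grafana ingress/grafana
}
```

Running check_grafana with no argument checks the prom namespace. Because the readiness probe waits 60 seconds before its first check, the rollout can take a little over a minute to report success.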

Access test:

1726449161335-c499099d-e701-4feb-9687-d4adc9c05f65.png

Adding a Data Source

1726449161342-f84f9dfa-5a99-486c-af59-01933f00702d.png

1726449161386-aa0ab3e1-5fb8-4aab-a7a2-2e83a4d32ea1.png

1726449161393-09d04c89-eeaa-4f4e-8e8e-7327d31ba15c.png
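
The same step can also be scripted through Grafana's HTTP API instead of the web form. This is only a sketch: the helper names are hypothetical, the Grafana address is the Ingress host from grafana.yml, and the Prometheus address http://prometheus.prom.svc:9090 is an assumption that must be replaced with the URL at which Grafana can actually reach your Prometheus.

```shell
# datasource_payload and add_prometheus_datasource are hypothetical helper names.
datasource_payload() {
  # $1: URL at which Grafana can reach Prometheus (assumed, not from the original).
  printf '{"name":"Prometheus","type":"prometheus","access":"proxy","url":"%s","isDefault":true}\n' "$1"
}

add_prometheus_datasource() {
  local grafana_url="${1:-http://grafana.k8s.com}"
  local prom_url="${2:-http://prometheus.prom.svc:9090}"  # assumed Service name and port
  # admin/admin are the credentials set through GF_SECURITY_ADMIN_* in grafana.yml.
  curl -sS -u admin:admin \
    -H 'Content-Type: application/json' \
    -X POST "$grafana_url/api/datasources" \
    -d "$(datasource_payload "$prom_url")"
}
```

Calling add_prometheus_datasource with no arguments uses the two defaults above; pass the Grafana URL and the Prometheus URL explicitly if yours differ.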

Installing Plugins

Grafana has a rich plugin ecosystem. Here we use DevOpsProdigy KubeGraf, a plugin built specifically for monitoring Kubernetes clusters. The project is hosted at:

https://github.com/devopsprodigy/kubegraf/
https://github.com/devopsprodigy/kubegraf-v2

To install the plugin, open a shell inside the Grafana pod and run grafana-cli there:

[root@node1 prom]# kubectl -n prom exec -it grafana-7f5b7455fc-z6ctx -- /bin/bash
bash-5.0# grafana-cli plugins install devopsprodigy-kubegraf-app
installing devopsprodigy-kubegraf-app @ 1.5.2
from: https://grafana.com/api/plugins/devopsprodigy-kubegraf-app/versions/1.5.2/download
into: /var/lib/grafana/plugins

✔ Installed devopsprodigy-kubegraf-app successfully
installing grafana-piechart-panel @ 1.6.2
from: https://grafana.com/api/plugins/grafana-piechart-panel/versions/1.6.2/download
into: /var/lib/grafana/plugins

✔ Installed grafana-piechart-panel successfully
Installed dependency: grafana-piechart-panel ✔

Restart grafana after installing plugins . <service grafana-server restart>

bash-5.0#

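Grafana must be restarted before the plugin takes effect. Because /var/lib/grafana (which contains the plugins directory) is persisted on the hostPath volume, the installed plugin survives the restart, so we can simply delete the pod and let the Deployment recreate it.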

[root@node1 prom]# kubectl -n prom delete pod grafana-7f5b7455fc-z6ctx
pod "grafana-7f5b7455fc-z6ctx" deleted
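
The pod name above contains a generated suffix that will differ on every cluster. A sketch that avoids hard-coding it selects the pod through the app=grafana label from grafana.yml instead; restart_grafana is a hypothetical helper name.

```shell
# restart_grafana is a hypothetical helper name.
restart_grafana() {
  local ns="${1:-prom}"
  # Delete whichever pod currently carries the label; the Deployment recreates it.
  kubectl -n "$ns" delete pod -l app=grafana || return 1
  # Then wait until the Deployment reports its replacement pod as ready.
  kubectl -n "$ns" rollout status deployment/grafana --timeout=180s
}
```

Deleting the pod, rather than rolling out a second one alongside it, keeps only one Grafana process at a time on the shared hostPath directory.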

After the restart, enable the plugin in the Grafana web UI:

1726449161660-e3a828fc-c0ed-4131-b928-f69035d5a507.png

1726449161743-b65be14c-6a39-4df6-8e23-74a9f5cad830.png

1726449161782-1aefc5e0-d77c-4206-94b8-2c1cf5763364.png

The plugin needs credentials to authenticate against the cluster. We take them from the kubeconfig file that kubectl uses:

[root@node1 prom]# cat ~/.kube/config
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: # value for the "CA Cert" field
..............
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: # value for the "Client Cert" field
    client-key-data: # value for the "Client Key" field
..............

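The values in the kubeconfig file are base64-encoded, so they must be decoded before they are pasted into the plugin form. The completed configuration looks like this: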

1726449161790-e01ffb96-d776-4544-b425-641f5c523de9.png

1726449161812-3c7296b2-3d0b-46ff-a38d-2595f2738ca2.png
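
The decoding step can be sketched as a small shell helper. It assumes GNU coreutils base64 and a kubeconfig in which each of the three fields appears on a single line, as in the listing above; kubeconfig_field is a hypothetical name.

```shell
# kubeconfig_field is a hypothetical helper: it prints the decoded value of one
# base64-encoded field (for example client-key-data) from a kubeconfig file.
kubeconfig_field() {
  local field="$1" file="${2:-$HOME/.kube/config}"
  # Take the first "<field>: <value>" line, keep only the value, and decode it.
  awk -v key="${field}:" '$1 == key { print $2; exit }' "$file" | base64 -d
}
```

For example, kubeconfig_field certificate-authority-data prints the PEM text for the CA Cert field, and client-certificate-data and client-key-data give the Client Cert and Client Key values. If the kubeconfig defines several clusters or users, only the first match is printed, so check that it is the one you intend to use.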

After saving, the plugin icon appears in the left sidebar; click it to open the plugin's views.

1726449162169-d3ea006b-2fd2-4a53-b09e-d5f0884cf944.png

Importing a Dashboard

1726449162130-ec152fce-7053-45b5-9253-e78af8a38988.png

1726449162159-9d6d7295-70c9-4ffc-a3f7-4d2d99b912a4.png

1726449162164-faeb6d22-7c3f-4fcd-8281-617fa8f0656a.png

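Dashboards downloaded from other people often render incorrectly or show wrong data. This happens when the data source and scraped metrics that the author built the panels against differ from, or do not match, our own Prometheus deployment. It can usually be fixed by adjusting the variables used in the panel queries.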

https://grafana.com/grafana/dashboards/16098-1-node-exporter-for-prometheus-dashboard-cn-0417-job/

For example, the author of this dashboard notes that some of its metrics require recording rules to be added separately.

In the Prometheus ConfigMap:

global:
  scrape_interval: 15s
  scrape_timeout: 15s
# Added: load the rule file
rule_files:
- 'node_rules.yml'
...

# Added: the following new key

node_rules.yml: |
  groups:
  - name: node_usage_record_rules
    interval: 1m
    rules:
    - record: cpu:usage:rate1m
      expr: (1 - avg(irate(node_cpu_seconds_total{mode="idle"}[3m])) by (job,instance)) * 100
    - record: mem:usage:rate1m
      expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
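
Once the ConfigMap has been updated, Prometheus has to re-read it. One way to do this and to confirm that the rule group was loaded is sketched below. It assumes Prometheus was started with --web.enable-lifecycle (otherwise the reload endpoint is disabled) and that its address is passed in by the caller, since the original does not state it; reload_and_check_rules is a hypothetical name.

```shell
# reload_and_check_rules is a hypothetical helper name.
reload_and_check_rules() {
  local prom_url="$1"   # address of your Prometheus; not given in the original
  # Ask Prometheus to re-read its configuration (needs --web.enable-lifecycle).
  curl -sS -X POST "$prom_url/-/reload" || return 1
  # List the loaded rule groups and look for the group added above.
  curl -sS "$prom_url/api/v1/rules" | grep -o 'node_usage_record_rules' | head -n 1
}
```

If the function prints node_usage_record_rules, the group is active. An empty result may simply mean the updated ConfigMap has not yet been synced into the pod, which can take a short while after kubectl apply.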

After the configuration has been reloaded, check it in the Prometheus UI:

1726449162303-6fc5b2eb-59db-4802-a3d1-bfbcbc861f4a.png

Then update the panel queries in the dashboard:

quantile_over_time(0.99, cpu:usage:rate1m{origin_prometheus=~"$origin_prometheus",job=~"$job"}[$interval])

quantile_over_time(0.99, mem:usage:rate1m{origin_prometheus=~"$origin_prometheus",job=~"$job"}[$interval])

1726449162449-f1259a2c-600d-47ec-ae84-54702a9d26c8.png

Updated: 2024-09-21 16:14:11