This post documents the process of deploying a test cluster on my own workstation. Reference documentation

Environment

  • HP ML350 workstation, running Proxmox VE 8.2
  • MacBook Pro as the local work machine

Build an Ubuntu VM template

Basic preparation

Ubuntu 24.04 Server amd64 is used, and the ubuntu user has already been added to the sudo group.

# Update the packages first
sudo apt -y update && sudo apt -y dist-upgrade

Install the container engine

# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

sudo apt install containerd.io
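
A quick, optional check that the containerd service came up after installation:

sudo systemctl status containerd --no-pager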

Configuration

/etc/containerd/config.toml

The packaged defaults ship with the CRI plugin disabled; the two settings below are the ones that matter, and the cluster section further down shows exactly where to change them:

disabled_plugins = ["cri"]
SystemdCgroup = true

Disable swap

sudo swapoff -a
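
To keep swap off across reboots, the swap entry in /etc/fstab also needs to be commented out. A minimal sketch, assuming the default fstab layout (it keeps a backup of the file first):

sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab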

Enable packet forwarding

sudo tee /etc/sysctl.d/kubernetes.conf<<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF


sudo sysctl --system

Configure kernel modules

sudo tee /etc/modules-load.d/k8s.conf <<EOF
br_netfilter
EOF

sudo modprobe br_netfilter
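
An optional, quick check that the module is loaded and the sysctl values are active:

lsmod | grep br_netfilter
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward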

Install k8s

sudo apt-get install -y apt-transport-https ca-certificates curl

curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
sudo chmod 644 /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo chmod 644 /etc/apt/sources.list.d/kubernetes.list
sudo apt update
sudo apt install -y kubelet kubeadm kubectl
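
As in the kubeadm section later on, it may also be worth pinning these packages in the template so an upgrade doesn't move them unexpectedly:

sudo apt-mark hold kubelet kubeadm kubectl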

Set up the cluster

Clone the Ubuntu template built above into 3 VMs.

Container runtime

If you installed containerd from a package (RPM or .deb, for example), you may find that the CRI integration plugin is disabled by default. CRI support must be enabled for containerd to work with a Kubernetes cluster. Make sure cri does not appear in the disabled_plugins list in /etc/containerd/config.toml, and restart containerd if you change that file. Reference documentation

  • Reset the containerd configuration

    containerd config default | sudo tee /etc/containerd/config.toml > /dev/null
  • Configure the systemd cgroup driver

    # Edit /etc/containerd/config.toml
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
    ...
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
  • Override the sandbox (pause) image. I'm on kubeadm 1.30, which needs the default sandbox image changed, otherwise it prints a warning.

    # Edit /etc/containerd/config.toml
    [plugins."io.containerd.grpc.v1.cri"]
    sandbox_image = "registry.k8s.io/pause:3.9"
  • Once the config file has been updated, containerd likely needs to be restarted again:

    sudo systemctl restart containerd
  • GFW (add a proxy so images can still be pulled)

    #1 Edit /etc/environment
    PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
    HTTP_PROXY=http://172.17.90.17:7890
    HTTPS_PROXY=http://172.17.90.17:7890
    NO_PROXY=localhost,127.0.0.1,.cluster.local,10.244.0.0/16,10.96.0.0/12,172.17.0.0/16,172.16.0.0/16,172.18.0.0/16

    #2 Add the proxy to the containerd service
    sudo mkdir -p /etc/systemd/system/containerd.service.d/
    sudo tee /etc/systemd/system/containerd.service.d/proxy.conf > /dev/null <<EOF
    [Service]
    EnvironmentFile=/etc/environment
    EOF
    # Add the proxy to the sandbox image service
    sudo mkdir -p /etc/systemd/system/sandbox-image.service.d
    sudo tee /etc/systemd/system/sandbox-image.service.d/proxy.conf > /dev/null <<EOF
    [Service]
    EnvironmentFile=/etc/environment
    EOF

    #3 Restart the services
    sudo systemctl daemon-reload
    sudo systemctl restart containerd.service
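
    # Optional: confirm the drop-in was picked up by systemd
    systemctl cat containerd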

Initialize the control plane with kubeadm

sudo kubeadm init --control-plane-endpoint=172.17.0.220 --pod-network-cidr=10.244.0.0/16

...

# The output looks like this:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:

kubeadm join 172.17.0.220:6443 --token cp7q2f.xxxx \
--discovery-token-ca-cert-hash sha256:xxxxxxxx \
--control-plane

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 172.17.0.220:6443 --token cp7q2f.xxxx \
--discovery-token-ca-cert-hash sha256:xxxxxxxx

Save the token and the certificate hash above; they're needed later when joining nodes.
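
After copying admin.conf as instructed, a quick sanity check (the control-plane node stays NotReady until a network add-on is installed):

kubectl get nodes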

Install the Calico network add-on

kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/tigera-operator.yaml


# Optional: install calicoctl (as a kubectl plugin)
curl -L https://github.com/projectcalico/calico/releases/download/v3.25.0/calicoctl-linux-amd64 -o kubectl-calico
chmod +x kubectl-calico
sudo mv kubectl-calico /usr/local/bin/
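
Note that the tigera-operator manifest by itself only installs the operator; a Calico installation normally also needs the custom-resources manifest applied, with its pod CIDR adjusted to match the --pod-network-cidr used above. A sketch (check the Calico docs for your exact version):

curl -LO https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/custom-resources.yaml
# edit spec.calicoNetwork.ipPools[0].cidr to 10.244.0.0/16 before applying
kubectl create -f custom-resources.yaml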

Join the nodes

kubeadm join --token <token> <control-plane-host>:<control-plane-port> --discovery-token-ca-cert-hash sha256:<hash>
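
If the token from kubeadm init has expired (they are valid for 24 hours by default), a fresh join command can be generated on the control plane:

sudo kubeadm token create --print-join-command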


The previous post, 《搭建Kubernetes集群》, ended up using kubekey + podman because a newer Docker release misbehaved and broke the default route. Reinstalling an older Docker Desktop (v4.23.0 in my case) fixed that, so I wanted to try the Kubernetes that ships with Docker.

Install k8s

2024-01-25T142537 Go to Settings > Kubernetes, tick the box, then click Apply & restart.
2024-01-25T142713 When this turns green it's installed. Yes, it's really that simple.

Install the dashboard

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml

2024-01-25T142949 This is what it looks like after installation.

Generate a token

admin-user.yaml

# https://github.com/kubernetes/dashboard/blob/master/docs/user/access-control/creating-sample-user.md
# After applying, run: kubectl -n kubernetes-dashboard create token admin-user
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
---
# kubectl get secret admin-user -n kubernetes-dashboard -o jsonpath={".data.token"} | base64 -d
apiVersion: v1
kind: Secret
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
  annotations:
    kubernetes.io/service-account.name: "admin-user"
type: kubernetes.io/service-account-token

Apply the manifest above, then fetch the login token with: kubectl get secret admin-user -n kubernetes-dashboard -o jsonpath={".data.token"} | base64 -d

Access the dashboard

kubectl patch svc kubernetes-dashboard -n kubernetes-dashboard -p '{"spec": {"type": "NodePort"}}'

This is one of the nice things about Docker Desktop's Kubernetes: no ingress or LoadBalancer is needed, just bind a NodePort and the service is reachable locally.
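
To see which port was assigned:

kubectl get svc kubernetes-dashboard -n kubernetes-dashboard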

2024-01-25T144518

https://localhost:30324 — this is the dashboard's address.

2024-01-25T144612 Enter the token obtained earlier and you're logged in.

"The new Docker Desktop adds a default route that breaks the host's internet access" — that earlier post described the problem with the newer versions, which is why I switched to podman; podman honestly isn't that pleasant to use, so I dug into it some more.

The problem with the newer releases (the 4.24.x line) is that after Docker starts it adds an extra default route that sends traffic to the bridge100 interface, breaking the host's internet access. I found an issue online, "Strange network routes after upgrading to 4.24.x", describing exactly the same problem (on Linux in that case). The fix there was simple:

Downgrading again to version 4.23.0 completely solves that issue - routes are gone.

So I tried it: installing Docker Desktop 4.23 does indeed work.

Test environment

I created 3 VMs, k[1:3], on the company's internal PVE. The VM OS is Ubuntu 22.04.3 with docker-ce installed, so containerd on these machines is the build shipped by Docker.

# Remember to turn swap off
sudo swapoff -a

# Add yourself (the ubuntu user) to the docker group so docker doesn't need sudo. newgrp docker switches the group in place; logging out and back in works too.
sudo usermod -aG docker $USER && newgrp docker

kubeadm

This thing is genuinely complicated. Following the official tutorial I did get a cluster up, but because I didn't install a network add-on (CNI) when I brought it up, I spent two days fighting all sorts of problems and gave up. Here are the steps anyway:

# https://kubernetes.io/zh-cn/docs/setup/production-environment/container-runtimes/#containerd
# The Docker apt repo has to be added first: https://docs.docker.com/engine/install/ubuntu/
sudo apt install -y containerd.io

# After installing containerd.io, comment out the following line in /etc/containerd/config.toml:
# disabled_plugins = ["cri"]
# then restart the service

# Install the kube trio
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.29/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.29/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt update
sudo apt install kubeadm kubelet kubectl
# Pin the package versions; use unhold to release them later
sudo apt-mark hold kubelet kubeadm kubectl


# Bring up the cluster
sudo kubeadm init --pod-network-cidr=10.244.0.0/18 --service-cidr=10.244.64.0/18

# Output like the following means it worked

# Your Kubernetes control-plane has initialized successfully!
# To start using your cluster, you need to run the following as a regular user:
# mkdir -p $HOME/.kube
# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Alternatively, if you are the root user, you can run:
# export KUBECONFIG=/etc/kubernetes/admin.conf
# You should now deploy a pod network to the cluster.
# Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
# https://kubernetes.io/docs/concepts/cluster-administration/addons/
# Then you can join any number of worker nodes by running the following on each as root:
# kubeadm join 172.17.0.220:6443 --token 22tkpm.0eetw95la7mtv6qm \
# --discovery-token-ca-cert-hash sha256:****************************************************************

# To add nodes, run the kubeadm join command from the last lines of the output above

Good luck~

Looking for another way

After realizing I couldn't hand-roll it, I went looking for other tools. kops and rancher both turned out to be cloud-oriented, which doesn't match my goal of local bare metal, so I looked at kubekey, one of the earliest tools from the kubesphere project.

kubekey

This one has its pitfalls too, but it's far better than raw kubeadm. kubekey can be used to deploy only Kubernetes, without installing kubesphere; following the article "How to Install Kubernetes the Easy Way Using KubeKey", I did get a cluster running.

Small pitfalls

Kubernetes version issues. Here's my config; I arrived at this version by working my way down from 1.29.0, because the registry used is maintained by the kubesphere team rather than registry.k8s.io, and that causes trouble. Switching the registry to Google's might work, but I don't have the energy to keep digging; if anyone tries it, please leave me a comment.

apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: node1, address: 172.17.0.220, internalAddress: 172.17.0.220, user: ubuntu, privateKeyPath: "~/.ssh/id_rsa"}
  - {name: node2, address: 172.17.0.221, internalAddress: 172.17.0.221, user: ubuntu, privateKeyPath: "~/.ssh/id_rsa"}
  - {name: node3, address: 172.17.0.222, internalAddress: 172.17.0.222, user: ubuntu, privateKeyPath: "~/.ssh/id_rsa"}
  roleGroups:
    etcd:
    - node1
    control-plane:
    - node1
    worker:
    - node1
    - node2
    - node3
  controlPlaneEndpoint:
    domain: lb.kubesphere.local
    address: ""
    port: 6443
  kubernetes:
    version: v1.22.17
    clusterName: cluster.local
    autoRenewCerts: true
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.244.64.0/18
    kubeServiceCIDR: 10.244.0.0/18
    multusCNI:
      enabled: false
  registry:
    privateRegistry: ""
    namespaceOverride: ""
    registryMirrors: []
    insecureRegistries: []
  addons: []

Extras

I ran kk create cluster from my own machine. You could just as well pick one of the nodes as the controller and bring the cluster up from there, but kk is essentially a remote playbook (it works like ansible and only needs ssh access). Once the cluster is up, pull its admin kubeconfig down and merge it with the local one:

# k1 is the control-plane, see my config above
scp k1:~/.kube/config ./my-cluster
KUBECONFIG=~/.kube/config:./my-cluster kubectl config view --flatten > new-kubeconfig
mv new-kubeconfig ~/.kube/config

List all reachable clusters

$ kubectl config get-contexts
CURRENT   NAME                             CLUSTER             AUTHINFO            NAMESPACE
          kind-kind-cluster                kind-kind-cluster   kind-kind-cluster
*         kubernetes-admin@cluster.local   cluster.local       kubernetes-admin

Switch context

$ kubectl config use-context kubernetes-admin@cluster.local

I needed to load-test a gRPC service. The partner organization has a dedicated load-testing team, so I simply adopted their stack.

Preparation

  • Install locust with pip
  • Install grpc-interceptor with pip (see the command below)
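
For example:

pip install locust grpc-interceptor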

A little coding

Create a new Python project.

grpc_user.py

import time
from typing import Any, Callable
import grpc
import grpc.experimental.gevent as grpc_gevent
from grpc_interceptor import ClientInterceptor
from locust import User
from locust.exception import LocustError

# patch grpc so that it uses gevent instead of asyncio
grpc_gevent.init_gevent()


class LocustInterceptor(ClientInterceptor):
    def __init__(self, environment, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.env = environment

    def intercept(
        self,
        method: Callable,
        request_or_iterator: Any,
        call_details: grpc.ClientCallDetails,
    ):
        response = None
        exception = None
        start_perf_counter = time.perf_counter()
        response_length = 0
        try:
            response = method(request_or_iterator, call_details)
            response_length = response.result().ByteSize()
        except grpc.RpcError as e:
            exception = e

        self.env.events.request.fire(
            request_type="grpc",
            name=call_details.method,
            response_time=(time.perf_counter() - start_perf_counter) * 1000,
            response_length=response_length,
            response=response,
            context=None,
            exception=exception,
        )
        return response


class GrpcUser(User):
    abstract = True
    stub_class = None

    def __init__(self, environment):
        super().__init__(environment)
        for attr_value, attr_name in ((self.host, "host"), (self.stub_class, "stub_class")):
            if attr_value is None:
                raise LocustError(f"You must specify the {attr_name}.")

        self._channel = grpc.insecure_channel(self.host)
        interceptor = LocustInterceptor(environment=environment)
        self._channel = grpc.intercept_channel(self._channel, interceptor)

        self.stub = self.stub_class(self._channel)

locustfile.py

from locust import task, constant

import AccountService_pb2
import AccountService_pb2_grpc
import grpc_user


class GrpcUser(grpc_user.GrpcUser):
    wait_time = constant(0)
    # wait_time = constant_throughput(2)
    host = '172.18.14.130:50051'
    stub_class = AccountService_pb2_grpc.AccountStub

    @task
    def updateCharacter(self):
        input_data = AccountService_pb2.characterRequest(user_id='123', server='test.tooqing.com:6100',
                                                         character_id='123', nickname='seven', avatar='1',
                                                         level='99')
        response = self.stub.updateCharacter(input_data)
        return response
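
The AccountService_pb2 modules imported above are generated from the service's .proto definition. A sketch using grpcio-tools (the file name AccountService.proto is an assumption):

python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. AccountService.proto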

Run

python -m locust

Set the number of users and the spawn rate in the web UI and you're good to go. 2023-11-04T105021

Pasting text from the clipboard into VS Code's terminal pastes it twice.

The cause: I had set -a in my .zprofile. The fix is simple, just add set +a at the end of the file.
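
Roughly what the file ends up looking like after the fix (the variables in between are placeholders):

# ~/.zprofile
set -a
# ... variables that should be exported ...
set +a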

My local environment is a Mac.

I'm using the tungstenite-rs crate for WebSocket support, and noticed that the length of outgoing packets didn't match the length I expected to have written.

I found this passage in the docs (bincode's Configuration, which handles the serialization) — original text:

pub const fn with_fixed_int_encoding(self) -> Configuration<E, Fixint, A, L>

Fixed-size integer encoding.

  • Fixed size integers are encoded directly
  • Enum discriminants are encoded as u32
  • Lengths and usize are encoded as u64

After adding it where the config is constructed in my code, the problem was solved:

let config = config::standard()
.with_fixed_int_encoding();