
Cilium Series, Part 4: Installing a k8s Cluster with Cilium Integrated with kubeOVN

Prerequisites

  • helm
  • the sealos binary CLI
  • the kubeOVN installation script install.sh
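
If you do not already have these, a minimal sketch of fetching them (the download URLs, the sealos version, and the kube-ovn release branch below are assumptions; adjust to your environment):

bash
# Helm via its official install script
$ curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# sealos binary from the labring/sealos GitHub releases (version is an assumption)
$ wget https://github.com/labring/sealos/releases/download/v5.0.0/sealos_5.0.0_linux_amd64.tar.gz
$ tar -xzf sealos_5.0.0_linux_amd64.tar.gz sealos && mv sealos /usr/local/bin/

# kubeOVN install.sh (release branch is an assumption)
$ wget https://raw.githubusercontent.com/kubeovn/kube-ovn/release-1.12/dist/images/install.sh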

The nodes are planned as follows:

Role     Node IP            Cluster Version
master   192.168.239.35     v1.29.0
worker   192.168.239.167    v1.29.0
worker   192.168.239.98     v1.29.0

Installing the k8s cluster with sealos

Because we will install Cilium later and have it replace kube-proxy, first generate the sealos Clusterfile and then modify it:

bash
$ sealos gen labring/kubernetes:v1.29.0 --masters=192.168.239.35 --nodes=192.168.239.167,192.168.239.98 -p "xxx" --output Clusterfile

In the generated Clusterfile, add skip-phases to the InitConfiguration:

InitConfiguration
yaml
BootstrapTokens: null
CertificateKey: ""
LocalAPIEndpoint:
  AdvertiseAddress: 192.168.239.35
  BindPort: 6443
NodeRegistration:
  CRISocket: /run/containerd/containerd.sock
  IgnorePreflightErrors: null
  KubeletExtraArgs: null
  Name: ""
  Taints: null
Patches: null
SkipPhases:
  - addon/kube-proxy
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration

And use the Alibaba Cloud image registry:

ClusterConfiguration
yaml
APIServer:
  CertSANs:
  - 127.0.0.1
  - apiserver.cluster.local
  - 10.103.97.2
  - 192.168.239.35
  ExtraArgs:
    audit-log-format: json
    audit-log-maxage: "7"
    audit-log-maxbackup: "10"
    audit-log-maxsize: "100"
    audit-log-path: /var/log/kubernetes/audit.log
    audit-policy-file: /etc/kubernetes/audit-policy.yml
    enable-aggregator-routing: "true"
    feature-gates: ""
  ExtraVolumes:
  - HostPath: /etc/kubernetes
    MountPath: /etc/kubernetes
    Name: audit
    PathType: DirectoryOrCreate
    ReadOnly: false
  - HostPath: /var/log/kubernetes
    MountPath: /var/log/kubernetes
    Name: audit-log
    PathType: DirectoryOrCreate
    ReadOnly: false
  - HostPath: /etc/localtime
    MountPath: /etc/localtime
    Name: localtime
    PathType: File
    ReadOnly: true
  TimeoutForControlPlane: null
CIImageRepository: ""
CIKubernetesVersion: ""
CertificatesDir: ""
ClusterName: ""
ComponentConfigs: null
ControlPlaneEndpoint: apiserver.cluster.local:6443
ControllerManager:
  ExtraArgs:
    bind-address: 0.0.0.0
    cluster-signing-duration: 876000h
    feature-gates: ""
  ExtraVolumes:
  - HostPath: /etc/localtime
    MountPath: /etc/localtime
    Name: localtime
    PathType: File
    ReadOnly: true
DNS:
  ImageRepository: ""
  ImageTag: ""
  Type: ""
Etcd:
  External: null
  Local:
    DataDir: ""
    ExtraArgs:
      listen-metrics-urls: http://0.0.0.0:2381
    ImageRepository: ""
    ImageTag: ""
    PeerCertSANs: null
    ServerCertSANs: null
FeatureGates: null
ImageRepository: ""
ImageRepository: "registry.aliyuncs.com/google_containers"
KubernetesVersion: v1.29.0
Networking:
  DNSDomain: ""
  PodSubnet: 100.64.0.0/10
  ServiceSubnet: 10.96.0.0/22
Scheduler:
  ExtraArgs:
    bind-address: 0.0.0.0
    feature-gates: ""
  ExtraVolumes:
  - HostPath: /etc/localtime
    MountPath: /etc/localtime
    Name: localtime
    PathType: File
    ReadOnly: true
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
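
With the Clusterfile edited, create the cluster from it. A minimal sketch, assuming the usual sealos workflow of applying the edited Clusterfile:

bash
$ sealos apply -f Clusterfile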

Installing Cilium

Once the k8s cluster installation has succeeded (coredns will still be pending at this point, since there is no CNI yet), install Cilium first:

bash
$ helm install cilium cilium/cilium --namespace kube-system \
--set kubeProxyReplacement=strict \
--set k8sServiceHost=192.168.239.35 \
--set k8sServicePort=6443 \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set prometheus.enabled=true \
--set operator.prometheus.enabled=true \
--set gatewayAPI.enabled=true \
--set-string extraConfig.enable-envoy-config=true \
--set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,http}"
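
To confirm the agent is healthy and kube-proxy replacement is active, one option (a sketch, assuming the standard k8s-app=cilium label on the DaemonSet and that the agent image ships the in-pod cilium CLI):

bash
$ kubectl -n kube-system get pods -l k8s-app=cilium
$ kubectl -n kube-system exec ds/cilium -- cilium status | grep -i kubeproxyreplacement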

Because Cilium must take priority as the CNI, the kubeOVN install.sh script installed afterwards needs a few adjustments:

shell
# Disable kubeOVN NetworkPolicy (Cilium handles network policy)
ENABLE_NP=${ENABLE_NP:-false}

# Raise the CNI config file number so Cilium's config sorts first
CNI_CONFIG_PRIORITY=${CNI_CONFIG_PRIORITY:-10}

# Enable LoadBalancer-type Services
ENABLE_LB_SVC=${ENABLE_LB_SVC:-true}
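
Because install.sh declares these variables with ${VAR:-default} expansion, one way to apply the adjustments without editing the script (sketched here) is to override them via environment variables when running it:

bash
$ ENABLE_NP=false CNI_CONFIG_PRIORITY=10 ENABLE_LB_SVC=true bash install.sh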

Integrating Cilium with kubeOVN

To integrate Cilium with kubeOVN we use CNI chaining mode, so first create the chaining CNI configuration:

bash
$ kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cni-configuration
  namespace: kube-system
data:
  cni-config: |-
    {
      "name": "generic-veth",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "kube-ovn",
          "server_socket": "/run/openvswitch/kube-ovn-daemon.sock",
          "ipam": {
              "type": "kube-ovn",
              "server_socket": "/run/openvswitch/kube-ovn-daemon.sock"
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        },
        {
          "type": "cilium-cni"
        }
      ]
    }
EOF

Then upgrade Cilium to enable chaining:

bash
$ helm upgrade cilium cilium/cilium --namespace kube-system \
--reuse-values \
--set autoDirectNodeRoutes=true \
--set ipv4NativeRoutingCIDR=10.16.0.0/16 \
--set loadBalancer.acceleration=native \
--set cni.chainingMode=generic-veth \
--set cni.customConf=true \
--set cni.configMap=cni-configuration \
--set enableIPv4Masquerade=false \
--set enableIdentityMark=false \
--set routingMode=native
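
For the new chaining configuration to take effect, the Cilium agents typically need to be restarted; a sketch:

bash
$ kubectl -n kube-system rollout restart ds/cilium
$ kubectl -n kube-system rollout status ds/cilium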

Troubleshooting

hubble relay failure

After the upgrade, if Hubble is not in a healthy state, the logs show:

bash
$ kubectl logs -f  hubble-relay-67ffc5f588-qr8nt -n kube-system
 level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"

The coredns configuration in this cluster has no server block resolving cluster.local, so we simply add one manually:

yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cili-cluster.tclocal in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
    cluster.local.:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
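
After updating the ConfigMap, restart coredns and check that hubble-relay recovers; a sketch (the Corefile also enables the reload plugin, so coredns would eventually pick up the change on its own):

bash
$ kubectl -n kube-system rollout restart deploy/coredns
$ kubectl -n kube-system logs -f deploy/hubble-relay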