[AWS] EKS Storage


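Deploying the Lab Environment

EKS is deployed across three AZs with public and private subnets.

Each subnet is tagged for load balancer deployment, and an EFS file system is deployed so that the nodes can share it.

The EFS file system is attached to each public subnet through an ENI.

An operations server is also deployed in operator-vpc, which is peered with the EKS VPC so that the two can communicate.

CloudFormation Deployment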

[VPC, VPC Peering, routing table, EFS, EC2]

## Download the template
$ curl -O https://s3.ap-northeast-2.amazonaws.com/cloudformation.cloudneta.net/K8S/myeks-3week.yaml

## Deploy
$ aws cloudformation deploy --template-file myeks-3week.yaml \
--stack-name myeks --parameter-overrides KeyName={key name} SgIngressSshCidr=$(curl -s ipinfo.io/ip)/32 --region ap-northeast-2

## Check the operations server IP
$ aws cloudformation describe-stacks --stack-name myeks --query 'Stacks[*].Outputs[*].OutputValue' --output text

## Connect to the operations server
$ myip=$(aws cloudformation describe-stacks --stack-name myeks --query 'Stacks[*].Outputs[*].OutputValue' --output text)
$ ssh -i {key pair file} ec2-user@$myip

Cluster Deployment

## Set variables before running eksctl

$ export CLUSTER_NAME=myeks
$ export VPCID=$(aws ec2 describe-vpcs --filters "Name=tag:Name,Values=$CLUSTER_NAME-VPC" --query 'Vpcs[*].VpcId' --output text)
$ echo $VPCID

$ export PubSubnet1=$(aws ec2 describe-subnets --filters Name=tag:Name,Values="$CLUSTER_NAME-Vpc1PublicSubnet1" --query "Subnets[0].[SubnetId]" --output text)
$ export PubSubnet2=$(aws ec2 describe-subnets --filters Name=tag:Name,Values="$CLUSTER_NAME-Vpc1PublicSubnet2" --query "Subnets[0].[SubnetId]" --output text)
$ export PubSubnet3=$(aws ec2 describe-subnets --filters Name=tag:Name,Values="$CLUSTER_NAME-Vpc1PublicSubnet3" --query "Subnets[0].[SubnetId]" --output text)

$ SSHKEYNAME={key pair name}
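
The heredoc that generates myeks.yaml expands these variables in place, so an empty value would silently produce a config with a blank VPC or subnet ID. A minimal sketch of a guard that could be run first; `require_vars` is a hypothetical helper, not part of eksctl or the AWS CLI:

```shell
# Hypothetical helper: report every named variable that is unset or empty.
# Returns non-zero if at least one is missing.
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call (commented out so the snippet has no side effects):
# require_vars CLUSTER_NAME VPCID PubSubnet1 PubSubnet2 PubSubnet3 SSHKEYNAME || exit 1
```

Running the check before the heredoc makes a failed `aws ec2 describe-*` lookup visible immediately instead of at cluster creation time.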

## Create the yaml file
cat << EOF > myeks.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: myeks
  region: ap-northeast-2
  version: "1.31"

iam:
  withOIDC: true # enables the IAM OIDC provider as well as IRSA for the Amazon CNI plugin

  serviceAccounts: # service accounts to create in the cluster. See IAM Service Accounts
  - metadata:
      name: aws-load-balancer-controller
      namespace: kube-system
    wellKnownPolicies:
      awsLoadBalancerController: true

vpc:
  cidr: 192.168.0.0/16
  clusterEndpoints:
    privateAccess: true # if you only want to allow private access to the cluster
    publicAccess: true # if you want to allow public access to the cluster
  id: $VPCID
  subnets:
    public:
      ap-northeast-2a:
        az: ap-northeast-2a
        cidr: 192.168.1.0/24
        id: $PubSubnet1
      ap-northeast-2b:
        az: ap-northeast-2b
        cidr: 192.168.2.0/24
        id: $PubSubnet2
      ap-northeast-2c:
        az: ap-northeast-2c
        cidr: 192.168.3.0/24
        id: $PubSubnet3

addons:
  - name: vpc-cni # no version is specified so it deploys the default version
    version: latest # auto discovers the latest available
    attachPolicyARNs: # attach IAM policies to the add-on's service account
      - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
    configurationValues: |-
      enableNetworkPolicy: "true"

  - name: kube-proxy
    version: latest

  - name: coredns
    version: latest

  - name: metrics-server
    version: latest

managedNodeGroups:
- amiFamily: AmazonLinux2023
  desiredCapacity: 3
  iam:
    withAddonPolicies:
      certManager: true # Enable cert-manager
      externalDNS: true # Enable ExternalDNS
  instanceType: t3.medium
  preBootstrapCommands:
    # install additional packages
    - "dnf install nvme-cli links tree tcpdump sysstat ipvsadm ipset bind-utils htop -y"
  labels:
    alpha.eksctl.io/cluster-name: myeks
    alpha.eksctl.io/nodegroup-name: ng1
  maxPodsPerNode: 100
  maxSize: 3
  minSize: 3
  name: ng1
  ssh:
    allow: true
    publicKeyName: $SSHKEYNAME
  tags:
    alpha.eksctl.io/nodegroup-name: ng1
    alpha.eksctl.io/nodegroup-type: managed
  volumeIOPS: 3000
  volumeSize: 120
  volumeThroughput: 125
  volumeType: gp3
EOF

Deploying EKS from the yaml file

$ eksctl create cluster -f myeks.yaml --verbose 4

Checking EKS Information

## Check cluster info
$ k cluster-info

## Switch namespace
$ kubens default

default
kube-node-lease
kube-public
kube-system

## Check contexts
$ k ctx

eksworkshop
kind-myk8s

$ k config rename-context "{cluster context name}" "{new name}"

## Check the AZ of each deployed node
$ k get node --label-columns=node.kubernetes.io/instance-type,eks.amazonaws.com/capacityType,topology.kubernetes.io/zone
NAME                                               STATUS   ROLES    AGE   VERSION               INSTANCE-TYPE   CAPACITYTYPE   ZONE
ip-192-168-1-149.ap-northeast-2.compute.internal   Ready    <none>   32m   v1.31.5-eks-5d632ec   t3.medium       ON_DEMAND      ap-northeast-2a
ip-192-168-2-151.ap-northeast-2.compute.internal   Ready    <none>   32m   v1.31.5-eks-5d632ec   t3.medium       ON_DEMAND      ap-northeast-2b
ip-192-168-3-71.ap-northeast-2.compute.internal    Ready    <none>   32m   v1.31.5-eks-5d632ec   t3.medium       ON_DEMAND      ap-northeast-2c

$ k get node -v=6
I0222 13:57:37.359323   91042 envvar.go:172] "Feature gate default state" feature="ClientsAllowCBOR" enabled=false
I0222 13:57:37.359332   91042 envvar.go:172] "Feature gate default state" feature="ClientsPreferCBOR" enabled=false
I0222 13:57:37.359334   91042 envvar.go:172] "Feature gate default state" feature="InformerResourceVersion" enabled=false
I0222 13:57:37.359337   91042 envvar.go:172] "Feature gate default state" feature="WatchListClient" enabled=false
I0222 13:57:37.361931   91042 round_trippers.go:470] GET https://.gr7.ap-northeast-2.eks.amazonaws.com/api/v1/nodes?limit=500
I0222 13:57:37.361938   91042 round_trippers.go:476] Request Headers:
I0222 13:57:37.361942   91042 round_trippers.go:480]     Accept: application/json;as=Table;v=v1;g=meta.k8s.io,application/json;as=Table;v=v1beta1;g=meta.k8s.io,application/json
I0222 13:57:37.361944   91042 round_trippers.go:480]     User-Agent: kubectl/v1.32.0 (darwin/arm64) kubernetes/70d3cc9
I0222 13:57:37.451493   91042 round_trippers.go:581] Response Status: 200 OK in 89 milliseconds
NAME                                               STATUS   ROLES    AGE   VERSION
ip-192-168-1-149.ap-northeast-2.compute.internal   Ready    <none>   32m   v1.31.5-eks-5d632ec
ip-192-168-2-151.ap-northeast-2.compute.internal   Ready    <none>   32m   v1.31.5-eks-5d632ec
ip-192-168-3-71.ap-northeast-2.compute.internal    Ready    <none>   32m   v1.31.5-eks-5d632ec

Mounting EFS on the Operations EC2 Instance

## Update kubeconfig so the operations VM can use the EKS cluster
$ aws eks update-kubeconfig --name myeks

## Verify cluster access
$ k cluster-info
$ k ns default
$ k get node -v6
I0222 15:33:35.159026    4021 round_trippers.go:553] GET https://0D4A34647712221A7D75A35C5C127013.gr7.ap-northeast-2.eks.amazonaws.com/api/v1/nodes?limit=500 200 OK in 763 milliseconds
NAME                                               STATUS   ROLES    AGE    VERSION
ip-192-168-1-149.ap-northeast-2.compute.internal   Ready    <none>   128m   v1.31.5-eks-5d632ec
ip-192-168-2-151.ap-northeast-2.compute.internal   Ready    <none>   128m   v1.31.5-eks-5d632ec
ip-192-168-3-71.ap-northeast-2.compute.internal    Ready    <none>   128m   v1.31.5-eks-5d632ec

## Check current EFS information
$ aws efs describe-file-systems | jq
      "LifeCycleState": "available",
      "Name": "myeks-EFS",
      "NumberOfMountTargets": 3,
      "SizeInBytes": {
        "Value": 6144,
        "Timestamp": "2025-02-22T15:25:02+09:00",
        "ValueInIA": 0,
        "ValueInStandard": 6144,
        "ValueInArchive": 0
      },
      "PerformanceMode": "generalPurpose",
      "Encrypted": false,
      "ThroughputMode": "bursting",
      "Tags": [
        {
          "Key": "Name",
          "Value": "myeks-EFS"
        }
      ],
      "FileSystemProtection": {
        "ReplicationOverwriteProtection": "ENABLED"

## Print only the EFS ID
$ aws efs describe-file-systems --query "FileSystems[*].FileSystemId" --output text
fs-028d81a817631ca1f

## Check EFS mount target information (extra filter applied to drop the Owner ID)
$ aws efs describe-mount-targets --file-system-id $(aws efs describe-file-systems --query "FileSystems[*].FileSystemId" --output text) | jq '.MountTargets |= map(del(.OwnerId))'
{
  "MountTargets": [
    {
      "MountTargetId": "fsmt-05a1cef494c9d4a1a",
      "FileSystemId": "fs-028d81a817631ca1f",
      "SubnetId": "subnet-0729d0ff885812a62",
      "LifeCycleState": "available",
      "IpAddress": "192.168.1.47",
      "NetworkInterfaceId": "eni-01bc58051b7a86c4f",
      "AvailabilityZoneId": "apne2-az1",
      "AvailabilityZoneName": "ap-northeast-2a",
      "VpcId": "vpc-0f8aa12fbf6f08c5b"
    },
    {
      "MountTargetId": "fsmt-0d84ac6b875a72d5e",
      "FileSystemId": "fs-028d81a817631ca1f",
      "SubnetId": "subnet-0cc21ae76cef86c7c",
      "LifeCycleState": "available",
      "IpAddress": "192.168.3.132",
      "NetworkInterfaceId": "eni-04a96892b80691a68",
      "AvailabilityZoneId": "apne2-az3",
      "AvailabilityZoneName": "ap-northeast-2c",
      "VpcId": "vpc-0f8aa12fbf6f08c5b"
    },
    {
      "MountTargetId": "fsmt-03d2074538e8c5862",
      "FileSystemId": "fs-028d81a817631ca1f",
      "SubnetId": "subnet-04781281b220d3ffc",
      "LifeCycleState": "available",
      "IpAddress": "192.168.2.196",
      "NetworkInterfaceId": "eni-0f64e351776b48989",
      "AvailabilityZoneId": "apne2-az2",
      "AvailabilityZoneName": "ap-northeast-2b",
      "VpcId": "vpc-0f8aa12fbf6f08c5b"
    }
  ]
}

## Print only the mount target IPs
$ aws efs describe-mount-targets --file-system-id $(aws efs describe-file-systems --query "FileSystems[*].FileSystemId" --output text) --query "MountTargets[*].IpAddress" --output text

## EFS mount test
$ EFSIP1=192.168.1.47

## Check current mounts
$ df -Th
Filesystem     Type      Size  Used Avail Use% Mounted on
devtmpfs       devtmpfs  981M     0  981M   0% /dev
tmpfs          tmpfs     990M     0  990M   0% /dev/shm
tmpfs          tmpfs     990M  428K  989M   1% /run
tmpfs          tmpfs     990M     0  990M   0% /sys/fs/cgroup
/dev/xvda1     xfs        30G  3.0G   28G  10% /
tmpfs          tmpfs     198M     0  198M   0% /run/user/1000

## Mount EFS
$ mkdir /mnt/myefs
$ mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,noresvport $EFSIP1:/ /mnt/myefs

## Verify the mount
$ df -Th
Filesystem     Type      Size  Used Avail Use% Mounted on
devtmpfs       devtmpfs  981M     0  981M   0% /dev
tmpfs          tmpfs     990M     0  990M   0% /dev/shm
tmpfs          tmpfs     990M  432K  989M   1% /run
tmpfs          tmpfs     990M     0  990M   0% /sys/fs/cgroup
/dev/xvda1     xfs        30G  3.0G   28G  10% /
tmpfs          tmpfs     198M     0  198M   0% /run/user/1000
192.168.1.47:/ nfs4      8.0E     0  8.0E   0% /mnt/myefs

## Create a file
$ nfsstat
$ echo "EKS Workshop" > /mnt/myefs/memo.txt
$ nfsstat
$ ls -l /mnt/myefs
$ cat /mnt/myefs/memo.txt

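-t nfs4
→ Sets the file system type to mount to NFS version 4.

-o nfsvers=4.1
→ Uses NFS protocol version 4.1. NFSv4 is used by default, but this option pins the minor version explicitly.

-o rsize=1048576
→ Sets the maximum number of bytes fetched per NFS read request to 1,048,576 (1 MiB).
(A large value improves read efficiency.)

-o wsize=1048576
→ Sets the maximum number of bytes sent per NFS write request to 1,048,576 (1 MiB).
(Like rsize, it is raised for transfer efficiency.)

-o hard
→ Requests a "hard mount". If the server stops responding, the client keeps retrying so that operations complete once the server recovers, without losing data.
(With the opposite "soft" option, the client returns an error after the timeout.)

-o timeo=600
→ Sets the NFS request timeout. The unit is tenths of a second, so 600 means 60 seconds (600/10).
(With a long timeout, the client waits longer before retrying when the server is slow.)

-o noresvport
→ Tells the NFS client to use a non-reserved source port instead of a reserved port (0~1023).
(For EFS this matters mainly on reconnect: the client picks a new source port when the network connection is re-established, which avoids connection problems after a network recovery event.)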
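
To make the mount survive a reboot, the same options can be placed in /etc/fstab. The sketch below only prints the line rather than editing the file; `efs_fstab_line` and `timeo_seconds` are hypothetical helpers, and the extra `_netdev` option (wait for the network before mounting) is an assumption that is not part of the original command.

```shell
# Hypothetical helper: print an /etc/fstab entry for an EFS mount target.
# Usage: efs_fstab_line <mount-target-ip> <mount-point>
efs_fstab_line() {
  # Same options as the mount command, plus _netdev (assumption).
  opts="nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,noresvport,_netdev"
  printf '%s:/ %s nfs4 %s 0 0\n' "$1" "$2" "$opts"
}

# timeo is expressed in tenths of a second; convert it to seconds.
timeo_seconds() {
  echo $(( $1 / 10 ))
}

efs_fstab_line 192.168.1.47 /mnt/myefs
timeo_seconds 600
```

Printing the line first makes it easy to review before appending it to /etc/fstab by hand.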

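Understanding Storage

By default, when a K8s Pod is deleted, all data inside the Pod is deleted with it.

The same was true of plain containers: when a container is deleted all of its data is lost, so containers were used for stateless applications.

https://aws.amazon.com/ko/blogs/tech/persistent-storage-for-kubernetes/

Temporary, volatile data can safely be lost, but when data must be preserved, use PV & PVC, which provide stateful storage.

A PV lets the volume be attached and used from whichever node the Pod runs on. Examples: NFS, EBS, Ceph

Dynamic provisioning is the feature that automatically creates a volume for a PVC and mounts it into the Pod when the Pod is created.

A Persistent Volume can be configured with how the volume should be cleaned up once it is no longer in use; Kubernetes calls this the Reclaim Policy.

The main Reclaim Policies are Retain and Delete. With Delete, the underlying EBS volume is deleted as well.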
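
As an illustration of the Reclaim Policy, the sketch below writes a StorageClass manifest that keeps the EBS volume after its PVC is deleted. The class name `gp3-retain` and the file name are hypothetical, and the manifest assumes the EBS CSI driver (`ebs.csi.aws.com`) is installed in the cluster.

```shell
# Write an example StorageClass with reclaimPolicy: Retain to a local file.
cat <<'EOF' > sc-retain-example.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-retain              # hypothetical name
provisioner: ebs.csi.aws.com    # assumes the EBS CSI driver is installed
reclaimPolicy: Retain           # keep the EBS volume when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
EOF

# Apply it once the cluster is reachable (left commented out on purpose):
# kubectl apply -f sc-retain-example.yaml
```

With Retain, the PV moves to the Released state after the PVC is deleted, and the EBS volume has to be cleaned up manually.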

Introduction to Storage

- Types: emptyDir, hostPath, PV/PVC

- Volume backends: Kubernetes-native (hostPath, local), on-prem (Ceph, etc.), NFS, cloud storage (each CSP's block storage, such as AWS EBS)

- Dynamic provisioning & volume status, ReclaimPolicy

- CSI

CSI Driver: The AWS EBS provisioner that lived inside the Kubernetes source code was released on the Kubernetes release lifecycle, so using a new provisioner feature required upgrading the Kubernetes version.
Kubernetes developers therefore moved the built-in (in-tree) provisioners out of the Kubernetes source tree and made dynamic provisioning available through a separate controller Pod.
With CSI, a variety of storage providers can be used through Kubernetes' common CSI interface.

This approach is used not only by AWS but by most CSPs that offer public cloud services.

Node-specific Volume Limits

The maximum number of volumes depends on the AWS EC2 instance type: 25 or 39

- Check the maximum attachment limit per instance type (AWS BLOG)

$ kubectl get csinode ip-<redacted>.eu-west-1.compute.internal -o yaml
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  annotations:
    storage.alpha.kubernetes.io/migrated-plugins: kubernetes.io/aws-ebs,kubernetes.io/azure-disk,kubernetes.io/azure-file,kubernetes.io/cinder,kubernetes.io/gce-pd,kubernetes.io/vsphere-volume
...
  name: ip-<redacted>.eu-west-1.compute.internal
  ownerReferences:
  - apiVersion: v1
    kind: Node
...
spec:
  drivers:
  - allocatable:
      count: 127
    name: ebs.csi.aws.com
    nodeID: i-<redacted>

Checking Default Pod Storage and emptyDir Behavior

Checking default Pod storage behavior

## Monitoring [terminal 1]
$ k get pod -w
NAME    READY   STATUS              RESTARTS   AGE
redis   0/1     ContainerCreating   0          6s
redis   1/1     Running             0          8s

## Create the redis pod [terminal 2]
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: redis
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: redis
    image: redis
EOF

## Check the working directory inside the redis pod
$ k exec -it redis -- pwd

## Create a file
$ k exec -it redis -- sh -c "echo hello > /data/hello.txt"

## Confirm the file was created
$ k exec -it redis -- ls -l
total 4
-rw-r--r--. 1 root root 6 Feb 22 11:08 hello.txt

## Check the file contents
$ k exec -it redis -- cat hello.txt
hello

## Killing redis triggers an immediate restart because of the restartPolicy.
$ k exec -it redis -- kill 1
$ k describe pod redis
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 22 Feb 2025 20:07:22 +0900
      Finished:     Sat, 22 Feb 2025 20:11:44 +0900
    Ready:          True
    Restart Count:  1
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ncb2t (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  kube-api-access-ncb2t:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age                    From               Message
  ----    ------     ----                   ----               -------
  Normal  Scheduled  9m                     default-scheduler  Successfully assigned default/redis to ip-192-168-3-71.ap-northeast-2.compute.internal
  Normal  Pulled     8m53s                  kubelet            Successfully pulled image "redis" in 6.241s (6.241s including waiting). Image size: 45006683 bytes.
  Normal  Pulling    4m30s (x2 over 8m59s)  kubelet            Pulling image "redis"
  Normal  Created    4m29s (x2 over 8m53s)  kubelet            Created container redis
  Normal  Started    4m29s (x2 over 8m53s)  kubelet            Started container redis
  Normal  Pulled     4m29s                  kubelet            Successfully pulled image "redis" in 1.387s (1.387s including waiting). Image size: 45006683 bytes.

$ k get pod redis -o yaml
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-192-168-3-71.ap-northeast-2.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 0
  tolerations:
  
## After the restart, the file created earlier is gone: no volume was configured, so it was removed together with the old container.
$ k exec -it redis -- ls -l

Checking emptyDir behavior

## Keep the monitoring terminal open
## Create the redis pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: redis
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: redis
    image: redis
    volumeMounts:
    - name: redis-storage
      mountPath: /data/redis
  volumes:
  - name: redis-storage
    emptyDir: {}
EOF

$ k describe pod/redis
Volumes:
  redis-storage:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>

## Create a file inside the redis pod
$ k exec -it redis -- pwd
$ k exec -it redis -- sh -c "echo hello > /data/redis/hello.txt"
$ k exec -it redis -- ls -l /data/redis/
-rw-r--r--. 1 root root 6 Feb 22 11:24 hello.txt
$ k exec -it redis -- cat /data/redis/hello.txt
hello

## Kill the container process
$ k exec -it redis -- kill 1

## Check the file
$ k exec -it redis -- ls -l /data/redis

## Delete the pod
$ k delete pod redis

## Create the pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: redis
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: redis
    image: redis
    volumeMounts:
    - name: redis-storage
      mountPath: /data/redis
  volumes:
  - name: redis-storage
    emptyDir: {}
EOF

## Check the file
$ k exec -it redis -- ls -l /data/redis

emptyDir allocates space on the node's local disk to the Pod. That space is not released when a container exits or restarts, so the data survives; when the Pod is deleted the space is released, and the data is removed with it.
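
The `Medium` and `SizeLimit` fields shown in the `k describe` output can be set on the volume. The sketch below writes a variant of the redis manifest whose emptyDir is backed by memory (tmpfs) with a size cap; the Pod name `redis-mem`, the file name, and the 64Mi limit are illustrative assumptions.

```shell
# Write an example Pod manifest with a memory-backed, size-limited emptyDir.
cat <<'EOF' > redis-emptydir-memory.yaml
apiVersion: v1
kind: Pod
metadata:
  name: redis-mem            # hypothetical name
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: redis
    image: redis
    volumeMounts:
    - name: redis-storage
      mountPath: /data/redis
  volumes:
  - name: redis-storage
    emptyDir:
      medium: Memory         # tmpfs instead of the node's disk
      sizeLimit: 64Mi        # illustrative cap
EOF

# kubectl apply -f redis-emptydir-memory.yaml
```

The lifetime rule stays the same: the data survives container restarts and disappears when the Pod is deleted.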

PV/PVC Backed by HostPath: the local-path-provisioner Storage Class

## Deploy the local-path provisioner
$ kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.31/deploy/local-path-storage.yaml

## Check the installed local-path storage class
$ k get sc/local-path -o yaml | k neat
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
provisioner: rancher.io/local-path
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

## Check the configmap
$ k get cm local-path-config -o yaml | k neat
apiVersion: v1
data:
  config.json: |-
    {
            "nodePathMap":[
            {
                    "node":"DEFAULT_PATH_FOR_NON_LISTED_NODES",
                    "paths":["/opt/local-path-provisioner"]
            }
            ]
    }
  helperPod.yaml: |-
    apiVersion: v1
    kind: Pod
    metadata:
      name: helper-pod
    spec:
      priorityClassName: system-node-critical
      tolerations:
        - key: node.kubernetes.io/disk-pressure
          operator: Exists
          effect: NoSchedule
      containers:
      - name: helper-pod
        image: busybox
        imagePullPolicy: IfNotPresent
  setup: |-
    #!/bin/sh
    set -eu
    mkdir -m 0777 -p "$VOL_DIR"
  teardown: |-
    #!/bin/sh
    set -eu
    rm -rf "$VOL_DIR"
kind: ConfigMap
metadata:
  name: local-path-config
  namespace: local-path-storage

Creating a Pod That Uses PV / PVC

## Create the PVC
$ cat <<EOF | k apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: localpath-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 1Gi
EOF

## Confirm creation
$ k get pvc
NAME              STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
localpath-claim   Pending                                      local-path     <unset>                 6s
$ k describe pvc
Name:          localpath-claim
Namespace:     local-path-storage
StorageClass:  local-path
Status:        Pending
Volume:
Labels:        <none>
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       <none>
Events:
  Type    Reason                Age               From                         Message
  ----    ------                ----              ----                         -------
  Normal  WaitForFirstConsumer  6s (x3 over 30s)  persistentvolume-controller  waiting for first consumer to be created before binding
  
## Create the Pod
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - name: app
    image: centos
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo \$(date -u) >> /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: localpath-claim
EOF
## Check resources
$ k get pod,pv,pvc
NAME                                         READY   STATUS    RESTARTS   AGE
pod/app                                      1/1     Running   0          31s
pod/local-path-provisioner-84967477f-xbvfc   1/1     Running   0          69m

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
persistentvolume/pvc-951e6e06-780f-4915-b74a-bd73609f9eab   1Gi        RWO            Delete           Bound    local-path-storage/localpath-claim   local-path     <unset>                          15m

NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/localpath-claim   Bound    pvc-951e6e06-780f-4915-b74a-bd73609f9eab   1Gi        RWO            local-path     <unset>                 24m

## Inspect the PV in detail
$ k describe pv

Name:              pvc-951e6e06-780f-4915-b74a-bd73609f9eab
Labels:            <none>
Annotations:       local.path.provisioner/selected-node: ip-192-168-1-149.ap-northeast-2.compute.internal
                   pv.kubernetes.io/provisioned-by: rancher.io/local-path
Finalizers:        [kubernetes.io/pv-protection]
StorageClass:      local-path
Status:            Bound
Claim:             local-path-storage/localpath-claim
Reclaim Policy:    Delete
Access Modes:      RWO
VolumeMode:        Filesystem
Capacity:          1Gi
Node Affinity:
  Required Terms:
    Term 0:        kubernetes.io/hostname in [ip-192-168-1-149.ap-northeast-2.compute.internal]
Message:
Source:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/local-path-provisioner/pvc-951e6e06-780f-4915-b74a-bd73609f9eab_local-path-storage_localpath-claim
    HostPathType:  DirectoryOrCreate
Events:            <none>

## Check the file inside the pod
$ k exec -it app -- tail -f /data/out.txt
Sat Feb 22 12:45:13 UTC 2025
Sat Feb 22 12:45:18 UTC 2025
Sat Feb 22 12:45:23 UTC 2025
Sat Feb 22 12:45:28 UTC 2025
Sat Feb 22 12:45:33 UTC 2025
Sat Feb 22 12:45:38 UTC 2025
Sat Feb 22 12:45:43 UTC 2025
Sat Feb 22 12:45:48 UTC 2025
Sat Feb 22 12:45:53 UTC 2025
Sat Feb 22 12:45:58 UTC 2025


## Confirm the file exists under the local-path directory on the worker node where the pod is running
$ for node in $N1 $N2 $N3; do ssh ec2-user@$node tree /opt/local-path-provisioner; done
/opt/local-path-provisioner
└── pvc-951e6e06-780f-4915-b74a-bd73609f9eab_local-path-storage_localpath-claim
    └── out.txt

1 directory, 1 file
/opt/local-path-provisioner [error opening dir]

0 directories, 0 files
/opt/local-path-provisioner [error opening dir]

0 directories, 0 files

Deleting and Recreating the Pod to Check the Data

## Delete the pod
$ k delete pod app

## List files on the worker node
$ ssh ec2-user@$N1 tree /opt/local-path-provisioner
/opt/local-path-provisioner
└── pvc-951e6e06-780f-4915-b74a-bd73609f9eab_local-path-storage_localpath-claim
    └── out.txt
    
## Recreate the pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - name: app
    image: centos
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo \$(date -u) >> /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: localpath-claim
EOF

## Check the data
$ k exec -it app -- head /data/out.txt

Sat Feb 22 12:29:28 UTC 2025
Sat Feb 22 12:29:33 UTC 2025
Sat Feb 22 12:29:38 UTC 2025
Sat Feb 22 12:29:44 UTC 2025
Sat Feb 22 12:29:49 UTC 2025
Sat Feb 22 12:29:54 UTC 2025
Sat Feb 22 12:29:59 UTC 2025
Sat Feb 22 12:30:04 UTC 2025
Sat Feb 22 12:30:09 UTC 2025
Sat Feb 22 12:30:14 UTC 2025

$ k exec -it app -- tail -f /data/out.txt

Sat Feb 22 12:48:53 UTC 2025
Sat Feb 22 12:50:26 UTC 2025
Sat Feb 22 12:50:31 UTC 2025
Sat Feb 22 12:50:36 UTC 2025
Sat Feb 22 12:50:41 UTC 2025
Sat Feb 22 12:50:46 UTC 2025
Sat Feb 22 12:50:51 UTC 2025
Sat Feb 22 12:50:56 UTC 2025
Sat Feb 22 12:51:01 UTC 2025
Sat Feb 22 12:51:06 UTC 2025

Monitoring and Measuring Performance with Kubestr

A performance tool for measuring IOPS in Kubernetes environments (DOCS)

## Run on the operations server
## Install kubestr
$ wget https://github.com/kastenhq/kubestr/releases/download/v0.4.48/kubestr_0.4.48_Linux_amd64.tar.gz
$ tar xvfz kubestr_0.4.48_Linux_amd64.tar.gz && mv kubestr /usr/local/bin/ && chmod +x /usr/local/bin/kubestr

## Run kubestr
$ kubestr -h
$ kubestr
**************************************
  _  ___   _ ___ ___ ___ _____ ___
  | |/ / | | | _ ) __/ __|_   _| _ \
  | ' <| |_| | _ \ _|\__ \ | | |   /
  |_|\_\\___/|___/___|___/ |_| |_|_\

Explore your Kubernetes storage options
**************************************
Kubernetes Version Check:
  Valid kubernetes version (v1.31.5-eks-8cce635)  -  OK

RBAC Check:
  Kubernetes RBAC is enabled  -  OK

Aggregated Layer Check:
  The Kubernetes Aggregated Layer is enabled  -  OK

Available Storage Provisioners:

  kubernetes.io/aws-ebs:
    This is an in tree provisioner.

    Storage Classes:
      * gp2

    To perform a FIO test, run-
      ./kubestr fio -s <storage class>

    To perform a check for block device support, run-
      ./kubestr blockmount -s <storage class>

  rancher.io/local-path:
    Unknown driver type.

    Storage Classes:
      * local-path

    To perform a FIO test, run-
      ./kubestr fio -s <storage class>

    To perform a check for block device support, run-
      ./kubestr blockmount -s <storage class>
      
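## Monitoring
$ watch 'kubectl get pod -o wide; echo; kubectl get pv,pvc'

## Run iostat on each node
$ iostat -xmdz 1

# rrqm/s : read requests merged per second in the driver request queue
# wrqm/s : write requests merged per second in the driver request queue
# r/s : read requests issued to the disk device per second
# w/s : write requests issued to the disk device per second
# rMB/s : megabytes read from the disk device per second
# wMB/s : megabytes written to the disk device per second
# await : the most important metric, the average response time. Includes both the time spent waiting in the driver request queue and the device's I/O response time (unit: ms)

## Random read test
# Test scenario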
cat << EOF > fio-read.fio
[global]
ioengine=libaio
direct=1
bs=4k
runtime=120
time_based=1
iodepth=16
numjobs=4
group_reporting
size=1g
rw=randread
[read]
EOF

$ kubestr fio -f fio-read.fio -s local-path --size 10G ## defaults to 100GB when size is not specified

#[worker node 1]
Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme0n1       3001.00     11.72     0.00   0.00   17.32     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   51.99  97.00

====================================================================================
PVC created kubestr-fio-pvc-94mct
Pod created kubestr-fio-pod-8glkf
Running FIO test (fio-read.fio) on StorageClass (local-path) with a PVC of Size (10G)
Elapsed time- 2m33.057838765s
FIO test results:

FIO version - fio-3.36
Global options - ioengine=libaio verify= direct=1 gtod_reduce=

JobName:
  blocksize= filesize= iodepth= rw=
read:
  IOPS=3022.496094 BW(KiB/s)=12089
  iops: min=2300 max=8990 avg=3024.359863
  bw(KiB/s): min=9200 max=35960 avg=12097.523438

Disk stats (read/write):
  nvme0n1: ios=362283/283 merge=0/80 ticks=6345036/5712 in_queue=6350748, util=96.630768%
  -  OK
====================================================================================
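
The reported numbers can be sanity-checked with simple arithmetic: bandwidth is roughly IOPS times block size, and the total I/O count is roughly IOPS times runtime. The sketch below uses values from the scenario file and the result above (bs=4k, runtime=120, IOPS rounded down to 3022); `bw_kib` and `total_ios` are hypothetical helper names.

```shell
# bandwidth (KiB/s) = IOPS x block size (KiB)
bw_kib() {
  echo $(( $1 * $2 ))
}

# total I/Os over the run = IOPS x runtime (seconds)
total_ios() {
  echo $(( $1 * $2 ))
}

bw_kib 3022 4        # compare with BW(KiB/s)=12089 in the report
total_ios 3022 120   # compare with ios=362283 reads in the disk stats
```

Both results land close to the reported values. The measured ~3000 IOPS also matches the `volumeIOPS: 3000` set for the node volumes in myeks.yaml, which suggests the test is hitting the gp3 volume's provisioned limit rather than a limit of the local-path provisioner itself.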


## Random write test
# Test scenario
cat << EOF > fio-write.fio
[global]
ioengine=libaio
numjobs=16
iodepth=16
direct=1
bs=4k
runtime=120
time_based=1
size=1g
group_reporting
rw=randrw
rwmixread=0
rwmixwrite=100
[write]
EOF

$ kubestr fio -f fio-write.fio -s local-path --size 20G
Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme0n1          0.00      0.00     0.00   0.00    0.00     0.00 1389.00    125.39     0.00   0.00    5.64    92.44    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    7.83  96.00

====================================================================================
PVC created kubestr-fio-pvc-bqs4m
Pod created kubestr-fio-pod-hm67c
Running FIO test (fio-write.fio) on StorageClass (local-path) with a PVC of Size (20G)
Elapsed time- 4m11.689600051s
FIO test results:

FIO version - fio-3.36
Global options - ioengine=libaio verify= direct=1 gtod_reduce=

JobName:
  blocksize= filesize= iodepth= rw=
write:
  IOPS=3024.503418 BW(KiB/s)=12098
  iops: min=1546 max=8622 avg=3023.619141
  bw(KiB/s): min=6184 max=34489 avg=12097.736328

Disk stats (read/write):
  nvme0n1: ios=9/362345 merge=0/6 ticks=191/7113967 in_queue=7114158, util=95.902184%
  -  OK
====================================================================================

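AWS EBS Controller

Volume (ebs-csi-controller): how the EBS CSI driver works. When a Pod needs an EBS volume, the request goes through the API server,

and the volume is created in the AZ of the node where the Pod was placed and attached to that node. The attached EBS volume is then mounted and assigned to the Pod.

The AWS CSI driver has two main components.

1) CSI-Controller: manages AWS storage by calling the AWS API

2) CSI-Node: mounts AWS storage into Pods by interacting with the kubelet

Installing the Amazon EBS CSI driver as an Amazon EKS add-on

## List all aws-ebs-csi-driver versions and show which one is the default (True)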
$ aws eks describe-addon-versions \
    --addon-name aws-ebs-csi-driver \
    --kubernetes-version 1.31 \
    --query "addons[].addonVersions[].[addonVersion, compatibilities[].defaultVersion]" \
    --output text

## IRSA setup: use the AWS managed policy AmazonEBSCSIDriverPolicy
$ eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster ${CLUSTER_NAME} \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --approve \
  --role-only \
  --role-name AmazonEKS_EBS_CSI_DriverRole
  
## Check IRSA
$ eksctl get iamserviceaccount --cluster ${CLUSTER_NAME}

## Deploy (install) the Amazon EBS CSI driver addon
$ export ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)
$ eksctl create addon --name aws-ebs-csi-driver --cluster ${CLUSTER_NAME} --service-account-role-arn arn:aws:iam::${ACCOUNT_ID}:role/AmazonEKS_EBS_CSI_DriverRole --force
$ kubectl get sa -n kube-system ebs-csi-controller-sa -o yaml | head -5

## Verify
$ eksctl get addon --cluster ${CLUSTER_NAME}
$ kubectl get deploy,ds -l=app.kubernetes.io/name=aws-ebs-csi-driver -n kube-system
$ kubectl get pod -n kube-system -l 'app in (ebs-csi-controller,ebs-csi-node)'
$ kubectl get pod -n kube-system -l app.kubernetes.io/component=csi-driver

## Confirm the 6 containers in the ebs-csi-controller pod
$ kubectl get pod -n kube-system -l app=ebs-csi-controller -o jsonpath='{.items[0].spec.containers[*].name}' ; echo
ebs-plugin csi-provisioner csi-attacher csi-snapshotter csi-resizer liveness-probe

## Check csinodes
$ kubectl api-resources | grep -i csi
$ kubectl get csinodes
$ kubectl describe csinodes

$ kubectl get csidrivers
NAME              ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS   REQUIRESREPUBLISH   MODES        AGE
ebs.csi.aws.com   true             false            false             <unset>         false               Persistent   3m59s
efs.csi.aws.com   false            false            false             <unset>         false               Persistent   9h

$ kubectl describe csidrivers ebs.csi.aws.com

Changing the maximum number of EBS volumes attachable per node

The --configuration-values option in this command appears to depend on the AWS CLI version, so upgrade the CLI if the command fails.

$ aws eks update-addon --cluster-name ${CLUSTER_NAME} --addon-name aws-ebs-csi-driver \
  --addon-version v1.39.0-eksbuild.1 --configuration-values '{
    "node": {
      "volumeAttachLimit": 31,
      "enableMetrics": true
    }
  }'
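
Hand-written JSON inside a quoted shell argument is easy to get wrong, so here is a small sketch that renders the same payload from arguments. `ebs_node_config` is a hypothetical helper, not part of the AWS CLI; it only prints the JSON string.

```shell
# Hypothetical helper: render the JSON passed to --configuration-values.
ebs_node_config() {
  limit=$1
  metrics=$2
  # Refuse anything that is not a plain integer before it reaches the API.
  case "$limit" in
    ''|*[!0-9]*) echo "volumeAttachLimit must be an integer" >&2; return 1 ;;
  esac
  printf '{"node":{"volumeAttachLimit":%s,"enableMetrics":%s}}\n' "$limit" "$metrics"
}

ebs_node_config 31 true
```

The result could then be passed as `--configuration-values "$(ebs_node_config 31 true)"` in the update-addon command above.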

Creating the gp3 StorageClass

# Create the gp3 storage class
$ kubectl get sc
cat <<EOF | kubectl apply -f -
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
allowVolumeExpansion: true
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  #iops: "5000"
  #throughput: "250"
  allowAutoIOPSPerGBIncrease: 'true'
  encrypted: 'true'
  fsType: xfs # default is ext4
EOF

$ kubectl get sc
$ kubectl describe sc gp3 | grep Parameters

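The volumeBindingMode field controls when volume binding and dynamic provisioning occur. If it is unset, Immediate mode is used by default.
- Immediate mode means that volume binding and dynamic provisioning happen as soon as the PersistentVolumeClaim is created. For storage backends that are topology-constrained and not globally accessible from every node in the cluster, the PersistentVolume is bound or provisioned without knowledge of the pod's scheduling requirements, which can leave pods unschedulable.
- WaitForFirstConsumer mode addresses this by delaying binding and provisioning of the PersistentVolume until a pod that uses the PersistentVolumeClaim is created. The PersistentVolume is then selected or provisioned according to the topology specified by the pod's scheduling constraints.

Immediate	Acts when the PVC is created
WaitForFirstConsumer	Acts when a pod using the PVC is created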

PV / PVC pod test

# Check the worker nodes' EBS volumes: filter by tag (key/value)
aws ec2 describe-volumes --filters Name=tag:Name,Values=$CLUSTER_NAME-ng1-Node --output table
aws ec2 describe-volumes --filters Name=tag:Name,Values=$CLUSTER_NAME-ng1-Node --query "Volumes[*].Attachments" | jq
aws ec2 describe-volumes --filters Name=tag:Name,Values=$CLUSTER_NAME-ng1-Node --query "Volumes[*].{ID:VolumeId,Tag:Tags}" | jq
aws ec2 describe-volumes --filters Name=tag:Name,Values=$CLUSTER_NAME-ng1-Node --query "Volumes[].[VolumeId, VolumeType, Attachments[].[InstanceId, State][]][]" | jq
aws ec2 describe-volumes --filters Name=tag:Name,Values=$CLUSTER_NAME-ng1-Node --query "Volumes[].{VolumeId: VolumeId, VolumeType: VolumeType, InstanceId: Attachments[0].InstanceId, State: Attachments[0].State}" | jq

# Check the EBS volumes added for pods on the worker nodes
aws ec2 describe-volumes --filters Name=tag:ebs.csi.aws.com/cluster,Values=true --output table
aws ec2 describe-volumes --filters Name=tag:ebs.csi.aws.com/cluster,Values=true --query "Volumes[*].{ID:VolumeId,Tag:Tags}" | jq
aws ec2 describe-volumes --filters Name=tag:ebs.csi.aws.com/cluster,Values=true --query "Volumes[].{VolumeId: VolumeId, VolumeType: VolumeType, InstanceId: Attachments[0].InstanceId, State: Attachments[0].State}" | jq

# Monitor the EBS volumes added for pods on the worker nodes
while true; do aws ec2 describe-volumes --filters Name=tag:ebs.csi.aws.com/cluster,Values=true --query "Volumes[].{VolumeId: VolumeId, VolumeType: VolumeType, InstanceId: Attachments[0].InstanceId, State: Attachments[0].State}" --output text; date; sleep 1; done

# Create the PVC
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  storageClassName: gp3
EOF

$ kubectl get pvc,pv

# Create the pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - name: app
    image: centos
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo \$(date -u) >> /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: ebs-claim
EOF

# Check the PVC and the pod
$ kubectl get pvc,pv,pod
$ kubectl get VolumeAttachment
$ kubectl df-pv

# Check details of the added EBS volume; also visible under EC2 (EBS) in the AWS Management Console
$ aws ec2 describe-volumes --volume-ids $(kubectl get pv -o jsonpath="{.items[0].spec.csi.volumeHandle}") | jq

# Inspect the PV in detail
$ kubectl get pv -o yaml

Creating a file and saving content

$ kubectl exec -it app -- tail -f /data/out.txt

## Check volume information inside the pod
$ kubectl exec -it app -- sh -c 'df -hT --type=overlay'
$ kubectl exec -it app -- sh -c 'df -hT --type=xfs'

Expanding the volume: growing is possible, shrinking is not

# Grow the PVC from 4Gi to 10Gi: change .spec.resources.requests.storage from 4Gi to 10Gi
kubectl get pvc ebs-claim -o jsonpath={.spec.resources.requests.storage} ; echo
kubectl get pvc ebs-claim -o jsonpath={.status.capacity.storage} ; echo
kubectl patch pvc ebs-claim -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'
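
Because a PVC can only grow, a guard in front of the patch can catch a shrink request early. A minimal sketch, assuming both sizes are whole numbers with a `Gi` suffix; `is_expansion` is a hypothetical helper.

```shell
# Hypothetical helper: succeed only when NEW is strictly larger than CURRENT.
# Assumes sizes look like 4Gi or 10Gi (whole numbers, Gi suffix).
is_expansion() {
  current=${1%Gi}
  new=${2%Gi}
  [ "$new" -gt "$current" ]
}

if is_expansion 4Gi 10Gi; then echo "4Gi -> 10Gi: ok to patch"; fi
if ! is_expansion 10Gi 4Gi; then echo "10Gi -> 4Gi: rejected, a PVC cannot shrink"; fi
```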

# Verify: the resize has to propagate, so the reported size may take a moment to update
kubectl exec -it app -- sh -c 'df -hT --type=xfs'
kubectl df-pv
aws ec2 describe-volumes --volume-ids $(kubectl get pv -o jsonpath="{.items[0].spec.csi.volumeHandle}") | jq
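
Since the new size can take a while to appear, polling is more reliable than re-running the check by hand. The following is a sketch of a generic retry loop; `wait_until` is a hypothetical helper, and the `echo` call is a stand-in so the snippet runs without a cluster.

```shell
# Hypothetical helper: wait_until EXPECTED TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND once per second until its output equals EXPECTED.
wait_until() {
  expected=$1
  timeout=$2
  shift 2
  elapsed=0
  while [ "$elapsed" -le "$timeout" ]; do
    actual=$("$@" 2>/dev/null)
    if [ "$actual" = "$expected" ]; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}

# Stand-in for the real query so the sketch runs anywhere.
wait_until 10Gi 5 echo 10Gi && echo "capacity is now 10Gi"
```

Against the cluster, the command part could be the capacity query used earlier: `kubectl get pvc ebs-claim -o jsonpath={.status.capacity.storage}`.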

AWS Volume SnapShots Controller

Installing the volume snapshot controller

# Install Snapshot CRDs
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
kubectl get crd | grep snapshot
kubectl api-resources  | grep snapshot

# Install Common Snapshot Controller
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml
kubectl get deploy -n kube-system snapshot-controller
kubectl get pod -n kube-system

# Install Snapshotclass
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-ebs-csi-driver/master/examples/kubernetes/snapshot/manifests/classes/snapshotclass.yaml
kubectl get vsclass # or volumesnapshotclasses
kubectl describe vsclass

Hands-on usage

# Create the PVC
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  storageClassName: gp3
EOF

## Check the created resources
$ kubectl get pvc,pv

# Create the pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - name: app
    image: centos
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo \$(date -u) >> /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: ebs-claim
EOF

## Confirm that lines keep being appended to the file
$ kubectl exec app -- tail -f /data/out.txt

Sat Feb 22 15:18:04 UTC 2025
Sat Feb 22 15:18:09 UTC 2025
Sat Feb 22 15:18:14 UTC 2025
Sat Feb 22 15:18:19 UTC 2025
Sat Feb 22 15:18:24 UTC 2025
Sat Feb 22 15:18:29 UTC 2025
Sat Feb 22 15:18:34 UTC 2025
Sat Feb 22 15:18:39 UTC 2025
Sat Feb 22 15:18:44 UTC 2025

## Create the VolumeSnapshot
## Afterwards, confirm in the AWS console that the EBS snapshot was created
cat <<EOF | kubectl apply -f -
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: ebs-volume-snapshot
spec:
  volumeSnapshotClassName: csi-aws-vsc
  source:
    persistentVolumeClaimName: ebs-claim
EOF


# Check the VolumeSnapshot
kubectl get volumesnapshot
kubectl get volumesnapshot ebs-volume-snapshot -o jsonpath={.status.boundVolumeSnapshotContentName} ; echo
kubectl describe volumesnapshot.snapshot.storage.k8s.io ebs-volume-snapshot
kubectl get volumesnapshotcontents

# Check the snapshot ID
kubectl get volumesnapshotcontents -o jsonpath='{.items[*].status.snapshotHandle}' ; echo

# Check the AWS EBS snapshot
aws ec2 describe-snapshots --owner-ids self | jq
aws ec2 describe-snapshots --owner-ids self --query 'Snapshots[]' --output table

# Remove the app pod and the PVC to simulate a failure
kubectl delete pod app && kubectl delete pvc ebs-claim

Restoring from the snapshot

## Confirm the resources are deleted
$ kubectl get pvc,pv

## Create a PVC from the snapshot
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-snapshot-restored-claim
spec:
  storageClassName: gp3
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  dataSource:
    name: ebs-volume-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
EOF

## Create the pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - name: app
    image: centos
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo \$(date -u) >> /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: ebs-snapshot-restored-claim
EOF

## Confirm the data was recovered
$ kubectl exec app -- cat /data/out.txt

Sat Feb 22 15:19:24 UTC 2025
Sat Feb 22 15:19:29 UTC 2025
Sat Feb 22 15:23:31 UTC 2025
Sat Feb 22 15:23:36 UTC 2025

There is a gap between minute 19 and minute 23, but the data written before the deletion is preserved.
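
To locate such a gap without reading timestamps by eye, the log can be scanned for the largest jump between consecutive lines. This is a sketch only: `max_gap_seconds` is a hypothetical helper, it assumes every line is `date -u` output from the same day, and the input is the sample from the restored file above.

```shell
# Hypothetical helper: print the largest gap, in seconds, between consecutive
# `date -u` lines read from stdin. Assumes no midnight rollover.
max_gap_seconds() {
  awk '{
    split($4, t, ":")                      # field 4 is HH:MM:SS
    s = t[1] * 3600 + t[2] * 60 + t[3]     # seconds since midnight
    if (NR > 1 && s - prev > max) max = s - prev
    prev = s
  } END { print max + 0 }'
}

# Sample lines copied from the restored /data/out.txt.
max_gap_seconds <<'EOF'
Sat Feb 22 15:19:24 UTC 2025
Sat Feb 22 15:19:29 UTC 2025
Sat Feb 22 15:23:31 UTC 2025
Sat Feb 22 15:23:36 UTC 2025
EOF
```

For this sample the largest gap is 242 seconds, the span between the last line captured in the snapshot and the first line written by the new pod.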

AWS EFS Controller

Architecture

## Check EFS information
$ aws efs describe-file-systems --query "FileSystems[*].FileSystemId" --output text

## List all aws-efs-csi-driver versions and the default version (True)
$ aws eks describe-addon-versions \
    --addon-name aws-efs-csi-driver \
    --kubernetes-version 1.31 \
    --query "addons[].addonVersions[].[addonVersion, compatibilities[].defaultVersion]" \
    --output text
    
## IRSA setup: uses the AWS managed policy AmazonEFSCSIDriverPolicy
$ eksctl create iamserviceaccount \
  --name efs-csi-controller-sa \
  --namespace kube-system \
  --cluster ${CLUSTER_NAME} \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEFSCSIDriverPolicy \
  --approve \
  --role-only \
  --role-name AmazonEKS_EFS_CSI_DriverRole
  
## Install the EFS CSI driver addon
$ export ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)
$ eksctl create addon --name aws-efs-csi-driver --cluster ${CLUSTER_NAME} --service-account-role-arn arn:aws:iam::${ACCOUNT_ID}:role/AmazonEKS_EFS_CSI_DriverRole --force

## Check the controller service account
$ kubectl get sa -n kube-system efs-csi-controller-sa -o yaml | head -5

## Check the addon
$ eksctl get addon --cluster ${CLUSTER_NAME}

## Check objects by label
$ kubectl get pod -n kube-system -l "app.kubernetes.io/name=aws-efs-csi-driver,app.kubernetes.io/instance=aws-efs-csi-driver"
$ kubectl get pod -n kube-system -l app=efs-csi-controller -o jsonpath='{.items[0].spec.containers[*].name}' ; echo
$ kubectl get csidrivers efs.csi.aws.com -o yaml

Check in the console

Mounting EFS in a pod

## Monitoring
watch 'kubectl get sc efs-sc; echo; kubectl get pv,pvc,pod'

# [Operator server EC2]
# Clone the example code
$ git clone https://github.com/kubernetes-sigs/aws-efs-csi-driver.git /root/efs-csi
$ cd /root/efs-csi/examples/kubernetes/multiple_pods/specs && tree

## Create the EFS storage class and check it
$ kubectl apply -f storageclass.yaml
$ kubectl get sc efs-sc

## Create and check the PV
$ EfsFsId=$(aws efs describe-file-systems --query "FileSystems[*].FileSystemId" --output text)

## Replace the volumeHandle in pv.yaml
$ sed -i "s/fs-4af69aab/$EfsFsId/g" pv.yaml
$ cat pv.yaml ## confirm the change
$ kubectl apply -f pv.yaml

$ kubectl get pv
$ kubectl describe pv

## Create and check the PVC
$ cat claim.yaml
$ kubectl apply -f claim.yaml
$ kubectl get pvc

# Create the pods and attach the volume: /data inside each pod is backed by EFS
# To place pods 1 and 2 on different nodes later, pin pod2.yaml to a specific node

$ cat pod1.yaml pod2.yaml
$ kubectl apply -f pod1.yaml,pod2.yaml

## Check pod information
$ kubectl get pods
$ kubectl exec -it app1 -- sh -c "df -hT -t nfs4"
$ kubectl exec -it app2 -- sh -c "df -hT -t nfs4"

# Confirm that writes land on the shared storage
$ tree /mnt/myefs              # run on the operator server EC2
$ tail -f /mnt/myefs/out1.txt  # run on the operator server EC2
$ tail -f /mnt/myefs/out2.txt  # run on the operator server EC2
$ kubectl exec -ti app1 -- tail -f /data/out1.txt
$ kubectl exec -ti app2 -- tail -f /data/out2.txt

Sharing EFS across multiple pods: dynamic provisioning with EFS

## Monitoring
$ watch 'kubectl get sc efs-sc; echo; kubectl get pv,pvc,pod'

## Operator server
## Create the EFS storage class and check it
$ curl -s -O https://raw.githubusercontent.com/kubernetes-sigs/aws-efs-csi-driver/master/examples/kubernetes/dynamic_provisioning/specs/storageclass.yaml
$ cat storageclass.yaml

## Replace the EFS ID
$ sed -i "s/fs-92107410/$EfsFsId/g" storageclass.yaml

## Deploy
$ kubectl apply -f storageclass.yaml
$ kubectl get sc efs-sc

#=======================================================
## directoryPerms: "700"
# Sets the permissions of the directory created at the EFS access point
#
## gidRangeStart: "1000" & gidRangeEnd: "2000"
# Start and end of the GID (group ID) range assigned to access points
# Prevents GID collisions when several pods use the same EFS
#
## basePath: "/dynamic_provisioning"
# Base path under which dynamically provisioned volumes are created in EFS
# Optional; if omitted, volumes are created under the root (/)
#
## subPathPattern: "${.PVC.namespace}/${.PVC.name}"
# Defines the sub-path pattern for each PVC
#
## ensureUniqueDirectory: "true"
# Ensures that a unique directory is created for each PVC
# When true, prevents two PVCs from ending up on the same path
#
## reuseAccessPoint: "false"
# When false, a new access point is created every time
# When true, an existing access point can be reused
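
To make the basePath and subPathPattern parameters concrete, here is a rough sketch of how the pattern expands into a directory for one PVC. `efs_dir_for_pvc` is a hypothetical helper written for illustration, not driver code, and it ignores any unique suffix the driver may append when ensureUniqueDirectory is "true".

```shell
# Hypothetical helper: expand "${.PVC.namespace}/${.PVC.name}" under basePath.
# Does not model the unique suffix that ensureUniqueDirectory may add.
efs_dir_for_pvc() {
  base_path=$1
  namespace=$2
  pvc_name=$3
  # Strip one trailing slash so basePath "/" does not produce "//".
  printf '%s/%s/%s\n' "${base_path%/}" "$namespace" "$pvc_name"
}

efs_dir_for_pvc /dynamic_provisioning default efs-claim
```

The actual directory created on the file system can be compared with this by running `tree /mnt/myefs` on the operator server.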

## Create the PVC and the pod
$ curl -s -O https://raw.githubusercontent.com/kubernetes-sigs/aws-efs-csi-driver/master/examples/kubernetes/dynamic_provisioning/specs/pod.yaml
$ cat pod.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: efs-app
spec:
  containers:
    - name: app
      image: centos
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo $(date -u) >> /data/out; sleep 5; done"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: efs-claim

## Check creation
$ kubectl get pvc,pv,pod
$ kubectl stern -n kube-system -l app=efs-csi-controller -c csi-provisioner

## Check the disk from inside the pod
$ kubectl exec -it efs-app -- sh -c "df -hT -t nfs4"

## List files inside EFS
$ tree /mnt/myefs
$ kubectl exec efs-app -- bash -c "cat /data/out"

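EFS --> Checking the Access Point
- An EFS Access Point isolates a specific part of the file system and enforces a UID/GID, which improves security
- When several teams or applications share one EFS, Access Points make security and management easier

Aspect	Plain EFS	EFS Access Point
Access	Whole file system	A specific directory per Access Point
Security	File system permissions managed with IAM	IAM permissions managed per Access Point, tightening security
Management	Burden of managing permissions for many users	Permissions are easy to manage per Access Point
Use case	Simple environments that need whole file system access	Multi-user environments that need per-application access limits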

EKS Persistent Volumes for Instance Store & Add NodeGroup

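Creating a new node group ng2: instance store setup on a c5d.large EC2 instance

An instance store is temporary block storage, provided by disks physically attached to the host.

Because it is temporary, its data is lost when the instance stops or terminates, so it suits tmp-style data and highly variable or volatile data.

Instance stores do not appear in the EC2 storage information.

## Storage size of every c5 type that has instance store volumes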
aws ec2 describe-instance-types \
 --filters "Name=instance-type,Values=c5*" "Name=instance-storage-supported,Values=true" \
 --query "InstanceTypes[].[InstanceType, InstanceStorageInfo.TotalSizeInGB]" \
 --output table
 
|  DescribeInstanceTypes |
+---------------+--------+
|  c5d.12xlarge |  1800  |
|  c5d.large    |  50    |
|  c5d.24xlarge |  3600  |
|  c5d.2xlarge  |  200   |
|  c5d.4xlarge  |  400   |
|  c5d.18xlarge |  1800  |
|  c5d.metal    |  3600  |
|  c5d.xlarge   |  100   |
|  c5d.9xlarge  |  900   |
+---------------+--------+

## Generate the new node group config with --dry-run before creating it
$ eksctl create nodegroup -c $CLUSTER_NAME -r ap-northeast-2 --subnet-ids "$PubSubnet1","$PubSubnet2","$PubSubnet3" --ssh-access \
  -n ng2 -t c5d.large -N 1 -m 1 -M 1 --node-volume-size=30 --node-labels disk=instancestore --max-pods-per-node 100 --dry-run > myng2.yaml

$ cat <<EOT > nvme.yaml
  preBootstrapCommands:
    - |
      # Install Tools
      yum install nvme-cli links tree jq tcpdump sysstat -y

      # Filesystem & Mount
      mkfs -t xfs /dev/nvme1n1
      mkdir /data
      mount /dev/nvme1n1 /data

      # Get disk UUID
      uuid=\$(blkid -o value -s UUID /dev/nvme1n1)

      # Mount the disk during a reboot
      echo /dev/nvme1n1 /data xfs defaults,noatime 0 2 >> /etc/fstab
EOT
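
The bootstrap snippet looks up the disk UUID but then writes the device name into /etc/fstab. A UUID-based entry is a common alternative, because NVMe device names can change between boots. The sketch below only formats such a line; `fstab_entry` is a hypothetical helper and the UUID is made up for illustration.

```shell
# Hypothetical helper: format an /etc/fstab line that mounts by filesystem UUID.
fstab_entry() {
  uuid=$1
  mount_point=$2
  fs_type=$3
  printf 'UUID=%s %s %s defaults,noatime 0 2\n' "$uuid" "$mount_point" "$fs_type"
}

# Made-up UUID; on a node it would come from:
#   blkid -o value -s UUID /dev/nvme1n1
fstab_entry 0a1b2c3d-1111-2222-3333-444455556666 /data xfs
```

On the node, the printed line would be appended to /etc/fstab in place of the device-name entry.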

Creating the new node group

cat << EOF > myng2.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: myeks
  region: ap-northeast-2
  version: "1.31"

managedNodeGroups:
- amiFamily: AmazonLinux2
  desiredCapacity: 1
  instanceType: c5d.large
  labels:
    alpha.eksctl.io/cluster-name: myeks
    alpha.eksctl.io/nodegroup-name: ng2
    disk: instancestore
  maxPodsPerNode: 110
  maxSize: 1
  minSize: 1
  name: ng2
  ssh:
    allow: true
    publicKeyName: $SSHKEYNAME
  subnets:
  - $PubSubnet1
  - $PubSubnet2
  - $PubSubnet3
  tags:
    alpha.eksctl.io/nodegroup-name: ng2
    alpha.eksctl.io/nodegroup-type: managed
  volumeIOPS: 3000
  volumeSize: 30
  volumeThroughput: 125
  volumeType: gp3
  preBootstrapCommands:
    - |
      # Install Tools
      yum install nvme-cli links tree jq tcpdump sysstat -y

      # Filesystem & Mount
      mkfs -t xfs /dev/nvme1n1
      mkdir /data
      mount /dev/nvme1n1 /data

      # Get disk UUID
      uuid=\$(blkid -o value -s UUID /dev/nvme1n1)

      # Mount the disk during a reboot
      echo /dev/nvme1n1 /data xfs defaults,noatime 0 2 >> /etc/fstab
EOF

## Deploy the node group
$ eksctl create nodegroup -f myng2.yaml

## Verify
$ kubectl get node --label-columns=node.kubernetes.io/instance-type,eks.amazonaws.com/capacityType,topology.kubernetes.io/zone
$ kubectl get node -l disk=instancestore

### ID of the security group whose name contains *ng2-remoteAccess* for node group ng2
$ aws ec2 describe-security-groups --filters "Name=group-name,Values=*ng2-remoteAccess*" | jq

## Store the security group ID in a variable
$ export NG2SGID=$(aws ec2 describe-security-groups --filters "Name=group-name,Values=*ng2-remoteAccess*" --query 'SecurityGroups[*].GroupId' --output text)

## Add ingress rules
$ aws ec2 authorize-security-group-ingress --group-id $NG2SGID --protocol '-1' --cidr $(curl -s ipinfo.io/ip)/32
$ aws ec2 authorize-security-group-ingress --group-id $NG2SGID --protocol '-1' --cidr 172.20.1.100/32

## Set the new node's public IP
$ N4={public IP of the new node}

## Check access
$ ssh ec2-user@$N4 hostname
$ ssh ec2-user@$N4 sudo nvme list
$ ssh ec2-user@$N4 sudo lsblk -e 7 -d
$ ssh ec2-user@$N4 df -hT -t xfs
$ ssh ec2-user@$N4 sudo tree /data
$ ssh ec2-user@$N4 sudo cat /etc/fstab

Recreating the local-path storage class

## Delete the existing local-path storage class
$ kubectl delete -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.31/deploy/local-path-storage.yaml

## Change the path
$ curl -sL https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.31/deploy/local-path-storage.yaml | sed 's/opt/data/g' | kubectl apply -f -
$ kubectl describe cm -n local-path-storage local-path-config

- Read test

## Monitoring
$ watch 'kubectl get pod -owide;echo;kubectl get pv,pvc'
$ ssh ec2-user@$N4 iostat -xmdz 1 -p nvme1n1

## Run the kubestr test from the operator server
$ kubestr fio -f fio-read.fio -s local-path --size 10G --nodeselector disk=instancestore
PVC created kubestr-fio-pvc-zs5cn
Pod created kubestr-fio-pod-mddd5
Running FIO test (fio-read.fio) on StorageClass (local-path) with a PVC of Size (10G)
Elapsed time- 3m42.693794049s
FIO test results:

FIO version - fio-3.36
Global options - ioengine=libaio verify= direct=1 gtod_reduce=

JobName:
  blocksize= filesize= iodepth= rw=
read:
  IOPS=20309.117188 BW(KiB/s)=81236
  iops: min=16560 max=93780 avg=20318.183594
  bw(KiB/s): min=66240 max=375122 avg=81272.710938

Disk stats (read/write):
  nvme1n1: ios=2433403/10 merge=0/3 ticks=7644757/29 in_queue=7644786, util=99.954117%
  -  OK
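
The IOPS and bandwidth figures in the report can be cross-checked with simple arithmetic: bandwidth is roughly IOPS multiplied by the block size. The sketch below assumes a 4 KiB block size, which the report does not print; `bw_kib_per_s` is a hypothetical helper.

```shell
# Hypothetical helper: bandwidth in KiB/s = IOPS x block size in KiB (truncated).
bw_kib_per_s() {
  awk -v iops="$1" -v bs_kib="$2" 'BEGIN { printf "%d\n", iops * bs_kib }'
}

# IOPS taken from the report above; the 4 KiB block size is an assumption.
bw_kib_per_s 20309.117188 4
```

With these inputs the result is 81236, matching the BW(KiB/s) value in the report, which is consistent with a 4 KiB read block size in fio-read.fio.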

Node groups

Enabling docker buildx on the operator server

Cross-platform builds

## Check the operator server's architecture
$ arch
x86_64

## Try running containers built for other architectures
$ docker run --rm -it riscv64/ubuntu bash
$ docker run --rm -it arm64v8/ubuntu bash

## List the platforms the builder supports
$ docker buildx ls
NAME/NODE DRIVER/ENDPOINT STATUS  BUILDKIT PLATFORMS
default * docker
  default default         running v0.12.5  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/386

## Enable buildx
$ docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
$ docker images

$ docker buildx create --use --name mybuilder
$ docker buildx ls

## Confirm buildx works
$ docker buildx inspect --bootstrap
$ docker buildx ls
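
The machine name printed by `arch` differs from the platform string that `docker buildx` expects, so a small lookup helps when scripting builds. This is a sketch: `docker_platform` is a hypothetical helper and it covers only the architectures mentioned in this post.

```shell
# Hypothetical helper: translate a machine name (from `arch` or `uname -m`)
# into a --platform value for docker buildx.
docker_platform() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    riscv64)       echo "linux/riscv64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

docker_platform x86_64
docker_platform aarch64
```

On the operator server, `docker_platform "$(arch)"` would give the platform of the local machine.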

Building and running a container image

## Create the directory
$ mkdir myweb && cd myweb

## Create the test code
cat > server.py <<EOF
from http.server import ThreadingHTTPServer, BaseHTTPRequestHandler
from datetime import datetime
import socket

class RequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/plain')
        self.end_headers()
        
        now = datetime.now()
        hostname = socket.gethostname()
        response_string = now.strftime("The time is %-I:%M:%S %p, VERSION 0.0.1\n")
        response_string += f"Server hostname: {hostname}\n"
        self.wfile.write(bytes(response_string, "utf-8")) 

def startServer():
    try:
        server = ThreadingHTTPServer(('', 80), RequestHandler)
        print("Listening on " + ":".join(map(str, server.server_address)))
        server.serve_forever()
    except KeyboardInterrupt:
        server.shutdown()

if __name__ == "__main__":
    startServer()
EOF

## Create the Dockerfile
cat > Dockerfile <<EOF
FROM python:3.12
ENV PYTHONUNBUFFERED 1
COPY . /app
WORKDIR /app 
CMD python3 server.py
EOF

## Build, run, then remove
$ docker pull python:3.12
$ docker build -t myweb:1 -t myweb:latest .
$ docker images
$ docker run -d -p 8080:80 --name=timeserver myweb
$ curl http://localhost:8080
$ docker rm -f timeserver

## Multi-platform build and push
$ docker images

## Log in to Docker Hub
$ docker login

## Build the multi-platform image
$ docker buildx build --platform linux/amd64,linux/arm64 --push --tag {dockerhubname}/myweb:multi .
$ docker images

## Check the image manifest
$ docker manifest inspect {dockerhubname}/myweb:multi | jq
$ docker buildx imagetools inspect {dockerhubname}/myweb:multi

## Run the container
$ docker ps
$ docker run -d -p 8080:80 --name=timeserver {dockerhubname}/myweb:multi
$ docker ps

## Check web access to the container
$ curl http://localhost:8080
$ docker logs timeserver

## List files inside the container
$ docker exec -it timeserver ls -l

## Show file contents inside the container
$ docker exec -it timeserver cat server.py

Using a private AWS ECR repository

# Set the account ID variable
$ export ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)

## Log in to ECR
$ aws ecr get-login-password \
--region ap-northeast-2 | docker login \
--username AWS \
--password-stdin ${ACCOUNT_ID}.dkr.ecr.ap-northeast-2.amazonaws.com

## Check the Docker config
$ cat /root/.docker/config.json | jq

## Create the private ECR repository
$ aws ecr create-repository --repository-name myweb

## Push the image to the new repository
$ docker buildx build --platform linux/amd64,linux/arm64 --push --tag ${ACCOUNT_ID}.dkr.ecr.ap-northeast-2.amazonaws.com/myweb:multi .

## Run a container from the pushed image
$ docker run -d -p 8080:80 --name=timeserver ${ACCOUNT_ID}.dkr.ecr.ap-northeast-2.amazonaws.com/myweb:multi
$ docker ps
$ curl http://localhost:8080

ARM node group

AWS Graviton: AWS custom silicon built on 64-bit Arm processor cores

# Check the node architecture label
$ kubectl get nodes -L kubernetes.io/arch

## Create the new node group
$ eksctl create nodegroup --help
$ eksctl create nodegroup -c $CLUSTER_NAME -r ap-northeast-2 --subnet-ids "$PubSubnet1","$PubSubnet2","$PubSubnet3" \
  -n ng3 -t t4g.medium -N 1 -m 1 -M 1 --node-volume-size=30 --node-labels family=graviton --dry-run > myng3.yaml

$ cat myng3.yaml
$ eksctl create nodegroup -f myng3.yaml

## Confirm the new node was created
$ kubectl get nodes --label-columns eks.amazonaws.com/nodegroup,kubernetes.io/arch,eks.amazonaws.com/capacityType
$ kubectl describe nodes --selector family=graviton
$ aws eks describe-nodegroup --cluster-name $CLUSTER_NAME --nodegroup-name ng3 | jq .nodegroup.taints

# Set the taints -> takes about 2 to 3 minutes to apply
$ aws eks update-nodegroup-config --cluster-name $CLUSTER_NAME --nodegroup-name ng3 --taints "addOrUpdateTaints=[{key=frontend, value=true, effect=NO_EXECUTE}]"

# Verify
$ kubectl describe nodes --selector family=graviton | grep Taints
$ aws eks describe-nodegroup --cluster-name $CLUSTER_NAME --nodegroup-name ng3 | jq .nodegroup.taints

Running a busybox pod

## Create busybox
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - name: busybox
    image: busybox
    command:
    - "/bin/sh"
    - "-c"
    - "while true; do date >> /home/pod-out.txt; cd /home; sync; sync; sleep 10; done"
  tolerations:
    - effect: NoExecute
      key: frontend
      operator: Exists
  nodeSelector:
    family: graviton
EOF

## Check which node the pod landed on
$ kubectl get pod -owide
$ kubectl describe pod busybox
$ kubectl exec -it busybox -- arch
$ kubectl exec -it busybox -- tail -f /home/pod-out.txt

# Delete
kubectl delete pod busybox

Deploying the image built on the operator server

## Deploy the timeserver pods
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: myweb-arm
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - name: myweb
    image: {dockerhubname}/myweb:multi
  tolerations:
    - effect: NoExecute
      key: frontend
      operator: Exists
  nodeSelector:
    family: graviton
---
apiVersion: v1
kind: Pod
metadata:
  name: myweb-amd
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - name: myweb
    image: {dockerhubname}/myweb:multi
EOF

## Check the created pods
kubectl get pod -owide
kubectl exec -it myweb-arm -- arch
kubectl exec -it myweb-amd -- arch

kubectl exec -it myweb-arm -- curl localhost
kubectl exec -it myweb-amd -- curl localhost

# Delete
kubectl delete pod myweb-arm myweb-amd

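Spot node group

Spot Instances let AWS customers use spare EC2 capacity at a lower price.
When capacity runs short, AWS reclaims the Spot Instance after a two-minute notice.
As Kubernetes worker nodes, Spot Instances are commonly used for stateless API endpoints, batch processing, ML workloads, Apache Spark, and similar workloads.

## Install ec2-instance-selector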
$ curl -Lo ec2-instance-selector https://github.com/aws/amazon-ec2-instance-selector/releases/download/v2.4.1/ec2-instance-selector-`uname | tr '[:upper:]' '[:lower:]'`-amd64 && chmod +x ec2-instance-selector

$ mv ec2-instance-selector /usr/local/bin/
$ ec2-instance-selector --version

## Filter instance types by spec
$ ec2-instance-selector --vcpus 2 --memory 4 --gpus 0 --current-generation -a x86_64 --deny-list 't.*' --output table-wide

## Check node capacity types
$ kubectl get nodes -l eks.amazonaws.com/capacityType=ON_DEMAND
$ kubectl get nodes -L eks.amazonaws.com/capacityType

## Create the node group
$ NODEROLEARN=$(aws iam list-roles --query "Roles[?contains(RoleName, 'nodegroup-ng1')].Arn" --output text)
$ echo $NODEROLEARN

$ aws eks create-nodegroup \
  --cluster-name $CLUSTER_NAME \
  --nodegroup-name managed-spot \
  --subnets $PubSubnet1 $PubSubnet2 $PubSubnet3 \
  --node-role $NODEROLEARN \
  --instance-types c5.large c5d.large c5a.large \
  --capacity-type SPOT \
  --scaling-config minSize=2,maxSize=3,desiredSize=2 \
  --disk-size 20
  
## The wait subcommand blocks until the node group becomes ACTIVE
$ aws eks wait nodegroup-active --cluster-name $CLUSTER_NAME --nodegroup-name managed-spot

## Deploy a pod to a Spot node
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - name: busybox
    image: busybox
    command:
    - "/bin/sh"
    - "-c"
    - "while true; do date >> /home/pod-out.txt; cd /home; sync; sync; sleep 10; done"
  nodeSelector:
    eks.amazonaws.com/capacityType: SPOT
EOF

## Check which node the pod was deployed to
$ kubectl get pod -o wide

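Bottlerocket AMI

Bottlerocket is a Linux OS developed by AWS.

It is a Linux-based operating system for running containers, designed with a focus on security and performance.

1. Container-only OS

Unlike general-purpose Linux distributions, it is designed solely to run containers
Supports running Docker and Kubernetes (containerd) out of the box

2. Hardened security

Unnecessary packages removed → smaller attack surface
Read-only root filesystem → prevents changes to system files
Automatic updates → kernel and packages are updated over the air (OTA)

3. Fast boot & lightweight

Boots faster than a general-purpose Linux
Lightweight, containing only the minimum required packages

4. Optimized for AWS services

Tightly integrated with Amazon EKS (Kubernetes) and Amazon ECS (Docker)
Remotely manageable through AWS Systems Manager (SSM)

Creating the node groups and connecting to the nodes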

cat << EOF > ng-br.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: myeks
  region: ap-northeast-2
  version: "1.31"

managedNodeGroups:
- name: ng-bottlerocket
  instanceType: m5.large
  amiFamily: Bottlerocket
  bottlerocket:
    enableAdminContainer: true
    settings:
      motd: "Hello, eksctl!"
  desiredCapacity: 1
  maxSize: 1
  minSize: 1
  labels:
    alpha.eksctl.io/cluster-name: myeks
    alpha.eksctl.io/nodegroup-name: ng-bottlerocket
    ami: bottlerocket
  subnets:
  - $PubSubnet1
  - $PubSubnet2
  - $PubSubnet3
  tags:
    alpha.eksctl.io/nodegroup-name: ng-bottlerocket
    alpha.eksctl.io/nodegroup-type: managed

- name: ng-bottlerocket-ssh
  instanceType: m5.large
  amiFamily: Bottlerocket
  desiredCapacity: 1
  maxSize: 1
  minSize: 1
  ssh:
    allow: true
    publicKeyName: $SSHKEYNAME
  labels:
    alpha.eksctl.io/cluster-name: myeks
    alpha.eksctl.io/nodegroup-name: ng-bottlerocket-ssh
    ami: bottlerocket
  subnets:
  - $PubSubnet1
  - $PubSubnet2
  - $PubSubnet3
  tags:
    alpha.eksctl.io/nodegroup-name: ng-bottlerocket-ssh
    alpha.eksctl.io/nodegroup-type: managed
EOF

## Create the nodes
$ eksctl create nodegroup -f ng-br.yaml

## Check the nodes' OS and CRI information
$ kubectl get node --label-columns=alpha.eksctl.io/nodegroup-name,ami,node.kubernetes.io/instance-type
$ kubectl get node -owide

## Check the instance IPs
$ aws ec2 describe-instances --query "Reservations[*].Instances[*].{InstanceID:InstanceId, PublicIPAdd:PublicIpAddress, PrivateIPAdd:PrivateIpAddress, InstanceName:Tags[?Key=='Name']|[0].Value, Status:State.Name}" --filters Name=instance-state-name,Values=running --output table
