Timed out waiting for cluster initialization: node auto-upgrade fails or runs with errors (Timed out waiting for cluster initialization, Auto upgrade of nodes fail / or Run with error)

Problem description

I have a few clusters in my GCP project, each with 3 nodes per node pool, and auto-upgrade and auto-repair are enabled.

The auto-upgrade started about 3 days ago and is still running for GKE version 1.12.10-gke.17.

Since my clusters have opted into auto-upgrade and auto-repair, a few clusters upgraded without issues, while a few others are running the update/upgrade with issues.

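On my first cluster, a few of my pods are unschedulable, and the possible actions GCP suggests are:

Enable autoscaling in one or more node pools that have autoscaling disabled.
Increase the size of one or more node pools manually.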
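For reference, the two suggested actions map roughly onto the gcloud commands below. This is only a sketch: `run`, `enable_autoscaling` and `resize_pool` are hypothetical helper names, the cluster, zone, pool and node counts are placeholders, and the flag names follow current gcloud documentation (check `gcloud container clusters resize --help` for your installed version). Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch of the two actions GCP suggests for unschedulable pods.
# All arguments are placeholders (assumptions), not values from this cluster.
# Set DRY_RUN=1 to print each command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"   # only show the command
  else
    "$@"                 # actually execute it
  fi
}

# Suggestion 1: enable autoscaling on one node pool.
enable_autoscaling() {
  local cluster=$1 zone=$2 pool=$3 min=$4 max=$5
  run gcloud container clusters update "$cluster" --zone "$zone" \
    --node-pool "$pool" --enable-autoscaling \
    --min-nodes "$min" --max-nodes "$max"
}

# Suggestion 2: manually resize one node pool (--quiet skips the prompt).
resize_pool() {
  local cluster=$1 zone=$2 pool=$3 nodes=$4
  run gcloud container clusters resize "$cluster" --zone "$zone" \
    --node-pool "$pool" --num-nodes "$nodes" --quiet
}
```

For example, `DRY_RUN=1 resize_pool my-cluster asia-south1-a default-pool 4` only prints the resize command, which makes it easy to review before running it for real.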

When I run gcloud container clusters describe "clustername" --zone "zone", I get the cluster's details. However, under the node pools section I see:

  status: RUNNING_WITH_ERROR
  statusMessage: 'asia-south1-a: Timed out waiting for cluster initialization; cluster
    API may not be available: k8sclient: 7 - 404 status code returned. Requested resource
    not found.'
  version: 1.12.10-gke.17
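To spot every affected pool across several clusters without reading the full describe output, a small filter like this may help. It is a sketch that assumes tab-separated `name` and `status` columns, which is what `gcloud container node-pools list ... --format="value(name,status)"` prints; `unhealthy_pools`, `CLUSTER` and `ZONE` are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: report node pools whose status is anything other than RUNNING.
# Expects "<name><TAB><status>" lines on stdin, e.g. from:
#   gcloud container node-pools list --cluster="$CLUSTER" --zone="$ZONE" \
#     --format="value(name,status)"
# CLUSTER and ZONE are placeholders.
unhealthy_pools() {
  # Split on tabs; print "name: status" for each non-RUNNING pool.
  awk -F'\t' 'NF >= 2 && $2 != "RUNNING" { print $1 ": " $2 }'
}
```

A pool flagged here (for example `RUNNING_WITH_ERROR`) can then be inspected on its own with `gcloud container node-pools describe POOL --cluster="$CLUSTER" --zone="$ZONE" --format="value(statusMessage)"`, where `POOL` is again a placeholder.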

Note:

Because resource requests are low, GCP also suggests:

Autoscaling is disabled in one or more node pools.
Scale down one or more node pools manually.

Please let me know what other logs I can provide to help troubleshoot this issue.

Update:

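We looked into the logs, and Google Support believes the kubelet may have failed to submit its certificate signing request (CSR), or that it may have old, invalid credentials. To help troubleshoot, they asked for the output of the following commands: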

sudo journalctl -u kubelet > kubelet.log
sudo journalctl -u kube-node-installation > kube-node-installation.log
sudo journalctl -u kube-node-configuration > kube-node-configuration.log
sudo journalctl -u node-problem-detector > node-problem-detector.log
sudo journalctl -u docker > docker.log
sudo journalctl -u cloud-init > cloud-init.log
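Since the suspicion is a failed CSR, one quick check from a machine with cluster access is whether any kubelet CSRs are stuck waiting for approval. The sketch below filters the table printed by `kubectl get csr`; it assumes CONDITION is the last column and NAME the first (the default table layout), and `pending_csrs` is a hypothetical helper name. An empty result does not prove the kubelet submitted its CSR successfully, since a CSR that was never submitted would not show up at all.

```shell
#!/usr/bin/env bash
# Sketch: print the names of CSRs whose CONDITION is still Pending.
# Reads `kubectl get csr` output on stdin; assumes CONDITION is the
# last column, which is the default table layout.
pending_csrs() {
  # NR > 1 skips the header row; $1 is the CSR name.
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}
```

Usage: `kubectl get csr | pending_csrs`.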

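Every node that starts running 1.13.12-gke.13 fails to connect to the master. Everything else that happens to the nodes (for example, re-creation) looks like the repair loop trying to fix them and does not seem to cause additional problems.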
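To see which nodes on the new version did register but are not healthy, a filter over `kubectl get nodes` output may help. This is a sketch assuming the default columns (NAME, STATUS, ROLES, AGE, VERSION), so STATUS is the second field and VERSION the last; `not_ready_on_version` is a hypothetical helper name. Nodes that never reached the master will not appear in this list at all, so comparing against the pool's VM instances (for example with `gcloud compute instances list`) may also be needed.

```shell
#!/usr/bin/env bash
# Sketch: list nodes running a given kubelet version that are not Ready.
# Reads `kubectl get nodes` output on stdin; assumes STATUS is column 2
# and VERSION is the last column. Statuses such as
# "Ready,SchedulingDisabled" are also reported, since they are not "Ready".
not_ready_on_version() {
  local version=$1
  awk -v v="$version" 'NR > 1 && $NF == v && $2 != "Ready" { print $1 " " $2 }'
}
```

Usage: `kubectl get nodes | not_ready_on_version v1.13.12-gke.13`.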

Reference solution

Method 1:

This isn't exactly a solution, but it is a working fix. We were able to narrow the problem down to this.

On the node pools, we had "node-restriction" labels specifying what type of nodes they should be.

Google Support also told us that it is currently not possible to update the labels of an existing node pool once it has begun an upgrade, so they suggested creating a new node pool without any of these labels. If we could deploy that node pool successfully, we would then need to migrate our workloads to it.

So we removed those two node selector labels and created a new node pool. To our surprise, it worked. We did have to migrate the whole workload, though.
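The workaround (new pool without the labels, then move workloads off the old pool) can be sketched as follows. This assumes the usual cordon-and-drain approach rather than the exact steps the answer followed; all names are placeholders, `run`, `create_pool` and `evacuate_nodes` are hypothetical helpers, and `DRY_RUN=1` prints the commands instead of running them. Any pods whose nodeSelector still references the old labels would need that selector removed first, or they cannot be scheduled on the new pool. Older kubectl releases use `--delete-local-data` instead of `--delete-emptydir-data`.

```shell
#!/usr/bin/env bash
# Sketch: create a replacement pool without custom labels, then cordon and
# drain the old pool's nodes so pods reschedule onto the new pool.
# All names are placeholders. Set DRY_RUN=1 to print commands only.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# No --node-labels flag is passed, so the new pool has none of the
# custom labels the old pool carried.
create_pool() {
  local cluster=$1 zone=$2 pool=$3 nodes=$4
  run gcloud container node-pools create "$pool" --cluster "$cluster" \
    --zone "$zone" --num-nodes "$nodes"
}

# Cordon every node first so evicted pods cannot land on another old
# node, then drain them one by one.
evacuate_nodes() {
  local node
  for node in "$@"; do
    run kubectl cordon "$node"
  done
  for node in "$@"; do
    run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  done
}
```

The old pool's node names can be passed in with GKE's node pool label, e.g. `evacuate_nodes $(kubectl get nodes -l cloud.google.com/gke-nodepool="$OLD_POOL" -o jsonpath='{.items[*].metadata.name}')`, where `OLD_POOL` is a placeholder. Once the workloads are running on the new pool, the old pool can be deleted.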

We followed this Cloud Migration guide.

(by Chronograph3r, Chronograph3r)

Reference document

Timed out waiting for cluster initialization, Auto upgrade of nodes fail / or Run with error (CC BY-SA 2.5/3.0/4.0)
