For cloudwatch-exporter installation instructions, see:
2022.07.11 - [Monitoring/Prometheus] - [Prometheus](AWS)Install Cloudwatch-exporter with helm chart
cloudwatch_exporter GitHub repository:
https://github.com/prometheus/cloudwatch_exporter
Concept
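With cloudwatch-exporter you can monitor the AWS resources you are using.
However, unlike the CloudWatch console, not every metric value it lists can actually be monitored through the exporter, so you need to test that the data you need can be collected.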
How to check which metrics can be collected
Checking the metric values with the AWS CLI
e.g. to monitor ELB health checks:
- Run aws cloudwatch list-metrics to find the monitoring target: Namespace: AWS/ELB
- Run aws cloudwatch list-metrics --namespace AWS/ELB to find the metric-name you want: MetricName: UnHealthyHostCount
- Define the Dimensions used to collect the monitoring data (these correspond to labels): LoadBalancerName, AvailabilityZone
- ❗ Dimensions values found with the AWS CLI often cannot be used in Cloudwatch Exporter
First, confirm the data you want in the output below.
$ aws cloudwatch list-metrics --namespace AWS/ELB --metric-name UnHealthyHostCount
{
    "Metrics": [
        {
            "Namespace": "AWS/ELB",
            "MetricName": "UnHealthyHostCount",
            "Dimensions": [
                {
                    "Name": "LoadBalancerName",
                    "Value": "a8505717b94524f24912e0c0eff64d1e"
                },
                {
                    "Name": "AvailabilityZone",
                    "Value": "ap-northeast-2b"
                }
            ]
        },
## (rest of output omitted)
The metric can then be added to the metrics list as follows to collect its monitoring data.
region: ap-northeast-2
period_seconds: 60
metrics:
- aws_namespace: AWS/ELB
  aws_metric_name: UnHealthyHostCount
  aws_dimensions:
  - LoadBalancerName
  - AvailabilityZone
  aws_statistics:
  - Average
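The step from the CLI output to an exporter entry is mechanical, so it can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `to_exporter_entries` and `render_yaml` are hypothetical, the `Average` statistic is an assumed default, and the sample input is a trimmed copy of the list-metrics output shown above.

```python
import json

# Trimmed sample of `aws cloudwatch list-metrics` output (same shape as above).
SAMPLE = """
{
  "Metrics": [
    {
      "Namespace": "AWS/ELB",
      "MetricName": "UnHealthyHostCount",
      "Dimensions": [
        {"Name": "LoadBalancerName", "Value": "a8505717b94524f24912e0c0eff64d1e"},
        {"Name": "AvailabilityZone", "Value": "ap-northeast-2b"}
      ]
    }
  ]
}
"""


def to_exporter_entries(list_metrics_json, statistics=("Average",)):
    """Hypothetical helper: build one exporter entry per distinct
    (namespace, metric name, dimension names) combination."""
    seen = set()
    entries = []
    for metric in json.loads(list_metrics_json)["Metrics"]:
        dims = tuple(d["Name"] for d in metric.get("Dimensions", []))
        key = (metric["Namespace"], metric["MetricName"], dims)
        if key in seen:
            continue  # same dimension names, different values: one entry is enough
        seen.add(key)
        entries.append({
            "aws_namespace": metric["Namespace"],
            "aws_metric_name": metric["MetricName"],
            "aws_dimensions": list(dims),
            "aws_statistics": list(statistics),
        })
    return entries


def render_yaml(entries):
    """Hypothetical helper: render entries as the fragment used under `metrics:`."""
    lines = ["metrics:"]
    for e in entries:
        lines.append(f"- aws_namespace: {e['aws_namespace']}")
        lines.append(f"  aws_metric_name: {e['aws_metric_name']}")
        lines.append("  aws_dimensions:")
        lines += [f"  - {d}" for d in e["aws_dimensions"]]
        lines.append("  aws_statistics:")
        lines += [f"  - {s}" for s in e["aws_statistics"]]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_yaml(to_exporter_entries(SAMPLE)))
```

Because of the caveat above (dimensions seen in the CLI are not always usable in the exporter), the generated fragment is only a starting point and still needs to be tested against the exporter.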
Option
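The options below are the ones used under metrics in the cloudwatch-exporter helm chart.
- aws_namespace: the service namespace that CloudWatch collects for (e.g. AWS/EC2)
- aws_metric_name: the metric of that service to monitor (e.g. CPUUtilization)
- aws_dimensions: describes the characteristics of the metric values (these later correspond to labels in Prometheus)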
- aws_statistics: Optional. A list of statistics to retrieve. (e.g. Average, Maximum, Minimum, Sum)
- aws_tag_select: Optional. A tag configuration to filter on, based on mapping from the tagged resource ID to a CloudWatch dimension. If several entries share the same aws_namespace, it only needs to be written on one of them.
  - tag_selections:
    - Limits the output to resources that have a tag with the specified key and one of the specified values.
    - Optional, under aws_tag_select. Specify a map from a tag key to a list of tag values to apply tag filtering on resources from which metrics will be gathered.
  - resource_type_selection:
    - See the resource types in the resource and property types reference; used together with resource_id_dimension.
    - Required, under aws_tag_select. Specify the resource type to filter on. resource_type_selection should be comprised as service:resource_type, as per the resource group tagging API.
  - resource_id_dimension: Required, under aws_tag_select. For the current metric, specify which CloudWatch dimension maps to the ARN resource ID.
    - See the properties in the resource and property types reference.
    - ARN format: arn:partition:service:region:account-id:resource-type:resource-id
    - e.g. arn:aws:iam::aws:policy/AmazonEC2FullAccess
  - e.g.
    aws_tag_select:
      tag_selections:
        Monitoring: ["enabled"]
      resource_type_selection: "elasticloadbalancing:loadbalancer"
      resource_id_dimension: LoadBalancerName
- aws_dimension_select: Optional. Which dimension values to filter. Specify a map from the dimension name to a list of values to select from that dimension. e.g.
  aws_dimension_select:
    LoadBalancerName: [myLB]
- aws_dimension_select_regex: Optional. Which dimension values to filter on with a regular expression. Specify a map from the dimension name to a list of regexes that will be applied to select from that dimension.
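To make the relationship between an ARN, resource_type_selection, and resource_id_dimension more concrete, here is a minimal Python sketch that splits an ARN into its parts and derives the service:resource_type string. The helper names `parse_arn` and `resource_type_selection` are hypothetical, and the load balancer ARN (account ID and name) is made up for illustration; the exporter itself does not expose such functions.

```python
import re


def parse_arn(arn):
    """Split an ARN into its components.

    The resource part may be `resource-type:resource-id`,
    `resource-type/resource-id`, or a bare `resource-id`.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    _, partition, service, region, account_id, resource = parts
    pieces = re.split(r"[:/]", resource, maxsplit=1)
    if len(pieces) == 2:
        resource_type, resource_id = pieces
    else:
        resource_type, resource_id = None, resource
    return {
        "partition": partition,
        "service": service,
        "region": region,
        "account_id": account_id,
        "resource_type": resource_type,
        "resource_id": resource_id,
    }


def resource_type_selection(arn):
    """Hypothetical helper: build the `service:resource_type` string."""
    parsed = parse_arn(arn)
    if parsed["resource_type"] is None:
        raise ValueError(f"ARN has no resource type: {arn!r}")
    return f"{parsed['service']}:{parsed['resource_type']}"


if __name__ == "__main__":
    # Made-up ARN of a Classic Load Balancer, for illustration only.
    elb_arn = "arn:aws:elasticloadbalancing:ap-northeast-2:123456789012:loadbalancer/my-lb"
    print(resource_type_selection(elb_arn))    # elasticloadbalancing:loadbalancer
    print(parse_arn(elb_arn)["resource_id"])   # my-lb
```

In this example the resource ID (my-lb) is the value that would have to match the dimension named in resource_id_dimension, here LoadBalancerName.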
Adding metrics - Example
| aws_namespace | aws_metric_name | aws_dimensions |
| --- | --- | --- |
| AWS/EC2 | CPUUtilization | InstanceId |
| CWAgent | mem_used_percent | InstanceId |
| CWAgent | disk_used_percent | InstanceId |
| AWS/ELB | UnHealthyHostCount | LoadBalancerName, AvailabilityZone |
| ContainerInsights | pod_cpu_utilization | ClusterName, Service, Namespace |
| ContainerInsights | pod_memory_utilization | ClusterName, Service, Namespace |
| ContainerInsights | node_cpu_utilization | ClusterName, InstanceId, NodeName |
| ContainerInsights | node_memory_utilization | ClusterName, InstanceId, NodeName |
| ContainerInsights | node_filesystem_utilization | ClusterName, InstanceId, NodeName |
config: |-
  # This is the default configuration for prometheus-cloudwatch-exporter
  region: ap-northeast-2
  period_seconds: 60
  delay_seconds: 60
  metrics:
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_dimensions:
    - InstanceId
    aws_statistics:
    - Average
    aws_tag_select:
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId
  - aws_namespace: CWAgent
    aws_metric_name: mem_used_percent
    aws_dimensions:
    - InstanceId
    aws_statistics:
    - Average
  - aws_namespace: CWAgent
    aws_metric_name: disk_used_percent
    aws_dimensions:
    - InstanceId
    aws_statistics:
    - Average
  - aws_namespace: AWS/ELB
    aws_metric_name: UnHealthyHostCount
    aws_dimensions:
    - LoadBalancerName
    - AvailabilityZone
    aws_statistics:
    - Average
  - aws_namespace: AWS/ELB
    aws_metric_name: SurgeQueueLength
    aws_dimensions:
    - LoadBalancerName
    - AvailabilityZone
    aws_statistics:
    - Average
  - aws_namespace: ContainerInsights
    aws_metric_name: pod_cpu_utilization
    aws_dimensions:
    - ClusterName
    - Service
    - Namespace
    aws_statistics:
    - Average
  - aws_namespace: ContainerInsights
    aws_metric_name: pod_memory_utilization
    aws_dimensions:
    - ClusterName
    - Service
    - Namespace
    aws_statistics:
    - Average
  - aws_namespace: ContainerInsights
    aws_metric_name: node_cpu_utilization
    aws_dimensions:
    - InstanceId
    - NodeName
    - ClusterName
    aws_statistics:
    - Average
  - aws_namespace: ContainerInsights
    aws_metric_name: node_memory_utilization
    aws_dimensions:
    - InstanceId
    - NodeName
    - ClusterName
    aws_statistics:
    - Average
  - aws_namespace: ContainerInsights
    aws_metric_name: node_filesystem_utilization
    aws_dimensions:
    - InstanceId
    - NodeName
    - ClusterName
    aws_statistics:
    - Average
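delay_seconds: CloudWatch needs a few minutes to evaluate the data it has collected, so to avoid collecting data that has not fully converged this option defaults to 600s. Because of this, there is a time lag between the data in CloudWatch itself and the data in Prometheus. Here it is set to 60s.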
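The effect of delay_seconds on the time lag can be illustrated with a simplified model of the request window. This is a sketch under assumptions, not the exporter's actual code: the function `query_window` is hypothetical, and `range_seconds` (how far back the request reaches) is an exporter option that is not used elsewhere in this post.

```python
from datetime import datetime, timedelta, timezone


def query_window(now, delay_seconds=600, range_seconds=600):
    """Simplified model: the newest data requested is `delay_seconds` old,
    and the request reaches back a further `range_seconds` from there."""
    end = now - timedelta(seconds=delay_seconds)
    start = end - timedelta(seconds=range_seconds)
    return start, end


if __name__ == "__main__":
    now = datetime(2022, 7, 11, 12, 0, 0, tzinfo=timezone.utc)
    for delay in (600, 60):
        start, end = query_window(now, delay_seconds=delay)
        lag = (now - end).total_seconds()
        print(f"delay_seconds={delay}: newest data requested is from "
              f"{end:%H:%M:%S}, lag {lag:.0f}s")
    # delay_seconds=600: newest data requested is from 11:50:00, lag 600s
    # delay_seconds=60: newest data requested is from 11:59:00, lag 60s
```

A smaller delay_seconds narrows the gap to CloudWatch, at the cost of possibly reading datapoints that CloudWatch has not finished aggregating.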
Checking the results
Open Prometheus or the cloudwatch-exporter and check the values for the metrics you registered.
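When checking the exporter's output directly, it can help to filter the text it returns for the metrics you registered. The following is a minimal sketch that scans Prometheus text-format output for sample lines with a given name prefix. The function `find_samples` is hypothetical, and the sample text, including the metric name aws_elb_un_healthy_host_count_average and its labels, is an assumption made for illustration; check the real names in your own exporter output.

```python
import re

# Illustrative text only; not captured from a real exporter.
SAMPLE = """\
# TYPE aws_elb_un_healthy_host_count_average gauge
aws_elb_un_healthy_host_count_average{load_balancer_name="my-lb",availability_zone="ap-northeast-2b"} 0.0
# TYPE cloudwatch_exporter_scrape_error gauge
cloudwatch_exporter_scrape_error 0.0
"""


def find_samples(exposition_text, name_prefix):
    """Return the sample lines whose metric name starts with `name_prefix`."""
    matches = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first "{" (labels) or whitespace (value).
        name = re.split(r"[{\s]", line, maxsplit=1)[0]
        if name.startswith(name_prefix):
            matches.append(line)
    return matches


if __name__ == "__main__":
    for sample in find_samples(SAMPLE, "aws_elb_"):
        print(sample)
```

An empty result for a prefix you expected usually means the metric or its dimensions could not be collected, which is the case the Concept section warns about.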