最近在新环境中部署了一个服务,其暴露的指标路径为:10299/metrics,配置文件如下(名称字段有修改):
apiVersion: v1items:- apiVersion: operator.victoriametrics.com/v1beta1 kind: VMServiceScrape metadata: labels: app_id: audit name: audit namespace: default spec: endpoints: - path: /metrics targetPort: 10299 namespaceSelector: matchNames: - default selector: matchLabels: app_id: audit但在vmagent上查看其状态如下,vmagent无法发现该target:

${podIp}:10299/metrics访问到指标selector字段能够正确匹配到对应的资源reload,查看vmagent的日志是否有相关错误提示经过排查发现上述方式均无法解决问题,更奇怪的是在vmagent的api/v1/targets中无法找到该target,说明vmagent压根没有发现该服务,即vmservicescrape配置没有生效。在vmagent中查看上述vmservicescrape生成的配置文件如下(其拼接了静态配置),可以看到它使用了kubernetes_sd_configs的方式来发现target:
- job_name: serviceScrape/default/audit/0 metrics_path: /metrics relabel_configs: - source_labels: [__meta_kubernetes_service_label_app_id] regex: audit action: keep - source_labels: [__meta_kubernetes_pod_container_port_number] regex: "10299" action: keep - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name] separator: ; target_label: node regex: Node;(.*) replacement: ${1} - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name] separator: ; target_label: pod regex: Pod;(.*) replacement: ${1} - source_labels: [__meta_kubernetes_pod_name] target_label: pod - source_labels: [__meta_kubernetes_pod_container_name] target_label: container - source_labels: [__meta_kubernetes_namespace] target_label: namespace - source_labels: [__meta_kubernetes_service_name] target_label: service - source_labels: [__meta_kubernetes_service_name] target_label: job replacement: ${1} - target_label: endpoint replacement: "8080" kubernetes_sd_configs: - role: endpoints namespaces: own_namespace: false names: - default既然配置没有问题,那只能通过victoriametrics的kubernetes_sd_configs的运作方式看下到底是哪里出问题了。在victoriametrics的源码可以看到其拼接的target url如下:
scrapeURL := fmt.Sprintf("%s://%s%s%s%s", schemeRelabeled, addressRelabeled, metricsPathRelabeled, optionalQuestion, paramsStr)其中:
metrics_path字段最主要的字段就是addressRelabeled,它来自一个名为"__address__"的标签
func mergeLabels(swc *scrapeWorkConfig, target string, extraLabels, metaLabels map[string]string) []prompbmarshal.Label { ... m["job"] = swc.jobName m["__address__"] = target m["__scheme__"] = swc.scheme m["__metrics_path__"] = swc.metricsPath m["__scrape_interval__"] = swc.scrapeInterval.String() m["__scrape_timeout__"] = swc.scrapeTimeout.String() ...}继续跟踪代码,可以看到该标签是通过sc.KubernetesSDConfigs[i].MustStart获取到的,从KubernetesSDConfigs的名称上看,它就是负责处理kubernetes_sd_configs机制的:
func (sc *ScrapeConfig) mustStart(baseDir string) { swosFunc := func(metaLabels map[string]string) interface{} { target := metaLabels["__address__"] sw, err := sc.swc.getScrapeWork(target, nil, metaLabels) if err != nil { logger.Errorf("cannot create kubernetes_sd_config target %q for job_name %q: %s", target, sc.swc.jobName, err) return nil } return sw } for i := range sc.KubernetesSDConfigs { sc.KubernetesSDConfigs[i].MustStart(baseDir, swosFunc) }}继续往下看,看看这个"__address__"字段到底是什么,函数调用如下:
MustStart --> cfg.aw.mustStart --> aw.gw.startWatchersForRole --> uw.reloadScrapeWorksForAPIWatchersLocked --> o.getTargetLabels
最后一个函数getTargetLabels是个接口方法:
type object interface { key() string // getTargetLabels must be called under gw.mu lock. getTargetLabels(gw *groupWatcher) []map[string]string}getTargetLabels的实现如下,这就是kubernetes_sd_configs的各个role的具体实现。上述服务用到的是kubernetes_sd_configsrole为endpoints:

其实现如下:
func (eps *Endpoints) getTargetLabels(gw *groupWatcher) []map[string]string { var svc *Service if o := gw.getObjectByRoleLocked("service", eps.Metadata.Namespace, eps.Metadata.Name); o != nil { svc = o.(*Service) } podPortsSeen := make(map[*Pod][]int) var ms []map[string]string for _, ess := range eps.Subsets { for _, epp := range ess.Ports { ms = appendEndpointLabelsForAddresses(ms, gw, podPortsSeen, eps, ess.Addresses, epp, svc, "true") ms = appendEndpointLabelsForAddresses(ms, gw, podPortsSeen, eps, ess.NotReadyAddresses, epp, svc, "false") } } // See https://kubernetes.io/docs/reference/labels-annotations-taints/#endpoints-kubernetes-io-over-capacity // and https://github.com/kubernetes/kubernetes/pull/99975 switch eps.Metadata.Annotations.GetByName("endpoints.kubernetes.io/over-capacity") { case "truncated": logger.Warnf(`the number of targets for "role: endpoints" %q exceeds 1000 and has been truncated; please use "role: endpointslice" instead`, eps.Metadata.key()) case "warning": logger.Warnf(`the number of targets for "role: endpoints" %q exceeds 1000 and will be truncated in the next k8s releases; please use "role: endpointslice" instead`, eps.Metadata.key()) } // Append labels for skipped ports on seen pods. portSeen := func(port int, ports []int) bool { for _, p := range ports { if p == port { return true } } return false } for p, ports := range podPortsSeen { for _, c := range p.Spec.Containers { for _, cp := range c.Ports { if portSeen(cp.ContainerPort, ports) { continue } addr := discoveryutils.JoinHostPort(p.Status.PodIP, cp.ContainerPort) m := map[string]string{ "__address__": addr, } p.appendCommonLabels(m) p.appendContainerLabels(m, c, &cp) if svc != nil { svc.appendCommonLabels(m) } ms = append(ms, m) } } } return ms}可以看到,"__address__"其实就是拼接了p.Status.PodIP和cp.ContainerPort,而p则代表一个kubernetes的pod数据结构,因此要求:
p.Spec.Containers[].ports[].ContainerPort中配置了暴露metrics target的端口鉴于上述分析,查看了一下环境中的deployment,发现该deployment只配置了8080端口,并没有配置暴露指标的端口10299。问题解决。
apiVersion: apps/v1kind: Deploymentmetadata: labels: app_id: audit name: audit namespace: defaultspec: ... template: metadata: ... spec: containers: - env: - name: APP_ID value: audit ports: - containerPort: 8080 protocol: TCPkubernetes_sd_configs方式其实就是通过listwatch的方式获取对应role的配置,然后拼接出target的__address__,此外它还会暴露一些额外的指标,如:
__meta_kubernetes_endpoint_hostname: Hostname of the endpoint.__meta_kubernetes_endpoint_node_name: Name of the node hosting the endpoint.__meta_kubernetes_endpoint_ready: Set totrueorfalsefor the endpoint's ready state.__meta_kubernetes_endpoint_port_name: Name of the endpoint port.__meta_kubernetes_endpoint_port_protocol: Protocol of the endpoint port.__meta_kubernetes_endpoint_address_target_kind: Kind of the endpoint address target.__meta_kubernetes_endpoint_address_target_name: Name of the endpoint address target.
本文来自博客园,作者:charlieroro,转载请注明原文链接:https://www.cnblogs.com/charlieroro/p/16245402.html