Metrics

OpenViking provides a machine-oriented metrics system for exposing runtime health, request quality, model usage, resource processing throughput, and probe health states.

Unlike the human-facing /api/v1/observer/* endpoints and the analytics-oriented /api/v1/stats/* endpoints, Metrics are designed for:

  • high-frequency scraping by Prometheus, Grafana Agent, and similar systems
  • low-cardinality, aggregatable metric models
  • monitoring, alerting, capacity observation, and regression diagnosis

Overview

Why Metrics

Metrics are well suited to answer questions like:

  • Has HTTP traffic increased abnormally over the last few minutes?
  • Are resource ingestion, retrieval, or model calls getting slower?
  • Is there queue backlog?
  • Are key dependencies such as storage, model providers, VikingDB, encryption, and async systems currently healthy?
  • Is a specific tenant showing abnormal traffic or error rates?

Compared with logs and observer snapshots, metrics are better for:

  • continuous scraping
  • time-series aggregation
  • dashboard visualization
  • alert rules

How Metrics Differ from Observer and Stats

| Capability | Best For | Output Format | Typical Usage |
| --- | --- | --- | --- |
| /metrics | online monitoring, alerting, trend aggregation | Prometheus exposition text | Grafana dashboards, Prometheus scraping |
| /api/v1/observer/* | human inspection of component snapshots | JSON / status tables | debugging, health checks |
| /api/v1/stats/* | analytics-oriented statistics | JSON | memory health, staleness, session extraction |

The boundary is:

  • /metrics only carries low-cardinality, low-cost metrics
  • /api/v1/stats/* continues to carry analytics-oriented statistics without being constrained by the Prometheus scraping model

Metrics Architecture

The current metrics stack in OpenViking has four layers:

text
Business logic / HTTP requests / background tasks
                  |
                  v
              DataSource
     (event emission / state reads)
                  |
                  v
              Collector
      (semantic routing + labels)
                  |
                  v
            MetricRegistry
       (in-process metric store)
                  |
                  v
               Exporter
      (Prometheus text rendering)
                  |
                  v
               /metrics

DataSource

DataSources provide inputs to the metrics system in two main forms:

  • Event-based: business code emits events at key points, such as retrieval completion, successful model calls, or resource ingestion stage completion
  • Read-based: current state is read before /metrics export, such as queue state, lock state, or probe state

Collector

Collectors turn inputs into metric semantics:

  • choose which metric to write
  • choose which labels to attach
  • define how failure is exposed, such as valid=1/0

MetricRegistry

The MetricRegistry is the in-process metric store that keeps the current metric values and serves them to the exporter.

Exporter

The first exporter implementation is the Prometheus exporter, which renders registry contents into Prometheus exposition text.
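The four layers can be illustrated with a minimal, self-contained sketch. Every class and function name here (MetricRegistry as a Python class, RetrievalCollector, QueueCollector, render_prometheus) is hypothetical and chosen for illustration; this is not OpenViking's actual code, and labels such as account_id are left out for brevity.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple

LabelKey = Tuple[Tuple[str, str], ...]


def _key(labels: Dict[str, str]) -> LabelKey:
    # Sort labels so the same label set always maps to the same series.
    return tuple(sorted(labels.items()))


class MetricRegistry:
    """Hypothetical in-process store: metric name -> label set -> value."""

    def __init__(self) -> None:
        self.counters: Dict[str, Dict[LabelKey, float]] = defaultdict(lambda: defaultdict(float))
        self.gauges: Dict[str, Dict[LabelKey, float]] = defaultdict(dict)

    def inc(self, name: str, labels: Dict[str, str], amount: float = 1.0) -> None:
        self.counters[name][_key(labels)] += amount

    def set(self, name: str, labels: Dict[str, str], value: float) -> None:
        self.gauges[name][_key(labels)] = value


class RetrievalCollector:
    """Event-based input: business code calls this when a retrieval completes."""

    def __init__(self, registry: MetricRegistry) -> None:
        self.registry = registry

    def on_retrieval_completed(self, context_type: str, result_count: int) -> None:
        labels = {"context_type": context_type}
        self.registry.inc("openviking_retrieval_requests_total", labels)
        self.registry.inc("openviking_retrieval_results_total", labels, result_count)
        if result_count == 0:
            self.registry.inc("openviking_retrieval_zero_result_total", labels)


class QueueCollector:
    """Read-based input: current state is read right before export."""

    def __init__(self, registry: MetricRegistry, read_pending: Callable[[], Dict[str, int]]) -> None:
        self.registry = registry
        self.read_pending = read_pending

    def refresh(self) -> None:
        for queue, pending in self.read_pending().items():
            self.registry.set("openviking_queue_pending", {"queue": queue}, pending)


def render_prometheus(registry: MetricRegistry) -> str:
    """Exporter: render registry contents as Prometheus exposition text."""
    lines = []
    for kind, store in (("counter", registry.counters), ("gauge", registry.gauges)):
        for name in sorted(store):
            lines.append(f"# TYPE {name} {kind}")
            for key, value in sorted(store[name].items()):
                label_text = ",".join(f'{k}="{v}"' for k, v in key)
                series = f"{name}{{{label_text}}}" if label_text else name
                lines.append(f"{series} {value:g}")
    return "\n".join(lines) + "\n"
```

In this sketch the read-based collector's refresh() would be called just before render_prometheus(), mirroring the "read before /metrics export" behavior described above.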

Usage

Accessing /metrics

In the current implementation, /metrics is not wired to get_request_context or any other auth dependency, so at the code level it behaves as an unauthenticated, public scrape endpoint.

bash
curl http://localhost:1933/metrics

If your deployment protects /metrics at the gateway, reverse proxy, or service discovery layer, attach auth according to the deployment environment.
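For ad hoc inspection outside Prometheus, the exposition text can also be read from a script. The following is a minimal sketch using only the Python standard library; the Sample type and the parse_exposition and fetch_metrics helpers are hypothetical names, and the parser covers only the common "name{labels} value" line shape rather than the full exposition grammar.

```python
import re
import urllib.request
from typing import Dict, List, NamedTuple


class Sample(NamedTuple):
    name: str
    labels: Dict[str, str]
    value: float


# Simplified patterns: metric name, optional {label set}, then the value.
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text: str) -> List[Sample]:
    """Parse sample lines, skipping blank lines and # HELP / # TYPE comments."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, raw_value = match.groups()
        # Escape sequences inside label values are kept as-is, not unescaped.
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append(Sample(name, labels, float(raw_value)))
    return samples


def fetch_metrics(url: str = "http://localhost:1933/metrics", timeout: float = 5.0) -> List[Sample]:
    """Fetch /metrics over HTTP and parse it; add auth headers if your deployment needs them."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_exposition(response.read().decode("utf-8"))
```

For anything beyond quick checks, a maintained parser such as the one shipped with the prometheus_client package is likely a better choice than this regex-based sketch.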

Prometheus Scrape Example

yaml
scrape_configs:
  - job_name: openviking
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:1933"]

Understanding Common Labels

| Label | Meaning | Example |
| --- | --- | --- |
| account_id | tenant dimension label | test-account, __unknown__, __overflow__ |
| route | HTTP route template | /api/v1/search/find |
| method | HTTP method | GET, POST |
| status | request or stage status | 200, ok, error |
| operation | structured operation name | search.find, resources.add_resource |
| context_type | retrieval context type | resource |
| provider | model or external service provider | volcengine |
| model_name | model name | doubao-seed-1-8-251228 |
| stage | stage label (defined by each metric family) | resource stage: parse; token attribution stage: embed_query |
| valid | whether the current sample is fresh and valid | 1 / 0 |

Notes:

  • account_id is only enabled on controlled allowlisted metric families to prevent high-cardinality growth
  • valid=0 means the current state/probe sample is a fallback or stale value, not that the label itself is malformed
  • stage semantics depend on the metric family:
    • openviking_resource_stage_*: resource ingestion pipeline stages (for example parse/persist/process)
    • openviking_operation_tokens_total: token attribution stages (for example embed_query/rerank/vlm)
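One way to picture how account_id stays low-cardinality is the sketch below. It is an assumption, not the actual implementation: it supposes that families outside the allowlist (or requests without a tenant) report __unknown__, and that tenants beyond the max_active_accounts cap are folded into __overflow__. The AccountLabelResolver class is a hypothetical name.

```python
from typing import Optional, Set

UNKNOWN_ACCOUNT = "__unknown__"
OVERFLOW_ACCOUNT = "__overflow__"


class AccountLabelResolver:
    """Hypothetical resolver that bounds the number of distinct account_id values."""

    def __init__(self, max_active_accounts: int) -> None:
        self.max_active_accounts = max_active_accounts
        self._active: Set[str] = set()

    def resolve(self, account_id: Optional[str], family_allowlisted: bool) -> str:
        # Assumption: non-allowlisted families never carry a real tenant id.
        if not family_allowlisted or not account_id:
            return UNKNOWN_ACCOUNT
        if account_id in self._active:
            return account_id
        # Assumption: the first N tenants seen keep their own label value.
        if len(self._active) < self.max_active_accounts:
            self._active.add(account_id)
            return account_id
        # Assumption: everyone past the cap shares a single overflow series.
        return OVERFLOW_ACCOUNT
```

With a cap like this, the number of series per allowlisted family grows with at most max_active_accounts + 2 distinct account_id values, regardless of how many tenants exist. The real eviction or expiry policy for "active" accounts is not described here and may differ.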

Key Metric Families

The metric summaries below are based on representative metrics currently exposed by the collectors in openviking/metrics/collectors/.

Requests and Operations

| Metric Family | Type | Common Labels | Meaning |
| --- | --- | --- | --- |
| openviking_http_requests_total | Counter | account_id, method, route, status | total HTTP requests |
| openviking_http_request_duration_seconds | Histogram | account_id, method, route, status | HTTP latency distribution |
| openviking_http_inflight_requests | Gauge | account_id, route | current inflight requests (in-process approximation) |
| openviking_operation_requests_total | Counter | account_id, operation, status | total structured operations |
| openviking_operation_duration_seconds | Histogram | account_id, operation, status | structured operation duration distribution |

Typical usage:

  • inspect whether /api/v1/search/find or /api/v1/resources is slowing down
  • inspect whether a specific operation has elevated error rates
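As an illustration of the second question, the sketch below computes a per-route error ratio from openviking_http_requests_total samples. It assumes the samples have already been parsed into (labels, value) pairs and that the status label carries HTTP status codes as strings, so that 5xx values start with "5"; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def error_ratio_by_route(samples: Iterable[Tuple[Mapping[str, str], float]]) -> Dict[str, float]:
    """Return the share of 5xx requests per route, summed over method and account_id."""
    totals: Dict[str, float] = defaultdict(float)
    errors: Dict[str, float] = defaultdict(float)
    for labels, value in samples:
        route = labels.get("route", "")
        totals[route] += value
        # Assumption: status holds an HTTP status code such as "200" or "500".
        if labels.get("status", "").startswith("5"):
            errors[route] += value
    return {route: errors[route] / total for route, total in totals.items() if total > 0}
```

Because counters are cumulative since process start, an alerting setup would normally apply the same ratio to increases over a recent window rather than to raw counter values.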

Retrieval and Resource Processing

| Metric Family | Type | Common Labels | Meaning |
| --- | --- | --- | --- |
| openviking_retrieval_requests_total | Counter | account_id, context_type | retrieval request count |
| openviking_retrieval_results_total | Counter | account_id, context_type | total retrieved results |
| openviking_retrieval_latency_seconds | Histogram | account_id, context_type | retrieval latency distribution |
| openviking_retrieval_zero_result_total | Counter | account_id, context_type | retrieval zero-result count |
| openviking_retrieval_rerank_used_total | Counter | account_id | number of retrievals that used rerank |
| openviking_retrieval_rerank_fallback_total | Counter | account_id | retrieval rerank fallback count |
| openviking_resource_stage_total | Counter | account_id, stage, status | count of resource ingestion stages |
| openviking_resource_stage_duration_seconds | Histogram | account_id, stage, status | duration distribution of ingestion stages |
| openviking_resource_wait_duration_seconds | Histogram | account_id, operation | resource ingestion wait duration distribution (for example queue waiting) |

Typical stage values include:

  • request
  • parse
  • summarize
  • persist
  • finalize
  • process

Vector, Memory, and Semantic Metrics

| Metric Family | Type | Common Labels | Meaning |
| --- | --- | --- | --- |
| openviking_vector_searches_total | Counter | operation | vector search count |
| openviking_vector_scored_total | Counter | operation | total scored candidates |
| openviking_vector_passed_total | Counter | operation | total passed candidates |
| openviking_vector_returned_total | Counter | operation | total returned candidates |
| openviking_vector_scanned_total | Counter | operation | total scanned candidates |
| openviking_memory_extracted_total | Counter | operation | total extracted memory items |
| openviking_semantic_nodes_total | Counter | status | total semantic nodes |

Model Calls and Tokens

| Metric Family | Type | Common Labels | Meaning |
| --- | --- | --- | --- |
| openviking_model_calls_total | Counter | model_type, provider, model_name | unified model call count |
| openviking_model_tokens_total | Counter | model_type, provider, model_name, token_type | unified model token count |
| openviking_vlm_calls_total | Counter | account_id, provider, model_name | VLM call count |
| openviking_vlm_tokens_input_total | Counter | account_id, provider, model_name | VLM input tokens |
| openviking_vlm_tokens_output_total | Counter | account_id, provider, model_name | VLM output tokens |
| openviking_vlm_tokens_total | Counter | account_id, provider, model_name | VLM total tokens |
| openviking_vlm_call_duration_seconds | Histogram | account_id, provider, model_name | VLM call duration distribution |
| openviking_embedding_requests_total | Counter | account_id, status | embedding request count |
| openviking_embedding_latency_seconds | Histogram | account_id, status | embedding latency distribution |
| openviking_embedding_errors_total | Counter | account_id, error_code | embedding error count |
| openviking_embedding_calls_total | Counter | account_id, provider, model_name | embedding provider call count (per-call) |
| openviking_embedding_call_duration_seconds | Histogram | account_id, provider, model_name | embedding provider call duration distribution (per-call) |
| openviking_embedding_tokens_input_total | Counter | account_id, provider, model_name | embedding input tokens (per-call aggregate) |
| openviking_embedding_tokens_output_total | Counter | account_id, provider, model_name | embedding output tokens (per-call aggregate; may not appear if always 0) |
| openviking_embedding_tokens_total | Counter | account_id, provider, model_name | embedding total tokens (per-call aggregate) |
| openviking_rerank_calls_total | Counter | account_id, provider, model_name | rerank provider call count (per-call) |
| openviking_rerank_call_duration_seconds | Histogram | account_id, provider, model_name | rerank provider call duration distribution (per-call) |
| openviking_rerank_tokens_input_total | Counter | account_id, provider, model_name | rerank input tokens (per-call aggregate) |
| openviking_rerank_tokens_output_total | Counter | account_id, provider, model_name | rerank output tokens (per-call aggregate; may not appear if always 0) |
| openviking_rerank_tokens_total | Counter | account_id, provider, model_name | rerank total tokens (per-call aggregate) |
| openviking_operation_tokens_total | Counter | account_id, operation, stage, token_type | operation token aggregation (token attribution stages) |

Notes:

  • openviking_model_* gives a unified cross-model view for embedding and VLM usage
  • openviking_vlm_* and openviking_embedding_* are better suited for workload-specific dashboards

Queues, Locks, and Runtime State

| Metric Family | Type | Common Labels | Meaning |
| --- | --- | --- | --- |
| openviking_queue_processed_total | Counter | queue | total processed items per queue |
| openviking_queue_errors_total | Counter | queue | total error count per queue |
| openviking_queue_pending | Gauge | queue | pending queue items |
| openviking_queue_in_progress | Gauge | queue | in-progress queue items |
| openviking_lock_active | Gauge | none | current active locks |
| openviking_lock_waiting | Gauge | none | locks currently waiting |
| openviking_lock_stale | Gauge | none | potentially stale locks |

These help answer:

  • Is there queue backlog?
  • Is there lock contention or stale locking?

Tasks and Task Tracker

| Metric Family | Type | Common Labels | Meaning |
| --- | --- | --- | --- |
| openviking_task_pending | Gauge | task_type | pending tasks tracked by task tracker |
| openviking_task_running | Gauge | task_type | running tasks tracked by task tracker |
| openviking_task_completed | Gauge | task_type | completed tasks tracked by task tracker |
| openviking_task_failed | Gauge | task_type | failed tasks tracked by task tracker |

Cache

| Metric Family | Type | Common Labels | Meaning |
| --- | --- | --- | --- |
| openviking_cache_hits_total | Counter | level | cache hit count |
| openviking_cache_misses_total | Counter | level | cache miss count |

Session

| Metric Family | Type | Common Labels | Meaning |
| --- | --- | --- | --- |
| openviking_session_lifecycle_total | Counter | account_id, action, status | session lifecycle event count |
| openviking_session_contexts_used_total | Counter | account_id, action | session contexts used total |
| openviking_session_archive_total | Counter | account_id, status | session archive count |

Probes and Health State

| Metric Family | Type | Common Labels | Meaning |
| --- | --- | --- | --- |
| openviking_service_readiness | Gauge | may include valid | main service readiness |
| openviking_api_key_manager_readiness | Gauge | may include valid | API key manager readiness |
| openviking_storage_readiness | Gauge | probe, valid | storage probe, for example agfs |
| openviking_model_provider_readiness | Gauge | provider, valid | model provider readiness |
| openviking_async_system_readiness | Gauge | probe, valid | async system readiness |
| openviking_retrieval_backend_readiness | Gauge | probe, valid | retrieval backend readiness |
| openviking_encryption_component_health | Gauge | valid | overall encryption component health |
| openviking_encryption_root_key_ready | Gauge | valid | whether the root key is ready |
| openviking_encryption_kms_provider_ready | Gauge | provider, valid | KMS provider readiness |

Meaning of valid:

  • valid="1": the sample was produced by a successful refresh
  • valid="0": the sample is a fallback or stale value and should be treated with caution
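A consumer of these gauges might fold the valid label into a three-state result, as in the sketch below. Two assumptions are made that this page does not state: that a gauge value of 1 means ready and 0 means not ready, and that a missing valid label (some families only "may include valid") can be treated as valid. ProbeState and interpret_probe are hypothetical names.

```python
from enum import Enum
from typing import Mapping


class ProbeState(Enum):
    READY = "ready"
    NOT_READY = "not_ready"
    UNKNOWN = "unknown"


def interpret_probe(labels: Mapping[str, str], value: float) -> ProbeState:
    """Map a readiness sample to a state, distrusting fallback or stale samples."""
    # valid="0" marks a fallback or stale value, so the gauge value is not trusted.
    if labels.get("valid", "1") == "0":
        return ProbeState.UNKNOWN
    # Assumption: 1 means ready and 0 means not ready.
    return ProbeState.READY if value >= 1.0 else ProbeState.NOT_READY
```

Keeping UNKNOWN separate from NOT_READY lets an alert distinguish "the dependency is down" from "the probe could not refresh".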

Encryption (Operational Metrics)

| Metric Family | Type | Common Labels | Meaning |
| --- | --- | --- | --- |
| openviking_encryption_operations_total | Counter | account_id, operation, status | encrypt/decrypt operation count |
| openviking_encryption_duration_seconds | Histogram | account_id, operation, status | encrypt/decrypt duration distribution |
| openviking_encryption_bytes_total | Counter | account_id, operation | encrypt/decrypt processed bytes total |
| openviking_encryption_payload_size_bytes | Histogram | account_id, operation | encrypt/decrypt payload size distribution |
| openviking_encryption_auth_failed_total | Counter | account_id, status | auth-failed count |
| openviking_encryption_key_derivation_total | Counter | account_id, status | key derivation count |
| openviking_encryption_key_derivation_duration_seconds | Histogram | account_id, status | key derivation duration distribution |
| openviking_encryption_key_load_duration_seconds | Histogram | account_id, status, provider | key load duration distribution |
| openviking_encryption_key_cache_hits_total | Counter | account_id, provider | key cache hit count |
| openviking_encryption_key_cache_misses_total | Counter | account_id, provider | key cache miss count |
| openviking_encryption_key_version_usage_total | Counter | account_id, key_version | key version usage count |

Component and Observer Aggregate Metrics

| Metric Family | Type | Common Labels | Meaning |
| --- | --- | --- | --- |
| openviking_component_health | Gauge | component, valid | component health state |
| openviking_component_errors | Gauge | component, valid | component error state |
| openviking_observer_components_total | Gauge | valid | number of observed components |
| openviking_observer_components_unhealthy | Gauge | valid | number of unhealthy components |
| openviking_observer_components_with_errors | Gauge | valid | number of components with errors |

Typical component values include:

  • queue
  • models
  • lock
  • retrieval
  • vikingdb

VikingDB and Model Usage Statistics

| Metric Family | Type | Common Labels | Meaning |
| --- | --- | --- | --- |
| openviking_vikingdb_collection_health | Gauge | collection, valid | collection health |
| openviking_vikingdb_collection_vectors | Gauge | collection, valid | current vector count per collection |
| openviking_model_usage_available | Gauge | model_type, valid | whether model usage statistics are currently available |

Possible model_type values include:

  • vlm
  • embedding
  • rerank

Configuration Example

Enabling Metrics

In ov.conf, the metrics subsystem can be explicitly enabled through server.observability.metrics:

json
{
  "server": {
    "observability": {
      "metrics": {
        "enabled": true,
        "account_dimension": {
          "enabled": true,
          "max_active_accounts": 100,
          "metric_allowlist": [
            "openviking_http_requests_total",
            "openviking_http_request_duration_seconds",
            "openviking_http_inflight_requests",
            "openviking_operation_requests_total",
            "openviking_operation_duration_seconds",
            "openviking_vlm_calls_total",
            "openviking_vlm_call_duration_seconds",
            "openviking_rerank_*"
          ]
        }
      }
    }
  }
}

Recommended mental model:

  • server.observability.metrics.enabled: master switch for the metrics subsystem
  • server.observability.metrics.account_dimension: controls whether account_id labels are enabled and where they are allowed

Exporters

By default, OpenViking exports metrics via Prometheus exposition format at /metrics. You can also enable additional exporters under server.observability.metrics.exporters.

Key fields:

  • server.observability.metrics.exporters.prometheus.enabled: enable the Prometheus exporter (serves /metrics)
  • server.observability.metrics.exporters.otel.enabled: enable OTLP export from the same in-process registry
  • server.observability.metrics.exporters.otel.protocol: "grpc" or "http"
  • server.observability.metrics.exporters.otel.tls.insecure: OTLP/gRPC only; true means plaintext (no TLS)
  • server.observability.metrics.exporters.otel.endpoint: OTLP endpoint (for gRPC, use host:4317; for HTTP, use a full URL)
  • server.observability.metrics.exporters.otel.service_name: OTLP service.name resource attribute (default "openviking-server")
  • server.observability.metrics.exporters.otel.export_interval_ms: OTLP push interval in milliseconds (default 10000)

Example:

json
{
  "server": {
    "observability": {
      "metrics": {
        "enabled": true,
        "exporters": {
          "prometheus": {
            "enabled": true
          },
          "otel": {
            "enabled": true,
            "protocol": "grpc",
            "tls": {
              "insecure": true
            },
            "endpoint": "otel-collector:4317",
            "service_name": "openviking-server",
            "export_interval_ms": 10000
          }
        }
      }
    }
  }
}
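Before deploying a configuration like this, the otel exporter fields can be sanity-checked against the rules listed under "Key fields". The sketch below is a hypothetical helper, not part of OpenViking; it assumes ov.conf is JSON as shown and that export_interval_ms falls back to 10000 when omitted.

```python
import json
from typing import Any, Dict, List


def check_otel_exporter(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in the otel exporter settings."""
    metrics = config.get("server", {}).get("observability", {}).get("metrics", {})
    otel = metrics.get("exporters", {}).get("otel", {})
    problems: List[str] = []
    if not otel.get("enabled", False):
        return problems  # nothing to check when the exporter is off

    protocol = otel.get("protocol")
    endpoint = otel.get("endpoint", "")
    if protocol not in ("grpc", "http"):
        problems.append(f"protocol must be 'grpc' or 'http', got {protocol!r}")
    elif protocol == "http":
        # For HTTP the endpoint is expected to be a full URL.
        if not endpoint.startswith(("http://", "https://")):
            problems.append("protocol 'http' expects a full URL endpoint")
        # tls.insecure is documented as OTLP/gRPC only.
        if "insecure" in otel.get("tls", {}):
            problems.append("tls.insecure only applies to OTLP/gRPC")

    interval = otel.get("export_interval_ms", 10000)
    if isinstance(interval, bool) or not isinstance(interval, int) or interval <= 0:
        problems.append("export_interval_ms must be a positive integer")
    return problems


def check_config_text(text: str) -> List[str]:
    """Parse ov.conf content (assumed to be JSON) and check the otel exporter."""
    return check_otel_exporter(json.loads(text))
```

An empty list means none of these checks failed; it does not guarantee the server accepts the file.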

Notes on account_dimension:

  • account_dimension is enabled by default, but only allowlisted metric families receive tenant IDs (an empty allowlist still yields __unknown__)
  • do not turn user_id, session_id, or resource_uri into labels
  • only enable tenant dimensions on a small set of critical dashboard and alert metrics
  • metric_allowlist supports a limited wildcard syntax: only a trailing * is recognized, and it matches by prefix (e.g. openviking_rerank_*, openviking_embedding_*)
  • a standalone * is not supported, nor are full glob or regex patterns
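
The wildcard rule can be expressed as a short matcher. This is a sketch of the documented behavior only: the function names are hypothetical, and how the server treats unsupported patterns (ignoring them, as here, versus rejecting the configuration) is an assumption.

```python
from typing import Iterable


def pattern_matches(pattern: str, family: str) -> bool:
    """Match a metric family against one allowlist entry."""
    if pattern in ("", "*"):
        return False  # a standalone * is not supported
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        if "*" in prefix:
            return False  # only a single trailing * is recognized
        return family.startswith(prefix)
    # No trailing *: exact match only; glob or regex characters are literal.
    return pattern == family


def family_allowlisted(family: str, allowlist: Iterable[str]) -> bool:
    """Return True if any allowlist entry matches the metric family."""
    return any(pattern_matches(pattern, family) for pattern in allowlist)
```

For example, with the allowlist from the configuration above, openviking_rerank_calls_total matches through openviking_rerank_*, while openviking_vlm_tokens_total matches nothing and would not receive tenant IDs.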

Released under the Apache-2.0 License.