fingerprintQuery Identifier

From the beginning, Datasentinel has relied on an MD5 hash of the query text to compute query identifiers.

This approach was beneficial for older PostgreSQL versions, prior to the introduction of the native query_id field in the pg_stat_activityarrow-up-right view and the compute_query_idarrow-up-right parameter (available since PostgreSQL 14).

It enables mapping entries between pg_stat_activity and pg_stat_statements to build the database workload of active sessions, and remains beneficial for recent versions.

Starting with Agent 3.9.0, you can switch to using PostgreSQL’s native queryid as the query identifier.

A new configuration parameter, use-postgres-queryid, allows switching from the md5-based identifier to the internal queryid computed by PostgreSQL. This parameter is disabled by default.

circle-info

The existing query_md5_id tag is preserved in the UI.

When this option is enabled, it contains the PostgreSQL queryid value instead of an md5 hash.

This option also provides a workaround for a rare issue in pg_stat_statements where an invalid query text encoding (for example, containing a null byte) can trigger the following error when collecting metrics from the extension.

ERROR: invalid byte sequence for encoding "UTF8": 0x00
circle-exclamation

Enabling the parameter

The parameter can be configured at multiple levels:

  • Instance level, to enable it for a specific monitored instance

# Using Agent CLI
datasentinel update connection "<<CONNECTION_NAME>>" use-postgres-queryid on
# restart agent
datasentinel stop agent
datasentinel start agent

or modify agent configuration file $HOME/.datasentinel/agent.yml

  • Agent level, in which case it applies to all instances monitored by that agent

or modify agent configuration file $HOME/.datasentinel/agent.yml

  • Platform level when using the Agentless feature, in which case it applies to all monitored instances.


Points of Attention

There are situations to be aware of when using PostgreSQL queryids with Datasentinel (non-exhaustive):

  • query_id missing from pg_stat_activity even when the backend state is active

The query_id is not available in pg_stat_activity during the initial Parsing and Analysis phases, or if the backend blocks on a lock during these steps. This occurs because the id is computed and reported only after the analysis phase successfully completes. If the query_id is not available for a query, Datasentinel computes an md5 hash after removing literal values from the query text.

Example

  • Multiple query IDs for queries that look the same

PostgreSQL computes the query_id by hashing the internal structure of a parsed statement. This fingerprinting process is sensitive to how the parser classifies data types, particularly numeric literals, which can lead to different query_id values for queries that appear structurally identical.

When the PostgreSQL parser processes a query containing numbers, it categorizes them based on their magnitude and format.

Even though both queries perform a comparison against a number, the internal representation of that number can differ (integer vs bigint for example). This difference in the parse tree structure results in distinct hashes, generating different query_id values.

In the Datasentinel UI, different query_id values result in separate queries, while the md5 method aggregates their metrics into one.

Example

Last updated