Data Observability: improve your data monitoring with advanced statistics and custom natures

4 min read
(April 2025)

Effective data monitoring has become essential for anticipating anomalies and guaranteeing consistent quality. By adding advanced statistics and custom natures, Tale of Data strengthens your data observability. These tools help you understand the health of your data, quickly identify significant deviations and configure alerts tailored to your needs. As a result, you have more precise and proactive control over your information systems.

I - Advanced statistics in the Mass Discovery module

We have added the calculation of advanced statistics to Tale of Data's data discovery module.

In concrete terms, when you run an analysis on thousands or even millions of tables, Tale of Data collects, for each column, the number of distinct values, the minimum, maximum, average, standard deviation and various percentiles. This information is added to the data quality statistics and semantic analysis (e.g. identification of sensitive or personal data) already offered by the Data Discovery module.
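As an illustration of the per-column statistics listed above, here is a minimal sketch in Python using only the standard library. It is not Tale of Data's implementation: the function name `column_profile`, the dictionary keys and the choice of the 5th, 50th and 95th percentiles are assumptions made for the example.

```python
import statistics

def column_profile(values):
    """Summarize one numeric column; None marks a missing cell.

    Hypothetical helper: needs at least two non-missing values.
    """
    present = [v for v in values if v is not None]
    # quantiles(n=100) returns 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(present, n=100, method="inclusive")
    return {
        "distinct": len(set(present)),
        "min": min(present),
        "max": max(present),
        "mean": statistics.fmean(present),
        "stdev": statistics.stdev(present),  # sample standard deviation
        "p05": cuts[4],
        "p50": cuts[49],
        "p95": cuts[94],
    }

profile = column_profile([10, 20, 20, None, 30])
print(profile["distinct"], profile["min"], profile["max"], profile["mean"])
```

Running such a function once per column yields a compact profile that can be stored alongside the existing quality and semantic statistics and compared from one analysis to the next.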

This new functionality offers two advantages:

  1. A much more precise snapshot (by column) of the actual state of your data.

  2. A new range of possibilities in terms of Data Observability: one of the aims of Data Observability is to provide a precise mapping of the health of your data, and to trigger alerts when quality indicators exceed certain thresholds.

In Tale of Data, you can now trigger alerts on much more specific events. Here are just a few examples:
  • I want to receive an alert when the number of distinct values in a column falls below twenty, indicating that something has gone wrong with a data import process.

  • I want to receive an alert when the 95th percentile of my column (the value above which the highest 5% of values lie) rises above a given limit, which suggests that outliers have appeared in my dataset.

  • I want to receive an alert when the standard deviation for a given column (numerical or date type) has fallen sharply. This may mean that some processing has led to a regression that has produced an unusual distribution on that column.
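The three alert examples above can be sketched as simple rules over column statistics. This is only an illustration under stated assumptions: the function `check_alerts`, the dictionary keys (`distinct`, `p95`, `stdev`) and the default thresholds are hypothetical and do not reflect how alerts are configured in Tale of Data.

```python
def check_alerts(previous, current, min_distinct=20, max_p95=1000.0,
                 max_stdev_drop=0.5):
    """Compare two snapshots of one column's statistics and list alerts.

    previous / current: dicts with 'distinct', 'p95' and 'stdev' keys
    (hypothetical layout). max_stdev_drop is the tolerated relative fall
    in standard deviation between the two snapshots.
    """
    alerts = []
    if current["distinct"] < min_distinct:
        alerts.append(f"distinct values fell to {current['distinct']}")
    if current["p95"] > max_p95:
        alerts.append(f"95th percentile rose to {current['p95']}")
    # A sharp fall in spread can reveal a regression in upstream processing.
    if current["stdev"] < previous["stdev"] * (1 - max_stdev_drop):
        alerts.append(f"standard deviation fell to {current['stdev']}")
    return alerts

before = {"distinct": 50, "p95": 900.0, "stdev": 10.0}
after = {"distinct": 12, "p95": 1500.0, "stdev": 3.0}
for message in check_alerts(before, after):
    print("ALERT:", message)
```

Comparing the current snapshot with the previous one, rather than with a fixed value, is what makes the standard deviation rule sensitive to sudden changes instead of to the absolute spread of the column.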

II - Adding custom natures

As standard, Tale of Data is capable of recognizing nearly fifty "natures" of data. In fact, by analyzing thousands of structured files or database tables, Tale of Data automatically provides a precise mapping of the columns in which telephone numbers, e-mails, IBANs, surnames, first names, etc. are present.

Tale of Data also provides quality statistics on these columns: the percentage of missing data and the percentage of invalid data (e.g. malformed emails).

What's new is that you can now define your own data natures and benefit from the massive analysis and statistics offered by Tale of Data.

Tale of Data offers three ways of defining customized natures:

  1. Specify a list of values: you can, for example, define a "Color" nature for which the list of permitted values is white, yellow, orange, red, blue, green, brown, gray and black. Tale of Data will be able to identify columns of type "Color" and report the number of cells whose value does not belong to the specified list of colors.

  2. Specify a regular expression: for example, if detecting license plates in datasets scattered across your information system is important to you, you can specify in Tale of Data that a French license plate consists of 2 letters followed by a dash, followed by 3 digits, another dash, then 2 letters (for example AB-123-CD). Tale of Data will then be able to search tens of thousands of datasets for columns containing license plates.

  3. Provide a script: this last option is important when certain calculations need to be performed to ensure the validity of the data. For example, if you're looking for datasets containing intra-Community VAT numbers, several rules need to be checked to identify and validate this type of data rigorously. In France, the intra-Community VAT number is made up of the code FR followed by 11 digits: a 2-digit check key (to be verified with an algorithm) followed by the company's 9-digit SIREN number.
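To make the three kinds of definition concrete, here is a minimal sketch in Python. It does not use Tale of Data's own syntax or scripting language, and every name in it is hypothetical. It assumes the current French plate format AA-123-AA (two letters, three digits, two letters) and the commonly published formula for the French VAT key, (12 + 3 × (SIREN mod 97)) mod 97, which covers numeric keys only.

```python
import re

# 1. List of values: the "Color" nature from the example above.
COLORS = {"white", "yellow", "orange", "red", "blue",
          "green", "brown", "gray", "black"}

# 2. Regular expression: French plate, assumed format AA-123-AA.
PLATE_RE = re.compile(r"[A-Z]{2}-[0-9]{3}-[A-Z]{2}")

# 3. Script: French intra-Community VAT number, FR + 2-digit key + SIREN.
VAT_RE = re.compile(r"FR([0-9]{2})([0-9]{9})")

def is_color(value):
    return value.strip().lower() in COLORS

def is_french_plate(value):
    return PLATE_RE.fullmatch(value.strip().upper()) is not None

def is_french_vat(value):
    match = VAT_RE.fullmatch(value.replace(" ", "").upper())
    if match is None:
        return False
    key, siren = int(match.group(1)), int(match.group(2))
    # Assumed key formula; the SIREN's own checksum is not verified here.
    return key == (12 + 3 * (siren % 97)) % 97

def invalid_count(values, validator):
    """Number of cells (strings) rejected by the given validator."""
    return sum(1 for v in values if not validator(v))

print(invalid_count(["red", "purple", "Blue"], is_color))
print(is_french_plate("AB-123-CD"), is_french_vat("FR 83 404833048"))
```

The third case shows why a script is sometimes needed: a regular expression can confirm that a VAT number has the right shape, but only a calculation can confirm that its key matches the SIREN.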

Customized natures enable you to adapt Tale of Data's analysis and monitoring capabilities to your data typology. This gives you a powerful means of triggering alerts on data anomalies specific to your business, before they impact the smooth running of your company.

Conclusion: proactive monitoring for optimal, risk-free data quality

Advanced statistics and custom natures bring a new dimension to Data Observability, enabling you to examine your data with even greater precision. Thanks to these tools, you can not only monitor the evolution of your data in real time, but also configure specific alerts based on defined criteria, such as anomalies in standard deviations or outliers. This level of monitoring enables you to anticipate potential problems before they impact your business processes, guaranteeing optimum data quality while reducing risks.

Tale of Data offers you a proactive means of controlling your data, improving decision-making and limiting downtime due to anomalies. To find out more about the importance of data quality in governance and its strategic role, read our article on data quality, a major pillar of Data Governance.
