Data Quality vs Data Quantity: part 1 – “Big Data” challenges

Data Quality can be improved by connecting additional data sources. Linking can be done via connectors, APIs and one-off imports, or even by linking different ITAM tooling solutions to each other. However, this can result in large data collections.

Blog series - September 2022: Find out why context is king when it comes to interpreting data. 

Also read part 2 of this series: End point solution vs. Platform

Challenge 1: Representativeness

Although the data collected by ITAM is extensive, it cannot simply be used for statistics as it stands. This can be linked to the concept of bias: the “systematic non-accidental biases in the answers of interviewees due to the influence of the interviewer or the wording of the question or the situation in which the person is questioned”. There are many different types of bias, but the context in which a question is asked, or in which data is collected, is of enormous importance in determining how representative that data is.

Take, for example, the coverage of your ITAM tooling. This is often limited to a specific platform or a specific legal entity. For instance, there are tools that only look at SAM, or at SAM and HAM, but not at cloud. Certain business units may also be left out of the equation, consciously or unconsciously. The data is therefore not always representative.

Challenge 2: Generalisability

Definition: “Although observational data always represent a source very well, they represent only what it represents and nothing more. While it is tempting to generalise from specific observations from one platform to broader settings, it is often misleading.” Simply put, the data is what it is, and no more. Within IT Asset Management we see this, for example, in the data gathered by a tooling agent, which is very specific. Each agent has its own way of collecting data and may have become even more specific through configuration and customisation. That data does not cover all applications or all usage, only the applications that the agent retrieves, and that information is tied to that agent’s recognition database. It produces results that fit exactly into that ITAM solution.

The way the tooling is developed, and the intellectual property behind it, means that what is found cannot automatically be generalised. There can also be all kinds of other technical and organisational limitations, such as not being able to read a particular technology or platform. For example, a tool may not cover Linux or macOS, or may not recognise a way of packaging that is specific to that organisation, so its findings cannot be generalised.

Challenge 3: Harmonisation

The third challenge will be recognisable to many. The moment data is enriched and sources are combined, more data is obtained. But this always has a price: data fusion, the harmonisation of different sources. The first questions are: “What should be matched? On asset ID, on serial number? On both, or does one prevail over the other?”
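The matching question can be made concrete with a small sketch. It assumes, purely for illustration, that serial number prevails over asset ID, and that records are plain dictionaries with the hypothetical field names `serial_number` and `asset_id`. Your own sources may call for a different precedence.

```python
from typing import Optional


def normalise_key(value: Optional[str]) -> Optional[str]:
    """Trim and upper-case an identifier; treat empty values as missing."""
    if value is None:
        return None
    cleaned = value.strip().upper()
    return cleaned or None


def match_records(source_a, source_b):
    """Pair records from two sources: serial number first, asset ID second.

    The precedence is an assumption made for this example, not a rule.
    """
    by_serial = {}
    by_asset_id = {}
    for record in source_b:
        serial = normalise_key(record.get("serial_number"))
        asset_id = normalise_key(record.get("asset_id"))
        # setdefault keeps the first record seen for a duplicated key.
        if serial:
            by_serial.setdefault(serial, record)
        if asset_id:
            by_asset_id.setdefault(asset_id, record)

    matches = []    # (record from A, record from B, key that matched)
    unmatched = []  # records from A that need a human decision
    for record in source_a:
        serial = normalise_key(record.get("serial_number"))
        asset_id = normalise_key(record.get("asset_id"))
        if serial and serial in by_serial:
            matches.append((record, by_serial[serial], "serial_number"))
        elif asset_id and asset_id in by_asset_id:
            matches.append((record, by_asset_id[asset_id], "asset_id"))
        else:
            unmatched.append(record)
    return matches, unmatched
```

Recording which key produced each match, and keeping the unmatched records in a separate list, leaves room for the manual review that harmonisation still tends to require.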

Tooling can help, but a human hand is still needed in harmonising data sources. And then there are the classic problems, such as the differences in data format. What is expected in a given field, and what is being combined with what? Is it a free text field, a date, or perhaps a currency amount?

Furthermore, there are the conventions attached to each format. When we talk about currency, is it in US dollars or in euros? And what does that imply for full stops and commas, which can mean different things as decimal or thousands separators, or for the date format? These are things that get in the way when it comes to harmonisation.
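As a minimal sketch of why these conventions matter, the example below parses amounts and dates only when the convention is stated explicitly per source. The helper names `parse_amount` and `parse_date` are hypothetical, and the sketch assumes each source documents its own decimal separator and date format.

```python
from datetime import date, datetime
from decimal import Decimal


def parse_amount(text: str, decimal_separator: str) -> Decimal:
    """Parse an amount, given which character marks the decimal point."""
    if decimal_separator not in (".", ","):
        raise ValueError("decimal_separator must be '.' or ','")
    # The other character is assumed to be the thousands separator.
    thousands_separator = "," if decimal_separator == "." else "."
    cleaned = text.strip().replace(thousands_separator, "")
    cleaned = cleaned.replace(decimal_separator, ".")
    return Decimal(cleaned)


def parse_date(text: str, date_format: str) -> date:
    """Parse a date using an explicit strptime format for that source."""
    return datetime.strptime(text.strip(), date_format).date()


# The same digits, read under two different conventions.
european = parse_amount("1.234,56", decimal_separator=",")
american = parse_amount("1,234.56", decimal_separator=".")
day_first = parse_date("03/04/2022", "%d/%m/%Y")    # 3 April 2022
month_first = parse_date("03/04/2022", "%m/%d/%Y")  # 4 March 2022
```

The string "03/04/2022" is valid under both date formats but yields different dates, which is why the convention has to travel with the data rather than be guessed at import time. Currency (US dollars or euros) is a separate attribute that no amount of parsing can recover if the source does not record it.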

Challenge 4: Data overload

Finally, data overload. The amount of data available is growing much faster than the capacity of analysts and organisations to deal with it. ITAM managers are often very good at collecting and interpreting data, but they are not yet data scientists. Data analysts can only start working once the data scientists have solved all the above-mentioned problems. Meanwhile, databases are becoming larger and more complex, the amount of metadata is no longer manageable for regular employees, and the tooling is also very specific.

Interpreting data: Context is King!

Data exchange, data enrichment and data quality are not just technical challenges. It is not just about APIs, SSL connections and connectors. It is about realising that data exists in the context of organisation, process and technology. As a result, stakeholders such as enterprise architects, data owners and process owners (the people who use the data) are important in these kinds of projects.

There are many variables: how the data is collected, what data is collected, what the scope is, what flow the data goes through, what the timing is, when it is normalised, what is improved and what is not. These are all just as important as the technology used. Context is King here.


Further steps

Do you recognise these challenges, and are you interested in ways to combat them? Reach out to our experienced consultants to find out more. Or continue reading part two of “Data Quality vs Data Quantity”, where we discuss how contextual differences between an ITAM end point solution and ITAM solutions integrated in a platform influence the challenges our clients have faced. A MUST READ for anyone who works in an environment where a tooling migration is being discussed (so that is all of you out there 😊).