Introduction
Data quality management involves the establishment and dissemination of functions, tasks, procedures, and measures governing the acquisition, protection, distribution, and release of information (Aebi & Perrochon, 2007). Its success depends not only on the utilization of various tools but also on the technology firms engaged in the continual development of the supporting software.
Essentially, collaboration between technology firms and corporations is critical to the success of data quality management (Aebi & Perrochon, 2007). International corporations are responsible for establishing the rules and guiding principles that govern data, as well as the procedures for verifying data quality (Olson, 2006).
Information Technology (IT) firms are responsible for developing and managing the overall infrastructure, including architecture, technical facilities, systems, and databases (Aebi & Perrochon, 2007).
International corporations have increasingly come to understand the importance of data as an asset. However, a definite value cannot be assigned to this asset because data is intangible.
Besides, corporations constantly encounter difficulties in funding data-related activities as a critical strategic asset because the returns on such investments are intangible and hard to demonstrate (Goasdoué, Nugier, Duquennoy & Laboisse, 2013). This paper examines data quality management, particularly the tools used by international corporations in information management and the benchmarks put in place to choose the appropriate tool.
Data Quality Management Defined
Data quality management involves the establishment and dissemination of functions, tasks, procedures, and measures governing the acquisition, protection, distribution, and release of information (Aebi & Perrochon, 2007). As indicated, data quality management requires the involvement not only of the corporations themselves but also of the technology firms. Essentially, collaboration between technology firms and international corporations is critical to the success of data quality management.
The corporations are responsible for establishing the rules and guiding principles that govern data, as well as the procedures for verifying data quality (Wang, Lee, Pipino & Funk, 2006). On the other hand, IT firms have the responsibility of developing and managing supportive infrastructures. In most cases, these supportive infrastructures range from architectural designs to systems and databases.
International businesses’ decisions depend on the available data. Often, firms utilize data warehouses to scrutinize business trends such as stock movement, customers’ behavior, and sales (Wang et al., 2006). Examination of these trends is critical to the formulation of future strategies as well as to the decision-making process.
Essentially, data is categorized into various groups, including financial and customer information. Data concerning customers is used to make appropriate decisions about them. Similarly, data from financial systems is used in the analysis of financial projections such as profitability as well as in investment decisions (Wang et al., 2006).
Recently, new data quality management tools have been developed, particularly tools oriented towards managing customer data (Wang et al., 2006). In fact, IT firms have played a critical role in developing the tools required for the management of customer data. In other words, information technology has stepped up to the challenge by automatically providing quick solutions to most data management problems relating to customer information.
Even though the current focus of data quality management is on end-user information, international businesses’ information sets have to be accompanied by supportive technology. For instance, international organizations need new standardized part codes and names that combine product information stored in diverse systems (Huang, Lee & Wang, 2008).
Such processes pose a great challenge to data management, and new tools that handle data beyond names and addresses must be put in place to avoid misspellings as well as the duplication of information. In other words, a tool dealing with such data sets should be equipped with systems capable of incorporating new kinds of information.
In addition, the tool should allow the system to perform functions such as filtering, integrating, and harmonizing the data set. Moreover, new tools should be developed to deal with issues that arise from the functionalities of data warehouses.
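The general idea behind such filtering and harmonization can be shown in a minimal sketch. The records, field names, and normalization rules below are hypothetical illustrations, not the behavior of any particular vendor tool:

```python
# Minimal sketch: standardizing hypothetical part codes/names and flagging
# likely duplicates after normalization (illustrative data, not vendor logic).

import re

records = [
    {"part_code": " ab-1001 ", "name": "Steel Bolt 10mm"},
    {"part_code": "AB1001",    "name": "steel bolt 10 mm"},
    {"part_code": "CD-2002",   "name": "Copper Washer"},
]

def normalize(record):
    """Harmonize case, whitespace, and punctuation so equivalent entries match."""
    code = re.sub(r"[^A-Z0-9]", "", record["part_code"].upper())
    name = re.sub(r"\s+", " ", record["name"].strip().lower())
    return {"part_code": code, "name": name}

seen = {}
for rec in map(normalize, records):
    key = rec["part_code"]
    if key in seen:
        print(f"Possible duplicate part code {key}: '{seen[key]['name']}' / '{rec['name']}'")
    else:
        seen[key] = rec
```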
Data Management Tools in Large Organizations
Data quality is a major component of data management within large organizations. As such, software firms are proposing an increasing number of tools that can be applied to manage data quality, particularly where large volumes of information are involved. Currently, newly developed tools are focusing on integrating all areas of data quality management, including profiling as well as rule detection (Barateiro & Galhardas, 2005).
In other words, the manner in which data is used and managed has changed from separate, application-specific tools to the current integrated form of information management. Previously, data management focused on specific tools that aided the removal of duplicate data or normalization. Current tools, however, integrate various information areas.
However, managers need a framework that aids in the choice of appropriate tools. Tool functionality is one of the criteria that have been used to identify the appropriate instrument for managing data quality. The functionality is normally aimed at measuring the quality of the databases, and it must be assessed within the context of the data set. To come up with a framework useful in choosing an appropriate data management tool, a general matrix for evaluating and comparing the tools is applied.
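One way such a general matrix might work in practice is sketched below: candidate tools are scored against weighted functionality criteria. The criteria, weights, scores, and tool names are assumptions for demonstration only, not an established industry scale:

```python
# Illustrative evaluation matrix: each candidate tool gets a weighted score
# across functionality criteria (all figures are hypothetical).

criteria_weights = {"profiling": 0.3, "standardization": 0.2,
                    "matching": 0.3, "cost": 0.2}

tool_scores = {
    "Tool A": {"profiling": 4, "standardization": 3, "matching": 5, "cost": 2},
    "Tool B": {"profiling": 3, "standardization": 4, "matching": 3, "cost": 4},
}

for tool, scores in tool_scores.items():
    weighted = sum(criteria_weights[c] * scores[c] for c in criteria_weights)
    print(f"{tool}: weighted score = {weighted:.2f}")
```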
As indicated, data quality is highly valued by large corporations, which need it in various areas. For instance, in Customer Relationship Management (CRM), data integration and regulation are critical. Besides, large corporations understand that poor data management increases the costs incurred by the firm, ranging from maintenance and repair to increased sales costs.
Beyond the economic costs, poor management of data may also have broader effects on customer satisfaction, the reputation of the firm, and strategic decisions (Agosta, 2010). As such, it is critical to have tools that are capable of measuring the quality of databases.
Most IT firms have developed various tools, ranging from DataFlux (SAS) to Business Objects (BDQS) tools. These tools are necessary for the management of data and provide increased opportunities to detect problems relating to data quality, including quantity, duplication, and assessment.
In most cases, multinational corporations deal with databases related to Customer Relationship Management (CRM), Business-to-Consumer (B2C), and Business-to-Business (B2B) activities. The catalogs contain huge data capacities as well as multifaceted information, ranging from consumer information to energy consumption. Databases relating to CRM, in particular, contain information centered on customers.
In fact, such volumes comprise terabytes of information and are highly complex. Moreover, the complexity of the information and the databases requires a sophisticated tool capable of managing such huge data sets (Huang et al., 2008). Besides, in such a complicated environment, the quality of data cannot always be exceptional.
Errors resulting from a deficiency in quality may lead to disastrous consequences for the firm, including incessant customer dissatisfaction, wrong decisions, negative financial effects, increased operating costs, lost confidence in the firm’s products, and a tarnished corporate image (Agosta, 2010).
Data Quality Dimensions
The quality of data is measured according to its suitability for its intended purpose. In other words, the value of information is gauged against its suitability, its adherence to formal requirements, and how well it meets the needs of the customers. Accordingly, the quality of data can be evaluated according to the extent to which it meets the needs of the users (Huang et al., 2008). In other words, the quality of data depends on how closely the database is aligned with the reality of the business processes.
For instance, the diverse ways in which data handlers such as managers, consumers, and operators assess the reliability of information result in various value dimensions. In most instances, accuracy (correctness), completeness, relevancy, and consistency are the critical dimensions that have been applied universally in data quality management (Aebi & Perrochon, 2007). As such, data management tools have to focus on these dimensions in order to be considered appropriate.
Some of the Tools Used by International Organizations to Manage Quality Data
Numerous tools have been developed to aid in data quality management, and they are categorized according to their intended purpose. Some tools normalize data, while others are used for standardization. The choice of tool applied by the firm for data management depends on cost, purpose, and the need to fulfill customer requirements (Aebi & Perrochon, 2007).
Nevertheless, most international organizations adopt tools that provide integrated platforms for data management. In other words, various functionalities can be attained within a single tool. Such strategies are applied in consideration of the costs and quality needed in data management.
DataFlux
DataFlux is one of the management tools used to improve data and analysis through an integrated platform. Through this integrated platform, international organizations can be successful and resourceful in establishing an amalgamated view of clientele, products, services, and suppliers within a single platform (Batini & Scannapieco, 2006).
In other words, the DataFlux enterprise integration products enable businesses to develop a single platform in which various databases can easily be accessed. Essentially, the tool enables international organizations to rapidly assess and improve problematic data and develop the foundations for enterprise data governance (Aebi & Perrochon, 2007).
Efficient and effective data management delivers high-quality information that can propel successful enterprise efforts such as risk management, operational efficiency, and Master Data Management (MDM) (Batini & Scannapieco, 2006).
QualityStage (IBM)
QualityStage is a tool that enables organizations to manage and maintain quality information as well as governance initiatives. The IBM tool enables organizations to create and maintain consistent views of various important entities, including customers, vendors, products, and locations (Huang et al., 2008). Essentially, QualityStage allows users to cleanse and filter data to prevent duplication. In most cases, it is used to manage huge quantities of data in large organizations.
The tool is used for data profiling, standardization, probabilistic matching, and enrichment (Huang et al., 2008). These capabilities are used to maintain high-quality data regarding core business entities. Besides, the tool delivers a data quality platform as part of an integrated whole. In other words, it delivers high data quality within an amalgamated platform (Huang et al., 2008).
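As a rough illustration of what probabilistic matching involves (not IBM’s proprietary algorithm), the sketch below scores pairs of hypothetical customer records by string similarity and flags pairs above a threshold; the similarity measure, threshold, and sample records are assumptions:

```python
# Illustrative probabilistic matching: score record pairs by string similarity
# and flag probable duplicates above a chosen threshold (fabricated data).

from difflib import SequenceMatcher

customers = [
    ("Jon Smith",  "12 High Street, London"),
    ("John Smith", "12 High St, London"),
    ("Mary Jones", "4 Park Lane, Leeds"),
]

def similarity(a, b):
    """Return a 0..1 similarity score for two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

THRESHOLD = 0.8
for i in range(len(customers)):
    for j in range(i + 1, len(customers)):
        score = (similarity(customers[i][0], customers[j][0]) +
                 similarity(customers[i][1], customers[j][1])) / 2
        if score >= THRESHOLD:
            print(f"Probable match ({score:.2f}): {customers[i][0]} ~ {customers[j][0]}")
```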
Data Quality (Business Objects)
The tool allows an organization’s data users to gain easy access to Business Intelligence (BI). In fact, it provides the easy visualization and solutions required by the organization to make quick and informed decisions (Aebi & Perrochon, 2007). The BI component filters data and provides the information required for improved performance. This ability to purify data makes the platform more effective and efficient in data management (Aebi & Perrochon, 2007).
The Data Management Systems Benchmark
As indicated, various data management tools have been developed to ensure appropriate and effective governance of the organization’s data (Aebi & Perrochon, 2007). Not all the tools, however, are relevant to every organization. Therefore, organizations have to come up with a framework that allows the choice of tools appropriate for their data management. Even though each organization has its own benchmark, the most commonly applied benchmark utilizes the general data quality dimensions (Aebi & Perrochon, 2007).
Moreover, the benchmark examines various functionalities, which organizations can utilize to choose the most suitable tool for their data quality management. Essentially, the benchmark utilizes the universally accepted dimensions, including accuracy (correctness), completeness, relevancy, and consistency, in determining the appropriate tool for data management.
Completeness Dimension
This dimension identifies any missing values in the data set. Regarding completeness, columns should not have missing values and tables should not have missing tuples. In principle, greater focus is often accorded to completeness concerns when dealing with production catalogs (Batini & Scannapieco, 2006). The reason is that completeness issues normally play significant roles in performing business procedures and processes in an approved manner.
For example, to perform precise aggregation of invoices, the availability of all invoice lines is a necessity (Batini & Scannapieco, 2006). In addition, when aggregating data across large data repositories, higher levels of completeness play a fundamental role in improving the management of customer relationships (Barateiro & Galhardas, 2005).
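A minimal sketch of such a completeness measurement, assuming a small hypothetical invoice table, is shown below; it simply reports the share of non-missing values per column:

```python
# Illustrative completeness check: share of non-missing values per column
# in a fabricated invoice table (column names are assumptions).

invoices = [
    {"invoice_id": "INV-1", "customer": "Acme", "amount": 120.0},
    {"invoice_id": "INV-2", "customer": None,   "amount": 75.5},
    {"invoice_id": "INV-3", "customer": "Beta", "amount": None},
]

columns = ["invoice_id", "customer", "amount"]
for col in columns:
    filled = sum(1 for row in invoices if row.get(col) is not None)
    print(f"{col}: completeness = {filled / len(invoices):.0%}")
```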
Indeed, operational processes provide essential and candid measures of completeness. For example, in the setting of marketing databanks, the quality criteria often recognize the invaluable role played by complete information (Barateiro & Galhardas, 2005). Essentially, data quality management processes are important in controlling as well as improving completeness.
Data quality management is critical in compensating for omitted information relating to essential consumer attributes. For instance, a data quality management model known as DECLIC-L encapsulates statistical data imputation as well as mining procedures (Batini & Scannapieco, 2006). In other words, the prototype ensures effective and efficient automated control of analytical models for database improvement.
In general, complete data quality management models offer outcomes that are fundamental for analyzing the output of different prototypes along with the confidence limits of the projections (Barateiro & Galhardas, 2005). Besides, completeness of data aids end users in analyzing and cleaning data concerning an organization’s operations (Batini & Scannapieco, 2006). Moreover, completeness of data helps guarantee that projected values are provided with the corresponding numerical precision.
Accuracy Dimension
This dimension emphasizes the precise entry of data. Over the years, accuracy as a criterion of data quality management has continued to be poorly reported, because assessing the accuracy dimension is often problematic and challenging (Batini & Scannapieco, 2006). Further, the measurement of accuracy results in higher outlays. In principle, to control as well as improve accuracy, external reference data is mandatory.
For example, when evaluating recorded data against its factual equivalent via surveys, the application of external reference data is often needed (Wang et al., 2006). On the same note, the control and improvement of accuracy entail high outlays, which lead to a reduced number of verifications that can be performed, including consistency controls. For instance, personal phone numbers in France are required to commence with 01, 02, 03, or 04 to ensure consistency (Wang et al., 2006).
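The prefix rule mentioned above can be expressed as a simple automated check, as in the sketch below; the sample numbers are fabricated for illustration:

```python
# Illustrative format check: flag phone numbers that do not start with 01-04
# (sample numbers are made up for demonstration).

import re

phone_numbers = ["0145678901", "0623456789", "0298765432"]

pattern = re.compile(r"^0[1-4]\d{8}$")
for number in phone_numbers:
    status = "consistent" if pattern.match(number) else "suspicious"
    print(f"{number}: {status}")
```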
Consistency Dimension
Adherence to stipulated rules is an important factor in ensuring consistency (Aebi & Perrochon, 2007). For instance, certain international organizations require that addresses be in line with country codes. Additionally, certain rules stipulate that invoicing must correspond to electric power consumption. Consistency is thus an essential measurement for controlling the steadiness of data.
The measurement of consistency entails the definition of conventional constraints (Huang et al., 2008). Analysts then measure the proportion of records that meet the established constraints in order to deduce the extent of suspicious data (Wang et al., 2006). As such, consistency indirectly indicates accuracy. Consistency issues are critical aspects of data quality management, address standardization, and data summarization procedures.
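A hedged sketch of this constraint-based measurement, assuming a hypothetical country-code rule and fabricated records, is given below; it reports the proportion of records that satisfy the declared constraint:

```python
# Illustrative consistency measurement: share of records satisfying a declared
# constraint (here, a hypothetical approved-country-code rule).

records = [
    {"customer": "Acme",  "country_code": "FR"},
    {"customer": "Beta",  "country_code": "DE"},
    {"customer": "Gamma", "country_code": "ZZ"},  # violates the constraint
]

valid_codes = {"FR", "DE", "GB", "US"}
conforming = sum(1 for r in records if r["country_code"] in valid_codes)
print(f"Consistency: {conforming}/{len(records)} records "
      f"({conforming / len(records):.0%}) satisfy the country-code constraint")
```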
Relevancy Issues
The usefulness of data forms the core of relevancy (Agosta, 2010). In essence, the utilization of databases often involves vast masses of data. As such, identifying useful information and adapting data to user needs is always difficult. Consequently, users can develop unfavorable impressions of relevancy, leading to diminished interest in the database (Wang et al., 2006).
Indeed, relevancy is critical since it ensures that users trust the basis of the data. Normally, providers are obliged to offer assurance concerning the quality of information to users because of the expenses involved in acquiring and managing information, as well as the financial and strategic risks involved in its usage (Barateiro & Galhardas, 2005). In reality, authenticating the reliability of the information that distributors offer to market players is essential (Aebi & Perrochon, 2007).
Conclusion
Data quality management is invaluable in addressing the normalization of data, which has been an issue of concern over the years. Data quality management tools enhance optimization by providing accessible interfaces and by integrating profiling, parsing, and standardization along with cleansing and matching processes. In principle, the major focus of the data management benchmark is on the measurement and functionalities of data management systems.
References
Aebi, D., & Perrochon, L. (2007). Estimating data accuracy in a federated database environment. Information Systems and Management of Data, 21(3), 86-94.
Agosta, L. (2010). Definitions of data quality. Upper Saddle River, NJ: Prentice Hall.
Barateiro, J., & Galhardas, H. (2005). A survey of data quality tools. New York, NY: Springer.
Batini, C., & Scannapieco, M. (2006). Data quality: Concepts, methodologies and techniques. New York, NY: Springer.
Goasdoué, V., Nugier, S., Duquennoy, D., & Laboisse, B. (2013). An evaluation framework for data quality tools. Enterprise Information Systems, 4(1), 341-347.
Huang, K. T., Lee, Y. W., & Wang, R. (2008). Quality information and knowledge. Upper Saddle River, NJ: Prentice Hall.
Olson, J. E. (2006). Data quality: The accuracy dimension. Burlington, MA: Morgan Kaufmann.
Wang, R. Y., Lee, Y. W., Pipino, L. L., & Funk, J. D. (2006). Journey to data quality. Cambridge, MA: MIT Press.