By Lee Johns
A data catalog is an invaluable asset to any organization. Let’s take a look at what an effective data catalog achieves before identifying the top five reasons to implement one. Like any organization, you have growing amounts of data that are constantly changing.
New data is generated every day and is spread across multiple heterogeneous servers, virtual machines and storage systems. Depending on the size and reach of your organization, this data may be housed across multiple physical locations.
How much do you know about the nature of your data? How many copies do you have?
What are the top 100 files in terms of size? How long has it been since these files have been accessed? Are the critical files and applications protected? What files should be archived?
These and many more questions are imperative for implementing effective data management and controlling storage costs.
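To make these questions concrete, here is a minimal sketch of how a catalog could answer two of them for a single directory tree. It is only an illustration: the `FileRecord` type and the function names `build_catalog`, `largest_files` and `stale_files` are hypothetical, and a real catalog product would gather metadata across many servers rather than one local path. Note also that last-access times are only as reliable as the underlying file system, since some systems are mounted with access-time updates reduced or disabled.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    last_access: float  # seconds since the epoch, as reported by the OS


def build_catalog(root):
    """Walk `root` and collect one metadata record per file found."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # unreadable or vanished file: skip it
            records.append(FileRecord(full, st.st_size, st.st_atime))
    return records


def largest_files(records, n=100):
    """Return the `n` largest files, biggest first."""
    return sorted(records, key=lambda r: r.size_bytes, reverse=True)[:n]


def stale_files(records, max_idle_days, now=None):
    """Return files whose last access is older than `max_idle_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    return [r for r in records if r.last_access < cutoff]
```

Calling `largest_files(build_catalog("/data"), n=100)` would list the top 100 files by size under a hypothetical `/data` directory, and `stale_files(records, 365)` would list the files that have not been accessed in roughly a year.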
“A catalog should provide you with an accurate and comprehensive view of your data across the organization. It should provide you with a searchable repository of metadata for improved data management which leads to increased business efficiency.”
Here are the top five reasons you should be deploying a data catalog:
Visibility – You cannot manage what you cannot measure! A global catalog of data enables a better understanding of how data flows within your business. This facilitates better decision making. You should be able to BROWSE, SEARCH and QUERY data across your entire organization, similar to managing a database.
Achieving visibility across your data enables you to build a dashboard of elements to monitor and improve data management. There are many reasons to query your catalog. I have provided a few examples below:
Do I have data that is not being backed up?
Do I have data that has not been accessed for years and is not covered by my data retention policy?
How many VMs do I have across my IT environment and how fast are they expanding?
How many copies of a particular file do I have and where are they located?
Who has access to my secure files and is there any data leakage across the company?
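If the catalog metadata lives in a database, questions like these become ordinary queries. The sketch below uses Python's built-in `sqlite3` module with an in-memory table. The `files` table and its columns (`path`, `host`, `size_bytes`, `last_backup`, `last_access`) are an assumed schema invented for illustration, not the schema of any particular catalog product. Dates are stored as ISO 8601 strings so that plain string comparison orders them correctly.

```python
import sqlite3


def open_catalog(rows):
    """Create an in-memory catalog table and load `rows` into it.

    Each row is (path, host, size_bytes, last_backup, last_access);
    last_backup is None for a file that has never been backed up.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE files (
               path TEXT, host TEXT, size_bytes INTEGER,
               last_backup TEXT, last_access TEXT)"""
    )
    conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def not_backed_up(conn):
    """Do I have data that is not being backed up?"""
    cur = conn.execute(
        "SELECT host, path FROM files "
        "WHERE last_backup IS NULL ORDER BY host, path"
    )
    return cur.fetchall()


def idle_since(conn, cutoff_date):
    """Which files have not been accessed since `cutoff_date` (YYYY-MM-DD)?"""
    cur = conn.execute(
        "SELECT host, path FROM files "
        "WHERE last_access < ? ORDER BY host, path",
        (cutoff_date,),
    )
    return cur.fetchall()
```

The same pattern extends to the other questions in the list, for example grouping by a content hash to count copies of a file, provided the catalog records that information.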
Utilization – How efficiently is your data being stored? It is quite possible that the largest drain on your storage capacity is an excess of copy data.
Many primary storage and backup products provide deduplication functionality. However, they tend to deduplicate and store data within silos. You may have deduplicated data on one storage system but still have unwanted duplicates residing on another.
Do you have data on primary storage that should be in an offline archive? Not effectively migrating data through its lifecycle is another way to waste expensive primary storage. Creating an enterprise data catalog enables you to query for duplicate files and locate them across your organization. With an enterprise data catalog you can eliminate unnecessary files, free up capacity and reduce storage costs. You can also query files by age and by when they were last accessed to determine which data should be protected and archived.
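As a rough illustration of finding duplicates across storage silos, the sketch below treats each silo as a directory root and groups files by a SHA-256 hash of their content. The function names `file_digest` and `find_duplicates` are hypothetical. Real products may identify copies differently, and hashing every file in a large environment is expensive, which is why this sketch first groups by file size and only hashes files that share a size with at least one other file.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Map content digest -> sorted paths, for content found more than once."""
    # Pass 1: group by size, since files of different sizes cannot match.
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                try:
                    by_size[os.path.getsize(full)].append(full)
                except OSError:
                    continue  # unreadable or vanished file: skip it

    # Pass 2: hash only the files that share a size with another file.
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for p in paths:
            try:
                by_digest[file_digest(p)].append(p)
            except OSError:
                continue
    return {d: sorted(ps) for d, ps in by_digest.items() if len(ps) > 1}
```

Passing several roots, for example one mount point per storage system, surfaces copies that per-system deduplication would not see because each system only examines its own data.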
Security – Is your data properly protected? Is it housed securely within the right location? Do the right people have access to it? These are common concerns for IT departments.
What if you could receive a simple report showing all of your unprotected files? What if you could easily determine whether any copy data has leaked onto unauthorized network segments?
With increasing security concerns, a data catalog is a necessary tool to add to your security monitoring solutions. If the NSA can have documents stolen by a contractor, it can happen to anyone. Eliminating data leakage improves data security.
Compliance – Each industry has its own special compliance needs and regulations, which impact many aspects of your data.
Most industries face strict government regulation, such as HIPAA in healthcare, FDIC and SIPC requirements in banking and finance, and Sarbanes-Oxley (SOX) for public corporations.
How long should you keep various types of data? Who should have access to different classes of data? How well protected is your data? How quickly can you retrieve your data for eDiscovery? Which data should be destroyed and when? What are your data privacy requirements?
A data catalog becomes a requirement to ensure compliance and governance.
Efficiency – You may already have processes in place to handle security and compliance, but how much easier would those processes be if you had a global data catalog?
The more efficiently you can profile, collate, manage, browse, search, query and report on your data, the more time you can spend on other projects that drive your business forward and improve productivity.
There are real cost savings from implementing an enterprise cataloging solution as part of your IT data protection and data management strategy.
Is it time for you to get your data under control?
Creating an enterprise catalog of your primary and copy data is a great place to start!