How to Manage Unstructured Data

Most organisations approach unstructured data management the same way someone who is in financial trouble approaches their bank statements… they don’t.

Or worse, they’ll look at one credit card statement, feel good about that balance, and ignore their other five cards, car payments, and the personal loan they took out in 2019 that they’ve been pretending doesn’t exist.

If this sounds familiar (and I say this as someone who has worked their way out of debt), the analogy translates almost perfectly. Swap “credit cards” for “storage platforms” and you’ve described enterprise data management in 2025.

The Whole Picture Problem

Here’s how most organisations “manage” their unstructured data:

The NetApp team logs into a NetApp dashboard. Everything looks OK. Storage is at 60% utilisation; performance metrics are green. Job done.

The cloud team checks AWS. S3 buckets are within budget. No alerts. Looking good.

Someone in IT occasionally glances at the NAS. It’s full again, but that’s always the case. Next budget cycle, ask for money to buy more capacity.

Meanwhile, in shadow IT land… Marketing has put Box on a credit card. Sales has adopted Dropbox without telling anyone. And R&D has a huge Dell PowerScale cluster that nobody outside their department even knows about.

Each silo looks manageable in isolation. But nobody looks at the whole picture.

This is the fundamental challenge when managing unstructured data. You can’t manage what you can’t see. Five separate dashboards for five separate storage platforms isn’t visibility – it’s compartmentalisation.

Managing Data Like Managing Your Finances

Let’s define a better framework for thinking about unstructured data management.

It’s exactly like managing your personal finances.

Stick with me here.

When you’re trying to get your financial house in order, you don’t just look at the biggest bank account and call it a day. You need a complete picture:

All bank accounts
All credit cards
All loans
All subscriptions
All regular expenses
All that random stuff you’re paying for and forgot about

Only then can you understand your financial reality and make sensible decisions from there.

Unstructured data management works in the same way. We need visibility across:

All on-premises storage (NAS, SAN, DAS)
All cloud object storage (AWS, Azure, Google Cloud)
All SaaS storage platforms (Box, Dropbox, OneDrive, SharePoint)
All departmental storage that IT doesn’t officially know about
All the archive data that is sat in places you forgot existed

Without a complete picture, we’re just rearranging deck chairs on the Titanic.

Step One: Getting Out of Debt (Identifying ROT Data)

In personal finance, step one of financial health is understanding your debts. What do you owe? Who to? What’s the interest rate?

In data management, our biggest “debt” is ROT data – Redundant, Obsolete, and Trivial files sat on expensive storage.

Industry research suggests that 20-40% of enterprise storage is ROT data. At petabyte scale (and we covered what that looks like in our previous post <link>), that’s not just inefficient – it’s reckless.

For a one-petabyte estate, that could be 200-400 terabytes of data debt; at $1,000/TB, that’s $200,000 to $400,000 of real money. Duplicated files scattered across systems. Projects from three years ago that nobody will reference again. Marketing assets that have been superseded multiple times.
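The arithmetic behind that range is simple enough to sketch. This assumes 1 PB is counted as 1,000 TB (decimal units) and treats the $1000/TB figure as one blended cost; in reality per-TB costs differ by platform and tier. The function name `rot_cost` is illustrative.

```python
# Rough "data debt" estimate using the figures from the text.
# Assumptions: 1 PB = 1,000 TB, and $1000/TB is a single blended cost.
ESTATE_TB = 1_000
COST_PER_TB = 1_000  # USD, illustrative figure from the text


def rot_cost(rot_fraction: float) -> tuple[float, float]:
    """Return (ROT terabytes, ROT cost in USD) for a given ROT share."""
    rot_tb = ESTATE_TB * rot_fraction
    return rot_tb, rot_tb * COST_PER_TB


for share in (0.20, 0.40):
    tb, usd = rot_cost(share)
    print(f"{share:.0%} ROT -> {tb:,.0f} TB, ${usd:,.0f}")
```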

Just like high-interest credit cards, ROT data is expensive. It wastes storage capacity, lengthens backup windows, increases backup costs and compounds your infrastructure spending.

The easy wins come from identifying and addressing ROT data:

Duplicates – The same files living in different locations because everyone has saved their own copy. That’s like paying an electricity bill multiple times because you forgot you’d already paid it.

Obsolete data – Projects that shipped years ago with raw assets that are still sat on premium storage. This is the digital equivalent of paying insurance on a car you sold in 2019.

Trivial data – Temporary files, cache files, logs that should have been cleaned up automatically but weren’t. They’re subscription services that you forgot you signed up for.

The goal here is quick wins.

Identify duplicates
Archive or delete obsolete data
Clean up the trivial stuff

Within a short space of time, you can often free up 20-30% of your storage capacity without buying anything new.
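A first pass at spotting redundant, obsolete and trivial files can be automated. The sketch below walks a single mounted filesystem and buckets candidates; it flags files and deletes nothing. The three-year obsolescence threshold, the list of "trivial" suffixes, and the use of modification time rather than access time (many filesystems are mounted with relatime or noatime, so access times can be unreliable) are all assumptions for illustration, and `classify_rot` is a hypothetical name.

```python
import hashlib
import os
import time
from collections import defaultdict
from typing import Dict, List, Optional

# Illustrative assumptions: tune these to your own retention rules.
OBSOLETE_AFTER_DAYS = 3 * 365
TRIVIAL_SUFFIXES = (".tmp", ".log", ".cache", "~")


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash file contents in chunks so large files don't exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_rot(root: str, now: Optional[float] = None) -> Dict[str, List[str]]:
    """Bucket candidate ROT files under one directory tree (flags only)."""
    now = time.time() if now is None else now
    result: Dict[str, List[str]] = {"redundant": [], "obsolete": [], "trivial": []}
    by_size: Dict[int, List[str]] = defaultdict(list)

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(TRIVIAL_SUFFIXES):
                result["trivial"].append(path)
                continue
            st = os.stat(path)
            # Modification time used as a proxy for "last used" (assumption).
            if (now - st.st_mtime) / 86_400 > OBSOLETE_AFTER_DAYS:
                result["obsolete"].append(path)
            by_size[st.st_size].append(path)

    # Only files that share a size can be identical, so hash just those.
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        by_hash: Dict[str, List[str]] = defaultdict(list)
        for path in same_size:
            by_hash[sha256_of(path)].append(path)
        for copies in by_hash.values():
            # Keep the first copy alphabetically; the rest are redundancy candidates.
            result["redundant"].extend(sorted(copies)[1:])
    return result
```

Across a whole estate, the same logic would be fed listings from every platform rather than one mount, which is where cross-platform duplicates become visible.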

Step Two: Optimising Regular Expenses (Data Placement)

Once you’ve tackled the obvious debts, the next step in financial management is to manage your regular expenses. Are you paying premium prices for things that have cheaper alternatives?

In data management, this is data placement across your existing storage estate.

Data ages. Its value and access patterns change. Most organisations keep everything on the same premium storage tier because moving it is complicated or because vendor-specific lifecycle tools only work within that vendor’s storage ecosystem.

This is like paying for a gym membership you use twice a year because cancelling it seems complicated.

Ask yourself:

Should three-year-old project data be sat on your fastest NAS filesystem?
Should email archives of previous employees consume the same storage tier as today’s active users?
Are you paying for premium cloud storage when the data only gets accessed on a quarterly basis?

Using your existing storage estate correctly means matching data to the right tier and storage class based on actual access patterns. Not just dumping it wherever there’s some free space.

Active data gets premium storage with instant access. Ageing data moves to capacity-optimised storage. Archive data goes to the cheapest tier that still meets retrieval requirements.

It’s the same principle as our warehouse analogy <link>: hot products at the front, cold inventory at the back. Don’t buy new warehouse space until you’ve organised what you have intelligently.
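That rule of thumb can be written down as a small placement policy. The tier names ("performance", "capacity", "archive"), the 90-day and 365-day thresholds, the file paths and the `FileRecord`/`placement_plan` names below are all hypothetical; the point is that the target tier is derived from observed access age rather than from wherever free space happens to be.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FileRecord:
    path: str
    tier: str               # where the file lives today
    days_since_access: int  # from access logs or a metadata scan


# Hypothetical policy: (maximum age in days, tier), hottest tier first.
POLICY = [(90, "performance"), (365, "capacity")]
ARCHIVE_TIER = "archive"  # anything older than every limit above


def target_tier(days_since_access: int) -> str:
    """Return the first tier whose age limit the file still falls within."""
    for max_age, tier in POLICY:
        if days_since_access <= max_age:
            return tier
    return ARCHIVE_TIER


def placement_plan(files: List[FileRecord]) -> List[Tuple[str, str, str]]:
    """List (path, current tier, target tier) for files on the wrong tier."""
    return [
        (f.path, f.tier, target_tier(f.days_since_access))
        for f in files
        if target_tier(f.days_since_access) != f.tier
    ]


if __name__ == "__main__":
    estate = [
        FileRecord("/nas/projects/2022/final_render.mov", "performance", 1_100),
        FileRecord("/nas/active/spec.docx", "performance", 3),
        FileRecord("/nas/finance/q1_close.xlsx", "performance", 120),
    ]
    for path, current, target in placement_plan(estate):
        print(f"{path}: {current} -> {target}")
```

Producing the plan as a report before anything moves keeps the policy easy to review.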

Step Three: Forecasting and Modelling (Planning Your Future)

This is where data management gets interesting, and where it becomes strategic rather than just reactive.

In personal finance, once you’ve tackled debt and optimised expenses, you start modelling future scenarios:

What if we pay an extra £500 toward the mortgage each month?
When could we afford that kitchen renovation?
How much do we need to save for university fees?

Data management needs the same forward-thinking approach.

Scenario modelling should answer questions like:

What happens to our storage costs if data continues growing at its current rate?
How much could we save by implementing automated tiering policies?
What’s the cost difference between keeping everything on-premises versus moving aged data to an on-premises or cloud object archive?
If we migrate this project to object storage, what’s the total real-world cost, including egress?

And this is where most organisations hit a wall. Storage vendor dashboards can’t do cross-platform modelling. Cloud cost calculators don’t account for on-premises infrastructure. Nobody gives you a unified view that shows what your data estate will look like – and cost – in different scenarios.

It’s like trying to plan your financial future using five different banking apps that don’t talk to each other and a spreadsheet that you hope is accurate.
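Once capacity and prices sit in one place, the first few scenario questions come down to compound growth and a blended per-TB price. A minimal sketch: every input below (25% annual growth, $1,000 and $200 per TB per year, a 30% hot share, a $50,000 one-off charge standing in for migration effort and egress) is hypothetical, and the model assumes the hot/cold split stays constant as the estate grows.

```python
from typing import List


def project_capacity(start_tb: float, annual_growth: float, years: int) -> List[float]:
    """Estate size at the end of each year, assuming steady compound growth."""
    return [start_tb * (1 + annual_growth) ** y for y in range(1, years + 1)]


def scenario_cost(
    start_tb: float,
    annual_growth: float,
    years: int,
    hot_share: float,
    hot_price: float,
    cold_price: float,
    one_off: float = 0.0,
) -> float:
    """Total spend over the horizon: yearly storage cost plus any one-off cost."""
    blended = hot_share * hot_price + (1 - hot_share) * cold_price  # per TB per year
    yearly = [tb * blended for tb in project_capacity(start_tb, annual_growth, years)]
    return one_off + sum(yearly)


if __name__ == "__main__":
    # Hypothetical inputs: 1,000 TB today, 25% growth, 3-year horizon.
    status_quo = scenario_cost(1_000, 0.25, 3, hot_share=1.0, hot_price=1_000, cold_price=200)
    tiered = scenario_cost(
        1_000, 0.25, 3, hot_share=0.3, hot_price=1_000, cold_price=200, one_off=50_000
    )
    print(f"Everything on premium:   ${status_quo:,.0f}")
    print(f"Tiered, incl. migration: ${tiered:,.0f}")
    print(f"Difference:              ${status_quo - tiered:,.0f}")
```

Changing the growth rate, hot share or one-off cost is how each “what if” becomes a number you can compare side by side.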

The Tool You Actually Need

So what does unstructured data management really need?

Platform Agnostic Visibility

You need to see everything. Not just the storage platforms IT officially supports, but your entire data estate. Every filesystem, every cloud bucket, every SaaS platform, every dark corner where data lives.

If a data management tool is sold by your storage vendor, chances are it’s not giving you the full picture. It’s giving you their picture.

Business-Relevant Intelligence

Raw metadata isn’t enough. You need intelligence that maps to business decisions:

Age and access patterns (is data hot or cold?)
Duplication analysis (are we paying to store the same thing multiple times?)
Cost attribution (which department or project is consuming expensive storage?)
Compliance categorisation (what data has retention requirements?)

This is about building the picture of your data estate in business terms, not just technical storage metrics.
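Cost attribution, for example, is essentially a join between an inventory and a price list. In the sketch below the inventory rows, department names and monthly per-TB prices are all hypothetical; in practice the rows would come from a cross-platform scan and the prices from your own contracts or cloud bills.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical monthly prices per TB for each tier; substitute your own rates.
PRICE_PER_TB_MONTH = {"performance": 40.0, "capacity": 15.0, "archive": 2.0}


def attribute_costs(rows: Iterable[Tuple[str, str, float]]) -> List[Tuple[str, float]]:
    """Sum monthly storage cost per department, most expensive first.

    Each row is (department, tier, size_tb).
    """
    totals: Dict[str, float] = defaultdict(float)
    for department, tier, size_tb in rows:
        totals[department] += size_tb * PRICE_PER_TB_MONTH[tier]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    inventory = [
        ("Marketing", "performance", 120),
        ("R&D", "performance", 400),
        ("R&D", "archive", 900),
        ("Sales", "capacity", 50),
    ]
    for department, monthly in attribute_costs(inventory):
        print(f"{department:<10} ${monthly:,.0f}/month")
```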

Actionable Capabilities

Visibility without action is just expensive reporting. Your data management platform needs to let you do something with the insights it’s collecting:

Move data between any storage platforms (not just within one vendor’s ecosystem)
Implement automated policies based on business rules
Archive or delete ROT data with proper validation
Migrate projects between storage tiers as they age

The key word is “between.” If your tool only works within a single vendor’s infrastructure, you’re locked in and your choices are limited.
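“Archive or delete ROT data with proper validation” usually means never removing the source until the copy has been verified. The sketch below shows that copy, verify, then delete pattern between two locally mounted paths. A real move between platforms would use each platform’s own SDK or transfer tool in place of `shutil`, SHA-256 comparison is just one choice of integrity check, and `archive_with_validation` is a hypothetical name.

```python
import hashlib
import os
import shutil


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash file contents in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_with_validation(src: str, archive_dir: str) -> str:
    """Copy src into archive_dir, verify the copy, and only then delete src."""
    os.makedirs(archive_dir, exist_ok=True)
    dest = os.path.join(archive_dir, os.path.basename(src))
    if os.path.exists(dest):
        # Refuse to overwrite: an existing archive copy needs a human decision.
        raise FileExistsError(dest)
    shutil.copy2(src, dest)  # copy2 also copies timestamps where the OS allows
    if sha256_of(src) != sha256_of(dest):
        os.remove(dest)  # discard the bad copy; the source is untouched
        raise OSError(f"checksum mismatch while archiving {src}")
    os.remove(src)
    return dest
```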

Cost Modelling and Forecasting

Before moving a single byte, you should be able to see:

The cost of different data placement scenarios
Your storage trajectory based on current growth patterns
The ROI of different optimisation strategies
The budget impact of policy changes

Remember the “what if” scenarios from personal finance? Same thing. Your CFO needs to understand what different strategies will cost before committing budget to them.
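For the ROI question, even a simple undiscounted calculation makes options comparable. The strategy names, upfront costs and annual savings below are hypothetical placeholders, and `Strategy` is an illustrative name; a fuller model would discount future savings and take its figures from a growth forecast rather than fixed numbers.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    upfront_cost: float   # tooling, migration effort, egress, etc.
    annual_saving: float  # reduction in yearly storage spend

    def roi(self, years: int) -> float:
        """Simple undiscounted ROI over the horizon: net gain divided by upfront cost."""
        return (self.annual_saving * years - self.upfront_cost) / self.upfront_cost

    def payback_months(self) -> float:
        """Months of savings needed to recover the upfront cost."""
        return 12 * self.upfront_cost / self.annual_saving


if __name__ == "__main__":
    # Hypothetical options, for illustration only.
    options = [
        Strategy("Delete ROT", upfront_cost=20_000, annual_saving=150_000),
        Strategy("Tier aged data", upfront_cost=80_000, annual_saving=300_000),
    ]
    for s in sorted(options, key=lambda s: s.roi(3), reverse=True):
        print(f"{s.name:<15} 3-yr ROI {s.roi(3):.0%}  payback {s.payback_months():.1f} months")
```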

The Bottom Line: It’s All Connected

Managing unstructured data isn’t a set of separate problems. It’s one holistic challenge that requires you to:

Understand what you have (the complete financial picture)
Identify quick wins by eliminating waste (paying off high-interest debt)
Optimise placement across your existing infrastructure (reducing regular expenses)
Model future scenarios before making decisions (financial planning)

Organisations that are winning at data management in 2025 aren’t the ones with the most expensive storage platforms or the biggest cloud spend. They’re the ones who have visibility, intelligent automation, and the ability to make data placement decisions based on business value rather than their vendor’s convenience.

Storage vendors will keep selling you more capacity. Cloud providers will happily take more subscription money. Neither of them will help you understand whether you’re actually using what you already have effectively.

You need a data management platform that lets you see everything, build a relevant picture of age and usage patterns, take action across any storage platform, and forecast what consumption and costs look like under different scenarios.

This isn’t just a dashboard. It’s not a vendor-specific feature. It’s a proper platform designed to manage unstructured data as a business asset rather than just an IT problem.

Because in 2025, the question isn’t how much storage you have. It’s whether you’re managing it like someone who understands their complete financial picture – or like someone juggling credit cards and other debts who checks only one bank statement and hopes for the best.