For data, the term “unstructured” is unfortunate. It’s a legacy label from the world of databases and IT that makes an often-cited 80% of enterprise data sound like chaos. But here’s what makes it fascinating: unstructured data isn’t just structured, it’s structured in layers. And each layer is a different conversation waiting to happen.
Think of it as a geological survey. You don’t drill straight to the core. The surface tells you one thing. The strata tell you more. The deeper you go, the richer the deposits you find. But even the topsoil has value if you know how to read it.
Start at the outermost layer: the folder. Folders are for humans. They carry project names, department codes, job numbers, client identifiers, task descriptions. None of this lives inside the file itself; it’s encoded in the path. A well-organised folder hierarchy is a business taxonomy written by the people closest to the work. It’s free metadata, hiding in plain sight.
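To make that concrete, here is a minimal sketch of reading a path as metadata. The folder convention (client, then project, then department, then task), the `LEVELS` names and the example path are all assumptions for illustration; a real hierarchy would need its own mapping.

```python
from pathlib import PurePosixPath

# Hypothetical convention: <root>/<client>/<project>/<department>/<task>/<file>
LEVELS = ("client", "project", "department", "task")


def path_metadata(path: str, root: str) -> dict[str, str]:
    """Map each folder level beneath `root` to a named field."""
    relative = PurePosixPath(path).relative_to(root)
    folders = relative.parts[:-1]  # everything except the filename
    # zip stops at the shorter sequence, so shallow paths yield fewer fields
    return dict(zip(LEVELS, folders))


if __name__ == "__main__":
    example = "/mnt/projects/acme/J1042_launch/vfx/comp/shot_010.exr"
    print(path_metadata(example, "/mnt/projects"))
```

Nothing here opens the file. The only input is the path string, which is why this layer is so cheap to harvest.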
Drop one level to the file and its filesystem attributes. Filenames carry context, sometimes cryptic, sometimes descriptive. And the filesystem itself records access time, modification time, creation time (where the platform tracks it), size and ownership. These aren’t just housekeeping details. They’re behavioural signals. A file last accessed three years ago tells a different financial story than one touched yesterday, even if they’re sitting on the same tier of storage.
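As a rough sketch of turning those attributes into signals, the snippet below reads them with Python's `os.stat`. The function names and the one-year "cold" threshold are assumptions for illustration, not a recommended policy. Access times are also only as trustworthy as the mount options allow: volumes mounted with `noatime` or `relatime` update them rarely or never.

```python
import os
import time

SECONDS_PER_DAY = 86400


def file_signals(path, now=None):
    """Collect size, owner and age signals without opening the file."""
    now = time.time() if now is None else now
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,
        "owner_uid": st.st_uid,  # always 0 on Windows
        "days_since_access": (now - st.st_atime) / SECONDS_PER_DAY,
        "days_since_modified": (now - st.st_mtime) / SECONDS_PER_DAY,
    }


def is_cold(path, threshold_days=365, now=None):
    """Hypothetical rule: a file is 'cold' if not accessed within the threshold."""
    return file_signals(path, now)["days_since_access"] >= threshold_days
```

Creation time is deliberately left out. On Unix, `st_ctime` records the last metadata change rather than creation, so it needs platform-specific handling.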
Go deeper still into the file’s header, and things get really interesting. Media files in particular are extraordinarily generous here. Camera make and model, lens metadata, timecode, frame rate, colour space, codec settings: all readable without needing to decode the essence of the file. It’s like reading the label on a bottle without opening it. For anyone building visibility into storage, headers are the highest-value, lowest-cost extraction point in the entire stack.
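A small illustration of reading the label without opening the bottle: the sketch below pulls dimensions, bit depth and colour type from a PNG by reading only its first 33 bytes. PNG is chosen purely because its header is simple and well documented. Camera and codec metadata in professional media formats sits in richer structures that need format-specific parsers. The function name is an assumption for illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Colour type codes defined by the PNG specification
COLOUR_TYPES = {
    0: "greyscale",
    2: "truecolour",
    3: "indexed",
    4: "greyscale with alpha",
    6: "truecolour with alpha",
}


def read_png_header(path):
    """Read image properties from the IHDR chunk; pixel data is never touched."""
    with open(path, "rb") as f:
        # signature (8) + chunk length (4) + chunk type (4) + IHDR data (13) + CRC (4)
        head = f.read(33)
    if len(head) < 29 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Big-endian: width and height as 4-byte unsigned ints, then two single bytes
    width, height, bit_depth, colour_type = struct.unpack(">IIBB", head[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "colour_type": COLOUR_TYPES.get(colour_type, "unknown"),
    }
```

The cost is the same whether the image is a few kilobytes or several gigabytes, which is the economic point about headers.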
Then there’s the essence itself: the meat in the sandwich. This is the actual image data, the audio waveforms, the rendered video. Reading this layer takes real compute. But the potential payoff is high: scene detection, speech-to-text and object recognition all need the essence. We’re only beginning to scratch the surface of what’s extractable here, and the economics of doing so improve with every new AI model.
What makes unstructured data cool isn’t any single layer. It’s the fact that value exists at every depth. Most organisations are only skimming the surface, if they’re looking at all. There’s a whole discipline emerging around reading these layers systematically. Honestly, it’s a great time to nerd out about storage.
