What is the difference between Diskover and CloudSoda?

Diskover acquired CloudSoda in June 2025 as part of a major expansion that also included $7.5 million in seed funding and partnerships with Snowflake and NetApp.

https://diskoverdata.com/about/press-releases/20250617-seed-round-snowflake-netapp-acquisition-cloudsoda

The acquisition brings together complementary strengths: Diskover’s enterprise-scale capabilities and CloudSoda’s user-friendly simplicity.

Below we’ll describe the key similarities and differences between Diskover and CloudSoda to help you choose the best unstructured data management solution for your needs.

Common Features between Diskover and CloudSoda

Storage Support – Diskover and CloudSoda support multiple storage types including on-prem filesystems, cloud storage (S3, Azure) and storage platforms such as Dropbox
Data Search & Discovery – Diskover and CloudSoda let you search across all unstructured data on any indexed storage system
Cost Analysis – Both offer cost analytics features but the way data is presented differs between systems
Duplicate Detection – Diskover and CloudSoda can identify duplicate files between storage systems
User Management & Permissions – Both use role-based access control (RBAC) to govern who can log in and what they can do in the system
API Access – Both have APIs that allow varying levels of integration with third-party systems
Reporting Engine – Both offer reporting capabilities for analysis of unstructured data
SSO Integration – Both support single sign-on
Agent-Based Architecture – Diskover and CloudSoda use an agent-based architecture for scanning and indexing storage systems
Secure Metadata Indexing – Both platforms index file and folder metadata for fast searching and analytics without any data leaving the storage environment

Key Differences between Diskover and CloudSoda

Scalability

CloudSoda is perfect for unstructured datasets of up to 250 million files.
Diskover is designed for unstructured data environments with more than 250 million files (scaling to 2 billion files or more), with many cluster deployment options

Data Movement

CloudSoda is a combined analytics and data orchestration solution, with manual and automated transfers and other data movement features
Diskover is the core of any large-scale unstructured data management solution. It supports many integrations, and its plugin system enables it to be extended or integrated with more than 20 file movement solutions

Data Enrichment

CloudSoda has basic filesystem metadata collection and tagging functionality that lets you drive data management workflows
Diskover has an extensible plugin architecture as part of its scanning system, with plugins for extensive metadata enrichment tailored to industry-specific use cases

Analytics Depth

CloudSoda provides easy-to-understand reporting that focuses on storage costs and duplicates, enabling users to take action on their data quickly and see a tangible reduction in the cost of storing it
Diskover provides more extensive analytics tools such as Heatmaps, Treemaps, DirectorySmart Searches, along with a built-in Lucene query system, to provide deeper analytics of unstructured data and of additional metadata that has been harvested using its plugin system

Should I choose Diskover or CloudSoda?

While we expect in the long term that Diskover and CloudSoda will be merged into a single platform, with options for smaller or larger datasets and different levels of functionality, the decision really comes down to one question…

Do you think you have more than 250 million files?

There are a few more considerations in the table below, but file count is a pretty good measure.

CloudSoda	Diskover
< 250 million files	> 250 million files
I can manage my unstructured data using standard filesystem metadata and tags	I need custom metadata extraction AND filesystem metadata to drive my data management automation
I want a one-stop solution to provide analytics and data movement capabilities	I need custom metadata extraction AND filesystem metadata to drive my data management automation
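
If you are not sure which side of the 250 million file mark you fall on, a rough count can help. The sketch below is only an illustration and is not part of either product: the function names count_files and suggest_product are hypothetical, and the only value taken from the comparison above is the 250 million threshold. It walks a directory tree, counts regular files, and applies the file-count rule of thumb.

```python
import os

# Threshold taken from the comparison above; everything else is illustrative.
FILE_THRESHOLD = 250_000_000


def count_files(root: str) -> int:
    """Count regular files under root without following symlinks."""
    total = 0
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        total += 1
        except OSError:
            # Skip directories that cannot be read or that vanish mid-scan.
            continue
    return total


def suggest_product(file_count: int, threshold: int = FILE_THRESHOLD) -> str:
    """Apply the file-count rule of thumb (hypothetical helper)."""
    return "Diskover" if file_count > threshold else "CloudSoda"


if __name__ == "__main__":
    n = count_files(".")
    print(f"{n} files found, rule of thumb suggests {suggest_product(n)}")
```

A single-threaded walk like this will be slow on very large filesystems, so treat it as a way to sample a share or project directory rather than to scan an entire estate. File count is also only one of the considerations in the table above.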