CloudSoda Data Transfer Flow

How Data Moves Between Storage Systems

Moving data between storage systems sounds simple in theory. However, most organisations hit the same problems every time. Transfers take longer than expected. Costs surprise everyone after the fact. And nobody knows whether the data arrived intact until someone manually checks.

CloudSoda eliminates these problems with a structured six-step transfer flow. Every job, whether it moves 10 GB or 50 TB, follows the same process. As a result, you always know what a transfer will cost, how long it will take and whether the data arrived complete.


1️⃣ Job Creation

Every transfer starts in the CloudSoda UI. You select a source storage and a destination storage, then configure your job parameters. Specifically, you define which paths to include and how to handle conflicts. Options include overwrite, rename, skip, or overwrite only when the source file is newer.
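The four conflict options can be read as a small decision function. The sketch below is illustrative only: the names `ConflictPolicy`, `FileInfo` and `resolve_conflict`, and the `.copy` rename scheme, are assumptions made for this example and are not CloudSoda's actual API or naming.

```python
import posixpath
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ConflictPolicy(Enum):
    OVERWRITE = "overwrite"
    RENAME = "rename"
    SKIP = "skip"
    OVERWRITE_IF_NEWER = "overwrite_if_newer"


@dataclass(frozen=True)
class FileInfo:
    path: str
    mtime: float  # modification time, seconds since the epoch


def resolve_conflict(source: FileInfo, existing: Optional[FileInfo],
                     policy: ConflictPolicy) -> Optional[str]:
    """Return the destination path to write to, or None to skip the file."""
    if existing is None:
        return source.path  # nothing at the target, so no conflict
    if policy is ConflictPolicy.OVERWRITE:
        return existing.path
    if policy is ConflictPolicy.SKIP:
        return None
    if policy is ConflictPolicy.RENAME:
        # Hypothetical rename scheme: report.pdf becomes report.copy.pdf
        root, ext = posixpath.splitext(existing.path)
        return f"{root}.copy{ext}"
    # OVERWRITE_IF_NEWER: replace only when the source is strictly newer
    return existing.path if source.mtime > existing.mtime else None
```

Returning `None` for a skipped file keeps the caller simple: anything with a path gets written, everything else is counted as skipped.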

On top of that, you can pin specific agents to the job. By default, CloudSoda delegates to the best available agents automatically. However, manual pinning gives you full control for sensitive workloads or specific network routing needs.

The controller’s job engine then validates the configuration. At this point, nothing moves — the job simply exists as a definition ready to execute.


2️⃣ Preview and Dry Run

Before any data moves, CloudSoda models the complete cost. This capability prevents the surprise egress bills that derail storage budgets.

The dry run calculates three cost components. First, it estimates egress fees from the source. This matters most when moving data out of AWS, Azure or Google Cloud, where providers charge per gigabyte. Second, it calculates ingress costs at the destination. Third, it projects ongoing monthly storage charges at the new location.

As a result, you see one-time migration costs and recurring costs side by side. You can then compare multiple options — for example, S3 Standard versus S3 Glacier versus Azure Cool. This makes it easy to choose the best balance of accessibility and cost.
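To make the three cost components concrete, here is a minimal sketch of the arithmetic. The `Pricing` class, the `estimate_costs` function and every price in it are placeholders invented for illustration; they are not real provider rates and not CloudSoda's cost model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-GB prices in a single currency. All values used below are placeholders."""
    egress_per_gb: float         # charged once, leaving the source
    ingress_per_gb: float        # charged once, arriving at the destination
    storage_per_gb_month: float  # charged every month at the destination


def estimate_costs(size_gb: float, pricing: Pricing) -> dict:
    """Split a migration into its one-time and recurring parts."""
    one_time = size_gb * (pricing.egress_per_gb + pricing.ingress_per_gb)
    monthly = size_gb * pricing.storage_per_gb_month
    return {"one_time": round(one_time, 2), "monthly": round(monthly, 2)}


# Hypothetical price points for two destination options
options = {
    "hot tier": Pricing(0.10, 0.0, 0.02),
    "archive tier": Pricing(0.10, 0.0, 0.005),
}

for name, pricing in options.items():
    print(name, estimate_costs(10_000, pricing))
```

Keeping one-time and monthly figures separate is what allows a side-by-side comparison: two options with the same migration cost can differ widely in what they cost to keep.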

CloudSoda also estimates transfer duration. It factors in your network conditions, dataset size and file composition. Consequently, you can book maintenance windows with confidence rather than guessing whether a migration will finish over the weekend.


3️⃣ Source Scan

Once you approve the job, the controller dispatches it to the nearest source agent. That agent scans the source through the appropriate accessor. The SMB accessor handles Windows shares, and the local-path accessor handles NFS mounts. The S3 SDK handles cloud buckets, while Dell PowerScale connects via its own API.

During the scan, the agent builds a complete file manifest. It captures every file path, size, modification time and relevant attribute. Multithreaded scanning keeps this fast: a well-provisioned agent indexes up to 25,000 files per second. At that peak rate, 10 million files take about seven minutes to catalogue, so even storage systems with tens of millions of files are indexed in minutes rather than hours.
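As a rough illustration of what a multithreaded manifest scan involves, the sketch below walks a local directory tree with a thread pool and records path, size and modification time. It uses only the Python standard library; `ManifestEntry`, `scan_dir` and `build_manifest` are hypothetical names, and this is not how CloudSoda's agent is implemented.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestEntry:
    path: str
    size: int      # bytes
    mtime: float   # modification time, seconds since the epoch


def scan_dir(directory: str):
    """List one directory and return (file entries, subdirectories)."""
    entries, subdirs = [], []
    with os.scandir(directory) as items:
        for item in items:
            if item.is_dir(follow_symlinks=False):
                subdirs.append(item.path)
            elif item.is_file(follow_symlinks=False):
                st = item.stat(follow_symlinks=False)
                entries.append(ManifestEntry(item.path, st.st_size, st.st_mtime))
    return entries, subdirs


def build_manifest(root: str, workers: int = 8) -> list:
    """Scan the tree level by level, listing each level's directories in parallel."""
    manifest, pending = [], [root]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            results = list(pool.map(scan_dir, pending))
            pending = []
            for entries, subdirs in results:
                manifest.extend(entries)
                pending.extend(subdirs)
    return sorted(manifest, key=lambda entry: entry.path)
```

Threads help here because directory listing is dominated by I/O waits, which is especially true on network file systems.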

Importantly, the scan phase also identifies conflicts with the destination. CloudSoda flags existing files at the target and applies your chosen resolution rules. As a result, you avoid unexpected overwrites or duplicates.


4️⃣ Agent-to-Agent Transfer

This step delivers CloudSoda’s biggest advantage. Data flows directly between source and destination agents over encrypted HTTPS. Crucially, nothing routes through the controller.

The parallel transfer architecture breaks the workload into thousands of concurrent operations. Rather than queuing files one at a time, CloudSoda transfers many simultaneously. It saturates whatever bandwidth you have available — with no artificial caps.

Reference customers routinely achieve 100 Gbps between geographic locations. In practical terms, a 50 TB archive migration drops from six days on legacy tools to under 14 hours. For post-production studios moving footage between London and Los Angeles, that difference transforms project timelines.
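The relationship between dataset size, sustained throughput and duration is simple arithmetic, and it is worth running for your own numbers. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits per second) and a steady average rate; the function names are illustrative. On those assumptions, 50 TB in 14 hours corresponds to roughly 7.9 Gbps sustained, six days corresponds to roughly 0.77 Gbps, and a link held at 100 Gbps would move the same data in about 1.1 hours.

```python
def transfer_hours(size_tb: float, throughput_gbps: float) -> float:
    """Hours needed to move size_tb terabytes at a sustained rate in Gbps."""
    bits = size_tb * 1e12 * 8
    seconds = bits / (throughput_gbps * 1e9)
    return seconds / 3600


def implied_gbps(size_tb: float, hours: float) -> float:
    """Average rate in Gbps implied by moving size_tb terabytes in the given hours."""
    bits = size_tb * 1e12 * 8
    return bits / (hours * 3600) / 1e9


print(f"{transfer_hours(50, 100):.2f} h at 100 Gbps")
print(f"{implied_gbps(50, 14):.2f} Gbps for 14 hours")
print(f"{implied_gbps(50, 6 * 24):.2f} Gbps for six days")
```

Real transfers rarely hold a constant rate, so treat these figures as averages over the whole job.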

Agent clustering takes this further. CloudSoda distributes the file list across multiple agents working the same job. Four agents instead of one can cut transfer times by up to 60%. If one agent fails mid-transfer, the others continue without restarting. CloudSoda reassigns the failed work automatically.
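The failover behaviour can be pictured with a small single-threaded simulation. This is a sketch of the scheduling idea only, assuming a caller-supplied `transfer(agent, file)` function that raises `ConnectionError` when an agent is unavailable; `run_cluster` is a hypothetical name and a real cluster would run agents concurrently.

```python
from collections import deque


def run_cluster(files, agents, transfer):
    """Hand files to agents in turn; requeue work from any agent that fails.

    Returns (done, unfinished): a mapping of file to the agent that moved it,
    and the files left over if every agent has failed.
    """
    queue = deque(files)
    healthy = list(agents)
    done = {}
    turn = 0
    while queue and healthy:
        agent = healthy[turn % len(healthy)]
        item = queue.popleft()
        try:
            transfer(agent, item)
            done[item] = agent
            turn += 1
        except ConnectionError:
            healthy.remove(agent)   # stop scheduling onto the failed agent
            queue.appendleft(item)  # its file goes back to the front of the queue
    return done, list(queue)
```

Completed files stay in `done` when an agent drops out, so the remaining agents only pick up work that was never finished.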


5️⃣ Write to Destination

The destination agent receives the data stream and writes through its accessor. The S3 SDK handles AWS targets. The Azure SDK handles Blob storage. SMB or NFS handles on-premises NAS. CloudSoda manages all protocol translation transparently.

You can also specify the destination storage class at job creation. For example, you can write straight to S3 Glacier Deep Archive or the Azure Archive tier. This avoids the common two-step pattern of landing on hot storage first and tiering later.

Furthermore, CloudSoda preserves file metadata throughout. Timestamps, ownership, permissions and attributes carry across intact. For Windows environments, the platform supports ACL replication via the Windows agent. As a result, permission structures transfer cleanly between SMB shares.


6️⃣ Validate and Report

After the transfer completes, CloudSoda verifies every file against the source. It checks file size, metadata and integrity markers to confirm the destination matches exactly.
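For two locally readable files, a check of this kind might look like the sketch below. It compares sizes first and then SHA-256 digests. Treating a content hash as the integrity marker is an assumption made for this example; the text does not say which markers CloudSoda uses, and `sha256_of` and `verify_copy` are hypothetical names.

```python
import hashlib
import os


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source: str, destination: str) -> bool:
    """Return True when the destination matches the source in size and content."""
    # Cheap check first: files of different sizes cannot be identical.
    if os.path.getsize(source) != os.path.getsize(destination):
        return False
    return sha256_of(source) == sha256_of(destination)
```

The size comparison costs almost nothing and rejects truncated copies before any data is read.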

Individual file failures get logged separately. You retry only what failed, not the entire job. A 50 TB migration where 200 files encounter errors does not start from scratch. Instead, CloudSoda retries only those 200 files, which typically completes in seconds.
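The retry-only-what-failed pattern is easy to express in code. The sketch below assumes a caller-supplied `transfer(file)` function that raises `OSError` on failure; `run_with_retries` and the three-attempt limit are illustrative choices, not CloudSoda settings.

```python
def run_with_retries(files, transfer, max_attempts=3):
    """Transfer files, retrying on each pass only the ones that failed before.

    Returns a mapping of file to error message for anything still failing
    after the final attempt.
    """
    pending = list(files)
    failures = {}
    for _ in range(max_attempts):
        failures = {}
        for name in pending:
            try:
                transfer(name)
            except OSError as exc:
                failures[name] = str(exc)  # log this file, keep going with the rest
        if not failures:
            break
        pending = list(failures)  # the next pass touches only the failed files
    return failures
```

Because each pass shrinks to the failed set, the cost of a retry scales with the number of failures rather than with the size of the job.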

CloudSoda also calculates the actual transfer cost. This figure includes retries, cancelled operations and any variance from the dry run. Both the job detail view and the reporting section display these final numbers. As a result, finance and operations teams get an accurate audit trail based on actual figures, not estimates.


🔀 Any Source, Any Destination

The six-step flow works identically regardless of storage platform. CloudSoda supports transfers between any combination of the following:

Filesystems: NetApp ONTAP, Dell PowerScale, NFS mounts, SMB/CIFS shares, local paths

Object Storage: AWS S3, Azure Blob, Google Cloud Storage, Wasabi, Backblaze B2, Oracle Cloud, any S3-compatible endpoint

Move data in any direction. Filesystem to cloud. Cloud to filesystem. Cloud to cloud. Filesystem to filesystem. CloudSoda’s accessor layer handles protocol differences, so the same six steps apply every time.
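One way to picture an accessor layer is as a shared read and write interface that each storage type implements. The sketch below is a generic illustration of that pattern and not CloudSoda's design: `Accessor`, `LocalAccessor`, `MemoryAccessor` (a stand-in for an object store) and `copy` are all hypothetical names.

```python
import os
from typing import Iterable, Iterator, Protocol


class Accessor(Protocol):
    """The minimal interface the copy routine depends on."""
    def read(self, path: str) -> Iterator[bytes]: ...
    def write(self, path: str, chunks: Iterable[bytes]) -> None: ...


class LocalAccessor:
    """Reads and writes files under a root directory."""
    def __init__(self, root: str) -> None:
        self.root = root

    def read(self, path: str, chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
        with open(os.path.join(self.root, path), "rb") as f:
            while chunk := f.read(chunk_size):
                yield chunk

    def write(self, path: str, chunks: Iterable[bytes]) -> None:
        target = os.path.join(self.root, path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as f:
            for chunk in chunks:
                f.write(chunk)


class MemoryAccessor:
    """Stand-in for an object store: keys map directly to bytes."""
    def __init__(self) -> None:
        self.objects = {}

    def read(self, path: str) -> Iterator[bytes]:
        yield self.objects[path]

    def write(self, path: str, chunks: Iterable[bytes]) -> None:
        self.objects[path] = b"".join(chunks)


def copy(src: Accessor, dst: Accessor, path: str) -> None:
    """The same call works for any pairing of source and destination."""
    dst.write(path, src.read(path))
```

The `copy` function never inspects which kind of storage it is talking to, which is the property that lets one flow cover every direction of transfer.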


⚙️ Automate the Flow with Policies

The six-step process works for ad-hoc transfers. But most organisations also need ongoing automated movement. CloudSoda’s policy engine wraps the same flow in scheduling and rule logic.

For example, create a policy that runs nightly at 2am. It moves files older than 90 days from production NAS to S3 Glacier. Alternatively, schedule a weekly sync between two locations for disaster recovery. Policies support cron expressions for precise timing, plus file filters based on age, extension, size or path.
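As an illustration, the nightly example maps onto the standard five-field cron expression `0 2 * * *` (minute 0, hour 2, every day) plus an age filter. The sketch below models the filter side only. The `Policy` fields and the `matches` function are assumptions made for this example, and measuring age by modification time is likewise an assumption; the text does not specify CloudSoda's policy schema.

```python
import fnmatch
import time
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Policy:
    schedule: str                               # five-field cron expression
    min_age_days: Optional[float] = None        # age measured from modification time
    extensions: Optional[Sequence[str]] = None  # e.g. (".mov", ".mxf")
    min_size_bytes: Optional[int] = None
    path_glob: Optional[str] = None             # e.g. "/prod/*"


def matches(policy: Policy, path: str, size: int, mtime: float,
            now: Optional[float] = None) -> bool:
    """Return True when a file passes every filter the policy defines."""
    now = time.time() if now is None else now
    if policy.min_age_days is not None:
        if now - mtime < policy.min_age_days * 86400:
            return False
    if policy.extensions is not None:
        wanted = tuple(ext.lower() for ext in policy.extensions)
        if not path.lower().endswith(wanted):
            return False
    if policy.min_size_bytes is not None and size < policy.min_size_bytes:
        return False
    if policy.path_glob is not None and not fnmatch.fnmatch(path, policy.path_glob):
        return False
    return True


# Nightly at 02:00, files older than 90 days
nightly_archive = Policy(schedule="0 2 * * *", min_age_days=90)
```

Filters left as `None` are ignored, so a policy only constrains the attributes it names.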

Because policies use the same underlying flow, every automated job includes cost modelling, parallel transfer, validation and reporting. Consequently, you get full visibility into what each policy costs — without anyone monitoring it manually.


📊 Full Audit Trail

Every transfer generates a complete record. CloudSoda logs the agents involved, files transferred, files skipped, conflicts resolved, bytes moved and true cost. This record appears in both the job detail view and the centralised reporting section.
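If you export or mirror these records into your own tooling, a structure along the following lines would hold the fields listed above. The field names, the `job_id` key and the JSON layout are assumptions made for illustration; they do not describe CloudSoda's actual report format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TransferRecord:
    job_id: str                                 # hypothetical identifier
    agents: list = field(default_factory=list)
    files_transferred: int = 0
    files_skipped: int = 0
    conflicts_resolved: int = 0
    bytes_moved: int = 0
    cost: float = 0.0                           # actual cost, in your billing currency


def to_json(record: TransferRecord) -> str:
    """Serialise a record with stable key order so entries are easy to diff."""
    return json.dumps(asdict(record), sort_keys=True)


record = TransferRecord(
    job_id="example-job",
    agents=["agent-1", "agent-2"],
    files_transferred=1200,
    files_skipped=15,
    conflicts_resolved=3,
    bytes_moved=5_000_000_000,
    cost=42.5,
)
print(to_json(record))
```

A flat, consistently keyed record like this is straightforward to load into a spreadsheet or reporting database for cost trend analysis.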

Compliance teams use this trail to demonstrate what moved, when, where and at what cost. Operations teams use it to optimise future transfers. They can identify bottlenecks, compare agent performance and track cost trends over time.