
Implementing Site-to-Site Replication with Coho SiteProtect

Now that I’ve given you a quick overview of the architecture of Coho SiteProtect, I’d like to provide you with the basics for implementing SiteProtect in your data center. This is the second post in my series on our site-to-site replication offering. As I discover best practices for deploying SiteProtect in various infrastructures and scenarios, I’ll document those here as well, so stay tuned.

Without further ado, here is the step-by-step set-up procedure for SiteProtect…

Pairing the Sites

The first step in setting up remote replication is establishing a trusted relationship from the local site to the remote site. This is done from the Settings > Replication page in the Coho web UI, indicated by the gear (settings) icon (Figure 1).


Figure 1: Settings > Replication page

From here, click the “Begin replication setup” link which brings you to the configuration screen for the local site (Figure 2).


Figure 2: Settings > Replication > Local Site page

Here, you’ll specify the network settings for the site-to-site communication. It is worth noting that the replication traffic is sent on a VLAN to simplify network management for enterprise environments.

Here you can also configure bandwidth throttling for outbound traffic in case you need to limit usage of the site-to-site interconnect. The same can be done on the remote site, which means that both incoming and outgoing throughput can be controlled. Bear in mind that by limiting the traffic, you may increase the time it takes for a workload to finish replicating; in other words, you lengthen your effective RPO.
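
To make the throttling trade-off concrete, here is a rough back-of-the-envelope sketch. This is plain Python for illustration, not part of the product, and the numbers (50 GiB changed, 100 Mbps throttle) are assumptions for the example:

```python
def replication_time_minutes(changed_gib, bandwidth_mbps):
    """Estimate how long one replication cycle takes, given the amount of
    data changed since the last snapshot and the (throttled) link speed."""
    changed_bits = changed_gib * 1024**3 * 8            # GiB -> bits
    seconds = changed_bits / (bandwidth_mbps * 10**6)   # bits per second
    return seconds / 60

# 50 GiB of changed data over a link throttled to 100 Mbps takes roughly
# 72 minutes -- far longer than a 15-minute snapshot interval, so this
# throttle would stretch the effective RPO well past the schedule.
estimate = replication_time_minutes(50, 100)
```

If the estimate comes out longer than your intended snapshot interval, either raise the throttle or lengthen the interval.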

Once that’s complete, you’ll click “Next” and specify the IP and password of the remote DataStream. Click “Next” again to proceed (Figure 3).


Figure 3: Settings > Replication > Remote Credentials page

Once the wizard confirms a connection to the other side, you’ll specify the remote system’s VLAN, replication IP address, and netmask, as well as the default gateway for the other side and click “Next” (Figure 4).

Note: On this page the bandwidth limit relates to outbound traffic from the remote site; or put another way, the inbound replication traffic arriving at the local site.


Figure 4: Settings > Replication > Remote Network page

Finally, you’re brought to step 4, the “Summary” page, where you can review the configuration before applying the settings. Click “Apply and Connect” to complete the wizard (Figure 5).


Figure 5: Settings > Replication > Summary page

From this point forward, you’ll be presented with the following view when you go to the Settings > Replication page. You can see here (Figure 6) the IP of the remote node and that replication is active.


Figure 6: Settings > Replication page (completed)

Configuring Workloads and Schedules

Now that the initial pairing is complete, you’ll visit the “Snapshots and Replication” page to customize which workloads are replicated as well as the snapshot & replication interval for each (Figure 7).


Figure 7: Snapshots / Replication > Overview page

Here (Figure 7), we provide an overview of the workloads. This dashboard shows the number of VMs with snapshots as well as those with replicated snapshots. For all of a site’s workloads to be protected, they should all have replicated snapshots, ensuring that any of those workloads can be recovered on the remote site in the event of a disaster.

We also provide a summary of the workloads covered by replication, how many bytes have been transferred, and the average replication time. These statistics provide assurance that replication is functional and show the rate of change of the data, allowing you to determine whether your replication interval is appropriate for the bandwidth you have available. If your average replication time is greater than your snapshot interval, adjust the schedule (or your bandwidth limits) accordingly.

To configure or modify workloads, proceed to the “Workloads” page (Figure 8).


Figure 8: Snapshots and Replication > Workloads page

Here (Figure 8), we denote the local vs. the remote workloads, provide a record of when the last snapshot was taken, and display the assigned schedule.

Note: VMs which have been deleted are denoted with a strike through the name.

Under “Snapshot Record”, you can click on the calendar icon to view snapshot date, name and description, as well as the status of replication. In this example, we have recently enabled the workload for replication denoted by the word “Scheduled” (Figure 9).


Figure 9: Snapshots and Replication > Workloads > Snapshot Record page

To manually protect a specific workload, click the camera icon next to that workload. This will allow you to take a manual snapshot and replicate that snapshot (Figure 10).


Figure 10: Snapshots and Replication > Workloads > Snapshot page

Most users will want to protect a number of VMs at once. The best way to do this is from the “Default Schedule” page (Figure 11).


Figure 11: Snapshots and Replication > Default Schedule page

In this example we have selected an RPO of 15 minutes by replicating the snapshot every 15 minutes. The frequency of snapshots is best determined by the needs of the application, and the automated snapshot schedule in Coho offers flexibility, from minutes to months.

Note: Quiescing a snapshot puts the system in a state that maintains application consistency before the snapshot is taken; however, this option is only available in the daily and weekly schedules. Taking quiesced snapshots more frequently can carry a significant performance penalty, which is not related to the Coho storage but to how snapshots are executed within the VMware environment. A crash-consistent snapshot (no quiesce) can be taken very frequently on Coho storage without a performance penalty.


In the event of a disaster you’ll want to be able to bring up your applications in the remote site. This is done from the “Failover/Failback” view (Figure 12).


Figure 12: Snapshots and Replication > Failover/Failback page

Initially, failover and failback are disabled in order to protect you from instantiating multiple copies of the same VM. You make the decision (from either location) to put the disaster recovery plan in motion. If you’re ready to proceed, click the “Enable” button to enable failover (Figure 13).


Figure 13: Snapshots and Replication > Failover/Failback page (enabled)

You can now go to the remote DataStream and clone your replicated workloads to the remote system. Open up the web UI of the remote DataStream and, again, go to the Snapshots and Replication > Workloads page (Figure 14).


Figure 14: Snapshots and Replication > Workloads page (remote)

Click the “Remote Workloads” checkbox to filter by those workloads. These are the workloads available for failover from the primary to the disaster site. Choose the workload by clicking the calendar icon. Browse the recent snapshots and choose one to clone from, by clicking the clone icon (Figure 15).


Figure 15: Snapshots and Replication > Workloads page (failover)

Once you’ve selected the desired snapshot, enter a VM name and choose a target vSphere host. Click “Clone” to clone it and recover it to the destination site. The workload is now failed over and ready to continue serving data to your users. Just power it on in vCenter and you’re ready to go.


If at some point the primary site comes back online, we support failing workloads back to their original location. This is done from the Snapshots and Replication page. On the workload that you’d like to fail back (Figure 16), click the calendar icon to view the available snapshots, then click the red arrow to sync the snapshot to the original VM. Once the VM is powered on, your app will be back in the original location with all of the changed data from snapshots replicated from the remote site since the failure occurred. Simple and easy, just like it should be.


Figure 16: Snapshots and Replication > Workloads page (failback)

Well, that’s it for the initial implementation. As you can see, Coho SiteProtect is easy to set up and configure in any environment. Next, we’ll dive into some best practices for configuring SiteProtect for optimal performance in environments of various sizes and requirements.

Until then, if you’d like more info about Coho SiteProtect, click here!



Introducing Site-to-Site Replication with Coho SiteProtect


While our engineers have been hard at work preparing the bits for our site-to-site replication offering, I have been testing the technology in preparation for a slew of technical collateral on the feature. In addition to introducing Coho SiteProtect here on my blog, I want to share with you a quick overview of the architecture. You can find more on this feature at the Coho Data blog here and here. Stay tuned for more on this topic in future posts!

Replication is something I am extremely passionate about, and I’m happy to talk about it with anyone who’s interested. I’ve witnessed firsthand what having a solid DR plan can mean to a business, and today more than ever, businesses rely on it to deliver data to their customers in any circumstances, both predictable and unplanned.

Now, let’s dive into the architecture…

Coho’s SiteProtect replication implementation reflects the unique features of our patented scale-out system architecture. The two most notable elements of SiteProtect are dynamic data replication and lightweight snapshots.

Dynamic Data Replication

For Coho, replication is a core architectural pillar. It not only replaces technologies like RAID for data protection, but is also used to scale out the capacity of your cluster when you add nodes and to re-balance data across those nodes in times of congestion. Additionally, we use replication when decommissioning nodes, or during a node failure, to rebuild replicas of data on the surviving nodes. Because we replicate objects in the Coho Bare Metal Object Store, we can do this virtually at the block level as new files are created or old files are modified. We keep the data synchronously updated so that workloads never skip a beat.

For data availability in the event of a disaster, we have extended this functionality to other clusters at remote sites. Because distance typically introduces latency and bandwidth challenges, we shift to an asynchronous approach for remote replicas. This prevents the performance issues you may see when the primary workloads are competing with synchronous replication traffic, not to mention saturating your network links.
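
The distinction between the two modes can be sketched in a few lines. This is a generic illustration of synchronous vs. asynchronous replication, not Coho's actual implementation:

```python
import queue

def sync_write(data, replicas):
    """Synchronous replication (local protection): the write is
    acknowledged only after every replica holds a copy, so the
    slowest replica sets the write latency."""
    for replica in replicas:
        replica.append(data)      # would block until each copy lands
    return "ack"

def async_write(data, local, ship_queue):
    """Asynchronous replication (remote sites): acknowledge after the
    local write and queue the change to ship in the background, so
    WAN latency never sits in the application's write path."""
    local.append(data)
    ship_queue.put(data)          # drained later by a replication worker
    return "ack"

# The async path acknowledges immediately; the delta waits to be shipped.
local, pending = [], queue.Queue()
async_write("block-42", local, pending)
```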

Lightweight Snapshots

Our snapshot implementation leverages copy-on-write clones of the original VMs. That means the storage capacity consumed is proportional to the amount of data changed since the previous snapshot was taken. The DataStream replicates snapshots at regular, user-selected intervals, so each transfer sends only the changes since the previous one. Add to this the fact that we compress the data over the wire, and you’ll see a significant reduction in bandwidth usage.
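
A toy model makes the economics clear. This is not Coho code, and the 2:1 compression ratio is an assumption for illustration:

```python
def snapshot_costs(deltas_gib, compression_ratio=2.0):
    """Copy-on-write capacity grows only by the changed data, and each
    replication cycle ships just that delta, compressed."""
    capacity = 0.0
    transfers = []
    for delta in deltas_gib:
        capacity += delta                             # space on disk
        transfers.append(delta / compression_ratio)   # GiB over the wire
    return capacity, transfers

# Three snapshots changing 10, 2, and 3 GiB consume 15 GiB of capacity
# and ship 5, 1, and 1.5 GiB respectively at a 2:1 compression ratio --
# nowhere near a full copy of the VM each cycle.
cap, xfers = snapshot_costs([10, 2, 3])
```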

The real-world benefit is alignment with application Recovery Point Objective (RPO) needs. Replication intervals can range from a few minutes to days or weeks; Coho SiteProtect does not force you into a one-size-fits-all model.

Failover & Failback

To recover workloads, you simply clone the replicated copy into vCenter at the remote site. It immediately inherits the original snapshot/replication schedule, providing the ability to fail back when the original site comes back online. This delivers a Recovery Time Objective (RTO) on the order of seconds for your critical workloads. If the workload already exists in vCenter, we simply update its storage configuration to reflect the latest replicated snapshot. If you want to run from an older snapshot, you can do that as well.

DR Testing

Finally, while a good disaster recovery plan is important, testing replicated data isn’t always easy. Replicated snapshots are immutable, so a simple clone of a snapshot can be used for DR testing, and the clone can safely be discarded when testing is complete.

Key Benefits

  • Asynchronous, snapshot-based – provides fast recovery
  • Active/active sites – delivers efficiency
  • Granularity at the virtual machine level – provides control
  • SSL data transport – ensures the security of your data
  • Replicate only changed data – bandwidth efficient
  • Compression – reduces bandwidth usage


For more information on Coho SiteProtect, click here!



File Sharing and Sync Platforms – Part I

Now that my presentation from the NetApp Insight conference is complete, I would like to share our plan going forward for integration with VMware’s Horizon Suite, specifically with regard to Horizon Data (formerly Project Octopus).

The session that Cedric Courteix and I did was a joint vision for integrations with Citrix ShareFile as well as VMware Horizon Data, but I will be concentrating on the VMware solution here, as that is my focus. 

It shouldn’t be a surprise that many of the integrations between VMware and NetApp begin with some unique backup and recovery capabilities. For the Horizon Data solution, that integration is in the form of our Snap Creator plug-in for Horizon Data. This plug-in will provide application-consistent backups of all the components of the Horizon Data appliance, protecting both the OS and the data from potential data loss and providing disaster recovery capabilities, should the unforeseen happen.

We showcased the capabilities of our Snap Creator framework in this year’s partner HoL at VMworld, where we used it to perform backup and recovery for VMware vCloud Director. Snap Creator is a unique backup and recovery solution, due to the separation that exists between the backup logic and the application logic. This allows us to create plug-ins for a wide variety of applications without changing the core backup code.

For more specific information on Snap Creator, visit: www.snapcreator.com

I’ll share some more details of the solution after the holiday break. Thanks for reading!



Japan Earthquake Aftermath – Revisited

Today I am feeling a bit of the same emotion that I felt in the months after the great earthquake that my family and I, along with the people of Japan, experienced just over a year ago. The reason is something I saw posted when I woke up this morning about how SoftBank, with the help of NetApp, helped Japan and its citizens in the recovery effort. See the video.

My experience, which I talked about here, mirrors some of the same things referred to by SoftBank. Imagine not being able to travel to the office because transportation was completely shut down. Or, with rolling blackouts throughout the day, not knowing, if the trains happened to be running, whether they would still be running when it was time to return home. Or not having heat or cooling in the office to work comfortably. Or food at the convenience store, or any number of other basic necessities, being unavailable. Or gas to drive your car…

Even though the cloud was not designed for “working from home,” as with any disruptive and innovative technology, it is interesting to see what it is capable of in times of need or urgency, and even the new uses that were never dreamed of before. Think about the pure energy-saving aspects of virtualization and cloud, expand that to the energy saved by working remotely, and you can clearly see what’s possible.

I do remember hearing stories of how SoftBank was giving away its services and donating money to help in the recovery. I commend them for their efforts. I can say that I am even more proud that they happened to do it using NetApp technologies.

As you may know from reading my blog entry from a year ago, I had the experience, as a customer, of using technologies from NetApp and VMware to fail over critical services in our infrastructure during the disaster. This only scratched the surface of what transformative technologies can accomplish, as evidenced by SoftBank’s success. Look to the past and present and you will see further evidence of how technology saves lives. Look to the future and you can see the potential and the promise of cloud computing. Exciting times indeed!


Executive Summary (PDF)

Technical Case Study (PDF)

The story behind SoftBank’s Epic Story

Dave Hitz’s blog

Val Bercovici’s blog



VMworld Follow-up

Now that the dust has settled on my trip to VMworld Las Vegas, as well as my big move back to the states, I wanted to take a few minutes for a brief update.

This year’s VMworld was a bit strange for me: I was “in limbo,” so to speak, between my old job and my new job at NetApp, and I also took part in a customer panel session.

I had originally planned to attend many breakout sessions and go through the hands-on labs; however, I found myself spending a lot of time talking with people about my new position and my experiences in Japan after the earthquake, and just meeting and talking with interesting people and vendors with new and interesting products.

For me, the excitement built up until Wednesday morning, when I took part in my customer panel session on real-world use of VMware SRM. I will say, I was indeed a bit nervous at first. I think the session moderator was a bit worried that I wouldn’t show up, as I arrived about five minutes before the start of the session. After he did an intro of the new features of SRM v5, I took the stage and talked about our experience. I placed special emphasis on the fact that SRM allowed us to do the failover automatically so that we could focus on whether our families were safe (something that you can’t put a price on!).

After the session, I was pleasantly surprised to talk with a customer in California looking to implement SRM and looking for a proof of concept to make it happen. If that one customer ends up implementing based on my experience, then I will have done my job. Very cool!

I was also interviewed by Richard Garsthagen for VMworldTV and talked briefly about our experience.

Also on Wednesday, I was fortunate enough to get invited to the very exclusive “vCTO Party” hosted by Dr. Stephen Herrod, CTO of VMware. The food and drinks were awesome, but even more awesome was the unrestricted access we had to senior members of R&D as well as a majority of the VMware executives. I had a chance to tell our SRM story to Stephen, Paul Maritz, and Tod Nielsen. I could tell by their responses that they had heard bits and pieces of the story already, which was amazing and cool.

Things started to wind down on Thursday, and I made the trip back to Tokyo to prepare for our move. We finally arrived in NC on 9/14, and I started my position at NetApp yesterday. Very much liking the present and looking forward to the future!


