Implementing Site-to-Site Replication with Coho SiteProtect

Now that I’ve given you a quick overview of the architecture of Coho SiteProtect, I’d like to walk through the basics of implementing SiteProtect in your data center. This is the second in my series of posts on our site-to-site replication offering. As I discover best practices for deploying SiteProtect in various infrastructures and scenarios, I’ll document those here as well, so stay tuned…

Without further ado, here is the step-by-step set-up procedure for SiteProtect…

Pairing the Sites

The first step in setting up remote replication is establishing a trusted relationship from the local site to the remote site. This is done from the Settings > Replication page in the Coho web UI, indicated by the gear (settings) icon (Figure 1).

site_protect-implementing-fig1

Figure 1: Settings > Replication page

From here, click the “Begin replication setup” link, which brings you to the configuration screen for the local site (Figure 2).

site_protect-implementing-fig2

Figure 2: Settings > Replication > Local Site page

Here, you’ll specify the network settings for the site-to-site communication. It is worth noting that the replication traffic is sent on a VLAN to simplify network management for enterprise environments.

Here you can also configure bandwidth throttling for outbound traffic in case you need to limit usage of the site-to-site interconnect. The same can be done on the remote site, which means that both incoming and outgoing throughput can be controlled. Bear in mind that limiting the traffic may increase the time it takes for a workload to finish replicating; in other words, it may increase the effective RPO.
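To get a rough feel for that trade-off, here is a back-of-the-envelope sketch in Python. It is not part of the product: the function name and the sample numbers are made up for illustration, and it ignores compression, protocol overhead, and competing traffic.

```python
def replication_time_seconds(changed_bytes: int, limit_mbps: float) -> float:
    """Estimate how long it takes to ship `changed_bytes` over a link
    capped at `limit_mbps` megabits per second (hypothetical helper)."""
    if limit_mbps <= 0:
        raise ValueError("limit_mbps must be positive")
    bytes_per_second = limit_mbps * 1_000_000 / 8  # megabits/s -> bytes/s
    return changed_bytes / bytes_per_second


if __name__ == "__main__":
    changed = 9 * 10**9  # assume 9 GB changed since the last replicated snapshot
    for limit in (1000, 100):
        minutes = replication_time_seconds(changed, limit) / 60
        print(f"{limit} Mbit/s cap: about {minutes:.1f} minutes to replicate")
```

If the estimate for your chosen cap comes out longer than the replication interval you have in mind, that is a hint that either the cap or the interval needs to change.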

Once that’s complete, you’ll click “Next” and specify the IP and password of the remote DataStream. Click “Next” again to proceed (Figure 3).

site_protect-implementing-fig3

Figure 3: Settings > Replication > Remote Credentials page

Once the wizard confirms a connection to the other side, you’ll specify the remote system’s VLAN, replication IP address, and netmask, as well as the default gateway for the other side and click “Next” (Figure 4).

Note: On this page the bandwidth limit relates to outbound traffic from the remote site; or put another way, the inbound replication traffic arriving at the local site.

site_protect-implementing-fig4

Figure 4: Settings > Replication > Remote Network page
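Before typing these values into the wizard, it can help to sanity-check them. The sketch below uses Python’s standard `ipaddress` module; the function name and the example addresses are my own placeholders and not anything the DataStream exposes.

```python
import ipaddress


def check_replication_network(ip: str, netmask: str, gateway: str, vlan: int) -> list[str]:
    """Return a list of problems found in the planned settings (empty if none).

    Hypothetical pre-flight helper; the wizard performs its own validation.
    """
    problems = []
    if not 1 <= vlan <= 4094:
        problems.append(f"VLAN {vlan} is outside the usable 802.1Q range 1-4094")
    try:
        iface = ipaddress.ip_interface(f"{ip}/{netmask}")
        gw = ipaddress.ip_address(gateway)
    except ValueError as exc:
        problems.append(f"invalid address: {exc}")
        return problems
    if gw not in iface.network:
        problems.append(f"gateway {gw} is not in {iface.network}")
    if gw == iface.ip:
        problems.append("gateway and replication IP are the same address")
    return problems


if __name__ == "__main__":
    # Example values only; substitute the addresses planned for your remote site.
    for line in check_replication_network("10.20.30.5", "255.255.255.0", "10.20.31.1", 200):
        print(line)
```

An empty result means the replication IP, netmask, and default gateway are at least consistent with each other.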

Finally, you’re brought to step 4, which is the “Summary” page and allows you to review the configuration before applying the settings. Click “Apply and Connect” to complete the wizard (Figure 5).

site_protect-implementing-fig5

Figure 5: Settings > Replication > Summary page

From this point forward, you’ll be presented with the following view when you go to the Settings > Replication page (Figure 6). It shows the IP of the remote node and confirms that replication is active.

site_protect-implementing-fig6

Figure 6: Settings > Replication page (completed)

Configuring Workloads and Schedules

Now that the initial pairing is complete, you’ll visit the “Snapshots and Replication” page to customize which workloads are replicated as well as the snapshot & replication interval for each (Figure 7).

site_protect-implementing-fig7

Figure 7: Snapshots / Replication > Overview page

Here (Figure 7), we provide an overview of the workloads. This is a dashboard which tells us the number of VMs with snapshots as well as replicated snapshots. For all of a site’s workloads to be protected, they should all have replicated snapshots, ensuring that any of those workloads can be recovered on the remote site in the event of a disaster.

We also provide a summary of the workloads covered by replication, how many bytes have been transferred, and the average replication time. These statistics give you assurance that replication is functioning and indicate the rate of change of the data, allowing you to determine whether your replication interval is appropriate for the bandwidth you have available. If your average replication time is greater than your snapshot interval, you can modify the schedule accordingly.
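The comparison described here is simple enough to write down. The following Python sketch assumes you read the average replication time off the dashboard by hand; the function name and the 80% headroom figure are my own assumptions, not a Coho recommendation.

```python
from datetime import timedelta


def schedule_is_sustainable(avg_replication: timedelta,
                            interval: timedelta,
                            headroom: float = 0.8) -> bool:
    """Return True if the average replication run fits inside the interval
    with some slack left over (headroom is an assumed safety margin)."""
    return avg_replication <= interval * headroom


if __name__ == "__main__":
    interval = timedelta(minutes=15)
    for avg in (timedelta(minutes=10), timedelta(minutes=14)):
        verdict = "fits" if schedule_is_sustainable(avg, interval) else "is too tight for"
        print(f"An average run of {avg} {verdict} a {interval} interval")
```

When a run no longer fits, the options are the ones already mentioned: lengthen the interval or allow replication more bandwidth.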

To configure or modify workloads, proceed to the “Workloads” page (Figure 8).

site_protect-implementing-fig8

Figure 8: Snapshots and Replication > Workloads page

Here (Figure 8), we denote the local vs. the remote workloads, provide a record of when the last snapshot was taken, and display the assigned schedule.

Note: VMs which have been deleted are denoted with a strike through the name.

Under “Snapshot Record”, you can click on the calendar icon to view the snapshot date, name, and description, as well as the status of replication. In this example, we have recently enabled the workload for replication, which is denoted by the word “Scheduled” (Figure 9).

site_protect-implementing-fig9

Figure 9: Snapshots and Replication > Workloads > Snapshot Record page

To manually protect a specific workload, click the camera icon next to that workload. This will allow you to take a manual snapshot and replicate that snapshot (Figure 10).

site_protect-implementing-fig10

Figure 10: Snapshots and Replication > Workloads > Snapshot page

Most users will want to protect a number of VMs at once. The best way to do this is from the “Default Schedule” page (Figure 11).

site_protect-implementing-fig10

Figure 11: Snapshots and Replication > Default Schedule page

In this example, we have selected an RPO of 15 minutes by replicating the snapshot every 15 minutes. The frequency of snapshots is best determined by the needs of the application, and Coho’s automated snapshot schedule offers flexibility, from minutes to months.

Note: Quiescing puts the system in a state that maintains application consistency before the snapshot is taken; however, this is only available in the daily and weekly schedules. Taking quiesced snapshots more frequently may cause significant performance penalties. These performance penalties are not related to the Coho storage but to how snapshots are executed within the VMware environment. A crash-consistent snapshot (no quiesce) can be taken very frequently on the Coho storage without a performance penalty.

Failover

In the event of a disaster, you’ll want to be able to bring up your applications in the remote site. This is done from the “Failover/Failback” view (Figure 12).

site_protect-implementing-fig12

Figure 12: Snapshots and Replication > Failover/Failback page

Initially, failover and failback are disabled in order to protect you from instantiating multiple copies of the same VM. You make the decision (from either location) to put the disaster recovery plan in motion. If you’re ready to proceed, click the “Enable” button to enable failover (Figure 13).

site_protect-implementing-fig13

Figure 13: Snapshots and Replication > Failover/Failback page (enabled)

You can now go to the remote DataStream and clone your replicated workloads to the remote system. Open up the web UI of the remote DataStream and, again, go to the Snapshots and Replication > Workloads page (Figure 14).

site_protect-implementing-fig14

Figure 14: Snapshots and Replication > Workloads page (remote)

Click the “Remote Workloads” checkbox to filter by those workloads. These are the workloads available for failover from the primary to the disaster site. Choose the workload by clicking the calendar icon. Browse the recent snapshots and choose one to clone from, by clicking the clone icon (Figure 15).

site_protect-implementing-fig15

Figure 15: Snapshots and Replication > Workloads page (failover)

Once you’ve selected the desired snapshot, enter a VM name and choose a target vSphere host. Click “Clone” to clone it and recover it to the destination site. The workload is now failed over and can continue serving data to your users. Just power it on in vCenter and you’re ready to go.

Failback

If at some point the primary site comes back online, we support failing workloads back to their original location. This is done from the Snapshots and Replication page. On the workload that you’d like to fail back (Figure 16), click the calendar icon to view the available snapshots, then click the red arrow to sync the snapshot to the original VM. Once the VM is powered on, your app will be back in the original location with all of the changed data from snapshots replicated from the remote site since the failure occurred. Simple and easy, just like it should be.

site_protect-implementing-fig16

Figure 16: Snapshots and Replication > Workloads page (failback)

Well, that’s it for the initial implementation. As you can see, Coho SiteProtect is easy to set up and configure in any environment. Next, we’ll dive into some of the best practices for configuring SiteProtect for optimal performance in environments of various sizes and requirements.

Until then, if you’d like more info about Coho SiteProtect, click here!


Introducing Site-to-Site Replication with Coho SiteProtect

site_protect-logical

While our engineers have been hard at work preparing the bits for our site-to-site replication offering, I have been testing the technology in preparation for a slew of technical collateral on the feature. In addition to introducing Coho SiteProtect here on my blog, I want to share with you a quick overview of the architecture. You can find more on this feature at the Coho Data blog here and here. Stay tuned for more on this topic in future posts!

Replication is something I am extremely passionate about, and I’m very happy to talk about it with whoever has interest. I’ve witnessed firsthand what having a solid DR plan can mean to a business. Today more than ever before, I and many others rely on one to deliver data to customers in any circumstances, both predictable and unplanned.

Now, let’s dive into the architecture…

Coho’s SiteProtect replication implementation reflects the unique features of our patented scale-out system architecture. The two most notable elements of SiteProtect are dynamic data replication and lightweight snapshots.

Dynamic Data Replication

For Coho, replication is a core architectural pillar that not only replaces technologies like RAID for data protection, but is also used to scale out the capacity of your cluster when you add nodes and to re-balance data across those nodes in times of congestion. Additionally, we use replication when decommissioning nodes or during a failure of a node to rebuild a replica of data on the surviving nodes. Because we replicate objects in the Coho Bare Metal Object Store, we can do this virtually at the block level as new files are created or as old files are modified. We keep the data synchronously updated so that the workloads never skip a beat.

For data availability in the event of a disaster, we have extended this functionality to other clusters at remote sites. Because distance typically introduces latency and bandwidth challenges, we shift to an asynchronous approach for remote replicas. This prevents the performance issues you may see when the primary workloads are competing with synchronous replication traffic, not to mention saturating your network links.

Lightweight Snapshots

Our snapshot implementation leverages copy-on-write clones of the original VMs. That means the storage capacity consumed is proportional to the amount of data changed since the previous snapshot was taken. The DataStream replicates snapshots at regular, user-selected intervals, so each subsequent data transfer only replicates the changes since the previous one. Add to this the fact that we compress the data over the wire and you’ll see a significant reduction in bandwidth usage.
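To illustrate the general idea of shipping only changed data and compressing it on the way, here is a toy Python sketch. It is emphatically not how the DataStream works internally: a real copy-on-write system already knows which blocks changed and does not need to compare images, and the 4 KB block size, the function names, and the use of zlib are all assumptions made for the example.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def changed_blocks(previous: bytes, current: bytes,
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Map byte offset -> block contents for every block that differs
    between two snapshot images (toy stand-in for copy-on-write tracking)."""
    changes = {}
    for offset in range(0, len(current), block_size):
        block = current[offset:offset + block_size]
        if previous[offset:offset + block_size] != block:
            changes[offset] = block
    return changes


def wire_payload(changes: dict[int, bytes]) -> bytes:
    """Concatenate the changed blocks in offset order and compress them."""
    raw = b"".join(changes[offset] for offset in sorted(changes))
    return zlib.compress(raw)


if __name__ == "__main__":
    previous = bytes(BLOCK_SIZE * 256)            # 1 MiB "disk" of zeros
    current = bytearray(previous)
    current[8192:8192 + BLOCK_SIZE] = b"A" * BLOCK_SIZE  # modify one block
    delta = changed_blocks(previous, bytes(current))
    payload = wire_payload(delta)
    print(f"image size: {len(current)} bytes")
    print(f"changed blocks: {len(delta)}")
    print(f"compressed payload: {len(payload)} bytes")
```

The point of the toy is the proportion: the payload depends on how much changed between snapshots, not on the size of the VM.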

The real-world benefit is alignment with application Recovery Point Objective (RPO) needs. Replication can run as often as every few minutes or as rarely as every few days or weeks. Coho SiteProtect does not force you into a one-size-fits-all schedule.

Failover & Failback

To recover workloads, you simply clone the replicated copy into vCenter at the remote site. It will immediately inherit the original snapshot/replication schedule, providing the ability to fail back when the original site comes back online. This provides a Recovery Time Objective (RTO) on the order of seconds for your critical workloads. If the workload already exists in vCenter, we will simply update the storage configuration to reflect the latest replicated snapshot. If you want to run on an older snapshot, you can do that as well.

DR Testing

Finally, while a good disaster recovery plan is important, testing replicated data isn’t always easy. Replicated Snapshots are immutable and a simple clone of a snapshot can be used for DR testing. The clone can safely be discarded after DR testing has completed.

Key Benefits

  • Asynchronous, snapshot-based – provides fast recovery
  • Active/Active sites – delivers efficiency
  • Granularity at the virtual machine level – provides control
  • SSL data transport – ensures security of your data
  • Replicate only changed data – bandwidth efficient
  • Compression – reduces bandwidth usage

 

For more information on Coho SiteProtect, click here!


Storage Field Day 6: Videos

I really enjoyed the Storage Field Day 6 crew taking the time to drop by Coho’s office last week. It’s always a pleasure to see Stephen, Tom, and the delegates geek out about storage. I also got to hang with my fellow vExpert, Forbes Guthrie. I always enjoy the chance to see Andy Warfield, our founder and CTO, present. It’s a great opportunity for me (and the audience) to be educated on modern trends in storage and networking.

Without further ado, here are the embeds of the videos from last week:


Coho Site-to-Site Replication

coho_logo

Anyone who knows me well knows that I have a unique history with Business Continuity and Disaster Recovery software as well as its practical use in business. In case you didn’t know, I used VMware SRM along with NetApp to recover from the 2011 Tōhoku earthquake and tsunami in Japan. In areas prone to natural disasters, it is increasingly imperative, and now also relatively common, for companies to require that their storage products support remote replication. Companies will use it for a simple reason, such as shipping backups off-site, for the more complex use case of disaster recovery, or for both. It’s extremely convenient and easy to use when implemented properly.

That said, a common ask of us from customers here at Coho has been some form of site-to-site replication. We definitely didn’t design this as a “me too” feature. The technology has been built into the product since day one. We use synchronous replication instead of technologies like RAID to store redundant copies of data across the independent backend storage nodes. We leverage an asynchronous version of this for the remote replication feature. A lot of thought went into the other components of our implementation to make it easy to use and fully featured for the enterprise from the get-go.

Here are some of the key features of our replication release:

  • Asynchronous, periodic, snapshot-based replication
  • Active – Active site support
  • Virtual Machine granularity
  • Encryption
  • Compression
  • Bandwidth throttling
  • Simple UI with one-time setup and very easy configuration
  • Flexible replication schedule

* I’ll also add to this list that we’ll be introducing support for VMware’s SRM via an SRA (Storage Replication Adapter) in the very near future, so be on the lookout for information here and elsewhere on that.

If you’d like to read more details about the release, head over to the official Coho blog and read Doug Fallstrom’s (Sr. Director of Product Management) post on this topic.

Speaking from experience, site-to-site replication gives us another must-have enterprise grade feature, further solidifying Coho’s place at the cutting edge of new storage technologies… and this is only just the beginning!


Coho Data at Storage Field Day 6!

sfd_logo

I am pleased to announce that Coho Data is participating in Storage Field Day 6 this Thursday morning 11/6 (from 8-10AM PST). Tech Field Day has special meaning for me, as it harkens back to the beginnings of my involvement in the greater technical influencer community. Leading up to my attendance at my first VMworld (2010), where I first met Stephen and John Troyer, I started blogging and helped jump-start what has become the Tokyo VMware User Group. Shortly following that, in early 2011, I got invited to attend Tech Field Day 5 in Silicon Valley as a delegate. I also attended Tech Field Day 6 in Boston. That was the first unofficial “virtualization”-focused event, which was a lot of fun. I have met many people and made good friends with several individuals from the Tech Field Day community, among others. That experience also helped open up the opportunity to present at VMworld regarding my experience during the 2011 Great Tohoku Earthquake and my first technical marketing job at NetApp (and my current position, for that matter).

I owe a lot to Stephen for the opportunity and am very much looking forward to helping Coho host the delegates for the event later this week. I learned much of what I knew (at the time; before I joined the company) about Coho from the various events that they have done in the past, and I look forward to helping educate more and more influencers on what Coho has to offer with our market-differentiating, high-performance, web-scale storage solutions. I look forward to seeing you there as well!
