Talk 2 in the SSUG::Digital series looks at how to build a stretched cluster. What are the best practices? What pitfalls are there? Why would you consider a stretched cluster built with Spectrum Scale, as opposed to one of the alternative approaches to high availability? How do stretched clusters work, and what considerations go into planning a successful stretched cluster? We will examine the theory behind Spectrum Scale stretched clusters, review some best practices for designing stretched clusters, and talk about a few cases where stretched clusters have been successfully deployed.
Q&A
Q: For a DR use case where ClusterA and ClusterB are in 2 separate data centres (DC A and DC B), do I need my tiebreaker quorum node installed in Data Centre C?
A: (This is covered in the presentation.) It is recommended to have the tiebreaker quorum node at a third site. It could be placed in one of the two sites, with the caveat that if that site goes down, the other site will not be able to stay up.
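The caveat in the answer above follows from majority-based node quorum. Below is a minimal sketch in Python, assuming a simplified layout of one quorum node per data site plus one tiebreaker quorum node; the node names, site labels and the has_quorum helper are hypothetical and are not part of Spectrum Scale.

```python
def has_quorum(quorum_nodes, failed_sites):
    """Return True if a strict majority of quorum nodes is still reachable.

    quorum_nodes maps a (hypothetical) node name to the site it lives in.
    failed_sites is the set of sites that are currently down.
    """
    alive = [node for node, site in quorum_nodes.items()
             if site not in failed_sites]
    return len(alive) > len(quorum_nodes) // 2


# Tiebreaker at a third site C (the recommended layout).
third_site = {"quorumA": "A", "quorumB": "B", "tiebreaker": "C"}
# Tiebreaker placed inside site A (allowed, with a caveat).
colocated = {"quorumA": "A", "quorumB": "B", "tiebreaker": "A"}

for label, layout in (("third site", third_site), ("co-located in A", colocated)):
    for failed in ("A", "B"):
        state = "keeps" if has_quorum(layout, {failed}) else "loses"
        print(f"tiebreaker {label}: site {failed} fails, cluster {state} quorum")
```

In this simplified model, only the co-located layout loses quorum, and only when the site holding the tiebreaker (site A) fails.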
Q: The documentation shows that high-speed shared storage is needed… does that mean the SAN fabric should be merged over ISL for volume allocation across sites?
A: When using Spectrum Scale replication for stretched clusters, there is no need for the SAN to be extended across the sites. The stretched cluster architecture described in the presentation works even when the underlying storage does not replicate the data across sites.
Q: Will there be any performance difference between extended SAN and accessing NSD over network using their owner?
A: Aside from the protocol difference (block vs file), it depends on the type of connectivity you have to the SAN compared with the network. Recent Spectrum Scale releases have added more resiliency around network behaviour (e.g. the proactiveReconnect feature was added recently).
Q: Is 10ms latency required between SiteA, SiteB and also the tiebreaker quorum node? Can my tiebreaker quorum node have higher latency?
A: Yes, the third site can have a higher latency but it should still be “within reason”. So maybe double that number, i.e. 20ms. It is recommended to keep it under a second.
Q: Is tiebreaker node hosted on AWS or any other Cloud Providers a supported configuration?
A: Yes, we have customers who are using a public cloud for their third site.
Q: What is the RPO and RTO?
A: Remember that this is synchronous replication, so as long as you don’t run out of space on your storage, the RPO is 0. The RTO depends on your workload and infrastructure: the rate of data change, your storage and the WAN.
Q: How to check/measure the rate of data change?
A: This really depends on the application and the rate of data change by the application. If you already have implemented Spectrum Scale, you can use the historical data from performance monitoring within Spectrum Scale to estimate the rate of data change.
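As a rough illustration of the estimate described above, the sketch below turns historical samples into an average and a peak rate of data change. It assumes you have already exported (timestamp, cumulative bytes written) pairs from your monitoring data; the sample values and function names are hypothetical, and no Spectrum Scale API is called.

```python
def average_change_rate(samples):
    """Average bytes written per second across the whole sample window.

    samples is a list of (unix_seconds, cumulative_bytes_written) tuples,
    sorted by time. The export format is an assumption for this sketch.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    return (b1 - b0) / (t1 - t0)


def peak_change_rate(samples):
    """Highest bytes-per-second rate seen between any two adjacent samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return max((b1 - b0) / (t1 - t0)
               for (t0, b0), (t1, b1) in zip(samples, samples[1:]))


# Hypothetical hourly samples: 36 GB written in hour one, 54 GB in hour two.
samples = [(0, 0), (3600, 36_000_000_000), (7200, 90_000_000_000)]
print(f"average: {average_change_rate(samples) / 1e6:.1f} MB/s")  # 12.5 MB/s
print(f"peak:    {peak_change_rate(samples) / 1e6:.1f} MB/s")     # 15.0 MB/s
```

The peak figure is usually the more useful one when judging whether the inter-site link can keep up during busy periods.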
Q: Do you have any general tips/recommendations regarding CES in a stretched cluster?
A: The CES nodes in your cluster need to be split between the two sites, as they are still part of a single cluster. SMB performs its own locking with the CTDB component, so the latency between the CES nodes needs to be fairly small. Also be aware that if you have different address spaces at the two sites, there may not be an automated failover of services and you may need to perform the failover manually.