Perhaps those of you who use DFS can help.
I have a pretty typical setup with a cluster at my prod site and a cluster at my DR site, with SnapMirror protecting my CIFS data. My hardware is AFF8080s running CDOT 9.
I have a DNS A record for my production CIFS SVM. Let's call it
uscifsProd1.companyname.net
and a DNS A record for my DR SVM. Let's call it
uscifsDR1.companyname.net
I also have a CNAME record pointing to the production CIFS SVM:
usfs1.companyname.net > uscifsProd1.companyname.net
I use this name in my DFS namespace, so a typical target looks like:
\\usfs1.companyname.net\Project1\NYC
During a failover, I will "float" the CNAME record over to my DR CIFS SVM:
usfs1.companyname.net > uscifsDR1.companyname.net
I also change the Service Principal Names for the CIFS service:
Old:
setspn.exe -D HOST/usfs1.companyname.net USCIFSProd1
setspn.exe -D HOST/usfs1 USCIFSProd1
New:
setspn.exe -A HOST/usfs1.companyname.net USCIFSDR1
setspn.exe -A HOST/usfs1 USCIFSDR1
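To keep the SPN swap repeatable, the four commands above could be generated rather than typed by hand. This is only a rough sketch: the function name spn_failover_commands is made up for illustration, and it only builds the command strings (it does not run setspn.exe), so the output can be reviewed before anything touches AD.

```python
def spn_failover_commands(alias_fqdn, old_account, new_account):
    """Build the setspn.exe lines that move the HOST SPNs for an alias.

    Hypothetical helper: it only returns strings, nothing is executed.
    """
    short_name = alias_fqdn.split(".", 1)[0]
    spns = [f"HOST/{alias_fqdn}", f"HOST/{short_name}"]
    # Deletions come first so the SPNs are never registered on both accounts.
    removals = [f"setspn.exe -D {spn} {old_account}" for spn in spns]
    additions = [f"setspn.exe -A {spn} {new_account}" for spn in spns]
    return removals + additions


if __name__ == "__main__":
    for line in spn_failover_commands(
        "usfs1.companyname.net", "USCIFSProd1", "USCIFSDR1"
    ):
        print(line)
```

Swapping the last two arguments gives the commands for failing back.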
I then force a replication in AD. Once the DNS change propagates, clients should be able to access the CIFS shares at the DR site.
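For what it's worth, a quick way to confirm from a given machine that the DNS change has propagated is to compare what the alias and the DR name resolve to. A minimal sketch, assuming a plain IPv4 lookup is good enough; alias_points_to is a hypothetical name, and the answer reflects whatever resolver and cache the machine running it uses.

```python
import socket


def alias_points_to(alias, expected_host, resolve=socket.gethostbyname):
    """Return True when alias and expected_host resolve to the same IPv4 address.

    Hypothetical helper; resolve can be swapped out for testing.
    """
    return resolve(alias) == resolve(expected_host)


# Example call after the CNAME has been floated to DR:
#   alias_points_to("usfs1.companyname.net", "uscifsDR1.companyname.net")
```

This only checks name resolution; it says nothing about Kerberos or the DFS referral state on the client.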
Key facts:
The CNAME record is updated and responds correctly to pings.
The workstation can access the share if I browse directly via
\\usfs1.companyname.net\Project1\NYC
or
\\uscifsDR1.companyname.net\Project1\NYC
However, when I browse to the network locations via the drive letter assigned to the namespace, i.e.
N:\Project1\NYC
or the UNC which uses the namespace, i.e.
\\companyname.net\NDrive\ProjectVol\Project1\NYC
I receive an error: The network path cannot be found.
A Wireshark trace reveals a Kerberos mismatch, so it's not a network issue; Kerberos authentication is failing.
I have tried using KLIST to purge every ticket I can think of, including those of the network service and the local system account. I have also purged the DFS caches using DfsUtil. All to no avail.
I do know that the client gets its DFS info through the Workstation service, and restarting the Workstation service (or rebooting the client) clears the issue.
So, two questions:
1) Is there a way to remedy the issue without rebooting the clients (~1500) or restarting the Workstation service?
2) If not, is there another/better way to engineer the failover? I am NOT willing to move my CIFS service to a Windows environment as many have suggested, for many reasons.
I've considered instead modifying the links in the namespace directly via a script, but this is obviously not my preference, as I'd much rather change one CNAME record than 2000 DFS target links.
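For reference, the core of such a script would just be rewriting the server part of each target path. A minimal sketch of that step, assuming the targets are already available as UNC strings; retarget is a hypothetical name, and reading and writing the actual DFS links (for example with the DFSN PowerShell cmdlets or DfsUtil) is deliberately left out.

```python
def retarget(unc_path, old_host, new_host):
    """Swap the server component of a UNC path, keeping share and folders.

    Hypothetical helper: pure string rewrite, it never touches the namespace.
    """
    prefix = "\\\\" + old_host + "\\"
    # Host names are case-insensitive, so compare in lower case.
    if unc_path.lower().startswith(prefix.lower()):
        return "\\\\" + new_host + "\\" + unc_path[len(prefix):]
    # The target lives on some other server; leave it untouched.
    return unc_path


if __name__ == "__main__":
    print(retarget(r"\\usfs1.companyname.net\Project1\NYC",
                   "usfs1.companyname.net",
                   "uscifsDR1.companyname.net"))
```

Targets that do not start with the old server name pass through unchanged, so the same function could be run over every link in the namespace.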
Thanks.