Microsoft ‘deeply’ apologizes for global Azure, Teams disruption

Microsoft on Tuesday apologized for a global outage affecting Azure cloud services, including Microsoft Teams, Office 365 and Dynamics 365.

“We understand how incredibly impactful and unacceptable it is and apologize,” Microsoft said in a review report following the outage, which resulted from “verification errors” across several Microsoft cloud services. “We are constantly taking steps to improve the Microsoft Azure platform and our processes to ensure that such incidents do not occur in the future.”

In the report, Microsoft referred to changes made after an outage on September 28, 2020, that affected Microsoft 365 users for five hours.

“In the September incident, we indicated that we should apply additional protection to the Azure AD (Active Directory) Service Backend SDP (Safe Deployment Process) system to prevent the class of issues identified here.”

Microsoft said the first phase of SDP changes had been completed, and the second phase was in a “very carefully executed deployment” that would be completed by the middle of the year.

“The initial analysis does indicate that once fully deployed, it will prevent the type of disruption that occurred today, as well as the related incident in September 2020,” Microsoft said. “Meanwhile, additional precautions have been added to our key removal process, which will remain until the second phase of SDP implementation is completed.”

Microsoft said Tuesday morning that the “majority of services” affected by the global Azure and Teams outage were back online, with the exception of Intune and Microsoft Managed Desktop.

The latest update on the outage came in a tweet posted at 6:34 from the Microsoft 365 status account.

Microsoft’s apology came after a global outage on Monday that affected the Teams collaboration application, as well as other Azure, Office 365 and Dynamics 365 services.

The issues, which Microsoft announced on Twitter at 15:40 on Monday, could affect any user “worldwide,” the company said at the time.

Even with the outage, some business executives are calling on MSPs to move customers to the cloud faster following the March 2 attack on Exchange Server by Chinese state-sponsored hackers.

This attack affected only on-premises versions of Exchange Server and not Exchange Online, the cloud-based email service in Office 365. Some 30,000 U.S. organizations and 60,000 organizations worldwide had their email compromised in the breach because they were still running on-premises versions of Exchange.

Last week, Microsoft warned customers about DearCry ransomware attacks stemming from the on-premises Exchange Server attack. On March 12, it warned that “people attacked by ransomware are using the vulnerabilities of Microsoft Exchange to exploit customers.”

Emmet Tydings, president of Columbia-based M&B Telecom, which provides voice, data and internet failover resiliency for MSPs, said it was critical that partners move customers to the cloud to avoid serious security issues such as those that accompanied the Chinese attack on on-premises Exchange servers.

“MSPs need to move their customers to the cloud faster, and they also need to stabilize their communications infrastructure with diversity in their circuits and failover,” Tydings said. “Microsoft has emphasized that they are better able to provide security in the cloud than with on-premises Exchange.”

Tydings said partners need to provide robust internet connections with SD-WAN, wireless failover with service plans via a SIM module, and a cable backup to a primary fiber line.

In the event of an outage such as the Microsoft Teams disruption, MSPs should turn to alternative communications platforms such as Zoom or Cisco Webex, he said.

With the global pandemic leading to a more dispersed workforce, on-premises Exchange no longer makes sense for customers, according to Tydings.

“The MSPs we work with have been heroes in transitioning their clients from on-premises to cloud since the pandemic,” he said.

The rapid migration to the cloud has led companies to invest in making software products faster, but they are not investing in making cloud services more resilient, said Ofer Smadari, co-founder and CEO of Portland, Ore.-based StackPulse, whose reliability platform helps teams detect, respond to and remediate incidents with automated code.

“The results seem to appear in the headlines every week as major brands have site outages,” Smadari said. “Most companies still use traditional IT tools such as ticket systems, service management tools or communication apps to share information and work together to restore service. Companies need to change from an IT management mindset to an engineering one where they incorporate resilience into their applications and their business operations to take a more risk-conscious approach. Only then can they quickly recover from their outages and keep their promise to their customers.”
