Monday, June 2, 2008

The Planet - H1 Phase 2 Servers Up and Running

More and more server owners in Phase 2 of H1 report that their servers are up and running.

That includes my server as well. However, the site appears to be running slower than usual. I might have to wait for the nameserver's new IP address to propagate completely.

The Planet - H1 Phase 2 Racks Power On In Progress

My server should be up any time now, hopefully by 23:59 on 2nd June 2008, Singapore time, at the latest.

Today, 01:55 AM

Following the restoration of power to the second floor of the data center, we've cooled the data center floor and are now in the process of systematically restoring power to racks.

We've got a full staff in the data center to power up racks in sections and verify that the server hardware starts up successfully. This process may take a few hours to restore service to all customer servers on the second floor.

Kevin Hazard

The Planet - H1 Phase 2 Power Up In Progress

Tomy Durden explains why it takes time to power up.


Before they start powering servers up, some testing has to happen and the temps have to stabilize. Cooling has to be done in a controlled manner, otherwise we run into possible condensation issues.

Generally, when powering servers up, it has to be done in steps. The amperage peaks during the boot process, so we don't want to do it all at once or we risk taking a large part or all of the phase back offline.

I know you guys are frustrated, as are we, and I appreciate the patience. Hopefully, we, you guys, and the industry can learn from this incident.

Tomy Durden
Data Center Manager -- Dllstx2 / Dllstx6
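The point about boot-time amperage can be illustrated with a small sketch. This is only a toy model, not how The Planet actually sequenced its racks: the function name plan_power_up, the rack names, the amp figures, and the circuit limit are all made up for illustration. It assumes each batch is switched on only after the earlier batches have settled to their steady draw.

```python
# Toy model of staged power-up. All names and numbers are hypothetical;
# none of them come from The Planet.

def plan_power_up(racks, limit_amps):
    """Group racks into batches that are powered on one after another.

    racks: list of (name, boot_peak_amps, steady_amps) tuples.
    While a batch boots, the total load is the steady draw of every rack
    already running plus the boot peak of the racks in the current batch.
    A new batch is started whenever adding one more rack would push that
    total over limit_amps.
    """
    batches, current = [], []
    settled = 0          # steady draw of racks in finished batches
    booting = 0          # boot-peak draw of the batch being built
    current_steady = 0   # steady draw the current batch will settle to
    for name, peak, steady in racks:
        if current and settled + booting + peak > limit_amps:
            # Close the batch and let it settle before starting more racks.
            batches.append(current)
            settled += current_steady
            current, booting, current_steady = [], 0, 0
        if settled + peak > limit_amps:
            raise ValueError(f"{name} cannot be started within the limit")
        current.append(name)
        booting += peak
        current_steady += steady
    if current:
        batches.append(current)
    return batches


# Four identical racks: 30 A while booting, 20 A once running, 100 A limit.
racks = [("r1", 30, 20), ("r2", 30, 20), ("r3", 30, 20), ("r4", 30, 20)]
print(plan_power_up(racks, 100))
```

With these made-up figures, switching on all four racks at once would peak at 120 A against a 100 A limit, while the staged plan boots three racks first and the fourth after they have settled.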

The Planet - H1 Data Center Updates 12:55am

Singapore time 13:55. Power on.

Today, 12:55 AM
After the fire marshal inspected the H1 location, we were given the green light to bring power back to the facility. The generators have been turned on, and we are receiving power on the second floor. The generator power restoration is the first step in the full restoration of service to the data center.

From here, we will begin the process of cooling the DC floor, which could take a few hours. As soon as the power integrity is confirmed and the DC floor is ready for operation, we will be restoring power and checking server hardware on a rack-by-rack basis.

Kevin Hazard
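The Singapore times quoted alongside these updates can be checked with a short sketch. It assumes the forum timestamps are Houston local time on daylight saving (UTC-5) and that Singapore is UTC+8; the helper name houston_to_singapore is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for early June 2008.
CDT = timezone(timedelta(hours=-5), "CDT")  # Houston, daylight time
SGT = timezone(timedelta(hours=8), "SGT")   # Singapore

def houston_to_singapore(local: datetime) -> datetime:
    """Treat a naive timestamp as Houston time and convert it to Singapore time."""
    return local.replace(tzinfo=CDT).astimezone(SGT)

posted = datetime(2008, 6, 2, 0, 55)  # "Today, 12:55 AM"
print(houston_to_singapore(posted).strftime("%Y-%m-%d %H:%M"))  # 2008-06-02 13:55
```

The two zones are 13 hours apart under these assumptions, which matches the 12:55 AM and 13:55 pairing above.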

The Planet - My Server at Phase 2 H1 Data Center

According to The Planet staff,

Hardware Object's Upstream Connection:dq23a.02.hstntx1 (66.98.204.246) Port: e12

means switch: dq23a, phase: 02, data center: hstntx1

That means my server is in Phase 2 of the H1 data center, one of the 6,000 servers there.

Hopefully it will be up and running within the next 12 hours. (23:59 2nd June 2008 Singapore time)
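The breakdown of the upstream connection label can be sketched in a few lines. This assumes the label always has exactly three dot-separated parts (switch, phase, data center), as in the staff explanation above; the names Upstream and parse_upstream are made up for illustration and are not part of any tool from The Planet.

```python
from typing import NamedTuple

class Upstream(NamedTuple):
    switch: str
    phase: int
    data_center: str

def parse_upstream(label: str) -> Upstream:
    """Split a label such as 'dq23a.02.hstntx1' into its three fields."""
    # Assumes exactly three dot-separated parts; anything else raises ValueError.
    switch, phase, data_center = label.split(".")
    return Upstream(switch, int(phase), data_center)

info = parse_upstream("dq23a.02.hstntx1")
print(info.switch, info.phase, info.data_center)  # dq23a 2 hstntx1
```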

The Planet - H1 Data Center Updates 11:00pm

June 1 – 11:00pm

As previously committed, I would like to provide an update on where we stand following yesterday's explosion in our H1 data center. First, I would like to extend my sincere thanks for your patience during the past 28 hours. We are acutely aware that uptime is critical to your business, and you have my personal commitment that The Planet team will continue to work around the clock to restore your service.

As you have read, we have begun receiving some of the equipment required to start repairs. While no customer servers have been damaged or lost, we have new information that damage to our H1 data center is worse than initially expected. Three walls of the electrical equipment room on the first floor blew several feet from their original position, and the underground cabling that powers the first floor of H1 was destroyed.

There is some good news, however. We have found a way to get power to Phase 2 (upstairs, second floor) of the data center and to restore network connectivity. We will be powering up the air conditioning system and other necessary equipment within the next few hours. Once these systems are tested, we will begin bringing the 6,000 servers online. It will take four to five hours to get them all running.

We have brought in additional support from Dallas to have more hands and eyes on site to help with any servers that may experience problems. The call center has also brought in double staff to handle the increase in tickets we're expecting. Hopefully by sunrise tomorrow Phase 2 will be well on its way to full production.

Let me next address Phase 1 (first floor) of the data center and the affected 3,000 servers. The news is not as good, and we were not as lucky. The damage there was far more extensive, and we have a bigger challenge that will require a two-step process. For the first step, we have designed a temporary method that we believe will bring power back to those servers sometime tomorrow evening, but the solution will be temporary. We will use a generator to supply power through next weekend when the necessary gear will be delivered to permanently restore normal utility power and our battery backup system. During the upcoming week, we will be working with those customers to resolve issues.

We know this may not be a satisfactory solution for you and your business but at this time, it is the best we can do.

We understand that you will be due service credits based on our Service Level Agreement. We will proactively begin providing those following the restoration of service, which is our number one priority, so please bear with us until this has been completed.

I recognize that this is not all good news. I can only assure you we will continue to utilize every means possible to fully restore service.

I plan to have an audio update tomorrow evening.

Until then,

Douglas J. Erwin
Chairman & Chief Executive Officer

The Planet - ServerCommand is back Online

ServerCommand is back online, but I don't use ServerCommand any more. My server is still down. It has been down for more than 28 hours.

Today, 07:47 PM

We continue to work to restore power to the data center and bring all affected customer servers online.

Currently, ServerCommand is back online.

https://www.servercommand.net

-Kevin Hazard

Sunday, June 1, 2008

The Planet - H1 Data Center Outage Update - 08:54 AM

Another update at 08:54am (Singapore 09:54pm)

Today, 08:54 AM

Hello,

The team here at The Planet continues to work through the various issues that we continue to encounter. We are still making progress on the previous items that I mentioned in my last post. DNS infrastructure has been migrated to another data center and propagation has begun. We are working through some database issues with ServerCommand and fully expect those to be resolved within the next hour.

I’d also like to address the idea of migrating from one data center to another. During the early stages of the H1 data center outage we opportunistically relocated some customers to another data center. However, due to network and data center (power/cooling) constraints, this option is no longer available and requests for migration cannot be honored. Please rest assured that our teams are working diligently to return service to all affected customers.

At this time we do not have an Estimated Time to Repair; we should have a better estimate this morning. Our staff and management continue to work through the night and morning-- we will continue to provide hourly updates.

Todd Mitchell
General Manager -- Dedicated Hosting
The Planet Internet Services, Inc. / theplanet.com

The Planet - H1 Data Center Outage Update - 06:54am

Another message coming in:

Today, 06:54 AM

Morning,

We are continuing to work through various issues this morning. We will have additional contractors on-site this morning starting at approx. 7 AM. Some will hand-off from contractors who worked overnight and others will start the recovery/installation of new electrical gear to power the data center.

We are still working through the EV1 DNS and ServerCommand items. We are making progress on both items and expect to have both functional within the next 120 minutes.

In addition to the above, the network engineering group worked overnight to prepare the network for the recovery of H1. We expect the reconvergence of the network to go smoothly once H1 comes back online.

We do not have an Estimated Time to Repair at present; we should have a better estimate this morning. Our staff and management continue to work through the night and morning-- we will continue to provide hourly updates.

Todd Mitchell
General Manager -- Dedicated Hosting
The Planet Internet Services, Inc. / theplanet.com

The Planet - First Lesson Learnt From H1 Incident

My first lesson learnt from The Planet H1 incident is

Never Put All Your Eggs (sites) in One Basket (server)!

The Planet - H1 Priority is to Get the Network Up

Another message:

06/01/2008 1:40 AM CST Update

As you know, we have vendors onsite at the H1 data center. With their help, we’ve created a list of equipment that will be required, and we’re already dealing with those manufacturers to find the gear. Since it’s Saturday night, we do have a few challenges.

We are prioritizing issues as follows:

1. Getting the network up at H1 is first and foremost. We’re pulling components from our five other data centers – including Dallas – which will be an all-night effort.
2. Getting power back to the data center is key, though it is too early to establish success there.
3. Because ServerCommand is in H1, our legacy EV1 customers are blinded about this incident. We are in the process of moving the ServerCommand servers to other Houston data centers so that we’re able to loop them into communications.
4. We absolutely intend to live up to our SLA agreements, and we will proactively credit accounts once we understand full outage times. Right now, getting customers back online is the most critical.

The Planet - H1 Data Center Outage

Just received this email from ThePlanet.com CEO.

Dear Valued Customers:

This evening at 4:55 in our H1 data center, electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding our electrical equipment room. Thankfully, no one was injured. In addition, no customer servers were damaged or lost.

We have just been allowed into the building to physically inspect the damage. Early indications are that the short was in a high-volume wire conduit. We were not allowed to activate our backup generator plan based on instructions from the fire department.

This is a significant outage, impacting approximately 9,000 servers and 7,500 customers. All members of our support team are in, and all vendors who supply us with data center equipment are on site. Our initial assessment, although early, points to being able to have some service restored by mid-afternoon on Sunday. Rest assured we are working around the clock.

We are in the process of communicating with all affected customers. We are planning to post updates every hour via our forum and in our customer portal. Our interactive voice response system is updating customers as well.

There is no impact in any of our other five data centers.

I am sorry that this accident has occurred and apologize for the impact.

Sincerely,

Douglas J. Erwin
Chairman & Chief Executive Officer

Server at ThePlanet.com Down

Just discovered that my server at theplanet.com went down.