NetApp Simulator Disk Fix

One of the great things about NetApp is that you can actually run a fully functional Data ONTAP Simulator of their storage appliances on any UNIX machine. It is a great way to test their product suite and demo solutions before deploying to your production environment. I do have one minor gripe: you have to have a NOW account (NetApp’s customer portal) before you can download it, and to get that you have to be a customer with a registered product, so evaluating before your first box arrives is impossible unless you can get your VAR to get you a copy.

Anyway, I’ve had a problem with the last couple of simulators that I have set up. When I create additional “disks” during the setup, they always come in as failed when I actually start the simulator running. I first experienced this running it as a virtual machine on a VMware server, and then on an actual hardware Linux box. I was getting errors similar to the following:

Disk v4.28 Shelf ? Bay ? [NETAPP VD-1000MB-FZ-520 0042] S/N [66324112] has no valid labels. It will be taken out of service to prevent possible data loss.

I finally found a solution in the following comment on Scott Lowe’s blog entry about NetApp and ESX Server.

Once the simulator is running, here are the commands that I use to get the disks back into an operational state:

>priv set diag
*>disk unfail -s v4.19

I repeat the unfail command for each disk that I want to recover. When they are all unfailed, I follow that with a disk zero spares. I’m not sure that is really needed, but it guarantees that the disks are good to go. At this point I’m all set to use the disks however I want.
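If there are a lot of failed disks, typing the unfail command for each one gets tedious. Here is a rough sketch of how the command list could be generated from the console messages. It assumes the error lines look like the one quoted above; the function names are my own invention, and the script only prints commands to paste into the simulator console, it does not talk to the simulator itself.

```python
import re

# Assumption: every failed disk shows up in a console message shaped like
# "Disk v4.28 Shelf ? Bay ? [...] has no valid labels."
LABEL_ERROR = re.compile(r"Disk (v\d+\.\d+) .*has no valid labels")


def failed_disks(log_text):
    """Return the disk names from 'no valid labels' messages, in order, no repeats."""
    names = []
    for match in LABEL_ERROR.finditer(log_text):
        name = match.group(1)
        if name not in names:
            names.append(name)
    return names


def recovery_commands(disks):
    """Build the command sequence described above for the given disk names."""
    commands = ["priv set diag"]
    commands += [f"disk unfail -s {disk}" for disk in disks]
    commands.append("disk zero spares")
    return commands


if __name__ == "__main__":
    sample = (
        "Disk v4.28 Shelf ? Bay ? [NETAPP VD-1000MB-FZ-520 0042] "
        "S/N [66324112] has no valid labels. It will be taken out of "
        "service to prevent possible data loss.\n"
    )
    for line in recovery_commands(failed_disks(sample)):
        print(line)
```

Review the printed list before pasting it in, since anything run under priv set diag deserves a second look.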

Like a Gunshot In the Server Room

I’m told that’s what it sounded like when the power supply in our NetApp FAS 3020 dramatically failed last week. One of my administrators got a message saying there was a problem with one of the NetApps while he was home during the holiday weekend. He immediately called NetApp tech support, who had already opened a case for the unit (you have got to love AutoSupport). They couldn’t tell 100% which power supply it was because it apparently blew the circuit breaker in the power strip (and because the power supplies are bus powered, so that even though one had died it still showed up as being “in” the system). So they sent one for the drive shelf, and my admin headed into the office.

When he arrived they determined that it probably was the power supply on the head unit and not the drive shelf. So to make sure, they had him pull the head unit’s power supply, reset the circuit breaker on the power strip and then reinsert the power supply. At that point a huge spark came shooting out of the head unit’s power supply, and the circuit breaker blew in the rack’s surge suppressor, as well as the circuit breaker in the electrical panel. That’s what sounded like the gunshot. Supposedly the smell of burnt electronics was very strong as well. So they canceled the first power supply shipment and sent another one. It arrived in about 2 hours, and once it was swapped in we were back to 100%.

But the great thing was that during the entire time the unit kept running and passing data without even one hiccup. Way to go NetApp.

Monitoring NetApp with Nagios and Nagiosgraph

With the installation of our new Network Appliance (NetApp) filers, I needed to be able to monitor them. Yes, I know that they have an AutoSupport feature where they email you as well as NetApp whenever anything happens, but I still like to do my own monitoring.

The first thing that I did was check the Nagios Exchange to see if they had any plugins for NetApps. They actually had two different plugins. The first worked and the second didn’t (although it did offer a failed disk check). So I modified the first to add that feature, and because I knew I was going to be using Nagiosgraph, I corrected the performance data output for two of the checks to be compliant with the Nagios Plugin Development Guidelines.
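For anyone who hasn’t dealt with performance data before, the guidelines expect it after a pipe character in the plugin output, in the form label=value[UOM];warn;crit;min;max. Here is a small illustrative sketch of a formatter for that layout. It is not code from check_netapp, and the labels and thresholds in it are made up for the example.

```python
def perfdata(label, value, uom="", warn="", crit="", minimum="", maximum=""):
    """Format one performance data item as label=value[UOM];warn;crit;min;max."""
    # Labels with spaces have to be wrapped in single quotes.
    if " " in label:
        label = f"'{label}'"
    fields = [f"{label}={value}{uom}", str(warn), str(crit), str(minimum), str(maximum)]
    # Trailing empty fields can be left off.
    while len(fields) > 1 and fields[-1] == "":
        fields.pop()
    return ";".join(fields)


def plugin_output(status_text, items):
    """Join the human-readable status and the perfdata items with a pipe."""
    return f"{status_text} | {' '.join(items)}"


if __name__ == "__main__":
    # Hypothetical labels and thresholds, only to show the shape of the output.
    items = [
        perfdata("cpu_load", 12, "%", 80, 90, 0, 100),
        perfdata("vol0 used", 42, "%"),
    ]
    print(plugin_output("OK - CPU load 12%", items))
```

Output in this shape is what a grapher such as Nagiosgraph can pick apart into separate data series.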

Download my modified check_netapp Nagios plugin.

In order to graph the NetApp data with Nagiosgraph you will need to use my modified check_netapp plugin, so please download and test it before proceeding.

The following nagiosgraph map entries will allow you to graph both CPU Load and Disk Space Used per volume (by name):


Goodbye Dell, Hello NetApp (almost)

I’ve been writing about how Dell’s PowerVault 220s are junk for quite a while now. We experienced our 3rd major crash for the year this week. We had everything restored and back in operation 23 hours later.

In the meantime we brought in a data storage consultant to analyze our storage infrastructure. They presented a detailed report about a month ago, and this past Monday the partners decided that it was time to change the way we store data. The decision was made to no longer use Dell for our storage system (we will still keep using them for our application servers), and instead move to a Network Attached Storage appliance. In our case we’ll start by moving our headquarters office to a NetApp FAS3020c enterprise storage appliance from Network Appliance (NetApp).

We are still working out the final details and pricing (that’s another story for another time), and hopefully will be placing an order within the next couple of weeks. Then the real fun begins as we get to install and configure the new system.