The Japan Page: January 2008 Archives


January 22, 2008

Performance notes on the LSI 1068 SATA/SAS Controller

I recently bought an ASUS P5M2/SAS motherboard to set up a whitebox ESX server capable of hosting 64-bit VMs. I chose this mobo because it has a pretty strong following among home ESX users, and its embedded LSI 1068 controller is supported by ESX.

I would have liked to do a detailed performance comparison of this whitebox server vs. my loaner HP DL380 G3 server, but I don't really have the time, so some general performance notes will have to suffice. I borrowed the HP server from my boss at my current employer (who mostly uses those 4-letter-word-company servers), and I needed to give it back in a timely manner.



First, to assist in comparing the hardware:

ESX whitebox server

  • ESX server 3.5.0.64607 eval version
  • Asus P5M2/SAS motherboard with Intel 3000 chipset, Intel ICH7R
  • Xeon 3060, 2.4 GHz dual core, 65W, VT, LGA775 CPU
  • 8 GB (4 x 2 GB) Corsair DDR2-667 (PC2-5300) RAM, ECC unbuffered
  • 4 x Western Digital 300 GB SATA HDDs

Compaq DL380 G3 server

  • ESX server 3.5.0.64607 eval version
  • 6 x 146 GB 10k U320 SCSI HDDs in RAID 5 with 1 hot spare, ~580 GB formatted storage
  • Dual Xeon 3.1 GHz CPUs
  • 8 GB RAM, ECC PC2100 DDR

Miscellaneous Setup Notes:

  • The Asus P5M2 mobo includes both the embedded LSI SATA/SAS controller and the Intel ICH7R South Bridge, which contains Intel Matrix RAID. Initial setup for ESX might be a little confusing if you've never done RAID before, especially with two separate RAID controllers in the same system. The secret with this mobo is to remember: CTRL + C is for the LSI 1068 SATA/SAS and CTRL + I is for the Intel Matrix.
  • It seems the Intel Matrix BIOS prompt only pops up on boot when it's configured, so just hit CTRL + I shortly after the LSI CTRL + C prompt goes away.
  • Use the enclosed Mini-SAS to SATA breakaway cables for your ESX array. The Mini-SAS connectors are on the mobo, below the SATA connectors. In the BIOS you have to set the IDE drives into [RAID] mode.
  • Do not split your array across the two Mini-SAS connectors. I tried this because old-skool SCSI dictates spreading your drives across all available SCSI buses. This is very bad in embedded RAID-land and cuts your performance almost in half.
  • The Intel Matrix RAID only supports Windows operating systems; there is no Linux or ESX support.
  • The Matrix RAID is basically software RAID, like those $50 RAID cards you see floating out there. Well, that might be a little harsh... don't get me wrong, you'll get "hardware assisted software RAID performance" out of this solution, not to mention higher levels of data protection when configured correctly.
  • The LSI 1068 RAID is slightly better, but it's still not enterprise-class. It will work fine for home testing. The LSI solution offers an upgrade to standard hardware RAID in the form of a PCI-X card, AKA the LSI 8300XLP zero channel card, for $330 street price. Battery backup is still extra.


Lessons learned:

  • As mentioned above, never NEVER split a RAID array across the two SAS/SATA ports. Very bad disk I/O performance will be your reward for this behaviour.
  • I configured my 4 x 300 GB WD SATA HDDs in a RAID 10 array, referred to in LSI 1068 speak as an IME array. Theoretically you should be able to lose up to two drives without data loss (as long as the two failed drives don't hold both copies of the same data), but I found that pulling any one drive resulted in a "degraded" array status (nooooo!) and pulling any two drives resulted in a "failed" array status. This is where you start wishing for a real RAID card.
  • Don't expect anything nice like a loud beeping sound when your first disk fails. You'll get a small warning during boot that flashes by in no time, but you'll know for sure when you lose another drive... <evil laugh>
  • The configuration utility for the LSI 1068 is clunky. Compared to the LSI MegaRAID SATA 150-4 I also own, the LSI 1068 is just plain basic.
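To see why a 4-drive RAID 10 only "theoretically" survives two failures, here's a small sketch that enumerates every possible drive loss. It assumes a classic RAID 10 layout of two mirrored pairs (drives 0+1 and 2+3); that pairing is an illustrative assumption, since I don't know exactly how the LSI 1068 lays out an IME volume on disk.

```python
from itertools import combinations

# Assumed layout (illustrative): two mirrored pairs, striped together.
MIRROR_PAIRS = [(0, 1), (2, 3)]
DRIVES = [d for pair in MIRROR_PAIRS for d in pair]


def array_survives(failed):
    """Data survives as long as every mirror pair still has one working drive."""
    return all(not set(pair) <= set(failed) for pair in MIRROR_PAIRS)


def survival_table(num_failed):
    """Map each combination of failed drives to whether the data survives."""
    return {combo: array_survives(combo) for combo in combinations(DRIVES, num_failed)}


if __name__ == "__main__":
    for n in (1, 2):
        table = survival_table(n)
        ok = sum(table.values())
        print(f"{n} failed drive(s): {ok} of {len(table)} combinations keep the data")
        for combo, survived in table.items():
            print(f"  lost {combo}: {'OK (degraded)' if survived else 'DATA LOST'}")
```

Under this assumed layout every single-drive loss leaves the data intact but the array degraded, while only 4 of the 6 possible two-drive losses are survivable: lose both halves of one mirror and the array is gone.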

--------------------------------------

Un-scientific and completely subjective test results:

Test Config 1:

The whitebox server hardware above. 4 x 300 GB WD HDDs configured two drives per SAS channel in IME (RAID 10). Installing ESX 3.5 took about 1 hour 10 minutes; disk I/O appears to be the bottleneck. Once it was configured, I immediately copied a 30 GB (20 GB used) Windows 2003 R2 VM across the unmanaged gigE switch from the DL380.

This config is slooooow. Painfully slow. I eventually loaded the whitebox server up with 6 Windows and 2 Linux VMs, and since the VMs are mostly idling anyway it was tolerable. Start a backup with esXpress, a large file copy or another OS install, and the box chokes. For reference, a Windows Server 2003 install takes 90+ minutes.

Conclusion: DO NOT split I/O across 2 SAS channels. Very, mucho bad.

--------------------------------------

Test Config 2:

The whitebox server hardware above. 4 x 300 GB WD HDDs configured with all four drives on one SAS channel in IME (RAID 10). Installing ESX 3.5 took about 50 minutes; disk I/O still appears to be the bottleneck, but much less so than before. Once it was configured, I immediately copied the same 30 GB (20 GB used) Windows 2003 R2 VM across the unmanaged gigE switch from the DL380.

This config is a noticeable improvement over the first one, but it's still slooooow. It is, however, usable for a home test lab.

Conclusion: the LSI 1068 by itself is a "software RAID" controller. It provides some RAID data-protection features and better performance than single SATA drives.
It has enough kick to run a small home network or test lab, but you have to be a little patient. I did install an Exchange 2007 64-bit virtual server, but performance was enough of a concern that I didn't move any mailboxes over to it.

--------------------------------------

Performance comparison to the DL380:


Short version: the whitebox is slow. The venerable old DL380 doesn't support 64-bit guests, doesn't have Intel VT, and doesn't natively address memory over 4 GB. That said, it still packs a punch, and thanks to its better disk I/O (U320 SCSI, RAID 5) it blew away the whitebox server. The DL380 also draws 4.5 A (540 watts) vs. the whitebox server at 1.25 A (150 watts). So if you want to heat your room and don't mind the droning sound, go with the DL380.
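The wattage figures above are just the measured current times line voltage, assuming 120 V. As a rough sketch, here is the same arithmetic extended to heat output and yearly energy for a box that runs 24/7. The 3.412 BTU/hr-per-watt conversion is standard; treating amps x volts as real power (i.e. ignoring power factor) is a simplifying assumption.

```python
LINE_VOLTAGE = 120.0         # assumed US line voltage, volts
BTU_PER_HR_PER_WATT = 3.412  # 1 W dissipated is about 3.412 BTU/hr of heat
HOURS_PER_YEAR = 24 * 365


def power_watts(amps, volts=LINE_VOLTAGE):
    """Apparent power (VA), treated here as watts by ignoring power factor."""
    return amps * volts


def summarize(name, amps):
    """Return (name, watts, heat in BTU/hr, kWh per year at 24/7 operation)."""
    watts = power_watts(amps)
    heat = watts * BTU_PER_HR_PER_WATT
    kwh_per_year = watts * HOURS_PER_YEAR / 1000
    return name, watts, heat, kwh_per_year


if __name__ == "__main__":
    for row in (summarize("DL380 G3", 4.5), summarize("P5M2 whitebox", 1.25)):
        print("%-14s %6.0f W  %7.0f BTU/hr  %6.0f kWh/yr" % row)
```

The roughly 390 W difference, running around the clock, works out to about 3,400 kWh a year of extra electricity, nearly all of which ends up as heat in the room.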

By the end of my testing, I had configured 8 virtual machines: 3 Windows 2003 R2 domain controllers, a combo IIS / SQL server, a Windows XP box running IP camera frame-grabbing software, a Linux-based e-mail security server (ESVA) and an unnamed Linux-like firewall. In addition, there are a few lightly used Linux hosts. On the Asus P5M2 I also installed Exchange 2007 (64-bit) on Windows 2003 R2 64-bit, but since the DL380 doesn't support 64-bit guests, that server was just idling.

The Asus P5M2/SAS ran a nice <20% of CPU and <10% of disk when all servers were idling. The real fun is when you put any I/O load on, such as migrating a VM across hosts through VI3. By raw numbers the P5M2 sustained 11254 KBps and the DL380 came in lower at 8084 KBps. But at this point the P5M2 had 7 idling VMs and was struggling to keep up while the DL380 was closer to idling. That said, keep in mind that the DL380 array has 5 active spindles and the parity overhead of RAID5, while the P5M2 has 4 spindles.
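For a back-of-the-envelope sense of what each array is doing, here is a sketch of usable capacity and the commonly quoted random-write penalty (backend disk I/Os per small host write: 2 for mirroring, 4 for RAID 5's read-modify-write of data and parity). These are textbook rules of thumb, not measurements from either controller, and they ignore controller caching entirely.

```python
from dataclasses import dataclass

# Rule-of-thumb backend disk I/Os per small random host write.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4}


@dataclass
class Array:
    name: str
    level: str
    disks: int    # active spindles (hot spares excluded)
    disk_gb: int

    def usable_gb(self):
        if self.level == "RAID10":
            return self.disks // 2 * self.disk_gb   # half the disks hold mirror copies
        if self.level == "RAID5":
            return (self.disks - 1) * self.disk_gb  # one disk's worth of parity
        raise ValueError(f"unsupported level: {self.level}")

    def backend_ios_per_write(self):
        return WRITE_PENALTY[self.level]


if __name__ == "__main__":
    arrays = [
        Array("P5M2 / LSI 1068 IME", "RAID10", disks=4, disk_gb=300),
        Array("DL380 G3", "RAID5", disks=5, disk_gb=146),
    ]
    for a in arrays:
        print(f"{a.name}: {a.usable_gb()} GB usable (before formatting), "
              f"{a.backend_ios_per_write()} disk I/Os per small write")
    # Migration throughput reported above, roughly converted from KBps to MB/s.
    for name, kbps in (("P5M2", 11254), ("DL380", 8084)):
        print(f"{name}: {kbps / 1024:.1f} MB/s")
```

The DL380's 584 GB before formatting lines up with the ~580 GB in the hardware list, and the 2 vs. 4 write penalty is one reason spindle count alone doesn't tell the whole story.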

--------------------------------------

Final Configuration:

I LSI MegaRaid SATA 150-4 (PCI-X 64 bit card version)

January 8, 2008

How to crash your Cisco 871 Series Router

Turns out it's pretty easy to brick a Cisco 871 series router... just reboot it. OK, it's a little harder but not much. Here's the 411.

We selected the Cisco 871 Series router for use in a medical device. Our customers turn the device on before using it, then turn it off when they're finished. Pretty normal, right? The gotcha is when you turn the router off during the boot process, and this router takes longer than most old PCs to boot (almost 3 minutes to hand out an address over DHCP).

The exact kill-spot depends on the version of IOS, we used 12.4(15) at first and then experimented with several other versions. It appears that during the final stages of the IOS boot process, around 69 seconds for 12.4(15), it reads the config from NVRAM, and if power is interrupted during this brief window the config is lost and the router is "bricked".

In our test lab we have a PC-controlled solid state relay connected to a 120 V AC power outlet, so I can power cycle the router automagically. I wrote a script to reboot the router in a loop, cutting power after 1, 2, 3... infinity seconds. The good news is that each version of IOS tested died in the same place, so at least it's consistent.

Here's the logic of the Cisco Router "brick" script:
0. Begin the loop count at 1 second.
1. Power off the router.
2. Wait 10 seconds.
3. Turn on the router for XX seconds, where XX is the loop count.
4. Turn off the router.
5. Wait 10 seconds.
6. Power on the router and wait for it to boot (360 seconds, just to be safe).
7. Run ipconfig on the PC, which is attached to a LAN port on the 871.
8. Ping the router. If it responds, log this and continue. If it doesn't, halt, leave the router on and wait for a humanoid life form to intervene.
9. Increment the loop count by 1, goto step 1.
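The loop above translates fairly directly into code. This is a sketch, not the original script: `relay_on`, `relay_off`, `renew_dhcp` and `router_responds` are hypothetical callables you would wire to your own solid state relay controller and OS tools (on Windows, something like `ipconfig /renew` and `ping`). The timings are the ones from the steps above. Keeping the hardware calls (and even `sleep`) injectable makes the loop easy to dry-run against a fake router.

```python
import time
from typing import Callable, Optional


def brick_test(
    relay_on: Callable[[], None],         # hypothetical: energize the SSR (router on)
    relay_off: Callable[[], None],        # hypothetical: de-energize the SSR (router off)
    renew_dhcp: Callable[[], None],       # hypothetical: e.g. shell out to "ipconfig /renew"
    router_responds: Callable[[], bool],  # hypothetical: e.g. shell out to "ping"
    log: Callable[[str], None] = print,
    sleep: Callable[[float], None] = time.sleep,
    settle_s: float = 10,
    boot_wait_s: float = 360,
    max_cycles: Optional[int] = None,
) -> Optional[int]:
    """Cut power XX seconds into boot, for XX = 1, 2, 3, ...

    Returns the XX at which the router stopped responding (leaving it
    powered on for a human to inspect), or None if max_cycles was reached.
    """
    cycle = 1                          # step 0: begin the loop count at 1 second
    while max_cycles is None or cycle <= max_cycles:
        relay_off()                    # step 1
        sleep(settle_s)                # step 2
        relay_on()                     # step 3: on for `cycle` seconds...
        sleep(cycle)
        relay_off()                    # step 4: ...then yank the power
        sleep(settle_s)                # step 5
        relay_on()                     # step 6: clean boot
        sleep(boot_wait_s)
        renew_dhcp()                   # step 7
        if router_responds():          # step 8
            log(f"cycle {cycle}: router OK")
        else:
            log(f"cycle {cycle}: no response, halting with router powered on")
            return cycle
        cycle += 1                     # step 9
    return None
```

With the real timings each cycle takes 380 + XX seconds, so running all the way to cycle 70 takes about 8 hours, not counting the time ipconfig and ping themselves take.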

We also attached a PC to the serial console and logged the output. We killed the router 100% of the time, but the cycle count varied from 49 to 70 on the versions of IOS we tried.

Conclusion:
Don't use Cisco 871 hardware in embedded applications. I don't have any higher-end Cisco hardware lying around to test whether this "feature" is limited to the 871 series, but it's probably safe to assume this isn't a typical use case for Cisco routers, so who knows...?

Brick (brk) - a molded rectangular block of baked clay.

IT Idiom:
To render an expensive piece of networking gear useless by screwing up the configuration. Steve bricked the core router by installing an OS image larger than the available flash memory.


All trademarks are property of their respective owners (and lawyers).