Tuesday, November 10, 2009

Introduction to iSCSI

iSCSI is one of the hottest topics in storage because it allows you to create centralized SANs using ordinary TCP/IP (Ethernet) networks rather than Fibre Channel (FC) networks.

Get a handle on the main iSCSI concepts and terminology.


SANs (Storage Area Networks) are popular as a storage system for many reasons. The most common is that centralizing the storage makes management and maintenance easier while delivering very good performance and a good price/performance ratio.

Typical SANs are constructed using Fibre Channel (FC) networks and FC hard drives, although SATA, SAS, and SSD drives are also used in SANs today.


FC storage delivers very good performance from the storage hardware to the client over FC networks running at 2 Gb/s, 4 Gb/s, and now 8 Gb/s.

But FC-based SANs require an FC adapter in each client as well as an FC network (switches), in addition to the normal TCP/IP networks the clients already have, which adds cost and complexity to the servers.


FC-based SANs are popular because of their very good performance and all of the benefits of centralized storage.

At the same time, they can be rather expensive and complex. iSCSI offers an alternative for SANs: it keeps the benefits of a SAN while reducing complexity, because only a TCP/IP network is needed.


SAN Introduction

Virtually every server needs some sort of storage and the associated backup devices and management software.

Originally this meant that every server had its own storage, its own backup device, its own management tools, its own security, etc., as shown below in Figure 1. This configuration is commonly called Direct Attached Storage (DAS).



Figure 1: DAS Configuration
With a small number of servers in fairly close proximity to each other, this arrangement can be managed fairly easily with a relatively small staff.

But as the number of servers grew and they became geographically dispersed, people realized that this approach would not work (i.e. it doesn’t scale well).

Something new was needed and that something was a Storage Area Network (SAN).


In a SAN topology the storage is centralized allowing all of the associated hardware and software to be centralized as well.

Consequently, you have a single set of storage devices, a single backup system, a single set of management tools, a single set of security devices, etc. The key to enabling SANs is the network.

Figure 2 below illustrates the SAN topology with the SAN Network shown in blue.
Figure 2: SAN Configuration

Notice that the servers have no local storage or local backup devices; these devices and services are provided by the centralized SAN.

In many cases, people still put a local drive in each server for the OS, but they don’t back up that drive.


Classic SANs use Fibre Channel (FC) networks to connect the centralized storage to the servers. This involves putting an HBA (Host Bus Adapter) in each server and connecting it to the SAN network.

There are multiple options for FC network topologies, but many of them involve FC switches. The SAN software then allocates storage to each server from its pool of storage, and each server can partition, format, and use its storage as it sees fit.
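To make the allocation step concrete, here is a minimal Python sketch of a central pool that hands out capacity to individual servers. The `StoragePool` class, its sizes, and the server names are hypothetical; real SAN management software also handles RAID layout, access control, snapshots, and much more.

```python
from dataclasses import dataclass, field


@dataclass
class StoragePool:
    """Toy model of a SAN's central storage pool (sizes in GiB).

    Hypothetical illustration only: real SAN software tracks far more
    (RAID layout, thin provisioning, snapshots, access control).
    """
    capacity_gib: int
    allocations: dict[str, int] = field(default_factory=dict)

    @property
    def free_gib(self) -> int:
        return self.capacity_gib - sum(self.allocations.values())

    def allocate(self, server: str, size_gib: int) -> None:
        """Carve out a block of storage for one server."""
        if size_gib <= 0:
            raise ValueError("size must be positive")
        if size_gib > self.free_gib:
            raise ValueError(f"only {self.free_gib} GiB free")
        # The server sees its allocation as a raw disk, which it can
        # then partition and format however it likes.
        self.allocations[server] = self.allocations.get(server, 0) + size_gib


pool = StoragePool(capacity_gib=2000)
pool.allocate("web01", 200)
pool.allocate("db01", 500)
print(pool.free_gib)  # 1300
```

Because every allocation comes out of one pool, capacity that one server does not need stays available to the others, which is part of why centralized storage is easier to manage than per-server DAS.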

The FC network primarily carries storage commands (most commonly SCSI) between the servers and the storage, and the advantage of FC is that its performance was much better than that of the Ethernet (TCP/IP) networks of the time.


FC networks running at 2 Gbps (400 MB/s in full duplex) were deployed in the 2001 time frame.

At that time Fast Ethernet (100 Mbps) was the most common network, with GigE only just becoming available. Even GigE was at best half the speed of 2 Gbps FC, and Fast Ethernet was 1/20 the speed of 2 Gbps FC.

In the 2005 time frame, 4 Gbps FC networks were introduced. By then GigE was the most prevalent network, but it was only 1/4 the speed of 4 Gbps FC. Then in 2008, 8 Gbps FC networks were introduced; by that time 10GigE networks were available, but they were somewhat expensive.
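The speed comparisons above are simple ratios of nominal link rates. This small Python sketch reproduces them with exact fractions; it compares headline speeds only and deliberately ignores encoding and protocol overhead, so it is an approximation rather than a statement about delivered throughput.

```python
from fractions import Fraction

# Nominal link rates in Gbps, as quoted in the text. The ratios below
# ignore encoding and protocol overhead; they compare headline speeds.
RATES_GBPS = {
    "Fast Ethernet": Fraction(1, 10),
    "GigE": Fraction(1),
    "10GigE": Fraction(10),
    "2 Gbps FC": Fraction(2),
    "4 Gbps FC": Fraction(4),
    "8 Gbps FC": Fraction(8),
}


def speed_ratio(a: str, b: str) -> Fraction:
    """Return how fast link `a` is relative to link `b`."""
    return RATES_GBPS[a] / RATES_GBPS[b]


for a, b in [("Fast Ethernet", "2 Gbps FC"), ("GigE", "2 Gbps FC"),
             ("GigE", "4 Gbps FC"), ("10GigE", "8 Gbps FC")]:
    print(f"{a} / {b} = {speed_ratio(a, b)}")
# Fast Ethernet / 2 Gbps FC = 1/20
# GigE / 2 Gbps FC = 1/2
# GigE / 4 Gbps FC = 1/4
# 10GigE / 8 Gbps FC = 5/4
```

The last line shows why 10GigE matters for this story: it is the first Ethernet generation whose nominal rate exceeds the fastest FC of its day.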

However, between 2001 and 2005 there was a wholesale switch from Fast Ethernet to GigE, so perhaps 10GigE prices will also come down to the point where they are competitive with GigE.


iSCSI

iSCSI stands for Internet Small Computer System Interface and is an Internet Protocol (IP) based storage networking protocol.

The basic concept is to take SCSI commands and encapsulate them in TCP/IP packets so that commands and data can travel between the server and the storage drives.
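To show what "encapsulating a SCSI command" looks like on the wire, here is a Python sketch that packs the 48-byte Basic Header Segment of an iSCSI SCSI Command PDU, following the field layout in RFC 3720, around a SCSI READ(10) command descriptor block (CDB). It is a simplified sketch under stated assumptions: no immediate data, no additional header segments, and a single-level LUN below 256. The resulting bytes would be written into a TCP connection; the networking itself is not shown.

```python
import struct

SCSI_COMMAND_OPCODE = 0x01  # iSCSI opcode for a SCSI Command PDU (RFC 3720)
FLAG_FINAL = 0x80           # F bit: this is the final (only) PDU of the command
FLAG_READ = 0x40            # R bit: data will flow from target to initiator


def read10_cdb(lba: int, blocks: int) -> bytes:
    """Build a 10-byte SCSI READ(10) command descriptor block."""
    # opcode 0x28, flags, 4-byte LBA, group number, 2-byte length, control
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, 0)


def scsi_command_bhs(lun: int, task_tag: int, cmd_sn: int, exp_stat_sn: int,
                     cdb: bytes, expected_len: int) -> bytes:
    """Pack the 48-byte Basic Header Segment of an iSCSI SCSI Command PDU.

    Assumptions for this sketch: no immediate data, no additional header
    segments, and a single-level LUN below 256.
    """
    if len(cdb) > 16 or not 0 <= lun < 256:
        raise ValueError("sketch supports CDBs up to 16 bytes and LUN < 256")
    lun_field = bytes([0, lun]) + bytes(6)       # 8-byte LUN field
    return struct.pack(
        ">BBH B3s 8s IIII 16s",
        SCSI_COMMAND_OPCODE,                     # byte 0: opcode (I bit clear)
        FLAG_FINAL | FLAG_READ,                  # byte 1: F and R flags, ATTR=0
        0,                                       # bytes 2-3: reserved
        0,                                       # byte 4: TotalAHSLength
        (0).to_bytes(3, "big"),                  # bytes 5-7: DataSegmentLength
        lun_field,                               # bytes 8-15: LUN
        task_tag,                                # bytes 16-19: Initiator Task Tag
        expected_len,                            # bytes 20-23: Expected Data Transfer Length
        cmd_sn,                                  # bytes 24-27: CmdSN
        exp_stat_sn,                             # bytes 28-31: ExpStatSN
        cdb.ljust(16, b"\x00"),                  # bytes 32-47: CDB, zero-padded
    )


# Read 8 blocks of 512 bytes starting at LBA 2048 from LUN 0.
bhs = scsi_command_bhs(lun=0, task_tag=1, cmd_sn=10, exp_stat_sn=1,
                       cdb=read10_cdb(lba=2048, blocks=8), expected_len=8 * 512)
print(len(bhs))  # 48
```

The key point is that the SCSI CDB travels unchanged inside the header; iSCSI only adds the addressing (LUN), bookkeeping (task tag, sequence numbers), and length fields needed to carry it over TCP.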

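Although IP packets can be lost or arrive out of order, TCP retransmits lost data and delivers each connection’s byte stream in order. On top of that, iSCSI numbers its commands (the command sequence number, or CmdSN) so that SCSI commands are kept in the correct order across a session, which may use more than one TCP connection.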
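A toy Python sketch of that ordering step: each command carries a sequence number, and the receiver only releases commands to the SCSI layer once every lower-numbered command has arrived. This is a simplification under stated assumptions: it ignores 32-bit serial-number wraparound, duplicate detection, and the MaxCmdSN window that real iSCSI uses, and the command strings are placeholders.

```python
import heapq


class CommandQueue:
    """Release commands strictly in CmdSN order (simplified sketch).

    Assumptions: CmdSN starts at `exp_cmd_sn`, never wraps (real iSCSI
    uses 32-bit serial-number arithmetic), and there are no duplicates.
    """

    def __init__(self, exp_cmd_sn: int = 1) -> None:
        self.exp_cmd_sn = exp_cmd_sn          # next CmdSN we may execute
        self._pending: list[tuple[int, str]] = []

    def receive(self, cmd_sn: int, command: str) -> list[str]:
        """Accept one command; return every command now ready to run."""
        heapq.heappush(self._pending, (cmd_sn, command))
        ready = []
        # Drain the heap while its smallest CmdSN is the one we expect.
        while self._pending and self._pending[0][0] == self.exp_cmd_sn:
            ready.append(heapq.heappop(self._pending)[1])
            self.exp_cmd_sn += 1
        return ready


q = CommandQueue()
print(q.receive(2, "WRITE block 7"))   # [] (still waiting for CmdSN 1)
print(q.receive(1, "READ block 3"))    # ['READ block 3', 'WRITE block 7']
print(q.receive(3, "READ block 9"))    # ['READ block 9']
```

A min-heap keyed on the sequence number keeps out-of-order arrivals cheap to buffer while always exposing the lowest outstanding number first.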


iSCSI was originally started by IBM and was developed as a proof of concept in 1998. In March 2000, the first draft of the iSCSI standard was presented to the IETF (Internet Engineering Task Force).

IBM offered iSCSI-based storage devices in July 2001, even before the iSCSI specification was passed by the IP Storage Working Group in August 2002.


There are several fundamental concepts that explain most of how iSCSI works. The first is the initiator.

It is simply the client: the server that issues SCSI commands and “mounts” or “accepts” the storage “exported” to it. The initiator can be hardware based or, more likely, software based.

The second concept is the target, which is the storage device that “exports” blocks of storage to the servers and services the initiators’ commands. Again, the target can be hardware based or, more likely, software based.

So the initiator runs on the server and handles the storage traffic to and from the device that actually holds the storage, which runs the target.
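In standard iSCSI terminology (RFC 3720), the initiator is the client that issues SCSI commands and the target is the storage side that services them. The toy in-memory Python model below illustrates only that division of labor; the `Target` and `Initiator` classes and the target name are hypothetical, and networking, login, and security are not modeled at all.

```python
class Target:
    """Storage side: exports logical units (LUNs) made of fixed-size blocks.

    Toy in-memory model; no networking, login, or security is modeled.
    """
    BLOCK_SIZE = 512

    def __init__(self, name: str, luns: dict[int, int]) -> None:
        self.name = name
        # Map each LUN number to a zero-filled byte array of `blocks` blocks.
        self._luns = {lun: bytearray(blocks * self.BLOCK_SIZE)
                      for lun, blocks in luns.items()}

    def report_luns(self) -> list[int]:
        return sorted(self._luns)

    def write(self, lun: int, lba: int, data: bytes) -> None:
        start = lba * self.BLOCK_SIZE
        disk = self._luns[lun]
        if start + len(data) > len(disk):
            raise ValueError("write past end of LUN")
        disk[start:start + len(data)] = data

    def read(self, lun: int, lba: int, blocks: int) -> bytes:
        start = lba * self.BLOCK_SIZE
        return bytes(self._luns[lun][start:start + blocks * self.BLOCK_SIZE])


class Initiator:
    """Server side: issues block reads and writes to a target's LUNs."""

    def __init__(self, target: Target) -> None:
        self.target = target

    def read_block(self, lun: int, lba: int) -> bytes:
        return self.target.read(lun, lba, 1)

    def write_block(self, lun: int, lba: int, data: bytes) -> None:
        if len(data) > Target.BLOCK_SIZE:
            raise ValueError("data larger than one block")
        # Pad short writes out to a full block, as a block device would.
        self.target.write(lun, lba, data.ljust(Target.BLOCK_SIZE, b"\x00"))


target = Target("storage01", luns={0: 1024})  # one 512 KiB LUN
server = Initiator(target)
server.write_block(0, 5, b"hello")
print(server.read_block(0, 5)[:5])  # b'hello'
```

The initiator never sees files, only numbered blocks; it is up to the server’s OS to put a partition table and file system on top of them.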


iSCSI is fundamentally a Storage Area Network (SAN) protocol, just like Fibre Channel. The key difference is that FC requires a specialized (FC) network, while iSCSI can use ordinary TCP/IP networks.

This allows you to use a single network for storage data and other communication as shown in Figure 3.
Figure 3: SAN Configuration

One of the reasons FC-based SANs have been so successful is that FC networks are several times faster than GigE and much cheaper than 10GigE. But over the last year or so, iSCSI has been gaining in popularity for three reasons:
  • The price of iSCSI configurations can be much less than that of FC configurations. iSCSI over GigE generally cannot match the performance of 4 or 8 Gbps FC, but if the performance of a GigE-based iSCSI SAN is good enough for the workload, its potentially lower price can make it very attractive.
  • Using a single network (as with iSCSI) can be less complex than running multiple networks (TCP/IP plus FC).
  • For many years 10GigE networks have been much more expensive than FC networks, but 10GigE prices are finally coming down to the point where they can compete with FC. Running iSCSI over 10GigE also means it can be faster than FC, since 10 Gbps exceeds even 8 Gbps FC.
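One rough way to judge whether a GigE iSCSI SAN is “good enough” is to compare a workload’s required throughput against each link’s ceiling. The Python sketch below is a hypothetical back-of-the-envelope calculator: it converts nominal Gbps to MB/s and applies an assumed derating factor (`headroom`) for protocol overhead. The 0.7 default and the sample workloads are illustrative assumptions, not measurements.

```python
# Nominal link speeds in Gbps (headline numbers from the text).
LINKS_GBPS = {"GigE": 1, "10GigE": 10, "4 Gbps FC": 4, "8 Gbps FC": 8}


def ceiling_mb_per_s(gbps: float) -> float:
    """Upper bound on one-direction payload rate: Gbps -> MB/s (1 MB = 10**6 B).

    Real throughput is lower: line encoding plus TCP/IP, iSCSI, or FC
    framing overhead all take a share of the raw line rate.
    """
    return gbps * 1000 / 8


def good_enough(required_mb_s: float, headroom: float = 0.7) -> list[str]:
    """Links whose ceiling, derated by `headroom`, covers the workload.

    `headroom` is an assumed derating factor for this sketch, not a
    measured value; benchmark real hardware before deciding.
    """
    return [name for name, gbps in LINKS_GBPS.items()
            if ceiling_mb_per_s(gbps) * headroom >= required_mb_s]


print(good_enough(60))    # a light workload: all four links qualify
print(good_enough(300))   # a heavier one: GigE drops out
```

For light workloads every option clears the bar, which is exactly the case where iSCSI’s lower price can decide the matter; heavier workloads push toward 10GigE or FC.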

iSCSI OS Support (Mostly About Linux)

Many operating systems (OSes) support iSCSI as both targets and initiators. The first OS with iSCSI support in some form was AIX (IBM did start iSCSI development, after all), followed by Windows, Netware, HP-UX, Solaris, and then Linux (and a number of others after that).


There are several Linux iSCSI projects. The most prominent is an iSCSI initiator that was developed by Cisco and is available as open-source.

There are patches for 2.4 and 2.6 kernels, although the 2.6 kernel series already includes iSCSI support. Many Linux distributions ship with the initiator already in the kernel.

There is also an iSCSI project originally developed by Intel that is available as open source.


A fork of an earlier project, the Ardistech iSCSI project, provides a target package, with the goal of porting it to the Linux 2.6 kernel and adding features (the original Ardistech iSCSI target package has not been developed for some time, was based on the 2.4 kernel, and is now very difficult to find).

Then this project was combined with the iSCSI initiator project to develop a combined initiator and target package for Linux.

This package was under very active development and was combined with the Linux-iSCSI project to create one of the most complete and supported iSCSI packages in Linux.

Another iSCSI package under heavy development, with support for 2.6 kernels, is the iSCSI Enterprise Target project.


There are lots of articles and some tutorials on using iSCSI with Linux around the Internet. Some of the better articles are:
  • This article specifically presents how to use iSCSI on Ubuntu 9.04 as both an initiator and a target. The general configuration instructions are similar for other distributions.
  • This article is a bit old and parts of it may no longer apply, but it does show the basic concepts for enabling iSCSI on Linux.
  • For setting up iSCSI on CentOS, this article can help guide you. It uses the Open-iSCSI packages to provide the iSCSI capability.
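With Open-iSCSI, the initiator side is driven by the `iscsiadm` tool: first discover the targets a portal offers, then log in to one. The Python sketch below only builds and prints those two commands; the portal address and target name are hypothetical placeholders (3260 is the registered iSCSI port). Running them requires root on a host with Open-iSCSI installed, so execution is left as a comment.

```python
import shlex

# Hypothetical values for illustration: substitute your target's portal
# address and the target name it reports during discovery.
PORTAL = "192.168.1.10:3260"        # 3260 is the registered iSCSI port
TARGET = "iqn.2009-11.com.example:storage.disk1"


def discovery_cmd(portal: str) -> list[str]:
    """Ask the target portal which targets it offers (SendTargets)."""
    return ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", portal]


def login_cmd(target: str, portal: str) -> list[str]:
    """Log the initiator in; the target's LUNs then appear as local SCSI disks."""
    return ["iscsiadm", "-m", "node", "-T", target, "-p", portal, "--login"]


for cmd in (discovery_cmd(PORTAL), login_cmd(TARGET, PORTAL)):
    # Print the shell form; passing `cmd` to subprocess.run(cmd, check=True)
    # as root on a host with Open-iSCSI installed would actually run it.
    print(shlex.join(cmd))
```

After a successful login the new disk shows up like any other SCSI device, and the server can partition and format it as described earlier.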

Another option is to use a pre-packaged distribution focused on storage called OpenFiler. It allows you to build a NAS box or an iSCSI target, along with CIFS support.

You just need an x86 or x86_64 system with 512MB of memory, 1GB of space for the OS, and a supported network card.

Then you can add as much storage to the server as you like and manage it as NAS storage or iSCSI storage.

It’s a very nice setup for a dedicated storage system. There is even a YouTube video that presents how to use OpenFiler with iSCSI storage.


Summary

iSCSI is a very hot topic right now for several reasons. Storage is becoming cheaper every day (2TB drives are just appearing) and 10GigE networks are becoming affordable, and together these trends are helping to drive the popularity of iSCSI.

But the fact that Linux has robust iSCSI packages available is also helping to drive iSCSI adoption.


This is just a quick introduction to iSCSI with an eye toward Linux. Linux has good iSCSI packages that allow very robust configurations to be created (while not presented here, you can also use a capability called “multi-pathing” to create a more robust and fault tolerant SAN network).
