Vic/notes/iscsi
iSCSI is a SCSI transport which may be used in storage networking as an alternative to FCP, the Fibre Channel Protocol. The iSCSI transport packages SCSI commands and data into iSCSI PDUs and carries the PDUs over TCP between initiators and targets. iSCSI offers a significant cost advantage over Fibre Channel and also has the advantage of using a network technology that is more mature than Fibre Channel. Recently a competing transport, FCoE (Fibre Channel over Ethernet), has been generating quite a bit of excitement.
iSCSI Background
- The ability to provide block storage to a host via an IP network is compelling for many reasons. Of these, economy is the primary advantage. Modern enterprises invest heavily in network infrastructure; consider that Cisco has a market cap of over $160 billion. IP networks are ubiquitous in our net-centric modern world. Since virtually all companies already have an IP network for data communications, why not also use that network for storage?
- Fortunately the SCSI-3 spec leaves the transport up to anyone with a good idea about how to move SCSI commands and data between targets and initiators.
- Consolidating storage and data communication networks offers economic advantages.
- Existing pool of IP network professionals.
- IP networking hardware is less expensive than Fibre Channel hardware.
- IP software is more mature.
- iSCSI can take advantage of existing fault tolerance capabilities developed for IP networks like IPMP available with Solaris.
- The easy routability of IP lends itself nicely to DR replication.
iSCSI RFC
- The iSCSI transport protocol is described in RFC 3720 and additional command ordering design considerations are described in RFC 3783.
iSCSI and the Nagle Algorithm
The Nagle algorithm is a set of rules applied to outgoing TCP packets. The goal of the Nagle algorithm is to improve network efficiency by delaying "sends" unless there is some minimum amount of data to send. This tends to avoid situations where packets with excessively small payloads cause congestion on the wire. The problem with this algorithm when applied to iSCSI is that certain disk workloads seem to create worst case scenarios with Nagle. See the OpenSolaris bug linked below for more detail.
- TCP_NODELAY Bug 6621560 filed with OpenSolaris.
- Default vs. tcp-nodelay Testing
- How to disable Nagle and verify Nagle is disabled for the Solaris iSCSI initiator.
- Nagle is covered in RFC 896
iSCSI Discovery Issues
- There are a couple of problems with Discovery and a patch is available for one of them. The other must be worked around by using static discovery.
Discovery and Path Failback Bug
- With this bug, path failback never occurs once the path has failed from MPxIO's perspective. Say for example you have 2 iSCSI paths provided by 2 GigE switches and you shut down one switch for maintenance for 30 minutes. Then you return the switch to operation and check your host to see if the paths provided by that switch are back online. In fact they are not online and won't recover until you reboot.
- The related discovery bug causes discovery to fail after an iscsi lun has been removed from a host. So for example, if you have a host with luns 0, 1 and 2 and these luns are already discovered by the host and then you remove lun 0 at the iscsi array, all subsequent discovery for that host will fail until the host is rebooted.
- Both problems were fixed by Sun in January 2008 with the release of the following patches.
- 119090-25 iscsi patch for SPARC
- 119091-26 iscsi patch for x86
- These patches are available from SunSolve.
Initiator Silliness
- The other discovery problem is connected with dynamic discovery methods like iSNS and Send Targets. Best practice is to use Static Discovery. This may be undesirable in complex environments but it avoids the problems that may be encountered if Send Targets or iSNS discovery methods are used. Problems occur when Send Targets and iSNS are used in situations where the initiator doesn't have network access to all the target addresses found during discovery. In those cases the initiator becomes obsessed, completely obsessed and determined to log on to the inaccessible target addresses. The initiator's obsession creates so much load on the server that the server becomes unresponsive. This is clearly not acceptable initiator behavior. Unfortunately, the RFC doesn't seem to address this situation. It would seem reasonable for the initiator to have some user defined throttle controls to limit efforts to connect to a target portal address that it will never have access to. Sure, you can use access lists on the target side so the initiator never discovers the inaccessible addresses but this causes even more administrative overhead than using static configs AND it isn't supported with iSNS. Alternatively the initiator could assess connectivity before trying to connect to a target. Systems Administrators have coded connectivity tests into network scripts since the beginning of time.
- Adding a discovery address can help with creating the Static bindings. Once a discovery address is added simply issue "iscsiadm list discovery-address -v" to see all the target addresses and TPGTs available on the target. Then create the static-config as in the following example.
[root@sunx4200-shu02--->]iscsiadm list discovery-address -v
Discovery Address: 10.60.181.94:3260
Target name: iqn.1992-08.com.netapp:sn.101177625
Target address: 10.60.181.94:3260, 1000
Target name: iqn.1992-08.com.netapp:sn.101177625
Target address: 192.168.100.100:3260, 1001
Target name: iqn.1992-08.com.netapp:sn.101177625
Target address: 192.168.200.100:3260, 1002
Target name: iqn.1992-08.com.netapp:sn.101177625
Target address: 192.168.201.100:3260, 1003
[root@sunx4200-shu02--->]ping 192.168.100.100
192.168.100.100 is alive
[root@sunx4200-shu02--->]iscsiadm add static-config iqn.1992-08.com.netapp:sn.101177625,192.168.100.100,1001
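The ping-then-add routine above can be scripted. The following is only a sketch: it reads "iscsiadm list discovery-address -v" output on stdin and prints (does not run) an "iscsiadm add static-config" command for each portal that answers a probe. The function name static_config_cmds and the REACHABLE_CMD variable are made up for this example, and the default probe assumes Solaris ping, which returns after one answer or a timeout (on Linux something like a "ping -c 1" wrapper would be needed).

```shell
#!/bin/sh
# Sketch: build static-config commands only for reachable target portals.
# REACHABLE_CMD is a hypothetical hook; it is called with one IP address
# and must return 0 when that address is reachable.
REACHABLE_CMD=${REACHABLE_CMD:-ping}

static_config_cmds() {
    # Pair each "Target name:" line with the "Target address:" line after it.
    awk '
        /Target name:/    { name = $3 }
        /Target address:/ { portal = $3; sub(/,$/, "", portal); print name, portal, $4 }
    ' | while read -r name portal tpgt
    do
        ip=${portal%%:*}
        if $REACHABLE_CMD "$ip" </dev/null >/dev/null 2>&1
        then
            echo "iscsiadm add static-config $name,$portal,$tpgt"
        else
            echo "# skipped $portal (no answer from $ip)"
        fi
    done
}
```

Typical use would be "iscsiadm list discovery-address -v | static_config_cmds", reviewing the printed commands before running any of them.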
iSCSI Performance Issues
- One of the challenges with IP storage is the amount of CPU time required to move data through the TCP layer. This issue becomes more apparent with very high speed transports like 10Gb Ethernet. There is work in progress to solve this particular problem.
- RFC 4296 The Architecture of Direct Data Placement (DDP) and Remote Direct Memory Access (RDMA) on Internet Protocols.
- RFC 4297 Remote Direct Memory Access (RDMA) over IP Problem Statement.
- RFC 5040 A Remote Direct Memory Access Protocol Specification.
- RFC 5042 Direct Data Placement Protocol (DDP) / Remote Direct Memory Access Protocol (RDMAP) Security.
- RFC 5045 Applicability of Remote Direct Memory Access Protocol (RDMA) and Direct Data Placement (DDP).
- RFC 5046 Internet Small Computer System Interface (iSCSI) Extensions for Remote Direct Memory Access (RDMA).
- RDMA Consortium
- Voltaire RDMA Work
Solaris iSCSI setup
Frequently used commands
- Show initiator iqn
- iscsiadm list initiator-node
- Change the initiator iqn
- iscsiadm modify initiator-node -N iqn.1986-03.com.sun:01:0003bad5a487.shsun16
- Add a discovery address
- iscsiadm add discovery-address 10.60.240.97
- Enable "send targets" discovery
- iscsiadm modify discovery -t enable
- Show targets and luns
- iscsiadm list target -S
- Configure multiple sessions per target
- iscsiadm modify target-param -c 3 <target node name>
- Troubleshooting commands
- iscsiadm list discovery
- iscsiadm list initiator-node
- iscsiadm list isns-server -v
- iscsiadm list static-config
- iscsiadm list target -v
- iscsiadm list discovery-address -v
- iscsiadm list static-config
- iscsiadm remove static-config iqn.1992-08.com.netapp:sn.101183784,192.168.4.212:3260,2002
- Setting up header digest.
- iscsiadm modify target-param -h CRC32 iqn.1992-08.com.netapp:sn.84199300
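When chasing a problem it can help to capture all of the read-only listings in one pass. A minimal sketch, using only the list subcommands shown above; the function name iscsi_snapshot and the ISCSIADM override variable are inventions for this example, not part of the Solaris tooling.

```shell
#!/bin/sh
# Sketch: run each read-only iscsiadm listing and label its output.
# ISCSIADM is a hypothetical override, handy for dry runs (ISCSIADM=echo).
ISCSIADM=${ISCSIADM:-iscsiadm}

iscsi_snapshot() {
    for args in \
        "list initiator-node" \
        "list discovery" \
        "list discovery-address -v" \
        "list isns-server -v" \
        "list static-config" \
        "list target -v"
    do
        echo "=== iscsiadm $args ==="
        # $args is deliberately unquoted so it splits into separate words.
        $ISCSIADM $args 2>&1
    done
}
```

Something like "iscsi_snapshot > /var/tmp/iscsi-state.txt" before and after a change gives two files that can be diffed.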
Configuring IP Failover on a NetApp Filer
- The ifconfig commands in /etc/rc should specify the partner's IP so that the partner's IP fails over during CDO. This is only required for IP-based services like iSCSI, NFS etc. It is not required for FCP.
- The following is an example of properly configured /etc/hosts and /etc/rc files on a 3020 filer, enabling it to take over its partner's IP address.
fas3020-shu05> rdfile /etc/hosts
127.0.0.1 localhost
10.60.181.94 fas3020-shu05 fas3020-shu05-e0a
192.168.1.10 mailhost
fas3020-shu05> rdfile /etc/rc
hostname fas3020-shu05
ifconfig e0a `hostname`-e0a mediatype auto flowcontrol full netmask 255.255.255.0 partner 10.60.181.96
route add default 10.60.181.1 1
routed on
options dns.domainname rtp.netapp.com
options dns.enable on
savecore
Tips
Renaming the initiator-node
- Rename the initiator-node to include the hostname and IP address to provide a more descriptive nodename.
- iscsiadm modify initiator-node -N iqn.1986-03.com.sun:01:sun280r-shu01.10.60.181.17
fas270-shu01> igroup show
sun280r-shu01i (iSCSI) (ostype: solaris):
iqn.1986-03.com.sun:01:sun280r-shu01.10.60.181.17 (logged in on: e0a)
Setting up bi-directional CHAP Authentication
- Initiator config for node
- iscsiadm modify initiator-node --CHAP-name sun280rshu01
- iscsiadm modify initiator-node -a chap
- iscsiadm modify initiator-node --CHAP-secret
- (iscsiadm will prompt for the secret here. The secret must be at least 12 characters long.)
- Initiator config for target
- iscsiadm modify target-param -B enable iqn.1992-08.com.netapp:sn.84199300
- iscsiadm modify target-param --CHAP-name fas270shu01 iqn.1992-08.com.netapp:sn.84199300
- iscsiadm modify target-param --CHAP-secret iqn.1992-08.com.netapp:sn.84199300
- (iscsiadm will prompt for the secret here. The secret must be at least 12 characters long.)
- iscsiadm modify target-param -a chap iqn.1992-08.com.netapp:sn.84199300
- Filer Config for initiator
- iscsi security add -i iqn.1986-03.com.sun:01:sun280r-shu01.10.60.181.17 -s CHAP -p chappassword -n sun280rshu01 -o chappassword -m fas270shu01
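Since iscsiadm prompts for the secret interactively, it is easy to find out about a length problem late. A small sketch of a pre-check: the 12-character minimum comes from the note above, while the 16-character maximum is an assumption about the Solaris initiator and can be overridden. The names chap_secret_ok, CHAP_MIN and CHAP_MAX are made up for this example.

```shell
#!/bin/sh
# Sketch: check a candidate CHAP secret's length before typing it at the prompt.
# CHAP_MIN follows the 12-character note; CHAP_MAX=16 is an assumption.
CHAP_MIN=${CHAP_MIN:-12}
CHAP_MAX=${CHAP_MAX:-16}

chap_secret_ok() {
    len=${#1}
    [ "$len" -ge "$CHAP_MIN" ] && [ "$len" -le "$CHAP_MAX" ]
}

# "chappassword" from the filer example is exactly 12 characters.
if chap_secret_ok "chappassword"; then
    echo "length ok"
else
    echo "length out of range"
fi
```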
Where does the Initiator store its Configuration?
- There are 2 database files kept in /etc/iscsi. Run strings on the files to see configuration items. In the example below you can see the CHAP usernames and passwords for the initiator and targets configured for CHAP authentication. Storing the secrets in readable form is arguably a security hole, but it is also convenient for troubleshooting CHAP authentication problems. The files are iscsi_v1.dbc, which appears to hold the current config, and iscsi_v1.dbp, which appears to hold the previous config.
[root@sun280r-rtp17--->]strings /etc/iscsi/iscsi_v1.dbc
NodeName
iqn.1986-03.com.sun:01:0003ba1803fa.462656ca
NodeAlias
Login
StaticAddr2
DiscAddr
192.168.004.215
192.168.004.212
Chap
SENDTARGETS_DISCOVERY
SENDTARGETS_DISCOVERY
iqn.1986-03.com.sun:01:0003ba1803fa.462656ca
netapphost
hostpassword
iqn.1992-08.com.netapp:sn.101183009
netappfiler
filerpassword
iqn.1992-08.com.netapp:sn.101183784
netappfiler
filerpassword
BidirAuth
iqn.1992-08.com.netapp:sn.101183009
iqn.1992-08.com.netapp:sn.101183784
DiscMethod
Setting up an Access List on a NetApp Filer
- iscsi interface accesslist add iqn.1986-03.com.sun:01:sun280r-shu01.10.60.181.17 e0a
- Note - These access lists are useful in mitigating the dynamic discovery issues mentioned earlier but add additional administrative overhead. (Which clearly goes against the intentions of dynamic discovery.)
iSNS
The iSNS specification is covered in RFC 4171
Setting up iSNS on a NetApp Filer
- iscsi isns config -i <ip_addr>
- iscsi isns start
- iscsi isns show
- iscsi isns update
- Filer side gotcha
- If you're running MS iSNS server version 3, you must set the following option:
- options iscsi.isns.rev 22
Sun's iSCSI Target HowTo
- Simple HowTo for configuring the Sun iSCSI target. This can be handy if we want to compare our functionality with another iSCSI target implementation and may help confirm whether a problem is with the target or the initiator. The target is included beginning with Solaris 10 U4.
- Enable the target service
- svcadm enable svc:/system/iscsitgt:default
- Create a zpool
- zpool create lunpool c3t1d0
- Create a target and lun
- zfs create -o shareiscsi=on -V 50g lunpool/lun0
- View target properties
[root@sunx4200-shu01--->]iscsitadm list target -v
Target: lunpool/lun0
iSCSI Name: iqn.1986-03.com.sun:02:85ea3fb3-1917-6156-afe1-b51d9da7515d
Alias: lunpool/lun0
Connections: 0
ACL list:
TPGT list:
LUN information:
LUN: 0
GUID: 0x0
VID: SUN
PID: SOLARIS
Type: disk
Size: 50G
Backing store: /dev/zvol/rdsk/lunpool/lun0
Status: online
- Create portal groups
- iscsitadm create tpgt 7000
- iscsitadm create tpgt 7001
- Assign the portal group tags to interfaces
- iscsitadm modify tpgt -i 192.168.200.2 7001
- iscsitadm modify tpgt -i 192.168.100.2 7000
- Assign the TPGTs to the target
- iscsitadm modify target -p 7000 lunpool/lun0
- iscsitadm modify target -p 7001 lunpool/lun0
- Create a local initiator. We have to link an iSCSI IQN with a local initiator name in order to do LUN masking. LUN masking with Sun's target is accomplished by setting an ACL on the target.
- The following creates the local initiator
- iscsitadm create initiator -n iqn.1986-03.com.sun:01:SUNX4200-SHU02 sunx4200-shu02-sw-init
- Now set the ACL
- iscsitadm modify target -l sunx4200-shu02-sw-init lunpool/lun0
- Now the target properties look like this...
[root@sunx4200-shu01--->]iscsitadm list target -v
Target: lunpool/lun0
iSCSI Name: iqn.1986-03.com.sun:02:85ea3fb3-1917-6156-afe1-b51d9da7515d
Alias: lunpool/lun0
Connections: 0
ACL list:
Initiator: sunx4200-shu02-sw-init
TPGT list:
TPGT: 7000
TPGT: 7001
LUN information:
LUN: 0
GUID: 0x0
VID: SUN
PID: SOLARIS
Type: disk
Size: 50G
Backing store: /dev/zvol/rdsk/lunpool/lun0
Status: online
- If you forgot the initiator name mappings do this...
[root@sunx4200-shu01--->]iscsitadm list initiator
Initiator: sunx4200-shu02-sw-init
iSCSI Name: iqn.1986-03.com.sun:01:SUNX4200-SHU02
CHAP Name: Not set
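The target-side steps above can be collected into one parameterised sketch for repeat setups. It only prints the commands used in this HowTo, in the same order, so they can be reviewed before being run. The function name target_setup_cmds and the "tpgt=ip" argument convention are inventions for this example.

```shell
#!/bin/sh
# Sketch: print the Sun iSCSI target setup commands for one LUN.
# usage: target_setup_cmds <pool> <disk> <lun> <size> <initiator-iqn> <local-name> <tpgt=ip>...
target_setup_cmds() {
    pool=$1 disk=$2 lun=$3 size=$4 iqn=$5 alias=$6
    shift 6

    echo "svcadm enable svc:/system/iscsitgt:default"
    echo "zpool create $pool $disk"
    echo "zfs create -o shareiscsi=on -V $size $pool/$lun"

    # One portal group per remaining argument, written as tpgt=ip.
    for pair in "$@"
    do
        tpgt=${pair%%=*}
        ip=${pair#*=}
        echo "iscsitadm create tpgt $tpgt"
        echo "iscsitadm modify tpgt -i $ip $tpgt"
        echo "iscsitadm modify target -p $tpgt $pool/$lun"
    done

    # LUN masking: map the initiator IQN to a local name, then set the ACL.
    echo "iscsitadm create initiator -n $iqn $alias"
    echo "iscsitadm modify target -l $alias $pool/$lun"
}
```

For the walkthrough above the call would be "target_setup_cmds lunpool c3t1d0 lun0 50g iqn.1986-03.com.sun:01:SUNX4200-SHU02 sunx4200-shu02-sw-init 7000=192.168.100.2 7001=192.168.200.2".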
NetApp VIFs
- VIFs are simple NIC aggregates, either single mode (active/passive) or multi mode (active/active).
- For cluster environments it may be desirable to fail over virtual interfaces created on vifs. To do this the rc file must have an ifconfig command mapping the vif to its partner's vif.
- Example - ifconfig vif1 partner vif1
- Note - The rc entries that use ifconfig to configure interfaces on top of VIFs must still use the "partner" key word to map that interface to a partner IP address.
- Example rc file with a vif...(Note the line ifconfig vif1 partner vif1)
#Regenerated by registry Mon Sep 10 08:29:15 EDT 2007
#Auto-generated by setup Thu Jun 14 10:46:40 EDT 2007
hostname fas920-shu01
vif create multi vif1 e11b e11a
vlan create vif1 4 2
ifconfig vif1 partner vif1
ifconfig e0 `hostname`-e0 mediatype auto netmask 255.255.252.0 partner 10.60.181.126
ifconfig vif1-4 `hostname`-vif1-4 netmask 255.255.255.0 mtusize 9000 partner 192.168.4.4
ifconfig vif1-2 `hostname`-vif1-2 netmask 255.255.255.0 mtusize 9000 partner 192.168.2.4
#Disabled# ifconfig e10b `hostname`-e10b netmask 255.255.255.0 mtusize 9000 flowcontrol full partner 192.168.4.4
#Disabled# ifconfig e10a `hostname`-e10a netmask 255.255.255.0 mtusize 9000 flowcontrol full partner 192.168.2.4
route add default 10.60.181.1 1
routed on
options dns.domainname rtp.netapp.com
options dns.enable on
options nis.enable off
savecore
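A forgotten "partner" keyword is easy to miss in a long rc file. The following sketch reads a copy of the rc file (one command per line) on stdin and prints every active ifconfig line that lacks the keyword; commented-out lines are ignored. The function name rc_missing_partner is made up for this example, and the check is purely textual, so it cannot tell whether the partner address itself is right.

```shell
#!/bin/sh
# Sketch: list active ifconfig lines in a filer rc file that have no "partner" keyword.
rc_missing_partner() {
    awk '
        $1 == "ifconfig" {
            found = 0
            for (i = 2; i <= NF; i++)
                if ($i == "partner") found = 1
            if (!found) print
        }
    '
}
```

Run against a saved copy, for example "rc_missing_partner < rc.copy"; no output means every active ifconfig line names a partner.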
Sun IPMP, IP Multipathing
- Configuring an active/standby IPMP group. In this setup we'll create an IPMP interface group called iSCSI using an NGE and an E1000G interface. We'll need 1 test IP for each of the interfaces and the real IP that will be accessible from the network.
- /etc/hosts entries
- 192.168.100.1 sun4200-shu02-ipmp
- 192.168.100.2 sun4200-shu02-ipmp-e1000g0
- 192.168.100.3 sun4200-shu02-ipmp-nge1
- Now create /etc/hostname entries to create the group and assign active and standby roles.
- /etc/hostname.e1000g0
- sun4200-shu02-ipmp-e1000g0 netmask + broadcast + group iSCSI
- deprecated -failover up addif sun4200-shu02-ipmp netmask + broadcast + failover up
- /etc/hostname.nge1
- sun4200-shu02-ipmp-nge1 netmask + broadcast + group iSCSI deprecated -failover standby up
- Note that the "addif" in the e1000g0 hostname file adds the real IP as a virtual interface to the active nic.
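For more than one host, the two hostname files can be generated from a template. This is only a sketch that prints the same contents as the example above with the names substituted; the function name ipmp_hostname_files and its argument order are inventions for this example, and the output still has to be copied into the real files by hand.

```shell
#!/bin/sh
# Sketch: print active/standby IPMP hostname file contents.
# usage: ipmp_hostname_files <group> <active-if> <standby-if> <data-host> <active-test-host> <standby-test-host>
ipmp_hostname_files() {
    group=$1 active=$2 standby=$3 data=$4 test_a=$5 test_s=$6

    # Active NIC: test address first, then addif brings up the data address.
    echo "# /etc/hostname.$active"
    echo "$test_a netmask + broadcast + group $group deprecated -failover up addif $data netmask + broadcast + failover up"

    # Standby NIC: test address only, marked standby.
    echo "# /etc/hostname.$standby"
    echo "$test_s netmask + broadcast + group $group deprecated -failover standby up"
}

ipmp_hostname_files iSCSI e1000g0 nge1 sun4200-shu02-ipmp sun4200-shu02-ipmp-e1000g0 sun4200-shu02-ipmp-nge1
```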
Enabling Jumbo Frames (MTU=9000) on the Cisco Switch
- Login to the switch and type "enable" and provide the enable password.
- Type "config" to get into config mode.
- Example...
csco4948-shu01#config
Configuring from terminal, memory, or network [terminal]?
Enter configuration commands, one per line. End with CNTL/Z.
csco4948-shu01(config)#interface GigabitEthernet 1/11
csco4948-shu01(config-if)#mtu 9000
csco4948-shu01(config-if)#interface GigabitEthernet 1/12
csco4948-shu01(config-if)#mtu 9000
csco4948-shu01(config-if)#exit
csco4948-shu01(config)#exit
Enabling Jumbo Frames (MTU=9000) on a NetApp Filer
- Edit /etc/rc and find the ifconfig line for the interface of interest.
- Add/Change "mtusize 9000"
Enabling Jumbo Frames (MTU=9000) on x86 Solaris 10
NGE Interface
- Edit /kernel/drv/nge.conf
- Uncomment the following line...
- Set the max jumbo frame size with default_mtu parameter.
- nge supports upto 9000 bytes jumbo frame size.
- default_mtu=9000;
e1000g Interface
- Edit /kernel/drv/e1000g.conf
- Change the MaxFrameSize line to the following...
- MaxFrameSize=3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3;
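Once the switch, filer and host are all set for jumbo frames, a full-size ping is a quick end-to-end check. The largest ICMP echo payload that fits in a single IPv4 packet is the MTU minus 28 bytes (a 20-byte IPv4 header without options plus an 8-byte ICMP header). A minimal sketch of that arithmetic; the function name icmp_payload_for_mtu is made up for this example.

```shell
#!/bin/sh
# Sketch: largest ICMP echo payload that fits in one IPv4 packet of a given MTU.
# Assumes a 20-byte IPv4 header (no options) and the 8-byte ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

icmp_payload_for_mtu 9000   # prints 8972
```

On a Linux host with iputils ping, something like "ping -M do -s 8972 -c 1 <filer>" should then succeed only if every hop passes 9000-byte frames; other ping implementations take the size argument differently.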
Setting up iSNS on the Solaris host
- iscsiadm add isns-server 10.61.17.53
- iscsiadm modify discovery -i enable
Linux
- iscsiadm
[root@skylab.rtp.netapp.com--->]iscsiadm -m node
10.60.240.52:3260,1 iqn.1992-08.com.netapp:sn.33603897
10.60.240.51:3260,1000 iqn.1992-08.com.netapp:sn.33604524
[root@skylab.rtp.netapp.com--->]iscsiadm -m node -T iqn.1992-08.com.netapp:sn.33604524 -p 10.60.240.51:3260 -l
- Mounting via disk label...
- e2label /dev/sdc1 /netapp1
- echo "LABEL=/netapp1 /netapp1 ext3 defaults 1 2" >>/etc/fstab
- mount /netapp1
Study/Experiments
Initiator behavior during CFO
How long before TCP connection drops?
- Performed this test on an x86 system running Sol10U4 with 1 session configured via static config. The test is to down the interface on the filer and monitor the TCP port 3260 connection using netstat.
- Used the following script to monitor the connection in both tcp and iscsi...
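#!/bin/bash
# Once per second, print the TCP state of the connection to the filer's
# iSCSI port and the initiator's view of the target.
X=1
while true
do
    echo "Elapsed time, $X seconds....."
    netstat -an | grep 10.60.181.94.3260
    iscsiadm list target; echo ""; echo ""
    X=$((X + 1))
    sleep 1
done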
# Here is the output beginning 52 seconds into the test
Elapsed time, 52 seconds.....
10.60.181.38.33288 10.60.181.94.3260 131400 47 64240 0 ESTABLISHED
Target: iqn.1992-08.com.netapp:sn.101177625
Alias: -
TPGT: 1000
ISID: 4000002a0000
Connections: 1
Elapsed time, 53 seconds.....
10.60.181.38.33290 10.60.181.94.3260 0 0 49640 0 SYN_SENT
Target: iqn.1992-08.com.netapp:sn.101177625
Alias: -
TPGT: 1000
ISID: 4000002a0000
Connections: 0
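To pull the answer out of a long capture, the log can be scanned for the first sample where the connection count reaches zero. A sketch, assuming the monitoring output was saved to a file in exactly the format shown above; the function name first_drop_second is made up for this example.

```shell
#!/bin/sh
# Sketch: print the elapsed second of the first sample reporting "Connections: 0".
# Reads the monitoring log on stdin; exits non-zero if no drop is found.
first_drop_second() {
    awk '
        /^Elapsed time,/                  { t = $3 }
        $1 == "Connections:" && $2 == 0   { print t; found = 1; exit }
        END                               { if (!found) exit 1 }
    '
}
```

For the excerpt above, "first_drop_second < monitor.log" would print 53, the first sample in which the old connection is gone and a new one is in SYN_SENT.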
- How long before iSCSI session drops?
- How long before iSCSI session resumes on failover head?
- Does the failed target IQN failover or just the IP?
- Initiator behavior during loss of a connection
iSCSI Analysis and Troubleshooting
- Troubleshooting iSCSI problems can be challenging. Subtle protocol violations may cause problems such as login failures, unexplained errors, dropped connections etc. This analysis and troubleshooting guide should help diagnose iSCSI interoperability problems.
