Configuring pNFS/spnfsd

From Linux NFS

BennyHalevy (Talk | contribs)

Revision as of 16:08, 7 April 2010


What is pNFS ?

pNFS is a new NFS feature provided in NFSv4.1, also known as Parallel NFS. Parallel NFS (pNFS) extends Network File System version 4 (NFSv4) to allow clients to directly access file data on the storage used by the NFSv4 server. This ability to bypass the server for data access can increase both performance and parallelism, but requires additional client functionality for data access, some of which is dependent on the class of storage used.

Parallel NFS provides several ways of accessing the data directly. For the moment, three such "layouts" have been defined:

  • LAYOUT4_FILE (named LAYOUT4_NFSV4_1_FILES in RFC5661), which stripes data across multiple NFS servers
  • LAYOUT4_BLOCK_VOLUME, which allows the client to access data as stored on a block device
  • LAYOUT4_OSD2_OBJECTS, which is based on the OSD2 protocol

NFSv4.1 and pNFS are described by the following RFCs:

  • RFC5661 : Network File System (NFS) Version 4 Minor Version 1 Protocol
  • RFC5662 : Network File System (NFS) Version 4 Minor Version 1, External Data Representation Standard (XDR) Description
  • RFC5663 : Parallel NFS (pNFS) Block/Volume Layout
  • RFC5664 : Object-Based Parallel NFS (pNFS) Operations

Content of this document

This document describes how 3 machines were set up to build a basic pNFS/LAYOUT4_FILE test configuration. The machines I used are:

  • nfsmds, IP addr = XX.YY.ZZ.A, used as Metadata Server
  • nfsds, IP addr = XX.YY.ZZ.B, used as Data Server
  • nfsclient, IP addr = XX.YY.ZZ.C, used as client

Let's start configuring.

Kernel and nfs-utils compilation

The first step is to build a kernel and an nfs-utils distribution that are compatible with each other. I used those from Benny Halevy's git repositories:

 # Get kernel repository
 git clone git://git.linux-nfs.org/projects/bhalevy/linux-pnfs.git
 
 # Get nfs-utils repository
 git clone git://linux-nfs.org/~bhalevy/pnfs-nfs-utils.git

For this document, I used the repositories at the following commits:

  • pnfs-nfs-utils: commit id = 2b5373db8615a52c47dbcf3ab968fad7cdcc6fed (pnfs-nfs-utils-1-2-2)
  • kernel linux-pnfs: commit id = cbd09e0fb2b160a06a44aad1c21786b99401823f (pnfs-all-2.6.33-2010-03-09)

The kernel compiles without problems. Just make sure that the following options are set in .config:

       CONFIG_NETWORK_FILESYSTEMS=y
       CONFIG_NFS_FS=m
       CONFIG_NFS_V4=y
       CONFIG_NFS_V4_1=y
       CONFIG_PNFS=y
       CONFIG_NFSD=m
       CONFIG_PNFSD=y
       # CONFIG_PNFSD_LOCAL_EXPORT is not set
       CONFIG_SPNFS=y
       CONFIG_SPNFS_LAYOUTSEGMENTS=y

With kernel 2.6.34 or later, also add the following option (it should have the same value as CONFIG_NFS_FS):

  CONFIG_PNFS_FILE_LAYOUT=m
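
A quick way to confirm that a .config matches the list above is to grep for each option. The sketch below is only an illustration: check_pnfs_config is a hypothetical helper name, and it checks the 2.6.33 option list only (for 2.6.34 or later, add CONFIG_PNFS_FILE_LAYOUT=m to the loop).

```shell
# Hypothetical helper: report which required options are missing from a kernel .config.
# Usage: check_pnfs_config /path/to/linux-pnfs/.config
check_pnfs_config() {
    config_file=$1
    missing=0
    for opt in \
        CONFIG_NETWORK_FILESYSTEMS=y \
        CONFIG_NFS_FS=m \
        CONFIG_NFS_V4=y \
        CONFIG_NFS_V4_1=y \
        CONFIG_PNFS=y \
        CONFIG_NFSD=m \
        CONFIG_PNFSD=y \
        CONFIG_SPNFS=y \
        CONFIG_SPNFS_LAYOUTSEGMENTS=y
    do
        # -x matches the whole line, -F treats the option as a fixed string
        if ! grep -qxF "$opt" "$config_file"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

The function prints nothing and returns 0 when every option is present; otherwise it prints one "missing:" line per absent option and returns 1.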

pnfs-nfs-utils is compiled as follows:

 # autoreconf --install
 # ./configure --prefix=/usr && make && make install

Before compiling, make sure that the following packages are installed (all nodes were installed with Fedora 12):

  • libtirpc + libtirpc-devel
  • tcp_wrappers + tcp_wrappers-libs + tcp_wrappers-devel
  • libblkid + libblkid-devel
  • libevent + libevent-devel
  • libnfsidmap

All of them are available as RPM packages except libnfsidmap. For that one, get the latest version, then compile and install it (do not forget to pass --prefix=/usr to ./configure). Alternatively, it is provided by nfs-utils-lib-devel-1.1.4-8 or later.

Basically, a command like the following one should do all the required work:

 # yum install libtirpc{,-devel} tcp_wrappers{,-devel} libevent{,-devel} nfs-utils-lib{,-devel} \
   libgssglue{,-devel} libnfsidmap{,-devel} libblkid{,-devel} libcap{,-devel}

Configuring the test bed to use pNFS over LAYOUT4_FILE

In this configuration, the client (nfsclient) mounts the MDS (nfsmds). The client loads a specific kernel module, known as the layout driver, to connect to the DS. All of the metadata traffic goes through the MDS, but data traffic flows directly between the DS and the client.

The MDS must be able to mount the DS and have root access to it. It runs a user-space daemon, spnfsd (part of nfs-utils), which uses this mount point to get information from the DS.

Configuring the pNFS Data Server

The Data Server is just a regular NFSv4.1 server. It is important that the Metadata Server has root access to it, to avoid unexpected behaviour caused by EPERM errors.

The Data Server's /etc/exports will look like this on nfsds:

 /export/spnfs  *(rw,sync,fsid=0,insecure,no_subtree_check,pnfs,no_root_squash)

Configuring the pNFS Metadata Server

The MDS is a client of the DS and runs spnfsd. It is also an NFSv4.1 server with pNFS enabled.

The spnfsd configuration is done in two steps:

  • configuring the MDS as a client of the DS
  • writing the /etc/spnfsd.conf file

On the MDS, the /etc/fstab should contain this line:

 nfsds:/       /spnfs/XX.YY.ZZ.B   nfs4    minorversion=1        0 0

The mount must be done over NFSv4 with minorversion set to 1.

Its /etc/spnfsd.conf will look like this (this is a single DS configuration):

 [General]
 Verbosity = 1
 Stripe-size = 8192
 Dense-striping = 0
 Pipefs-Directory = /var/lib/nfs/rpc_pipefs
 DS-Mount-Directory = /spnfs
 
 [DataServers]
 NumDS = 1
 DS1_IP = XX.YY.ZZ.B
 DS1_PORT = 2049
 DS1_ROOT = /
 DS1_ID = 1
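
In the example above, the fstab mount point /spnfs/XX.YY.ZZ.B is the DS-Mount-Directory followed by the DS1_IP value. Assuming that this naming is what spnfsd expects (an inference from the two files, not something stated here), the following sketch derives the expected mount points from the configuration file so that they can be compared with /etc/fstab. The name expected_ds_mounts is a hypothetical helper.

```shell
# Hypothetical helper: list the mount points implied by an spnfsd configuration,
# ASSUMING each data server is mounted at <DS-Mount-Directory>/<DSn_IP>.
# Usage: expected_ds_mounts /etc/spnfsd.conf
expected_ds_mounts() {
    awk -F ' *= *' '
        # Trim surrounding whitespace; changing $0 re-splits the fields
        { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }
        $1 == "DS-Mount-Directory" { dir = $2 }
        $1 ~ /^DS[0-9]+_IP$/       { ips[++n] = $2 }
        END { for (i = 1; i <= n; i++) print dir "/" ips[i] }
    ' "$1"
}
```

Each printed path should appear as a mount point in the MDS's /etc/fstab and be mounted before spnfsd starts.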

Finally, /etc/exports on the MDS will look like this:

 /export  *(rw,sync,pnfs,fsid=0,insecure,no_subtree_check,no_root_squash)

Note the pnfs token in the export options.
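
Since both the DS and the MDS exports rely on the pnfs and no_root_squash options, a small check can catch a forgotten option before the servers are started. This is a minimal sketch: export_has_options is a hypothetical helper, and it only handles export lines with a single client specification, as in the examples in this document.

```shell
# Hypothetical helper: check that the export for a path carries the given options.
# Usage: export_has_options /etc/exports /export pnfs no_root_squash
export_has_options() {
    exports_file=$1
    export_path=$2
    shift 2
    # Take the first line whose first field is exactly the export path
    line=$(awk -v p="$export_path" '$1 == p { print; exit }' "$exports_file")
    if [ -z "$line" ]; then
        echo "no export for $export_path"
        return 1
    fi
    # Keep only the option list between the parentheses
    opts=$(printf '%s\n' "$line" | sed 's/^[^(]*(\(.*\)).*$/\1/')
    status=0
    for opt in "$@"; do
        case ",$opts," in
            *",$opt,"*) ;;
            *) echo "$export_path: missing option $opt"; status=1 ;;
        esac
    done
    return "$status"
}
```

On the DS, the equivalent call would be export_has_options /etc/exports /export/spnfs pnfs no_root_squash.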

Configuring the client

The client is used as a regular NFSv4.1 client. The only thing to do is to make sure that the nfslayoutdriver kernel module is loaded:

 # modprobe nfslayoutdriver

Then you can mount the MDS on the client:

 # mount -t nfs4 -o minorversion=1 nfsmds:/ /mnt

Warning: before performing any read/write operations, make sure that the NFSv4 grace period has elapsed. It usually takes 90s after the nfs service starts.
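
To confirm that the client mount really uses minor version 1, the mount options can be read back from /proc/mounts. The sketch below assumes that the kernel reports the option as minorversion=1, as passed on the mount command line; other kernel versions may print it differently (for example as vers=4.1), in which case the pattern has to be adapted. The name is_nfs41_mount is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the mounts table (default: /proc/mounts) shows
# an nfs4 mount at the given mount point with minorversion=1 among its options.
# Usage: is_nfs41_mount /mnt
is_nfs41_mount() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mount_point" '
        # Fields: device, mount point, fs type, comma-separated options, ...
        $2 == mp && $3 == "nfs4" && ("," $4 ",") ~ /,minorversion=1,/ { found = 1 }
        END { exit !found }
    ' "$mounts_file"
}
```

For example, is_nfs41_mount /mnt && echo "NFSv4.1 mount found" can be run on the client right after the mount command.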


Basic test

The first test is pretty simple: On the client, I write 50 bytes to a file:

 # echo "jljlkjljjhkjhkhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhk" > ./myfile
 # ls -i ./myfile
 330246 myfile

On the DS, I should see a new file, located in the root of the directory it exports to the MDS, whose name contains the fileid of myfile.

 # ls -l /export/spnfs/330246*
 -rwxrwxrwx 1 root root 50 Mar 24 10:49 /export/spnfs/330246.2343187478
 # cat /export/spnfs/330246.2343187478
 jljlkjljjhkjhkhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhk

As you can see, this file, located on the DS, contains the data written by the client.
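
The lookup done by hand above (take the fileid with ls -i, then list the DS files whose names start with it) can be wrapped in a small function. This is only a sketch: ds_files_for is a hypothetical helper, and it needs both the file and the DS directory to be visible from the machine where it runs, for example on the MDS, assuming the DS export is mounted there as /spnfs/XX.YY.ZZ.B.

```shell
# Hypothetical helper: list files in a DS directory whose names start with the
# inode number (fileid) of the given file, followed by a dot.
# Usage: ds_files_for /export/myfile /spnfs/XX.YY.ZZ.B
ds_files_for() {
    fileid=$(ls -i "$1" | awk '{ print $1 }')
    ds_dir=$2
    for f in "$ds_dir/$fileid".*; do
        # An unmatched glob is left as-is, so skip names that do not exist
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
    return 0
}
```

With the file from this test, the function would be expected to print the path of the 330246.2343187478 file; it prints nothing if no matching DS file exists yet.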

On the MDS, the file has the right size but no blocks allocated when viewed locally, outside NFS. It contains no data.

 # cd /export
 # stat myfile
 File: `myfile'
 Size: 50              Blocks: 0            IO Block: 4096   regular file
 Device: fd00h/64768d    Inode: 330246      Links: 1
 Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
 Access: 2010-03-24 12:56:02.331151053 +0100
 Modify: 2010-03-24 10:49:08.997150735 +0100
 Change: 2010-03-24 10:49:08.997150735 +0100
 
 # cat myfile
 (no output, the file is empty)

-- Philippe Daniel 2010-04-07
