labunix's blog

labunix's Lab Unix

Setting up EXT3+DRBD

■Setting up EXT3+DRBD
 This is the pattern that does not use GFS.
 lvm2 is not used in this procedure; install it only if you need it.

$ sudo apt-get install -y cman drbd8-utils drbdlinks lvm2

■Loading the module

$ sudo modprobe drbd
$ lsmod | grep drbd
drbd                  193891  0 
lru_cache              12969  1 drbd

■The proc file

$ cat /proc/drbd 
version: 8.3.11 (api:88/proto:86-96)
srcversion: F937DCB2E5D83C6CCE4A6C9 

■No resources are defined yet. Well, of course not.

$ sudo drbdadm dstate `uname -n`
  --==  Thank you for participating in the global usage survey  ==--
The server's response is:

you are the 783th user to install this version
no resources defined!

■Each node has an unused virtual disk of the same size.

$ sudo fdisk -l /dev/sdb

Disk /dev/sdb: 21.5 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders, total 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sdb doesn't contain a valid partition table

■Creating the partition
 The disk is empty, so the defaults are accepted as they are.

$ sudo fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xd1a5a0e2.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): p

Disk /dev/sdb: 21.5 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders, total 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xd1a5a0e2

   Device Boot      Start         End      Blocks   Id  System

Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-41943039, default 2048): 
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-41943039, default 41943039): 
Using default value 41943039

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

■Verification

$ sudo fdisk -l /dev/sdb | grep ^/
/dev/sdb1            2048    41943039    20970496   83  Linux
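
As a quick sanity check, the Blocks column can be reproduced from the start and end sectors. This is only a sketch: the helper name part_kib is made up here, and it assumes the 512-byte sectors that fdisk reports above.

```shell
# part_kib START END: size in KiB of a partition covering the given
# 512-byte sectors, both ends inclusive (hypothetical helper name).
# Two 512-byte sectors make one KiB.
part_kib() {
    echo $(( ($2 - $1 + 1) / 2 ))
}

part_kib 2048 41943039   # prints 20970496
```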

■Checking the DRBD sample configuration.
 I would like to show it here, but the resource section is too long.
 "*.res" is presumably short for "resource".

$ cat /etc/drbd.conf 
# You can find an example in  /usr/share/doc/drbd.../drbd.conf.example

include "drbd.d/global_common.conf";
include "drbd.d/*.res";

$ lv -s /usr/share/doc/drbd8-utils/examples/drbd.conf.example.gz | \
  grep -v "^#\|^\$\|^ *#" | grep -A 39 "^resource r0" >/dev/null

■What happens on a disk error

$ sudo grep -A 3 "[ \t]*disk " /etc/drbd.d/global_common.conf 
	disk {
		# on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
		# no-disk-drain no-md-flushes max-bio-bvecs
	}

■The available handlers are pass_on, call-local-io-error, and detach.
 I choose pass_on, but set it later in the resource configuration.
 Note: it does not go in the global configuration.

$ man drbd.conf | grep -A 13 "on-io-error handler"
       on-io-error handler
           is taken, if the lower level device reports io-errors to the upper
           layers.

           handler may be pass_on, call-local-io-error or detach.

           pass_on: The node downgrades the disk status to inconsistent, marks
           the erroneous block as inconsistent in the bitmap and retries the
           IO on the remote node.

           call-local-io-error: Call the handler script local-io-error.

           detach: The node drops its low level device, and continues in
           diskless mode.

■The default sync rate is 250 KiB/sec.
 I change it later in this post, but the initial sync takes the longest, so it is better to set the rate before starting it.

$ sudo grep -A 2 "[ \t]*syncer " /etc/drbd.d/global_common.conf 
	syncer {
		# rate after al-extents use-rle cpu-mask verify-alg csums-alg
	}

$ man drbdsetup | grep -A 8 "^ *syncer"
   syncer
       Changes the synchronization daemon parameters of device at runtime.

       -r, --rate rate
           To ensure smooth operation of the application on top of DRBD, it is
           possible to limit the bandwidth that may be used by background
           synchronization. The default is 250 KiB/sec, the default unit is
           KiB/sec.
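
To get a feel for why the rate matters, here is a rough estimate of how long a full sync of the 20970496 KiB partition would take. It is a sketch under assumptions: the helper name sync_eta is made up, the syncer is assumed to run at exactly the configured rate (in practice it does not), and 50M is taken as 50 * 1024 KiB/sec.

```shell
# sync_eta SIZE_KIB RATE_KIB_PER_SEC: rough H:MM:SS for a full sync,
# assuming a constant rate (hypothetical helper name).
sync_eta() {
    secs=$(( $1 / $2 ))
    printf '%d:%02d:%02d\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

sync_eta 20970496 250     # default rate: prints 23:18:01
sync_eta 20970496 51200   # 50M:          prints 0:06:49
```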

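■The protocol stays at the default, C.
 The protocols differ in when a write is considered complete.

 A: when the write has reached the local disk and the local TCP send buffer
 B: when the write has reached the local disk and the remote buffer cache
 C: when the write has reached both the local disk and the remote disk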

$ grep protocol /etc/drbd.d/global_common.conf 
	protocol C;

$ man drbd.conf | grep -A 12 "protocol prot-id"
       protocol prot-id
           On the TCP/IP link the specified protocol is used. Valid protocol
           specifiers are A, B, and C.

           Protocol A: write IO is reported as completed, if it has reached
           local disk and local TCP send buffer.

           Protocol B: write IO is reported as completed, if it has reached
           local disk and remote buffer cache.

           Protocol C: write IO is reported as completed, if it has reached
           both local and remote disk.

■Configuring the resource

$ cat /etc/drbd.d/drbd0.res 
resource drbd0 {
	protocol C;
	meta-disk internal;

	on xen-debian2 {
		device /dev/drbd0;
		disk /dev/sdb1;
		address 192.168.152.92:7789;
	}
	on xen-debian1 {
		device /dev/drbd0;
		disk /dev/sdb1;
		address 192.168.152.91:7789;
	}
	disk {
		on-io-error pass_on;
	}
}

■Restarting the DRBD service
 Do this after placing the same resource file on both the active and standby nodes.

$ sudo /etc/init.d/drbd restart

  --==  Thank you for participating in the global usage survey  ==--
The server's response is:

you are the 51444th user to install this version
[....] Starting DRBD resources:[ 
drbd0
no suitable meta data found :(
Command '/sbin/drbdmeta 0 v08 /dev/sdb1 internal check-resize' terminated with exit code 255
drbdadm check-resize drbd0: exited with code 255
d(drbd0) 0: Failure: (119) No valid meta-data signature found.

	==> Use 'drbdadm create-md res' to initialize meta-data area. <==


[drbd0] cmd /sbin/drbdsetup 0 disk /dev/sdb1 /dev/sdb1 internal --set-defaults --create-device --on-io-error=pass_on  failed - continuing!
 
[ ok d0) ].

■In this state, /proc/drbd shows the same output below on both the active and standby nodes.

$ cat /proc/drbd 
version: 8.3.11 (api:88/proto:86-96)
srcversion: F937DCB2E5D83C6CCE4A6C9 
 0: cs:Unconfigured

■Configuring the standby side with drbdsetup

$ sudo dd if=/dev/zero bs=1M count=1 of=/dev/sdb1; sync
1+0 レコード入力
1+0 レコード出力
1048576 バイト (1.0 MB) コピーされました、 0.00714007 秒、 147 MB/秒

$ sudo drbdadm create-md drbd0
  --==  Thank you for participating in the global usage survey  ==--
The server's response is:

node already registered
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success

$ sudo drbdsetup /dev/drbd0 disk /dev/sdb1 internal -1
Can not open device 'internal': No such file or directory

 Note: the error comes from the argument order. drbdsetup 8.3 takes the lower device, then the meta-data device, then the index, as in the init script's own call shown earlier: disk /dev/sdb1 /dev/sdb1 internal.

$ ls -l /dev/drbd0
brw-rw---T 1 root disk 147, 0  7月 22 23:43 /dev/drbd0

$ sudo drbdsetup /dev/drbd0 net 192.168.152.92 192.168.152.91 C

■On the active side, reverse the order of the IP addresses.

$ sudo dd if=/dev/zero bs=1M count=1 of=/dev/sdb1; sync
$ sudo drbdadm create-md drbd0
$ sudo drbdsetup /dev/drbd0 disk /dev/sdb1 internal -1
Can not open device 'internal': No such file or directory
$ sudo drbdsetup /dev/drbd0 net 192.168.152.91 192.168.152.92 C

■The following devices appear on both the active and standby nodes.

$ sudo find /dev/drbd* -print
/dev/drbd
/dev/drbd/by-res
/dev/drbd/by-res/drbd0
/dev/drbd/by-disk
/dev/drbd/by-disk/sdb1
/dev/drbd0

■Both the active and standby nodes must show Secondary/Secondary.

$ cat /proc/drbd 
version: 8.3.11 (api:88/proto:86-96)
srcversion: F937DCB2E5D83C6CCE4A6C9 
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:20969820
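
The oos value above (20969820 KiB) is slightly smaller than the 20970496 KiB partition, and the gap looks like the internal meta-data. The sketch below checks that with the size estimate as I recall it from the DRBD User's Guide, ceil(sectors / 2^18) * 8 + 72 sectors; treat the formula and the made-up helper name drbd_md_sectors as assumptions rather than a statement of what drbdmeta actually does.

```shell
# drbd_md_sectors SECTORS: estimated internal meta-data size, in 512-byte
# sectors, for a backing device of SECTORS sectors (hypothetical helper).
# Uses ceil(SECTORS / 2^18) * 8 + 72; the ceiling is done with integers.
drbd_md_sectors() {
    echo $(( ($1 + 262143) / 262144 * 8 + 72 ))
}

sectors=41940992                              # /dev/sdb1: 20970496 KiB * 2
md=$(drbd_md_sectors "$sectors")
echo "meta-data: $(( md / 2 )) KiB"           # prints meta-data: 676 KiB
echo "usable: $(( (sectors - md) / 2 )) KiB"  # prints usable: 20969820 KiB
```

Under these assumptions the usable size matches the oos counter exactly.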

■Run the following on the active side.

$ sudo drbdadm -- --overwrite-data-of-peer primary drbd0
  --==  Thank you for participating in the global usage survey  ==--
The server's response is:

node already registered

■You can confirm that synchronization has started.
 (primary)

$ cat /proc/drbd
version: 8.3.11 (api:88/proto:86-96)
srcversion: F937DCB2E5D83C6CCE4A6C9 
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:34432 nr:0 dw:0 dr:35104 al:0 bm:2 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:20935388
	[>....................] sync'ed:  0.2% (20444/20476)Mfinish: 4:20:23 speed: 1,328 (1,324) K/sec

■You can also check it this way.
 (secondary)

$ sudo /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.11 (api:88/proto:86-96)
srcversion: F937DCB2E5D83C6CCE4A6C9 
m:res    cs          ro                 ds                     p        mounted  fstype
0:drbd0  SyncTarget  Secondary/Primary  Inconsistent/UpToDate  C
...      sync'ed:    0.5%               (20380/20476)Mfinish:  4:20:54  1,312    (1,288)  want:  250  K/sec

■The primary side shows ro:Primary/Secondary,
 and the secondary side shows ro:Secondary/Primary.
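
Since the local role is always the part before the slash, a script can pull it out of the status text. This is a minimal sketch that assumes the 8.3-style line format shown in this post and a single minor 0; the function name drbd_local_role is made up.

```shell
# drbd_local_role [FILE]: print the local role of minor 0 from a
# /proc/drbd style status file (default: /proc/drbd).
# Prints nothing if minor 0 has no ro: field (e.g. cs:Unconfigured).
drbd_local_role() {
    sed -n 's/^ *0: .* ro:\([A-Za-z]*\)\/.*/\1/p' "${1:-/proc/drbd}"
}
```

Called without an argument on a live node it reads /proc/drbd, so on the primary above it would be expected to print Primary.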

■Creating the file system and mounting the DRBD volume on the primary side

$ sudo mke2fs -j /dev/drbd0 
mke2fs 1.42.5 (29-Jul-2012)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
1310720 inodes, 5242455 blocks
262122 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
160 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done   

$ sudo mkdir /media/drbd0
$ sudo mount -t ext3 /dev/drbd0 /media/drbd0

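■Verifying DRBD operation and changing the sync speed
 Synchronization must be complete before unmounting.
 The volume is mounted only on the primary side.
 Unmount it on the primary, then demote the primary to secondary.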

$ sudo touch /media/drbd0/test
$ sudo umount /media/drbd0
$ sudo drbdadm secondary drbd0
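
The "sync must be complete" condition can be checked mechanically before unmounting. The sketch below assumes the 8.3-style /proc/drbd format shown in this post and a single minor 0; the function name drbd_in_sync is made up.

```shell
# drbd_in_sync [FILE]: succeed only if minor 0 reports
# ds:UpToDate/UpToDate in a /proc/drbd style file (default: /proc/drbd).
drbd_in_sync() {
    grep -q '^ *0: .* ds:UpToDate/UpToDate ' "${1:-/proc/drbd}"
}
```

One way to use it is as a guard, for example `drbd_in_sync && sudo umount /media/drbd0`, so the unmount only happens once both disks are UpToDate.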

■Promoting the secondary side to primary

$ sudo drbdadm primary drbd0
$ sudo mount -t ext3 /dev/drbd0 /media/drbd0
$ ls /media/drbd0
lost+found  test

■Checking cached read speed and device read speed on the primary.

$ sudo hdparm -Tt /dev/sdb1

/dev/sdb1:
 Timing cached reads:   8326 MB in  1.99 seconds = 4183.64 MB/sec
 Timing buffered disk reads: 928 MB in  3.00 seconds = 309.12 MB/sec

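■Measuring the average write speed. 50 MB/s looks achievable.
 The sed pattern matches dd's Japanese-locale output, where 秒 means seconds.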

$ for m in `seq 1 10`;do \
    for n in `seq 1 100`;do \
      sudo dd if=/dev/zero bs=512 count=100 of=/media/drbd0/write 2>&1 ; \
    done | grep MB | sed s/".*秒、 "//g | \
    awk '{sum+=$1}END{print sum/100"MB"}'; \
  done
69.807MB
74.735MB
78.3MB
73.5MB
72.906MB
72.489MB
69.918MB
73.562MB
66.584MB
68.111MB
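
To reduce the ten per-batch averages above to a single figure, something like the following works. It is a sketch: the function name avg_first_field is made up, and it relies on awk taking the leading number of a field such as 69.807MB.

```shell
# avg_first_field: average the first whitespace-separated field of each
# input line and print it with one decimal place.
avg_first_field() {
    awk '{ sum += $1; n++ } END { if (n) printf "%.1f\n", sum / n }'
}

printf '%s\n' 69.807MB 74.735MB 78.3MB 73.5MB 72.906MB \
              72.489MB 69.918MB 73.562MB 66.584MB 68.111MB | avg_first_field
# prints 72.0
```

With an overall mean of about 72 MB/s in this test, 50M leaves some headroom.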

■Configuring the syncer

$ grep -A 2 syncer /etc/drbd.d/drbd0.res 
	syncer {
		rate 50M;
	}

$ sudo /etc/init.d/drbd restart

■Miscellaneous

$ sudo drbdadm dstate drbd0
UpToDate/UpToDate

$ sudo drbdadm cstate drbd0
Connected

■Results from the primary

$ sudo drbdadm state drbd0
'drbdadm state' is deprecated, use 'drbdadm role' instead.
Primary/Secondary

$ sudo drbdadm show-gi drbd0

       +--<  Current data generation UUID  >-
       |               +--<  Bitmap's base data generation UUID  >-
       |               |                 +--<  younger history UUID  >-
       |               |                 |         +-<  older history  >-
       V               V                 V         V
19D1FE4951F2490B:0000000000000000:0004000000000004:0003000000000004:1:1:1:1:0:0:0
                                                                    ^ ^ ^ ^ ^ ^ ^
                                      -<  Data consistency flag  >--+ | | | | | |
                             -<  Data was/is currently up-to-date  >--+ | | | | |
                                  -<  Node was/is currently primary  >--+ | | | |
                                  -<  Node was/is currently connected  >--+ | | |
         -<  Node was in the progress of setting all bits in the bitmap  >--+ | |
                        -<  The peer's disk was out-dated or inconsistent  >--+ |
      -<  This node was a crashed primary, and has not seen its peer since   >--+

flags: Primary, Connected, UpToDate
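
The seven trailing fields of the tuple can also be labeled by a script, following the legend drawn above. This is only a sketch: the function name gi_flags and the short labels are made up, and the field order is taken from the diagram in this output.

```shell
# gi_flags TUPLE: label the seven flag fields that follow the four UUIDs
# in a show-gi tuple (hypothetical helper; labels are shorthand).
gi_flags() {
    echo "$1" | awk -F: '{
        printf "consistent=%s uptodate=%s primary=%s connected=%s ", $5, $6, $7, $8
        printf "fullsync=%s peer_outdated=%s crashed_primary=%s\n", $9, $10, $11
    }'
}

gi_flags 19D1FE4951F2490B:0000000000000000:0004000000000004:0003000000000004:1:1:1:1:0:0:0
# prints consistent=1 uptodate=1 primary=1 connected=1 fullsync=0 peer_outdated=0 crashed_primary=0
```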

■Results from the secondary

$ sudo drbdadm state drbd0
'drbdadm state' is deprecated, use 'drbdadm role' instead.
Secondary/Primary

$ sudo drbdadm show-gi drbd0

       +--<  Current data generation UUID  >-
       |               +--<  Bitmap's base data generation UUID  >-
       |               |                 +--<  younger history UUID  >-
       |               |                 |         +-<  older history  >-
       V               V                 V         V
19D1FE4951F2490A:0000000000000000:0004000000000004:0003000000000004:1:1:0:1:0:0:0
                                                                    ^ ^ ^ ^ ^ ^ ^
                                      -<  Data consistency flag  >--+ | | | | | |
                             -<  Data was/is currently up-to-date  >--+ | | | | |
                                  -<  Node was/is currently primary  >--+ | | | |
                                  -<  Node was/is currently connected  >--+ | | |
         -<  Node was in the progress of setting all bits in the bitmap  >--+ | |
                        -<  The peer's disk was out-dated or inconsistent  >--+ |
      -<  This node was a crashed primary, and has not seen its peer since   >--+

flags: Secondary, Connected, UpToDate

■Dumping the configuration

$ sudo drbdadm dump 
# /etc/drbd.conf
common {
    protocol               C;
}

# resource drbd0 on xen-debian1: not ignored, not stacked
resource drbd0 {
    protocol               C;
    on xen-debian2 {
        device           /dev/drbd0 minor 0;
        disk             /dev/sdb1;
        address          ipv4 192.168.152.92:7789;
        meta-disk        internal;
    }
    on xen-debian1 {
        device           /dev/drbd0 minor 0;
        disk             /dev/sdb1;
        address          ipv4 192.168.152.91:7789;
        meta-disk        internal;
    }
    disk {
        on-io-error      pass_on;
    }
    syncer {
        rate             50M;
    }
}

■A write that finishes syncing within a second gives no chance to see the syncing state,
 so I test like this.

■Secondary

$ sudo watch -d -n 1 'cat /proc/drbd'


■Primary

$ sudo dd if=/dev/zero of=/media/drbd0/test bs=512 count=100000

 (watch output on the secondary)
version: 8.3.11 (api:88/proto:86-96)
srcversion: F937DCB2E5D83C6CCE4A6C9 
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:55780 dw:55780 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0