labunix's blog

labunix's Lab Unix

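Dealing with unresponsiveness and shutdowns caused by rising CPU temperature.

■Dealing with unresponsiveness and shutdowns caused by rising CPU temperature.
 I used the following as a reference.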

 CPU hardware errors in Ubuntu 17.04
 https://askubuntu.com/questions/941686/cpu-hardware-errors-in-ubuntu-17-04

■Check the CPU model. It is Intel.

$ awk 'BEGIN{c=0}/model.name/{print c,$0;c++}' /proc/cpuinfo
0 model name	: Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz
1 model name	: Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz
2 model name	: Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz
3 model name	: Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz
4 model name	: Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz
5 model name	: Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz
6 model name	: Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz
7 model name	: Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz

■Log from when the system became unresponsive around 00:39.

$ sudo grep "emperature above threshold, cpu clock throttled" /var/log/syslog
Nov  8 00:17:17 mo-debian kernel: [181994.609539] CPU5: Core temperature above threshold, cpu clock throttled (total events = 200507)
Nov  8 00:17:17 mo-debian kernel: [181994.609540] CPU1: Core temperature above threshold, cpu clock throttled (total events = 200508)
Nov  8 00:23:51 mo-debian kernel: [182389.271413] CPU5: Core temperature above threshold, cpu clock throttled (total events = 200530)
Nov  8 00:23:51 mo-debian kernel: [182389.271414] CPU1: Core temperature above threshold, cpu clock throttled (total events = 200531)
Nov  8 00:34:22 mo-debian kernel: [183019.938253] CPU7: Core temperature above threshold, cpu clock throttled (total events = 219419)
Nov  8 00:34:22 mo-debian kernel: [183019.938254] CPU3: Core temperature above threshold, cpu clock throttled (total events = 219417)
Nov  8 00:39:22 mo-debian kernel: [183320.426649] CPU1: Core temperature above threshold, cpu clock throttled (total events = 207287)
Nov  8 00:39:22 mo-debian kernel: [183320.426650] CPU5: Core temperature above threshold, cpu clock throttled (total events = 207286)
Nov  8 19:18:12 mo-debian kernel: [    0.052757] CPU0: Core temperature above threshold, cpu clock throttled (total events = 1)
Nov  8 19:18:12 mo-debian kernel: [    0.224506] CPU1: Core temperature above threshold, cpu clock throttled (total events = 1)
Nov  8 19:18:12 mo-debian kernel: [    0.224784]  #2 #3 #4<2>[    0.232331] CPU4: Core temperature above threshold, cpu clock throttled (total events = 1)
Nov  8 19:18:12 mo-debian kernel: [    0.234922] CPU5: Core temperature above threshold, cpu clock throttled (total events = 1)
Nov  8 19:18:12 mo-debian kernel: [    0.245428] CPU7: Core temperature above threshold, cpu clock throttled (total events = 1)
Nov  8 19:23:37 mo-debian kernel: [  364.427091] CPU0: Core temperature above threshold, cpu clock throttled (total events = 126)
Nov  8 19:23:37 mo-debian kernel: [  364.427092] CPU4: Core temperature above threshold, cpu clock throttled (total events = 98)
Nov  8 19:28:38 mo-debian kernel: [  665.343979] CPU5: Core temperature above threshold, cpu clock throttled (total events = 4806)
Nov  8 19:28:38 mo-debian kernel: [  665.343980] CPU1: Core temperature above threshold, cpu clock throttled (total events = 4809)
Nov  8 19:33:38 mo-debian kernel: [  965.183368] CPU4: Core temperature above threshold, cpu clock throttled (total events = 5568)
Nov  8 19:33:38 mo-debian kernel: [  965.183369] CPU0: Core temperature above threshold, cpu clock throttled (total events = 5596)
Nov  8 19:33:39 mo-debian kernel: [  966.907344] CPU1: Core temperature above threshold, cpu clock throttled (total events = 19737)
Nov  8 19:33:39 mo-debian kernel: [  966.907345] CPU5: Core temperature above threshold, cpu clock throttled (total events = 19734)
Nov  8 19:38:40 mo-debian kernel: [ 1267.185671] CPU7: Core temperature above threshold, cpu clock throttled (total events = 33083)
Nov  8 19:38:40 mo-debian kernel: [ 1267.185672] CPU3: Core temperature above threshold, cpu clock throttled (total events = 33083)
Nov  8 19:43:40 mo-debian kernel: [ 1567.201237] CPU5: Core temperature above threshold, cpu clock throttled (total events = 38768)
Nov  8 19:43:40 mo-debian kernel: [ 1567.201238] CPU1: Core temperature above threshold, cpu clock throttled (total events = 38771)
Nov  8 19:48:40 mo-debian kernel: [ 1867.234588] CPU7: Core temperature above threshold, cpu clock throttled (total events = 51412)
Nov  8 19:48:40 mo-debian kernel: [ 1867.234590] CPU3: Core temperature above threshold, cpu clock throttled (total events = 51411)
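The raw grep output above is long, so a per-CPU summary can be easier to read. The following is a minimal sketch and not part of the original procedure: the function name throttle_summary is hypothetical, and it assumes the kernel message format shown in the log above.

```shell
# Summarize kernel thermal-throttle messages read from standard input.
# Prints one line per CPU: "<cpu> <logged_lines> <max_total_events>".
# throttle_summary is a hypothetical helper name, not an existing tool.
throttle_summary() {
    awk '
        /Core temperature above threshold, cpu clock throttled/ {
            # Find the "CPUn:" token; its field position varies with
            # the padding inside the kernel timestamp.
            cpu = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^CPU[0-9]+:$/) {
                    cpu = $i
                    sub(/:$/, "", cpu)
                    break
                }
            }
            if (cpu == "") next

            # The counter is the number after "total events = ".
            n = $0
            sub(/.*total events = /, "", n)
            sub(/\).*/, "", n)

            lines[cpu]++
            if (n + 0 > max[cpu] + 0) max[cpu] = n + 0
        }
        END { for (c in lines) print c, lines[c], max[c] }
    ' | sort
}
```

It can be fed from the same grep as above, for example: sudo grep "cpu clock throttled" /var/log/syslog | throttle_summary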

■Install intel-microcode.

$ apt-cache search intel-microcode
iucode-tool - Intel processor microcode tool
microcode.ctl - Intel IA32/IA64 CPU Microcode Utility (transitional package)
amd64-microcode - Processor microcode firmware for AMD CPUs
intel-microcode - Processor microcode firmware for Intel CPUs

$ sudo apt-get install -y intel-microcode

■Check the microcode update

 Update your processor's microcode
 https://support.mozilla.org/ja/kb/microcode-update

$ dpkg -L intel-microcode | grep blacklist
/etc/modprobe.d/intel-microcode-blacklist.conf

$ cat /etc/modprobe.d/intel-microcode-blacklist.conf
# The microcode module attempts to apply a microcode update when
# it autoloads.  This is not always safe, so we block it by default.
blacklist microcode

$ lsmod | grep microcode

■Confirm that it is included in the initrd

$ sudo apt-get install -y binwalk

$ dd if=/boot/initrd.img-$(uname -r) of=/dev/stdout bs=$(binwalk /boot/initrd.img-$(uname -r) \
    | awk '/gzip/{print $1}') skip=1 2>/dev/null | gunzip - | cpio -i -t | grep intel-microcode
etc/modprobe.d/intel-microcode-blacklist.conf
120563 blocks

■Reboot the system
 Running "depmod -a" has no effect, so a reboot is needed.

$ sudo shutdown -r now && exit


$ sudo dmesg | grep microcode
[    0.000000] microcode: microcode updated early to revision 0x7, date = 2013-08-20
[    0.817508] microcode: sig=0x106e5, pf=0x2, revision=0x7
[    0.817801] microcode: Microcode Update Driver: v2.01 <tigran@aivazian.fsnet.co.uk>, Peter Oruba

$ sudo grep microcode /var/log/syslog
Nov  8 19:18:12 mo-debian kernel: [    0.803800] microcode: sig=0x106e5, pf=0x2, revision=0x4
Nov  8 19:18:12 mo-debian kernel: [    0.804116] microcode: Microcode Update Driver: v2.01 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
Nov  8 20:20:55 mo-debian kernel: [    0.000000] microcode: microcode updated early to revision 0x7, date = 2013-08-20
Nov  8 20:20:55 mo-debian kernel: [    0.817508] microcode: sig=0x106e5, pf=0x2, revision=0x7
Nov  8 20:20:55 mo-debian kernel: [    0.817801] microcode: Microcode Update Driver: v2.01 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
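The syslog output above shows the revision going from 0x4 before the reboot to 0x7 after it. To pull out just the revision values, something like the following sketch could be used. The function name microcode_revisions is hypothetical, and the pattern assumes the "sig=..., pf=..., revision=..." message format shown above.

```shell
# Print the microcode revision from each "microcode: sig=..." kernel
# message read from standard input, one revision per line.
# microcode_revisions is a hypothetical helper name.
microcode_revisions() {
    sed -n 's/.*microcode: sig=[^,]*, pf=[^,]*, revision=\(0x[0-9a-fA-F]*\).*/\1/p'
}
```

Example: sudo grep microcode /var/log/syslog | microcode_revisions
On x86 Linux the running revision should also appear in the "microcode" field of /proc/cpuinfo.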

■The more frequent Firefox and Chromium crashes may also be caused by this.
 There are crash logs for Firefox. There are none for Chromium.

 Mozilla Crash Reporter
 https://support.mozilla.org/ja/kb/mozillacrashreporter#w_agceccaicccecyckcnacdoaoaeag

 How to read the analysis results of Firefox crash reports
 https://www.clear-code.com/blog/2017/12/19.html

$ firefox about:crashes 

$ find .config/chromium/Crash\ Reports/ 
.config/chromium/Crash Reports/

■Set up monitoring, just in case.

 CPU frequency scaling
 https://wiki.archlinux.jp/index.php/CPU_%E5%91%A8%E6%B3%A2%E6%95%B0%E3%82%B9%E3%82%B1%E3%83%BC%E3%83%AA%E3%83%B3%E3%82%B0#thermald

$ apt-cache search thermald
thermald - Thermal monitoring and controlling daemon

$ sudo apt-get install thermald
$ dpkg -L thermald | grep bin/
/usr/sbin/thermald

$ sudo shutdown -r now && exit
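Apart from thermald, the current temperatures can be checked by hand through sysfs. This is a rough sketch under assumptions: the function name read_thermal_zones is hypothetical, and it assumes the kernel exposes thermal_zone* directories under /sys/class/thermal with temp in millidegrees Celsius. Whether this particular machine exposes any zones there has not been confirmed.

```shell
# Print "<zone> <type> <temperature in deg C>" for each thermal zone.
# The optional first argument overrides the sysfs base directory.
# read_thermal_zones is a hypothetical helper name.
read_thermal_zones() {
    base=${1:-/sys/class/thermal}
    for zone in "$base"/thermal_zone*; do
        # Skip when the glob matched nothing or temp is unreadable.
        [ -r "$zone/temp" ] || continue
        milli=$(cat "$zone/temp")
        # "type" names the sensor; fall back to "unknown" if missing.
        if [ -r "$zone/type" ]; then
            kind=$(cat "$zone/type")
        else
            kind=unknown
        fi
        # sysfs reports millidegrees Celsius, so divide by 1000.
        awk -v z="${zone##*/}" -v t="$kind" -v m="$milli" \
            'BEGIN { printf "%s %s %.1f\n", z, t, m / 1000 }'
    done
}
```

Calling read_thermal_zones with no argument reads /sys/class/thermal. Running it periodically, for example under watch, gives a simple view of whether the temperature keeps climbing.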