labunix's blog

labunix's Lab Unix

Text backup of Hatena Blog

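■Some time ago I wrote the script linked below.
 It was rather tedious to read, and the specification appears to have
 changed so that the XML's "page" has to be read, so I rewrote it.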

 Text backup of Hatena Blog beta
 http://labunix.hateblo.jp/entry/2012/05/13/021544

■The new script is below.

$ w3m -dump https://raw.github.com/labunix/get_new_hatenadialy_backup/master/labunix.hateblo.jp.sh
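#!/bin/bash

# Directory where the text dumps are saved.
SAVE=~/hatena2
test -d "$SAVE" || mkdir "$SAVE"
test -d "$SAVE" || exit 1
cd "$SAVE" || exit 1

# Outer loop: every <loc> URL listed in sitemap.xml.
# Inner loop: every <loc> URL containing "entry" listed at that URL.
w3m -dump "http://labunix.hateblo.jp/sitemap.xml" | \
  sed s/"<"/"\n&"/g | grep "<loc>" | sed s/"<loc>"// | \
  sed s/"^"/"\""/g | sed s/"\$"/"\""/g | \
  for site in `xargs`;do
    w3m -dump "$site" | \
      sed s/"<"/"\n&"/g | grep entry | \
      grep "<loc>" | sed s/"<loc>"//g | \
      for list in `xargs`;do
        # Build the file name from the part of the URL after "entry/":
        #   YYYYMMDD/NNN...   -> YYYYMMDD_NNN...
        #   YYYY/MM/DD/HHMMSS -> YYYYMMDD_HHMMSS
        OUT=`echo "$list" | sed s%".*entry/"%% | \
          sed s%"\(20[0-9][0-9][01][0-9][0-3][0-9]\)/"%"\1_"% | \
          sed s%"\(20[0-9][0-9]\)/\([01][0-9]\)/\([0-3][0-9]\)"%"\1\2\3"%g | \
          sed s%"/"%"_"%g`
        # Skip entries that are already saved; dump the rest as plain text.
        if [ -f "$OUT" ];then
          echo "$OUT is here."
        else
          w3m -dump "$list" > "$OUT" && echo "$OUT done."
        fi
      done
  done
unset OUT
exit 0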
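
The <loc> extraction used by both pipelines in the script can be tried offline, without touching the blog. The following is only a minimal sketch and not part of the original script: the function name extract_loc and the sample XML are made up for illustration, the sample assumes nothing more than URLs sitting in <loc> elements, and "\n" in the sed replacement assumes GNU sed.

```shell
#!/bin/bash

# extract_loc: hypothetical helper, not in the original script.
# Reads sitemap-style XML on stdin and prints one <loc> URL per line.
extract_loc() {
  # Put every tag on its own line, keep the <loc> lines, drop the tag.
  # "\n" in the replacement assumes GNU sed.
  sed 's/</\n&/g' | grep '<loc>' | sed 's/<loc>//'
}

# Made-up sample input; the real sitemap may be laid out differently.
sample='<urlset><url><loc>http://example.com/entry/2012/05/13/021544</loc></url><url><loc>http://example.com/entry/20130619/1371652385</loc></url></urlset>'

printf '%s\n' "$sample" | extract_loc
```

Because each closing </loc> tag ends up on a line of its own, the grep for "<loc>" keeps only the lines that carry a URL.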


■Running it produces the following output.

$ ./myscripts/labunix.hateblo.jp.sh 
20130619_1371652385 done.
20130617_1371395586 done.
20130614_1371137201 done.
20130613_1371130506 done.
20130611_1370959526 done.
20130610_1370869975 done.
20130609_1370707083 done.
20130608_1370700439 done.
20130608_1370687086 done.
20130528_1369748834 is here.
20130528_1369744282 is here.
20130525_1369486329 is here.
20130519_1368972546 is here.
20130518_1368806582 is here.
20130517_1368719553 is here.
20130517_1368717352 is here.
20130512_1368364903 is here.
20130512_1368360371 is here.
20130504_1367672082 is here.
20130429_1367241075 is here.
20130429_1367161222 is here.
20130408_1365422357 is here.
20130407_1365341897 is here.
20130404_1365085284 is here.
20130404_1365080717 is here.
20130402_1364902833 is here.
20130128_1359379972 is here.
20130106_1357439702 is here.
20130105_1357393094 is here.
20130104_1357303576 is here.
20130104_1357230211 is here.
20121231_1356955656 is here.
20121216_1355653389 is here.
20121113_1352756408 is here.
20121108_1352378556 is here.
20121107_1352292359 is here.
20121104_1352031606 is here.
20121029_1351516733 is here.
20121027_1351345680 is here.
20121014_1350219368 is here.
20121014_1350143134 is here.
20121004_1349357349 is here.
20120923_1348380254 is here.
20120919_1348059290 is here.
20120829_215603 is here.
20120826_010812 is here.
20120811_175214 is here.
20120810_011547 is here.
20120805_223723 is here.
20120729_180430 is here.
20120729_180423 is here.
20120728_192410 is here.
20120722_225133 is here.
20120722_194906 is here.
20120614_221436 is here.
20120611_234951 is here.
20120605_235512 is here.
20120603_165411 is here.
20120603_165405 is here.
20120602_231109 is here.
20120513_021544 is here.
20120510_144326 is here.
20120506_225732 is here.
20120506_031758 is here.
20120506_030620 is here.
20120504_155322 is here.
20120504_003001 is here.
20120430_183541 is here.
20120429_184047 is here.
20120407_191955 is here.
20120402_234628 is here.
20120402_002647 is here.
20120401_204538 is here.
20120401_034517 is here.
20120306_224942 is here.
20120304_194140 is here.
20120303_224721 is here.
20120225_223816 is here.
20120220_225536 is here.
20120215_235429 is here.
20120204_224407 is here.
20120201_225957 is here.
20120125_234531 is here.
20120109_225628 is here.
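
The two file name shapes in the listing above (for example 20130619_1371652385 and 20120513_021544) correspond to the two entry URL forms that the script's sed chain handles. The sketch below wraps the same sed expressions in a function so the conversion can be checked in isolation; the function name entry_to_filename and the example.com URLs are hypothetical and only serve as an illustration.

```shell
#!/bin/bash

# entry_to_filename: hypothetical wrapper around the sed chain of the script.
# Takes an entry URL and prints the file name the script would use for it.
entry_to_filename() {
  echo "$1" | sed 's%.*entry/%%' | \
    sed 's%\(20[0-9][0-9][01][0-9][0-3][0-9]\)/%\1_%' | \
    sed 's%\(20[0-9][0-9]\)/\([01][0-9]\)/\([0-3][0-9]\)%\1\2\3%g' | \
    sed 's%/%_%g'
}

# Older form: YYYY/MM/DD/HHMMSS
entry_to_filename "http://example.com/entry/2012/05/13/021544"
# prints 20120513_021544

# Newer form: YYYYMMDD/NNNNNNNNNN
entry_to_filename "http://example.com/entry/20130619/1371652385"
# prints 20130619_1371652385
```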

■Only the entries that were actually backed up are left in the log.

$ cat hatena2/labunix.hateblo.jp.log 
20130619_1371652385 done.
20130617_1371395586 done.
20130614_1371137201 done.
20130613_1371130506 done.
20130611_1370959526 done.
20130610_1370869975 done.
20130609_1370707083 done.
20130608_1370700439 done.
20130608_1370687086 done.
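
The script as listed only prints to standard output, and the post does not show how the log file is written. One possible way to get a log like this, purely an assumption, is to keep only the "done." lines of the output. In the sketch below, fake_backup is a hypothetical stand-in for the real script and the log goes to a temporary file.

```shell
#!/bin/bash

# fake_backup: hypothetical stand-in for the real script; it prints the
# same two kinds of status lines.
fake_backup() {
  echo "20130619_1371652385 done."
  echo "20130528_1369748834 is here."
}

# Temporary file used as the log in this sketch.
LOG=$(mktemp)

# Append only the newly saved entries ("done." lines) to the log.
fake_backup | grep ' done\.$' >> "$LOG"

cat "$LOG"
rm -f "$LOG"
```

With the real script, fake_backup would be replaced by the script's path and LOG by the desired log file.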