labunix's blog

labunix's Lab Unix

Trying to embed PDF fonts with gs.

■Trying to embed PDF fonts with gs.
 According to pdffonts, some fonts are not embedded (the emb column is "no").

$ pdffonts sample.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
Calibri                              TrueType          WinAnsi          no  no  no       7  0
Calibri-Bold                         TrueType          WinAnsi          no  no  no       8  0
Calibri-Bold                         TrueType          WinAnsi          no  no  no       9  0
TimesNewRomanPSMT                    TrueType          WinAnsi          no  no  no      10  0
Calibri-BoldItalic                   TrueType          WinAnsi          no  no  no      16  0
Calibri                              TrueType          WinAnsi          no  no  no      17  0
BBCDEE+Calibri                       CID TrueType      Identity-H       yes yes yes   1397  0
Times-Roman                          Type 1            WinAnsi          no  no  no    1837  0
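
The listing above can be filtered mechanically. The following is a minimal sketch, not part of the original procedure: list_unembedded is a hypothetical helper name, and it assumes that font names contain no spaces and that pdffonts keeps the column layout shown above.

```shell
# Print the names of fonts that pdffonts reports as not embedded.
# Reads pdffonts output on stdin. Columns are counted from the right,
# because the "type" column may contain a space ("CID TrueType", "Type 1").
# list_unembedded is a hypothetical helper name, not a pdffonts feature.
list_unembedded() {
  awk 'NR > 2 && NF >= 8 && $(NF-4) == "no" { print $1 }' | LC_ALL=C sort -u
}
```

It would be used as "pdffonts sample.pdf | list_unembedded". The first two lines (header and ruler) are skipped, and duplicate names are collapsed.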

■As an aside, the cramped text rendering (which has nothing to do with liking or disliking Calibri)
 was solved in the end by switching to a different PDF viewer.

$ sudo apt-get install -y mupdf mupdf-tools

■Embed the fonts with the default settings for now, even if that means substitute fonts.

$ gs -q -dNOPAUSE -dBATCH -dPDFSETTINGS=/prepress -sDEVICE=pdfwrite -sOutputFile=sample_inside.pdf sample.pdf

$ pdffonts sample_inside.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
NTJNVO+Helvetica-Bold                Type 1C           WinAnsi          yes yes no       8  0
JSORRB+Helvetica                     Type 1C           WinAnsi          yes yes no      10  0
CEFOXW+Helvetica-BoldOblique         Type 1C           WinAnsi          yes yes no      22  0
ZOHUJJ+Calibri                       CID TrueType      Identity-H       yes yes yes    880  0
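
Whether every font ended up embedded can also be checked by exit status instead of by eye. A minimal sketch, assuming the same pdffonts column layout as above; all_embedded is a hypothetical helper name.

```shell
# Succeed (exit status 0) only if every font row in pdffonts output
# has "yes" in the emb column. Reads pdffonts output on stdin and
# prints the name of each font that is not embedded.
# all_embedded is a hypothetical helper name.
all_embedded() {
  awk 'NR > 2 && NF >= 8 && $(NF-4) != "yes" { print "not embedded: " $1; bad = 1 }
       END { exit bad + 0 }'
}
```

For example, "pdffonts sample_inside.pdf | all_embedded && echo OK" would print OK only when no row has emb set to "no".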

■Try changing to Times-Roman.

$ find /etc/ghostscript/ -type f -exec grep -H Time {} +
/etc/ghostscript/fontmap.d/10gsfonts.conf:/Times-Roman /NimbusRomNo9L-Regu ;
/etc/ghostscript/fontmap.d/10gsfonts.conf:/Times-Bold /NimbusRomNo9L-Medi ;
/etc/ghostscript/fontmap.d/10gsfonts.conf:/Times-Italic /NimbusRomNo9L-ReguItal ;
/etc/ghostscript/fontmap.d/10gsfonts.conf:/Times-BoldItalic /NimbusRomNo9L-MediItal ;
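
Each line above maps an alias (first field) to the font Ghostscript actually uses (second field). A minimal sketch for looking up one alias; fontmap_target is a hypothetical helper name, and it only handles alias lines of the form "/Alias /Target ;" as shown above.

```shell
# Print the font that a Fontmap-style alias points to.
# Reads lines such as "/Times-Roman /NimbusRomNo9L-Regu ;" on stdin.
# Usage: fontmap_target Times-Roman < fontmap-file
# fontmap_target is a hypothetical helper name.
fontmap_target() {
  awk -v want="/$1" '$1 == want { sub(/^\//, "", $2); print $2 }'
}
```

With the file shown above, "fontmap_target Times-Bold < /etc/ghostscript/fontmap.d/10gsfonts.conf" should print NimbusRomNo9L-Medi. The comparison is exact, so Times-Bold does not also match Times-BoldItalic.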

$ cp sample.pdf sample2.pdf; \
  sed -i -e 's/Calibri-BoldItalic/Times-BoldItalic/g' sample2.pdf; \
  sed -i -e 's/Calibri-Bold/Times-Bold/g' sample2.pdf; \
  sed -i -e 's/Calibri/Times-Roman/g' sample2.pdf
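
The order of the three substitutions matters: the longest name has to be replaced first, otherwise the shorter "Calibri" pattern would match inside "Calibri-BoldItalic". The same idea as a reusable filter, as a minimal sketch: rename_calibri is a hypothetical helper name, and the replacement names are assumed to contain neither "/" nor "&".

```shell
# Rewrite Calibri font names on stdin, longest name first, so that
# "Calibri-BoldItalic" is handled before the shorter "Calibri-Bold"
# and "Calibri" patterns can match inside it.
# Usage: rename_calibri REGULAR BOLD BOLDITALIC < in > out
# LC_ALL=C makes sed treat the input as plain bytes.
# rename_calibri is a hypothetical helper name.
rename_calibri() {
  LC_ALL=C sed \
    -e "s/Calibri-BoldItalic/$3/g" \
    -e "s/Calibri-Bold/$2/g" \
    -e "s/Calibri/$1/g"
}
```

For example, "rename_calibri Times-Roman Times-Bold Times-BoldItalic < sample.pdf > sample2.pdf" is intended to do the same job as the cp and sed -i commands above.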

$ pdffonts sample2.pdf 
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
Times-Roman                          TrueType          WinAnsi          no  no  no       7  0
Times-Bold                           TrueType          WinAnsi          no  no  no       8  0
Times-Bold                           TrueType          WinAnsi          no  no  no       9  0
TimesNewRomanPSMT                    TrueType          WinAnsi          no  no  no      10  0
Times-BoldItalic                     TrueType          WinAnsi          no  no  no      16  0
Times-Roman                          TrueType          WinAnsi          no  no  no      17  0
BBCDEE+Times-Roman                   CID TrueType      Identity-H       yes yes yes   1397  0
Times-Roman                          Type 1            WinAnsi          no  no  no    1837  0

$ gs -q -dNOPAUSE -dBATCH -dPDFSETTINGS=/prepress -sDEVICE=pdfwrite -sOutputFile=sample_inside2.pdf sample2.pdf
   **** Error:  An error occurred while reading an XREF table.
   **** The file has been damaged.  This may have been caused
   **** by a problem while converting or transfering the file.
   **** Ghostscript will attempt to recover the data.
   **** However, the output may be incorrect.
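
The XREF error is most likely a side effect of the sed edit rather than damage in the original file: the xref table stores byte offsets, and a replacement name of a different length shifts everything after it. A minimal sketch that shows the shift per occurrence; name_delta is a hypothetical helper name, and ASCII names are assumed so that characters equal bytes.

```shell
# Print how many bytes longer (or shorter) NEW is than OLD.
# A non-zero result means every replaced occurrence shifts the bytes
# after it, so the offsets recorded in the xref table no longer match.
# Usage: name_delta OLD NEW
# name_delta is a hypothetical helper name.
name_delta() {
  old=$1
  new=$2
  echo $(( ${#new} - ${#old} ))
}
```

"name_delta Calibri Times-Roman" prints 4, so each replaced "Calibri" moves the rest of the file by four bytes. Ghostscript rebuilds the table and still produces output, as the pdffonts listing that follows shows.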

$ pdffonts sample_inside2.pdf 
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
LMQGFU+Times-Bold                    Type 1C           WinAnsi          yes yes no       8  0
HVBVCI+Times-Roman                   Type 1C           WinAnsi          yes yes no      10  0
EBTTUL+Times-BoldItalic              Type 1C           WinAnsi          yes yes no      22  0
ZOHUJJ+Times-Roman                   CID TrueType      Identity-H       yes yes yes    880  0

■Try changing to Courier.

$ find /etc/ghostscript/ -type f -exec grep -H Courier {} +
/etc/ghostscript/fontmap.d/10gsfonts.conf:/Courier /NimbusMonL-Regu ;
/etc/ghostscript/fontmap.d/10gsfonts.conf:/Courier-Bold /NimbusMonL-Bold ;
/etc/ghostscript/fontmap.d/10gsfonts.conf:/Courier-Oblique /NimbusMonL-ReguObli ;
/etc/ghostscript/fontmap.d/10gsfonts.conf:/Courier-BoldOblique /NimbusMonL-BoldObli ;

$ cp sample.pdf sample3.pdf; \
  sed -i -e 's/Calibri-BoldItalic/Courier-BoldOblique/g' sample3.pdf; \
  sed -i -e 's/Calibri-Bold/Courier-Bold/g' sample3.pdf; \
  sed -i -e 's/Calibri/Courier/g' sample3.pdf

$ pdffonts sample3.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
Courier                              TrueType          WinAnsi          no  no  no       7  0
Courier-Bold                         TrueType          WinAnsi          no  no  no       8  0
Courier-Bold                         TrueType          WinAnsi          no  no  no       9  0
TimesNewRomanPSMT                    TrueType          WinAnsi          no  no  no      10  0
Courier-BoldOblique                  TrueType          WinAnsi          no  no  no      16  0
Courier                              TrueType          WinAnsi          no  no  no      17  0
BBCDEE+Courier                       CID TrueType      Identity-H       yes yes yes   1397  0
Times-Roman                          Type 1            WinAnsi          no  no  no    1837  0

$ gs -q -dNOPAUSE -dBATCH -dPDFSETTINGS=/prepress -sDEVICE=pdfwrite -sOutputFile=sample_inside3.pdf sample3.pdf
   **** Error:  An error occurred while reading an XREF table.
   **** The file has been damaged.  This may have been caused
   **** by a problem while converting or transfering the file.
   **** Ghostscript will attempt to recover the data.
   **** However, the output may be incorrect.

$ pdffonts sample_inside3.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
UIMOTI+Courier-Bold                  Type 1C           WinAnsi          yes yes no       8  0
EWJIGE+Courier                       Type 1C           WinAnsi          yes yes no      10  0
DJWMPM+Courier-BoldOblique           Type 1C           WinAnsi          yes yes no      22  0
ZOHUJJ+Courier                       CID TrueType      Identity-H       yes yes yes    880  0

■Look for other fonts.

$ find /etc/ghostscript/ -type f | xargs cat | awk '{gsub("/|-.*","",$1);a[$1]+=1}END{for(n in a){print n}}' | column
CenturySchL		Helvetica		GothicBBB		Courier			MSung
URWChanceryL		Palatino		Symbol			HYSMyeongJo		MKai
NimbusSanL		HelveticaNarrow		STSong			URWPalladioL		NewCenturySchlbk
Ryumin			NanumGothic		BousungEG		URWBookmanL		Song
ShanHeiSun		MHei			AvantGarde		STHeiti			ZenKai
Bookman			GBZenKai		Times			StandardSymL		HeiseiMin
HeiseiKakuGo		MOESung			STKaiti			Dingbats
NanumBarunGothic	STFangsong		ZapfChancery		Japanese
NimbusRomNo9L		HYRGoThic		ZapfDingbats		URWGothicL
HYGoThic		NanumMyeongjo		Adobe			NimbusMonL
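
The awk one-liner above strips the leading "/" and everything from the first "-" onward, then prints each remaining family name once. A more readable variant, as a minimal sketch: font_families is a hypothetical helper name, and unlike the original it skips lines that do not start with "/" and sorts the result.

```shell
# List the distinct font family names found in Fontmap-style lines on stdin.
# For each line whose first field starts with "/", drop the "/" and
# everything from the first "-" onward, e.g. "/Times-Roman" becomes "Times".
# font_families is a hypothetical helper name.
font_families() {
  awk '$1 ~ /^\// { name = $1; sub(/^\//, "", name); sub(/-.*/, "", name); seen[name] = 1 }
       END { for (n in seen) print n }' | LC_ALL=C sort
}
```

It would be used as "find /etc/ghostscript/ -type f | xargs cat | font_families | column".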

■This PDF contains no Japanese, so the following are not needed this time.

$ find /etc/ghostscript/ -type f -exec grep -H '^/Japanese' {} + | awk '{gsub(".*/","",$1);print $1}'
Japanese-Mincho-Regular-JaH
Japanese-Gothic-Regular-JaH
Japanese-Mincho-Regular
Japanese-Gothic-Regular

$ find /etc/ghostscript/ -type f -exec grep -H '^/Adobe' {} + | awk '!/Korea|CNS|GB/{gsub(".*/","",$1);print $1}'
Adobe-Japan2
Adobe-Japan2-Bold
Adobe-Japan1
Adobe-Japan1-Bold

■Try changing to Helvetica.

$ find /etc/ghostscript/ -type f -exec grep -H '^/Helvetica' {} + | awk '{gsub(".*/","",$1);print $1}'
Helvetica
Helvetica-Bold
Helvetica-Oblique
Helvetica-BoldOblique
Helvetica-Narrow
HelveticaNarrow
Helvetica-Narrow-Bold
HelveticaNarrow-Bold
Helvetica-Narrow-Oblique
HelveticaNarrow-Oblique
Helvetica-Narrow-BoldOblique
HelveticaNarrow-BoldOblique

$ cp sample.pdf sample4.pdf; \
  sed -i -e 's/Calibri-BoldItalic/Helvetica-BoldOblique/g' sample4.pdf; \
  sed -i -e 's/Calibri-Bold/Helvetica-Bold/g' sample4.pdf; \
  sed -i -e 's/Calibri/Helvetica-Oblique/g' sample4.pdf

$ pdffonts sample4.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
Helvetica-Oblique                    TrueType          WinAnsi          no  no  no       7  0
Helvetica-Bold                       TrueType          WinAnsi          no  no  no       8  0
Helvetica-Bold                       TrueType          WinAnsi          no  no  no       9  0
TimesNewRomanPSMT                    TrueType          WinAnsi          no  no  no      10  0
Helvetica-BoldOblique                TrueType          WinAnsi          no  no  no      16  0
Helvetica-Oblique                    TrueType          WinAnsi          no  no  no      17  0
BBCDEE+Helvetica-Oblique             CID TrueType      Identity-H       yes yes yes   1397  0
Times-Roman                          Type 1            WinAnsi          no  no  no    1837  0

$ gs -q -dNOPAUSE -dBATCH -dPDFSETTINGS=/prepress -sDEVICE=pdfwrite -sOutputFile=sample_inside4.pdf sample4.pdf
   **** Error:  An error occurred while reading an XREF table.
   **** The file has been damaged.  This may have been caused
   **** by a problem while converting or transfering the file.
   **** Ghostscript will attempt to recover the data.
   **** However, the output may be incorrect.

$ pdffonts sample_inside4.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
NTJNVO+Helvetica-Bold                Type 1C           WinAnsi          yes yes no       8  0
HVGHFF+Helvetica-Oblique             Type 1C           WinAnsi          yes yes no      10  0
CEFOXW+Helvetica-BoldOblique         Type 1C           WinAnsi          yes yes no      22  0
ZOHUJJ+Helvetica-Oblique             CID TrueType      Identity-H       yes yes yes    880  0
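
To compare the results of the runs, it helps to drop the six-letter subset tag (such as "NTJNVO+") that prefixes each embedded subset font. A minimal sketch, assuming the pdffonts column layout shown above; embedded_base_names is a hypothetical helper name.

```shell
# Print the base names of embedded fonts from pdffonts output on stdin,
# with the six-letter subset tag (e.g. "NTJNVO+") removed.
# embedded_base_names is a hypothetical helper name.
embedded_base_names() {
  awk 'NR > 2 && NF >= 8 && $(NF-4) == "yes" { print $1 }' |
    LC_ALL=C sed -E 's/^[A-Z]{6}\+//' |
    LC_ALL=C sort -u
}
```

For the last listing, "pdffonts sample_inside4.pdf | embedded_base_names" should print Helvetica-Bold, Helvetica-BoldOblique and Helvetica-Oblique, one per line.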