Monday, February 2, 2009

Extract and merge pages from PDF document

I had taken great care in typesetting my thesis: informative running headers (the current chapter/section name shown at the top of the page), correct margins for one-sided or two-sided typesetting, and other fine points.

The LaTeX class file that I wrote for my thesis is based on the book class and typesets page numbers at the top outer edge. In keeping with typographical convention, and also because it looks nicer, the first page of each major sectioning unit (chapter or part) is handled differently: its page number is set at the bottom center.

Now grad studies insists that page numbers be typeset uniformly throughout the thesis: either all at the top or all at the bottom. It was not that hard to change the LaTeX class to put all the page numbers at the top outer edge. But I had already printed my thesis on bond paper, and there was no point in incurring the expense again. So all I needed was to extract the chapter and part pages from the "corrected" PDF file, print them, and replace the faulty ones in the print. Hence the need to extract and merge pages from a PDF file.

Well, ghostscript (gs) is our friend and will do the job nicely. In a nutshell, the following command will extract a page range from <input_file> into a separate file:

 gs -dBATCH -dNOPAUSE -dFirstPage=<first_page> -dLastPage=<last_page> -sDEVICE=pdfwrite -sOutputFile=<output_file> <input_file>

while the following will merge the multiple PDF files in the space-delimited list <input_files> into one:

 gs  -dBATCH -dNOPAUSE  -sDEVICE=pdfwrite -sOutputFile=<output_file> <input_files>
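Before running such commands for real, it can help to see the exact invocation that would be issued. Here is a small dry-run sketch; the helper name `build_extract_cmd` and the `page_<n>.pdf` output naming are my own illustrative choices, not from the commands above. It prints the gs command line instead of executing it:

```shell
#!/bin/sh
# Print (rather than run) the gs invocation that would extract a single
# page from a PDF. $1 = PDF page number, $2 = input PDF file.
build_extract_cmd () {
    printf 'gs -dBATCH -dNOPAUSE -dFirstPage=%d -dLastPage=%d -sDEVICE=pdfwrite -sOutputFile=page_%d.pdf %s\n' \
        "$1" "$1" "$1" "$2"
}

# Example: show the command for extracting PDF page 33 of thesis.pdf.
build_extract_cmd 33 thesis.pdf
```

Piping the output through `sh` would then actually run the extraction, once you are happy with the command lines.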

Here's the shell script that I used to extract the chapter pages and merge them into one PDF file (it takes a mode flag and the input PDF as arguments):


 #!/bin/bash
 declare -rx SCRIPT=${0##*/}
 declare -i extract_flag=0
 declare -i merge_flag=0

 # thesis page numbers of the chapter/part pages
 declare -a p_num_all=( \
 1 6 7 11 21 33 42 44   \
 54 55 61 73 87 95 109 111 \
 118 119 125 138 147 163 165  \
 170 172    \
 )

 declare -i num_pages=${#p_num_all[*]}
 # the front matter shifts thesis page numbers by 12 in the PDF
 declare -i offset=12
 declare -i i p_num p_num_off
 declare f_name f_name_all
 declare out_file=chapter_pages.pdf
 declare in_file=$2

 # mode: 0 = extract only, 1 = merge only, 2 = extract then merge
 case $1 in
 0) extract_flag=1 ;;
 1) merge_flag=1 ;;
 2) extract_flag=1
    merge_flag=1 ;;
 *) printf "usage: %s <0|1|2> <input_file> \n" "$SCRIPT"
    exit 1 ;;
 esac

 for (( i=1; i<=num_pages; i++ )) ; do
     p_num=${p_num_all[i-1]}
     p_num_off=$(expr $p_num + $offset)
     f_name=page_${p_num}.pdf
     f_name_all="$f_name_all $f_name"

     if [ $extract_flag -eq 1 ] ; then
         printf "extracting page %d to file %s \n" $p_num $f_name
         gs -dBATCH -dNOPAUSE       \
            -dFirstPage=$p_num_off -dLastPage=$p_num_off   \
            -sDEVICE=pdfwrite -sOutputFile=$f_name  \
            "$in_file"
     fi
 done

 if [ $merge_flag -eq 1 ] ; then
     gs -dBATCH -dNOPAUSE      \
        -sDEVICE=pdfwrite -sOutputFile=$out_file \
        $f_name_all
 fi

Sunday, July 27, 2008

Linux mount ntfs hfsplus partition

Okay, this pretty much drove me crazy today. Here are some entries from /dev/disk/by-label:

 lrwxrwxrwx 1 root root 10 Aug 4 03:07 Neumann -> ../../sdb5
 lrwxrwxrwx 1 root root 10 Aug 4 03:07 Newton -> ../../sdb1
 lrwxrwxrwx 1 root root 10 Aug 4 03:07 opt -> ../../sda6
 lrwxrwxrwx 1 root root 10 Aug 4 03:07 root -> ../../sda2

The respective file systems and mount points are (from /etc/fstab):

 LABEL=root                 /        ext3    defaults      1 1
 LABEL=opt                  /opt     ext3    defaults      1 2
 /dev/disk/by-label/Newton  /Newton  ntfs    ro,umask=0222 0 0
 /dev/disk/by-label/Neumann /Neumann hfsplus defaults      1 2
 #LABEL=Newton              /Newton  ntfs    ro,umask=0222 0 0
 #LABEL=Neumann             /Neumann hfsplus defaults      1 2

See the last two commented-out entries starting with LABEL=Newton and LABEL=Neumann? So now the million-dollar question is: why am I able to mount the ext3 partitions / and /opt using their labels, while I have to use an explicit device path for the ntfs and hfsplus partitions (volumes)? Mind you, the ntfs and hfsplus volumes do have labels, Newton and Neumann, respectively; I put them there using gparted. Also, the corresponding links are very much there in /dev/disk/by-label. I have a hunch that somebody somewhere is not able to read the ntfs/hfsplus labels, which are written on the partition itself (another guess), maybe in the first few sectors of the partition, like the MBR. Meanwhile, if you are looking to mount your ntfs/hfsplus volumes using labels, hopefully my experience will save you some time and frustration.
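One workaround worth noting (my suggestion, not something tried in the original post): mount by UUID instead of by label. The kernel/udev machinery exposes UUIDs for these partitions under /dev/disk/by-uuid just as it does labels, and `blkid` will report them. The fstab entries would then look like the sketch below, where the UUID values are made up for illustration:

```
# blkid /dev/sdb1 /dev/sdb5   -- shows LABEL=, UUID=, TYPE= for each volume

# /etc/fstab entries using UUID= (hypothetical UUIDs):
UUID=01CD2ABF535AE2C0                 /Newton  ntfs    ro,umask=0222 0 0
UUID=a1b2c3d4-0000-0000-0000-00000000 /Neumann hfsplus defaults      1 2
```

Whether UUID= works where LABEL= fails depends on whether the same component that fails to read the label can read the UUID; on systems where mount resolves both via the /dev/disk symlinks, it should.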

Friday, May 9, 2008

In TeX Command List -> LaTeX, I changed %`%l%(mode)%`%t to %`%l%(mode)%t. The new command gets rid of the "\input" in the command line, and now works with a pre-compiled preamble. The file to be processed still appears as main.tex on the command line (via %t); I should still get rid of the .tex extension.

Wednesday, May 7, 2008

Generating a LaTeX preamble format (.fmt) file

I used to use a compiled preamble, but I stopped, since I really didn't see a difference in processing times; perhaps because the dynamic part of my documents is usually quite big, and also because I now have a quad-core machine (gasp!). Over time, I forgot the way to generate the .fmt file. Lately, I have been doing some TeX-ing on my laptop, which obviously lacks the desktop's muscle, hence the need for a pre-compiled preamble again. Searching the net brought me to the blog [1], which gives a pretty good description of the process.

First, some remarks about the command below. Looking at it today sort of connected together bits of knowledge I have about the inner workings of the TeX system, and somewhat explains the various options, which even to a seasoned LaTeX user would seem obscure:

 latex -ini "&latex preamble.tex \dump"

Now, LaTeX (and its other avatars), in my understanding, is just a front end, in that it provides a number of macros to TeX, which is the real typesetting engine. The macros are (pre-)compiled and stored in a format file, which in the case of LaTeX would be latex.fmt. So "&latex" in the command above simply states that latex.fmt be loaded; preamble.tex is then read, and \dump writes the resulting memory image out as preamble.fmt.

Next, let's look at using the compiled preamble while compiling the actual document main.tex. The first line of main.tex is:

 %&preamble

Other than the comment symbol "%", we've already figured out what's going on: the "&" operator tells TeX to load the (binary) format file preamble.fmt.

As I see on my PowerBook G4, the time saved in the compile cycle is significant when the compiled preamble is used.

References:
  2. TeX info manual: memory dump
  3. LaTeX info manual
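To make the recipe concrete, here is a minimal sketch of the two files involved. The file names match the commands above, but the package choice and document body are illustrative assumptions of mine, not from the original post:

```latex
% preamble.tex -- everything that would normally precede \begin{document};
% dumped once with:  latex -ini "&latex preamble.tex \dump"
\documentclass{book}
\usepackage{amsmath}   % any heavy, rarely-changing packages go here

% main.tex -- the first line selects the dumped format preamble.fmt,
% so none of the preamble needs to be re-read on each compile:
%&preamble
\begin{document}
Hello, pre-compiled preamble.
\end{document}
```

After the one-time dump, each edit cycle is just `latex main.tex`; only the material after \begin{document} is processed afresh.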