Packaging a Latex project
The problem
Whenever I’m working on a LaTeX project, I inadvertently end up with a bunch of files scattered throughout the project directory. Usually, not all of them are actually used to build the final document, and I’d like a way to package the project without any unnecessary files, so I can submit it to arXiv or somewhere else.
The not-so-good solutions
At first, I’ve thought of several naive ways to accomplish this:
- Submit each file manually via the arXiv interface. This is quite tedious, error-prone, and time-consuming, and arXiv offers an option to upload an archive of the project (
tar.gz
,zip
, or whatever) anyway, and thus, should only be used as a last resort. - Use
tar
to package just the necessary files. This basically has the same cons as the previous option, so it’s also not optimal. - Using git, run
git archive
to export an archive of the project.
Since I’ve been using git for almost all of my personal and professional projects, I’ve hastily opted for option 3; unfortunately, while working on the project, I’ve accidentally tracked a bunch of PDF figures and data files as well, and, as a result, I still had to semi-manually (read: with a lot of experimentation with grep
and various other utilities) prune the archived file.
This was still quite tedious, and I was hoping to use a less error-prone and more automated method in the future.
The automated solution
Out of curiosity, I’ve decided to just google for any options that might be available, and I’ve come across a tool built to solve this exact issue: bundledoc
.
This program is available on Debian/Ubuntu based Linux distributions as part of the texlive-extra-utils
package (also as texlive-core
in Arch), and can be installed by simply running:
apt install texlive-extra-utils
Note that bundledoc
makes use of the snapshot
LaTeX package, which is available in the texlive-latex-extra
package, so this package should be installed as well if apt
doesn’t do it automatically.
On macOS, the bundledoc
program and the snapshot
package are available on MacPorts, in the packages texlive-bin-extra
and texlive-latex-extra
.
Usage
Using bundledoc
is simple - just insert the following in your primary [NAME].tex
file, even before documentclass
:
\RequirePackage{snapshot}
Then just rebuild the project (using whatever CLI/GUI build system you prefer, my new favorite is latexmk
), and you should see an additional [NAME].dep
file somewhere in your project directory.
To package everything up into a compressed archive (tar.gz
), simply run:
bundledoc --localonly --manifest="" [NAME].dep
This will produce an archive [NAME].tar.gz
which contains just the necessary files for rebuilding your entire project from scratch.
The flags in the above command are not strictly mandatory, but without them, the created archive may be littered with a bunch of sty
files, which are usually available on the arXiv server anyway.
For more customization options, take a look at the man
page of bundledoc
.
To verify bundledoc
packaged everything correctly, just untar
the created archive in an empty directory on your computer (using tar xf [ARCHIVE]
), and then rebuild it there (again, using your build system of choice).
One of the perks of using bundledoc
is that you can readily submit this freshly-built archive to arXiv without the need for any additional changes.
Notable issues
While bundledoc
is pretty handy, there is one problem I’ve encountered though: the resulting compressed archive has the following structure:
├── NAME
│ ├── file1.tex
│ ├── file2.tex
│ ├── file3.tex
...
As you can see, there is a top directory inside the archive, which means this won’t work:
tar xf [ARCHIVE] && [BUILD_SYSTEM]
but there needs to be a cd
somewhere in between (or a flag given to [BUILD_SYSTEM]
which tells it where to build it from).
This isn’t a big deal on my own system, but can be an issue if the journal I’m submitting the project to has very strict guidelines on what the layout of the archive should be to correctly process it.
Unfortunately, giving an empty top directory name to bundledoc
doesn’t work:
$ bundledoc --localonly --directory="" --manifest="" [NAME].dep
Option directory requires an argument
Using anything other than the empty string works, but hacks such as using .
or /
to prevent it from making an actual top level directory will just give an error: bundledoc: File exists
.
The following Bash/Zsh script takes care of this particular problem by extracting the archive, without the top level directory, in a temporary location, and then re-packaging it in a new archive1:
# USAGE: remove_topdir [ARCHIVE]
remove_topdir(){
if [[ $# -ne 1 ]]
then
echo "USAGE: remove_topdir [ARCHIVE]"
return 1
fi
# the name of the archive
archive="${1}"
# check that the path exists and is a file
if ! [[ -f "${archive}" ]]
then
echo "ERROR: ${archive} does not exist or is not a file!"
return 2
fi
# the name of the output archive
archive_new="${archive/.tar.gz/-trimmed.tar.gz}"
# make a temp dir
tmp_dir="$(mktemp -d)"
# extract the contents of the archive in that dir
# we remove the top level dir in the achive
tar --strip-components=1 --extract --file="${archive}" -C "${tmp_dir}"
# re-compress it all in the new archive
tar cf "${archive_new}" -C "${tmp_dir}" .
# error handling
if [[ $? -ne 0 ]]
then
echo "Something went wrong!"
return 3
else
echo "New archive is located at ${archive_new}"
return 0
fi
}
Appending this to your ~/.bashrc
file allows running the following to get the right directory structure:
bundledoc --localonly --manifest="" [NAME].dep && remove_topdir [NAME].tar.gz
which creates a file [NAME]-trimmed.tar.gz
ready for uploading.
-
for why you can’t just pipe the uncompressed output directly back into tar, see this AskUbuntu answer ↩