The Basic Requirements
We’re using a Debian-based image for the purposes of this article. If you’re using a different base, you’ll need to adapt the displayed package manager commands accordingly. The official Node.js image is a suitable starting point, as it means you don’t need to install Node manually.
Puppeteer is distributed via npm, the Node.js package manager. Installing it also downloads a Chromium build matched to that Puppeteer release, so theoretically an npm install puppeteer would get you running. In practice, a clean Docker environment will lack the dependencies you need to run Chrome.
As it’s ordinarily a heavyweight GUI program, Chrome depends on font, graphics, configuration, and window management libraries. These all need to be installed within your Dockerfile.
At the time of writing, the current dependency list looks like this:
FROM node:latest
WORKDIR /puppeteer
RUN apt-get update && apt-get install -y \
fonts-liberation \
gconf-service \
libappindicator1 \
libasound2 \
libatk1.0-0 \
libcairo2 \
libcups2 \
libfontconfig1 \
libgbm-dev \
libgdk-pixbuf2.0-0 \
libgtk-3-0 \
libicu-dev \
libjpeg-dev \
libnspr4 \
libnss3 \
libpango-1.0-0 \
libpangocairo-1.0-0 \
libpng-dev \
libx11-6 \
libx11-xcb1 \
libxcb1 \
libxcomposite1 \
libxcursor1 \
libxdamage1 \
libxext6 \
libxfixes3 \
libxi6 \
libxrandr2 \
libxrender1 \
libxss1 \
libxtst6 \
xdg-utils
The dependencies are being installed manually to facilitate use of the Chromium binary that’s bundled with Puppeteer. This ensures consistency between Puppeteer releases and avoids the possibility of a new Chrome release arriving with incompatibilities that break Puppeteer.
Now run npm install puppeteer in your local working directory. This will create a package.json and package-lock.json for you to use. In your Dockerfile, copy these files into the container and use npm ci to install Puppeteer.
# (above section omitted)
COPY package.json .
COPY package-lock.json .
RUN npm ci
The final step is to make Puppeteer’s bundled Chromium binary properly executable. Otherwise, you’ll run into permission errors whenever Puppeteer tries to start Chrome.
# (above section omitted)
RUN chmod -R o+rwx node_modules/puppeteer/.local-chromium
You might want to manually install a specific Chrome version in customized environments. Setting the PUPPETEER_SKIP_CHROMIUM_DOWNLOAD environment variable before you run npm ci will disable Puppeteer’s own browser download during installation. This helps slim down your final image.
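If the bundled download is skipped, Puppeteer has to be told where the separately installed browser lives. The following is a minimal sketch of one way to do that, not the only one. The helper name resolveLaunchOptions and the example path /usr/bin/chromium are assumptions made for illustration; executablePath is a Puppeteer launch option and PUPPETEER_EXECUTABLE_PATH is an environment variable Puppeteer reads.

```javascript
// Hypothetical helper: build launch options that point Puppeteer at a
// browser installed outside of node_modules.
function resolveLaunchOptions(env = process.env) {
  const options = { headless: true };

  // PUPPETEER_EXECUTABLE_PATH is honoured by Puppeteer itself; passing it
  // explicitly as executablePath just makes the dependency visible in code.
  const executablePath = env.PUPPETEER_EXECUTABLE_PATH;
  if (executablePath) {
    options.executablePath = executablePath;
  }

  return options;
}

// Example usage (the path is an assumption; adjust it to wherever your
// image installs Chrome or Chromium):
//   const puppeteer = require("puppeteer");
//   const browser = await puppeteer.launch(
//     resolveLaunchOptions({ PUPPETEER_EXECUTABLE_PATH: "/usr/bin/chromium" })
//   );
```

When the variable is unset, the options contain no executablePath and Puppeteer falls back to its own browser lookup, which will fail if the download was skipped and nothing else is configured.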
At this point you should be ready to build your image:
docker build . -t puppeteer:latest
This is a fairly large build process which could take several minutes on a slower internet connection.
Using Puppeteer in Docker
Some special considerations apply to launching Chrome when you’re using Puppeteer in a Dockerized environment. Despite installing all the dependencies, the environment still looks different to most regular Chrome installations, so additional launch flags are required.
Here’s a minimal example of using Puppeteer inside your container:
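const puppeteer = require("puppeteer");

// CommonJS modules don't support top-level await, so wrap the script in an async function.
(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    args: [
      "--disable-gpu",
      "--disable-dev-shm-usage",
      "--disable-setuid-sandbox",
      "--no-sandbox",
    ],
  });
  try {
    const page = await browser.newPage();
    await page.goto("https://example.com");
    // Write a screenshot of the rendered page to the container's filesystem.
    await page.screenshot({ path: "/screenshot.png" });
    await page.close();
  } finally {
    // Close the browser even if navigation or the screenshot fails.
    await browser.close();
  }
})();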
This demonstrates a simple script that launches a headless Chrome instance, navigates to a URL, and captures a screenshot of the page. The browser is then closed to avoid wasting system resources.
The important section is the arguments list that’s passed to Chromium as part of the launch() call:
disable-gpu – The GPU isn’t usually available inside a Docker container, unless you’ve specially configured the host. Setting this flag explicitly instructs Chrome not to try to use GPU-based rendering.
no-sandbox and disable-setuid-sandbox – These disable Chrome’s sandboxing, a step which is required when running as the root user (the default in a Docker container). Using these flags could allow malicious web content to escape the browser process and compromise the host. It’s vital you ensure your Docker containers are strongly isolated from your host. If you’re uncomfortable with this, you’ll need to manually configure working Chrome sandboxing, which is a more involved process.
disable-dev-shm-usage – This flag is necessary to avoid running into issues with Docker’s default low shared memory space of 64 MB. Chrome will write into /tmp instead.
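Because the sandbox flags are only required when Chrome runs as root, one option is to add them conditionally instead of hard-coding them. The sketch below illustrates that idea; buildChromeArgs is a hypothetical helper, not part of Puppeteer, and whether the sandbox actually works for a non-root user still depends on how the container and host are configured.

```javascript
// Hypothetical helper: assemble the Chrome flags discussed above, adding the
// sandbox-disabling flags only when the process runs as root.
function buildChromeArgs({ isRoot }) {
  const args = ["--disable-gpu", "--disable-dev-shm-usage"];
  if (isRoot) {
    args.push("--no-sandbox", "--disable-setuid-sandbox");
  }
  return args;
}

// process.getuid() only exists on POSIX platforms; uid 0 is the root user.
const isRoot = typeof process.getuid === "function" && process.getuid() === 0;

// These options can be passed directly to puppeteer.launch(launchOptions).
const launchOptions = { headless: true, args: buildChromeArgs({ isRoot }) };
```

With this approach, adding a non-root USER to the Dockerfile later would drop the sandbox flags automatically, without any change to the script.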
Add your JavaScript to your container with a COPY instruction. You should find Puppeteer executes successfully, provided proper Chrome flags are used.