Azure Pipelines Integration

Overview

Azure Pipelines supports several different source repositories. This integration works with repos
hosted on Azure Repos Git and GitHub.

Once configured for a repository, the Azure Pipelines integration will provide analysis of project dependencies from
a lockfile during a Pull Request (PR) and output the results as a comment in a thread on the PR.
The CI job will return an error (i.e., fail the pipeline) if any of the newly added/modified dependencies from the
PR fail to meet the project risk thresholds for any of the five Phylum risk domains:

  • Vulnerability (aka vul)
  • Malicious Code (aka mal)
  • Engineering (aka eng)
  • License (aka lic)
  • Author (aka aut)

See Phylum Risk Domains documentation for more detail.

NOTE: It is not enough to have the total project threshold set. Individual risk domain threshold values must be
set, either in the UI or with phylum-ci options, in order to enable analysis results for CI. Otherwise, the risk
domain is considered disabled and the threshold value used will be zero (0).

There will be no comment if no dependencies were added or modified for a given PR.
If one or more dependencies are still processing (no results available), then the comment will make that clear and
the CI pipeline job will only fail if dependencies that have completed analysis results do not meet the specified
project risk thresholds.

Prerequisites

The Azure Pipelines environment is primarily supported through the use of a Docker image.
The prerequisites for using this image are:

  • Access to the phylumio/phylum-ci Docker image
  • Azure DevOps Services is used with an Azure Repos Git or GitHub repository type
    • Azure DevOps Server versions are not guaranteed to work at this time
    • Bitbucket Cloud hosted repositories are not supported at this time
  • An Azure token with API access, when the build repository is Azure Repos Git
    • Can be the default System.AccessToken provided automatically at the start of each pipeline build
    • Can be a personal access token (PAT) - see documentation
      • Needs at least the Pull Request Threads scope (read & write)
      • Consider using a service account for this token
  • A GitHub PAT with API access, when the build repository is GitHub
    • Can be a fine-grained PAT
    • Can be a classic PAT
      • Needs the repo scope or minimally the public_repo scope if private repositories are not used
      • See documentation
  • A Phylum token with API access
  • Access to the Phylum API endpoints
    • That usually means a connection to the internet, optionally via a proxy
    • Support for on-premises installs is not available at this time
  • A .phylum_project file exists at the root of the repository

Configure azure-pipelines.yml

Phylum analysis of dependencies can be added to an existing pipeline or run on its own with this minimal configuration:

trigger:
  - main
pr:
  - main

jobs:
  - job: Phylum
    pool:
      vmImage: ubuntu-latest
    container: phylumio/phylum-ci:latest
    steps:
      - checkout: self
        fetchDepth: 0
      - script: phylum-ci
        displayName: Analyze dependencies with Phylum
        env:
          PHYLUM_API_KEY: $(PHYLUM_TOKEN)
          AZURE_TOKEN: $(AZURE_PAT)     # For Azure repos only
          GITHUB_TOKEN: $(GITHUB_PAT)   # For GitHub repos only

This single-stage pipeline configuration contains a single container job named Phylum, triggered to run on pushes
or PRs targeting the main branch. It does not override any of the phylum-ci arguments, which are all either
optional or default to secure values.

Let's take a deeper dive into each part of the configuration:

Pipeline control

Choose when to run the pipeline. See the YAML schema trigger definition and pr definition
documentation for more detail.

# This is a CI trigger that will cause the
# pipeline to run on pushes to the `main` branch
trigger:
  - main

It is recommended to also enable PR validation for the target trigger branch(es). To do so for GitHub repos, use
the pr keyword. See the YAML schema pr definition documentation for more detail.

# This is a PR trigger that will cause the pipeline to run when
# a pull request is opened with `main` as the target branch.
# NOTE: This has no effect for Azure Repos Git based repositories
pr:
  - main
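
When more than one branch should run the pipeline, the trigger and pr keywords also accept include/exclude branch
filters. The following is a minimal sketch of that longer form. The release/* and experimental/* branch patterns are
assumptions chosen for illustration and do not come from this guide.

```yaml
# Sketch only: every branch pattern other than `main` is hypothetical.
trigger:
  branches:
    include:
      - main
      - release/*
    exclude:
      - experimental/*

# As with the short form, the `pr` keyword has no effect
# for Azure Repos Git based repositories.
pr:
  branches:
    include:
      - main
      - release/*
```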

To enable PR validation for Azure Repos Git, navigate to the branch policies for the desired branch
(main in this example), and configure the Build validation policy for that branch.
For more information, see the documentation on PR triggers for Azure Repos Git hosted repositories,
PR triggers for GitHub, or more broadly events that trigger pipelines.

Job names

The job can be given a different name, or its steps can be included in an existing stage or job.

jobs:
  - job: Phylum  # Name this what you like

Pool selection

The pool is specified at the job level here because this is a container job. While Azure Pipelines
allows container jobs for windows-2019 and ubuntu-* base vmImage images, only ubuntu-* is supported by Phylum
at this time. Keeping that restriction in mind, the pool can be specified at the pipeline or stage level instead.
See the YAML schema pool definition documentation for more detail.

    pool:
      vmImage: ubuntu-latest
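
As a rough sketch of the pipeline-level alternative mentioned above, the pool can be declared once at the top of the
file and inherited by the job. Nothing new is assumed here beyond moving the pool key, and the ubuntu-* restriction
still applies.

```yaml
# Sketch only: the pool is declared once at the pipeline level
# and inherited by every job that does not override it.
pool:
  vmImage: ubuntu-latest

jobs:
  - job: Phylum
    container: phylumio/phylum-ci:latest
    steps:
      - checkout: self
        fetchDepth: 0
      - script: phylum-ci
        displayName: Analyze dependencies with Phylum
        env:
          PHYLUM_API_KEY: $(PHYLUM_TOKEN)
          AZURE_TOKEN: $(AZURE_PAT)     # For Azure repos only
          GITHUB_TOKEN: $(GITHUB_PAT)   # For GitHub repos only
```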

Docker image selection

The container is specified at the job level here because this is a container job where all steps
in the job are meant to run with the same image. The container can also be specified as a
resource at the pipeline level and then
referenced by name in individual steps of a job instead.
See the YAML schema jobs.job.container definition and resource definition
documentation for more detail.
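
The resource-based alternative might look like the sketch below, where the image is declared as a container resource
and only the Phylum step targets it. The resource alias phylum is a hypothetical name chosen for this example, and the
token variables are the same ones used in the minimal configuration.

```yaml
resources:
  containers:
    # `phylum` is a hypothetical alias for the container resource.
    - container: phylum
      image: phylumio/phylum-ci:latest

jobs:
  - job: Phylum
    pool:
      vmImage: ubuntu-latest
    steps:
      - checkout: self
        fetchDepth: 0
      - script: phylum-ci
        displayName: Analyze dependencies with Phylum
        # Run only this step inside the container resource declared above.
        target: phylum
        env:
          PHYLUM_API_KEY: $(PHYLUM_TOKEN)
          AZURE_TOKEN: $(AZURE_PAT)     # For Azure repos only
          GITHUB_TOKEN: $(GITHUB_PAT)   # For GitHub repos only
```

This form can be useful when other steps in the same job need to run on the host or in a different image.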

Choose the Docker image tag to match your comfort level with image dependencies. latest is a "rolling" tag that
will point to the image created for the latest released phylum-ci Python package. A particular version tag
(e.g., 0.15.0-CLIv3.10.0) is created for each release of the phylum-ci Python package and should not change
once published.

However, to be certain that the image does not change, or to be warned when it does because it is no longer
available, use the SHA256 digest of the tag. The digest can be found by looking at the phylumio/phylum-ci
tags on Docker Hub or with the command:

# The command-line JSON processor `jq` is used here for the sake of a one line example. It is not required.
❯ docker manifest inspect --verbose phylumio/phylum-ci:0.15.0-CLIv3.10.0 | jq .Descriptor.digest
"sha256:db450b4233484faf247fffbd28fc4f2b2d4d22cef12dfb1d8716be296690644e"

For instance, at the time of this writing, all of these tag references pointed to the same image:

    # NOTE: These are examples. Only one container line for `phylum-ci` is expected.

    # Be explicit about wanting the `latest` tag
    container: phylumio/phylum-ci:latest

    # Use a specific release version of the `phylum-ci` package
    container: phylumio/phylum-ci:0.15.0-CLIv3.10.0

    # Use a specific image by its SHA256 digest
    container: phylumio/phylum-ci@sha256:db450b4233484faf247fffbd28fc4f2b2d4d22cef12dfb1d8716be296690644e

Only the last reference, by SHA256 digest, is guaranteed to always point to the same underlying image.

Repository checkout

The phylum-ci logic for determining changes in lockfiles requires git history beyond what is available in a shallow
clone/checkout/fetch. To ensure the shallow fetch option is disabled for the pipeline, an explicit checkout step is
specified here, with fetchDepth set to 0. It is also possible to disable the shallow fetch option in the
pipeline settings UI. See the YAML schema steps.checkout definition documentation
for more detail.

      # Reference: https://learn.microsoft.com/azure/devops/pipelines/yaml-schema/steps-checkout
      - checkout: self
        fetchDepth: 0

Script arguments

The arguments to the script step are the way to exert control over the execution of the Phylum analysis.
The entry here will run as a script in the phylum-ci based container job.
See the YAML schema steps.script definition and container job documentation for
more detail.

The phylum-ci script entry point is expected to be called. It has a number of arguments that are all optional
and default to secure values. To view the arguments, their descriptions, and default values,
run the script with the --help option as described in the Usage section of the top-level README.md or
view the script options output for the latest release.

      # NOTE: These are examples. Only one script entry line for `phylum-ci` is expected.

      # Use the defaults for all the arguments.
      # The default behavior is to only analyze newly added dependencies against
      # the risk domain threshold levels set at the Phylum project level.
      - script: phylum-ci

      # Consider all dependencies in analysis results instead of just the newly added ones.
      # The default is to only analyze newly added dependencies, which can be useful for
      # existing code bases that do not meet established project risk thresholds yet,
      # where the goal is to avoid making things worse. Specifying `--all-deps` can be useful
      # for casting the widest net for strict adherence to Quality Assurance (QA) standards.
      - script: phylum-ci --all-deps

      # Some lockfile types (e.g., Python/pip `requirements.txt`) are ambiguous in that
      # they can be named differently and may or may not contain strict dependencies.
      # In these cases, it is best to specify an explicit lockfile path.
      - script: phylum-ci --lockfile requirements-prod.txt

      # Thresholds for the five risk domains may be set at the Phylum project level.
      # They can be set differently for CI environments to "fail the build."
      - script: |
          phylum-ci \
            --vul-threshold 60 \
            --mal-threshold 60 \
            --eng-threshold 70 \
            --lic-threshold 90 \
            --aut-threshold 80

      # Ensure the latest Phylum CLI is installed.
      - script: phylum-ci --force-install

      # Install a specific version of the Phylum CLI.
      - script: phylum-ci --phylum-release 3.10.0 --force-install

      # Mix and match for your specific use case.
      - script: |
          phylum-ci \
            --vul-threshold 60 \
            --mal-threshold 60 \
            --eng-threshold 70 \
            --lic-threshold 90 \
            --aut-threshold 80 \
            --lockfile requirements-prod.txt \
            --all-deps

Script Variables

The script step environment variables are used to ensure the phylum-ci tool is able to perform its job.

A Phylum token with API access is required to perform analysis on project dependencies.
Contact Phylum or register to gain access.
See also phylum auth register command documentation and consider using a bot or group account
for this token.

Azure Repos Git Build Repositories

An Azure DevOps token with API access is required to use the API (e.g., to post notes/comments) when the build
repository is Azure Repos Git.
This can be the default System.AccessToken provided automatically at the start of each pipeline build for the
scoped build identity or a personal access token (PAT).

If using a PAT, it will need at least the Pull Request Threads scope (read & write).
The account used to create the PAT will be the one that appears to post the comments on the pull request.
Therefore, it might be worth using a bot or service account.
See the Azure DevOps documentation for using PATs to authenticate for more info.

If using the System.AccessToken, the scoped build identity it attaches to needs at least the
Contribute to pull requests permission. For example, to use the System.AccessToken on a project-scoped
identity, follow these steps:

  • Go to project settings
  • Select the Repos --> Repositories menu
  • Select the Security tab
  • Select the user {Project Name} Build Service ({Org Name})
    • NOTE: This user will only exist after the first time the pipeline has run
  • Ensure the Contribute to pull requests permission is set to Allow

See the Azure DevOps documentation for using the System.AccessToken and setting its
job authorization scope.

GitHub Build Repositories

A GitHub PAT with API access is required to use the API (e.g., to post notes/comments) when the build
repository is GitHub.
This can be a fine-grained or classic PAT.

If using a fine-grained PAT, it will need repository access and permissions for read access to metadata and
read/write access to pull requests. See permissions required for fine-grained PATs for more info.

If using a classic PAT, it will need the repo scope or minimally the public_repo scope if private
repositories are not used. See documentation for scopes for more info.

Setting Values

Values for the PHYLUM_API_KEY and either the AZURE_TOKEN or GITHUB_TOKEN environment variables (PHYLUM_TOKEN
and one of AZURE_PAT or GITHUB_PAT in the example here) can come from the pipeline UI, a variable group,
or an Azure Key Vault. View the full documentation for how to set secret variables for more
information. Since these tokens are sensitive, care should be taken to protect them appropriately.

        env:
          # Contact Phylum (phylum.io/contact-us) or register (app.phylum.io/register)
          # to gain access. See also `phylum auth register`
          # (https://docs.phylum.io/docs/phylum_auth_register) command documentation.
          # Consider using a bot or group account for this token.
          # This value (`PHYLUM_TOKEN`) will need to be set as a secret variable:
          # https://learn.microsoft.com/azure/devops/pipelines/process/set-secret-variables
          PHYLUM_API_KEY: $(PHYLUM_TOKEN)

          # NOTE: These are examples. Only one `AZURE_TOKEN` entry line is expected, and
          #       only when the build repository is hosted in Azure Repos Git.
          #
          # Use the `System.AccessToken` provided automatically at the start of each pipeline build.
          # This value does not have to be set as a secret variable since it is provided by default.
          AZURE_TOKEN: $(System.AccessToken)
          #
          # Use a personal access token (PAT).
          # This value (`AZURE_PAT`) will need to be set as a secret variable:
          # https://learn.microsoft.com/azure/devops/pipelines/process/set-secret-variables
          AZURE_TOKEN: $(AZURE_PAT)

          # NOTE: A `GITHUB_TOKEN` entry is only needed for GitHub hosted build repositories.
          #
          # Use a personal access token (PAT).
          # This value (`GITHUB_PAT`) will need to be set as a secret variable:
          # https://learn.microsoft.com/azure/devops/pipelines/process/set-secret-variables
          GITHUB_TOKEN: $(GITHUB_PAT)
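
If the secrets are stored in a variable group rather than in the pipeline UI, the group has to be referenced from the
pipeline before its variables can be used. The sketch below assumes a variable group named phylum-secrets, which is a
hypothetical name, that holds the PHYLUM_TOKEN and AZURE_PAT secret variables and has been authorized for this pipeline.

```yaml
# Sketch only: `phylum-secrets` is a hypothetical variable group name.
variables:
  - group: phylum-secrets

jobs:
  - job: Phylum
    pool:
      vmImage: ubuntu-latest
    container: phylumio/phylum-ci:latest
    steps:
      - checkout: self
        fetchDepth: 0
      - script: phylum-ci
        displayName: Analyze dependencies with Phylum
        env:
          # Secret variables are not exposed to scripts automatically,
          # so they still need to be mapped explicitly here.
          PHYLUM_API_KEY: $(PHYLUM_TOKEN)
          AZURE_TOKEN: $(AZURE_PAT)     # For Azure repos only
```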

Alternatives

It is also possible to make direct use of the phylum Python package within CI.
This may be necessary if the Docker image is unavailable or undesirable for some reason.
To use the phylum package, install it and call the desired entry points from a script under your control.
See the Installation and Usage sections of the README file for more detail.
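
A job that avoids the Docker image might look roughly like the sketch below. It assumes the package is published on
PyPI under the name phylum, that installing it provides the phylum-ci entry point, and that the entry point is able to
install the Phylum CLI on the hosted agent when it is missing. Confirm each of these against the README before relying
on it.

```yaml
jobs:
  - job: Phylum
    pool:
      vmImage: ubuntu-latest
    steps:
      - checkout: self
        fetchDepth: 0
      # Select a Python interpreter on the hosted agent.
      - task: UsePythonVersion@0
        inputs:
          versionSpec: "3.x"
      # Assumption: `pip install phylum` provides the `phylum-ci` entry point.
      - script: |
          python -m pip install --upgrade pip
          python -m pip install phylum
          phylum-ci
        displayName: Analyze dependencies with Phylum
        env:
          PHYLUM_API_KEY: $(PHYLUM_TOKEN)
          AZURE_TOKEN: $(AZURE_PAT)     # For Azure repos only
          GITHUB_TOKEN: $(GITHUB_PAT)   # For GitHub repos only
```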