Phylum Package Score

Numeric risk score for your open source packages.

Introduction

The Phylum Package Score is an easy-to-understand score representing the overall reputation of an open source package. The objective is to help you quickly triage and act on issues.

Just as a credit score captures your overall credit rating, the Phylum Package Score summarizes the results of analytics, heuristics, and machine learning models applied to open source software dependencies.

The Package Score ranges from 0 to 100. Packages with higher scores are "better" or "safer" than packages with lower scores. Phylum's big data technology raises or lowers the score based on characteristics identified in the package under test.

Why a single value score?

A single-value score conveys complex information with enough fidelity to be used easily in modern development practices. Over time, Phylum will apply thousands of analytics to modify the score assigned to a package. Some of these analytics, like remote code execution vulnerabilities or active malware, will have a huge impact on the Phylum Package Score. Other analytics will have a much smaller impact. A score from 0 to 100 allows nuanced mapping of large amounts of data.

We favor this approach over the common severity scale of high, medium, and low. A score from 0 to 100 is easier to use in automation because it offers more detail on which to build effective security policies.

Risk Domains

The Phylum Package Score is made up of five key domains of risk: malicious code, technical debt, license, author, and software vulnerability.

  1. Malicious Code Score - captures malware, backdoors, and other types of malicious code. Examples of risk analytics that modify the Malicious Code Score:
    • High-entropy data blobs or strings ending in an evaluation function
    • Download, decrypt, execute call patterns
    • Dynamic function resolution behaviors
    • Dynamic module loading behaviors
  2. Technical Debt Score - encompasses engineering risk and technical debt. Examples of risk analytics that modify the Technical Debt Score:
    • Abandoned packages
    • Packages with a single author or maintainer
    • Packages without tests or sufficient test coverage
  3. License Score - evaluates the commercial friendliness of software licenses and how a package's licenses change over time. Examples of risk analytics that modify the License Score:
    • Presence of non-commercial friendly licenses in the package or dependencies
    • How frequently licenses change in the package and its dependency graph
    • Likelihood of future changes to licenses in the package and its dependency graph
  4. Author Score - assesses author behavior, reputation, and risk to the package. Examples of risk analytics that modify the Author Score:
    • Has the author previously committed vulnerabilities to other software
    • Has the author previously committed malicious code to other software
    • Age of author account
    • Overall open source contributions
    • Does the author's identity map to other online identities (Twitter, Stack Overflow, Quora, etc.)
  5. Software Vulnerability Score - encapsulates the domain of software vulnerabilities. Examples of risk analytics that modify the Software Vulnerability Score:
    • Severity and impact of the vulnerability
    • Difficulty in exploitation of the vulnerability
    • Age of the vulnerability
    • Availability of a patch for the vulnerability
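
One of the Malicious Code analytics listed above, high-entropy data blobs or strings, can be illustrated with Shannon entropy. The sketch below is not Phylum's implementation; the function names, the 4.5 bits-per-character threshold, and the minimum length of 32 are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    # Sum p * log2(1/p) over the relative frequency p of each distinct character.
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(text).values()
    )


def looks_high_entropy(text: str, threshold: float = 4.5, min_length: int = 32) -> bool:
    """Flag long strings whose characters are close to uniformly distributed.

    Both `threshold` and `min_length` are hypothetical tuning values.
    """
    return len(text) >= min_length and shannon_entropy(text) > threshold
```

A long Base64-style string that uses many distinct characters would be flagged, while a long run of a single repeated character would not. On its own this is only a weak signal; the analytic above pairs it with an evaluation function call.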

How is the Phylum Package Score calculated?

First, the Phylum system ingests and processes massive amounts of information about a package and the dependencies of that package. Next, it analyzes the dataset using analytics, heuristics, and machine learning models.

The ingested dataset includes:

  • Static analysis of package source code
  • File analysis of all files in the package
  • Commit history analysis of any attached source code repositories
  • Metadata analysis of all artifacts captured from the package manager and hosting repository
  • Known vulnerabilities for a package-version iteration
  • Commit analysis of prior and new authors
  • Author reputation from previous activities and behaviors
  • Full composition analysis of all dependencies required for package use

This dataset is maintained and curated over the lifetime of the package. As authors, source code, files, and other artifacts are added or removed over time, the new data triggers updates to the Phylum Package Score.

Analytics, Heuristics and Machine Learning

The analysis layer combs through the package data to identify low indicators of risk and combines them with other associated information to extract high indicators of risk. The techniques used to extract this information vary, but can be loosely grouped into analytics, heuristics, and machine learning.

These techniques operate on the Phylum platform continuously to extract meaningful indicators to better understand the risk in using an open source package. Once these indicators have been identified, they are weighted and combined with other indicators to create the Phylum Package Score.
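
The weighting step can be pictured with a minimal sketch. Phylum's actual weights and combination formula are not described here, so the indicator names, the weights, and the "start at 100 and subtract penalties" rule below are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Indicator:
    """A single risk finding. All names and weights used here are hypothetical."""
    name: str
    weight: float    # maximum number of points this finding can remove (assumed)
    strength: float  # 0.0 (absent) to 1.0 (certain)


def package_score(indicators: List[Indicator], base: float = 100.0) -> float:
    """Subtract each weighted penalty from `base`, then clamp to the 0-100 range."""
    penalty = sum(i.weight * i.strength for i in indicators)
    return max(0.0, min(100.0, base - penalty))


findings = [
    Indicator("single_maintainer", weight=5.0, strength=1.0),
    Indicator("no_tests", weight=4.0, strength=0.5),
    Indicator("download_decrypt_execute", weight=80.0, strength=0.75),
]
print(package_score(findings))  # 100 - (5 + 2 + 60) = 33.0
```

In this sketch the malicious call pattern dominates the result while the technical debt findings shift it only slightly, which mirrors the idea that some analytics have a huge impact and others a much smaller one.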

Example

An example highlighting how low indicators of risk can be combined into high indicators:

Using time-series analysis, we can understand how a package author typically commits source code. We can observe times of day, commit sizes, how comments are used, variable names, and more to build a fingerprint that is representative of that author. These features can be used to model the author's behavior using machine learning.

If we observe the author's identifier (e.g., a GitHub email address) in a password breach dataset and notice a divergence from the normal fingerprint, we may have an indication of malicious activity. This combination of analytics, heuristics, and machine learning can identify when an attacker may have recovered or stolen an author's credentials and used them to perform unauthorized activity on source code that others rely upon.
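
The combination described in this example can be sketched in a few lines. This is a deliberate simplification, not Phylum's model: a plain z-score stands in for the machine learning model, only two features (commit hour and commit size) are used, and the function names and the threshold of 3.0 are assumptions.

```python
from statistics import mean, stdev


def divergence(history, observed):
    """Absolute z-score of `observed` against the author's past values.

    Requires at least two historical values.
    """
    spread = stdev(history)
    if spread == 0:
        return 0.0 if observed == history[0] else float("inf")
    return abs(observed - mean(history)) / spread


def suspicious_commit(hours, sizes, new_hour, new_size, in_breach, threshold=3.0):
    """Combine two low indicators into one high indicator.

    Hour of day is treated as a plain number, ignoring wrap-around at midnight,
    which is a simplification. `threshold` is a hypothetical cutoff.
    """
    off_pattern = (
        divergence(hours, new_hour) > threshold
        or divergence(sizes, new_size) > threshold
    )
    # Neither signal alone is treated as suspicious; both must be present.
    return in_breach and off_pattern


past_hours = [9, 10, 9, 11, 10, 9]      # typical commit hours (illustrative data)
past_sizes = [40, 55, 38, 60, 47, 52]   # typical lines changed (illustrative data)
print(suspicious_commit(past_hours, past_sizes, new_hour=3, new_size=900, in_breach=True))
```

A breached credential with normal commit behavior, or an unusual commit from an author with no known breach, does not trip the check in this sketch; only the two together do.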

What are some ways to use the Phylum Package Score?

Define and enforce policies for use of open source software as dependencies. By setting thresholds using either the Phylum CLI tool or Phylum User Interface, a user can define the policies by which dependencies with risk attributes can be controlled.

A user can also get started by disallowing any package with a Phylum Package Score under 50. This can be done easily in Phylum's UI or CLI tool, and the policy can be integrated into a variety of the developer and DevOps automation systems in use today.

More mature policies might define:

  • Packages with scores below 50 block builds during test execution
  • Packages with scores between 50 and 65 send a warning message to the security team and developer
  • Packages with scores that have dropped more than 15 points in the past 30 days send a warning message to the security team and developer
  • Packages that are severely abandoned or depend on abandoned packages trigger alerts for 90 days, then block builds after that period
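
The policy above can be expressed as a small decision function. This sketch does not use the Phylum CLI or API; the `PackageStatus` fields and the `evaluate` function are hypothetical, and a score of exactly 50 is treated as a warning, which is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"


@dataclass(frozen=True)
class PackageStatus:
    """Hypothetical inputs a policy engine would need for one package."""
    score: int
    score_30_days_ago: int
    # Days since the package (or one of its dependencies) was flagged as
    # abandoned; None means it is not flagged.
    days_flagged_abandoned: Optional[int] = None


def evaluate(pkg: PackageStatus) -> Action:
    """Apply the rules in order of severity and return the strictest match."""
    if pkg.score < 50:
        return Action.BLOCK
    if pkg.days_flagged_abandoned is not None:
        # Alert during the 90-day grace period, then block.
        return Action.BLOCK if pkg.days_flagged_abandoned > 90 else Action.WARN
    if pkg.score <= 65:
        return Action.WARN
    if pkg.score_30_days_ago - pkg.score > 15:
        return Action.WARN
    return Action.ALLOW
```

For example, `evaluate(PackageStatus(score=80, score_30_days_ago=96))` returns `Action.WARN` because the score dropped 16 points, while the same package with a previous score of 85 returns `Action.ALLOW`. A build step could fail on `Action.BLOCK` and notify the security team and developer on `Action.WARN`.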