In the modern landscape of web application development, the seamless exchange of data has become a necessity. Whether for storage, communication, or logging, moving data is intrinsic to how these applications function. This is especially true in sensitive domains such as finance and healthcare, where the accuracy and security of information are paramount.
When applications engage in data transfers, they often encounter the need to adhere to a standardized data format. This ensures that information is uniformly presented, allowing different systems to interpret and process it correctly. To achieve this, applications must perform a two-way conversion. Initially, they transform their proprietary data format into the established standard before transmitting it. Subsequently, upon receiving data, they decode the standard format and convert it back into their own proprietary structure.
However, this seemingly innocuous process can introduce a significant vulnerability known as insecure deserialization.
Before we delve into the details of insecure deserialization, let's pause for a moment to explore two important concepts: serialization and deserialization. These concepts lay the foundation for how data is handled in the digital world.
Serialization vs Deserialization
Serialization is the process of converting objects or complex data structures into a specific structured data format that can be stored or sent as a sequential stream of bytes. This conversion is particularly valuable for transmitting or storing data while preserving its original structure and attributes. A common example is converting instances of Java entity classes into JSON for communication with other services or clients.
Serialization offers several advantages, such as efficient storage, streamlined network communication, and improved interoperability. Notably, serialized objects typically retain their state, meaning their attributes and the corresponding values, which allows them to be reconstructed precisely during deserialization.
Deserialization, by contrast, is the process of reconstructing objects or intricate data structures from a specific structured data format, such as a sequential stream of bytes, back into their original in-memory representation. This operation restores data to its original state after it has been serialized, so that complex information can be retrieved and used again.
Deserialization plays a vital role in scenarios where serialized data needs to be utilized or processed. Just as serialization allows for efficient storage, network communication, and interoperability, deserialization ensures the data's revival and reintegration into the application's logic.
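The round trip described above can be sketched in Python with the standard json module. This is a minimal illustration, not tied to any particular application: the UserPreferences class and the serialize/deserialize helper names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class UserPreferences:
    # Hypothetical in-memory object used only for illustration.
    username: str
    theme: str
    notifications: bool


def serialize(prefs: UserPreferences) -> bytes:
    # Object -> JSON text -> UTF-8 byte stream that can be stored or sent.
    return json.dumps(asdict(prefs)).encode("utf-8")


def deserialize(data: bytes) -> UserPreferences:
    # Byte stream -> JSON text -> dict -> rebuilt object.
    return UserPreferences(**json.loads(data.decode("utf-8")))


original = UserPreferences(username="alice", theme="dark", notifications=True)
wire = serialize(original)
print(wire)  # b'{"username": "alice", "theme": "dark", "notifications": true}'
print(deserialize(wire) == original)  # True
```

Because JSON carries only plain data, the application itself decides which class to rebuild. Native formats such as PHP's serialize() output or Python's pickle record the class in the data, which is what allows an attacker to influence which class gets created.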
In short, object serialization is used to:
- Store data in persistent storage, such as a database
- Transfer data between processes, services, or systems
- Store data in files
What is insecure deserialization?
An insecure deserialization vulnerability is a security flaw that arises when an application improperly handles the deserialization of data. In practice, it emerges when a web application deserializes data that a user can control.
By exploiting this vulnerability, attackers can alter serialized objects and introduce malicious data into the application's logic. In doing so, an attacker may even replace a serialized object with an object of a different class. For this reason, the vulnerability is also called "object injection": improper deserialization handling allows unexpected objects to be introduced into the application's logic.
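The following minimal Python sketch illustrates object injection with pickle: the application expects a UserPreferences object, but the class that comes back is chosen by the serialized data, not by the application. Both classes and the load_preferences function are hypothetical.

```python
import pickle


class UserPreferences:
    # The class the application expects to get back (hypothetical).
    def __init__(self, theme: str) -> None:
        self.theme = theme


class AdminSession:
    # Another class that happens to exist in the same application (hypothetical).
    def __init__(self, is_admin: bool) -> None:
        self.is_admin = is_admin


def load_preferences(blob: bytes):
    # Vulnerable pattern: nothing checks which class pickle reconstructs.
    return pickle.loads(blob)


legit = pickle.dumps(UserPreferences("dark"))
injected = pickle.dumps(AdminSession(is_admin=True))  # attacker-supplied bytes

print(type(load_preferences(legit)).__name__)     # UserPreferences
print(type(load_preferences(injected)).__name__)  # AdminSession
```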
Why do deserialization vulnerabilities arise?
Deserialization vulnerabilities arise primarily from a lack of proper validation and secure handling when serialized data is converted back into its original form. The reasons behind this deserve a closer look.
One foundational aspect is the trust placed in serialized data during deserialization. Applications often assume that serialized data has not been tampered with and is safe to process. Malicious actors exploit this trust by manipulating serialized data to embed harmful code or unexpected elements, undermining the system's integrity.
Further complicating matters is the often inadequate validation of incoming serialized data. Without rigorous validation, even structurally irregular data can be deserialized, creating unexpected objects that may compromise the application's intended behavior.
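One way to add the missing validation in Python, sketched below, is to subclass pickle.Unpickler and override find_class so that only an allowlist of globals can be resolved; the Python documentation describes this "restricting globals" technique. The allowlist contents, the helper name restricted_loads, and the Gadget class are assumptions for illustration. Treat it as a defense-in-depth sketch: the pickle documentation still advises never unpickling data from untrusted sources.

```python
import io
import pickle

# Hypothetical allowlist of (module, name) globals the application accepts.
# Plain dicts, lists, strings, numbers and booleans are encoded with
# dedicated opcodes and never reach find_class at all.
ALLOWED_GLOBALS = {("builtins", "set"), ("builtins", "frozenset")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream refers to a class or function by name.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Gadget:
    # Stand-in for attacker-chosen data: loading it would call print().
    def __reduce__(self):
        return (print, ("this should never run",))


print(restricted_loads(pickle.dumps({"theme": "dark", "tags": ["a", "b"]})))
try:
    restricted_loads(pickle.dumps(Gadget()))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)  # rejected: global 'builtins.print' is forbidden
```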
Here's an example of how an insecure deserialization vulnerability can occur in the context of a PHP-based web application.
Let's assume you have a web application that uses serialized data to store user preferences. Users can save their preferred settings, and the application serializes these settings and stores them on the client in a cookie. When a user logs in, the application reads the serialized data back from the cookie and unserializes it to apply the saved preferences. The application trusts that the serialized data is safe.
Here's a simplified example of how this could be exploited:
- The UserProfile class:
- The vulnerable code in the web application:
In this scenario:
- The web application stores the user's serialized preferences in a cookie named user_preferences.
- Upon login, the application retrieves the serialized data from the user's cookie and unserializes it to apply the user's preferences.
- The attacker realizes that the application is vulnerable to insecure deserialization because it doesn't properly validate or sanitize the data retrieved from the cookie.
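- To exploit this vulnerability, the attacker crafts a malicious cookie (shown decoded here; in a real request the value would be URL-encoded, because raw semicolons would be treated as cookie separators):
Cookie: user_preferences=O:11:"UserProfile":2:{s:8:"username";s:5:"Alice";s:7:"isAdmin";b:1;}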
- When the attacker sends a request with this crafted cookie, the application unserializes the data into a UserProfile object with isAdmin set to true and treats the attacker as an admin user.
- The attacker can now access admin-only functionality, escalate privileges, or perform malicious actions as an admin user.
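The crafted value does not have to be written by hand. The hypothetical helper below emits PHP's serialize() format for a flat object with string, integer, and boolean properties, which makes it easy to get every length prefix right (UserProfile is 11 characters, so the object prefix must be O:11). It then URL-encodes the value for use in a Cookie header. It is a minimal sketch for this example only, not a complete implementation of PHP serialization.

```python
from urllib.parse import quote


def php_serialize_object(class_name: str, props: dict) -> str:
    """Emit PHP serialize() output for a flat object (hypothetical helper).

    Only str, int and bool property values are handled, which is all the
    UserProfile example needs.
    """
    def encode(value) -> str:
        if isinstance(value, bool):  # test bool first: bool is a subclass of int
            return f"b:{int(value)};"
        if isinstance(value, int):
            return f"i:{value};"
        if isinstance(value, str):
            # PHP string lengths count bytes, so measure the UTF-8 encoding.
            return f's:{len(value.encode("utf-8"))}:"{value}";'
        raise TypeError(f"unsupported type: {type(value).__name__}")

    body = "".join(encode(key) + encode(val) for key, val in props.items())
    name_len = len(class_name.encode("utf-8"))
    return f'O:{name_len}:"{class_name}":{len(props)}:{{{body}}}'


payload = php_serialize_object("UserProfile", {"username": "Alice", "isAdmin": True})
print(payload)
# O:11:"UserProfile":2:{s:8:"username";s:5:"Alice";s:7:"isAdmin";b:1;}

# URL-encode the value so its semicolons are not read as cookie separators.
print("Cookie: user_preferences=" + quote(payload, safe=""))
```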
A pentester’s approach to discovering insecure deserialization during a pentest
Having access to the source code during a penetration test is undeniably advantageous. With source code access, a pentester can readily pinpoint where objects are serialized and deserialized, and thoroughly assess whether user-controllable data reaches those deserialization points. This transparency enables a more targeted and efficient testing approach, as vulnerabilities can be scrutinized directly at their source.
However, when it comes to black-box penetration testing, the situation becomes more challenging. In such scenarios, the pentester doesn't have access to the source code and must rely on a different set of skills. They need to identify where data is being serialized and passed to the backend, a task that requires a keen eye and an understanding of serialized data formats used by different programming languages.
Keep in mind that serialized objects are often base64-encoded before they are sent, so they appear as opaque strings rather than in their readable form. You can often find them in cookie headers or in the parameters of web requests. During a penetration test, it is therefore vital to carefully inspect HTTP requests for base64-encoded strings, decode them, and check whether they contain serialized data. This is where you are likely to uncover potential insecure deserialization issues.
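As a rough illustration of this inspection step, the Python sketch below splits a Cookie header into values, base64-decodes each one, and flags decoded data that starts like a known serialization format: a PHP object or array (O:<n>:" or a:<n>:{), a Java stream (bytes AC ED 00 05), or a pickle using protocol 2 or later (byte 0x80 followed by the protocol number). The function names are hypothetical and the checks are heuristics: a match is a lead worth investigating, not proof, and URL-encoded or URL-safe base64 values would need extra decoding first.

```python
import base64
import binascii
import re

# PHP serialize() output for an object (O:<n>:"...) or an array (a:<n>:{...).
PHP_SERIALIZED = re.compile(rb'^(?:O:\d+:"|a:\d+:\{)')


def fingerprint(value: str):
    """Guess which serialization format a base64 value hides, or return None."""
    try:
        raw = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return None  # not standard base64, so nothing to decode
    if raw.startswith(b"\xac\xed\x00\x05"):
        return "java"
    if len(raw) >= 2 and raw[0] == 0x80 and 2 <= raw[1] <= 5:
        return "python-pickle"
    if PHP_SERIALIZED.match(raw):
        return "php"
    return None


def scan_cookie_header(header: str) -> dict:
    """Map each cookie name to the format its value appears to contain."""
    findings = {}
    for pair in header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:
            kind = fingerprint(value.strip())
            if kind:
                findings[name] = kind
    return findings


print(scan_cookie_header("session=rO0ABQ==; theme=dark"))  # {'session': 'java'}
```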
This section of the blog covers examples of the serialization formats used by PHP, Java, and Python, and how to recognize them.
PHP Serialization Format: PHP employs the serialize() and unserialize() functions for serialization and deserialization. PHP serialization is relatively human-readable. For instance, consider this PHP serialized object:
O:4:"User":2:{s:4:"name";s:5:"Alice";s:7:"isAdmin";b:1;}
This serialized object can be interpreted as follows:
O:4:"User" - An object whose class name, "User", is 4 characters long.
2 - The object has 2 attributes, listed between the curly braces; each key and value inside the braces ends with a semicolon.
s:4:"name" - The key of the first attribute is a 4-character string, "name".
s:5:"Alice" - The value of the first attribute is a 5-character string, "Alice".
s:7:"isAdmin" - The key of the second attribute is a 7-character string, "isAdmin".
b:1 - The value of the second attribute is the boolean true, indicating that the user has admin privileges.
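To see how these length prefixes are consumed, here is a minimal Python parser for the subset of the format used in these examples (O, s, i, and b). It is a hypothetical helper for reading payloads during testing, not PHP's unserialize(): it ignores arrays, floats, null, and references, and it counts characters rather than bytes, which matches PHP only for ASCII data. Because each string is read using its declared length, a length that does not match the content (for example s:6:"Alice", since "Alice" has 5 characters) makes the parse fail, as it does in PHP.

```python
def php_unserialize(data: str):
    """Parse the O/s/i/b subset of PHP's serialize() format (hypothetical helper)."""
    pos = 0

    def expect(ch):
        nonlocal pos
        if pos >= len(data) or data[pos] != ch:
            raise ValueError(f"expected {ch!r} at offset {pos}")
        pos += 1

    def read_until(ch):
        nonlocal pos
        end = data.index(ch, pos)  # raises ValueError if ch is missing
        token = data[pos:end]
        pos = end + 1
        return token

    def read_quoted(length):
        nonlocal pos
        expect('"')
        text = data[pos:pos + length]
        pos += length
        expect('"')  # fails when the declared length is wrong
        return text

    def parse_value():
        nonlocal pos
        kind = data[pos]
        pos += 1
        expect(":")
        if kind == "b":
            return read_until(";") == "1"
        if kind == "i":
            return int(read_until(";"))
        if kind == "s":
            text = read_quoted(int(read_until(":")))
            expect(";")
            return text
        if kind == "O":
            class_name = read_quoted(int(read_until(":")))
            expect(":")
            count = int(read_until(":"))
            expect("{")
            props = {"__class__": class_name}
            for _ in range(count):
                key = parse_value()
                props[key] = parse_value()
            expect("}")
            return props
        raise ValueError(f"unsupported type {kind!r}")

    return parse_value()


print(php_unserialize('O:4:"User":2:{s:4:"name";s:5:"Alice";s:7:"isAdmin";b:1;}'))
# {'__class__': 'User', 'name': 'Alice', 'isAdmin': True}

try:
    php_unserialize('s:6:"Alice";')  # declared length does not match
except ValueError as exc:
    print("rejected:", exc)
```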
Java Serialization Format: Java, on the other hand, uses a binary serialization format that is not human-readable. However, serialized streams begin with distinctive signature bytes, "AC ED 00 05" in hexadecimal, which appear as "rO0" (more precisely, "rO0AB") when the data is base64-encoded.
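The base64 prefix follows directly from the signature bytes. In the sketch below, 0xACED and 0x0005 are STREAM_MAGIC and STREAM_VERSION from the Java Object Serialization Stream Protocol, and 0x73 and 0x72 are the TC_OBJECT and TC_CLASSDESC markers that typically come next when the serialized value is an object. Encoding them shows where the familiar rO0AB and rO0ABXNy prefixes come from.

```python
import base64

# STREAM_MAGIC (0xACED) followed by STREAM_VERSION (0x0005).
header = bytes.fromhex("ACED0005")
# TC_OBJECT (0x73) and TC_CLASSDESC (0x72) usually follow for an object.
object_start = header + bytes.fromhex("7372")

print(base64.b64encode(header).decode())        # rO0ABQ==
print(base64.b64encode(object_start).decode())  # rO0ABXNy
```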
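Python Serialization Format: Python relies on the pickle module for serialization and deserialization. Unpickling data can call arbitrary callables (for example, through an object's __reduce__ method), and any code triggered this way runs with the privileges of the Python process doing the unpickling. Pickle is a binary opcode format; streams using protocol 2 or later begin with the byte 0x80 followed by the protocol number, so base64-encoded pickles often start with "gA". This prefix is less widely recognized than the PHP and Java signatures, however, so identifying pickle usage may also require knowledge of the technology stack running in the backend. Tools like WhatWeb can help determine the backend technology.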
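The following harmless Python sketch shows both points. The Payload class's __reduce__ method tells pickle to call record() while loading, so the call happens inside pickle.loads; in a real exploit the callable would be something like os.system. The record function and Payload class are hypothetical. Protocol 4 is chosen explicitly here; its streams start with the bytes 80 04 95, which base64-encode to gASV (protocol 5 streams start with gAWV instead).

```python
import base64
import pickle

calls = []


def record(message):
    # Harmless stand-in for an attacker's callable (os.system in a real exploit).
    calls.append(message)
    return message


class Payload:
    # pickle stores the (callable, args) pair returned here and calls it on load.
    def __reduce__(self):
        return (record, ("called during pickle.loads",))


blob = pickle.dumps(Payload(), protocol=4)
print(base64.b64encode(blob)[:4])  # b'gASV'

result = pickle.loads(blob)  # record() runs here, during deserialization
print(result)  # called during pickle.loads
print(calls)   # ['called during pickle.loads']
```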
So far, we've covered serialization and deserialization, why insecure deserialization vulnerabilities arise, and the serialization formats used by PHP, Java, and Python. In the next section of this blog, we'll explore real-world penetration testing scenarios and see how penetration testers exploit deserialization vulnerabilities to achieve remote code execution.