What are proxies and how can we detect them?

Proxies come in many shapes and sizes. Some are easy to detect, others much harder. Some are used for illegitimate purposes, while others are standard practice in corporations. In this post, we’ll explore the different types of proxies, what they can be used for, and how to detect them.

What is a proxy?

A proxy is a simple server which forwards requests on your behalf. For example, if you’re using a proxy to access google.com, your request will first be sent to the proxy server, which will then in turn make the actual request to google.com, and return the result back to you.

To help explain it, let’s look at a simplified diagram of how your computer connects to a website.

And here’s how a proxy fits in…

How is a proxy different from a VPN?

This all looks very similar to a VPN, as we explored in our Tor Detection post, but there are some key differences.

When you connect to a VPN, it establishes an encrypted connection between your computer and the VPN server. Your computer joins a remote Virtual Private Network, so all Internet traffic is sent over this connection, regardless of whether it’s from your web browser, a native application, or even your operating system checking for updates. Because of this, VPNs are heavier than proxies and can add more latency to your traffic. However, they’re more fully featured and generally more secure.

Traffic from VPNs can look identical to proxy traffic, so from a detection perspective there’s no difference – the IP address of the VPN or proxy can be looked up against a database or service, such as ipdata.

Underlying technology

The term “proxy” covers any server that forwards data on behalf of another user or server. Due to the broad coverage, there are many different types and styles of proxies, but most of these will use either HTTP or SOCKS as an underlying technology.

HTTP Proxies

An HTTP proxy is one of the simplest types, and can be written in just a few lines of code. In Node.js, the module node-http-proxy makes the process very simple.

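// Forward every request arriving on port 8000 to the target site.
const httpProxy = require('http-proxy');
// changeOrigin rewrites the Host header to match the target.
httpProxy.createProxyServer({ target: 'https://dogtreats.com', changeOrigin: true }).listen(8000);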

Now, if we run this, any requests to port 8000 will be forwarded on to https://dogtreats.com and look as if they originated from the server where this is running.

SOCKS Proxies

A SOCKS proxy runs at a lower level than an HTTP proxy, making it far more versatile. SOCKS (version 5) can forward both TCP and UDP traffic, so it is often used where an HTTP proxy would not be enough. A SOCKS proxy can also be run over SSH using OpenSSH, allowing users to securely connect to servers.
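To make “lower level” a little more concrete, here’s a rough sketch of the first bytes a client sends to a SOCKS5 proxy, following the message layout in RFC 1928. The function names are made up for this example, and it only covers the simplest case: no authentication and a destination given as a domain name.

```javascript
// Sketch of two SOCKS5 client messages as laid out in RFC 1928.
// Function names are illustrative and not part of any library.

// Greeting: version 5, one auth method offered, 0x00 = "no authentication".
function socks5Greeting() {
  return Buffer.from([0x05, 0x01, 0x00]);
}

// CONNECT request asking the proxy to open a TCP connection to host:port.
function socks5ConnectRequest(host, port) {
  const hostBytes = Buffer.from(host, 'ascii');
  if (hostBytes.length === 0 || hostBytes.length > 255) {
    throw new RangeError('host must be 1-255 bytes long');
  }
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError('port must be an integer between 1 and 65535');
  }
  const header = Buffer.from([
    0x05, // protocol version
    0x01, // command: CONNECT
    0x00, // reserved
    0x03, // address type: domain name
    hostBytes.length, // length prefix for the domain name
  ]);
  const portBytes = Buffer.alloc(2);
  portBytes.writeUInt16BE(port); // port travels in network byte order
  return Buffer.concat([header, hostBytes, portBytes]);
}

console.log(socks5ConnectRequest('google.com', 443).toString('hex'));
```

Notice that nothing in the request mentions HTTP. Once the proxy accepts the request, it relays whatever bytes the client sends, which is why SOCKS works for more than web traffic.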

Tor is a system which effectively links multiple SOCKS proxies together to provide strong anonymity, at the cost of speed and some convenience. Read more about Tor detection here.

Proxy Categories

Any of the proxy categories below can use any of the underlying technologies, such as SOCKS. A proxy can also fall into multiple categories!

Transparent proxies simply pass on the request from the user to the destination, without making any modifications at all. Non-transparent proxies make some modification to the request, such as adding HTTP headers. For example, a proxy that adds an [x-forwarded-for](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Forwarded-For) header, which includes the original IP address of the user, is non-transparent.
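If a proxy does add this header, your server can read it. Here’s a minimal sketch, assuming a Node.js-style headers object with lowercase keys. The helper names are hypothetical, and the addresses used with it are reserved documentation addresses.

```javascript
// Split an X-Forwarded-For value into its list of addresses.
// The left-most entry is the client the first proxy saw; each later
// proxy appends the address it received the request from.
function parseForwardedFor(headerValue) {
  if (typeof headerValue !== 'string') return [];
  return headerValue
    .split(',')
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Hypothetical helper: summarise what the header claims about a request.
// socketAddress is the IP address that actually connected to your server.
function describeRequest(headers, socketAddress) {
  const chain = parseForwardedFor(headers['x-forwarded-for']);
  return {
    viaProxy: chain.length > 0,
    claimedClientIp: chain.length > 0 ? chain[0] : socketAddress,
    connectingIp: socketAddress,
  };
}

console.log(describeRequest({ 'x-forwarded-for': '203.0.113.7' }, '198.51.100.2'));
```

Treat the result as a hint only. Any client can send this header with whatever value it likes, and a proxy that wants to stay hidden simply won’t add it.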

Open proxies are simply proxies which are accessible by anyone on the internet, sometimes by mistake. Due to their open nature, they’re often used to provide anonymity and are frequently used for malicious activity. Using open proxies is extremely dangerous unless you fully trust the proxy service you’re connecting to, as it may be snooping on all your internet usage.

Corporate proxies are proxies run by corporations, usually hosted on their own servers. Many companies use proxies to control their employees’ access to the internet and to company networks.

Private proxies are very similar to open proxies, but they require some kind of registration and usually promise to be more secure than open proxies.

Detecting proxies

It’d be impossible to detect all proxy servers, but some can be detected. Proxy providers continually change their IP addresses to try to avoid detection.

There are multiple lists of known proxy IP addresses available online, and ipdata combines many of them with proprietary lists to detect the larger proxies.

Additionally, proxies are reasonably likely to be hosted by a cloud hosting provider, such as AWS or OVH. Traffic from hosting providers can usually be detected using the IP address, but of course there are sometimes legitimate reasons for a hosting provider to be calling your service – especially if you’re running an API.
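One way to check whether an address belongs to a hosting provider is to compare it against that provider’s published IP ranges. The sketch below handles IPv4 only, and the list holds a single illustrative entry: the OVH route that appears in the example ipdata response in this post. A real list would be far longer and would need regular updates.

```javascript
// Convert a dotted IPv4 address to an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  const parts = ip.split('.');
  const valid = parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) throw new TypeError(`not an IPv4 address: ${ip}`);
  const [a, b, c, d] = parts.map(Number);
  return ((a << 24) | (b << 16) | (c << 8) | d) >>> 0;
}

// True if `ip` falls inside `cidr`, e.g. "54.39.0.0/16".
function ipInCidr(ip, cidr) {
  const [base, bitsText] = cidr.split('/');
  const bits = Number(bitsText);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new TypeError(`invalid prefix length in: ${cidr}`);
  }
  // /0 is handled separately because a 32-bit shift by 32 does nothing in JavaScript.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// Illustrative list only, not a complete set of hosting ranges.
const hostingRanges = [{ provider: 'OVH SAS', cidr: '54.39.0.0/16' }];

// Returns the provider name for a matching range, or null if none matches.
function findHostingProvider(ip) {
  const match = hostingRanges.find((range) => ipInCidr(ip, range.cidr));
  return match ? match.provider : null;
}

console.log(findHostingProvider('54.39.133.108'));
```

A match only tells you the traffic comes from a data centre rather than a home or office connection. It doesn’t prove a proxy is involved.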

Here’s an example ipdata response for a proxy:

curl "https://api.ipdata.co/54.39.133.108?api-key=test"


{
    ip: "54.39.133.108",
    is_eu: false,
    city: null,
    region: null,
    region_code: null,
    country_name: "Canada",
    country_code: "CA",
    continent_name: "North America",
    continent_code: "NA",
    latitude: 43.6319,
    longitude: -79.3716,
    postal: null,
    calling_code: "1",
    flag: "https://ipdata.co/flags/ca.png",
    emoji_flag: "🇨🇦",
    emoji_unicode: "U+1F1E8 U+1F1E6",
    asn: {
        asn: "AS16276",
        name: "OVH SAS",
        domain: "ovh.com",
        route: "54.39.0.0/16",
        type: "hosting"
    },
    languages: [
        {
            name: "English",
            native: "English"
        },
        {
            name: "French",
            native: "Français"
        }
    ],
    currency: {
        name: "Canadian Dollar",
        code: "CAD",
        symbol: "CA$",
        native: "$",
        plural: "Canadian dollars"
    },
    time_zone: {
        name: "America/Toronto",
        abbr: "EDT",
        offset: "-0400",
        is_dst: true,
        current_time: "2020-09-04T12:42:43.132643-04:00"
    },
    threat: {
        is_tor: false,
        is_proxy: false,
        is_anonymous: false,
        is_known_attacker: false,
        is_known_abuser: false,
        is_threat: false,
        is_bogon: false
    }
}

Here, we can see that ipdata doesn’t currently have this IP address on a known proxy list (threat.is_proxy is false), but the request is coming from a hosting provider (asn.type is “hosting”).
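In code, reading those two signals from a parsed response might look something like this. The field names come from the example response above, while the function name and the trimmed-down sample object are just for illustration.

```javascript
// Pull the two signals discussed above out of a parsed ipdata response.
// Missing fields are treated as "no signal" rather than as an error.
function proxySignals(response) {
  const threat = response.threat || {};
  const asn = response.asn || {};
  return {
    knownProxy: threat.is_proxy === true,
    hostingProvider: asn.type === 'hosting',
  };
}

// Trimmed copy of the example response for 54.39.133.108.
const example = {
  ip: '54.39.133.108',
  asn: { asn: 'AS16276', name: 'OVH SAS', type: 'hosting' },
  threat: { is_proxy: false, is_threat: false },
};

console.log(proxySignals(example));
```

For the example address, knownProxy comes out false and hostingProvider comes out true, matching the reading above.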

Blocking proxies

Because proxies are hard to detect reliably, it’s usually not worth spending much time trying to block them. However, some proxies will be caught by threat.is_proxy, and more generally ipdata will help protect your site from threats if you block all requests where threat.is_threat == true.

There are also many totally legitimate use cases for proxies. Corporate proxies are common, and blocking legitimate traffic from these companies could be costly!
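Putting these ideas together, one possible policy is to block only confirmed threats, treat proxy and hosting traffic as worth a closer look, and let known corporate proxies straight through. This is a sketch rather than a recommendation: the allowlist, its single documentation address and the 'review' outcome (which could mean a CAPTCHA or tighter rate limits) are all assumptions.

```javascript
// Hypothetical allowlist of corporate proxy addresses you've chosen to trust.
const trustedProxies = new Set(['203.0.113.10']);

// Returns 'block', 'review' or 'allow' for a parsed ipdata response.
function decide(response, allowlist = trustedProxies) {
  // Allowlisted addresses always pass, even if flagged elsewhere.
  if (allowlist.has(response.ip)) return 'allow';

  const threat = response.threat || {};
  if (threat.is_threat === true) return 'block';

  // Proxies and hosting providers aren't blocked outright, only flagged.
  const isHosting = Boolean(response.asn) && response.asn.type === 'hosting';
  if (threat.is_proxy === true || isHosting) return 'review';

  return 'allow';
}

console.log(decide({
  ip: '54.39.133.108',
  asn: { type: 'hosting' },
  threat: { is_proxy: false, is_threat: false },
}));
```

Letting the allowlist win over threat.is_threat is a deliberate trade-off in this sketch. If you’d rather block flagged addresses regardless, swap the order of the first two checks.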