Adding Location Data to ELK With GeoIP

You already know that Elasticsearch excels at searching. The way its developers managed to simplify queries and information is amazing. While databases are good at storing data, search engines are good at … well … searching, of course. In our particular case, we won’t be talking about searching, phrase matching, filtering, or retrieving data. This post is about how the geoip filter in ELK’s Logstash can look up geographic data for IP addresses (GeoIP). I know, magic!

Later in the post, we’ll discuss what to do when you don’t have ELK available or when a quicker solution is needed. I’m going to assume you’ve already installed, configured, and run Elasticsearch (and the rest of the ELK stack). If you haven’t, you might want to pause here and come back when everything’s ready.

So, let’s get started!

GeoIP and ELK

As I mentioned before, Elasticsearch is an amazing tool, and it only gets better when paired with Kibana and Logstash. Logstash is precisely what we’re here to discuss. You probably know that Logstash is what sits between your data and the database where it’s finally archived and analyzed. It can handle an increasing number of sources and feeds: not just Elasticsearch, but also regular databases or email services. It’s a very powerful tool.

Moreover, Logstash can parse, transform, and filter the data. One of those filters happens to be the geoip filter. A common use case is analyzing the access logs of a server; you can very easily ingest them (letting Logstash monitor the logs) or feed them directly to the application. The recommended way to start, according to the official documentation, is to install the GeoIP plugin, which on the Elasticsearch side is called ingest-geoip:

sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-geoip

However, on more recent versions, this command may fail with an error. That’s because the plugin is now part of the Elasticsearch distribution, as noted in the docs, so there’s nothing left to install.

The way the GeoIP filter works is very simple: Elasticsearch ships with a database of IP addresses and their geographic information. So, when the tool parses the IP, it automatically maps it to a geopoint (latitude and longitude), which can later be plotted on a map in the Visualize tab.

In my case, I’m using the sample data that ships with Elasticsearch to map the unique visitors to our sample website, sorted by country. If you’re curious enough, you can also jump into the Discover tab and click around your sample data; you’ll find the geolocation field, marked by the world icon right next to it.

The filter basically groups the latitude and longitude together, useful for pinpointing the markers. Easy!
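To make that grouping a bit more concrete, here’s a minimal Python sketch of the idea. It is not Logstash’s actual code: the function name to_geo_point and the sample event (including its field names) are made up for illustration. The lat/lon object it returns is one of the shapes Elasticsearch accepts for a geo_point field.

```python
from typing import Optional


def to_geo_point(latitude: float, longitude: float) -> Optional[dict]:
    """Group a latitude/longitude pair into a geo_point-style object.

    Returns None when the coordinates fall outside the valid ranges.
    """
    if not (-90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0):
        return None
    return {"lat": latitude, "lon": longitude}


# Hypothetical log event; the field names are assumptions for this sketch.
event = {"clientip": "81.20.41.113", "latitude": 51.7768, "longitude": -0.2843}
event["location"] = to_geo_point(event["latitude"], event["longitude"])
print(event["location"])
```

Keeping both values in a single field is what lets a map visualization treat them as one point instead of two unrelated numbers.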

What If I Don’t Have ELK?

The objective remains the same: How can I get visualization or data based on the IP address? This is a feature that’s normally used in websites, but nowadays, being able to obtain geoinformation is more and more important in a growing number of situations. You may serve API endpoints globally, or perhaps you’d like to filter the content shown on your app based on the location of your users. Another use case might come from your marketing department. They’ll probably ask you to count and perform operations and visualizations on the sales funnel for different locations. Either way, you need GeoIP data.

But sometimes using ELK might be overkill. In the cases above, setting up an ELK stack might be even more complicated than building the functionality itself, so simplicity is the first reason to look elsewhere. There are a number of tools and services out there that can also serve GeoIP data, normally from a database. Some of them might even be free. As a matter of fact, the default Logstash database for GeoIP data is free and publicly available from the MaxMind website (though you need to register first).

Going that route, though, would require you to find the right database, download it, compile it (the relationships between the files might not work for you as they are), and distribute it across your codebase. You’d also have to make sure it has all the information you need (the available data varies from database to database).

Did I mention those databases normally come as several different files? CSV formatted, too. I know, very 1995. Last but not least, you’ve got to check periodically for changes and updates to the data. Once you have all of that, you need to add it to your infrastructure, whether as a separate service or as files kept alongside your codebase. And we haven’t even talked about accessing the data across different languages and environments.

What do you do?

The Alternative

Well, you’re in luck. There’s a service called ipdata that will get you up and running in no time. Let me show you how that works. Performing a simple request to their service (I’ll discuss how to do that in a minute) returns a JSON object with useful information about the IP address you’re looking up, including its geographical location.

{
    "ip": "81.20.41.113",
    "is_eu": true,
    "city": "St Albans",
    "region": "England",
    "region_code": "ENG",
    "country_name": "United Kingdom",
    "country_code": "GB",
    "continent_name": "Europe",
    "continent_code": "EU",
    "latitude": 51.7768,
    "longitude": -0.2843,
    "postal": "AL4",
    "calling_code": "44",
    "flag": "https://ipdata.co/flags/gb.png",
    "emoji_flag": "\ud83c\uddec\ud83c\udde7",
    "emoji_unicode": "U+1F1EC U+1F1E7",
    "asn": {
        "asn": "AS29033",
        "name": "Eckoh UK Limited",
        "domain": "eckoh.com",
        "route": "81.20.32.0/20",
        "type": "isp"
    },
    "languages": [
        {
            "name": "English",
            "native": "English"
        }
    ],
    "currency": {
        "name": "British Pound Sterling",
        "code": "GBP",
        "symbol": "\u00a3",
        "native": "\u00a3",
        "plural": "British pounds sterling"
    },
    "time_zone": {
        "name": "Europe/London",
        "abbr": "BST",
        "offset": "+0100",
        "is_dst": true,
        "current_time": "2020-05-21T18:07:37.049274+01:00"
    },
    "threat": {
        "is_tor": false,
        "is_proxy": false,
        "is_anonymous": false,
        "is_known_attacker": false,
        "is_known_abuser": false,
        "is_threat": false,
        "is_bogon": false
    }
}

Isn’t that amazing? The response times are as amazing as the information presented. The service runs globally on AWS, which allows them to respond in an average of 60 ms. That’s without the use of cache, too. Signing up for their service is very fast. I requested a Free Tier API Key and received a response very quickly to activate my account.

So let’s jump ahead. At this point, I’m sure you already have a number of use cases in mind. However, if you need ideas, the welcome email they send you includes a number of test calls. How’s that for inspiration?
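If you’d like a starting point, here’s a rough Python sketch of what you might do with a response like the sample above. The summarize function and the shape of its output are my own assumptions for illustration, not part of ipdata’s API; the JSON is just a trimmed copy of the sample response, so nothing here touches the network.

```python
import json

# Trimmed copy of the sample response shown earlier in this post.
SAMPLE_RESPONSE = """
{
    "ip": "81.20.41.113",
    "is_eu": true,
    "city": "St Albans",
    "country_name": "United Kingdom",
    "country_code": "GB",
    "latitude": 51.7768,
    "longitude": -0.2843,
    "threat": {
        "is_tor": false,
        "is_proxy": false,
        "is_known_attacker": false,
        "is_threat": false
    }
}
"""


def summarize(raw_json: str) -> dict:
    """Reduce a lookup response to the fields a typical app might need."""
    data = json.loads(raw_json)
    threat = data.get("threat", {})
    return {
        "ip": data.get("ip"),
        "place": f'{data.get("city")}, {data.get("country_code")}',
        "coordinates": (data.get("latitude"), data.get("longitude")),
        "is_eu": bool(data.get("is_eu", False)),
        # Any true flag in the "threat" object marks the address as suspicious.
        "suspicious": any(threat.values()),
    }


summary = summarize(SAMPLE_RESPONSE)
print(summary)
```

From there, filtering content by country or counting visitors per region is mostly a matter of grouping on country_code.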

Oh, by the way, did I mention ipdata also handles IPv6? That may not seem like much. It may not even ring a bell, but believe me. It’s a thing. You want your service to be able to handle both versions of the protocol.
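If your own code needs to know which version of the protocol it’s dealing with before making a lookup, Python’s standard ipaddress module can tell you. This is only a small sketch; the helper name ip_version is mine, and the two addresses are just well-known public examples.

```python
import ipaddress


def ip_version(address: str) -> int:
    """Return 4 or 6 for a valid IP address; raise ValueError otherwise."""
    return ipaddress.ip_address(address).version


for candidate in ("8.8.8.8", "2001:4860:4860::8888"):
    print(candidate, "is IPv" + str(ip_version(candidate)))
```

Validating the address up front also means you don’t spend an API call on something that was never an IP address in the first place.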

So How Does It Work?

Pretty easily. Here’s a simple curl request. Replace API_KEY with your own key, and keep the quotes so the shell doesn’t try to interpret the ? in the URL:

curl "https://api.ipdata.co/8.8.8.8?api-key=API_KEY"

And very quickly, the results show up.
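The same request can be made from code. Here’s a minimal Python sketch that uses only the standard library and mirrors the URL shape of the curl command. The function names build_url and lookup, and the five-second timeout, are my own choices for illustration; check ipdata’s documentation for official client libraries before relying on this.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.ipdata.co"  # host taken from the curl example


def build_url(ip: str, api_key: str) -> str:
    """Build the lookup URL in the same shape as the curl example."""
    query = urllib.parse.urlencode({"api-key": api_key})
    return f"{BASE_URL}/{ip}?{query}"


def lookup(ip: str, api_key: str, timeout: float = 5.0) -> dict:
    """Fetch and decode the JSON document for one IP address.

    This performs a real HTTP request, so it needs a valid API key.
    """
    with urllib.request.urlopen(build_url(ip, api_key), timeout=timeout) as response:
        return json.load(response)


# Only the URL is printed here; call lookup() with your own key to hit the service.
print(build_url("8.8.8.8", "API_KEY"))
```

Splitting URL construction from the network call keeps the first part easy to test without a key or a connection.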

Wrapping Up

At its most basic, ELK is a search engine, capable of transforming and displaying data much faster than a regular database manager can. The ELK stack is definitely a rising star in the development industry, and not everyone can actually use all of its functionality. So, if you’re here, it means that you’re either just beginning to use it or already a seasoned developer. Either way, you should feel lucky. After all, everyone can write MySQL … right?

However, the more tools you have available as a developer, the more prepared you’ll be when the next challenge arrives. GeoIP information will always be a staple resource in commercial and corporate codebases. Not to mention how quickly it can spread to smaller-scale environments, given the growing value of customizing content for your users. The possibilities are truly endless. So, if you’re using centralized logging while visualizing global threats in a real-time dashboard, ELK is your friend.

For anything else that requires quick turnarounds and a solution that works across all modern programming languages and platforms, you need something different. That’s where ipdata has your back.

This post was written by Guillermo Salazar. Guillermo is a solutions architect with over 10 years of experience across a number of different industries. While his experience is based mostly in the web environment, he's recently started to expand his horizons to data science and cybersecurity.