July 2, 2010 / cohodo

Putting Structure Into Application Logs

Most of us know that application logs can be a fantastically rich source of data, and if you can mine them effectively they can be an extremely valuable resource in lots of contexts. We make use of the app logs from the various Platform components in a number of ways – for trend analysis, monitoring, tracking deployments of new features, fault diagnosis and post-mortem analysis.

Unfortunately for us ops peeps, application logging is a big, moving target. We might dark deploy a version of the software with super verbose output, or we might introduce a new feature which renders our previous Perl + regex home-brew tool for pulling out important values pretty useless. This means that the important data we wanted to trend on in March 2009 needs a different regex in June 2010.

Standards! I hear you cry! It seems (to me) that existing standards are good at getting the message into the “right” place in the log, which makes them easy to parse and pass around the network, but they rely on you saying the same thing in the logs over and over again – they rely on rigid structure, and a change to the logs is like adding a new column to an SQL database. This means that if you are trying to match values on their position, adding a new variable can render your previous match useless – and worse still, you might not know about it until it is too late!

Basically, string matching just isn’t robust or portable enough for what we want to do, so we’ve been adding structure to the data in our logs. We’re currently outputting logs from some of our services as JSON. This means we can dynamically add and remove variables as we wish, and we can potentially send the logs straight to an indexer, load them into a database, convert them to RDF, or process them using Hadoop-based tools. We can still do our Perl-based graph building – by converting the JSON to an array or a hash and pulling out the right fields – irrespective of where they appear in the output.
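
As a rough sketch of the idea (in Python rather than Perl, and with hypothetical field names – this isn’t our actual tooling), you can hunt for a named field wherever it happens to sit in the parsed record, rather than matching on its position:

import json

def find_field(node, name):
    # Recursively search a parsed JSON log record for a named field,
    # wherever the application happened to nest it.
    if isinstance(node, dict):
        if name in node:
            return node[name]
        for value in node.values():
            found = find_field(value, name)
            if found is not None:
                return found
    elif isinstance(node, list):
        for item in node:
            found = find_field(item, name)
            if found is not None:
                return found
    return None

# Usage, assuming one JSON record per log line:
#   record = json.loads(line)
#   heap = find_field(record, "Heap")   # e.g. {"init": "60", "used": "26", ...}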

To get a fully JSONised log line we need to work with three main areas: firstly the application itself, then the application’s logging handler, and finally the centralised logging tool. Here is some real output:

{
    "syslogDate" : "2010-06-28T15:24:45+01:00",
    "syslogFacility" : "local0",
    "syslogPriority" : "info",
    "host" : "somehost.talis",
    "Message" : {
        "Process": "WS",
        "Date": "2010-06-28 15:24:45,834",
        "Priority": "INFO",
        "Category": "MemoryUsageLogger",
        "Thread": "Memory-Usage-Logger-Thread",
        "id": "",
        "Message": {
            "Memory Usage" : {
                "Heap" : {
                    "init" : "60",
                    "committed" : "73",
                    "used" : "26",
                    "max" : "864"
                },
                "Non-Heap" : {
                    "init" : "23",
                    "committed" : "24",
                    "used" : "22",
                    "max" : "216"
                }
            }
        }
    }
}

The first part of the JSON comes from syslog-ng, which puts in the details of how syslog sees the log message; the first nested “Message” comes from Log4J and is how Log4J sees the message; and the final “Message” is from the application itself, which in this instance gives us some general information about memory usage.
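
As a hypothetical illustration (the file name below is made up, and this isn’t our production code), peeling back those three layers in Python looks something like this:

import json

# "memory.json" is an assumed file holding the single example record above.
with open("memory.json") as f:
    record = json.load(f)

log4j = record["Message"]     # how Log4J saw the event: process, priority, category...
app   = log4j["Message"]      # the application's own payload

heap = app["Memory Usage"]["Heap"]
print("%s heap: %s of %s MB used" % (log4j["Category"], heap["used"], heap["max"]))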

As well as making our lives easier, it also means we can correlate events much more quickly and easily. From the simple JSON above we could write a rule that identifies discrepancies between “syslogDate” and “Date”, which could indicate a problem with the system clock on the application server, or even a problem with the syslog server unexpectedly slowing down.
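
A minimal sketch of such a rule might look like the following (Python 3.7+ for the timezone parsing; it assumes the Log4J “Date” is written in the same timezone as “syslogDate”, and the threshold is an arbitrary example):

from datetime import datetime

def clock_skew_seconds(record, threshold=5):
    # Compare the timestamp syslog-ng stamped on the record with the one
    # Log4J wrote, and flag anything that has drifted too far apart.
    syslog_ts = datetime.strptime(record["syslogDate"], "%Y-%m-%dT%H:%M:%S%z")
    app_ts = datetime.strptime(record["Message"]["Date"], "%Y-%m-%d %H:%M:%S,%f")
    app_ts = app_ts.replace(tzinfo=syslog_ts.tzinfo)   # assumption: same offset
    skew = abs((syslog_ts - app_ts).total_seconds())
    if skew > threshold:
        print("possible clock drift or slow syslog server: %.1fs" % skew)
    return skew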

One of the uses we’re putting this to is building real-time reporting of coarse-grained application events, to complement the views we get from Ganglia etc.

February 17, 2010 / cohodo

Talis Hackday 1.0

A couple of weeks ago we held our first hackday. Basically, this involved taking over one of the larger rooms at Talis HQ for the day, filling it with hackers and pizza then baking for several hours. Hackdays tend to be aimed squarely at developers, but taking a leaf from events like Hacks & Hackers we wanted to be a bit more inclusive, so we tried to make it interesting and accessible to non-techies. For a week or so before the day, everyone who had an idea, pet project, or itch to scratch was encouraged to post it up on a whiteboard and ‘pitch’ it to other people, who might be interested in finding out more or even pitching in to help out. There were only 2 rules – that no idea was dismissed out of hand and that no-one was allowed to hack on stuff from their day job (because that’s what we do, like, every other day).
Talis is an organisation full of hackers, so there was no shortage of ideas or participants. In fact, the number of hacks posted on the board far exceeded our hacking capacity for a single day.

The day was a great success and we’re already planning future events with lots of ideas on how to tweak the format. We’d love to open these up for wider participation, and hope to be doing this in the next few months, so watch this space. There were some really cool projects being worked on, so see if anything tickles your fancy and let us know what you think.

Recording Environmental Data as RDF
An über-cool mashup of Arduino and RDF: Rob built a device to take temperature readings at regular intervals, represent the data in RDF and post it to a Platform Store. It’s now sat on his windowsill, keeping us informed of the ambient temperature in Rob’s general vicinity.

TweeVR
A twitter-enabled plugin for PVRs (primarily MythTV, but hopefully with support for other distros in the pipeline). Triggered when you record a TV show, this queries various datasources, integrates the data and publishes it for the world to see. Perfect for advertising your love of Carry On films or afternoon soap opera.

Store Activity Visualisation
Julian built a cool visualisation of activity on a Platform Store using the built-in OAI-PMH service which graphs updates made to both the Metabox and Contentbox over a specified period. The IRC logs for #talis are persisted in a Store, so we’re going to use this tool to graph activity on the channel.

Using PIG and Amazon Elastic MapReduce to Analyze Webserver Logs
We have a lot of logs, and as you can imagine, they contain lots of truly invaluable data. Some members of our Platform engineering team wanted to explore this a bit more deeply, and so spent the day hacking up Pig Latin scripts to do this. Since they managed to chomp so many logfiles, we let them get away with breaking hackday rule #2.

Android Life Tracker
Talis CSO Justin hacked up an Android app to record events as RDF direct from mobile devices. Surprisingly, he’s chosen to store the trail of these events in a Platform Store for post-hoc analysis & data mining 🙂

SPARQL 1.1 HTTP Update Protocol Implementation
Paolo spent the day working on a reference implementation of the current draft of the SPARQL Working Group’s RESTful update protocol, using Jena and Jersey/JAX-RS. Paolo plans to open source this and contribute it back to the Jena project once he’s done.

Data Integration for Business Intelligence
John spent the day working on modelling data extracted from library loans services. Using RDF to integrate data from disparate sources like this is just the sort of job we built our Platform for.

Sackboy
Like a school sports day, there were no prizes awarded on the day. But if there had been, the gold medal would have undoubtedly gone to Ian Corns for his LittleBigPlanet hack – Sackboy Explains the Semantic Web.

There’s just no way you can top a platform-based romp through the bowels of CERN where the eponymous hero meets TimBL to explore the origins of the document and data web.