March 16, 2009

Handle Google Analytics scheduled e-mail reports with Apache James

Google Analytics doesn't have an API, but after reading this post I got an idea for a similar solution. We can configure Google Analytics to send scheduled e-mails with XML report attachments and then handle each e-mail at the mail server to save the attachment to disk. The saved file can be used as input for other applications, for example a dashboard application. On my server I have an Apache JAMES instance running. JAMES supports so-called mailets. A mailet is a piece of Java code that follows the Mailet API and can process e-mails. This is ideal for our solution.

Let's see how we can use JAMES to get Google Analytics reports and save them automatically to disk. First we create a new user for the JAMES server. This user will receive the e-mails from Google Analytics. To add a user, the JAMES server must be running. We can then log in to the remote manager with a telnet client. With the command adduser google_analytics <password> we create the user google_analytics with the e-mail address google_analytics@servername.

Okay, we have now set up an e-mail address. Next we can create a scheduled e-mail report in Google Analytics. We open Google Analytics and for this example we select Dashboard. At the top we click on the Email button and then select the Scheduled tab page. Here we use our newly created e-mail address google_analytics@servername. We can fill in the other fields as we like, but we must make sure we select the XML format.

Now we get an e-mail with an XML attachment every day. It is time to set up JAMES with mailets so we can automatically extract the XML attachment and save it to disk. First we must check out and build the Mailet base and standard packages:

$ mkdir mailet
$ cd mailet
$ svn checkout apache-mailet-base
$ cd apache-mailet-base
$ mvn install
$ cd ..
$ svn checkout apache-standard-mailets
$ cd apache-standard-mailets
$ mvn install

Now we have two JAR files: apache-mailet-base/target/apache-mailet-base-1.0-SNAPSHOT.jar and apache-standard-mailets/target/apache-standard-mailets-1.0-SNAPSHOT.jar. We need to copy these to the JAMES directory JAMES_HOME/apps/james/SAR-INF/lib. Once the files are copied into this directory they are included in the classpath of JAMES.

Next we must change the JAMES_HOME/apps/james/SAR-INF/config.xml file. Here we define the mailet in the <processor name="root"> element. We place the following snippet at the end of the element:

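<mailet match="RecipientIs=google_analytics@servername" class="StripAttachment">
  <pattern>.*\.xml</pattern>
  <directory>/home/googe_analytics/xml</directory>
  <remove>no</remove>
</mailet>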

The match attribute RecipientIs makes sure the mailet only processes e-mail messages addressed to google_analytics@servername. Processing is done by the StripAttachment class, which saves matching attachments to disk and can optionally strip them from the message. We configure StripAttachment with the pattern, directory and remove elements. The pattern is set to .*\.xml so we find all XML attachments. In our case the e-mail from Google Analytics contains only one XML attachment. With directory we specify the directory in which the XML file is saved. And finally remove lets us define whether we want to remove the attachment from the original e-mail or just leave it in (as we did).
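To get a feeling for which attachment names the pattern .*\.xml selects, we can try it with plain java.util.regex. This is only a sketch: it assumes the mailet applies the pattern to the whole file name without extra flags, and the class name AttachmentPatternDemo and the sample file names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class AttachmentPatternDemo {
    // Same regular expression as in the mailet configuration.
    static final Pattern XML_ATTACHMENT = Pattern.compile(".*\\.xml");

    // Assumption: the whole file name has to match the pattern.
    static boolean isSelected(String fileName) {
        return XML_ATTACHMENT.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        // Hypothetical attachment names, for illustration only.
        List<String> names = Arrays.asList(
                "report.xml", "report.pdf", "report.xml.zip", "REPORT.XML");
        for (String name : names) {
            System.out.println(name + " -> " + isSelected(name));
        }
    }
}
```

With a full match only names that end in .xml are selected, and because Java patterns are case-sensitive by default a name like REPORT.XML is not. If the reports ever arrive with a different extension, the pattern element is the place to adjust.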

We restart JAMES, and the next time Google Analytics sends an e-mail with an XML report, the e-mail is intercepted by the JAMES mailet and the attachment is saved to the directory /home/googe_analytics/xml.
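Another application, such as the dashboard mentioned earlier, could then pick up the saved files. The following is a minimal sketch of such a consumer and not part of the JAMES setup itself: the class SavedReportReader and its methods are hypothetical names, and because the layout of the Google Analytics XML is not covered here, the code only reads the name of the root element.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class SavedReportReader {
    // Returns the most recently modified .xml file in the directory, if any.
    static Optional<File> newestReport(File directory) {
        File[] files = directory.listFiles((dir, name) -> name.endsWith(".xml"));
        if (files == null) {
            // The directory does not exist or cannot be read.
            return Optional.empty();
        }
        return Arrays.stream(files).max(Comparator.comparingLong(File::lastModified));
    }

    // Parses a saved report and returns the name of its root element.
    static String rootElementName(File report) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(report);
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // Defaults to the directory used in the mailet configuration.
        File directory = new File(args.length > 0 ? args[0] : "/home/googe_analytics/xml");
        Optional<File> report = newestReport(directory);
        if (report.isPresent()) {
            System.out.println(report.get().getName()
                    + ": root element <" + rootElementName(report.get()) + ">");
        } else {
            System.out.println("No XML reports found in " + directory);
        }
    }
}
```

A real dashboard would replace rootElementName with code that walks the actual report structure, but the part that finds the latest file in the directory stays the same.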