SECOORA Google Analytics Wiki

17 July 2008

Dear DMCC,

I have been working my way through some of the Google Analytics data (for mid-June through mid-July) and there are some interesting trends.

I'd like a bit of feedback on a template report from GA. Any ideas on core elements to include in the report? Maybe even two different reports: one for internal use that focuses on loading and page hits, and another for public/membership consumption that provides key details in nice graphics/info bites. There is a section below for each.

Thoughts on either idea? Basic information on GA reporting elements can be found here: http://www.google.com/analytics/

Feel free to comment at will...

Thanks, Sam

Ideas for Internal (Operational) Report

Basic Need: To provide DMCC and web site team members with site profile metrics each month (or quarter), with the objective of tailoring modifications to content and architecture to true user needs. Essentially, this uses the GA outputs as a passive feedback mechanism. It would ultimately include some hybrid of the GA output and Apache logs (so that we can see specifically which files are being downloaded/queried, instead of simply page hits).

...

Ideas for External (Public) Report

Basic Need: To provide an elegant, graphic-heavy public report to the membership, with the objective of informing them of trends in site use. Fundamentally, this is an attempt at organizational transparency and education. I envision this as being an element of the monthly newsletter (or every other month maybe) showing trends over time and some highlighted element of the analysis.

...

Initial Apache log analysis

This is in support of more internally focused reporting about our data and map products. Consolidating all logging through GA is the ultimate answer, if possible.

This need arises from two issues:

  1. use of AJAXish functions on the site - mainly for OpenLayers, which does not refresh its container page when map content changes. So GA misses logging the WMS calls out from the map.
  2. use of Apache rewrite rules - mainly for data download and OGC services. Used to "brand" non-SECOORA URLs with a secoora.org prefix. Since these URLs are rewritten before reaching Plone, GA does not log them.

Approach

Take existing Apache logs and subset to include only URLs we think are missing/underrepresented in GA and analyze with AWStats.

  • Subset to the dates that the SECOORA.ORG data portal has been active => May - Oct
  • Eliminate all IPs from UNC (152.2.*) and USC (129.252.*) as these are from our internal testing and server setup.
  • Subset the remaining logs to include all rewritten URLs together:
    • /ncogc, /ncdata, /mapcache, /sclatest, /screcent, /scsos, /scarchival
  • Divide into sub-logs:
    • all SC rewrites - mainly links to data file distribution at USC
      • /sclatest, /screcent, /scsos, /scarchival
    • all NC rewrites supporting the OpenLayers map application
      • /ncogc URLs from /maps page.
    • all NC rewrites supporting direct OGC web services distribution
      • /ncogc URLs without /maps in referring page
  • Analyze each set of logs with AWStats, which provides initial summary stats and eliminates most bots/crawlers
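
The subsetting steps above could be sketched in Python roughly as follows. This is only an illustration, not the script actually used: it assumes the logs are in Apache "combined" format, assumes the May - Oct window falls in 2008, and the function names (classify, split_log) and bucket labels are made up for the example. URLs under /ncdata and /mapcache are not assigned to a sub-log in the list above, so the sketch parks them in an "other" bucket.

```python
import re
from datetime import date, datetime

# Assumption: Apache "combined" log format
# (host, ident, user, [time], "request", status, bytes, "referer", "agent").
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "[^"]*"'
)

INTERNAL_PREFIXES = ("152.2.", "129.252.")  # UNC and USC testing traffic
SC_PREFIXES = ("/sclatest", "/screcent", "/scsos", "/scarchival")
ALL_PREFIXES = ("/ncogc", "/ncdata", "/mapcache") + SC_PREFIXES


def classify(line, start=date(2008, 5, 1), end=date(2008, 10, 31)):
    """Return a sub-log label for one log line, or None if it is dropped.

    The default date window (May - Oct 2008) is an assumption.
    """
    m = LINE_RE.match(line)
    if m is None:
        return None  # unparseable line
    if m["ip"].startswith(INTERNAL_PREFIXES):
        return None  # internal testing and server setup
    # Timestamps look like 12/Jun/2008:10:00:00 -0400; only the day is needed.
    day = datetime.strptime(m["time"].split(":", 1)[0], "%d/%b/%Y").date()
    if not (start <= day <= end):
        return None
    path = m["path"]
    if not path.startswith(ALL_PREFIXES):
        return None  # not one of the rewritten URLs
    if path.startswith(SC_PREFIXES):
        return "sc"
    if path.startswith("/ncogc"):
        # Requests referred from the /maps page come from the OpenLayers map;
        # everything else is treated as direct OGC service use.
        return "nc_map" if "/maps" in m["referer"] else "nc_ogc"
    return "other"  # /ncdata and /mapcache


def split_log(lines):
    """Group the kept log lines into sub-logs keyed by label."""
    sublogs = {"sc": [], "nc_map": [], "nc_ogc": [], "other": []}
    for line in lines:
        label = classify(line)
        if label is not None:
            sublogs[label].append(line)
    return sublogs
```

Each list returned by split_log could then be written to its own file and handed to AWStats as a separate log.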

...

Resources to review that might help consolidate all logging through Google Analytics

Some of these suggestions are beyond me and so may not be germane to the issues highlighted above. Jeremy has some good ideas here as well. I don't understand them yet, but will post them here when I do.

  • Monitoring of Apache rewrites that bypass Plone and GA entirely when folks hit them directly. This is the trickiest to handle, maybe impossible. Maybe we could rewrite the URLs of interest so that each request passes through a script/page that we can monitor with GA on its way to the page of interest.