Splunk my Garmin - Get the data in - Part 1
Splunk my Garmin Series
- Part 1: Get the data in
- Part 2: Make sense of TrackPoint
- Part 3: Build the app
- Part 4: Investigate the API
- Part 5: ???
Ever since Dave wrote about Downhill Splunking, I’ve wanted to suck my Garmin Connect activities into Splunk and explore them. This is the first part of a multi-part series on my experience doing so.
Where my files at?
I’ve been using Tapiriik for a long time to sync my Garmin activities to Endomondo. At the time, I figured there was no harm in syncing them to Dropbox as well. That turned out to be a great idea, because I now have a year’s worth of data to play with.
Splunk it up!
Start by creating a fitness app. All the sourcetypes, macros, extractions, dashboards, etc. are going to go in here. Once I’ve got something useful, I’ll push it up to GitHub and publish it on Splunkbase.
- Create an index in Splunk called fitness.
- Put a monitor on the Tapiriik directory so new files get indexed automatically:
cd C:\users\Me\DropBox\Apps\tapiriik
splunk add monitor . -index fitness -sourcetype tcx
What is TCX?
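TCX (Training Center XML) is Garmin’s XML format for activity data. The root element is TrainingCenterDatabase, and the nesting runs roughly TrainingCenterDatabase > Activities > Activity > Lap > Track > Trackpoint.
There can be lots of laps per activity and lots of activities per file. The sourcetype I define below only partly caters to multiple activities: for now, a whole file becomes one event.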
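To make that nesting concrete, here is a small Python sketch using the standard library’s xml.etree.ElementTree. It counts the laps and trackpoints in each activity of a file. The TCX text is a hand-written, heavily trimmed illustration. It is not one of my files, and it isn’t schema-valid, because real laps carry more required children. The sketch assumes the TrainingCenterDatabase v2 namespace. summarize is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace: Garmin's TCX v2 schema puts every element in it.
NS = {"tcx": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"}

# Hand-written illustration: two activities in one file, the first with two laps.
SAMPLE_TCX = """<?xml version="1.0" encoding="UTF-8"?>
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Activities>
    <Activity Sport="Biking">
      <Id>2014-05-03T07:12:00Z</Id>
      <Lap StartTime="2014-05-03T07:12:00Z">
        <Track>
          <Trackpoint><Time>2014-05-03T07:12:00Z</Time></Trackpoint>
          <Trackpoint><Time>2014-05-03T07:12:05Z</Time></Trackpoint>
        </Track>
      </Lap>
      <Lap StartTime="2014-05-03T07:40:00Z"/>
    </Activity>
    <Activity Sport="Running">
      <Id>2014-05-04T18:00:00Z</Id>
      <Lap StartTime="2014-05-04T18:00:00Z"/>
    </Activity>
  </Activities>
</TrainingCenterDatabase>"""


def summarize(tcx_text):
    """Return (sport, activity id, lap count, trackpoint count) per activity."""
    # Parse bytes so the encoding declaration in the header is honoured.
    root = ET.fromstring(tcx_text.encode("utf-8"))
    rows = []
    for activity in root.findall("tcx:Activities/tcx:Activity", NS):
        laps = activity.findall("tcx:Lap", NS)
        points = sum(len(lap.findall("tcx:Track/tcx:Trackpoint", NS)) for lap in laps)
        rows.append((activity.get("Sport"),
                     activity.findtext("tcx:Id", namespaces=NS),
                     len(laps),
                     points))
    return rows


for row in summarize(SAMPLE_TCX):
    print(row)
```

The sample has one Biking activity with two laps and one Running activity with a single lap. That is why a single event per file only goes so far.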
Explore the data
Now let’s explore what we’ve got:
index=fitness
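Ooh, that was a mistake. Splunk created an event for every single track point. Each one carries its own timestamp, and by default Splunk starts a new event at any line that contains a date. Slow down and remove everything with index=fitness | delete (the delete command needs the can_delete role). Then upload a file through Add Data and create the sourcetype as part of the upload. We’ll move it to the app later.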
Fix up the sourcetype
Look at the props.conf documentation on multiline events. There are a few settings we need for this sourcetype:
TIME_PREFIX = \<Id\>
The timestamp comes right after the <Id> element. This seems to be the case for my Garmin activities; I’m not sure about other systems.
BREAK_ONLY_BEFORE = \<\?xml version
I’ve set this to the XML declaration header, so each file becomes a single event and nothing inside it triggers a break. In the future, this will change to break before <Activity>.
MAX_EVENTS = 999999999
By default, Splunk stops adding lines to an event after 256 lines. I set this 5 orders of magnitude higher than the line count of my biggest file.
KV_MODE = xml
This turns on XML field extraction at search time.
Save the sourcetype as tcx in the app fitness. I’m unsure about the TIME_PREFIX setting, but we’re good to go for now. The result is a [tcx] stanza in the fitness app’s props.conf containing the four settings above.
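To see why these settings matter, here is a crude Python model of line merging. This is not Splunk’s actual implementation. A new event starts at any line matching the breaker, or once the event already holds MAX_EVENTS lines. The timestamp is read from just after TIME_PREFIX. DATE_LIKE stands in for the default break-on-date behaviour, and fake_tcx builds a hypothetical 300-trackpoint file. Splunk’s real timestamp recognition is far more flexible than the single ISO 8601 shape handled here.

```python
import re
from datetime import datetime

# Settings copied from the props.conf settings described above.
BREAK_ONLY_BEFORE = re.compile(r"\<\?xml version")
TIME_PREFIX = re.compile(r"\<Id\>")
MAX_EVENTS = 999999999

# Crude stand-in for Splunk's default of breaking before any line with a date.
DATE_LIKE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def break_events(text, breaker, max_lines):
    """Start a new event at lines matching `breaker`, or once the current
    event already holds `max_lines` lines (a rough model of MAX_EVENTS)."""
    events, current = [], []
    for line in text.splitlines():
        if current and (breaker.search(line) or len(current) >= max_lines):
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events


def event_time(event):
    """Parse an ISO-style timestamp that directly follows TIME_PREFIX."""
    prefix = TIME_PREFIX.search(event)
    if prefix is None:
        return None
    stamp = DATE_LIKE.match(event, prefix.end())
    return datetime.strptime(stamp.group(0), "%Y-%m-%dT%H:%M:%S") if stamp else None


def fake_tcx(n_points):
    """Hypothetical one-activity file with n_points single-line trackpoints."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<TrainingCenterDatabase>",
        '<Activities><Activity Sport="Biking">',
        "<Id>2014-05-03T07:12:00Z</Id>",
    ]
    lines += [
        f"<Trackpoint><Time>2014-05-03T07:{12 + i // 60:02d}:{i % 60:02d}Z</Time></Trackpoint>"
        for i in range(n_points)
    ]
    lines += ["</Activity></Activities>", "</TrainingCenterDatabase>"]
    return "\n".join(lines)


text = fake_tcx(300)  # 306 lines in total
print(len(break_events(text, BREAK_ONLY_BEFORE, MAX_EVENTS)))  # 1: one event per file
print(len(break_events(text, BREAK_ONLY_BEFORE, 256)))         # 2: cut at the 256-line default
print(len(break_events(text, DATE_LIKE, MAX_EVENTS)))          # 302: roughly one per trackpoint
print(event_time(break_events(text, BREAK_ONLY_BEFORE, MAX_EVENTS)[0]))
```

The second chunk produced by the 256-line cut contains no <Id> at all. This is one reason to raise MAX_EVENTS rather than live with the default.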
Data upload - round 2
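Now to re-index what’s in the directory. The monitor won’t pick the files up again by itself, because the fishbucket records which files it has already read. I didn’t feel like mucking about with resetting the fishbucket, so I re-indexed the files with a quick PowerShell line instead.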
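If you’d rather script the re-index in Python, one approach is to run splunk add oneshot on each file. That command indexes a file once without setting up a monitor. This sketch makes several assumptions: the splunk CLI is on the PATH, the Tapiriik files end in .tcx, and the directory is the one from the monitor step. oneshot_commands and reindex are hypothetical helper names. By default the script only prints the commands so you can check them first. The CLI may also prompt for credentials.

```python
import subprocess
from pathlib import Path

# Assumption: same folder the monitor was put on earlier.
TCX_DIR = Path(r"C:\users\Me\DropBox\Apps\tapiriik")


def oneshot_commands(directory, index="fitness", sourcetype="tcx"):
    """Build one 'splunk add oneshot' command per .tcx file, in name order."""
    return [
        ["splunk", "add", "oneshot", str(path), "-index", index, "-sourcetype", sourcetype]
        for path in sorted(Path(directory).glob("*.tcx"))
    ]


def reindex(directory, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    for cmd in oneshot_commands(directory):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling reindex(TCX_DIR) shows what would run. Calling reindex(TCX_DIR, dry_run=False) actually sends the files to the fitness index. The monitor stays in place for new files.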
Let’s see what we’ve got:
sourcetype=tcx | timechart count by "TrainingCenterDatabase.Activities.Activity{@Sport}"
Wahey! We’re getting some data.
Looking at the actual TCX files, I’m not sure how I’m going to deal with multi-lap activities. That’s a problem for later. For now, I want to get some data out of the lap: total time, distance, calories, etc.
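Outside Splunk, the per-activity numbers I’m after are simply sums over the laps. That makes them a handy reference for checking any Splunk search later. Here is a minimal Python sketch. It assumes the TCX v2 namespace and the lap children TotalTimeSeconds, DistanceMeters and Calories. The sample values are made up, and lap_sum and lap_totals are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Assumed namespace: Garmin's TCX v2 schema.
NS = {"tcx": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"}

# Made-up two-lap ride, trimmed to the lap fields of interest.
SAMPLE = """<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Activities>
    <Activity Sport="Biking">
      <Id>2014-05-03T07:12:00Z</Id>
      <Lap StartTime="2014-05-03T07:12:00Z">
        <TotalTimeSeconds>1500.0</TotalTimeSeconds>
        <DistanceMeters>10000.0</DistanceMeters>
        <Calories>300</Calories>
      </Lap>
      <Lap StartTime="2014-05-03T07:37:00Z">
        <TotalTimeSeconds>900.5</TotalTimeSeconds>
        <DistanceMeters>5250.0</DistanceMeters>
        <Calories>180</Calories>
      </Lap>
    </Activity>
  </Activities>
</TrainingCenterDatabase>"""


def lap_sum(laps, tag, cast):
    """Sum one child element over all laps; a missing element counts as zero."""
    return sum(cast(lap.findtext(f"tcx:{tag}", default="0", namespaces=NS)) for lap in laps)


def lap_totals(tcx_text):
    """Map each activity Id to its lap count and summed time, distance, calories."""
    root = ET.fromstring(tcx_text.encode("utf-8"))
    totals = {}
    for activity in root.findall("tcx:Activities/tcx:Activity", NS):
        laps = activity.findall("tcx:Lap", NS)
        totals[activity.findtext("tcx:Id", namespaces=NS)] = {
            "laps": len(laps),
            "seconds": lap_sum(laps, "TotalTimeSeconds", float),
            "meters": lap_sum(laps, "DistanceMeters", float),
            "calories": lap_sum(laps, "Calories", int),
        }
    return totals


print(lap_totals(SAMPLE))
```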
Alias some fields
These are the field aliases I’ve added to props.conf, to give the long XML field names short ones:
FIELDALIAS-Sport = "TrainingCenterDatabase.Activities.Activity{@Sport}" AS Sport
FIELDALIAS-Cadence = TrainingCenterDatabase.Activities.Activity.Lap.Cadence AS Cadence
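Those long names come from KV_MODE = xml flattening each element’s XML path. The Python sketch below approximates that naming:

- elements are joined with dots,
- attributes become {@name},
- repeated elements are collected into a list, much like a multivalue field.

It then applies the two aliases, keeping the original fields. This is an approximation written for illustration, not Splunk’s implementation. flatten and apply_aliases are hypothetical helper names, and the sample is made up. It also hints at the multi-lap problem: two laps give Cadence two values.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local(name):
    """Drop the '{namespace}' prefix ElementTree adds to tag and attribute names."""
    return name.rsplit("}", 1)[-1]


def flatten(elem, prefix="", fields=None):
    """Collect element text and attributes under dotted, Splunk-style paths."""
    if fields is None:
        fields = defaultdict(list)
    path = f"{prefix}.{local(elem.tag)}" if prefix else local(elem.tag)
    for name, value in elem.attrib.items():
        fields[f"{path}{{@{local(name)}}}"].append(value)
    text = (elem.text or "").strip()
    if text:
        fields[path].append(text)
    for child in elem:
        flatten(child, path, fields)
    return fields


# The two FIELDALIAS settings above, as source -> alias.
ALIASES = {
    "TrainingCenterDatabase.Activities.Activity{@Sport}": "Sport",
    "TrainingCenterDatabase.Activities.Activity.Lap.Cadence": "Cadence",
}


def apply_aliases(fields, aliases):
    """Copy each source field to its alias name, keeping the original field."""
    out = dict(fields)
    for source, alias in aliases.items():
        if source in fields:
            out[alias] = fields[source]
    return out


SAMPLE = """<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Activities>
    <Activity Sport="Biking">
      <Lap StartTime="2014-05-03T07:12:00Z"><Cadence>85</Cadence></Lap>
      <Lap StartTime="2014-05-03T07:37:00Z"><Cadence>88</Cadence></Lap>
    </Activity>
  </Activities>
</TrainingCenterDatabase>"""

fields = apply_aliases(flatten(ET.fromstring(SAMPLE)), ALIASES)
print(fields["Sport"], fields["Cadence"])
```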
Now for something I haven’t been able to find out before: how has my cycling cadence changed over time?
index=fitness sourcetype=tcx Sport=Biking Cadence=* | timechart span=mon avg(Cadence)
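timechart span=mon avg(Cadence) buckets events by calendar month and averages each bucket. A tiny Python equivalent with made-up rides shows the arithmetic. Unlike timechart, it skips months with no rides rather than emitting empty buckets. monthly_avg is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_avg(samples):
    """Group (date, cadence) pairs by (year, month) and average each group."""
    buckets = defaultdict(list)
    for day, cadence in samples:
        buckets[(day.year, day.month)].append(cadence)
    return {month: mean(values) for month, values in sorted(buckets.items())}


# Made-up sample values, purely to show the bucketing.
rides = [(date(2014, 5, 3), 80), (date(2014, 5, 20), 86), (date(2014, 7, 1), 90)]
print(monthly_avg(rides))
```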
This is a good start. Next, I’ll look at pulling apart the track point data. I want to try to get parity with the statistics Garmin Connect shows.