Profiling a NDK app from a CI server

As we get into working in VR, one of the chief toolsets that's missing from our developer toolbox is profiling.  The problem is that VR has a hard CPU / GPU requirement which is not so forgiving to unoptimized code.  So every time a team member checks in code, we really must ensure that the app still performs at least at 60 fps or our end users will wind up sick.
As I've mentioned before, I prefer Jenkins for a CI server.  Now something we can do is start recording the systrace and run a monkey test behind it so the app is doing more than just running.  To do that, create a new shell task in Jenkins and do something like this:

adb shell monkey -p -v 5000 > monkeyResults.txt
This will run 5000 random events that an end user might run back to back.  The nice thing about it is it generates random user input and runs any app in an unpredictable yet repeatable manner.  It looks for crashes from unexpected user interaction and we can use Jenkins to parse the results.  In this way we can safely assume that the app is getting exercised and it's worth running a profiler at the same time. So to do that let's run it from the command line like this:

You will find in the AndroidSDK/platform-tools/systrace directory.  It is safe to move that directory to the server that is running the emulator and jenkins.  So in the script run this command

python --time=10 -a -o mynewtrace.html
adb shell monkey -p -v 5000 > monkeyResults.txt

What this will do is run for 10 seconds and produce an output that we can look at from within a browser.  However, on Daydream  or android N+ devices, it does something a bit special.  You either must have a N+ device or a rooted M device in order to get the systrace functionality to work.  If you have a N+ device, then any programmed call to systrace will show up in the resultant html.  Now unfortunately, Google only currently supports two calls from the NDK, (ATrace_BeginSection/ATrace_EndSection) but it also only handles the b/e markers that weren't enough for my needs; and very likely wont help you with yours.  The reason why is the B/E section only takes a pid so any end event on any thread in the same pid has the fun side effect of making the nested section tags pretty worthless in most apps.

There's two other types of markers, an async marker 'S'/'F' which you provide the pid and a cookie so it can correctly handle a begin/end along with a 'C' marker that will give you a count.  The count is interesting in that it will render out a nice bar graph (useful for displaying variable data).

Now for the reason for posting this, the Snapdragon Profiler is better at displaying the resulting html files.  It also has an import trace file utility.  The problem that it solves is that the Async tags will display as disjoint items in the html page; when there's a lot of them, it's hard to sift through it.  Snapdragon Profiler will put them in one async tab that you can drill into and find the big outliers.  It's admittedly not ideal and you have to pick and chose your battles carefully as too many

So what I do is hook up the command line trace file generation tool to our CI / auto unit testing system and use Snapdragon Profiler to view the results, when things go bump; this provides a good way to understand when it went bump.

Here's some example code that works pretty well:

int fileId = open( "/sys/kernel/tracing/trace_marker" , O_WRONLY);
if (fileId == -1) assert(false);

void nested_begin(const char* markerName) {
  char buf[256];
int len = sprintf(buf, 256, "B|%d|%s", getpid(), markerName);
write(fileId, buf, len);
//I'm skipping the rest of the non relevant parts and showing the strings that are possible and what they go with:
void nested_end(){
int len = sprintf(buf, 256, "E");
void counter(const char* counterName, int value) {
     int len = sprintf(buf, 256, "C|%d|%s|%d", getpid(), counterName, value);
void async_begin(const char* markerName, int cookie) {
       int len = sprintf(buf, 256, "S|%d|%s|%i", getpid(), markerName, cookie);
void async_end(const char* markerName, int cookie) { 
      int len = sprintf(buf, 256, "F|%d|%s|%i", getpid(), markerName, cookie);
Now, what this gives us is a great way to see exactly what the program is doing in a nice graphical format complete with timing, how it lines up with the frame events and what other system events are happening at the same time.
Only one downside, 10 seconds is worth about 30 megs of data for our profiled results.  I've found that most jenkins instances are not able to serve static pages of this size so they wind up timing out during the download.  Once the systrace.html is downloaded, load it up in Snapdragon Profiler and import these results.  Reading systrace files is something the Snapdragon Profiler excels at and it makes drilling down into the data insanely simple.  Also it wont place each async event on a separate line making the graph nearly impossible to read.
Over the course of the next major drone project, I'll be following this advice and give everyone a chance to see the resultant systrace html files along with screen captures of the Snapdragon Profiler handling those same files.


Popular posts from this blog

Drone VR Software Architecture

Build Snapdragon Flight Drone

Soldering new connectors for the drone