A police officer sees a drunken man intently searching the ground near a lamppost and asks him the goal of his quest. The inebriate replies that he is looking for his car keys, and the officer helps for a few minutes without success; then he asks whether the man is certain that he dropped the keys near the lamppost.
"No," is the reply, "I lost the keys somewhere across the street." "Why look here?" asks the surprised and irritated officer. "The light is much better here," the intoxicated man responds with aplomb. (source)
Like the inebriated man, we developers are prone to optimising the parts of a program that are most amenable to optimisation, rather than the parts that take the most time. To guide our optimisation efforts, we need to profile our programs.
Profiling a program means keeping track of the time taken by its various parts. This tells us which parts of the program take the most time; those parts should be the focus of optimisation efforts.
Profiling tools are available for all of the popular languages. Most of them fall into two categories:
- Deterministic (tracing) profilers, which hook into every function call and record how long each call takes.
- Statistical (sampling) profilers, which periodically sample the program's call stack to estimate where time is being spent.
The advantage of the above methods is that they require little programming effort. The disadvantage is that they may either have significant overhead (a one-line function may be called millions of times, or the sampling period may be too small), or be too coarse-grained to be useful.
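For instance, Python ships with cProfile, a deterministic profiler of the first kind. A minimal sketch of using it (the main() function here is just a placeholder workload):

import cProfile

def main():
    # placeholder workload standing in for your real program
    return sum(i * i for i in range(1000000))

# Every function call is traced, so the report is exact, but the tracing
# itself adds overhead when small functions are called very often.
cProfile.run('main()')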
As a result of these limitations, profiling is usually switched off in production and enabled only in development/testing.
However, this is not an option for us: our production environment is very dynamic and unpredictable, and it is not feasible for us to replicate it in dev/test settings. For us, performance optimisation is a first-class activity, and we don't mind instrumenting our code to facilitate accurate profiling.
We realised that we needed a low-overhead instrumentation framework in which we could manually mark the interesting portions of our program. We did not find any open source work that fit this need, so we built our own profiling framework, which we call QGProfiler.
We have released its source code at https://github.com/quantumgraph/qgprofiler.
QGProfiler works as follows. You instantiate a QGProfiler object, giving it a name, say "my-cool-program", and the path of the output file:
qgprofiler = QGProfiler("my-cool-program", "/some/path/filename.json")
As the program enters an interesting phase, you do a push:
qgprofiler.push("some phase")
When the interesting phase is over, you do a pop:
qgprofiler.pop()
Pushes and pops can be nested inside each other. QGProfiler keeps track of the time taken by each node (delimited by a push and the matching pop). At the end, you call
qgprofiler.generate_file()
This call generates a file with the name specified in the constructor. If you call push() with the same argument several times while at a particular point in the tree, the times are added up, and a single node is generated with the overall count and the total time taken for that stage.
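For example, here is a minimal sketch (the phase names and file path are made up for illustration) showing nested phases and a repeated phase that ends up as a single node with a count of 3:

from qgprofiler import QGProfiler

qgp = QGProfiler("example", "/tmp/example.json")

qgp.push("request")        # outer phase
qgp.push("parse")          # nested phase
# ... parsing work ...
qgp.pop()                  # closes "parse"

for _ in range(3):
    qgp.push("db-query")   # same name at the same point in the tree
    # ... query work ...
    qgp.pop()              # the three calls add up into one "db-query" node with count 3

qgp.pop()                  # closes "request"
qgp.generate_file()        # writes the timing tree to /tmp/example.json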
Installation
Using pip:
$ pip install qgprofiler
Using setuptools:
$ git clone https://github.com/quantumgraph/qgprofiler
$ cd qgprofiler
$ python setup.py install
Say you have a program:
def f1():
    # some computation
    pass

def f2():
    # some computation
    pass

def f():
    data = f1()
    # do some manipulation on data. This is a heavy task
    # do some more manipulation. This should be light
    for i in range(100):
        f2()

def g():
    # some short computation
    pass

if __name__ == "__main__":
    f()  # I think it takes a lot of time
    g()  # This should not take a lot of time
Now, you would like to know how much time various parts of this program take. You instrument your code like this:
from qgprofiler import QGProfiler

def f1():
    # some computation
    pass

def f2(qgp):
    qgp.push('f2')
    # some computation
    qgp.pop()

def f(qgp):
    qgp.push('f1')
    data = f1()
    qgp.push('heavy manipulation')
    # do some manipulation on data. This is a heavy task
    qgp.pop()
    # do some more manipulation. This should be light
    for i in range(100):
        f2(qgp)
    qgp.pop()

def g():
    # some short computation
    pass

if __name__ == "__main__":
    filename = '<a unique filename>.json'
    qgp = QGProfiler('number-cruncher', filename)
    qgp.push('overall')
    qgp.push('f')
    f(qgp)  # I think it takes a lot of time
    qgp.pop()
    g()  # This should not take a lot of time
    qgp.pop()
    qgp.generate_file()
At the end of this program, a file with the name specified by you will be generated, which contains content like the following:
{ "count": 1, "name": "number-cruncher", "value": 23.12, "children": [ { "count": 1, "name": "overall", "value": 23.11, "children": [ { "count": 1, "name": "f", "value": 16.33, "children": [ { "count": 1, "name": "f1", "value": 14.14, "children": [ { "count": 1, "name": "heavy manipulation", "value": 12.21, "children": [] }, { "count": 100, "name": "f2", "value": 1.20, "children": [] } ] } ] } ] } ] }
The above JSON throws the following light on the program's behaviour:
- The whole run took 23.12 seconds, of which the instrumented 'overall' phase accounts for 23.11 seconds.
- The 'f' phase took 16.33 seconds.
- Within 'f', the 'f1' node took 14.14 seconds, most of which (12.21 seconds) was spent in the heavy manipulation step.
- 'f2' was called 100 times but took only 1.20 seconds in total, so it is not a promising target for optimisation.
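If you prefer a quick textual summary over reading the raw JSON, a small sketch like the following (not part of QGProfiler; the input file name is assumed) walks the tree and prints the nodes sorted by time:

import json

def walk(node, path=""):
    # Yield (full path, count, value) for every node in the QGProfiler tree
    full_path = path + "/" + node["name"] if path else node["name"]
    yield full_path, node["count"], node["value"]
    for child in node.get("children", []):
        yield from walk(child, full_path)

with open("number-cruncher.json") as f:   # assumed output file name
    root = json.load(f)

for full_path, count, value in sorted(walk(root), key=lambda x: -x[2]):
    print("%8.2fs  x%-4d  %s" % (value, count, full_path))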
Combining various output files
Let's say your program runs many times, and you generate one JSON file for each run. Alternatively, your program may be a long-running server, and you output one JSON file every hour.
You would like to see aggregated performance across these JSON files. For this, we provide a module called QGProfileAggregator. You use it like this:
from qgprofiler import QGProfileAggregator

qg_aggregator = QGProfileAggregator('*.json', 'output.json')
qg_aggregator.generate_file()
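To produce one file per run for the aggregator to pick up, you can simply make each run's output file name unique; a minimal sketch (the timestamp-based naming is just one possible scheme, not something QGProfiler requires):

import time
from qgprofiler import QGProfiler

# One output file per run, e.g. number-cruncher-1700000000.json,
# so that the '*.json' pattern above picks up every run.
filename = "number-cruncher-%d.json" % int(time.time())
qgp = QGProfiler("number-cruncher", filename)

qgp.push("overall")
# ... the instrumented work goes here ...
qgp.pop()
qgp.generate_file()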
If instead you want an interactive flame graph, give the aggregator an .html output file name. You do it like this:
qg_aggregator = QGProfileAggregator('*.json', 'output.html')
qg_aggregator.generate_file()
The flame graph is an interactive chart that lets you explore how much time various parts of your program take. This is what a sample flame graph for one of our programs looks like:
You can find a sample interactive flame graph here: https://app.qgraph.io/static/blog/flame-graph.html
QGProfiler, and in particular its flame graph output, has already helped us significantly in zooming in on areas for improvement. Compared to the ad hoc approaches we used earlier, it is like an X-ray, pointing us incisively to the parts of the program most deserving of optimisation effort.
We have several improvements planned for future releases.
QGProfiler is deceptively simple to use, yet extremely powerful. If you are thinking of making code changes to your Python program to make it faster, try it out; you won't be disappointed. Let us know how it goes. You can find the source code and further instructions at https://github.com/quantumgraph/qgprofiler. Prudhvi Raj, the creator and maintainer of the project, is waiting for your feedback!