## Category Archives: Cython

So I got around to doing an actual cython performance comparison.  As was expected, converting python code to cython results in fairly substantial performance gains.  I’ve posted the code on github, and will show the results here.  I simply took the primes example in the cython tutorial, and created a python version, and modified the cython version a little bit.  The python version is below:

```def primes(kmax):
p = [0] * kmax
result = []
k = 0
n = 2
while k < kmax:
i = 0
while i < k and n % p[i] != 0:
i = i + 1
if i == k:
p[k] = n
k = k + 1
result.append(n)
n = n + 1

return result
```

For the cython version, I modified the original static array in the code to a dynamic array using malloc. This allowed me to pass in the desired length of the primes return, without having to use an upper bound exception, as is done in the original cython tutorial. This was a little frustrating, as I was not familiar with malloc, and it is a good example of why python is such a wonderful language to use (assuming you don’t have performance issues). The cython code is below:

```from libc.stdlib cimport malloc, free

def primes(int kmax):
cdef int n, k, i
cdef int *p = <int *>malloc(kmax * sizeof(int))
result = []
k = 0
n = 2
while k < kmax:
i = 0
while i < k and n % p[i] != 0:
i = i + 1
if i == k:
p[k] = n
k = k + 1
result.append(n)
n = n + 1

free(p)
return result
```

As you can see, there is still some python code in the cython function. One of the limitations with cython, is that if I want to run the cython function from a python script, I need to return the results as a python object. Of course, I could just run everything in cython!

I created a script to time and plot the performance of the two functions over a number of input values. Here is the result:

So as you can see, the partial cython implementation vastly outperforms the python function.  This gives you an idea of what kind of overhead the python interpreter brings to the language.  So know that I have a good idea of what kind of performance gains I can get using cython, I am going to try to implement some of these in my algorithms.  From what I understand, I simply need to convert python data objects to cython data types in order to get the code to compile to C.  I think a fun future project would be to write a python script that automatically converts common python data objects to their corresponding cython counterparts.  If I get a little time in the near future I may give that a shot.

In my continuing quest to improve my scores for the discrete optimization class, I stumbled upon a data structure called ‘k-d tree’ (short for k-dimensional tree).  Supposedly it is a really good data structure for spatial data, allowing for fast nearest neighbor searches.   The scipy.spatial library has an object called KDTree and cKDTree, both of which are implementations of the k-d tree data structure.  The primary difference between the two is that KDTree is implemented in python, whereas cKDTree is implemented in Cython.

I wrote up a quick test case to get an idea of the performance improvements provided by the special data structures.  For the baseline, I wrote a function that simply loops through a set of coordinates and grabs the nearest neighbor to a specified set of coordinates by calculating and comparing the distances of the specified set of coordinates with every other set.  In the second test case, I used the scipy.spatial KDTree object to store the coordinates, and then used the query method to get the nearest neighbor.  The third method was identical to the KDTree method except I used the cKDTree object.

I created a random set of coordinates and then ran each function 1000 times to get an idea of the speed of each method.  The code is posted on gist.  The results are below:

```Method 1 (using for loops) time:  2.42
Method 2 (using kdtree) time:  0.37
Method 3 (using ckdtree) time:  0.12
```

As you can see from the results, both of the k-d tree methods were significantly faster than using a `for loop`.  Additionally, the cKDTree implementation was fairly faster than the KDTree function.  One issue I ran into was an overflow error on the KDTree implementation when using a list of coordinates with more than 100,000 points.  Due to this issue, and because the use of cKDTree is nearly identical to KDTree (and faster), I would recommend using cKDTree.  I am going to throw this into my latest TSP solver and see if I an get some performance gains.  I’ll let you know how it goes.

So I have been playing around with Cython, which is an optimizing static compiler for Python that allows users to write python extensions in C, as a possible method to improve my algorithm performance for the discrete optimization class I have been taking.  In theory, by compiling the code before execution, the overhead associated with the python interpreter is reduced, which results in performance gains.  An additional benefit of the package is that it can run unmodified python code, so it is pretty easy to convert existing programs to cython.  I figured I would drop in my latest implementation of the Tabu Search algorithm that I wrote to solve the Traveling Salesman Problem (TSP) and compare the run-times between the two implementations.  The results of the comparison are below:

As you can (or can’t see) there is hardly a difference between the two implementations.  I was initially encouraged by the fact that the Cython implementation was running a second or two faster for the smaller TSP problems, but as the problem size increased, the time savings remained in the range of a few seconds, which doesn’t help at all.  I imagine that If I go in and re-write/translate the existing python code in the cython extension to C, I can probably see some performance gains, but merely dropping python code into a cython extension only has limited benefits.  I might go back and give this a shot, but for now I am going to keep plugging away in python.