Numpy is my homeboy
I am continually amazed by how awesome numpy can be. While some parts of the API are a little crusty, it does give you a pretty nice set of vectorized operations (read: in a general purpose language - R and Octave are cool, but when you’re developing online learning algorithms, it’s nice when your statistical code is in the same language that talks to your databases, etc.)
Let’s say I want to write code to implement a simple matrix multiplication, like this:
\[ y = \theta^TX \]
Here’s one (terribly naive, doubles-only) implementation of a 2D matrix multiplication in ANSI C:
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
double **matrix(int rows, int cols)
{
double **m;
int i;
m = (double **) malloc(rows * sizeof(double *));
for (i=0; i < rows; i++) {
m[i] = (double *) malloc(cols * sizeof(double));
if (m[i] == NULL)
return NULL;
}
return m;
}
void free_matrix(double **matrix, int rows)
{
int i;
for (i=0; i < rows; i++)
free(matrix[i]);
free(matrix);
}
double **matrix_multiply(double **a, double **b, int a_rows, int a_cols, int b_rows, int b_cols)
{
assert (a_cols == b_rows);
int i, j, k;
double **result = matrix(a_rows, b_cols);
if (result == NULL) {
printf("Could not allocate result matrix");
exit(1);
}
for (i=0; i < a_rows; i++)
for (j=0; j < b_cols; j++)
for (k=0; k < b_rows; k++)
result[i][j] += a[i][k] * b[k][j];
return result;
}
Now, while writing a simple matrix multiplication in C makes for a good opening interview question, there are far more efficient implementations than anything I could ever hope to write available in the form of Level-3 BLAS routines, which can be accessed through the excellent GNU Scientific Library (GSL). Here’s that code:
#include <stdio.h>
#include <gsl/gsl_blas.h>
int main(void)
{
double a[] = { 1.0, 2.0, 3.0,
4.0, 5.0, 6.0 };
double b[] = { 4.0, 5.0,
6.0, 7.0,
8.0, 9.0 };
double c[] = { 0.0, 0.0,
0.0, 0.0 };
gsl_matrix_view A = gsl_matrix_view_array(a, 2, 3);
gsl_matrix_view B = gsl_matrix_view_array(b, 3, 2);
gsl_matrix_view C = gsl_matrix_view_array(c, 2, 2);
gsl_blas_dgemm(CblasNoTrans, CblasNoTrans,
1.0, &A.matrix, &B.matrix,
0.0, &C.matrix);
printf("[ %g, %g\n", c[0], c[1]);
printf(" %g, %g ]\n", c[2], c[3]);
return 0;
}
These routines are crazy fast. Things like matrix multiplications are automatically decomposed into submatrices small enough that they can fit in the L1 cache. But they’re kind of cumbersome and low-level. Scientists are usually more interested in thinking about the algebra and the data than the primitive operations.
Luckily we have numpy/scipy, which allow you to do some pretty intense numerical computations in Python. Here’s what the above GSL code would look like in numpy:
A = numpy.array([[1, 2, 3], [4, 5, 6]], dtype=numpy.float64) B = numpy.array([[4,5], [6,7], [8,9]], dtype=numpy.float64) C = A.dot(B) print C
Much simpler, more algebraic-looking code, and it’s often almost as fast as the C version (numpy.dot and a few other methods are actually making calls to BLAS routines if it’s available i.e. if you’re a cool kid with a Fortran compiler!) I haven’t seen a proper benchmark on this, but numpy is among the faster high-level math languages.
I do quite a bit of work with numpy, but many times in the past I’ve found myself having to rely on regular Python data structures like sets and such when I’d really like a nice vectorized way to work with my strongly-typed data. Here are a few pieces of numpy that I’ve found to be particularly helpful:
High-level interface for strongly typed sets (1d arrays - if you don’t know whether all elements are unique, you can make a set with numpy.unique):
import numpy a = numpy.arange(10) b = numpy.arange(5, 15) numpy.setdiff1d(a,b) # array([0, 1, 2, 3, 4]) numpy.union1d(a,b) # array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]) numpy.intersect1d(a,b) # array([5, 6, 7, 8, 9])
Building arrays incrementally from a Python generator:
import numpy
# Imagine these come from a db cursor or something
coordinates = [(1,2,3), (4,5,6), (7,8,9)]
def my_gen(some_tuple):
for x, y, z in some_tuple:
yield x, y, z
a = numpy.fromiter(my_gen(coordinates), dtype=[('x', 'l'), ('y', 'l'), ('z', 'l')])
"""
This creates a numpy record array, which looks like:
array([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
dtype=[('x', '<i8'), ('y', '<i8'), ('z', '<i8')])
>>> a['x']
array([1, 4, 7])
"""
I now retract my previous comments about the array module, as numpy.fromiter beats the pants off of it. In my previous post, I found that array.append, while being slower than appending to a Python list, used quite a bit less memory for arrays of a single type.
numpy.fromiter actually appears to be more efficient than Python’s list append (maybe because it saves a Python for loop). Here’s the same benchmark I ran
python -m timeit -n1 -r100 "a = []" "for i in xrange(1000000):" " a.append(i)" 1 loops, best of 100: 93.1 msec per loop python -m timeit -n1 -r100 -s "import numpy" "a = numpy.fromiter(xrange(1000000), dtype=numpy.int64)" 1 loops, best of 100: 74.3 msec per loop python -m timeit -n1 -r100 -s "import numpy" "def gen():" " for i in xrange(1000000):" " yield i" "l = []" "for i in gen():" " l.append(i)" 1 loops, best of 100: 247 msec per loop python -m timeit -n1 -r100 -s "import numpy" "def gen():" " for i in xrange(1000000):" " yield i" "a = numpy.fromiter(gen(), dtype=numpy.int64)" 1 loops, best of 100: 123 msec per loop
What this means is that building a sparse matrix for some machine learning or statistical problem, using data that came in from your web server or database as an iterator, can be quite efficiently marshaled into numpy arrays without having to preallocate. Pretty nifty!
Discovering this has led me to build some fun generators, especially given the fact that numpy has a record array (I used one in the previous example as well) which lets you build complex data types that are stored column-wise as contiguous arrays - perfect for sparse coordinate matrices.
Here’s another example of fromiter that comes up sometimes in distributed systems. Let’s say I have a bunch of arrays of the same length that are coming in as byte strings, and I want to stack them to build a matrix. I can do something like this:
# Even more fun
def array_gen(some_strings):
for s in some_strings:
yield numpy.fromstring(s, dtype=numpy.int)
# Let's say I know the length of all the arrays coming out of my generator
# and I want to build a matrix
K = 10
M = 5
stringified = numpy.arange(K).tostring()
str_gen = (stringified for i in xrange(M)) # This is just a generator object, hasn't done anything yet
m = numpy.fromiter(array_gen(str_gen), dtype=[('cool', numpy.int, K)])
"""
The array would look like this:
>>> m
array([([0, 1, 2, 3, 4, 5, 6, 7, 8, 9],),
([0, 1, 2, 3, 4, 5, 6, 7, 8, 9],),
([0, 1, 2, 3, 4, 5, 6, 7, 8, 9],),
([0, 1, 2, 3, 4, 5, 6, 7, 8, 9],),
([0, 1, 2, 3, 4, 5, 6, 7, 8, 9],)],
dtype=[('cool', ''<i8', 10)])
But you can also access the single matrix through the 'cool' index without making copies
>>> m['cool']
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
"""
Ok, that’s it for the night. There’s lots more to say about numpy but I’ll save that for another post.