Cython part 1
system:sage


<h1>Cython</h1>
<address>Jeroen Demeyer (Universiteit Gent)</address><address>&nbsp;</address><address>Disclaimer: I am not a Cython developer!</address>
<h2>What is Cython?</h2>
<ul>
<li>Cython (a fork of <a href="http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/">Pyrex</a>) is a Python $\rightarrow$ C compiler.</li>
<li>The syntax is mostly like Python, with certain Cython-specific constructs.</li>
<li>Cython supports essentially all of Python and most of C.</li>
<li>I like to think of Cython as the "union" of Python and C.</li>
<li>Cython gives the best of both worlds: the speed of C with the flexibility of Python.</li>
<li>Currently, my favourite programming language.</li>
</ul>
<h2>Uses of Cython in Sage</h2>
<p>Within Sage, these are the applications of Cython:</p>
<ul>
<li>Interfaces with C libraries, such MPIR, NTL, libGAP, PARI.</li>
<li>Speed up Python code using type declarations and <strong>cdef</strong> functions.</li>
<li>Low-level access to the operating system (signal handling, files, ...). Most of these could be done directly in Python using the <strong>os.foo()</strong> functions, but Cython allows more control (example: when working on sagenb, I found a bug in Python's tempfile module which most likely wouldn't have happened if it were written in Cython).</li>
</ul>
<h2>Getting started</h2>
<p>To quickly experiment with Cython, the easiest is to use the <strong>cython()</strong> function:</p>

{{{id=1|
%time
cython("""
print "Hello from Cython"
""")
///
}}}

<p>Note that this takes a long wall time, because this code needs to be compiled.</p>
<p>Let's implement trial division in Python:</p>

{{{id=6|
def trialdivision_py(n):
    i = Integer(2)
    while i*i <= n:
        if n % i == 0:
            return i
        i = i+1
///
}}}

<p>And now <em>exactly the same code</em> in Cython:</p>

{{{id=17|
cython("""
from sage.rings.integer import Integer
def trialdivision(n):
    i = Integer(2)
    while i*i <= n:
        if n % i == 0:
            return i
        i = i+1
""")
///
}}}

<p>Just compiling this code in Cython already gives a modest speed-up:</p>

{{{id=18|
n = next_prime(10^7)*next_prime(10^8)
///
}}}

{{{id=20|
%time
trialdivision_py(n)
///
}}}

{{{id=19|
%time
trialdivision(n)
///
}}}

<h2>Using C types</h2>

<p>The previous example was compiled in Cython, but was still using the Python API. We can speed up things massively using native C types:</p>

{{{id=21|
cython("""
def trialdivision(unsigned long long n):
    cdef unsigned long long i = 2
    while i*i <= n:
        if n % i == 0: 
            return i
        i = i+1
""")
///
}}}

{{{id=22|
%time
trialdivision(n)
///
}}}

<h2>Interrupts</h2>

<p>Unfortunately, this code cannot be interrupted with CTRL-C or <strong>alarm():</strong></p>

{{{id=26|
%time
try:
    alarm(0.1)
    trialdivision(next_prime(10^17))
except KeyboardInterrupt:
    pass
///
}}}

<p>To solve this problem, we add a <strong>sig_check()</strong> inside the loop (much more about this later!)</p>

{{{id=28|
cython("""
def trialdivision(unsigned long long n):
    cdef unsigned long long i = 2
    while i*i <= n:
        sig_check()
        if n % i == 0: 
            return i
        i = i+1
""")
///
}}}

{{{id=29|
%time
try:
    alarm(0.1)
    trialdivision(next_prime(10^17))
except KeyboardInterrupt:
    pass
///
}}}

<h2>Compiler directives</h2>
<p>Division by zero is caught as usual when executing this:</p>

{{{id=9|
cython("""
def trialdivision(unsigned long long n):
    cdef unsigned long long i = 0  # MISTAKE!
    while i*i <= n:
        sig_check()
        if n % i == 0: 
            return i
        i = i+1
""")
///
}}}

{{{id=35|
trialdivision(n)
///
}}}

<p>This check for division-by-zero (and also the different rounding when dividing negative numbers) causes some runtime overhead.<br /><br />To get pure C division, use the <strong>cdivision=True</strong> <a href="http://docs.cython.org/src/reference/compilation.html#compiler-directives">compiler directive</a>:</p>

{{{id=41|
cython("""
cimport cython
@cython.cdivision(True)
def trialdivision(unsigned long long n):
    cdef unsigned long long i = 0  # MISTAKE!
    while i*i <= n:
        sig_check()
        if n % i == 0: 
            return i
        i = i+1
""")
///
}}}

{{{id=38|
trialdivision(n)
///
}}}

<p>The default in Cython is <strong>cdivision=False</strong>, but it is <strong>cdivision=True</strong> for Sage source code.<br /><br />There are similar directives for list indexing: <strong>boundscheck</strong> and <strong>wraparound</strong>.</p>
<h2>More about type declarations</h2>
<p>The code with <strong>unsigned long long</strong> is limited to 64 bits:</p>

{{{id=43|
trialdivision(2^100-1)
///
}}}

<p>We will now write the trial division code using Sage's <strong>Integer</strong> type:</p>

{{{id=46|
cython("""
from sage.rings.integer cimport Integer
def trialdivision(Integer n):
    cdef Integer i = Integer(2)
    while i*i <= n:
        sig_check()
        if n % i == 0: 
            return i
        i = i+1
""")
///
}}}

<p>Just this change doesn't speed up anything:</p>

{{{id=48|
%time
trialdivision(n)
///
}}}

<p>We are no longer limited to 64 bits:</p>

{{{id=60|
trialdivision(2^100)
///
}}}

<p>Note the <strong>cimport</strong> statement. This statement looks at the file <a href="https://github.com/sagemath/sage/blob/develop/src/sage/rings/integer.pxd">src/sage/rings/integer.pxd</a>. That file contains declarations of the <strong>Integer</strong> type. It should contain all declarations of cdef, cpdef functions and attributes which are meant to be used by different modules.</p>

<p>Let us optimize the addition <strong>i = i+1</strong> by using the cdef method <a href="https://github.com/sagemath/sage/blob/develop/src/sage/structure/element.pxd#L93">_add_long</a>:</p>

{{{id=49|
cython("""
from sage.rings.integer cimport Integer
def trialdivision(Integer n):
    cdef Integer i = Integer(2)
    while i*i <= n:
        sig_check()
        if n % i == 0: 
            return i
        i = i._add_long(1)
""")
///
}}}

{{{id=50|
%time
trialdivision(n)
///
}}}

<p>Integers in Sage are implemented using the <a href="http://mpir.org/">MPIR</a> library (a fork of GMP). Note the cdef attribute <a href="https://github.com/sagemath/sage/blob/develop/src/sage/rings/integer.pxd#L8">cdef mpz_t value</a>. Here, <strong>mpz_t</strong> is some C type defined by MPIR. We optimize the divisibility check using the low-level MPIR function <strong>mpz_divisible_p</strong>. First, we need to <strong>cimport</strong> that function (I'm lazy so I just cimport everything from <a href="https://github.com/sagemath/sage/blob/develop/src/sage/libs/gmp/mpz.pxd#L95">sage.libs.gmp.mpz</a>).</p>

{{{id=55|
cython("""
from sage.libs.gmp.mpz cimport *
from sage.rings.integer cimport Integer
def trialdivision(Integer n):
    cdef Integer i = Integer(2)
    while i*i <= n:
        sig_check()
        if mpz_divisible_p(n.value, i.value):
            return i
        i = i._add_long(1)
""")
///
}}}

{{{id=56|
%time
trialdivision(n)
///
}}}

<ul>
<li>The types you can use are either Python built-in types (e.g. <strong>list</strong>), Cython extension types (declared with <strong>cdef class</strong>) or C types.</li>
<li>Note: <strong>cdef int</strong>, <strong>cdef long</strong> and <strong>cdef float</strong> always refer to C types, not to Python types!</li>
</ul>
<h2>Type casts</h2>
<p>The construction <strong>&lt;typename&gt;obj</strong> can be used to "cast" <strong>obj</strong> to <strong>typename</strong>. This is literally a cast like <strong>(typename)obj</strong> in C. There is also <strong>&lt;typename?&gt;obj</strong> which is a checked cast: it raises TypeError if obj is of the wrong type. Note the difference between</p>
<ul>
<li><strong>cdef typename x = obj</strong>: succeeds if isinstance(x, obj) or if obj is None</li>
<li><strong>cdef typename x = &lt;typename&gt;obj</strong>: no check, always succeeds</li>
<li><strong>cdef typename x = &lt;typename?&gt;obj</strong>: succeeds if isinstance(x, obj)</li>
<li><strong>cdef typename x = typename(obj)</strong>: not a cast, calls typename.__init__</li>
</ul>

{{{id=95|
cython("""
from sage.rings.integer cimport Integer

cpdef Integer nocast(x):
    return x
   
cpdef Integer cast(x):
    return <Integer>x
    
cpdef Integer checked_cast(x):
    return <Integer?>x
    
cpdef Integer convert(x):
    return Integer(x)
""")
///
}}}

{{{id=101|
nocast(5)
///
}}}

{{{id=102|
nocast(int(5))
///
}}}

{{{id=103|
nocast(None)
///
}}}

{{{id=100|
cast(int(5))
///
}}}

{{{id=107|
type(cast(int(5)))
///
}}}

{{{id=99|
checked_cast(5)
///
}}}

{{{id=105|
checked_cast(QQ(5))
///
}}}

{{{id=104|
checked_cast(None)
///
}}}

{{{id=106|
convert(5.0)
///
}}}

<h2>cdef and cpdef functions</h2>
<p>In Cython you can define Python functions the usual way using <strong>def</strong>. You can also define a <strong>cdef</strong> function, which is compiled as a C function without Python calling overhead. A <strong>cdef</strong> function cannot be imported, it must be cimported (and declared in the .pxd file). Within one Cython module, there is no difference between these two types of function. There is also <strong>cdef inline</strong> for inline functions.</p>
<p>There is also a <strong>cpdef</strong> function, which behaves like a <strong>cdef</strong> function with a Python wrapper. In the case of methods, a <strong>cpdef</strong> function can be overwritten by a <strong>def</strong> method (a <strong>cdef</strong> method only by a <strong>cdef</strong> method). That is why a class inheriting from <strong>Element</strong> can be implemented in Cython or Python and <strong>_mul_</strong> will "just work".</p>

{{{id=62|
cython("""
from sage.rings.integer cimport Integer

cdef inline bint is_divisible(n,d):
    return n % d == 0
    
def trialdivision(Integer n):
    cdef Integer i = Integer(2)
    while i*i <= n:
        sig_check()
        if is_divisible(n,i):
            return i
        i = i+1
""")
///
}}}

<p>Note the use of the special type <strong>bint</strong>. In a C context, it behaves exactly like <strong>int</strong>, but it gets converted to Python as True/False. For the example above, there is no difference, but using <strong>bint</strong> is more explicit about the meaning.</p>

{{{id=66|
trialdivision(n)
///
}}}

<h2>Extension types</h2>

<p>Cython allows you to define a <strong>class</strong> just like in Python. However, with <strong>cdef class</strong>, you can define a so-called "extension type". These are much faster than classes, but do not support all features of classes (like arbitrary attribute assignment):</p>

{{{id=67|
Integer.foo = 1
///
}}}

<p>Methods can be defined using <strong>def</strong>, <strong>cdef</strong> or <strong>cpdef</strong>. Attributes can be defined using <strong>cdef</strong> (Cython only), <strong>cdef public</strong> (Cython + Python), <strong>cdef readonly</strong> (Cython + read-only in Python):</p>

{{{id=74|
cython("""
cdef class MyType(object):
    cdef str mystr
    cdef public long myint
    cdef readonly list mylist
    
    def __init__(self, x, y, z):
        self.mystr = x
        self.myint = y
        self.mylist = z
        
    def __repr__(self):
        return "MyType({}, {}, {})".format(self.mystr, self.myint, self.mylist)
""")
///
}}}

{{{id=78|
x = MyType("Hello world", 42, range(10))
///
}}}

{{{id=91|
x
///
}}}

{{{id=80|
x.mystr
///
}}}

{{{id=85|
x.myint
///
}}}

{{{id=86|
x.myint = 43.0
///
}}}

{{{id=87|
x
///
}}}

{{{id=88|
type(x.myint)
///
}}}

{{{id=79|
x.mylist
///
}}}

{{{id=84|
x.mylist = []
///
}}}

<h2>Final notes</h2>

<ul>
<li>Do not use Cython for <em>everything</em>: Cython has compile-time overhead, requires manual interrupt handling, can be slower if used badly.</li>
<li>Do not try to learn Cython from the Sage source code. There are many horrible uses of Cython, mostly because Cython (before that, Pyrex and SageX) historically had a lot less features.</li>
<li>Cython has better Python 2 $\leftrightarrow$ Python 3 compatibility than Python itself. For example: <a href="https://www.python.org/dev/peps/pep-3102/">keyword-only arguments</a> and <strong>basestring</strong> are always supported.</li>
</ul>

{{{id=92|

///
}}}