Improving Haskell’s big numbers support
Work by IOHK engineers is part of the latest release of the Glasgow compiler
28 July 2020 Sylvain Henry 9 mins read
Recently we have been working on improving GHC’s support for big numbers, ie, numbers larger than 64-bit (both Integer and Natural types in Haskell). We have developed an implementation of big number operations in Haskell that is faster than the previous one (
integer-simple). We have also improved the way GHC deals with the different implementations to make it more robust and evolutive. These contributions are part of the latest GHC release, version 9.0.
Every Haskell compiler has to support arbitrary-precision integers (see Haskell 98 and Haskell 2010 reports, section 6.4). GHC is no exception and as such it provides the ubiquitous Integer and Natural types (‘big numbers’ for the rest of this post).
Until now, GHC could use one of two packages for this support:
integer-gmp: use GNU MP library (GMP) (via FFI), LGPL. Good performance.
integer-simple: Haskell implementation, BSD3. Bad performance.
Choosing one or the other depended on license, performance and cross-compilation considerations. In some cases, GMP’s LGPL license can be problematic, especially if you use static linking, which is required on some platforms, including Windows. When it comes to performance:
integer-simple is sometimes several orders of magnitude slower than
The situation was already unfortunate but there were additional issues:
- Each implementation had its own way of representing big numbers (array of unboxed words or list of boxed words). GHC was aware of the selected implementation and produced different code for each. It could lead to bugs – even ‘insanity’! – when big numbers were exchanged between processes compiled with different implementations. Moreover, it made the compiler code more complicated because it had to deal with this discrepancy (eg, when inspecting heap objects at runtime in GHCi).
- Similarly, because of the different internal representations, there are at least 71 packages on Hackage (among them some widely used ones such as bytestring and text) that explicitly depend on
integer-gmppackage or that provide a flag to depend either on
integer-simple. It is a maintenance burden because each code path could have specific bugs and should be tested on CI.
All this meant that every new big number implementation was a daunting task. First, the interface to implement was very large (Integer and Natural types and all their operations). Then, we needed to ensure that GHC’s rewrite rules (constant folding) still worked for the implementation, the packages mentioned above needed to be fixed, and new Cabal flags added (by the way, Cabal flags are Boolean so they can’t be easily extended to support more than two options). Finally, GHC’s build system needed to be modified. No wonder it never happened.
Fortunately, most of these issues are now fixed in the latest release, GHC 9.0.
The ghc-bignum package
Starting from GHC 9.0, big numbers support is provided by a single package:
ghc-bignum. This provides a Haskell implementation of big numbers (
native-backend) that is faster than
integer-simple’s (performance figures are given below), is also BSD3-licensed, and uses the same representation of big numbers as
Now the different big numbers implementations are considered as internal backends of
ghc-bignum library and there should be no observable difference between backends, except for performance. To enforce this, we even have a meta-backend used during tests that performs every operation with two backends (
native-backend and another selected one) and checks that their results are the same.
A pure Haskell implementation can't really expect to beat the performance of heavily optimized libraries such as GMP, hence
integer-gmp has been integrated into
ghc-bignum as a backend (
Adding a big numbers backend is now much easier. The interface to implement is minimal and is documented. A new backend doesn’t have to provide the whole implementation up front: operations provided by
native-backend can be used to fill the holes while another backend is being developed. The test framework doesn’t have to be rewritten for each backend and results can be checked automatically against
native-backend with the meta-backend mentioned above. We hope backends using other libraries, such as OpenSSL libcrypto integers or BSDNT, will be developed in the near future.
ghc-bignum package also has a third
ffi-backend that doesn’t provide an implementation per se but performs FFI calls for each operation. So
A major advantage of
ghc-bignum is that all the backends use the same representation for big numbers: an array of words stored in little-endian order. This representation is also used by most big numbers libraries. Formerly,
integer-simple was a notorious exception because it used a Haskell list to store the words, partly explaining its poor performance. Now, any package wanting to access the representation of the big numbers just has to depend on
ghc-bignum. Cabal flags and CPP code are no longer required to deal with the different implementations. However, conditional code may be needed during the transition from
integer-* packages to
To make the transition easier, the
integer-gmp-1.1 package is still provided but it has been rewritten to depend on
ghc-bignum and to provide some backward-compatible functions and pattern synonyms. Note, however, that some functions that were only available in
integer-gmp-1.0.* (eg, prime number test, extended GCD) have been removed in
integer-gmp-1.1. We expect these very specific functions to be exported by packages such as
hgmp instead. Alternatively, someone could implement Haskell versions of these functions into
GHC code has been simplified and made faster. Big numbers types and constructors are now known to the compiler (‘wired-in’), in the same way as other literals, so GHC doesn’t have to read interface files each time it wants to generate code using them. The unified representation avoids any need for two code paths, which makes the code more robust and easier to test.
We have compared the performance of
native-backend against latest
integer-gmp implementations (Figure 1). We measured the time necessary to compute basic operations:
Platform was Linux 5.5.4 on Intel Core i7-9700K CPU running at 3.60GHz.
The three GHC bindists have been built with Hadrian using ‘perf’ flavor.
integer-simple bindists are built from commit 84dd96105.
native-backend bindist is built from
ghc-bignum branch rebased on commit 9d094111.
Computations have been performed with positive integers of the following sizes: small – 1 word (64-bit); medium – 8 words (512-bit); big – 76 words (4,848-bit).
Figure 1. Native-backend and integer-gmp are faster than integer-simple in almost all cases (note the logarithmic scale)
Figure 1 shows that
integer-gmp are always faster than
integer-simple. The only exceptions are when we add or subtract a small number (1 word) from a big one as these operations are particularly well suited for
integer-simple’s bignum representation (a list) because the tail of the list remains unchanged and is shared between the big numbers before and after the operation. With the other representation, the tail has to be duplicated in memory.
integer-simple performs so badly that
native-backend is 40 times faster in all the tested cases.
native-backend is faster than
integer-gmp (eg, addition/subtraction with a small/medium number). GMP library is probably at least as good as
native-backend but the latter avoids FFI calls, which may explain the better performance. Otherwise,
native-backend is still slower than
integer-gmp but these results are expected because it only implements basic algorithms and it doesn’t use vectorised or optimised processor instructions.
When it comes to GHC performance tests, we took our baseline as GHC HEAD compiled with
integer-gmp. We compare the results with
gmp-backend. There are no regressions. Noticeable improvements in memory use are seen in Table 1.
Next, we compared metrics obtained with
native-backend to those obtained with GHC HEAD built with
integer-simple. The new Haskell implementation results in noticeable changes (Table 2). Note that the first four tests were disabled with
integer-simple because they took too long or failed to complete (heap overflow) but they all passed with
native-backend. Also, T10678, the final test in the table, performs a lot of additions of a small number to a big number. As we saw above, this is the only case for which
integer-simple representation is better: in most iterations only the head of the list is modified without duplicating the tail. It is reflected again in this result.
Finally, we compared
gmp-backend: it tells us how far our Haskell implementation is from our best backend. Noticeable changes are reported in Table 3.
We are very proud to have made this contribution to GHC. IOHK and the whole community now benefits from the improved performance and robustness of the compiler. Programs that use the new Haskell implementation of big numbers (
native-backend) should get a performance boost when they switch from
We are also eager to see what the community will do with this work. In particular, it should now be easier to create backends based on other libraries. There is also a lot of room for optimization in
native-backend. We also encourage everyone to test this new implementation and to report any issue on GHC’s bug tracker.