Saturday, 7 April 2012

More on atof()

Some more background on atof(). I analyzing sequencing data. That means, I am processing files with billions of lines. I haven't yet mastered properly how to do all I need in a parallel manner, so some of my operations are linear. That's where it all starts. At first, I had programs written in perl+python to analyze the data. The problem was that each file took many hours to precess. The rate of processing (as measured by pv) was round 300kb/s. Simply running 'cat' on the file works at a rate of 100Mb/s. Quite a lot faster. I was trying to do all my processing at the rate of 'cat'. Here is where the problems start - other than the fact that I switched to flex/C/C++.... The data to be analyzed has lots of numbers, and so I read and write a lot of numbers, and every read/write operation on numbers takes a long time, basically because computers store data in binary, and I'm storing text files with a decimal representation of numbers.
There are 3 solutions to this problem:
  1. Store numbers in binary in the file. For that one needs to define a good format for storing binary data. That wuold be really the best approach.
  2. Write or find a good BCD (Binary-coded decimal) library. I know... sounds old. A bit like perl, I guess. But, as long as one doesn't to too complicated math, this will be almost as fast as 1.
  3. improve read/write of numbers.
How much was there to improve? With sprintf my programs were running at rates of 10-30 MB/s, or 3-10 times "too slow".
I went on a hunt for an efficient itoa, but didn't really find much.
In the end, I wrote the itoa mentioned in my last post. That itoa was fixed width, now I also added variable width.
Here is a table of the relative speeds of various implementations:

size of number my itoa my itoa_fill0 itoa1 ufast_itoa10 new_itoa sstream i32toa sprintf
10^1 1.00 2.25 1.27 1.10 1.81 6.04 9.98 18.64
10^2 1.00 2.21 1.33 1.09 1.91 6.07 9.91 18.62
10^3 1.00 2.45 1.80 1.41 3.01 6.68 12.16 22.06
10^4 1.00 2.33 2.25 1.38 2.47 6.24 12.06 21.12
10^5 1.00 1.33 1.80 2.76 4.46 4.18 7.98 14.48
10^6 1.00 1.12 1.87 2.27 3.85 3.52 6.86 12.76
10^7 1.00 1.21 2.27 2.97 4.75 3.79 7.90 14.05
10^8 1.00 1.27 2.78 3.07 4.74 4.00 6.72 15.08
10^9 1.23 1.00 2.70 3.85 5.11 3.49 6.08 13.73

It is easy to see what the problem is. sprintf is really slow. sstream is actually surprisingly fast.... atoi next. Then, just reading and writing from files....