While studying database design with my accompanying toy project DuctileDB, I ran into some disturbing performance issues.
While implementing the log-structured storage the project needs, I found that reading was terribly slow, at only about 1/100 of the performance I expected. A simple check of the provided source code of FileInputStream reveals the issue: There is absolutely no optimization included.
As is widely known (at least I expect it to be widely known), file systems are based on blocks, and all files are stored cut into these blocks. That is why a hard disk is also called a block device. Reads from block devices are normally buffered. To get good performance, the buffer size needs to be at least the block size of the file system.
Surprisingly, at least for me, FileInputStream fetches just a single byte from the file on every call of the read() method, without any buffering. After this finding, the poor performance was no surprise anymore.
The first test was to simply decorate the FileInputStream with a BufferedInputStream. The default buffer size of BufferedInputStream is 8192 bytes, or 8 kB. I checked, and on my systems the block size is only 512 bytes. The performance increase was astonishing.
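To make the decoration concrete, here is a minimal sketch of both variants. The class and method names are made up for illustration and are not taken from DuctileDB or the Streaming Library; a buffer size of 0 is used here to mean "plain FileInputStream".

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example class, not part of any library mentioned in this post.
final class BufferedReadExample {

    // Reads the file byte by byte and returns the number of bytes read.
    // bufferSize == 0 reads from the plain FileInputStream,
    // any positive value adds a BufferedInputStream decorator of that size.
    static long countBytes(Path file, int bufferSize) throws IOException {
        InputStream raw = new FileInputStream(file.toFile());
        try (InputStream in = bufferSize > 0 ? new BufferedInputStream(raw, bufferSize) : raw) {
            long count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("buffered-read", ".bin");
        try {
            Files.write(file, new byte[64 * 1024]);
            System.out.println("unbuffered: " + countBytes(file, 0) + " bytes");
            System.out.println("buffered:   " + countBytes(file, 8192) + " bytes");
        } finally {
            Files.delete(file);
        }
    }
}
```

The calling code is identical in both cases; only the construction of the stream differs, which is what makes the decorator such a cheap fix.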
As a trained physicist, I wanted to know more precisely. So I started some measurements…
Measurement Results
The measurements took place on the systems I had at hand. The measurements were performed with a simple JUnit test. The test is part of PureSol Technologies’ Streaming Library. The class is called SequentialReadPerformanceTest.
The test creates a simple 1 MB file and reads the file byte by byte. The measurement was performed ten times, and simple statistics were calculated: the minimum, the maximum, the mean and the standard deviation (sigma) of the throughput. For the reading, the buffer size is varied from 0 bytes (a pure FileInputStream) up to 256 kB. In the charts, the 0 bytes buffer is drawn at 0.1 kB, because a value of 0 cannot be shown on a logarithmic scale.
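The procedure can be sketched roughly as follows. This is not the code of SequentialReadPerformanceTest, only an illustration of the idea: all names are hypothetical, the throughput is reported in MiB/s and sigma is computed as the sample standard deviation, both of which are assumptions of mine for this sketch.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the measurement, not the original test class.
final class SequentialReadSketch {

    static final int FILE_SIZE = 1024 * 1024; // 1 MB test file
    static final int RUNS = 10;               // ten measurements per buffer size

    // Reads the whole file byte by byte and returns the throughput in MiB/s (assumed unit).
    static double throughput(Path file, int bufferSize) throws IOException {
        long start = System.nanoTime();
        InputStream raw = new FileInputStream(file.toFile());
        try (InputStream in = bufferSize > 0 ? new BufferedInputStream(raw, bufferSize) : raw) {
            long count = 0;
            while (in.read() != -1) {
                count++;
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            return count / (1024.0 * 1024.0) / seconds;
        }
    }

    // Returns {min, max, mean, sigma}; sigma is the sample standard deviation.
    static double[] statistics(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        double mean = sum / values.length;
        double squares = 0.0;
        for (double v : values) {
            squares += (v - mean) * (v - mean);
        }
        double sigma = values.length > 1 ? Math.sqrt(squares / (values.length - 1)) : 0.0;
        return new double[] { min, max, mean, sigma };
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sequential-read", ".bin");
        try {
            Files.write(file, new byte[FILE_SIZE]);
            int[] bufferSizesKb = { 0, 1, 2, 4, 8, 12, 16, 24, 32, 40, 48, 56, 64, 128, 256 };
            System.out.println("BufferSize | min | max | mean | sigma");
            for (int kb : bufferSizesKb) {
                double[] results = new double[RUNS];
                for (int run = 0; run < RUNS; run++) {
                    results[run] = throughput(file, kb * 1024);
                }
                double[] s = statistics(results);
                System.out.println(kb + " | " + s[0] + " | " + s[1] + " | " + s[2] + " | " + s[3]);
            }
        } finally {
            Files.delete(file);
        }
    }
}
```

Note that a freshly written file is most likely served from the operating system's cache, so a sketch like this mainly measures the per-call overhead of read() and not the raw speed of the disk.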
I had three systems at hand:
Windows 10 on SSD:
BufferSize (kB) | minThroughput | maxThroughput | meanThroughput | sigmaThroughput |
0 | 0.478313481960232 | 0.560977316112137 | 0.53931803772942 | 0.026823943405206 |
1 | 20.7921956295732 | 159.661168284835 | 106.530929541767 | 41.5452866636502 |
2 | 135.890172930229 | 199.804763831118 | 159.301353868548 | 21.9993284291508 |
4 | 114.080929031247 | 185.829790831254 | 162.480645695998 | 20.4335777478615 |
8 | 119.472762811058 | 207.246561853822 | 173.82688161485 | 22.5238524242766 |
12 | 152.112263334608 | 212.048153787975 | 174.07568558316 | 18.3455536745854 |
16 | 160.007641966543 | 206.408535265054 | 177.837585133614 | 15.110967892046 |
24 | 122.760600929588 | 200.196611853492 | 174.685818018924 | 22.1811746856282 |
32 | 130.191143000353 | 205.116448207889 | 175.602014380743 | 23.4884718567785 |
40 | 189.830287548959 | 238.60181506275 | 217.714568123906 | 14.6494451455874 |
48 | 179.301964860046 | 257.090941811432 | 211.133906654611 | 24.3405587930466 |
56 | 180.374229186133 | 253.724040842617 | 220.05957106526 | 22.9934585645487 |
64 | 169.576206649008 | 252.181203292139 | 214.61583945847 | 22.28789046186 |
128 | 172.901240725885 | 242.260583721361 | 214.784464231356 | 22.0707625968981 |
256 | 158.073193513394 | 235.072835911773 | 198.832874007776 | 28.4353501136663 |
This is my development laptop with Windows 10 installed on an SSD. The file system is NTFS with a block size of 512 bytes.
VirtualBox on SSD:
BufferSize (kB) | minThroughput | maxThroughput | meanThroughput | sigmaThroughput |
0 | 3.30897090085418 | 3.4685246511367 | 3.37156905146268 | 0.058287703014056 |
1 | 45.8442138958278 | 204.489511433226 | 149.196065250552 | 54.546756256114 |
2 | 169.028569641574 | 213.338498388938 | 191.078173076755 | 16.1087752022564 |
4 | 175.832342723626 | 220.551930904206 | 204.223397588209 | 14.7572527618542 |
8 | 167.40312316305 | 212.853555391402 | 200.302253142487 | 14.2208722932322 |
12 | 174.043085529545 | 221.548785267493 | 200.891618472135 | 15.2731165631726 |
16 | 178.626872017884 | 223.56519816547 | 204.200373947012 | 15.2758384736513 |
24 | 181.218954353416 | 225.566526761738 | 206.516876975593 | 13.137885697191 |
32 | 223.296922708024 | 265.152151154852 | 245.022329558141 | 13.661324942683 |
40 | 216.037271505894 | 256.377367956888 | 240.406641413718 | 13.4568412728096 |
48 | 194.605412997549 | 258.278920524087 | 238.66016565359 | 18.617845389739 |
56 | 218.857124728457 | 268.919769748526 | 241.338112649669 | 16.2068776298014 |
64 | 215.220983192018 | 262.003958883976 | 240.601463050768 | 14.6838561989163 |
128 | 207.162624999802 | 258.216081521909 | 237.674988725484 | 14.2341770113763 |
256 | 202.084528616352 | 262.459082128095 | 234.696119688756 | 19.2935202838983 |
This VirtualBox runs on the Windows 10 system mentioned above. The guest system is Ubuntu 17.10. The file system is btrfs with a block size of 512 bytes.
KVM on magnetic disks (RAID1):
BufferSize (kB) | minThroughput | maxThroughput | meanThroughput | sigmaThroughput |
0 | 3.50369275983678 | 3.64641950704795 | 3.59516953215679 | 0.042635735226103 |
1 | 41.0512347657029 | 256.14289221404 | 210.779922949005 | 78.5887297940257 |
2 | 232.373524058612 | 285.055321462553 | 263.672741286731 | 12.7495562706978 |
4 | 255.566173924001 | 272.880340559129 | 264.701869296095 | 5.75897219156574 |
8 | 256.293085158907 | 278.836772514138 | 268.961324482104 | 6.65276931366453 |
12 | 255.882054365566 | 276.077129595719 | 267.372542695553 | 7.72968634110117 |
16 | 258.613027035372 | 275.235002420108 | 269.684282122623 | 5.18769137033597 |
24 | 171.856825532156 | 276.072041550241 | 251.919547286073 | 27.995974992746 |
32 | 252.29570040848 | 270.142097067889 | 264.173763360173 | 5.0979184809125 |
40 | 289.992165109599 | 304.345453637775 | 295.727761674292 | 3.72938830994006 |
48 | 290.484623991654 | 299.758266839562 | 295.794604575198 | 2.85277669207739 |
56 | 291.98249401528 | 305.188024108252 | 297.294492472613 | 3.86242932963834 |
64 | 292.089367018439 | 299.632351890619 | 294.804240737189 | 2.2082451746197 |
128 | 255.519590730416 | 302.032878151745 | 294.733138187711 | 13.2476788765596 |
256 | 285.858694354284 | 302.92348363624 | 295.621763218591 | 5.44402902250886 |
This is an Ubuntu 16.04 guest inside KVM, running on another Ubuntu 16.04. The host has magnetic disks running in a RAID1 software RAID with a btrfs file system with a block size of 512 bytes. The guest has an XFS file system with a block size of 512 bytes.
Summary and Recommendations
There are several findings in the measurements above:
- A FileInputStream without any additional buffer provides terrible performance. Astonishingly, the difference between the mean throughput of a pure FileInputStream and the best mean throughput is about two orders of magnitude: a factor of roughly 70 to 80 on the two virtual machines and roughly 400 on Windows 10. For me, at least, this is surprising.
- On the virtual environments, the unbuffered FileInputStream is not as slow as on the ‘bare metal’ Windows 10 (a mean throughput of about 3.4 and 3.6 versus about 0.5). I guess this is related to additional buffering between the host and guest environment. More investigation would help here, but I am not sure whether I can do that in the future.
- The maximum performance is reached with buffers of 32 kB or 40 kB; larger buffers bring no further gain. As the block size is only 512 bytes, this is surprising to me as well. At the moment, I do not have a conclusive explanation for that.
- The default buffer size of BufferedInputStream, 8 kB, already provides a good performance gain.
These observations lead to some recommendations of mine:
- Whenever you read from a file in Java, always use a BufferedInputStream to dramatically speed up reading.
- As long as the best buffer size is not known, the default buffer size of BufferedInputStream can be used.
- I did not do measurements here, but I know from experience that a BufferedOutputStream around a FileOutputStream also provides much better performance. Without better knowledge, we can assume that the throughput speed-ups are similar. Therefore, put a BufferedOutputStream decorator around every FileOutputStream.
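For the writing side, the decoration looks the same as for reading. The following is only a minimal sketch with made-up names; it shows the pattern and makes no claim about the actual speed-up, which I have not measured.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example class, not part of any library mentioned in this post.
final class BufferedWriteExample {

    // Writes 'count' bytes one at a time.
    // bufferSize == 0 writes to the plain FileOutputStream,
    // any positive value adds a BufferedOutputStream decorator of that size.
    static void writeBytes(Path file, int count, int bufferSize) throws IOException {
        OutputStream raw = new FileOutputStream(file.toFile());
        try (OutputStream out = bufferSize > 0 ? new BufferedOutputStream(raw, bufferSize) : raw) {
            for (int i = 0; i < count; i++) {
                out.write(i & 0xFF);
            }
        } // closing the decorator flushes the buffer and closes the file
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("buffered-write", ".bin");
        try {
            writeBytes(file, 64 * 1024, 8192);
            System.out.println("written: " + Files.size(file) + " bytes");
        } finally {
            Files.delete(file);
        }
    }
}
```

One detail to keep in mind: with a BufferedOutputStream, data only reaches the file when the buffer is full, on flush() or on close(), so the stream should always be closed properly, for example with try-with-resources as above.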
If I find some time, I will also try to find out what the correct behavior of FileOutputStream is.
Based on the findings and recommendations, I will develop a special FileOutputStream in PureSol Technologies’ Streaming Library. The buffer size will be configurable via a system property. Additionally, I am thinking about a mechanism that automatically finds the best buffer size during static initialization to maximize performance.
So, have a look at the library and this blog from time to time to find more information on this topic and news on the progress of the library.
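How a buffer size configured via a system property could look is sketched below. This is not the library's implementation, which does not exist yet: the class name, the property name example.stream.bufferSize and the fallback to 8192 bytes are all assumptions chosen for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Path;

// Hypothetical sketch, not the API of the Streaming Library.
final class ConfigurableBufferedStreams {

    // Hypothetical property name, e.g. -Dexample.stream.bufferSize=32768
    static final String BUFFER_SIZE_PROPERTY = "example.stream.bufferSize";

    // Fallback: the default buffer size of BufferedInputStream.
    static final int DEFAULT_BUFFER_SIZE = 8192;

    // Reads the buffer size from the system property.
    // Missing, unparsable or non-positive values fall back to the default.
    static int configuredBufferSize() {
        int size = Integer.getInteger(BUFFER_SIZE_PROPERTY, DEFAULT_BUFFER_SIZE);
        return size > 0 ? size : DEFAULT_BUFFER_SIZE;
    }

    static InputStream openInput(Path file) throws IOException {
        return new BufferedInputStream(new FileInputStream(file.toFile()), configuredBufferSize());
    }

    static OutputStream openOutput(Path file) throws IOException {
        return new BufferedOutputStream(new FileOutputStream(file.toFile()), configuredBufferSize());
    }
}
```

Reading the property on every open keeps the sketch simple. A real implementation would more likely read it once during static initialization, which is also the place where an automatic search for the best buffer size could run.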