While studying database design with my accompanying toy project, DuctileDB, I ran into some disturbing performance issues.

While implementing the log-structured storage the project needed, I found that reading was terribly slow, reaching only about 1/100 of the performance I expected. A quick look at the source code of FileInputStream reveals the issue: there is no buffering at all, so every call to read() goes directly to the operating system.

As is widely known (at least I expect it to be), file systems are based on blocks, and every file is stored split into these blocks. That is why a hard disk is also called a block device. Block devices are normally read through a buffer, and to get good performance, the buffer size needs to be at least the block size of the file system.
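To make the block size rule concrete, here is a small sketch that asks the file system for its block size and rounds a requested buffer size up to a whole number of blocks. It assumes Java 10 or later, where `FileStore.getBlockSize()` exists; the 512-byte fallback for file stores that do not report a block size and the class and method names are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BlockAlignedBufferSize {

    // Hypothetical fallback used when the file store does not report a block size.
    static final long FALLBACK_BLOCK_SIZE = 512;

    /** Queries the block size of the file store that holds the given path (Java 10+). */
    static long blockSizeOf(Path path) throws IOException {
        FileStore store = Files.getFileStore(path);
        try {
            return store.getBlockSize();
        } catch (UnsupportedOperationException e) {
            return FALLBACK_BLOCK_SIZE;
        }
    }

    /** Rounds the requested buffer size up to the next multiple of the block size (at least one block). */
    static int alignToBlockSize(int requested, long blockSize) {
        long blocks = Math.max(1, (requested + blockSize - 1) / blockSize);
        return Math.toIntExact(blocks * blockSize);
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : ".");
        long blockSize = blockSizeOf(path);
        int bufferSize = alignToBlockSize(8192, blockSize);
        System.out.println("block size: " + blockSize + " bytes, buffer size: " + bufferSize + " bytes");
    }
}
```

With a 512-byte block size, a requested 1000-byte buffer is rounded up to two blocks (1024 bytes), while the 8192-byte default of BufferedInputStream is already a multiple of 512 and stays unchanged.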

To my surprise, the read() method of FileInputStream reads every byte individually, with one call to the operating system per byte. After this finding, the poor performance was no longer a surprise.

My first test was simply to decorate the FileInputStream with a BufferedInputStream. The default buffer size of BufferedInputStream is 8192 bytes (8 kB). I checked, and on my systems the block size is only 512 bytes. The performance increase was astonishing.
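The decoration itself is a one-line change. The following sketch contrasts the two variants while reading a file byte by byte; the class and method names are made up for illustration, and the timing in `main` is only a rough comparison, not the measurement setup used below.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferedReadExample {

    /** Counts the bytes of a stream by calling read() once per byte. */
    static long countBytes(InputStream in) throws IOException {
        long count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    /** Plain FileInputStream: every read() call fetches a single byte from the operating system. */
    static long readUnbuffered(String fileName) throws IOException {
        try (InputStream in = new FileInputStream(fileName)) {
            return countBytes(in);
        }
    }

    /** Decorated stream: read() is served from an in-memory buffer (8192 bytes by default). */
    static long readBuffered(String fileName) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(fileName))) {
            return countBytes(in);
        }
    }

    public static void main(String[] args) throws IOException {
        String fileName = args[0];
        long start = System.nanoTime();
        long unbuffered = readUnbuffered(fileName);
        long middle = System.nanoTime();
        long buffered = readBuffered(fileName);
        long end = System.nanoTime();
        System.out.printf("unbuffered: %d bytes in %d ms%n", unbuffered, (middle - start) / 1_000_000);
        System.out.printf("buffered:   %d bytes in %d ms%n", buffered, (end - middle) / 1_000_000);
    }
}
```

Both variants read exactly the same bytes; only the number of system calls differs, which is where the time goes.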

As a trained physicist, I wanted to understand this better, so I started some measurements…

Measurement Results

The measurements took place on the systems I had at hand, using a simple JUnit test. The test is part of PureSol Technologies’ Streaming Library; the class is called SequentialReadPerformanceTest.

The test creates a simple 1 MB file and reads it byte by byte. Each measurement was repeated ten times, and simple statistics were calculated: the minimum, maximum and mean throughput, plus the standard deviation (sigma). The buffer size was varied from 0 bytes (a pure FileInputStream) up to 256 kB. In the charts, the 0-byte buffer is drawn as 0.1 kB, because a 0 cannot be shown on a logarithmic scale.
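The procedure can be sketched as a standalone program. This is not the SequentialReadPerformanceTest class itself, whose source is not reproduced here; the 1 MB file size (taken as 1024 * 1024 bytes), the throughput unit (MB/s) and the use of the population standard deviation for sigma are assumptions of this sketch.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

public class SequentialReadBenchmark {

    static final int FILE_SIZE = 1024 * 1024; // assumed: 1 MB test file
    static final int REPETITIONS = 10;
    static final int[] BUFFER_SIZES_KB = {0, 1, 2, 4, 8, 12, 16, 24, 32, 40, 48, 56, 64, 128, 256};

    /** Opens the file; a buffer size of 0 means a plain, unbuffered FileInputStream. */
    static InputStream open(Path file, int bufferSizeKb) throws IOException {
        InputStream in = new FileInputStream(file.toFile());
        return bufferSizeKb == 0 ? in : new BufferedInputStream(in, bufferSizeKb * 1024);
    }

    /** Reads the whole file byte by byte and returns the throughput in MB/s. */
    static double measureThroughput(Path file, int bufferSizeKb) throws IOException {
        long start = System.nanoTime();
        long count = 0;
        try (InputStream in = open(file, bufferSizeKb)) {
            while (in.read() != -1) {
                count++;
            }
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return count / (1024.0 * 1024.0) / seconds;
    }

    /** Returns {min, max, mean, sigma}, with sigma as the population standard deviation. */
    static double[] statistics(double[] values) {
        double min = Double.MAX_VALUE, max = -Double.MAX_VALUE, sum = 0;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        double mean = sum / values.length;
        double squares = 0;
        for (double v : values) {
            squares += (v - mean) * (v - mean);
        }
        return new double[] {min, max, mean, Math.sqrt(squares / values.length)};
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sequential-read", ".bin");
        try {
            byte[] data = new byte[FILE_SIZE];
            new Random(42).nextBytes(data);
            Files.write(file, data);
            System.out.println("BufferSize[kB]\tmin\tmax\tmean\tsigma");
            for (int sizeKb : BUFFER_SIZES_KB) {
                double[] throughputs = new double[REPETITIONS];
                for (int i = 0; i < REPETITIONS; i++) {
                    throughputs[i] = measureThroughput(file, sizeKb);
                }
                double[] s = statistics(throughputs);
                System.out.printf("%d\t%.2f\t%.2f\t%.2f\t%.2f%n", sizeKb, s[0], s[1], s[2], s[3]);
            }
        } finally {
            Files.delete(file);
        }
    }
}
```

A naive loop like this does not control for JIT warm-up or the operating system's page cache, so absolute numbers will vary between runs; it is meant to show the shape of the experiment rather than to reproduce the tables below.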

I had three systems at hand:

Windows 10 on SSD:

Buffer size [kB]	Min. throughput	Max. throughput	Mean throughput	Std. deviation (sigma)
0	0.478313481960232	0.560977316112137	0.53931803772942	0.026823943405206
1	20.7921956295732	159.661168284835	106.530929541767	41.5452866636502
2	135.890172930229	199.804763831118	159.301353868548	21.9993284291508
4	114.080929031247	185.829790831254	162.480645695998	20.4335777478615
8	119.472762811058	207.246561853822	173.82688161485	22.5238524242766
12	152.112263334608	212.048153787975	174.07568558316	18.3455536745854
16	160.007641966543	206.408535265054	177.837585133614	15.110967892046
24	122.760600929588	200.196611853492	174.685818018924	22.1811746856282
32	130.191143000353	205.116448207889	175.602014380743	23.4884718567785
40	189.830287548959	238.60181506275	217.714568123906	14.6494451455874
48	179.301964860046	257.090941811432	211.133906654611	24.3405587930466
56	180.374229186133	253.724040842617	220.05957106526	22.9934585645487
64	169.576206649008	252.181203292139	214.61583945847	22.28789046186
128	172.901240725885	242.260583721361	214.784464231356	22.0707625968981
256	158.073193513394	235.072835911773	198.832874007776	28.4353501136663

This is my development laptop with Windows 10 installed on an SSD. The file system is NTFS with a block size of 512 bytes.

VirtualBox on SSD:

Buffer size [kB]	Min. throughput	Max. throughput	Mean throughput	Std. deviation (sigma)
0	3.30897090085418	3.4685246511367	3.37156905146268	0.058287703014056
1	45.8442138958278	204.489511433226	149.196065250552	54.546756256114
2	169.028569641574	213.338498388938	191.078173076755	16.1087752022564
4	175.832342723626	220.551930904206	204.223397588209	14.7572527618542
8	167.40312316305	212.853555391402	200.302253142487	14.2208722932322
12	174.043085529545	221.548785267493	200.891618472135	15.2731165631726
16	178.626872017884	223.56519816547	204.200373947012	15.2758384736513
24	181.218954353416	225.566526761738	206.516876975593	13.137885697191
32	223.296922708024	265.152151154852	245.022329558141	13.661324942683
40	216.037271505894	256.377367956888	240.406641413718	13.4568412728096
48	194.605412997549	258.278920524087	238.66016565359	18.617845389739
56	218.857124728457	268.919769748526	241.338112649669	16.2068776298014
64	215.220983192018	262.003958883976	240.601463050768	14.6838561989163
128	207.162624999802	258.216081521909	237.674988725484	14.2341770113763
256	202.084528616352	262.459082128095	234.696119688756	19.2935202838983

This VirtualBox instance runs on the Windows 10 system described above, with Ubuntu 17.10 installed as the guest system. The file system is btrfs with a block size of 512 bytes.

KVM on magnetic disks:

Buffer size [kB]	Min. throughput	Max. throughput	Mean throughput	Std. deviation (sigma)
0	3.50369275983678	3.64641950704795	3.59516953215679	0.042635735226103
1	41.0512347657029	256.14289221404	210.779922949005	78.5887297940257
2	232.373524058612	285.055321462553	263.672741286731	12.7495562706978
4	255.566173924001	272.880340559129	264.701869296095	5.75897219156574
8	256.293085158907	278.836772514138	268.961324482104	6.65276931366453
12	255.882054365566	276.077129595719	267.372542695553	7.72968634110117
16	258.613027035372	275.235002420108	269.684282122623	5.18769137033597
24	171.856825532156	276.072041550241	251.919547286073	27.995974992746
32	252.29570040848	270.142097067889	264.173763360173	5.0979184809125
40	289.992165109599	304.345453637775	295.727761674292	3.72938830994006
48	290.484623991654	299.758266839562	295.794604575198	2.85277669207739
56	291.98249401528	305.188024108252	297.294492472613	3.86242932963834
64	292.089367018439	299.632351890619	294.804240737189	2.2082451746197
128	255.519590730416	302.032878151745	294.733138187711	13.2476788765596
256	285.858694354284	302.92348363624	295.621763218591	5.44402902250886

This is an Ubuntu 16.04 guest inside KVM, running on another Ubuntu 16.04 host. The host has magnetic disks in a RAID1 software RAID with a btrfs file system with a block size of 512 bytes. The guest has an XFS file system with a block size of 512 bytes.

Summary and Recommendations

There are several findings in the results above:

A FileInputStream without any additional buffer performs terribly. Astonishingly, the best mean throughput is about two orders of magnitude higher than that of a pure FileInputStream: roughly 73 to 83 times higher on the virtual machines and roughly 400 times higher on Windows 10. At least for me, this is surprising.
In the virtual environments, the unbuffered FileInputStream is not as slow as on the ‘bare metal’ Windows 10 (a mean throughput of about 3.4 to 3.6 compared to about 0.5). My guess is that this is related to additional buffering between the host and guest environments. Further investigation would help here; I am not sure whether I will be able to do that in the future.
Throughput levels off at buffer sizes of about 32 kB to 40 kB; larger buffers bring no further gain. As the block size is only 512 bytes, this surprises me as well, and at the moment I do not have a conclusive explanation for it.
The default BufferedInputStream buffer size of 8 kB already provides a good performance gain, reaching roughly 80% to 90% of the best mean throughput.
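The ratios quoted in these findings follow directly from the mean throughput column of the three tables. This small sketch recomputes them from the 0 kB row, the 8 kB row and the row with the highest mean for each system; the class name is illustrative only.

```java
public class SpeedUpFromTables {

    /** Ratio between a buffered mean throughput and the unbuffered (0 kB) mean throughput. */
    static double speedUp(double unbufferedMean, double bufferedMean) {
        return bufferedMean / unbufferedMean;
    }

    public static void main(String[] args) {
        // Mean throughputs copied from the tables: 0 kB row, 8 kB row and best row per system.
        String[] systems = {"Windows 10 on SSD", "VirtualBox on SSD", "KVM on magnetic disks"};
        double[] unbuffered = {0.53931803772942, 3.37156905146268, 3.59516953215679};
        double[] default8k = {173.82688161485, 200.302253142487, 268.961324482104};
        double[] best = {220.05957106526, 245.022329558141, 297.294492472613};
        for (int i = 0; i < systems.length; i++) {
            System.out.printf("%s: best buffer is %.0fx faster, 8 kB default reaches %.0f%% of the best%n",
                    systems[i], speedUp(unbuffered[i], best[i]), 100 * default8k[i] / best[i]);
        }
    }
}
```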

These observations lead to the following recommendations:

Whenever you read from a file in Java, use a BufferedInputStream to speed up reading dramatically, especially when reading byte by byte or in small pieces.
If the optimal buffer size is not known, the default buffer size of BufferedInputStream is a reasonable choice.
I did not measure this, but from experience I know that a BufferedOutputStream around a FileOutputStream also improves performance considerably. Lacking better data, it is reasonable to assume similar speed-ups. Therefore, wrap every FileOutputStream in a BufferedOutputStream decorator.
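Put together, these recommendations look like this in a byte-by-byte file copy. The sketch uses an explicit 32 kB input buffer, based on where the measured throughput leveled off, and the default buffer size for the output side; the output side is untested here, in line with the note above, and the class name is hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedCopy {

    // Explicit input buffer size, chosen from the measurements (throughput leveled off around 32 kB to 40 kB).
    static final int INPUT_BUFFER_SIZE = 32 * 1024;

    /** Copies a file byte by byte with both streams decorated by a buffer; returns the number of bytes. */
    static long copy(String source, String target) throws IOException {
        long count = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(source), INPUT_BUFFER_SIZE);
             OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) { // default buffer size
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
                count++;
            }
        } // closing the BufferedOutputStream flushes the bytes still held in its buffer
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("copied " + copy(args[0], args[1]) + " bytes");
    }
}
```

Because try-with-resources closes the output stream automatically, the last partially filled buffer is not lost; with a manually managed BufferedOutputStream, forgetting flush() or close() is a common source of truncated files.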

If I find the time, I will also measure how FileOutputStream behaves.

Based on these findings and recommendations, I will develop a special FileOutputStream in PureSol Technologies’ Streaming Library. Its buffer size will be configurable via a system property. I am also considering a mechanism that automatically determines the best buffer size during static initialization to maximize performance.
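A buffer size configured through a system property could look roughly like the sketch below. The property name `streaming.buffer.size`, the class name and the fallback rules are hypothetical; the actual design in the Streaming Library may differ, and the automatic tuning mentioned above is not shown.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ConfigurableBufferSize {

    // Hypothetical property name, e.g. -Dstreaming.buffer.size=32768
    static final String PROPERTY = "streaming.buffer.size";
    static final int DEFAULT_BUFFER_SIZE = 8192; // same as the BufferedInputStream default

    // Evaluated once during static initialization of the class.
    static final int BUFFER_SIZE = parseBufferSize(System.getProperty(PROPERTY));

    /** Parses a buffer size in bytes, falling back to the default for missing or invalid values. */
    static int parseBufferSize(String value) {
        if (value == null) {
            return DEFAULT_BUFFER_SIZE;
        }
        try {
            int size = Integer.parseInt(value.trim());
            return size > 0 ? size : DEFAULT_BUFFER_SIZE;
        } catch (NumberFormatException e) {
            return DEFAULT_BUFFER_SIZE;
        }
    }

    /** Opens a file for reading with the configured buffer size. */
    static InputStream openForReading(String fileName) throws IOException {
        return new BufferedInputStream(new FileInputStream(fileName), BUFFER_SIZE);
    }

    /** Opens a file for writing with the configured buffer size. */
    static OutputStream openForWriting(String fileName) throws IOException {
        return new BufferedOutputStream(new FileOutputStream(fileName), BUFFER_SIZE);
    }
}
```

Falling back to the default instead of failing keeps a mistyped property from breaking the application, at the price of silently ignoring the setting; logging a warning would be a reasonable addition.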

So check back on the library and this blog from time to time for more information on this topic and news on the library’s progress.

Rick-Rainer-Ludwig.com

Java: Performance issues with FileInputStream
