Eighth MPEG-4 AVC/H.264 Video Codecs Comparison - Standard Version
MSU Graphics & Media Lab (Video Group)
Video group head: Dr. Dmitriy Vatolin
Project head: Dr. Dmitriy Kulikov
Measurements, analysis: Marat Arsaev
It now contains appendixes with a GPU encoders comparison and a Very High Speed Encoders comparison.
Different Versions of Report
There are two different versions of the H.264 Comparison 2012 report. Here is the comparison of the versions:
The Pro version of the comparison will be available immediately after purchase of the report.
Video Codecs that Were Tested
|Sequence|Number of frames|Frame rate|Resolution|
|---|---|---|---|
|VideoConference (5 sequences)| | | |
|Movies (10 SD sequences)| | | |
|HDTV sequences (16 sequences)| | | |
Objectives and Testing Tools
H.264 Codec Testing Objectives
The main goal of this report is the presentation of a comparative evaluation of the quality of new H.264 codecs using objective measures of assessment. The comparison was done using settings provided by the developers of each codec. The main task of the comparison is to analyze different H.264 encoders for the task of transcoding video—e.g., compressing video for personal use. Speed requirements are given for a sufficiently fast PC; fast presets are analogous to real-time encoding for a typical home-use PC.
H.264 Codec Testing Tools
Overall Conclusions
Overall, the leader in this comparison for software encoders is x264, followed by MainConcept, DivX H.264 and Elecard.
The overall ranking of the software codecs tested in this comparison is as follows:
- DivX H.264
- Intel Ivy Bridge QuickSync
- MainConcept CUDA
This rank is based only on the encoders’ quality results. Encoding speed is not considered here.
Professional Versions of Comparison Report
The H.264 Comparison Report Pro 2012 version contains:
The Graphics & Media Lab Video Group would like to express its gratitude to the following companies for providing the codecs and settings used in this report. The Video Group would also like to thank these companies for their help and technical support during the tests.
Codec Analysis and Tuning for Codec Developers and Codec Users
Computer Graphics and Multimedia Laboratory of Moscow State University:
We can perform the following tasks for codec developers and codec users.
Strong and Weak Points of Your Codec
Independent Codec Evaluation Against Other Codecs for Different Use Cases
Encoder Features Implementation Optimality Analysis
We analyze the effectiveness (speed/quality trade-off) of encoder features; this analysis can yield up to a 30% improvement in the speed/quality characteristics of your codec. We can also help you tune your codec and find the best encoding parameters.
See all MSU Video Codec Comparisons
MSU video codecs comparisons resources:
Other Materials
Video resources:
Having played around with video since the days when I had a few multimedia CD-ROMs and a BT878-based TV tuner card, I have long been amazed by video compression. I watched as early “simple” compression efforts such as Cinepak and Indeo brought multimedia to CD-ROMs running at 1x to 2x, good enough for interactive encyclopedias and music video clips. The quality wasn’t as good as TV, but it was constrained by the computing power available at the time.
Because of the continual increase in computing power, I watched as MPEG-1 brought VCDs with VHS quality into the same amount of storage as is normally taken by uncompressed CD-quality audio. Then MPEG-2 heralded the era of the DVD, SVCD and most of the DVB-T/DVB-S transmissions, with a claimed doubling of compression efficiency. Before long, MPEG-4/H.263 (ASP) was upon us, with another doubling, enabling a lot of “internet” video (e.g. DivX/Xvid). Another bump was achieved with MPEG-4/H.264 (Part 10 – AVC), which improved efficiency to the point where standard-definition “near-DVD-quality” video could fit into the same sort of space as CD-quality audio.
Throughout the whole journey, I have been doing my own video comparisons, but mostly empirically by testing out several settings and seeing how I liked them. In the “early” days of each of these standards, it was a painful but almost necessary procedure to optimize the encoding workflow and achieve the required quality. I had to endure encode rates of about an hour for each minute of video when I first started with MPEG-1, then with MPEG-2, MPEG-4 ASP, and then MPEG-4 AVC. Luckily, the decode rates were often “sufficiently fast” to be able to render the output in real-time.
Developments in compression don’t stop. Increased computing power allows more sophisticated algorithms to be implemented. Increasing use of internet distribution and continual pressure on storage and bandwidth provide motivation to transition to an even more efficient form of compression, trading off computational time for better efficiency. Higher resolutions, such as UHD 4K and 8K, are likely to demand such improvements to become mainstream and to avoid overtaxing the limited bandwidth available in distribution channels.
The successor, at least in the MPEG suite of codecs, is MPEG-H Part 2, otherwise known as High Efficiency Video Coding (HEVC) or H.265. This standard was first completed in 2013, and is slowly seeing adoption owing to the increase in 4K cameras and smartphone SoCs with inbuilt hardware accelerated decoding/encoding and promises another almost halving of bitrate for the same perceptual quality. Unfortunately, licensing appears to be one of the areas which are holding HEVC back.
Of course, it’s not the only “next generation” codec available. VP9 (from Google) directly competes with HEVC, and has been shown by some to have superior encoding speed and similar video performance, although support is more limited. Its successor has been rolled into AOMedia Video 1, which is somewhat obscure at this time. From the Xiph.Org team, there is Daala, and from Cisco there is Thor. However, in my opinion, none of these codecs has quite reached the “critical mass” of adoption needed to make it as hardware-embraced and universally accessible as the MPEG suite of codecs.
I did some initial informal testing on H.265 using x265 late last year, but it was not particularly extensive because of time limitations and needing to complete my PhD. As a result, I didn’t end up writing anything about it. This time around, I’ve decided to be a little more scientific to see what would turn up.
Before I go any further, I’ll point out that video compression testing is an area where there are many differing opinions and objections to certain types of testing and certain sorts of metrics. As a science, it’s quite imprecise because the human physiological perception of video isn’t fully understood, thus there are many dissenting views. There are also many settings which can be altered in the encoding software which can impact on the output quality, and some people have very strong opinions about how some things should be done. The purpose of this article isn’t to debate such issues, although where there are foreseeable objections, I will enclose some details in blockquotes, such as this paragraph.
The main motivation of the experiment was to understand more about how x265 compares in encoding efficiency compared to x264. Specifically, I was motivated by this tooltip dialog in Handbrake that basically says “you’re on your own.”
As a result, I had quite a few questions I wanted to answer in as short a time as possible:
- What is the approximate bitrate scale for the CRF values and how does it differ for x264 vs. x265?
- How does this differ for content that’s moderately easy to encode, and others which are more difficult?
- How do x264 CRF values and x265 CRF values compare in subjective and synthetic video quality benchmarks?
- What are the encoding speed differences for different CRF values (and consequently bitrates), and how does x264 speed compare to x265 speed?
- How do my different CPUs compare in terms of encoding speed?
- Does x265 handle interlaced content properly?
As a result, I had to develop a test methodology to try and address these issues.
Two computers running Windows 7 (updated to the latest set of patches at publication) were used throughout the experiment – an AMD Phenom II x6 1090T BE @ 3.9 GHz was used to encode the “difficult case” set of clips, and an Intel i7-4770k @ 3.9 GHz was used to encode the “average case” set of clips. The encoding software was Handbrake 0.10.5 64-bit edition. The x264 encoding was performed by x264 core 142 r2479 dd79a61, and the x265 encoding was performed by x265 1.9.
The test clips were encoded with Handbrake in H.264 and H.265 for comparison at 11 different CRF values, evenly spaced from 8 to 48 inclusive (i.e. spaced by 4). For both formats, the preset was set to Very Slow, and encoding tuning was not used. The H.264 profile selected was High/L4.1, whereas for H.265, the profile selected was Main. It was later determined that the H.265 level was L5, thus there is some disparity in the featuresets, however, High/L4.1 is most common for Blu-Ray quality 1080p content, and a matching setting was not available in Handbrake for x265. In additional options, interlace=tff was used for the difficult case to correspond with the interlaced status of the content. No picture processing (cropping, deinterlacing, detelecining, etc.) within Handbrake was enabled.
Final bitrates were determined using Media Player Classic – Home Cinema’s information dialog and confirmed with MediaInfo. Encoding rate was determined from the encode logs. As the AMD system was my “day to day” system, it was in use during several encodes resulting in outlying reduced encode rate numbers. These have been marked as outliers.
The encoded files and the source file were then transcoded into lossless FFV1 AVI files using FFmpeg (version N-80066-g566be4f built by Zeranoe) for comparison (noting that no colourspace conversion occurred; the files remained YUV 4:2:0). This was done because unusual behaviour was witnessed otherwise, resulting in implausible SSIM/PSNR figures. Frame alignment of the files was verified using VirtualDub by checking for scene-change frames – in the case of the “difficult case” video, the first frame of the source file was discarded, as Handbrake did not encode that frame, to maintain video length and frame alignment. The “average case” video did not need any adjustments.
Pairs of files were compared for SSIM and PSNR using the following command:

ffmpeg -i [test] -i [ref] -lavfi "ssim;[0:v][1:v]psnr" -f null -
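The filters write their summaries to stderr, so recording the results amounts to scraping a couple of log lines. Here's a small helper I could have used (the function name is my own, and the regexes assume the summary-line format FFmpeg's ssim/psnr filters print):

```python
import re

def parse_ffmpeg_quality_log(stderr_text):
    """Extract overall SSIM and average/min PSNR figures from the
    summary lines printed by FFmpeg's ssim and psnr filters."""
    result = {}
    # e.g. "[Parsed_ssim_0 @ ...] SSIM Y:... U:... V:... All:0.987726 (19.110261)"
    m = re.search(r"SSIM.*All:(\d+\.\d+) \((\d+\.\d+)\)", stderr_text)
    if m:
        result["ssim"] = float(m.group(1))
        result["ssim_db"] = float(m.group(2))
    # e.g. "[Parsed_psnr_1 @ ...] PSNR y:... u:... v:... average:47.00 min:39.89 max:..."
    m = re.search(r"PSNR.*average:(\d+\.\d+).*min:(\d+\.\d+)", stderr_text)
    if m:
        result["psnr_avg"] = float(m.group(1))
        result["psnr_min"] = float(m.group(2))
    return result

# Example on log lines in the expected format (figures from the CRF 16 average case):
log = ("[Parsed_ssim_0 @ 0x1] SSIM Y:0.99 U:0.99 V:0.99 All:0.987726 (19.110261)\n"
       "[Parsed_psnr_1 @ 0x2] PSNR y:48.1 u:50.2 v:50.3 average:47.004687 min:39.892563 max:52.1")
print(parse_ffmpeg_quality_log(log))
```

This is just a convenience sketch; copying the figures out of the console by hand works equally well for a handful of encodes.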
Results were recorded and reported. Produced data is available in the Appendix at the end of this post. If it is not visible, please click the more link to access it.
Two frames from each video were extracted, and a 320×200 crop from a detailed section was assembled into a collage for still image comparison. The frames were chosen to be at least two frames away from a scene cut to avoid picking a keyframe. This was performed using FFmpeg extracting into .bmp files (conversion from YUV 4:2:0 to RGB24), and then using Photoshop and exporting to a lossless PNG to avoid corrupting the output.
Subjective video quality was assessed using my Lenovo E431 laptop connected to a Kogan 50″ LED TV, calibrated beforehand by eye to ensure highlights and shadows did not clip. Testing was done viewing at 2.5×H distance from the screen in a darkened room. Overscan correction was applied; however, all other driver-related enhancements were disabled. Frame-rate mode switching in MPC-HC was used to avoid software frame-rate conversion. TV motion smoothing was not available, thus ensuring the viewed result was consistent with the encoded data. Subjective opinions at each rate were recorded.
The clips used were:
Approximations of the clips used are linked above (YouTube), however, the actual video files differ slightly (especially with difficult case where the online video is missing a few tens of seconds). The encoding by YouTube is also relatively poor by comparison to the source. Unfortunately, as the source clips are copyrighted, I can’t distribute them.
The choice of the clips was for several reasons – I had good quality sources of both samples which meant a better chance of seeing encoding issues, I was familiar with both clips, and both clips feature segments with high sharpness details. In the case of the difficult case, that clip is especially tricky to encode as the background has high spatial frequency detail, whereas the “focal point” of the dancing girl-group members have relatively “low” frequency detail, thus encoders often get it wrong and devote a lot of attention to the background. It also has a lot of flashing patterns which are quite “random” and require high bitrates to avoid turning into “mush”. (I did consider using T-ARA – Bo Peep as the difficult case clip, but that was mostly “fast cuts” increasing the difficulty, rather than any tricky imagery, plus my source quality was slightly lower.)
At this point, some people will object to the use of compressed material as the source. Normal objections include the potential to favour H.264, as the material was H.264-coded before, and the potential loss of detail rendering high-CRF encodes “meaningless”.
However, I think it’s important to keep in mind that if you expect the output to resemble the potentially imperfect result of the compressed input, this is less of an issue. The reference is the once-encoded video.
The second thing to note is that I’ve chosen sample clips I have with the highest bitrate and cleanest quality I have available – this maximises the potential for noticing encoding problems.
Thirdly, it’s also important to note that transcoding is a legitimate use of the codec – most people do not have the equipment to acquire raw footage and most consumer grade cameras already have compressed the footage. Other users are likely to be format-shifting and transcoding compressed to compressed. Thus testing in a compressed to compressed scenario is not invalid.
Results: Bitrate vs CRF
It’s an often touted piece of advice that a change of CRF by +/- 6 will halve/double the bitrate. Suggested rate-factors are normally around 19 to 23 roughly. Because I had no idea what a certain CRF value would produce bit-rate wise, and whether x265 adheres to the same convention, I found out by plotting the resulting bitrates on a semi-log plot and curve fitting.
In the case of the difficult case for x264, the upper-end CRF 8 bitrate fell off because it had reached the limits of the High@L4.1 profile. Aside from that, the lines are somewhat wavy but still close to an exponential function, with exponents ranging in magnitude from 0.108 to 0.136.
From the curve fits, it seems that for x265 it takes a CRF movement of 5.09667 to 5.5899 to see a halving/doubling in size, whereas for x264 it took 5.68153 to 6.41801. It seems that x265’s bitrate is slightly more sensitive to the CRF value (average ~5.34 as opposed to ~6.05).
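The curve fit itself is straightforward: fit a straight line to log2(bitrate) versus CRF, and the negative reciprocal of the slope is the CRF movement per halving. A minimal sketch using the x265 average-case bitrates from the appendix table (a plain least-squares fit, not necessarily the exact fitting method used for the figures above):

```python
import math

# x265 average-case bitrates at CRF 8..48 (appendix table)
crf = list(range(8, 49, 4))
bitrate = [37545, 18834, 8407, 4315, 2504, 1520, 936, 575, 346, 212, 160]

# Least-squares fit of log2(bitrate) = a + b*CRF (exponential model)
y = [math.log2(r) for r in bitrate]
n = len(crf)
mx, my = sum(crf) / n, sum(y) / n
b = sum((x - mx) * (v - my) for x, v in zip(crf, y)) / sum((x - mx) ** 2 for x in crf)

crf_per_halving = -1 / b  # CRF increase needed to halve the bitrate
print(round(crf_per_halving, 2))
```

On this data the fit gives roughly 5.1 CRF per halving, consistent with the x265 range quoted above.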
Readers may be concerned that my x264 examples involve using a different profile and level (High@L4.1) versus the x265 (Main@L5). It is acknowledged that this will cap the output quality – in future, I’ll try to match the encode levels, but that is not directly configurable for x265 at present from Handbrake.
Results: Bitrate Savings at CRF Value
On the assumption that the CRF values correspond to the same quality of output, how much bitrate do we save? I tried to find out by comparing the bitrate values at given CRFs.
The answer is less straightforward than expected. For the difficult case, the x265 output averaged 92% of the x264 output but varied quite a bit – in some cases at higher CRFs being larger than the x264 output. The average case displayed an average size of 59% which is more in-line with expectations and is mostly stable around the commonly-used CRF ranges.
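This comparison is just a per-CRF ratio of the two bitrate columns. Using the average-case figures from the appendix (the x264 table there ends at CRF 36, so only the overlapping values are shown):

```python
# Bitrate of x265 relative to x264 at matching CRF values (average case,
# figures from the appendix table; ratio < 1 means x265 produced a smaller file)
crf = [8, 12, 16, 20, 24, 28, 32, 36]
x265 = [37545, 18834, 8407, 4315, 2504, 1520, 936, 575]
x264 = [44940, 28489, 14837, 6964, 3795, 2325, 1509, 1022]

ratios = {c: round(a / b, 3) for c, a, b in zip(crf, x265, x264)}
print(ratios)
```

In the commonly used CRF 16–24 range the ratio sits around 0.57–0.66, i.e. a substantial saving, while at the extremes the picture is less clear-cut.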
Then, naturally, comes the actual question of whether the CRF values provide the same perceived quality.
Results: SSIM and PSNR
There are two main methods used to evaluate video quality – namely Structural Similarity (SSIM) and Peak Signal-to-Noise Ratio (PSNR). These metrics are widely used, and are easily accessible thanks to FFmpeg filters. Their characteristics differ somewhat, with SSIM attempting to be more perceptual, so it’s helpful to look at both.
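For readers unfamiliar with PSNR, it is just a logarithmic view of the mean squared error between reference and test samples. SSIM is more involved (windowed means, variances and covariances), so here is only PSNR as a minimal pure-Python sketch over 8-bit samples (toy values of my own, not from the test clips; real tools compute this per-plane over whole frames):

```python
import math

def psnr_8bit(ref, test):
    """PSNR in dB between two equal-length sequences of 8-bit samples."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10 * math.log10(255 ** 2 / mse)

# Toy example: a flat patch versus the same patch offset by 16 code values
print(round(psnr_8bit([128] * 64, [144] * 64), 2))
```

A uniform 16-level offset gives roughly 24 dB, which is a useful mental anchor when reading the PSNR plots below.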
At this point, many encodists may point out the existence of many other, potentially better, video quality judgement schemes. Unfortunately, they’re less easily accessible, they’re less widely used, and there will almost certainly be debates as to whether they correlate with perception or not.
This area is continually being contested, so I’d rather stick to something which has been widely used and whose caveats are known to some extent. In the case of SSIM and PSNR, one of the biggest disadvantages to my knowledge is that they have no temporal assessment of quality. They are also source-material sensitive, and are not very valid when comparing across different codecs. Of course, we can’t rely solely on synthetic benchmarks.
We first take a look at the SSIM versus CRF graph. In this graph using the normalized (to 1) scale of SSIM, we can see the quality “fall-off” as CRF values are increased. The slope is steeper for the difficult case clips compared to the average case. In the case of the average case, the SSIM is almost tit-for-tat x265 vs x264 at each CRF value with the exception of CRF 48. Between the difficult case clips, there is a ~0.015 quality difference favouring x264.
For fun, we can also plot this against bitrate to see what happens. In the average case, the lines are very close together, and the quality takes an abrupt turn for the worse at about 4Mbit/s. In all but the highest bitrates, x265 has an advantage. The difficult case shows a less pronounced knee, and has x264 leading. A potential explanation for this can be seen in the subjective viewing section.
To see differences in the high end more clearly, we can plot the dB value of SSIM. We can see that at lower CRFs (<20) for the average case, x264 actually pulls ahead for a higher SSIM. Whether this is visible, or even a positive impact will need to be checked, as cross-codec comparisons are not as straightforward.
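The dB view of SSIM is the transform FFmpeg itself reports alongside the raw value: −10·log10(1 − SSIM), which stretches out the top of the 0..1 scale so differences near 1.0 become visible. A one-liner reproduces the appendix figures:

```python
import math

def ssim_db(ssim):
    """dB view of SSIM (-10*log10(1-SSIM)), as reported by FFmpeg's ssim filter:
    expands the top of the 0..1 scale so near-1.0 differences are visible."""
    return -10 * math.log10(1 - ssim)

# The appendix's x265 average-case CRF 8 entry: SSIM 0.995211 -> ~23.20 dB
print(round(ssim_db(0.995211), 2))
```

This is why the dB plot separates the low-CRF encodes so much more clearly than the normalized plot does.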
Repeating for bitrate, we see the same sort of story as we saw with the normalized values.
Looking at the PSNR behaviour shows that there are only minor differences throughout, with an exception at the lowest CRF. The minimum PSNR also seems to “level out” at high CRF values, so the “difference” in quality between the best and worst frames is lower. In all, there’s really no big difference between PSNRs for the average case between x264 and x265 on a CRF-value basis.
The difficult case shows a fairly similar result, without major differences with the exception at the low CRF end where H.264 profile restrictions prevented the bitrate from going any higher, limiting the potential PSNR. Interestingly, the PSNR variance increased for x264 as the CRF was increased so as to hit the bitrate limits – so while the PSNR average is better, the worst frame was more poorly encoded to make that happen.
Plotting the same plots versus bitrate doesn’t reveal much more.
It seems on the whole, both PSNR and SSIM metrics achieved similar values for corresponding x264 and x265 CRF values. As a result, at least from a synthetic quality standpoint, the quality of x264 and x265 encodes at the same CRF are nearly identical, implying a bitrate saving averaging 41% can be achieved in the average case (and just 8% for the difficult case).
Results: Encode Rate
Of course, with every bitrate saving comes a compute penalty, so it’s time to work that out.
First, by plotting against CRF values, we can see that the Intel machine that encoded the “average case” files was much faster than the older AMD machine that encoded the “difficult case” files. Interestingly, the encode speed increased as the CRF increased (i.e. lower bitrates) for the Intel machine, but didn’t show as strong a relationship for the AMD machine. The fall-off in encode rate as CRF increased to 48 may have to do with reaching “other” resource limitations within the CPU.
The same thing is plotted versus the resulting bitrate. Overall, the encode rates (excluding purple outlier data points) show that x265 achieves on average just 15.7% of the speed of x264 on the Intel machine, and 4.8% of the speed on an AMD machine. Older machines are probably best sticking to x264 because of the significant speed difference. The difference in the encode rates at lower bitrates/higher CRFs may be due to different performance optimizations and cache sizes between the CPUs.
This also highlights a potential pitfall for buyers deciding whether to upgrade or not who are basing their decision on a single metric such as CPUBenchmark scores. In our case:
- AMD Phenom II x6 1090T BE: 5676 @ 3.2 GHz, 6918 @ 3.9 GHz (scaled for clock rate)
- Intel Core i7-4770k: 10131 @ 3.5 GHz, 11289 @ 3.9 GHz (scaled for clock rate)
This would mean that we would expect that the i7-4770k would perform at 163% of the AMD PhenomII x6 1090T BE. In reality, it performed at 213% on x264 and 637% on x265. Quite a big margin of difference.
Results: Still Image Samples
Let’s take a look at some selected still-image samples to see how the different CRFs compare. I suppose publishing small image samples to illustrate encoding quality is fair use … and while I could theoretically use artificially generated clips or self-shot clips, I don’t think they would represent the quality and characteristics of a professionally produced presentation, which would skew the encoding results.
Yes, I know, you’re going to scream at me because the human eye doesn’t perceive video as “each frame being a still picture” and some of the quality degradation might not be noticeable. But hey, this is the next best thing …
Average Case #1
This is frame #215 from the source, where SinB stares inquisitively into a sideways camera. This frame is chosen due to pure detail, especially in shadows.
For x265, starting at CRF 20, I can notice some alterations in hair structure where some of the finer hairs have been “spread” slightly. Even CRF 16 isn’t immune to this, but its image quality is good. CRF 12 is indistinguishable from source. CRF 24 continues the quality slide and makes it a bit blotchy, whereas CRF 28 is obviously corrupting the quality of the eyebrows as well which are now just a smear and subtle details in the eyebrows and lower eyelid edge are missing.
The character of x264 is different: the initial impairments are not primarily detail loss; instead, edges seem to gain noise. At CRF 20 the hair has some odd coloured blocks, and the skin edge seems to be tainted with edge-colour issues. The hair is slightly smoother than at CRF 16, which appears much sharper and “straighter”. CRF 24 makes a royal mess of the hair, turning it into blotches, and CRF 28 turns it into an almost solid block while losing details in the eyebrows and eyelid.
Average Case #2
This is frame #4484 from the source, a bridge scene where the members of Gfriend are seen running across. The scene is particularly sharp, and the bars of the bridge form a difficult encoding challenge, with high detail in the planks and the water running below.
The x265 encode at CRF 16 seems indistinguishable for the most part. However, at CRF 20, Yuju’s finger has a “halo” to the left of it, and Sowon’s red pants are starting to “merge” into the bars of the bridge somewhat. CRF 24 seems to worsen the halos around the fingers, and now noise around heads passing the concrete can be seen, and the pants merging with the bridge bars is getting worse. CRF 28 is obviously starting to smooth a lot, and blockiness is obvious in the pants.
For x264, the impairments at CRF 28 were more sparkles and blocky posterization/quilting. CRF 24 showed a “pre-echo” of Yuju’s finger as well, which disappeared at CRF 20. CRF 20 appears to have lost some detail in the concrete beam behind, but isn’t bad at all.
Difficult Case #1
This is frame #1092, where Jessica (now ex-member of Girls’ Generation) had a solo shot. The frame was chosen because of the high detail in the eyes and hair.
Unfortunately, in the case of this clip, some of the detail was already lost in the encoding at the “source”, so we need to compare with an obviously degraded original.
For x265, the most obvious quality losses begin at about CRF 24 where the hair to the side seems to go slightly flatter in definition and some of the original blockiness (a desirable quality) is lost. By CRF 28, the hair looks like it’s pasted on with the loose strands being a little ill defined, and CRF 32 causes her to lose her eyebrows entirely.
For x264, CRF 20 maintains some of the original blockiness, but CRF 24 is visibly less defined in the hair in terms of the original blockiness. The difference is very minor, but by CRF 28, a similar loss of hair fidelity is seen but instead, it looks a little sharper but much noisier.
Difficult Case #2
This was frame #5827 where Yoona (left) and Tiffany (right) are dancing in front of the LED display board.
In the x265 case, in light of the messiness of the source, even CRF 24 looks acceptable. By CRF 28, Yoona’s almost completely lost her eyebrows and most of the facial definition, whereas Tiffany’s nose has a secondary “echo” outline. By comparison, the x264 encode looks a bit sharper, with some more visual noise around the facial features as if they’ve been sharpened resulting in some bright noise spots in CRF 24 and CRF 28. This clip is particularly tough to judge.
The still image samples seem to show that the necessary CRF to attain visually acceptable performance varies as a function of the input material. This is not unexpected, however, in the case of the more clear and simple material, CRF 12 was indistinguishable, CRF 16 was extremely good and CRF 20 was considered acceptable. For the more complex material, CRF 20 was considered good, and CRF 24 was considered somewhat acceptable.
Results: Subjective Viewing
I spent quite a few hours in front of my large TV checking out the quality of the video. In this way, the temporal quality and perception-based quality of the videos can be assessed.
On the whole, I would have to agree that the x264 CRF values produce very similar acceptance levels on x265. I would probably accept CRF 12 as being visually lossless for the average case material, CRF 16 as hard to discern near-lossless and CRF 20 as “watchable”. This is because I’m especially picky when it comes to quality and minor flaws when I watch material that I’m familiar with (and I always wonder how people put up with YouTube and other streaming services which so obviously haven’t got enough bitrate).
The key difference is the type of impairments that occur with x264 vs x265. In bitrate starvation, x264 appears to be sharper and goes into a blocky-mode of degradation preferring to retain sharp details even if it makes it look noisy. In contrast, x265 starts smoothing areas of lower detail, while “popping” sharpness into the areas that have finer details. This does sometimes look a bit un-natural. It also starts dropping motion where it is small, resulting in motion artifacts and jumpiness, but on the whole, this might be slightly less objectionable depending on your personal opinion.
With the difficult case data, we have a bit of a different opinion where CRF 16 is visually indistinguishable, and CRF 20 is almost indistinguishable. I would have to agree that x264 is better for this case and appeared more visually clean even at higher CRFs. This seems to be because the noise in x264 is “disguised” better in the patterning of the LED lights, whereas the smoothing in x265 becomes more obvious.
But a second, and more important issue, is the presence of a field oddity post-deinterlacing for the x265 clips, especially at CRF > 20.
The oddity results in “stripes” appearing every n pixels vertically as if there is something wrong with the fields there.
Using FFmpeg’s FFV1-decoded lossless file, examination shows that the encoded result actually does have the oddity in the fields. The reason for it isn’t clear at this stage, but it may be related to an encoding-unit block boundary condition of sorts, or a poor implementation of interlaced encoding. Whatever the case, it makes interlaced files at CRF > 20 difficult to watch, especially during panning sequences.
This may explain why the SSIM/PSNR values were smoother and lower compared to the “average” case – these errors were not critical to the comparison, but are very temporally evident patterns.
Speaking of interlaced video, it’s a sad fact of life we still have to deal with it due to the storage of old videos, and due to some cameras still recording true interlaced content despite the majority of the world using progressive displays. Apparently H.265 supports interlaced encoding, although there was some confusion. One naive solution that some users may think of is just simply to deinterlace the video first and then encode it. The problem is that you will lose information through deinterlacing – if you’re going 50 fields per second to 25 frames, you’ve lost half the temporal information. If you frame double, then you can keep the temporal resolution but will have to generate the missing field for each frame – computationally intensive and can potentially introduce artifacts. It can also result in a file that is incompatible with many players, and if your motion compensation/prediction algorithm is poor, you might lose sharpness in some areas. I personally prefer to keep each format (progressive / interlaced) in its respective format through to the final display stage where the “best” deinterlacing for the situation can be applied.
However, as it turns out, the difficult case video is a Blu-Ray standard video, but it isn’t native interlaced material at all despite being 29.97fps. It’s 23.976fps that’s gone through a telecine process to make it 29.97fps. Why they would do such a thing, I don’t know, as Blu-ray supports 23.976p natively.
After a week and a bit of encoding and playing around with things, I think there are some rather interesting results.
On the whole, for the average case, x265 showed bitrate of about 59% of that of x264 at the same CRF. The CRF value sensitivity of x265 was slightly higher than x264, being about +/- 5.34 for a doubling/halving rather than +/- 6.05. Synthetically, the corresponding CRF values produced very similar SSIM and PSNR values for both x264 and x265, so the same “rules of thumb” might be applied, although the bitrate saving will vary depending on the specific CRF selected.
Encode rates for x265 were significantly slower than x264, as to be expected, due to the increased computational complexity. However, it seemed that lower CRF values/lower bitrates were much faster to encode on modern hardware (possibly due to better cache use). This wasn’t reflected with my older AMD Phenom II based system (possibly due to difference in instruction set and optimization).
Subjectively speaking, I’d have to say CRF 12 is indistinguishable and CRF 16 is good enough for virtually all cases. For the less discerning, CRF 20 is probably fine for watching, but CRF 24 is beginning to become annoying and CRF 28 is the least that could be considered acceptable. The result seems to be consistent across x264 and x265, although (unexpectedly) the difficult case seemed to tolerate higher CRF values probably as the harsh patterns were not as easily resolved by the eye and noise was less easily seen. As a result, even having a “rule of thumb” CRF can be hard, as it depends on the viewer, viewing equipment, source characteristics and sensitivity to artifacts.
Unfortunately, it seems that the “difficult case” data is really hard to interpret. This appears to be because x265 isn’t very good about handling interlaced content, and by using the “experimental” feature, the output wasn’t quite correct as seen in the subjective viewing. As a result, the synthetic benchmarks may have been reflective of the strange field blending on the edge of blocks resulting in a loss of fidelity that only resolved at fairly high quality values (CRF <=20). As a result, the mature x264 encoder was much more adept at handling interlaced content correctly, and I suppose we should take the difficult case data as being “atypical” and not representative of what properly encoded interlaced H.265 video would be like.
It looks like I've got another round of encoding ahead for the difficult case: having discovered that the material was actually 23.976fps pulled up to 29.97fps, I'll perform an inverse telecine on it and encode the progressive output to see what happens. This time I'll use H.264 [email protected] for consistency as well. With any luck, the results will be more consistent with the average case.
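The frame-rate relationship behind that pullup is exact rational arithmetic: 3:2 pulldown turns every 4 film frames into 5 video frames, and inverse telecine undoes the 5/4 ratio. A quick sketch of the arithmetic:

```python
from fractions import Fraction

# NTSC film rate is exactly 24000/1001 fps (the "23.976" figure);
# 3:2 pulldown repeats fields so 4 film frames become 5 video frames.
film_rate = Fraction(24000, 1001)
video_rate = film_rate * Fraction(5, 4)

print(float(film_rate))   # ~23.976
print(float(video_rate))  # ~29.970
# Inverse telecine undoes the 5/4 ratio, recovering 24000/1001 exactly.
assert video_rate * Fraction(4, 5) == film_rate
```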
Appendix: Table of Data

x265 - Average Case

| CRF | Bitrate | SSIM | SSIM (dB) | PSNR (avg) | PSNR (min) | fps |
|-----|---------|------|-----------|------------|------------|-----|
| 8 | 37545 | 0.995211 | 23.197804 | 51.363782 | 46.213167 | 0.40533746 |
| 12 | 18834 | 0.991562 | 20.737413 | 48.847675 | 43.550076 | 0.571760862 |
| 16 | 8407 | 0.987726 | 19.110261 | 47.004687 | 39.892563 | 0.847564601 |
| 20 | 4315 | 0.983769 | 17.896507 | 45.373892 | 36.558467 | 1.094531205 |
| 24 | 2504 | 0.978525 | 16.680696 | 43.635386 | 33.687427 | 1.289760018 |
| 28 | 1520 | 0.971082 | 15.388295 | 41.74252 | 30.954092 | 1.596318738 |
| 32 | 936 | 0.96073 | 14.059367 | 39.729187 | 28.777697 | 1.947472064 |
| 36 | 575 | 0.94749 | 12.797598 | 37.674602 | 27.641992 | 2.371282528 |
| 40 | 346 | 0.931052 | 11.614763 | 35.585072 | 27.202179 | 2.881390385 |
| 44 | 212 | 0.911069 | 10.509464 | 33.549725 | 26.877077 | 3.672094814 |
| 48 | 160 | 0.892783 | 9.697347 | 31.99061 | 26.683718 | 3.85216387 |

x265 - Difficult Case

| CRF | Bitrate | SSIM | SSIM (dB) | PSNR (avg) | PSNR (min) | fps |
|-----|---------|------|-----------|------------|------------|-----|
| 8 | 83914 | 0.994746 | 22.79505 | 47.241291 | 41.851423 | 0.150718854 |
| 12 | 58270 | 0.991083 | 20.4976 | 44.149468 | 38.15019 | 0.161308334 |
| 16 | 39154 | 0.98508 | 18.262177 | 41.057735 | 34.518727 | 0.177631469 |
| 20 | 25251 | 0.97563 | 16.131467 | 38.046595 | 30.987662 | 0.194232091 |
| 24 | 15318 | 0.961485 | 14.143724 | 35.185632 | 27.964271 | 0.093471634 |
| 28 | 8747 | 0.941772 | 12.348706 | 32.649622 | 25.227401 | 0.235893164 |
| 32 | 4855 | 0.915883 | 10.751168 | 30.507201 | 22.978117 | 0.236307704 |
| 36 | 2633 | 0.881909 | 9.277838 | 28.55634 | 21.244047 | 0.246284614 |
| 40 | 1409 | 0.839445 | 7.943765 | 26.817959 | 20.279733 | 0.297332074 |
| 44 | 975 | 0.80528 | 7.105894 | 25.836161 | 19.5892 | 0.245301742 |
| 48 | 888 | 0.791061 | 6.799803 | 25.438011 | 16.554815 | 0.234813531 |

x264 - Average Case

| CRF | Bitrate | SSIM | SSIM (dB) | PSNR (avg) | PSNR (min) | fps |
|-----|---------|------|-----------|------------|------------|-----|
| 8 | 44940 | 0.997364 | 25.791192 | 53.422334 | 47.377392 | 4.026217 |
| 12 | 28489 | 0.994607 | 22.681598 | 50.2906 | 44.523548 | 4.934854 |
| 16 | 14837 | 0.989849 | 19.934854 | 47.552363 | 41.977158 | 6.346291 |
| 20 | 6964 | 0.984727 | 18.16071 | 45.473564 | 37.964469 | 8.55346 |
| 24 | 3795 | 0.979337 | 16.848072 | 43.587039 | 34.802245 | 10.813952 |
| 28 | 2325 | 0.972033 | 15.53357 | 41.603963 | 31.924008 | 12.368113 |
| 32 | 1509 | 0.961974 | 14.199156 | 39.536816 | 29.436038 | 13.44052 |
| 36 | 1022 | 0.948866 | 12.912879 | 37.445484 | 27.7433 | 14.142429 |
| 40 | 716 | 0.932261 | 11.691611 | 35.328038 | 26.990109 | 14.590753 |
| 44 | 517 | 0.91127 | 10.519294 | 33.163862 | 25.74449 | 15.208363 |
| 48 | 380 | 0.8856 | 9.415741 | 30.925813 | 24.26836 | 15.538453 |

x264 - Difficult Case

| CRF | Bitrate | SSIM | SSIM (dB) | PSNR (avg) | PSNR (min) | fps |
|-----|---------|------|-----------|------------|------------|-----|
| 8 | 57884 | 0.997158 | 22.334516 | 45.401482 | 32.920994 | 2.67663 |
| 12 | 50338 | 0.993274 | 21.722554 | 44.885056 | 34.287518 | 2.975561 |
| 16 | 36946 | 0.990023 | 20.0101 | 42.79283 | 35.122098 | 3.339178 |
| 20 | 25202 | 0.983669 | 17.86938 | 39.792569 | 33.580799 | 3.86772 |
| 24 | 16425 | 0.972965 | 15.680784 | 36.641095 | 29.975891 | 4.494949 |
| 28 | 10094 | 0.95588 | 13.553604 | 33.580348 | 26.595447 | 5.168905 |
| 32 | 5977 | 0.931332 | 11.632443 | 30.855433 | 23.918678 | 5.85861 |
| 36 | 3589 | 0.898831 | 9.949544 | 28.509815 | 21.358371 | 6.376635 |
| 40 | 2264 | 0.858916 | 8.505224 | 26.492488 | 19.204705 | 6.556086 |
| 44 | 1511 | 0.812144 | 7.261751 | 24.741765 | 17.527457 | 6.636509 |
| 48 | 1040 | 0.762812 | 6.24908 | 23.281506 | 15.783191 | 6.661346 |
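The SSIM (dB) column appears to follow the encoders' convention of expressing SSIM on a logarithmic scale. A minimal sketch of that conversion, assuming dB = −10·log10(1 − SSIM), which matches the tabulated pairs:

```python
import math

def ssim_to_db(ssim):
    """Express SSIM on a log scale (dB = -10 * log10(1 - SSIM)),
    which is how the encoders' stats report it."""
    return -10.0 * math.log10(1.0 - ssim)

# Cross-check against the first x265 average-case row:
# SSIM 0.995211 should give roughly 23.198 dB.
print(ssim_to_db(0.995211))
```

The dB form is handy because it spreads out the near-1.0 values where SSIM differences actually matter.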