x265 1.4 is a regularly scheduled release, mostly focused on performance
enhancements. There was a large refactor of the analysis code that
exposed further parallelism without sacrificing any compression
efficiency. In general, 1.4 should have slightly better compression
efficiency than 1.3 for the same encodes.
The refactors also generally lowered the memory requirements and made
the faster presets more compute efficient.
The two most important new features are --pmode (parallel mode decision)
and --pme (parallel motion estimation).
= --pmode : param.bDistributeModeAnalysis =
This feature distributes mode analysis for each CU. The more modes that
must be analyzed, the more effective it becomes. The slow preset, which
enables rectangular prediction modes, therefore benefits from --pmode
more than the medium preset does. The slower presets, which use RD level
6, benefit even more, since they enable AMP predictions and evaluate the
rate distortion cost of each mode.
The output of --pmode encodes should have slightly better compression
than those with it disabled, since certain early outs are impossible
with --pmode and thus all modes are measured naively.
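The effect of losing early outs can be illustrated with a toy model. The sketch below is not x265 code: ModeCandidate, pick_mode_early_out, pick_mode_exhaustive, and the good_enough threshold are hypothetical names, and the early-out rule is only an assumed stand-in for whatever shortcuts the serial analysis takes. It shows why measuring every mode can find a cheaper choice than stopping at the first acceptable one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rate-distortion cost of one candidate prediction mode. */
typedef struct {
    const char *name;
    double rd_cost;
} ModeCandidate;

/* Serial analysis with an assumed early out: stop as soon as the best
 * cost seen so far is at or below good_enough, skipping the rest. */
size_t pick_mode_early_out(const ModeCandidate *modes, size_t count,
                           double good_enough)
{
    assert(modes != NULL && count > 0);
    size_t best = 0;
    for (size_t i = 0; i < count; i++) {
        if (modes[i].rd_cost < modes[best].rd_cost)
            best = i;
        if (modes[best].rd_cost <= good_enough)
            break; /* remaining candidates are never measured */
    }
    return best;
}

/* Exhaustive analysis: every candidate is measured and the cheapest one
 * wins. This models the case where no early out is available. */
size_t pick_mode_exhaustive(const ModeCandidate *modes, size_t count)
{
    assert(modes != NULL && count > 0);
    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (modes[i].rd_cost < modes[best].rd_cost)
            best = i;
    }
    return best;
}
```

With candidate costs of 100, 90, and 80 and a threshold of 105, the early-out version settles for the first mode, while the exhaustive version picks the third.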
= --pme : param.bDistributeMotionEstimation =
This feature will distribute the motion estimations for each CU that has
more than 2 references. Its effectiveness is proportional to the number
of references, but this feature can often be a net performance loss as
the overhead of involving other CPU cores is often more expensive than
the parallelism benefit.
The output of the encode is completely unaffected by --pme.
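The trade-off between parallelism and dispatch overhead can be made concrete with a rough cost model. This is a simplification under stated assumptions, not a description of the x265 scheduler: the function names, the fixed per-reference search time, and the single dispatch_overhead_us term are all hypothetical. The only detail taken from the text above is that CUs with 2 or fewer references are not distributed.

```c
#include <assert.h>

/* Time to search all references one after another on a single thread. */
long me_time_serial(int num_refs, long per_ref_us)
{
    assert(num_refs >= 0 && per_ref_us >= 0);
    return (long)num_refs * per_ref_us;
}

/* Assumed model of distributed motion estimation: pay a fixed dispatch
 * overhead once, then search the references in rounds of `workers`
 * at a time. CUs with 2 or fewer references stay serial. */
long me_time_distributed(int num_refs, int workers, long per_ref_us,
                         long dispatch_overhead_us)
{
    assert(num_refs >= 0 && workers > 0);
    assert(per_ref_us >= 0 && dispatch_overhead_us >= 0);
    if (num_refs <= 2)
        return me_time_serial(num_refs, per_ref_us);
    int rounds = (num_refs + workers - 1) / workers; /* ceiling division */
    return dispatch_overhead_us + (long)rounds * per_ref_us;
}

/* Returns 1 when the modeled distributed time beats the serial time. */
int pme_is_net_gain(int num_refs, int workers, long per_ref_us,
                    long dispatch_overhead_us)
{
    return me_time_distributed(num_refs, workers, per_ref_us,
                               dispatch_overhead_us)
           < me_time_serial(num_refs, per_ref_us);
}
```

In this model, 4 references at 100 us each with 4 workers and 50 us of overhead come out ahead (150 us against 400 us), but the same setup with 10 us searches is a net loss (60 us against 40 us).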
Both --pme and --pmode are only useful when x265 is otherwise unable to
fully saturate the CPU cores, and both can also at times result in lower
performance on multi-socket machines (depending on the situation) since
we are not yet keeping work localized to neighbor CPUs.
= Also in 1.4 =
As a result of the refactor work, none of the original HM classes or
source files remain in x265.
Temporal motion vector predictions (previously hard-coded to always be
enabled) are now runtime selectable (param.bEnableTemporalMvp) but still
default to being enabled in all presets.
Frame based SAO analysis was removed (frame based SAO signaling did not
make it into the final HEVC spec), and --sao-lcu-bounds=<0|1> was
renamed to --[no-]sao-non-deblock (param.bSaoNonDeblocked).
Some inconsistencies in the analysis logic were fixed. --amp is now
respected in RD levels 2, 3, and 4 (previously only in 5 and 6).
--b-intra is now respected in all RD levels. --fast-cbf, which has only
ever been effective at RD levels 5 and 6, is no longer enabled uselessly
in the fastest presets.
--weightb is now enabled by default in the slower, veryslow, and placebo
presets.
--cu-lossless was changed to only attempt a lossless encode of the best
lossy encode method. This made --cu-lossless a much less expensive
encode option to have enabled, and hopefully made the feature more
robust and maintainable.
The upper threshold for --psy-rdoq was raised to 50 (from 10) since the
higher values were found to be beneficial for sources with high
frequency noise (film grain).
The default thread pool size logic was updated to account for the
addition of --pmode and --pme (if WPP is disabled but --pmode or --pme
is enabled, a thread pool is still allocated).
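The allocation rule described above can be sketched as a small decision function. PoolSettings and default_pool_size are hypothetical names, not part of the x265 API, and sizing the pool to the CPU count is an assumption made for illustration; the actual default may be computed differently. The part taken from the text is that any one of WPP, --pmode, or --pme is enough to require a pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of encoder settings relevant to the pool decision. */
typedef struct {
    bool wpp;   /* wavefront parallel processing */
    bool pmode; /* parallel mode decision */
    bool pme;   /* parallel motion estimation */
} PoolSettings;

/* Returns the number of pool threads to allocate, or 0 for no pool.
 * Assumption: when a pool is needed it gets one thread per CPU. */
int default_pool_size(const PoolSettings *s, int cpu_count)
{
    assert(s != NULL && cpu_count > 0);
    bool needs_pool = s->wpp || s->pmode || s->pme;
    return needs_pool ? cpu_count : 0;
}
```

Under these assumptions, an encode with WPP disabled and only --pme enabled on an 8-core machine still gets an 8-thread pool, while an encode with all three features disabled gets none.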
1.4 also includes an incomplete analysis re-use feature. This will be
completed and further improved in the coming weeks.