This live report demonstrates how CK can
help researchers create reproducible and interactive articles from reusable components
- GitHub repositories with shared artifacts in CK format: reproduce-ck-paper, reproduce-ck-paper-large-experiments, ctuning-programs, ctuning-datasets-min, ck-analytics, ck-autotuning, ck-env
- BIB
- Open archive (arXiv) with PDF
- Related DATE'16 article (PDF)
- Wiki describing how to reproduce some experiments via CK
- Some Reddit discussions
- Partially funded by TETRACOM project
- Related Collective Knowledge infrastructure and repository (CK)
- Related Collective Mind infrastructure and repository (deprecated for CK)
- Extends our previous work: 1, 2, 3, 4, 5
- Supports our open publication model
Abstract
Nowadays, engineers often have to develop software without even
knowing which hardware it will eventually run on, across numerous
mobile phones, tablets, desktops, laptops, data centers,
supercomputers and cloud services.
Unfortunately, optimizing compilers are no longer keeping pace with
the ever-increasing complexity of ever-changing computer systems
and may produce severely underperforming executable code
while wasting expensive resources and energy.
We present the first practical, collaborative
and publicly available solution to this problem that we are aware of.
We help the software engineering community gradually implement
and share lightweight wrappers around any piece of software with
more than one implementation or optimization choice available.
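To illustrate the idea (this is a minimal sketch, not CK's actual wrapper API — the function and field names here are hypothetical), such a lightweight wrapper exposes the available implementation choices of one computational species and records the characteristics of each run:

```python
import time

def make_wrapper(implementations):
    """Hypothetical lightweight wrapper: exposes several interchangeable
    implementations of the same computational species and logs the
    observed characteristics (here, only execution time) of each call."""
    log = []
    def run(choice, *args):
        start = time.perf_counter()
        result = implementations[choice](*args)
        log.append({"choice": choice, "time_s": time.perf_counter() - start})
        return result
    return run, log

# Two interchangeable implementations of the same piece of functionality.
def sum_loop(xs):
    total = 0
    for x in xs:
        total += x
    return total

run, log = make_wrapper({"loop": sum_loop, "builtin": sum})
data = list(range(1000))
assert run("loop", data) == run("builtin", data)  # same result, two choices logged
```

The collected log entries are what a public repository could then aggregate across many users and machines.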
These wrappers are connected with a public
Collective Mind autotuning infrastructure and repository of
knowledge *
to continuously monitor all important characteristics of these pieces
(computational species) across numerous existing hardware
configurations in realistic environments together with randomly
selected optimizations.
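The random selection of optimizations can be sketched as follows (the flag list and probabilities below are illustrative, not CK's actual optimization-space description format):

```python
import random

# Illustrative GCC choices only; a real optimization space is far larger.
BASE_LEVELS = ["-O1", "-O2", "-O3", "-Os"]
EXTRA_FLAGS = ["-fno-if-conversion", "-funroll-loops", "-ftree-vectorize"]

def random_optimization(rng):
    """Draw one random optimization: a base level plus a random
    subset of extra flags, as one point in the search space."""
    flags = [rng.choice(BASE_LEVELS)]
    flags += [f for f in EXTRA_FLAGS if rng.random() < 0.5]
    return " ".join(flags)

rng = random.Random(42)
print(random_optimization(rng))
```

Each participating machine evaluates such randomly drawn points, so the community collectively explores the space without any single user bearing the full cost.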
At the same time, Collective Mind Node
makes it easy to crowdsource time-consuming autotuning across
existing Android-based mobile devices, including commodity
mobile phones and tablets.
Similar to natural sciences, we can now continuously track all
winning solutions (optimizations for a given hardware such
as compiler flags, OpenCL/CUDA/OpenMP/MPI/skeleton parameters,
number of threads and any others exposed by users) that minimize
all costs of a computation (execution time, energy spent, code
size, failures, memory and storage footprint, optimization time,
faults, contentions, inaccuracy and so on) of a given species
on a Pareto frontier along with any unexpected behavior
at c-mind.org/repo.
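A minimal sketch of how winning solutions on a Pareto frontier over two costs can be tracked (the real repository handles many more cost dimensions; the sample points below are taken from the GCC 4.4.4 day-gray measurements reported later in this report):

```python
def pareto_frontier(points):
    """Keep the points that no other point dominates in both
    cost dimensions (lower time AND lower size)."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p
                       for q in points)]

# (time_s, size_bytes, optimization) measurements:
points = [(4.622, 10776, "-O3"), (4.631, 10168, "-O2"),
          (4.621, 10152, "-O1"), (4.668, 9744, "-Os")]
frontier = pareto_frontier(points)
print(sorted(p[2] for p in frontier))  # ['-O1', '-Os']
```

On this sample, -O1 wins on time and -Os wins on size, while -O3 and -O2 are dominated and can be pruned.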
Furthermore, the community can continuously classify solutions,
prune redundant ones, and correlate them with various features
of the software, its inputs (data sets) and the hardware used, either
manually (similar to Wikipedia) or using available big data
analytics and machine learning techniques.
Our approach can also help the computer engineering community create
the first public, realistic, large, diverse, distributed, representative,
and continuously evolving benchmark with related optimization
knowledge while gradually covering all possible software and
hardware to be able to predict best optimizations and improve
compilers depending on usage scenarios and requirements.
Such continuously growing collective knowledge, accessible via
a simple web service, can become an integral part of the practical
software and hardware co-design of self-tuning computer systems
as we demonstrate in several real usage scenarios validated
in industry.
*
Note that we have moved all our developments to a newer, smaller,
simpler and faster version of Collective Mind, known as Collective Knowledge (CK).
This open source, BSD-licensed framework with a live repository is available here:
http://github.com/ctuning/ck
and http://cknowledge.org/repo.
Documentation with all examples is also available at http://github.com/ctuning/ck/wiki.
P1
Reproducing adaptive filter experiments (collaboratively finding features missing from the system in order to enable run-time adaptation):
CK scripts: reproduce-filter-speedup
CK datasets:
Lenovo X240; Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz; Ubuntu 14.04 64bit; GCC 4.4.4
(CK public repo: all experiments,
compiler description,
all compilers)
| Optimization | Binary size | Dataset image-raw-bin-fgg-office-day-gray: min time (s); exp time (s); var (%) | Dataset image-raw-bin-fgg-office-night-gray: min time (s); exp time (s); var (%) |
|---|---|---|---|
| -O3 | 10776 | 4.622; 4.634; 0.7% | 4.630; 4.653; 1.0% |
| -O3 -fno-if-conversion | 10784 | 5.169; 5.193; 1.0% (slowdown over -O3: 1.12) | 4.091; 4.094; 0.2% (speedup over -O3: 1.14) |
| -O2 | 10168 | 4.631; 4.754; 10.2% | 4.623; 4.639; 0.7% |
| -O1 | 10152 | 4.621; 4.633; 0.8% | 4.623; 4.685; 3.6% |
| -Os | 9744 | 4.668; 4.678; 0.6% | 4.666; 4.685; 0.9% |
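The speedup/slowdown figures in the table above are consistent with simple ratios of the expected ("exp") times (assuming that is how they were derived; the exact formula is not shown in this report):

```python
# Expected times from the GCC 4.4.4 table above.
exp_o3_day, exp_fnoif_day = 4.634, 5.193      # day-gray dataset
exp_o3_night, exp_fnoif_night = 4.653, 4.094  # night-gray dataset

# Ratio > 1 means -fno-if-conversion is slower than -O3 on that dataset.
slowdown_day = exp_fnoif_day / exp_o3_day
speedup_night = exp_o3_night / exp_fnoif_night

print(round(slowdown_day, 2), round(speedup_night, 2))  # 1.12 1.14
```

This is exactly the kind of dataset-dependent behavior (one flag hurts on one input and helps on another) that motivates run-time adaptation.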
Note: CK allows the community to keep validating the above results and to share unexpected
behavior in the public repository at cknowledge.org/repo. Some such shared results are shown below:
Lenovo X240; Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz; Ubuntu 14.04 64bit; GCC 4.9.1
(CK public repo: all experiments,
compiler description,
all compilers)
| Optimization | Binary size | Dataset image-raw-bin-fgg-office-day-gray: min time (s); exp time (s); var (%) | Dataset image-raw-bin-fgg-office-night-gray: min time (s); exp time (s); var (%) |
|---|---|---|---|
| -O3 | 11008 | 4.619; 4.630; 0.6% | 4.603; 4.628; 1.0% (slower than GCC 4.4.4 -O3 -fno-if-conversion) |
| -O3 -fno-if-conversion | 11008 | 4.615; 4.625; 0.6% | 4.624; 4.628; 0.3% (slower than GCC 4.4.4 -O3 -fno-if-conversion) |
| -O2 | 10880 | 4.632; 4.635; 0.1% | 4.602; 4.647; 2.5% |
| -O1 | 10360 | 4.625; 4.637; 0.7% | 4.630; 4.654; 1.8% |
| -Os | 10376 | 4.635; 4.653; 0.8% | 4.630; 4.652; 0.8% |
Lenovo X240; Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz; Ubuntu 14.04 64bit; GCC 5.2.0
(CK public repo: all experiments,
compiler description,
all compilers)
| Optimization | Binary size | Dataset image-raw-bin-fgg-office-day-gray: min time (s); exp time (s); var (%) | Dataset image-raw-bin-fgg-office-night-gray: min time (s); exp time (s); var (%) |
|---|---|---|---|
| -O3 | 10776 | 4.622; 4.632; 0.8% | 4.630; 4.631; 0.2% (slower than GCC 4.4.4 -O3 -fno-if-conversion) |
| -O3 -fno-if-conversion | 10776 | 4.629; 4.649; 0.8% | 4.610; 4.626; 1.1% (slower than GCC 4.4.4 -O3 -fno-if-conversion) |
| -O2 | 10568 | 4.599; 4.610; 0.6% | 4.597; 4.603; 0.4% |
| -O1 | 10032 | 4.613; 4.616; 0.2% | 4.605; 4.616; 0.8% |
| -Os | 10088 | 4.609; 4.630; 1.4% | 4.608; 4.615; 0.4% |
Just for comparison during crowd-benchmarking: Samsung Chromebook 2; Samsung EXYNOS5; ARM Cortex A15/A7; ARM Mali-T628; Ubuntu 12.04 32bit; GCC 4.9.2
(CK public repo: all experiments,
compiler description,
all compilers)
| Optimization | Binary size | Dataset image-raw-bin-fgg-office-day-gray: min time (s); exp time (s); var (%) | Dataset image-raw-bin-fgg-office-night-gray: min time (s); exp time (s); var (%) |
|---|---|---|---|
| -O3 | 7416 | 7.396; 7.513; 2.9% | 7.390; 7.464; 2.6% |
| -O3 -fno-if-conversion | 7424 | 7.345; 7.455; 3.8% | 7.384; 7.490; 2.6% |
| -O2 | 7100 | 7.398; 7.926; 39.2% | 7.450; 7.514; 2.3% |
| -O1 | 7072 | 7.404; 7.444; 1.4% | 7.389; 7.443; 2.9% |
| -Os | 6292 | 7.367; 7.409; 1.5% | 7.375; 7.479; 2.7% |
The above results support our wrapper-based approach (computational species) applied to
the most time-consuming kernels and libraries, combined with exposed features and automatically
built and continuously refined decision trees via CK, as described in this paper and
its first part.
They also demonstrate that CK can be used to balance a decision tree's prediction
rate (accuracy) against its size and speed, to ensure that the ultimate decision tree,
implemented in C and embedded in a kernel wrapper, is fast and compact enough.
Note: we are gradually converting all the code and data related to this paper
from the deprecated Collective Mind Format to the new Collective Knowledge Framework.