We are proud to deliver the new version of our Collective Knowledge Technology v3 with the open-source MLCommons CM automation language, CK playground and modular inference library (MIL). It became the first and only workflow automation to enable mass submission of more than 12000 performance results in a single MLPerf inference submission round, with more than 1900 power results across more than 120 different system configurations from different vendors in both open and closed divisions. These configurations span different implementations, all reference models, support for DeepSparse Zoo, Hugging Face Hub and BERT pruners from the NeurIPS paper, the main frameworks, and diverse software/hardware stacks.
See the related HPC Wire article for more details about cTuning and our CK/CM technology.
I deeply believe in the power of collaborative research, reproducible experiments, knowledge sharing, open science and open source to solve the world's most challenging problems! That's why I developed the open-source Collective Knowledge Technology and donated it to MLCommons (50+ AI/ML organizations) in 2021 to help the community enable collaborative and reproducible R&D and automatically co-design efficient AI/ML systems.
Since 2023, the 3rd generation of the Collective Knowledge technology has been powered by a unified, technology-agnostic and human-friendly interface (Collective Mind automation language) that Arjun Suresh and I are developing in collaboration with MLCommons to access, automate, manage, modularize, reuse and run any Git project, benchmark, application, AI/ML model, tool, script and experiment while automatically adapting to any software, hardware and data.
You can learn more about my open science quest, passions and related projects with IBM, Intel, General Motors, Arm, Nvidia, Amazon, OctoML, MLCommons, HiPEAC, ACM, IEEE and other collaborators from my keynote at ACM REP'23, my ACM TechTalk'21, my journal article in Philosophical Transactions of the Royal Society'21 and my reproducibility initiatives at ML and Systems conferences since 2014. Feel free to reach me via our Discord server and connect with me on LinkedIn.
My current activities:
|2023-cur.:||Developed a prototype of the Collective Knowledge playground to collaboratively benchmark and optimize AI, ML and other emerging applications in an automated and reproducible way via open challenges.|
Developed a prototype of the cKnowledge.io platform to organize all knowledge
about AI, ML, systems, and other innovative technology from my academic and industrial partners
in the form of portable CK workflows, automation actions, and reusable artifacts.
I use it to automate co-design and comparison of efficient AI/ML/SW/HW stacks
from data centers and supercomputers to mobile phones and edge devices
in terms of speed, accuracy, energy, and various costs.
I also use this platform to help organizations reproduce innovative AI, ML, and systems techniques from research papers
and accelerate their adoption in production.
I collaborate with MLPerf.org to automate and simplify ML&systems benchmarking
and fair comparison based on the CK concept and DevOps/MLOps principles.
Enhanced and stabilized all main CK components
(software detection, package installation, benchmarking pipeline, autotuning, reproducible experiments, visualization)
successfully used by dividiti to automate MLPerf benchmark submissions.
Developed CK workflows
and live dashboards for
the 1st open ACM REQUEST tournament
to co-design Pareto-efficient SW/HW stacks for ML and AI in terms of speed, accuracy, energy, and costs.
We later reused this CK functionality to automate MLPerf submissions.
I used the following technologies: CK; LLVM/GCC/ICC; ImageNet; MobileNets, ResNet-18, ResNet-50, Inception-v3, VGG16, SSD, and AlexNet; MXNet, TensorFlow, Caffe, Keras, Arm Compute Library, cuDNN, TVM, and NNVM; Xilinx Pynq-Z1 FPGA/Arm Cortex CPUs/Arm Mali GPGPUs (Linaro HiKey960 and T-Firefly RK3399)/a farm of Raspberry Pi devices/NVIDIA Jetson TX2/Intel Xeon servers in Amazon Web Services, Google Cloud and Microsoft Azure.
Developed an example of an autogenerated and reproducible paper
with a Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques
(collaboration with the Raspberry Pi foundation).
I used the following technologies: Linux/Windows; LLVM/GCC; CK; C/C++/Fortran; MILEPOST GCC code features/hardware counters; DNN (TensorFlow)/KNN/SVM/decision trees; PCA; statistical analysis; crowd-benchmarking; crowd-tuning.
Developed the Collective Knowledge framework (CK)
to help the community
automate typical tasks in ML&systems R&D,
provide a common format, APIs, and meta descriptions for shared research projects,
enable portable workflows,
and improve the reproducibility and reusability in computational research.
We now use it to automate benchmarking, optimization and co-design of AI/ML/SW/HW stacks
in terms of speed, accuracy, energy and other costs across diverse platforms
from data centers to edge devices.
I used the following technologies: Linux/Windows/Android/Edge devices; Python/C/C++/Java; ICC/GCC/LLVM; JSON/REST API; DevOps; plugins; apache2; Azure cloud; client/server architecture; noSQL database (ElasticSearch); GitHub/GitLab/BitBucket; Travis CI/AppVeyor CI; main math libraries, DNN frameworks, models, and datasets.
|2012-2014:||Prototyped the Collective Mind framework, a precursor to CK. I focused on web services, but it turned out that my users wanted a basic CLI-based framework. This feedback motivated me to develop the simple CLI-based CK framework.|
|2010-2011:||Helped to create KDataSets (1000 data sets for CPU benchmarks) (PLDI paper, repo).|
|2008-2010:||
Developed the machine-learning-based self-optimizing compiler connected with cTuning.org
in collaboration with IBM, Arc (Synopsys), Inria, and the University of Edinburgh. This technology is considered to be
the first of its kind in the world.
I used the following technologies: Linux; GCC; C/C++/Fortran/Prolog; semantic features/hardware counters; KNN/decision trees; PCA; statistical analysis; crowd-benchmarking; crowd-tuning; plugins; client/server architecture.
|2008-2009:||Added the function cloning process to GCC to enable run-time adaptation for statically-compiled programs (report).|
|2008-2009:||Developed the interactive compilation interface now available in mainline GCC (collaboration with Google and Mozilla).|
|2008-cur.:||
Developed the cTuning.org portal
to crowdsource training of the ML-based MILEPOST compiler
and automate SW/HW co-design similar to SETI@home. See press releases from IBM
and Fujitsu about my cTuning concept.
I used the following technologies: Linux/Windows; MediaWiki; MySQL; C/C++/Fortran/Java; MILEPOST GCC; PHP; apache2; client/server architecture; KNN/SVM/decision trees; plugins.
|2009-2010:||Created cBench (a collaborative CPU benchmark to support autotuning R&D) and connected it with my cTuning infrastructure from the MILEPOST project.|
|2005-2009:||Created MiDataSets - multiple datasets for MiBench (20+ datasets per benchmark; 400 in total) to support autotuning R&D.|
|1999-2004:||
Developed a collaborative infrastructure to autotune HPC workloads (Edinburgh Optimization Software) for the EU MHAOTEU project.
I used the following technologies: Linux/Windows; Java/C/C++/Fortran; Java-based GUI; client/server infrastructure with plugins to integrate autotuning/benchmarking tools and techniques from other partners.
Developed a polyhedral source-to-source compiler for memory hierarchy optimization in HPC, used in the EU MHAOTEU project.
I used the following technologies: C++; GCC/SUIF/POLARIS.
Developed a web-based service to automate the submission and execution of tasks on supercomputers via the Internet, used at the Russian Academy of Sciences.
I used the following technologies: Linux/Windows; apache/IIS; MySQL; C/C++/Fortran/Visual Basic; MPI; Cray T3D.
Developed an analog semiconductor neural network accelerator (Hopfield architecture).
My R&D tasks included NN design, simulation, development of an electronic board connected to a PC to experiment with the semiconductor NN, data set preparation, training, benchmarking, and optimization of this NN.
I used the following technologies: MS-DOS/Windows/Linux; C/C++/assembler for NN implementation; MPI for distributed training; PSpice for electronic circuit simulation; ADC, DAC, and LPT to measure the semiconductor NN and communicate with a PC; Visual Basic to visualize experiments.
Developed and sold software to automate financial operations in SMEs.
I used the following technologies: MS-DOS; Turbo C/C++; assembler for printer/video drivers; my own library for Windows management.