Grigori Fursin

Founder of the MLCommons taskforce on automation and reproducibility, cTuning foundation and cKnowledge Ltd developing Collective Knowledge Playground;   author of the open-source CK/CM technology adopted by MLCommons (50+ companies, startups and non-profits);   organizer of reproducible optimization challenges and tournaments with ACM and IEEE;   active open-source contributor and reproducibility champion.  

In the past: vice president of MLOps at OctoML;   founder and chief architect of (acquired by OctoML);   founder in residence at Entrepreneur First;    co-director of Intel Exascale Lab;   senior tenured scientist at INRIA/CNRS;   research associate at the University of Edinburgh;    author of the Artifact Evaluation and Reproducibility checklist for ACM conferences;   recipient of the European technology transfer award, ACM CGO test of time award and INRIA award of scientific excellence for the world's first Machine Learning based self-optimizing compiler.

LinkedIn    Google scholar    Twitter    GitHub    Contact    Discord

I am a computer scientist, engineer, entrepreneur, software engineer, educator, lifelong learner and adventurer. My passion is to help researchers, engineers and entrepreneurs bring their ideas to the real world in the fastest and most efficient way while slashing their development and operational costs! I have developed the open-source CK technology to empower everyone to quickly validate their ideas in an automated and reproducible way across diverse and rapidly evolving AI/ML models, data, software and hardware from the cloud to mobile and tiny devices.

I am honored that my technology and expertise have helped the community and many companies to automate the development and optimization of ultra-efficient AI and ML applications. For example, they have helped to automate, unify and reproduce more than 80% of recent MLPerf inference benchmark submissions (and 98% of power results) with very diverse technology from Neural Magic, Qualcomm, Krai, DELL, HPE, Lenovo, Hugging Face, Nvidia, AMD, Intel and Apple across diverse CPUs, GPUs and DSPs with PyTorch, ONNX, QAIC, TF/TFLite, TVM and TensorRT using popular cloud providers (GCP, AWS, Azure) and individual servers and edge devices provided by our volunteers and contributors.

Following this success, I helped to establish an open MLCommons taskforce on automation and reproducibility to develop Collective Knowledge Playground - a free, open-source and technology-agnostic platform for collaborative benchmarking, optimization and comparison of AI and ML Systems in terms of cost, performance, power consumption, accuracy, size and other metrics via open and reproducible challenges.

My goal and vision is to let everyone automatically generate the most efficient, reproducible and deployable full-stack AI/ML applications using the most suitable software/hardware stacks at any given time (model, framework, inference engine and any other related dependency) based on their requirements and constraints including costs, throughput, latency, power consumption, accuracy, target devices (cloud/edge/mobile/tiny), environment and data. I am also a big proponent of open science and I regularly organize reproducibility initiatives at ML and Systems conferences.

If you are interested in discussing or supporting our community projects, please feel free to join our Discord server or get in touch! Looking forward to hearing from you!

My news:
  • 2023 April 3: I am excited to announce that I am leading the development of the MLCommons Collective Knowledge Playground - a free, open-source and technology-agnostic platform to collaboratively benchmark, optimize and compare AI and ML Systems in terms of cost, performance, power consumption, accuracy, size and other metrics via open and reproducible challenges.
  • 2023 Feb 16: New alpha CK2/CM GUI to visualize all MLPerf results is available here.
  • 2023 Jan 30: New alpha CK2/CM GUI to run MLPerf inference is available here.
  • 2022 November: I am very glad to see that our new CK2 automation meta-framework (CM) was successfully used at the Student Cluster Competition'22 to make it easier to prepare and run the MLPerf inference benchmark in just under 1 hour. If you have 20 minutes, please check this tutorial to reproduce results yourself ;) !
  • 2022 September: I am very excited to announce the release of the MLCommons Collective Mind toolkit v1.0.1 - the next generation of the MLCommons Collective Knowledge framework. It is being developed by the public workgroup after I donated CK to MLCommons last year. We are very glad to see that more than 80% of all performance results and more than 95% of all power results were automated by the MLCommons CK v2.6.1 in the latest MLPerf inference round thanks to submissions from Qualcomm, Krai, Dell, HPE and Lenovo!
  • 2022 July: We have pre-released CK2(CM) portable automation scripts for MLOps and DevOps:
  • 2022 March: I am very excited to announce the development of the CM framework (aka CK2) based on the community feedback - join our collaborative effort!
  • 2022 February: We've successfully completed the artifact evaluation at ASPLOS'22!
  • 2021 October: My Collective Knowledge framework became an official MLCommons project! I am looking forward to working with the community to make it easier to benchmark and co-design efficient ML Systems across continuously changing hardware, software, models and data sets!
  • 2021 April: I am excited to join OctoML as a VP of MLOps and work with a fantastic team to automate the development, optimization and deployment of efficient ML Systems (speed, accuracy, energy, size and costs) from the cloud to the edge that can help to solve real-world problems.
  • 2021 March: My ACM TechTalk "Reproducing 150 Research Papers and Testing Them in the Real World" is available on the ACM YouTube channel.
  • 2021 March: The report from the "Workflows Community Summit: Bringing the Scientific Workflows Community Together" is now available on arXiv.
  • 2021 March: My paper about the CK technology has appeared in the Philosophical Transactions A, the world's longest-running journal where Newton published: DOI, ArXiv.
  • 2020 December: I am honored to join MLCommons as a founding member to accelerate machine learning and systems innovation along with 50+ leading companies and universities: press-release.
  • 2020 December: We are organizing artifact evaluation at ACM ASPLOS'21.
  • 2020 November: The overview of the CK project was accepted for the Philosophical Transactions of the Royal Society: peer-reviewed preprint.
  • 2020 October: My CK framework helped to automate and reproduce many MLPerf benchmark v0.7 inference submissions: see shared CK solutions and CK dashboards to automate SW/HW co-design for edge devices.
  • 2020 September: My Reddit discussion about our painful experience reproducing ML and systems papers during artifact evaluation.

My academic research (tenured research scientist at INRIA with PhD in CS from the University of Edinburgh)
  • I was among the first researchers to combine machine learning, autotuning and knowledge sharing to automate and accelerate the development of efficient software and hardware by several orders of magnitude (Google scholar);
  • developed open-source tools and started educational initiatives (ACM, Raspberry Pi foundation) to bring this research to the real world (see use cases);
  • prepared and taught an M.S. course at Paris-Saclay University on using ML to co-design efficient software and hardware (self-optimizing computing systems);
  • gave 100+ invited research talks;
  • honored to receive the ACM CGO test of time award, several best paper awards and the INRIA award of scientific excellence.
Project management, system design and consulting (collaboration with MLCommons, IBM, Intel, Arm, Synopsys, Google, Mozilla, General Motors)
  • leading the development of the MLCommons CK playground to co-design efficient ML and AI Systems via reproducible optimization challenges and tournaments;
  • led the development of the world's first ML-based compiler and the platform across 5 teams to automate and crowdsource optimization of computer systems (IBM and Fujitsu press-releases; invitation to help establish Intel Exascale Lab and lead SW/HW co-design group);
  • developed a compiler plugin framework that was added to the mainline GCC powering all Linux-based computers and helped to convert production compilers into research toolsets for machine learning;
  • developed the Collective Knowledge framework to automate and accelerate design space exploration of AI/ML/SW/HW stacks while balancing speed, accuracy, energy and costs; CK helped to automate most of the MLPerf inference benchmark submissions for edge devices, as mentioned by Forbes, ZDNet and EETimes;
  • co-founded an engineering company (dividiti) and led it to $1M+ in revenue with Fortune 50 customers using my CK technology; donated CK technology to MLCommons in 2021;
  • founded cKnowledge Ltd;
  • founded and developed the platform acquired by;
  • was selected for the 2nd Entrepreneur First cohort in Paris to learn how to create startups and avoid numerous pitfalls.
Community service (collaboration with MLCommons, ACM and the Raspberry Pi foundation)

Professional Career

Community service



Main scientific and community contributions

Professional memberships


Main software developments and technology used

I used Streamlit; PyTorch/ONNX/TF/TFLite/TVM; Nvidia/Intel/AMD/Qualcomm/DSPs; CK2/CM automation; Python; MLPerf benchmarks.
2023-cur.: Developed a prototype of the Collective Knowledge playground to collaboratively benchmark and optimize AI, ML and other emerging applications in an automated and reproducible way via open challenges.
2020-cur.: Developed a prototype of the to organize all knowledge about AI, ML, systems, and other innovative technology from my academic and industrial partners in the form of portable CK workflows, automation actions, and reusable artifacts. I use it to automate co-design and comparison of efficient AI/ML/SW/HW stacks from data centers and supercomputers to mobile phones and edge devices in terms of speed, accuracy, energy, and various costs. I also use this platform to help organizations reproduce innovative AI, ML, and systems techniques from research papers and accelerate their adoption in production. I collaborate with to automate and simplify ML&systems benchmarking and fair comparison based on the CK concept and DevOps/MLOps principles.
I used the following technologies: Linux/Windows/Android; Python/JavaScript/CK; apache2; flask/django; ElasticSearch; GitHub/GitLab/BitBucket; REST JSON API; Travis CI/AppVeyor CI; DevOps; CK-based knowledge graph database; TensorFlow; Azure/AWS/Google cloud/IBM cloud.
2018-cur.: Enhanced and stabilized all main CK components (software detection, package installation, benchmarking pipeline, autotuning, reproducible experiments, visualization) successfully used by dividiti to automate MLPerf benchmark submissions.
I used the following technologies: Linux/Windows/Android; CK/Python/JavaScript/C/C++; statistical analysis; MatPlotLib/numpy/pandas/jupyter notebooks; GCC/LLVM; TensorFlow/PyTorch; main AI algorithms, models and data sets for image detection and object classification; Azure/AWS/Google cloud/IBM cloud; mobile phones/edge devices/servers; Nvidia GPU/EdgeTPU/x86/Arm architectures.
2017-2018: Developed CK workflows and live dashboards for the 1st open ACM REQUEST tournament to co-design Pareto-efficient SW/HW stacks for ML and AI in terms of speed, accuracy, energy, and costs. We later reused this CK functionality to automate MLPerf submissions.
I used the following technologies: CK; LLVM/GCC/ICC; ImageNet; MobileNets, ResNet-18, ResNet-50, Inception-v3, VGG16, SSD, and AlexNet; MXNet, TensorFlow, Caffe, Keras, Arm Compute Library, cuDNN, TVM, and NNVM; Xilinx Pynq-Z1 FPGA/Arm Cortex CPUs/Arm Mali GPGPUs (Linaro HiKey960 and T-Firefly RK3399)/a farm of Raspberry Pi devices/NVIDIA Jetson TX2/Intel Xeon servers in Amazon Web Services, Google Cloud and Microsoft Azure.
2017-2018: Developed an example of the autogenerated and reproducible paper with a Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques (collaboration with the Raspberry Pi foundation).
I used the following technologies: Linux/Windows; LLVM/GCC; CK; C/C++/Fortran; MILEPOST GCC code features/hardware counters; DNN (TensorFlow)/KNN/SVM/decision trees; PCA; statistical analysis; crowd-benchmarking; crowd-tuning.
2015-cur.: Developed the Collective Knowledge framework (CK) to help the community automate typical tasks in ML&systems R&D, provide a common format, APIs, and meta descriptions for shared research projects, enable portable workflows, and improve the reproducibility and reusability in computational research. We now use it to automate benchmarking, optimization and co-design of AI/ML/SW/HW stacks in terms of speed, accuracy, energy and other costs across diverse platforms from data centers to edge devices.
I used the following technologies: Linux/Windows/Android/Edge devices; Python/C/C++/Java; ICC/GCC/LLVM; JSON/REST API; DevOps; plugins; apache2; Azure cloud; client/server architecture; noSQL database (ElasticSearch); GitHub/GitLab/BitBucket; Travis CI/AppVeyor CI; main math libraries, DNN frameworks, models, and datasets.
2012-2014: Prototyped the Collective Mind framework - a precursor to CK. I focused on web services, but it turned out that my users wanted a basic CLI-based framework. This feedback motivated me to develop a simple CLI-based CK framework.
2010-2011: Helped to create KDataSets (1000 data sets for CPU benchmarks) (PLDI paper, repo).
2008-2010: Developed the machine-learning-based self-optimizing compiler connected with in collaboration with IBM, Arc (Synopsys), Inria, and the University of Edinburgh. This technology is considered to be the first in the world.
I used the following technologies: Linux; GCC; C/C++/Fortran/Prolog; semantic features/hardware counters; KNN/decision trees; PCA; statistical analysis; crowd-benchmarking; crowd-tuning; plugins; client/server architecture.
2008-2009: Added the function cloning process to GCC to enable run-time adaptation for statically-compiled programs (report).
2008-2009: Developed the interactive compilation interface now available in mainline GCC (collaboration with Google and Mozilla).
2008-cur.: Developed the portal to crowdsource training of the ML-based MILEPOST compiler and automate SW/HW co-design similar to SETI@home. See press-releases from IBM and Fujitsu about my cTuning concept.
I used the following technologies: Linux/Windows; MediaWiki; MySQL; C/C++/Fortran/Java; MILEPOST GCC; PHP; apache2; client/server architecture; KNN/SVM/decision trees; plugins.
2009-2010: Created cBench (collaborative CPU benchmark to support autotuning R&D) and connected it with my cTuning infrastructure from the MILEPOST project.
2005-2009: Created MiDataSets - multiple datasets for MiBench (20+ datasets per benchmark; 400 in total) to support autotuning R&D.
1999-2004: Developed a collaborative infrastructure to autotune HPC workloads (Edinburgh Optimization Software) for the EU MHAOTEU project.
I used the following technologies: Linux/Windows; Java/C/C++/Fortran; Java-based GUI; client/server infrastructure with plugins to integrate autotuning/benchmarking tools and techniques from other partners.
1999-2001: Developed a polyhedral source-to-source compiler for memory hierarchy optimization in HPC used in the EU MHAOTEU project.
I used the following technologies: C++; GCC/SUIF/POLARIS.
1998-1999: Developed a web-based service to automate the submission and execution of tasks on supercomputers over the Internet, used at the Russian Academy of Sciences.
I used the following technologies: Linux/Windows; apache/IIS; MySQL; C/C++/Fortran/Visual Basic; MPI; Cray T3D.
1993-1998: Developed an analog semiconductor neural network accelerator (Hopfield architecture). My R&D tasks included the NN design, simulation, development of an electronic board connected with a PC to experiment with semiconductor NN, data set preparation, training, benchmarking, and optimization of this NN.
I used the following technologies: MS-DOS/Windows/Linux; C/C++/assembler for NN implementation; MPI for distributed training; PSpice for electronic circuit simulation; ADC, DAC, and LPT to measure semiconductor NN and communicate with a PC; Visual Basic to visualize experiments.
1991-1993: Developed and sold software to automate financial operations in SMEs.
I used the following technologies: MS-DOS; Turbo C/C++; assembler for printer/video drivers; my own library for Windows management.

My favorite story about Ernest Rutherford and Niels Bohr