Summary of Papers Read in 2022

Book Last year was bad in terms of paper reading. I read just 18 new papers that are worse mentioning. There is a positive shift though - I started to summarize all the papers I read in BibTeX-based catalogue. A really short summary of all the papers organized by research topics is provided below.

Machine Learning

  • Raghu Prabhakar et al. “Plasticine: A Reconfigurable Architecture For Parallel Patterns”. In: Proceedings of the 44th Annual International Symposium on Computer Architecture. ISCA ’17. Toronto, ON, Canada: Association for Computing Machinery, 2017, pp. 389– 402. isbn: 9781450348928. doi: 10.1145/3079856.3080256:

    The paper presents plasticine - a new reconfigurable hardware architecture designed specifically to facilitate machine learning requirements. Plasticine system consists of pattern compute units, pattern memory units and switches that construct a two dimensional array of the units. Address generation units allow access to off-chip memory. The system is reconfigurable thus allowing the same hardware to be used for both inference and training as well as for training of different models. The paper provide plasticine comparison with FPGA. The architecture is claimed to be both more power and performance efficient. The paper also proposes parallel pattern programming model that is used for programming the hardware system.

  • Yuanzhong Xu et al. “Automatic Cross-Replica Sharding of Weight Update in Data- Parallel Training”. In: arXiv e-prints, arXiv:2004.13336 (Apr. 2020), arXiv:2004.13336. arXiv: 2004.13336 [cs.DC]:

    The paper describes an optimization to increase parallelism of deep neural networks through sharding of the weight update computations using static analysis and transformations on the training graph. Substantial speedup is achieved on image and language models on Google TPU. MLPerf is used as the benchmark.

Processors

  • Bob Wheeler. “WD Rolls its Own RISC-V Core”. In: Microprocessor Report (Feb. 2019):

    Western Digital builds its own RISC-V processor. The paper claimed that WD achieved 40% performance, 30% power, and 25% area improvements over the licensed core they used before.

Simulators

  • J. Emer et al. “Asim: a performance model framework”. In: Computer 35.2 (2002), pp. 68–76. doi: 10.1109/2.982918.:

    The paper presents basic performance modeling approach with separation of functional and timing simulation using ports. Perfect reference for “Fundamentals of Full-Platform Simulation”.

  • C.J. Hughes et al. “Rsim: simulating shared-memory multiprocessors with ILP processors”. In: Computer 35.2 (2002), pp. 40–49. doi: 10.1109/2.982915:

    The paper presents RSIM - micro architectural processor simulation framework initially focused on shared-memory systems. The paper shows that modeling of instruction level parallelism studies is particularly important for shared-memory studies. The paper has an important side-note “Performing Valid Studies with Unvalidated Simulators” that argues that a simulator doesn’t have to be 100% precise to be useful.

  • Todd Austin, Eric Larson, and Dan Ernst. “SimpleScalar: An Infrastructure for Computer System Modeling”. In: Computer 35.2 (Feb. 2002), pp. 59–67. issn: 0018-9162. doi: 10.1109/2.982917. url: https://doi.org/10.1109/2.982917:

    SimpleScalar - an infrastructure and framework for cycle-accurate, functional and application level simulation is described in the paper. The infrastructure provides a variety of simulation modes raging speed from 0.3 to 7 MIPS.

  • Lukas Junger et al. “ARM-on-ARM: Leveraging Virtualization Extensions for Fast Virtual Platforms”. In: Proceedings of the 23rd Conference on Design, Automation and Test in Europe. DATE ’20. Grenoble, France: EDA Consortium, 2020, pp. 1508–1513. isbn: 9783981926347:

    The paper presents a KVM-based direct execution for ARM integrated with Synopsis Virtualizer. Typical “trap-and-emulate” approach. The most interesting problems like accurate instruction accounting and limiting of amount instructions executed is not solved.

  • Peter S. Magnusson et al. “Simics: A Full System Simulation Platform”. In: Computer 35.2 (Feb. 2002), pp. 50–58. issn: 0018-9162. doi: 10.1109/2.982916. url: https: //doi.org/10.1109/2.982916:

    The paper presents an overview of Virtutech (at that point in time) Simics simulator. Not much to say here in this case. To some extend it feels like not much has changes but that’s of course not true.

  • James Kenney, Simon Davidmann, and Larry Lapides. “Parallel Simulation Accelerates Embedded Software Development, Debug and Test”. In: 2015:

    The paper presents a basic algorithm for multiprocessor target on multiprocessor host simulation. Another good reference for that “Fundamentals of Full-Platform Simulation”.

  • A. Nohl et al. “A universal technique for fast and flexible instruction-set architecture simulation”. In: Proceedings 2002 Design Automation Conference (IEEE Cat. No.02CH37324). 2002, pp. 22–27. doi: 10.1109/DAC.2002.1012588:

    The paper presents single instruction JIT compilation approach. The paper presents a comparison with old compiled-time simulation approach - I have not heard about other use-cases of this method so far. The approach proposed by the paper looks more like cached interpretation to me comparing to contemporary techniques. The compilation is done on per instruction basis. Instruction word is compared with the cached one to catch code modifications. The technique is sounds useful for processors without memory protection support where it is impossible to intercept the moment of code modification.

  • Ola Dahl et al. “Intelligent Virtual Platforms”. In: DVcon 2018. Ericsson AB. 2018:

    Memory and thread usage analysis for software running inside a virtual platform. Such builtin tooling allows memory analysis of low-level software as well as completely non-intrusive memory usage monitoring - typical feature of a virtual platform.

Compilers

  • Matt Godbolt. “Optimizations in C++ Compilers”. In: Commun. ACM 63.2 (Jan. 2020), pp. 41–49. issn: 0001-0782. doi: 10.1145/3369754. url: https://doi.org/ 10.1145/3369754:

    An amazing overview of basic but demonstrative optimizations performed by a modern C/C++ compiler. The paper would be a really great starting point for those first entering the compiler world like students.

Embedded Systems

  • Jakob Engblom. “Processor Pipelines and Static Worst-Case Execution Time Analysis.” PhD thesis. Uppsala University, Sweden, 2002:

    PhD thesis by my previous colleague Jakob Engblom focused on studying of worst case execution time analysis for embedded systems. I read it to compare PhD structure and level of Uppsala University and MIPT. Of course one example is too little to judge but so for it looks rather similar.

Fun

  • B. Colwell. “On computers and rock ’n’ roll”. In: Computer 35.2 (2002), pp. 9–11. doi: 10.1109/MC.2002.982907:

    Really fun connection between texts of several popular songs and computer basics. Absolutely recommended reading for technical and ,music nerds to have some fun.

Performance

  • Samuel Williams, Andrew Waterman, and David Patterson. “Roofline: An Insightful Visual Performance Model for Multicore Architectures”. In: Commun. ACM 52.4 (Apr. 2009), pp. 65–76. issn: 0001-0782. doi: 10 . 1145 / 1498765 . 1498785. url: https : //doi.org/10.1145/1498765.1498785:

    The paper presents roofline - a way to determine maximum performance of a particular system together with few ideas on how to decide what optimizations make sense to do and when. Roofline provides rather useful illustration of maximum theoretical performance of a particular system. Roofline model gives insights where an applications performance is located comparing to the theoretical maximum and suggest what steps could be performed to move closer to the maximum. The model is useful for both software and hardware developers to determine what improvements could be done to a system.

Software Philosophy

  • David Lorge Parnas and Paul C. Clements. “A rational design process: How and why to fake it”. In: IEEE Transactions on Software Engineering SE-12.2 (1986), pp. 251–257. doi: 10.1109/TSE.1986.6312940:

    Discussion on what a perfect design process show look like. The authors suggest to keep a design document together with code and update the document simultaneously with code changes. In my opinion, it is a lot better to keep a clean and understandable design of the code itself. I prefer all hacks and limitations to be commented in the code thus it is more likely that they will be updated or removed when the code changes. What I totally agree about with the authors is that all input, i.e. requirements and customer requests needs to be documented, tracked and updated as the directly influence all the rest.

  • Jack Horner and John Symons. “Understanding Error Rates in Software Engineering: Conceptual, Empirical, and Experimental Approaches”. In: Philosophy & Technology 32 (June 2019). doi: 10.1007/s13347-019-00342-1:

    Philosophical discussions about causes of errors in software. The best part of the article: “various kinds of error in most non-trivial software projects occur at the empirically observed average rate of about one error per hundred lines of code”. “Line” in the article is a “statement”, not an actual line, i.e. comments and empty lines etc. are not considered.

Virtual Machines

  • Thanasis Petsas et al. “Rage against the Virtual Machine: Hindering Dynamic Analysis of Android Malware”. In: Proceedings of the Seventh European Workshop on System Security. EuroSec ’14. Amsterdam, The Netherlands: Association for Computing Machinery, 2014. isbn: 9781450327152. doi: 10.1145/2592791.2592796. url: https://doi.org/ 10.1145/2592791.2592796:

    A set of techniques to detect a virtualized Android environment using basic phone identification information, sensor data and timing of processor instruction execution. It was surprisingly easy to do and the most obvious things to me like timing or undefined/unspecified behavior are not even used by the authors. My personal favorite from the paper is QEMU scheduling detecting relying on QEMU internal optimization that avoid increasing virtual PC on every cycle.

Written on January 1, 2023.