Reza Baharani

AI/ML and Edge/IoT Systems Engineer · reza(at)baharani.info

Combining custom hardware design with deep learning, I build power-efficient AI solutions for edge devices. I am proficient in High-Level Synthesis (HLS), Hardware Description Languages (HDLs), and scalable modular architectures on FPGAs and ASICs, and experienced in taking real-time AI to production through complete development and verification cycles. As a deep learning engineer, I design convolutional, recurrent, and transformer architectures for applications in computer vision, time series, and text. I routinely handle large-scale datasets and parallel training on high-end GPU servers, applying techniques such as AI HW/SW co-acceleration, quantization, knowledge distillation, and pruning, and I stay at the forefront of AI and machine learning advancements through continuous exploration of new technologies.


Experience

Scientific Researcher

TeCSAR Lab.

Developing a self-supervised training framework for transformer-based computer vision architectures, focused on enhancing contextual understanding in 2D/3D pose estimation tasks.

  • Developed a discrete Variational Auto-Encoder (dVAE) to accurately capture the dynamics of pose movement over time in a discrete latent space, enabling effective classification of pose movements.
  • Training transformer-based models, including ViT- and BERT-like architectures, with generative and contrastive self-supervised objectives, including masked input-frame generation and frame-sequence verification.

Engaging in MLIR (Multi-Level Intermediate Representation) projects to lower machine learning models for custom hardware designed on FPGA as a potential target platform.

  • Examining and evaluating open-source projects such as CIRCT, alongside existing MLIR-based toolchains such as Torch-MLIR and TensorFlow MLIR.
Oct 2023 - Present

Lead Edge/IoT Deep Learning Engineer

ForesightCares Inc.

Led a smartphone software development team leveraging AI and 3D pose estimation to assess and reduce fall risk and cognitive impairment in older adults, achieving up to 20 FPS on the device SoC. Demo

  • Large-scale parallel training and validation of a novel human 3D pose estimation algorithm on datasets such as Human3.6M and NTU-RGB+D (2.3 TB).
  • Developed Swift code to integrate TensorFlow Lite (TFLite) and Apple MLPackage models with Core ML for Neural Engine (NPU)/CPU/GPU execution, and used React Native to connect the AI backend to the user interface.
  • Leveraged AWS cloud services such as Cognito, DynamoDB, and S3.
Jun 2022 - Present

MLOps Engineer

TeCSAR Lab.

Designed and implemented an end-to-end, scalable, intelligent video surveillance pipeline achieving 23 frames per second (FPS) for eight concurrent cameras at Full HD resolution. Demo.

  • Deployed four deep learning models for detection, re-identification, body pose estimation, and segmentation.
  • Trained a person re-ID model on large datasets, including DukeMTMC, CUHK03, and Market-1501, improving its accuracy under mixed-precision inference.
  • Used PyTorch multiprocessing (Process and Queue) to parallelize model inference.
  • Built a RESTful API with Flask that serves an ML model for re-identifying individuals across camera clients.
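The queue-based parallel inference pattern above can be sketched as follows. This is a minimal illustration using the stdlib `threading` and `queue` modules as a stand-in for PyTorch's `torch.multiprocessing`; `fake_model` is a hypothetical placeholder, not the actual re-ID model.

```python
# Producer/consumer inference sketch: frames go into an input queue,
# worker threads run the model, results come back on an output queue.
import queue
import threading

def fake_model(frame):
    # Placeholder "inference": returns a dummy one-element embedding.
    return [frame * 0.5]

def worker(in_q, out_q):
    while True:
        frame = in_q.get()
        if frame is None:          # sentinel: shut this worker down
            break
        out_q.put(fake_model(frame))

in_q, out_q = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=worker, args=(in_q, out_q)) for _ in range(2)]
for t in threads:
    t.start()

for frame in range(8):             # enqueue eight "camera frames"
    in_q.put(frame)
for _ in threads:
    in_q.put(None)                 # one sentinel per worker
for t in threads:
    t.join()

results = sorted(out_q.get() for _ in range(8))
print(len(results))                # one embedding per frame
```

With real models, `torch.multiprocessing.Process` replaces the threads so each model runs in its own process and sidesteps the GIL; the queue-and-sentinel structure stays the same.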
Aug 2021 - Jun 2022

Graduate Student Research Assistant

The University of North Carolina at Charlotte

Designed and developed Agile Temporal Convolutional Neural Network (ATCN), a scalable deep learning model with adjustable hyper-parameters to enable time series analysis for resource-constrained edge systems.

  • Implemented in C/C++, the solution consumed only 49% of the 320KB RAM and 15% of the 1MB flash memory available on a Cortex-M7 microcontroller.
  • Used data augmentation techniques, such as jittering, magnitude warping, window warping, and scaling, to enhance model robustness on the UCR 2018 dataset.
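Two of the augmentations named above, jittering and scaling, can be sketched in a few lines. The parameter values here are illustrative defaults, not the settings used in the original UCR 2018 experiments.

```python
# Time-series augmentation sketch: jittering adds per-step Gaussian
# noise; scaling multiplies the whole series by one random factor.
import random

def jitter(series, sigma=0.03):
    """Add small Gaussian noise to each time step."""
    return [x + random.gauss(0.0, sigma) for x in series]

def scale(series, sigma=0.1):
    """Rescale the whole series by a single random factor."""
    factor = random.gauss(1.0, sigma)
    return [x * factor for x in series]

random.seed(0)
ts = [0.0, 0.5, 1.0, 0.5, 0.0]
aug = scale(jitter(ts))
print(len(aug))  # augmentation preserves the series length
```

Magnitude warping and window warping follow the same idea but vary the factor smoothly over time or stretch a sub-window, respectively.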

Invented a customized multi-head-attention Temporal Convolutional Network (TCN) for efficient, precise prediction of highway vehicle trajectories, targeting highway and self-driving-car safety applications.

  • Redesigned the dilated TCN with depth-wise separable convolutions, reducing model size and complexity by approximately 33.16% compared to LSTM-based approaches.
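The parameter savings from the depth-wise separable substitution can be seen with a back-of-the-envelope count. The channel and kernel sizes below are illustrative, not the actual TCN configuration, so the resulting percentage differs from the 33.16% reported above.

```python
# Parameter count for one 1-D layer: a standard convolution couples
# channels and taps; a depth-wise separable layer splits them into a
# per-channel filter plus a 1x1 pointwise mixing step (bias ignored).
def standard_conv_params(c_in, c_out, k):
    return c_in * c_out * k

def separable_conv_params(c_in, c_out, k):
    depthwise = c_in * k        # one k-tap filter per input channel
    pointwise = c_in * c_out    # 1x1 conv mixes channels
    return depthwise + pointwise

std = standard_conv_params(64, 64, 3)
sep = separable_conv_params(64, 64, 3)
print(f"standard={std}, separable={sep}, reduction={1 - sep / std:.1%}")
```

The saving grows with kernel size and channel count, which is why the substitution pays off in dilated TCN stacks with many layers.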

Implemented HW/SW co-design of application-specific architectures, accelerating EfficientNet and MobileNetV2 inference on Xilinx embedded and cloud FPGAs with up to an 8.6x FPS/W improvement. Demo

  • Applied model-level optimizations such as 4-bit quantization, layer fusion, pruning, and activation approximation.
  • Applied hardware-level optimizations, including pipelining and window buffering.
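The core idea behind the 4-bit quantization step can be sketched as follows. This is a minimal symmetric-quantization illustration in plain Python; production flows (e.g. FPGA toolchains) add calibration, per-channel scales, and retraining on top of this rounding/clipping idea.

```python
# Symmetric 4-bit quantization sketch: map floats onto signed integer
# codes in [-8, 7] via one scale factor, then dequantize to see the
# rounding error the hardware would incur.
def quantize_4bit(weights):
    scale = max(abs(w) for w in weights) / 7.0   # largest weight -> code 7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    deq = [v * scale for v in q]                 # reconstructed floats
    return q, deq

w = [0.9, -0.45, 0.1, -0.02]
q, deq = quantize_4bit(w)
print(q)  # integer codes, each within [-8, 7]
```

On the FPGA side these 4-bit codes let multiple multiply-accumulates pack into each DSP slice, which is where the FPS/W gain comes from.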

Designed a recurrent deep learning solution for real-time edge processing in reliability modeling of Si-MOSFET power electronics converters.

  • Designed stacked LSTM networks for time series analysis.
  • Utilized a NASA dataset for training and validation, enhancing the accuracy and efficiency of the system.

Experienced in server administration, system driver configuration, and resource allocation, including hardware RAID setup for multiple servers with server-class GPUs such as the P100, V100, and RTX 6000. Skilled in configuring deep learning frameworks such as TensorFlow and PyTorch for optimal performance across various server configurations.

Aug 2017 - Aug 2021

Education

University of North Carolina at Charlotte

Ph.D.
Electrical and Computer Engineering - Computer Architecture and Deep Learning
Aug 2017 - Aug 2021

University of Tehran

M.Sc.
Computer Architecture Engineering
Sep 2009 - Sep 2021

Skills

Programming & RTL Languages
  • Python
  • C/C++
  • SystemVerilog
  • SystemC
  • Verilog/VHDL
  • Shell and TCL
  • JavaScript|TypeScript
  • SQL
  • Familiar with Swift

AI/ML Algorithms
  • Modern deep neural networks such as CNNs, RNNs, and Transformers
  • Tokenization
  • Transfer Learning
  • Attention-based Neural Networks
  • Big Data Distributed Processing
  • Parameter Compression & Quantization
  • Evaluation Metrics (accuracy, precision, recall, F1-score, ROC curves, etc.)
  • Knowledge Distillation

AWS Cloud Services
  • Cognito
  • DynamoDB
  • S3 Bucket
  • EC2 (Elastic Compute Cloud)

Software Frameworks
  • React Native
  • Ray Cluster
  • Ansible

Embedded Systems
  • Git
  • GNU build tools
  • JTAG
  • Cross-compilation
  • Make/CMake
  • I2C
  • SPI
  • UART
  • GPIOs
  • RS-232/485
  • ARM Assembly
  • Familiar with Linux device drivers