关于我

「他只想证明他来过」

大家好,我是CS(CHenSHuai),一名算法工程师,主要从事机器视觉、计算机图形学和深度学习等领域的工作。我热爱计算机科学和人工智能领域,致力于将先进的算法应用于实际问题中,解决现实世界中的挑战。

联系我

技术栈

  • 编程语言:C/C++, Python, Java, GLSL
  • 框架和库:TensorFlow, PyTorch, OpenCV, OpenGL, Qt, MNN, NCNN, CMake, ImGui
  • 算法领域:计算机视觉, 深度学习, 生成式模型, LLM
  • 开发工具:Visual Studio, Visual Studio Code, PyCharm, Jupyter Notebook, Android Studio, Git, Docker

开源项目

项目经历

  • XX公司: 2022-至今:算法工程师
    • 基于图传监视器的视频分析与视频处理技术负责人 —— 2019年10月-至今
      • 项目主要内容:在 Android/iOS 手机设备实时远程监控图传数据,支持软解/硬解下 Full Range/Limited Range,并对视频流数据进行实时高性能的图像处理、图像分析、渲染等。
        • 高性能 & 低功耗图像处理:NV12、YUYV视频源的图像实时60fps处理,内存优化,零拷贝纹理渲染,离线渲染,多处理的高性能叠加;
        • 全功能:实现比例缩放、3DLUT、九宫格、伪彩色、双阈值斑马纹、锐化、放大镜、峰值对焦、波形图、直方图、矢量图、ToF点云渲染等。
        • 跨平台:部署落地在 Android、iOS;输出 Windows、Linux 端 Demo。
      • 项目收益:近一年(2023.12-2024.12)仅在 App Store 中下载量为 34.5 万(Android 暂无具体统计数据)。
    • 直播美颜项目负责人 —— 2023年11月-至今
      • 项目主要内容:Android(高通8250)、Windows(GTX1050Ti及以上)平台下,支持任意直播场景中实时美颜处理,提升皮肤美感,弥补ISP中人脸稍暗缺陷,提升直播质量。AI结果转零拷贝纹理耗时优化,OpenGL线程状态同步优化,预初始化AI加载优化,链路耗时波动优化,最终渲染耗时少于5ms,可以稳定30fps推流。
        • 贴纸:主要负责链路中图像算法与渲染实现,具体包括:贴纸的平移、旋转、缩放、镜像等;美颜中人脸贴纸图层对齐人脸关键点的三角剖分与渲染。
        • 皮肤分割:采用轻量化皮肤matting模型,优化不同肤色五官区域数据集优化处理,并制作优化Trimap与alpha数据集。准确识别和处理皮肤区域,提升美颜效果的针对性和精确度。采用snpe2.22中PTQ量化,snpe DSP 推理后端,链路上采用分割模型并行推理方式,每帧整体链路耗时大幅减少14ms。
        • 自研绿幕抠图:高效地去除视频背景,实现前景与背景的精确分离。涵盖了颜色转换、距离计算、图像腐蚀、高斯模糊、前景合成等功能。将 RGB 颜色空间转换为 YUV 颜色空间,用于计算与目标绿色的距离,提高抠图精度。图像处理:实现了基于距离图的腐蚀与高斯模糊处理,平滑边缘,减少噪声,提高合成效果。前景合成:根据 alpha 通道与前景图像进行精确合成,确保前景与背景的自然过渡。
        • 美白LUT调色模块:根据不同程度生成美白 .cube 与 .3dl,渲染3DLut。采用pbo零拷贝方案,隔帧方案。后处理将AI皮肤分割结果结合传统皮肤检测方法(高斯概率分布)、卡尔曼滤波,增强皮肤检测准确性与稳定性。
        • 磨皮: 基于OpenGL实现磨皮算法,解决传统磨皮效果过于生硬的问题,确保皮肤细腻自然。解决gamma提亮方法中皮肤被提亮问题,并通过结合高斯权重与颜色距离权重方法解决边缘溢色的问题。
        • 瘦脸/大眼/亮牙(预研):实现瘦脸、大眼、亮牙等特效,满足用户的个性化需求,提升直播的趣味性和吸引力。
      • 项目收益:正式上线公司相机产品与直播平台。
    • 实时alpha抠图模型及部署项目负责人 —— 2023年11月-至今
      • 项目主要内容:将单人或多人(5人)环境下人体皮肤区域进行快速、稳定地alpha抠图。
        • 模型优化:基于 MobileNetV2 backbone 的 DeepLabV3+ 模型,采用四通道输入(通过不降分辨率,而采用叠加通道方式来提升性能),将模型输入层维数由 3×480×480 改为 4×(3×320×320),可将耗时减少约6ms,同时反卷积替换resize算子,以适应在alpha Matting任务中的高性能低延时需求。
        • 帧间稳定性:基于人脸检测区域进行卡尔曼滤波,实现帧间稳定性。
        • 模型部署:移动端采用snpe的PTQ量化方式,运行在DSP设备上; PC端采用NCNN进行部署。采用OpenCL进行前处理,在保持精度的同时,推理耗时11ms。
    • AI虚化直播:技术负责人 —— 2023年11月-至今
      • 项目主要内容:AI深度估计模型模拟DSLR的光斑模糊渲染,模仿传统摄影大光圈浅景深的效果。
        • 深度估计模型: 采用 DPT V2 单目深度估计模型,实现DataLoader与训练代码。同时提供轻量化的PyDNet2单目深度估计模型,模型转换MNN/NCNN格式,编写C++推流代码。
        • 对焦距离帧间稳定性: 基于人脸检测区域进行卡尔曼滤波,实现帧间稳定性。
        • 深度图帧间稳定性: 基于帧卡尔曼滤波,实现模拟DSLR的光斑时的模糊稳定。
        • DSLR渲染:结合AI深度图、人像抠图mask、DP等不同格式类型的输入,来提升虚化准确性与效果。使用OpenGL模拟不同镜头F值、焦段下的散景效果,相较于市面上其他主流竞品的效果更加真实。
    • 视频降噪:技术负责人 —— 2023年11月-至今
      • 项目主要内容:基于nlMeans的视频降噪,降低传感器存在的亮度噪声与彩色噪声。
      • 性能:基于GPU加速的OpenGL优化实现,满足60fps的视频降噪。
    • 开放词汇多标签图像分类系统:技术负责人 —— 2025年11月-2026年4月
      • 项目主要内容:实现零样本学习场景下的开放词汇多标签分类。该系统已落地应用于AI智能剪辑产品,支撑视频素材的自动化场景标签识别,支持 5000+ 类别开放词汇分类,无需重新训练即可识别新增类别。
        • 提出两阶段训练策略:第一阶段冻结 CLIP 主干网络进行知识蒸馏,第二阶段引入可学习提示(Prompt Tuning)联合优化,显著提升零样本泛化能力。
        • 成功适配 DINOv3 等骨干网络,通过系统性对比实验验证不同视觉基础模型的迁移效果,设计投影层将 768 维视觉特征映射到 512 维语义空间,实现跨模态特征对齐。
        • 提出多目标优化函数:结合交叉熵损失、对比损失和排名损失,稀有类别召回率提升 15%
        • 分布式训练与性能优化:实现单机多卡及多机多卡并行训练,配合混合精度训练,训练效率提升 3.5倍
        • 在 NUS-WIDE 数据集上实现零样本 mAP 41.2%(超越基线 BiAM 37.6%),超越AAAI 2023 Oral论文性能,训练吞吐量提升 2.9倍,模型参数量减少 75.5%
        • 模型已通过 ONNX 量化部署,落地应用于 AI 智能剪辑产品。

论文专利

[1]. Radar Reflectivity and Meteorological Factors Merging‐Based Precipitation Estimation Neural Network[J].Earth and Space Science,2021,8(10).

[2]. Offline Single-Polarization Radar Quantitative Precipitation Estimation Based on a Spatiotemporal Deep Fusion Model[J].Advances in Meteorology,2021.

[3]. A Cloud-Removal Method for Snow Product Based on Denoising Autoencoder Neural Network[J].Journal of Nanjing University of Information Science and Technology (Natural Science Edition),2023,15(02).

[4]. Cloud-Removal Algorithm and Application Research for Snow Product at Basin Scale[D].2022.DOI:10.27248/d.cnki.gnjqc.2022.001085.

[5]. A Vehicle Vibration Noise Detection Alarm Device[P].Jiangsu Province:CN202120495866.X,2022-03-15.

教育背景

  • 南京信息工程大学:2019-2022:硕士研究生
  • 南京信息工程大学:2015-2019:本科

个人博客

社交媒体

我热爱探索新技术,解决实际问题,享受与同行交流与合作。如果您对我的工作感兴趣或有合作意向,请随时联系我。我期待与您一起探索计算机科学的无限可能!

Hello everyone, I am CS (Chen Shuai), an algorithm engineer specializing in machine vision, computer graphics, and deep learning. I am passionate about computer science and artificial intelligence, dedicated to applying advanced algorithms to solve real-world challenges.

Tech Stack

  • Programming Languages: C/C++, Python, Java, GLSL
  • Frameworks & Libraries: TensorFlow, PyTorch, OpenCV, OpenGL, Qt, MNN, NCNN, CMake, ImGui
  • Algorithm Domains: Computer Vision, Deep Learning, Generative Models, LLM
  • Development Tools: Visual Studio, Visual Studio Code, PyCharm, Jupyter Notebook, Android Studio, Git, Docker

Open Source Projects

Project Experience

  • XX Company: Algorithm Engineer (2022–Present)
    • Video Analysis & Processing for Video Transmission Monitor (Tech Lead) — Oct 2019–Present
      • Description: Real-time remote monitoring of video-transmission feeds on Android and iOS phones. Supports software and hardware decoding with both Full Range and Limited Range video, and performs high-performance real-time image processing, analysis, and rendering on the video stream.
        • High Performance & Low Power: Real-time 60 fps processing of NV12 and YUYV sources, with memory optimization, zero-copy texture rendering, offline rendering, and efficient stacking of multiple processing effects.
        • Full Features: Aspect-ratio scaling, 3D LUT, rule-of-thirds grid, false color, dual-threshold zebra stripes, sharpening, magnifier, focus peaking, waveform, histogram, vectorscope, ToF point cloud rendering, and more.
        • Cross-Platform: Deployed on Android and iOS; released demos for Windows and Linux.
      • Impact: 345,000 downloads on the App Store alone over the past year (Dec 2023–Dec 2024); no figures are available for Android.
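Two of the monitor features listed in this project reduce to a few lines of per-pixel arithmetic. The following is a minimal Python sketch, not the shipped GPU implementation: it expands 8-bit limited-range luma (16 to 235) to full range (0 to 255) and decides whether a pixel belongs to a zebra stripe. The function names, the percentage thresholds, and the diagonal stripe pattern are assumptions made for illustration.

```python
def limited_to_full_luma(y: int) -> int:
    """Expand 8-bit limited-range luma (16..235) to full range (0..255)."""
    scaled = round((y - 16) * 255 / 219)
    return max(0, min(255, scaled))  # clamp values outside the nominal range


def zebra_pixel(luma_percent: float, x: int, y: int,
                low: float = 70.0, high: float = 100.0, period: int = 8) -> bool:
    """Return True when pixel (x, y) should be drawn as part of a zebra stripe.

    Assumption: "dual-threshold" means the overlay is shown only where the
    luma lies between a lower and an upper threshold (both in percent).
    """
    in_band = low <= luma_percent <= high
    on_stripe = ((x + y) // (period // 2)) % 2 == 0  # alternating diagonal bands
    return in_band and on_stripe
```

In a real renderer the same two expressions would live in a fragment shader so that no pixel data has to leave the GPU.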
    • Live Streaming Beauty Filter (Project Lead) — Nov 2023–Present
      • Description: Real-time beautification for arbitrary live-streaming scenes on Android (Qualcomm 8250) and Windows (GTX 1050 Ti and above). Improves skin appearance, compensates for faces rendered slightly dark by the ISP, and raises stream quality. Optimizations covered converting AI outputs to zero-copy textures, OpenGL thread-state synchronization, pre-initialized AI model loading, and pipeline latency jitter; final rendering takes under 5 ms and streaming holds a stable 30 fps.
        • Stickers: Implemented image algorithms and rendering for sticker translation, rotation, scaling, and mirroring; triangulation and rendering of face stickers aligned with facial keypoints.
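        • Skin Segmentation: Used a lightweight skin matting model, refined the dataset for facial-feature regions across different skin tones, and built improved trimap and alpha datasets so that skin regions are identified accurately and beautification is applied only where intended. Quantized with SNPE 2.22 post-training quantization (PTQ) and ran on the SNPE DSP backend; running the segmentation model in parallel with the rest of the pipeline cut the end-to-end time per frame by 14 ms.
        • In-house Green Screen Keying: Efficient background removal with precise foreground/background separation. The pipeline converts RGB to YUV and measures each pixel's distance to the target green, applies erosion and Gaussian blur to the distance map to smooth edges and reduce noise, then composites the foreground using the resulting alpha channel for a natural transition.
        • Whitening LUT Module: Generated .cube and .3dl files for varying degrees of whitening and rendered them as a 3D LUT, using a PBO-based zero-copy path and an every-other-frame scheme. In post-processing, combined the AI skin segmentation result with a traditional skin detector (a Gaussian probability distribution) and Kalman filtering to improve the accuracy and stability of skin detection.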
        • Skin Smoothing: Implemented OpenGL-based smoothing to avoid harsh effects, ensuring natural skin texture. Solved gamma brightening issues and edge color bleeding using Gaussian and color distance weights.
        • Face Slimming/Eye Enlarging/Teeth Whitening (R&D): Implemented special effects to enhance personalization and engagement.
      • Impact: Officially launched in company camera products and live streaming platforms.
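The green screen keying steps described in this project (color conversion, distance to the key color, compositing) can be sketched per pixel as follows. This is an illustrative Python version under stated assumptions, not the production OpenGL code: the BT.601 full-range chroma coefficients, the `inner`/`outer` distance thresholds, and all function names are choices made here, and the erosion and Gaussian blur passes on the distance map are omitted for brevity.

```python
import math


def rgb_to_uv(r, g, b):
    """Chroma (U, V) of an RGB color in [0, 1], BT.601 full-range coefficients."""
    u = -0.168736 * r - 0.331264 * g + 0.5 * b
    v = 0.5 * r - 0.418688 * g - 0.081312 * b
    return u, v


def key_alpha(pixel, key=(0.0, 1.0, 0.0), inner=0.15, outer=0.30):
    """Alpha from chroma distance to the key color.

    Distances below `inner` are fully transparent, above `outer` fully
    opaque, with a linear ramp in between (hypothetical thresholds).
    """
    pu, pv = rgb_to_uv(*pixel)
    ku, kv = rgb_to_uv(*key)
    distance = math.hypot(pu - ku, pv - kv)
    t = (distance - inner) / (outer - inner)
    return max(0.0, min(1.0, t))


def composite(foreground, background, alpha):
    """Blend one foreground pixel over one background pixel."""
    return tuple(f * alpha + b * (1.0 - alpha)
                 for f, b in zip(foreground, background))
```

Measuring distance on chroma only, rather than on full RGB, makes the key less sensitive to uneven lighting on the screen, which is one common reason to do the comparison in YUV.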
    • Real-time Alpha Matting Model & Deployment (Project Lead) — Nov 2023–Present
      • Description: Fast and stable alpha matting for human skin regions in single or multi-person (up to 5) environments.
        • Model Optimization: DeepLabV3+ with a MobileNetV2 backbone. Changed the model input from 3×480×480 to 4×(3×320×320), stacking along the channel dimension instead of simply lowering the resolution, which reduced latency by about 6 ms. Replaced resize operators with deconvolution to meet the high-performance, low-latency needs of alpha matting.
        • Inter-frame Stability: Applied Kalman filtering on face detection regions.
        • Deployment: On mobile, SNPE post-training quantization (PTQ) running on the DSP; on PC, NCNN. Pre-processing uses OpenCL, and inference takes 11 ms with accuracy maintained.
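Inter-frame stabilization with a Kalman filter, as mentioned for the face detection regions, can be illustrated with a scalar filter applied to each box coordinate. This is a minimal sketch assuming a constant-value motion model and hand-picked noise variances; the class name, the `smooth_boxes` helper, and the `(x, y, w, h)` box layout are hypothetical and not taken from the project.

```python
class ScalarKalman:
    """Constant-value Kalman filter for one coordinate (e.g. a box center x)."""

    def __init__(self, process_var=1e-3, measurement_var=1.0):
        self.q = process_var        # how much the true value may drift per frame
        self.r = measurement_var    # how noisy each detection is
        self.x = None               # current estimate
        self.p = measurement_var    # variance of the current estimate

    def update(self, z):
        if self.x is None:          # first frame: trust the measurement
            self.x = float(z)
            return self.x
        self.p += self.q                        # predict step
        gain = self.p / (self.p + self.r)       # Kalman gain
        self.x += gain * (z - self.x)           # correct with the new detection
        self.p *= (1.0 - gain)
        return self.x


def smooth_boxes(boxes, **kwargs):
    """Filter a sequence of (x, y, w, h) boxes, one filter per coordinate."""
    filters = [ScalarKalman(**kwargs) for _ in range(4)]
    return [tuple(f.update(v) for f, v in zip(filters, box)) for box in boxes]
```

Raising `process_var` makes the output follow fast motion more closely at the cost of more jitter; lowering it does the opposite.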
    • AI Bokeh for Live Streaming (Tech Lead) — Nov 2023–Present
      • Description: An AI depth estimation model drives DSLR-style bokeh rendering, imitating the shallow depth of field of a wide-aperture lens.
        • Depth Estimation: Implemented DPT V2 monocular depth estimation (DataLoader & training code). Provided lightweight PyDNet2 model, converted to MNN/NCNN, with C++ streaming code.
        • Focus Distance Stability: Kalman filtering on face detection regions.
        • Depth Map Stability: Frame-based Kalman filtering for stable bokeh simulation.
        • DSLR Rendering: Combined AI depth maps, portrait matting masks, and DP inputs to improve bokeh accuracy and quality. Simulated the bokeh of different lens f-numbers and focal lengths in OpenGL, with results that look more realistic than other mainstream competing products.
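How a depth value, an f-number, and a focal length turn into a blur size can be shown with the standard thin-lens circle-of-confusion formula. The sketch below is an illustration only; the source does not say which lens model the renderer uses, and the function names, the 36 mm sensor width, and the 1920-pixel frame width are assumptions.

```python
def circle_of_confusion_mm(subject_mm, focus_mm, focal_length_mm, f_number):
    """Thin-lens circle of confusion diameter on the sensor, in millimetres."""
    if focus_mm <= focal_length_mm:
        raise ValueError("focus distance must exceed the focal length")
    aperture_mm = focal_length_mm / f_number            # entrance pupil diameter
    magnification = focal_length_mm / (focus_mm - focal_length_mm)
    return aperture_mm * magnification * abs(subject_mm - focus_mm) / subject_mm


def blur_radius_px(coc_mm, sensor_width_mm=36.0, image_width_px=1920):
    """Convert a sensor-plane diameter to a blur radius in output pixels."""
    return 0.5 * coc_mm / sensor_width_mm * image_width_px
```

For a 50 mm lens at f/2 focused at 2 m, a point 4 m away gets a circle of confusion of about 0.32 mm, and a point exactly at the focus distance gets zero blur. A per-pixel depth map fed through these two functions yields a per-pixel blur radius for the bokeh pass.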
    • Video Denoising (Tech Lead) — Nov 2023–Present
      • Description: NLMeans-based video denoising to reduce luminance and chroma noise from sensors.
      • Performance: GPU-accelerated OpenGL implementation achieving 60fps denoising.
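The core of non-local means is a weighted average in which each weight depends on how similar two patches are. The sketch below shows that weighting on a 1-D signal in plain Python so the arithmetic is easy to follow; it is not the GPU implementation, and the function name, patch and search radii, and the filter strength `h` are illustrative assumptions.

```python
import math


def nl_means_1d(signal, patch_radius=1, search_radius=3, h=10.0):
    """Denoise a 1-D signal with non-local means.

    Each output sample is a weighted mean of nearby samples, weighted by
    exp(-d2 / h^2) where d2 is the mean squared difference between the
    patches centered on the two samples.
    """
    n = len(signal)

    def sample(i):
        return signal[min(max(i, 0), n - 1)]  # replicate the border values

    out = []
    for i in range(n):
        weight_sum = 0.0
        acc = 0.0
        for j in range(max(0, i - search_radius), min(n, i + search_radius + 1)):
            d2 = sum((sample(i + k) - sample(j + k)) ** 2
                     for k in range(-patch_radius, patch_radius + 1))
            d2 /= 2 * patch_radius + 1
            w = math.exp(-d2 / (h * h))
            weight_sum += w
            acc += w * signal[j]
        out.append(acc / weight_sum)
    return out
```

Because dissimilar patches receive near-zero weight, a sharp edge is left almost untouched while small fluctuations in flat areas are averaged away. The 2-D video case uses the same weights over image patches, which is what makes the algorithm expensive and a natural fit for GPU acceleration.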
    • Open-Vocabulary Multi-Label Image Classification System (Tech Lead) — Nov 2025–Apr 2026
      • Description: Zero-shot open-vocabulary multi-label classification system deployed in AI Smart Editing Products. Supports automated scene tag recognition for video materials, handling 5000+ categories without retraining.
        • Two-Stage Training Strategy: Stage 1 freezes CLIP backbone for knowledge distillation; Stage 2 introduces Prompt Tuning for joint optimization, significantly improving zero-shot generalization.
        • Backbone Adaptation: Successfully adapted DINOv3 and other backbones. Designed a projection layer to map 768-dim visual features to 512-dim semantic space for cross-modal alignment.
        • Multi-Objective Optimization: Combined Cross-Entropy, Contrastive, and Ranking losses, improving recall for rare classes by 15%.
        • Distributed Training & Optimization: Implemented single/multi-node multi-GPU parallel training with mixed precision, improving training efficiency by 3.5x.
        • Performance: Achieved zero-shot mAP of 41.2% on NUS-WIDE (surpassing baseline BiAM 37.6% and AAAI 2023 Oral paper performance). Training throughput increased by 2.9x, and model parameters reduced by 75.5%.
        • Deployment: Model quantized and deployed via ONNX in AI smart editing products.
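The reason an open-vocabulary classifier can accept new categories without retraining is that a label is just a text embedding compared against the projected image feature. The sketch below illustrates that scoring step with tiny hand-written vectors; it is an assumption-laden toy, not the deployed model. The function names, the cosine-similarity threshold, and the example labels are hypothetical, and real embeddings would come from the vision backbone and a text encoder.

```python
import math


def l2_normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def project(matrix, vector):
    """Linear projection layer: map visual features into the text space."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def predict_labels(visual_feature, projection, label_embeddings, threshold=0.5):
    """Return every label whose cosine similarity reaches the threshold."""
    image = l2_normalize(project(projection, visual_feature))
    scores = {}
    for name, embedding in label_embeddings.items():
        text = l2_normalize(embedding)
        scores[name] = sum(a * b for a, b in zip(image, text))
    return {name: s for name, s in scores.items() if s >= threshold}


# Toy usage: a 3-dim "visual" feature projected to a 2-dim "semantic" space.
labels = {"beach": [1.0, 0.0], "sunset": [0.0, 1.0], "indoor": [-1.0, 0.0]}
result = predict_labels([3.0, 4.0, 9.0], [[1, 0, 0], [0, 1, 0]], labels)
```

Adding a new category only means adding one more entry to `labels`; the projection and the image side are unchanged.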

Papers and Patents

  1. Radar Reflectivity and Meteorological Factors Merging‐Based Precipitation Estimation Neural Network[J]. Earth and Space Science, 2021, 8(10).
  2. Offline Single-Polarization Radar Quantitative Precipitation Estimation Based on a Spatiotemporal Deep Fusion Model[J]. Advances in Meteorology, 2021.
  3. A Cloud-Removal Method for Snow Product Based on Denoising Autoencoder Neural Network[J]. Journal of Nanjing University of Information Science and Technology (Natural Science Edition), 2023, 15(02).
  4. Cloud-Removal Algorithm and Application Research for Snow Product at Basin Scale[D]. 2022. DOI:10.27248/d.cnki.gnjqc.2022.001085.
  5. A Vehicle Vibration Noise Detection Alarm Device[P]. Jiangsu Province: CN202120495866.X, 2022-03-15.

Education

  • Nanjing University of Information Science and Technology: Master’s Degree (2019–2022)
  • Nanjing University of Information Science and Technology: Bachelor’s Degree (2015–2019)

Personal Blog

  • My Personal Blog: Sharing technical articles, project experiences, and industry insights.

Social Media

Contact Information

I enjoy exploring new technologies, solving practical problems, and exchanging ideas and collaborating with peers. If you are interested in my work or would like to collaborate, please feel free to contact me. I look forward to exploring the possibilities of computer science with you!