Hedwig Explains

[Paper notes] Fine Tuning Large Vision-language Models as Decision-Making Agents via Reinforcement Learning

vision llm agents for supercool future

Description This paper might have scratched the surface of future VisionAgent models. Maybe Rabbit Inc. might be able to do 1% of what they have promised :p The Problem Usually the LLMs are able...

Posted by Hitesh Kumar on August 10, 2024

[Code review] Dig into TransReID official code repo

vision transformer for reid

Description This blog post is to revive my blogging habit and post some bits of the TransReID code that I found interesting. TransReID is a popular paper focused on the concept of re-identification of ...

Posted by Hitesh Kumar on June 16, 2024

[Paper review] CLIP2Scene: Towards Label-efficient 3D Scene Understanding

Towards Label-efficient 3D Scene Understanding by CLIP

Description We have seen CLIP used in 2D scenes, for example in zero-/few-shot anomaly detection. In this paper, the scenes are oriented more toward a 3D setting. Focused on transferring 2D CLIP ...

Posted by Hitesh Kumar on May 9, 2024

[Paper review] WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation

Zero-/Few-Shot Anomaly Classification and Segmentation

Description - The idea of anomaly detection is well known throughout the industry and research community. Yet, it remains one of the toughest problems to solve. Particularly in the context of man...

Posted by Hitesh Kumar on April 10, 2024

[Paper review] UniDistill: A Universal Cross-Modality Knowledge Distillation Framework....

Knowledge Distillation on 3D object detection

Description - Knowledge distillation is a developing technique for bridging the information gap between bigger and smaller models. Smaller models essentially cater to industry-specific needs due...

Posted by Hitesh Kumar on March 29, 2024

Explained: Attention mechanism & Transformers

What actually is an LLM?

Introduction The years 2022 and 2023 will probably go down as the “inflection years”, on a similar magnitude to the rise of internet services. ChatGPT, Claude, Gemini based high quality LLM (large language mode...

Posted by Hitesh Kumar on March 27, 2024

How does a lidar work?

diving deep into lidar intricacies

Introduction If you have worked with lidar, whether in data gathering, data processing, or anything else, you might know why it is so widely used across different applications. But sometimes we ...

Posted by Hitesh Kumar on March 25, 2024

[Paper review] Single-image camera calibration with model-free distortion correction

Single image precise camera calibration

Description - Camera calibration is important in a variety of tasks, from image stitching to 3D-to-2D point projection. So having an accurate intrinsic matrix and distortion coefficients is crucial....

Posted by Hitesh Kumar on March 24, 2024

How to convert an ONNX model to TFLite?

Model conversion

Idea: Converting ONNX to TFLite is crucial in many ways, for instance when you want to ship models to an Android phone or an embedded device. Generally, you might not be TensorFlow inclined. You may use differen...

Posted by Hitesh Kumar on February 24, 2023

How to learn ALMOST any programming language?

Learning programming languages

Is it essential to learn as many programming languages as possible? - No. Of course not. But you should know how to approach learning a new programming language when the time comes (which I believe ...

Posted by Hitesh Kumar on February 16, 2023