Computer Vision - ECCV 2024

18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part X

Roth, Stefan; Varol, Gül; Sattler, Torsten; Leonardis, Aleš; Russakovsky, Olga; Ricci, Elisa

Springer International Publishing AG

12/2024

490

Softcover

9783031726835

Pre-release: ships 15 to 20 days after publication

Description not available.
Modeling and Driving Human Body Soundfields through Acoustic Primitives
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
Label-anticipated Event Disentanglement for Audio-Visual Video Parsing
High-Fidelity 3D Textured Shapes Generation by Sparse Encoding and Adversarial Decoding
Semi-Supervised Video Desnowing Network via Temporal Decoupling Experts and Distribution-Driven Contrastive Regularization
I-MedSAM: Implicit Medical Image Segmentation with Segment Anything
ReMamber: Referring Image Segmentation with Mamba Twister
TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
Segmentation-guided Layer-wise Image Vectorization with Gradient Fills
Implicit Style-Content Separation using B-LoRA
OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
ActionVOS: Actions as Prompts for Video Object Segmentation
FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
U-COPE: Taking a Further Step to Universal 9D Category-level Object Pose Estimation
Integrating Markov Blanket Discovery into Causal Representation Learning for Domain Generalization
Rotary Position Embedding for Vision Transformer
Local All-Pair Correspondence for Point Tracking
MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection
ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments
S³D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis
ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos
Hierarchically Structured Neural Bones for Reconstructing Animatable Objects from Casual Videos
PQ-SAM: Post-training Quantization for Segment Anything Model
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Optimizing Factorized Encoder Models: Time and Memory Reduction for Scalable and Efficient Action Recognition
DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment
artificial intelligence;computer networks;computer systems;computer vision;education;Human-Computer Interaction (HCI);image analysis;image coding;image processing;image reconstruction;image segmentation;learning;machine learning;object recognition;pattern recognition;reconstruction;signal processing;software engineering