Computer Vision - ECCV 2022

17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXXVI

Avidan, Shai; Brostow, Gabriel; Farinella, Giovanni Maria; Hassner, Tal; Cisse, Moustapha

Springer International Publishing AG

10/2022

755

Paperback

English

9783031200588

15 to 20 days

Description not available.
Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing
Generative Negative Text Replay for Continual Vision-Language Pretraining
Video Graph Transformer for Video Question Answering
Trace Controlled Text to Image Generation
Video Question Answering with Iterative Video-Text Co-Tokenization
Rethinking Data Augmentation for Robust Visual Question Answering
Explicit Image Caption Editing
Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly
GRIT: Faster and Better Image Captioning Transformer Using Dual Visual Features
Selective Query-Guided Debiasing for Video Corpus Moment Retrieval
Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding
Object-Centric Unsupervised Image Captioning
Contrastive Vision-Language Pre-training with Limited Resources
Learning Linguistic Association towards Efficient Text-Video Retrieval
ASSISTER: Assistive Navigation via Conditional Instruction Generation
X-DETR: A Versatile Architecture for Instance-Wise Vision-Language Tasks
Learning Disentanglement with Decoupled Labels for Vision-Language Navigation
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input
Word-Level Fine-Grained Story Visualization
Unifying Event Detection and Captioning as Sequence Generation via Pre-training
Multimodal Transformer with Variable-Length Memory for Vision-and-Language Navigation
Fine-Grained Visual Entailment
Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds
New Datasets and Models for Contextual Reasoning in Visual Dialog
VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection
Classification-Regression for Chart Comprehension
AssistQ: Affordance-Centric Question-Driven Task Completion for Egocentric Assistant
FindIt: Generalized Localization with Natural Language Queries
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning
Speaker-Adaptive Lip Reading with User-Dependent Padding
TISE: Bag of Metrics for Text-to-Image Synthesis Evaluation
SemAug: Semantically Meaningful Image Augmentations for Object Detection through Language Grounding
Referring Object Manipulation of Natural Images with Conditional Classifier-Free Guidance
NewsStories: Illustrating Articles with Visual Summaries
Webly Supervised Concept Expansion for General Purpose Vision Models
FedVLN: Privacy-Preserving Federated Vision-and-Language Navigation
CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval
Language-Driven Artistic Style Transfer
Single-Stream Multi-level Alignment for Vision-Language Pretraining
This title belongs to the subject(s) listed below.
artificial intelligence;color image processing;computational linguistics;computer systems;computer vision;image analysis;image coding;image processing;image quality;image segmentation;information retrieval;machine learning;mathematics;natural languages;pattern recognition;signal processing