NIME Diversity Committee Officers
Juan Pablo Martinez Avila
PhD Student
Mixed Reality Lab
School of Computer Science
University of Nottingham
Nottingham, UK
Research Fellow in Machine Learning for Audio Captioning
Applications are invited for a Research Fellow (RF) position for 22 months within the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, UK, to work on a project titled "Automated Captioning of Image and Audio for Visually and Hearing Impaired", which is a collaborative project between the University of Surrey and the Izmir Katip Celebi University (IKCU), Turkey, with project partners from charities and industrial sectors working with the hearing and visually impaired. This project aims to address fundamental challenges in audio and image captioning, develop new algorithms that improve captioning performance, and build application tools that the hearing and visually impaired can use to access audio and image content.
The work at Surrey will focus on new methods and algorithms for automated audio captioning and natural language description of audio. This work builds on the significant contributions of CVSSP in the areas of acoustic scene analysis, audio event detection, environmental sound recognition, and audio tagging, together with preliminary results on audio captioning. This new project offers an opportunity to take this work to the next stage and demonstrate the benefit of such technologies for the hearing and visually impaired. A smartphone-based prototype for audio and visual captioning will be developed jointly by Surrey and IKCU. New data will also be gathered, including audio-visual data for captioning and user feedback on the prototype system.
The postholder will be responsible for investigating and developing audio signal processing and machine learning algorithms for natural language description of sound, and for implementing software to prototype the concept and algorithms. The postholder should have doctoral-level (or equivalent) research and development experience in electronic engineering, applied mathematics, computer science, artificial intelligence, machine learning, natural language processing, or related subjects. The postholder should ideally have experience in at least one of the following areas: audio captioning, machine description of audio, audio classification, audio tagging, image captioning, video captioning, translation between audio/image and text, and/or translation between audio and video.
The postholder will be based in CVSSP and will work under the direction of the Principal Investigator, Prof Wenwu Wang, with co-supervision by Prof Sabine Braun, Director of the Centre for Translation Studies at the University of Surrey, and in collaboration with Dr Volkan Kilic from IKCU, Turkey.
CVSSP is an International Centre of Excellence for research in Audio-Visual Machine Perception, with over 150 researchers, a grant portfolio of £24M (£17.5M EPSRC) from EPSRC, EU, InnovateUK, charity and industry, and a turnover of £7M/annum. The Centre has state-of-the-art acoustic capture and analysis facilities and a Visual Media Lab with video and audio capture facilities supporting research in real-time video and audio processing and visualisation. CVSSP has a compute facility with 120 GPUs and >1PB of high-speed secure storage.
For informal inquiries, please contact Prof Wenwu Wang (Email: w.wang@surrey.ac.uk; Web: http://personal.ee.surrey.ac.uk/Personal/W.Wang/).
Please apply online using the following link:
https://jobs.surrey.ac.uk/vacancy.aspx?ref=015821
Apologies for cross-posting.
Attached below is an advertisement for four research posts available at CVSSP, Surrey, in the area of audio-visual AI. Please feel free to share with those who might be interested. Many thanks.
Best wishes,
Wenwu
Join a new research partnership with the BBC at the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey
Four research posts available in Audio-Visual AI, Computer Vision and Audio:
Research Fellow B https://jobs.surrey.ac.uk/Vacancy.aspx?ref=015321
Research Fellow A https://jobs.surrey.ac.uk/vacancy.aspx?ref=015121
2x Research Software Engineer/Research Assistant https://jobs.surrey.ac.uk/vacancy.aspx?ref=015621
Exciting opportunity for outstanding researchers in Computer Vision, Audio and Audio-Visual AI to join CVSSP as part of a major new five-year research partnership with the BBC to realise Future Personalised Media Experiences.
The goal of the research partnership is to realise future personalised content creation and delivery at scale for the public at home or on the move. CVSSP research will address the key challenges for personalised content creation and rendering by advancing computer vision and audio-visual AI to transform captured 2D video to object-based media. Research will advance automatic online understanding, reconstruction and neural rendering of complex dynamic real-world scenes and events. This will enable a new generation of personalised media content which adapts to user requirements and interests. The new partnership with the BBC and creative industry partners will position the UK to lead future personalised media experiences.
The Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey is ranked first in the UK for computer vision. The centre leads ground-breaking research in audio-visual AI and machine perception for the benefit of people and society through technological innovations in healthcare, security, entertainment, robotics and communications. Over the past two decades CVSSP has pioneered advances in 3D and 4D computer vision, spatial audio and audio-visual AI which have enabled award-winning technologies for content production in TV, film, games and immersive entertainment.
BBC R&D (bbc.co.uk/rd) has a worldwide reputation for developments in media technology going back over 90 years and has worked closely with CVSSP for over 20 years. It has pioneered the development of object-based media, working closely with programme-makers and technology teams across the BBC. Recent work has included object-based audio delivery across multiple synchronised devices for sports and drama, and AI for recognising wildlife for natural history.
This is an opportunity for outstanding researchers to join a world-renowned research centre at the start of a major new five-year research partnership.
Research Fellow B
The Research Fellow B will be an experienced researcher with an excellent track-record of publication in leading academic forums and post-doctoral research leadership. The successful candidate will take an active role in leading the research programme, contributing novel machine learning approaches to real-world dynamic scene understanding and reconstruction from video, and co-supervision of post-doctoral and PhD researchers.
Research Fellow A
The Research Fellow A will hold a PhD in computer vision, audio and/or audio-visual AI with a track-record of publication in leading academic forums. The successful candidate will contribute novel machine learning approaches advancing audio-visual AI to transform video of real-world scenes to object-based representation and neural rendering. The post-holder will collaborate with the team and project partners to realise personalised media experiences.
2x Research Software Engineer/Research Assistant
The Research Software Engineer/Research Assistant will have experience of research and software development in computer vision, audio and machine learning. The post holder will support the research programme, contributing to research and technologies which enable the transformation of video to audio-visual objects, production of personalised media experiences and object-based audio-visual rendering.
All posts are at the core of a research team working together with the BBC, University and industry partners to realise personalised object-based media experiences at scale for offline content and live events. These posts will enable individuals to advance knowledge in computer vision, audio and machine learning and raise their own academic and research profile by joining Europe's largest research centre in this field. All posts will initially be offered for a fixed term, which is extendable for the five-year duration of the partnership.
--
Professor Wenwu Wang
Centre for Vision Speech and Signal Processing
Department of Electronic Engineering
University of Surrey
Guildford GU2 7XH
United Kingdom
Phone: +44 (0) 1483 686039
Fax: +44 (0) 1483 686031
Email: w.wang@surrey.ac.uk
http://personal.ee.surrey.ac.uk/Personal/W.Wang/