I'm a PhD student in the Music and Audio Research Lab (MARL) at New York University, working with Juan Bello. Before that, I was a Master's student at the Center for Data Science. I try to tackle problems in machine listening using methods from machine learning, deep learning, signal processing, and, more recently, biological auditory systems. Previously, I worked on large-scale data analysis on social networks, and before that on empirical asset pricing and asset allocation. Here's a collection of projects that I've worked on in the past couple of years:
Papers
Peter Li, Israel Malkin, Tian Wang, Kyunghyun Cho, and Juan Bello
In this paper, we propose a source separation model based on recurrent neural networks and a novel iterative subtraction architecture that allows us to train speaker-dependent and speaker-independent separators. We describe architectures and weight-sharing methods for estimating sources via masks and by predicting the spectrum directly. Our approach achieves a 5 dB - 7 dB SDR improvement over an NMF baseline in a closed speaker set evaluation. Further, we show that our proposed model is robust to additional broadband noise and mixing conditions not seen during model training. pdf
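To illustrate what "estimating sources via masks" means in general, here is a minimal sketch of soft ratio masking on a single magnitude spectrum frame. It is not the model from the paper: the function name `soft_mask_separate`, the toy numbers, and the assumption that per-source magnitude estimates are already available (for example, from some network) are all hypothetical.

```python
def soft_mask_separate(mixture_mag, est_a, est_b, eps=1e-8):
    """Split a mixture magnitude frame using ratio masks.

    est_a and est_b are (hypothetical) magnitude estimates for two sources.
    The mask for source A in each frequency bin is a / (a + b); source B
    receives the complementary mask, so the two outputs sum to the mixture.
    """
    source_a, source_b = [], []
    for mix, a, b in zip(mixture_mag, est_a, est_b):
        mask_a = a / (a + b + eps)  # eps avoids division by zero in silent bins
        source_a.append(mask_a * mix)
        source_b.append((1.0 - mask_a) * mix)
    return source_a, source_b


# Toy frame with three frequency bins (made-up values).
mixture = [1.0, 2.0, 4.0]
est_a = [3.0, 1.0, 0.0]
est_b = [1.0, 1.0, 2.0]
sep_a, sep_b = soft_mask_separate(mixture, est_a, est_b)
```

Because the two masks sum to one in every bin, the separated magnitudes always add back up to the mixture, which is one reason masking is often preferred over predicting each spectrum independently.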
Rachel M. Bittner, Brian McFee, Justin Salamon, Peter Li, Juan P. Bello
In 18th International Society for Music Information Retrieval Conference (ISMIR), Suzhou, China, Oct. 2017.
In this work, we describe a fully convolutional neural network for learning salience representations for estimating fundamental frequencies, trained using a large, semi-automatically generated f0 dataset. We demonstrate the effectiveness of our model for learning salience representations for both multi-f0 and melody tracking in polyphonic audio, and show that our models achieve state-of-the-art performance on several multi-f0 and melody datasets. pdf
Justin Salamon, Duncan MacConnell, Mark Cartwright, Peter Li, Juan Pablo Bello
In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, Oct. 2017.
Scaper is a library for soundscape synthesis and augmentation. Using Scaper, one can automatically synthesize soundscapes with corresponding ground truth annotations. It is useful for running controlled ML experiments (ASR, sound event detection, bioacoustic species recognition, etc.) and experiments to assess human annotation performance. It's also potentially useful for generating data for source separation experiments and for generating ambisonic soundscapes. pdf
Projects
Traditional methods for many music information retrieval tasks typically follow a two-step architecture: feature engineering followed by a simple learning algorithm. In these "shallow" architectures, feature engineering and learning are typically disjoint and unrelated. Additionally, feature engineering is difficult and typically depends on extensive domain expertise. In this report, we present an application of convolutional neural networks to the task of automatic musical instrument identification. In this model, feature extraction and learning algorithms are trained together in an end-to-end fashion. We show that a convolutional neural network trained on raw audio can achieve performance surpassing traditional methods that rely on hand-crafted features. pdf
Encoder-decoder models are a powerful class of models that let us learn mappings from variable-length input sequences to variable-length output sequences. In this report, we investigate the efficacy of encoder-decoder systems for the task of phoneme recognition. This was a project for the Natural Language Understanding with Distributed Representations course at NYU. pdf
In this report, we explore techniques for speech enhancement using matrix factorization. We focus on enhancing speech signals corrupted with environmental noise. We implement unsupervised and "semi-supervised" methods that do not rely on access to uncorrupted speech for model training. This was a project for the Optimization-based Data Analysis course at NYU. pdf
In this paper, we propose a method to infer demographic attributes of social media users. We present a graph-based model that uses social ties between users and leverages homophily by spreading age labels over the @mention network.
This is a project that I worked on during a summer internship at HRL Laboratories. It was presented at WIN 2015. pdf
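The label-spreading idea in the project above can be sketched as repeated neighbor averaging on a graph. This is a generic illustration under stated assumptions, not the algorithm from the paper: the function `spread_labels`, the user names, the toy edge list, and the choice to clamp known ages are all hypothetical.

```python
from collections import defaultdict


def spread_labels(edges, seed_ages, n_iters=20):
    """Estimate ages for unlabeled users by averaging over their neighbors.

    edges: iterable of (user, user) pairs, treated as undirected mentions.
    seed_ages: dict of users whose ages are known; these stay fixed.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    estimates = dict(seed_ages)
    for _ in range(n_iters):
        updated = dict(estimates)
        for user in neighbors:
            if user in seed_ages:
                continue  # known labels are clamped
            known = [estimates[n] for n in neighbors[user] if n in estimates]
            if known:
                # Homophily assumption: a user resembles their neighbors.
                updated[user] = sum(known) / len(known)
        estimates = updated
    return estimates


# Toy mention chain: ann - bob - cat - dan, with two known ages.
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "dan")]
seed_ages = {"ann": 20.0, "dan": 50.0}
ages = spread_labels(edges, seed_ages)
```

On this toy chain the estimates for the two unlabeled users settle between the known ages of their neighbors, with each user pulled toward the labeled user they are closer to.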