Peter Li

View My GitHub Profile

Welcome

I'm a PhD student in the Music and Audio Research Lab (MARL) at New York University, working with Juan Bello. Before that, I was a master's student at the Center for Data Science. I tackle problems in machine listening using methods from machine learning, deep learning, signal processing, and, more recently, biological auditory systems. Previously, I worked on large-scale data analysis on social networks, and before that on empirical asset pricing and asset allocation. Here's a collection of projects I've worked on over the past couple of years:

Audio and Machine Learning

Papers

An RNN Model for Single Channel Source Separation with Iterative Subtraction [submitted to ICASSP 2018]

Peter Li, Israel Malkin, Tian Wang, Kyunghyun Cho, and Juan Bello

In this paper, we propose a source separation model based on recurrent neural networks and a novel iterative subtraction architecture that allows us to train speaker-dependent and speaker-independent separators. We describe architectures and weight-sharing methods for estimating sources via masks or directly as spectra. Our approach achieves a 5-7 dB SDR improvement over an NMF baseline in a closed speaker set evaluation. Further, we show that our proposed model is robust to additional broadband noise and to mixing conditions not seen during model training. pdf
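The core iterative-subtraction idea can be sketched independently of any particular network: estimate the dominant source with a mask, subtract it from the mixture, and repeat on the residual. A minimal numpy illustration on magnitude spectrograms (the `estimate_mask` function here is a hypothetical stand-in for the paper's trained RNN):

```python
import numpy as np

def iterative_subtraction(mix_mag, estimate_mask, n_sources):
    """Separate n_sources from a mixture magnitude spectrogram by
    repeatedly masking out the dominant source and recursing on the
    residual. `estimate_mask` stands in for the learned model."""
    residual = mix_mag.copy()
    sources = []
    for _ in range(n_sources):
        mask = estimate_mask(residual)   # values in [0, 1]
        source = mask * residual         # estimate of the dominant source
        residual = residual - source     # subtract, continue on what's left
        sources.append(source)
    return sources

# Toy demo: two constant "sources" and an oracle-style mask.
s1 = np.full((4, 8), 3.0)
s2 = np.full((4, 8), 1.0)
mix = s1 + s2
oracle = lambda r: np.clip(s1 / np.maximum(r, 1e-8), 0.0, 1.0)
est = iterative_subtraction(mix, oracle, 2)
```

With the oracle mask the loop peels off s1 first, then recovers s2 from the residual; in the paper the mask comes from the RNN rather than an oracle.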

Deep Salience Representations for f0 Estimation in Polyphonic Music

Rachel M. Bittner, Brian McFee, Justin Salamon, Peter Li, Juan P. Bello
In 18th International Society for Music Information Retrieval Conference (ISMIR), Suzhou, China, Oct. 2017.

In this work, we describe a fully convolutional neural network for learning salience representations for estimating fundamental frequencies, trained using a large, semi-automatically generated f0 dataset. We demonstrate the effectiveness of our model for learning salience representations for both multi-f0 and melody tracking in polyphonic audio, and show that our models achieve state-of-the-art performance on several multi-f0 and melody datasets. pdf
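A salience representation scores each (frequency, time) bin by how prominent that frequency is, and a simple decoder can then read an f0 contour off it. A minimal sketch of thresholded peak picking over a salience map (the frequency grid, values, and threshold are illustrative, not the paper's):

```python
import numpy as np

def decode_f0(salience, freqs, threshold=0.5):
    """Pick one f0 per frame from a salience map (freq_bins x frames):
    take the most salient bin per frame, but report 0.0 (unvoiced)
    when the peak falls below the threshold."""
    peak_bins = salience.argmax(axis=0)
    peak_vals = salience.max(axis=0)
    f0 = freqs[peak_bins]                 # fancy indexing copies
    f0[peak_vals < threshold] = 0.0       # mark low-salience frames unvoiced
    return f0

freqs = np.array([110.0, 220.0, 440.0, 880.0])
salience = np.array([[0.1, 0.9, 0.2],
                     [0.2, 0.1, 0.3],
                     [0.8, 0.2, 0.1],
                     [0.1, 0.1, 0.2]])
f0 = decode_f0(salience, freqs)           # one value per frame
```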

Scaper: A Library for Soundscape Synthesis and Augmentation

Justin Salamon, Duncan MacConnell, Mark Cartwright, Peter Li, Juan Pablo Bello
In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, Oct. 2017.

Scaper is a library for soundscape synthesis and augmentation. Using scaper, one can automatically synthesize soundscapes with corresponding ground truth annotations. It is useful for running controlled ML experiments (ASR, sound event detection, bioacoustic species recognition, etc.) and experiments to assess human annotation performance. It's also potentially useful for generating data for source separation experiments and for generating ambisonic soundscapes. pdf

Projects

End-to-end Source Identification Using Convolutional Neural Networks

Traditional methods for many music information retrieval tasks follow a two-step architecture: feature engineering followed by a simple learning algorithm. In these "shallow" architectures, feature engineering and learning are disjoint and unrelated. Additionally, feature engineering is difficult and typically depends on extensive domain expertise. In this report, we present an application of convolutional neural networks to automatic musical instrument identification. In this model, feature extraction and the learning algorithm are trained together in an end-to-end fashion. We show that a convolutional neural network trained on raw audio can achieve performance surpassing traditional methods that rely on hand-crafted features. pdf
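The end-to-end idea is that the first layer convolves learned filters directly with the raw waveform, replacing hand-crafted features. A minimal numpy sketch of that front end, one conv layer with rectification and mean pooling (the filters are random here; in the trained model they would be learned):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_features(waveform, filters, pool=64):
    """First stage of an end-to-end model: convolve raw audio with a
    bank of (here random, in practice learned) filters, rectify, and
    mean-pool, producing features with no hand engineering."""
    feats = []
    for w in filters:
        act = np.convolve(waveform, w, mode="valid")
        act = np.maximum(act, 0.0)                   # ReLU rectification
        n = len(act) // pool * pool                  # trim to a multiple of pool
        pooled = act[:n].reshape(-1, pool).mean(axis=1)
        feats.append(pooled)
    return np.stack(feats)                           # (n_filters, n_frames)

audio = rng.standard_normal(2048)        # stand-in for a raw waveform
filters = rng.standard_normal((8, 32))   # 8 filters of length 32
features = conv1d_features(audio, filters)
```

A classifier (the remaining layers of the network) would then be trained jointly on top of these features.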

Automatic Speech Recognition Using Recurrent Neural Network Encoder-Decoder Models with Attention

Encoder-decoder models are a powerful class of models that learn mappings from variable-length input sequences to variable-length output sequences. In this report, we investigate the efficacy of encoder-decoder systems for the task of phoneme recognition. This was a project for the Natural Language Understanding with Distributed Representations course at NYU. pdf
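Attention lets the decoder look back over all encoder states at each output step instead of compressing the input into a single fixed vector. A minimal numpy sketch of one dot-product attention step (dimensions and states are illustrative, not the report's model):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(decoder_state, encoder_states):
    """One attention step: score each encoder state against the current
    decoder state, normalize the scores, and return the weighted sum
    (the context vector) along with the attention weights."""
    scores = encoder_states @ decoder_state    # (T,) similarity scores
    weights = softmax(scores)                  # nonnegative, sum to 1
    context = weights @ encoder_states         # (d,) context vector
    return context, weights

T, d = 5, 4
rng = np.random.default_rng(1)
enc = rng.standard_normal((T, d))   # encoder states, one per input frame
dec = rng.standard_normal(d)        # current decoder state
context, weights = attend(dec, enc)
```

The context vector is then fed into the decoder to predict the next output symbol (here, a phoneme).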

Speech Enhancement with Matrix Factorization

In this report, we explore techniques for speech enhancement using matrix factorization. We focus on enhancing speech signals corrupted with environmental noise. We implement unsupervised and "semi-supervised" methods that do not rely on access to uncorrupted speech for model training. This was a project for the Optimization-based Data Analysis course at NYU. pdf
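The semi-supervised setup can be sketched as follows: learn a noise dictionary from noise-only frames, then at enhancement time hold it fixed while learning speech dictionary columns and all activations, and reconstruct the speech component. A toy numpy version with Euclidean-cost multiplicative updates (dimensions, ranks, and data are illustrative, not the report's):

```python
import numpy as np

def nmf(V, W, H, n_iter=100, update_W=None):
    """Multiplicative-update NMF for V ~ W @ H (Euclidean cost).
    `update_W` selects which dictionary columns may change; keeping
    some columns fixed implements the semi-supervised noise model."""
    eps = 1e-9
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        num = V @ H.T
        den = W @ H @ H.T + eps
        if update_W is None:
            W *= num / den
        else:
            W[:, update_W] *= (num / den)[:, update_W]
    return W, H

rng = np.random.default_rng(0)
noise = rng.random((16, 20))                      # noise-only spectrogram
Wn, _ = nmf(noise, rng.random((16, 2)), rng.random((2, 20)))

mix = rng.random((16, 30))                        # noisy-speech spectrogram
k_speech = 3
W = np.hstack([Wn, rng.random((16, k_speech))])   # noise columns stay fixed
H = rng.random((W.shape[1], 30))
W, H = nmf(mix, W, H, update_W=slice(2, None))
speech_est = W[:, 2:] @ H[2:]                     # enhanced speech component
```

Because only the speech columns of W are updated, the model never needs clean speech: the noise dictionary absorbs the noise energy and whatever remains is attributed to speech.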

Social Networks

Inferring Demographic Attributes of Social Media Users Using Label Propagation

In this paper, we propose a method for inferring demographic attributes of social media users from their social ties. The graph-based algorithm leverages homophily by propagating age labels over the @mention network.
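The homophily assumption is that connected users tend to be similar, so known labels can be spread along edges. A minimal numpy sketch of clamped label propagation on a toy @mention graph (the graph, ages, and update rule are illustrative, not the paper's exact algorithm):

```python
import numpy as np

def propagate_labels(adj, labels, n_iter=50):
    """Spread known labels over a graph: each node repeatedly takes the
    (row-normalized) average label of its neighbors, while seed nodes
    are clamped back to their known values after every pass."""
    seeds = ~np.isnan(labels)
    f = np.where(seeds, labels, 0.0)
    deg = adj.sum(axis=1, keepdims=True)
    P = adj / np.maximum(deg, 1)          # row-stochastic transition matrix
    for _ in range(n_iter):
        f = P @ f                         # average over neighbors
        f[seeds] = labels[seeds]          # clamp the known ages
    return f

# Toy @mention chain 0-1-2-3: nodes 0 and 3 have known ages.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
ages = np.array([20.0, np.nan, np.nan, 40.0])
inferred = propagate_labels(adj, ages)
```

On this chain the unknown nodes converge to ages interpolated between the two seeds, which is exactly the homophily intuition at work.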

This is a project that I worked on during a summer internship at HRL Laboratories. It was presented at WIN 2015. pdf