We present CrissCross, a self-supervised framework for learning audio-visual representations. Our framework introduces a novel notion: in addition to learning intra-modal and standard synchronous cross-modal relations, CrissCross also learns asynchronous cross-modal relations. We show that by relaxing the temporal synchronicity between the audio and visual modalities, the network learns strong time-invariant representations. Our experiments show that strong augmentations for both modalities, combined with relaxed cross-modal temporal synchronicity, optimize performance. To pretrain our framework, we use three datasets of varying sizes: Kinetics-Sound, Kinetics-400, and AudioSet. The learned representations are evaluated on a number of downstream tasks, namely action recognition, sound classification, and retrieval. CrissCross achieves state-of-the-art performance on action recognition (UCF101 and HMDB51) and sound classification (ESC50). The code and pretrained models are publicly available.
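The three relation types above can be illustrated with a toy objective. This is a minimal numpy sketch, not the paper's actual loss or architecture: it only shows how embeddings of two modalities at two timestamps yield an intra-modal term, a synchronous cross-modal term, and an asynchronous cross-modal term (the relaxed-synchronicity idea).

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two embedding vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def crisscross_style_loss(v_t1, v_t2, a_t1, a_t2):
    """Toy three-term agreement objective (an illustration, not the
    published loss). Inputs are visual (v) and audio (a) embeddings
    taken at two timestamps t1 and t2 of the same clip."""
    intra = cosine(v_t1, v_t2)       # intra-modal relation
    sync = cosine(v_t1, a_t1)        # synchronous cross-modal relation
    asyn = cosine(v_t1, a_t2)        # asynchronous cross-modal relation
    # maximizing agreement = minimizing the negative mean similarity
    return -(intra + sync + asyn) / 3.0
```

In a real contrastive setup each term would also be normalized against negatives drawn from other clips; the sketch keeps only the positive-pair structure.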

read more:
[arXiv] [project website]

The first row shows the generated ECG corresponding to the input PPG (second row). The last row shows the original ECG.


We propose CardioGAN, a novel framework for generating ECG signals from PPG inputs. We use attention-based generators and dual time- and frequency-domain discriminators on a CycleGAN backbone to obtain realistic ECG signals. To the best of our knowledge, no prior work has attempted to generate ECG from PPG (or, in fact, any cross-modality signal-to-signal translation in the biosignal domain) using GANs or other deep learning techniques. Moreover, CardioGAN makes it possible to monitor daily-life cardiac activity in a continuous manner.
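The dual-discriminator idea can be sketched in a few lines. This is a hypothetical numpy illustration, not CardioGAN's actual networks: a "time-domain" critic scores the raw waveform, a "frequency-domain" critic scores its FFT magnitude spectrum, and the two realism scores are combined, so the generator is pushed to match both the waveform shape and its spectral content.

```python
import numpy as np

def sigmoid(z):
    return float(1.0 / (1.0 + np.exp(-z)))

def time_disc(x, w_t):
    # toy time-domain discriminator: linear score on the raw waveform
    return sigmoid(x @ w_t)

def freq_disc(x, w_f):
    # toy frequency-domain discriminator: linear score on the
    # magnitude spectrum (rfft of length n gives n//2 + 1 bins)
    spec = np.abs(np.fft.rfft(x))
    return sigmoid(spec @ w_f)

def dual_domain_score(x, w_t, w_f):
    # combine both realism scores; a simple average for illustration
    return 0.5 * (time_disc(x, w_t) + freq_disc(x, w_f))
```

The weights `w_t` and `w_f` stand in for trained discriminator parameters; in the real framework both critics are deep networks.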

read more:
[AAAI] [project website]

Self-supervised ECG Representation Learning

We exploit a self-supervised deep multi-task learning framework for electrocardiogram (ECG)-based emotion recognition. To the best of our knowledge, this is the first time self-supervised learning has been utilized to perform emotion recognition using ECG.
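A common way to build such a self-supervised signal pipeline is transformation recognition: apply known transformations to the raw signal and train the network to identify which one was applied. The snippet below is a minimal sketch under that assumption; the transformation names and parameters here are illustrative, not the paper's exact pretext-task set.

```python
import numpy as np

def transform(sig, kind, rng):
    # a few illustrative signal transformations used as pretext labels
    if kind == "noise":
        return sig + rng.normal(scale=0.05, size=sig.shape)
    if kind == "scale":
        return sig * 1.5
    if kind == "flip":
        return sig[::-1].copy()  # temporal inversion
    return sig.copy()            # "original", left untouched

def make_pretext_batch(sig, rng):
    """Build a self-labeled batch: each row is a transformed copy of
    the input signal, and the label is the transformation's index."""
    kinds = ["original", "noise", "scale", "flip"]
    xs = np.stack([transform(sig, k, rng) for k in kinds])
    ys = np.arange(len(kinds))  # transformation-recognition targets
    return xs, ys
```

A network pretrained to predict these labels learns ECG features without emotion annotations; the multi-task aspect comes from predicting several such transformations jointly.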

read more:
[ICASSP] [IEEE-TAFC] [slides]
[official code (tf)] [pytorch-reimplementation]

Classification of Cognitive Load and Expertise for Adaptive Simulation

We propose an end-to-end framework for a trauma simulation that actively classifies a participant’s level of cognitive load and expertise for the development of a dynamically adaptive simulation.

read more:
[ACII] [Sensors] [slides]

Computer-Aided Diagnosis

This paper presents a deep learning method for computer-aided differential diagnosis of benign and malignant breast cancer tumors by avoiding potential errors caused by poor feature selection as well as class imbalances in the dataset.

read more:
[ICMLA] [poster]