Outline

  • Abstract
  • Keywords
  • 1. Introduction
  • 2. Convolutive Non-Negative Sparse Coding
  • 3. Overlap Detection
  • 4. Speaker Attribution
  • 5. Conclusions
  • References

Abstract

Overlapping speech is known to degrade speaker diarization performance, affecting both speaker clustering and segmentation. While previous work has made important advances in detecting overlapping speech intervals and in attributing them to the relevant speakers, the problem remains largely unsolved. This paper reports the first application of convolutive non-negative sparse coding (CNSC) to the overlap problem. CNSC aims to decompose a composite signal into its underlying contributory parts and is thus naturally suited to overlap detection and attribution. Experimental results on NIST RT data show that the CNSC approach gives results comparable to those of a state-of-the-art hidden Markov model (HMM) based overlap detector. In a practical diarization system, CNSC-based speaker attribution is shown to reduce the speaker error by over 40% relative in overlapping segments.

Conclusions

This paper reports an investigation into the use of convolutive non-negative matrix factorisation with sparse constraints (CNSC) for the detection and attribution of overlapping speech in the context of speaker diarization. The CNSC approach gives overlap detection results which are comparable to those of a state-of-the-art HMM overlap detection approach. It is also seen to perform well when attributing an overlapping speech interval to the contributing speakers. A limitation of the approach relates to the cross-projection of a speaker’s energy onto the bases of other speakers. This is to be expected since the bases are purely spectral representations and are thus not entirely decorrelated across speakers. The application of sparse constraints alleviates the problem to some extent by encouraging activations to be concentrated on a small number of bases. Further work is nevertheless required to optimise the number of bases, the convolution length and the sparseness constraints in order to reduce cross-projection. Our current work aims to integrate CNSC activations into an HMM overlap detection framework to exploit the benefit of duration modelling. Future work could include an analysis of different speaker bases to detect speakers with multiple models in a typical diarization system, and the full integration of CNSC into a regular speaker diarization framework. This should include a thorough study of the impact of overlap on speaker diarization.
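The attribution idea summarised above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a simplified, non-convolutive model (convolution length of one frame), a Euclidean cost with an L1 penalty on the activations, and fixed, pre-trained speaker bases. The names `sparse_activations`, `speaker_energy_share`, `base_owner` and the toy matrices are hypothetical and chosen only for illustration.

```python
import numpy as np

def sparse_activations(V, W, sparsity=0.1, n_iter=200, eps=1e-9):
    """Estimate non-negative activations H with V ~= W @ H, keeping W fixed.

    Multiplicative updates for 0.5 * ||V - W H||^2 + sparsity * sum(H).
    Assumption: plain (non-convolutive) sparse coding, not full CNSC.
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        # The sparsity term in the denominator shrinks small activations to zero.
        H *= WtV / (WtW @ H + sparsity + eps)
    return H

def speaker_energy_share(W, H, base_owner, n_speakers):
    """Fraction of reconstructed energy explained by each speaker's bases, per frame."""
    energy = np.zeros((n_speakers, H.shape[1]))
    for s in range(n_speakers):
        idx = base_owner == s
        energy[s] = (W[:, idx] @ H[idx]).sum(axis=0)
    return energy / (energy.sum(axis=0, keepdims=True) + 1e-12)

# Toy spectral bases: columns 0-1 belong to speaker 0, columns 2-3 to speaker 1.
W = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
])
base_owner = np.array([0, 0, 1, 1])

# Three frames: speaker 0 alone, speaker 1 alone, both speakers overlapping.
frame_a = 2.0 * W[:, 0] + 1.0 * W[:, 1]
frame_b = 1.0 * W[:, 2] + 3.0 * W[:, 3]
V = np.stack([frame_a, frame_b, frame_a + frame_b], axis=1)

H = sparse_activations(V, W)
share = speaker_energy_share(W, H, base_owner, n_speakers=2)

# A frame is attributed to every speaker whose share exceeds a chosen threshold.
attributed = share > 0.3
```

In this toy case the bases of the two speakers do not overlap in frequency, so no cross-projection occurs. With real spectral bases, which are correlated across speakers, some of one speaker's energy would leak onto the other's bases, which is the limitation the conclusions describe and which the sparsity penalty only partly mitigates.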
