Week 3!

This week felt a bit shorter, probably because it actually was shorter, haha! (No work on Memorial Day Monday.) We still got a decent amount of stuff done, though! For one, we wrapped up the background research portion of our project and presented it to our peers on Thursday. I learned a lot from our peers’ presentations and noticed some concepts that tie into our research.

The Sign Detection team noted a tradeoff between computation time and accuracy in sign detection systems, which is a prevalent issue in captioning services (what we’re working on) as well. I learned what WebRTC is about from the group that’s working on the UX/UI side of it. I had no idea WebRTC is so robust, capable of supporting more than one hundred clients without a loss in performance or a gain in latency :o! We will likely be using a WebRTC service (probably Google Meet) to simulate the ASR captioning portion of our study. Hopefully, we’ll figure out how to display only one speaker’s captions to the participants, though. Linda has an idea, something to do with a VB-Cable…I think? The Caption Metrics team mentioned that DHH people tend to prefer verbatim captions over edited ones, as verbatim captions more accurately capture what was said. That’s good to know, and it provides even more rationale for our study, which focuses on the accuracy of verbatim captions. The Caption UX/UI team brought up the disparity in health literacy between DHH and hearing individuals. This was my first time learning about health literacy, and wow! It’s wicked important. One more sector where caption quality and accessibility have serious implications!

This week, I familiarized myself with two of the software tools we’ll be using in our study. After poking around in ELAN and Praat for a day, I realized ELAN is more of an annotation application, while Praat is better geared toward acoustic measurements (although I think it has some annotation capabilities too). I ended up deciding to annotate in ELAN since it seems easier to organize and export annotations there. With Isabelle, I learned how to synchronize time segments of audio between ELAN and Praat, so we can drop measurements we make in Praat into our ELAN annotations (there’s a little sketch below of what scripting that kind of measurement could look like).

On Friday, I found a captioning Chrome extension, Otter.ai, that not only generates the captions for a Google Meet call but also produces a text transcript of the captions that you can download. This is HUGE, as it would eliminate the need to run a recording of the call with on-screen captions through optical character recognition software to produce a text transcript. We also wouldn’t have to separately run a recording of the audio through caption-generating software and then edit that output to produce a ground truth document. We can just save the text transcript from Otter.ai and then edit a second copy into the ground truth. We would then feed these two transcripts through SCLITE to find the word error rate (see the toy sketch below for what that metric means); hopefully, we’ll get access and learn how to do that this upcoming week. There are some downsides to the Otter.ai Chrome extension approach, but Isabelle and I will discuss them with Linda on Monday and see. I’m hopeful, and it looks like things are starting to come together! I hope we can get everything set up and run through a workflow, from beginning to end, by the end of next week! I’m excited, for sure.
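Since we’re doing these Praat measurements by hand in the GUI for now, here’s just a rough idea of what a scripted version could look like. This is a minimal sketch assuming the parselmouth Python library (a wrapper around Praat); the file name and segment times are made-up placeholders, not our actual study materials.

```python
# Minimal sketch: mean pitch over one time segment, via parselmouth
# (a Python wrapper around Praat). File name and times are hypothetical.
import parselmouth

snd = parselmouth.Sound("speaker_recording.wav")  # hypothetical recording
pitch = snd.to_pitch()                            # Praat's pitch analysis

# Average F0 over one segment, e.g. an interval annotated in ELAN
start, end = 1.25, 2.80                  # segment boundaries in seconds
times = pitch.xs()                       # time stamp of each analysis frame
f0 = pitch.selected_array["frequency"]   # F0 per frame in Hz; 0 = unvoiced
voiced = [f for t, f in zip(times, f0) if start <= t <= end and f > 0]
print(f"Mean F0 from {start}s to {end}s: {sum(voiced) / len(voiced):.1f} Hz")
```

A number like that could then be pasted into the matching ELAN annotation, since the two programs are synchronized on the same timeline.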
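And for context on the SCLITE step: word error rate (WER) is basically (substitutions + deletions + insertions) divided by the number of words in the reference transcript. SCLITE does the alignment and produces much more detailed reports, but here’s a toy sketch of the core computation (the two transcripts are made-up examples, not from our study):

```python
# Toy word error rate (WER) sketch: WER = (S + D + I) / reference length.
# SCLITE computes this (and much more) with a proper alignment report;
# the transcripts below are made-up examples.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words (word-level Levenshtein distance)
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                            # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                            # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,   # substitution or match
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    return d[len(ref)][len(hyp)] / len(ref)

ground_truth = "the patient should take two tablets daily"
asr_caption = "the patient should take to tablets"
print(f"WER: {word_error_rate(ground_truth, asr_caption):.2f}")
# 1 substitution ("two" -> "to") + 1 deletion ("daily") over 7 words ≈ 0.29
```

We’d rely on SCLITE for the real scoring, of course; this is just to show what’s going on under the hood.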

Last week, I mentioned that I was participating in Joey and Ramzy’s mock study this week. Unfortunately, Ramzy was having internet connection issues when we were scheduled to do it, so it didn’t happen. Joey and I did get a chance to chat for a while, though, and that was nice. We rescheduled the study for Monday, so I’m looking forward to that now!

Written on June 4, 2021