Play Video

Ben:

Hi, and welcome to the Bettercast podcast. In today’s video, we’re talking to Tim Kerbavaz of Talon Entertainment. We’re going to be talking about live captioning, live streaming, accessibility for websites, and accessibility for events and hybrid events. Basically, this man has done so many things, and we’re going to touch on a bit of everything. So please enjoy tonight’s video.

 

Tim:

 

We talk about hybrid events and virtual events, and a lot of people think of these as a really new thing, as something that has never happened before. When I look at what we’re doing, I say: this is television. We’re making television. It’s a broadcast. One of the things that happened a lot in the old in-room space is that there was this very, very hard line, often literally the loading dock door, between the in-room AV team and the broadcast team. They were often separate teams that met with two SDI cables, and that’s all; there was really no communication, no integration. So to me, what the word hybrid event means is not necessarily that the tools and technologies are new, it’s that we’re integrating those teams into the process earlier on, so that the broadcast is no longer an afterthought.

 

Tim:

 

It’s really part of the planning process up front, and we’re looking at how we can make sure the broadcast experience, the online audience experience, matches that of the in-room, and how we make sure they’re getting that interactivity. These are not necessarily new things. There’s been chat, there’s been polling. But what are we doing to actually build that in up front and bring those teams in earlier, so that your broadcast is not a bolt-on, it’s an integral part of the project?

 

Ben:

Absolutely. You brought up a good point, and I want to ask you this. Do you think it is a prerequisite that AV teams have to replicate the conference experience, or is it now on them to create something new and engaging that isn’t just a replication?

 

Tim:

 

So I will say you cannot replicate the conference experience online. It is impossible, particularly when you’re talking about things like an expo hall. Honestly, there is not a single virtual expo product that is any good. [crosstalk 00:02:36] Everything that I’ve seen of the expo experience, the trade show hall experience, is either this silly VR thing where we’re walking around a virtual hall looking at avatars, or it’s a webpage with a bunch of links to watch videos or join conference calls. I don’t think that’s really replicating it. What’s important to me about an expo hall is the organic discovery. The fact that you literally build inconvenience into your event and force people to walk from one end of hall A all the way to hall Z-

 

Ben:

The IKEA plan

 

Tim:

 

Right. To get you past everyone, so that you have these islands, these flagship booths, and you make everybody walk past the low-rent ones in between, and that creates an organic discovery. And as an attendee, as much as I resent that one breakout is over here and another breakout is over there and I have to hike, I really love that organic discovery. So what I want is to be able to wander, have something catch my eye, and have a conversation. Jumping into a scheduled video call with a vendor is really not that level of organic discovery.

 

Ben:

 

Yeah. Do you think that there is… I have a thought that for any exhibition held online, there should be a small booth where you spend the day interviewing and talking to exhibitors, a small breakout stage that kind of captures that: I’ll just tune in between sessions and look around. That, I think, is a close replication of that magical experience, because like you, I’ll walk the halls just to see what’s there.

 

Tim:

 

Right. Yeah. And I think that does kind of replicate the organic discovery, where I can just drop in and see lightning talks, just that elevator pitch: “Hey, Benjamin from Bettercast, give me your elevator pitch to anyone who’s tuning in.” Just cycle through, and that makes sense. You asked what the goal is with a virtual event: replicating an in-person event versus creating a different experience. I think it’s important for the content and the core mission of the event to be translated online. I don’t think it’s necessary to replicate every single experience, and I don’t think that’s really possible. But one of the promises of virtual events, and this is not something people necessarily talk about a lot, is that over the last year there are lots of people who have been able to attend events on a wide variety of topics and in a wide variety of industries who could not come to an in-person event.

 

Tim:

 

People who have been excluded from in-person events: by virtue of their finances, they can’t afford to travel; by virtue of their geography, they’re in a different country but want to attend this event; by virtue of their ability, people who have disabilities that prevent them from traveling or from wandering around the exhibit hall all day on their feet. They can now participate in these events in an interactive way. In many ways, these are things that people with disabilities have been asking for, for years: the ability to participate remotely. We’re finally giving it to them, and I think we can build that in as a promise to our viewers, that we’re continuing this ability for you to participate no matter where you come from. Where you meet us, we’ll meet you there.

 

Ben:

 

Absolutely. Perfect segue. I mean, your skill set is incredibly diverse. If you spend a little bit of time on your LinkedIn, you’ve done everything. One of the things that stands out, of course, is this commitment to accessibility. You have it in all factors. Now, even though hybrid has given the opportunity for people outside of the country to attend these events, exactly as you’ve just said, I have my own opinion on this, but how are you seeing hybrid event platforms dealing with basic accessibility?

 

Tim:

Terribly, honestly.

 

Ben:

Full stop.

 

Tim:

 

Yeah. There are very few platforms that support even the most baseline accessibility features. Almost all of the big names in the events space use one of a handful of video engines. I said this on Twitter kind of snidely, but most virtual platforms are basically a Mux or IVS with a house of cards of iframes around them. All of them are at the point where I look at it and think, oh, if I poke this wrong, the whole thing will blow over. Even the biggest names in the industry are just iframes around a video player, and the problem is that almost none of them have built basic accessibility functions into that system. Part of it is, honestly, that Mux does not support captioning at all. As far as I know-

 

Ben:

On a live stream they don’t support captioning. We use Mux; we used to use IVS and we’ve moved to Mux. On a live stream they don’t support captioning, but post-event, on playback, they do support captioning, which is fine. I also have it on good authority that they are working on it.
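The playback-only captioning Ben describes is commonly delivered as a WebVTT sidecar file uploaded alongside the video. As a hedged illustration (the cue data and helper names below are invented, not any platform’s API), a minimal WebVTT generator might look like this:

```python
# Sketch: generating a WebVTT sidecar file for VOD captions.
# The cues here are illustrative; real cues come from a transcript.

def fmt_ts(seconds):
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

def to_webvtt(cues):
    """cues: list of (start_sec, end_sec, text) tuples."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{fmt_ts(start)} --> {fmt_ts(end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)

cues = [
    (0.0, 2.5, "Welcome to the event."),
    (2.5, 5.0, "Captions improve access for everyone."),
]
print(to_webvtt(cues))
```

A file like this, uploaded as a text track, is what lets players render captions in frame during playback.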

 

Tim:

 

Okay, yeah. And I think one of the things is, a lot of platforms, even the ones that do support captions, are often just putting StreamText or something in an iframe, which at this point is kind of an acceptable solution, but just not ideal in my view. Platforms like YouTube and IBM and Vimeo, all of the big video CDNs, support six or eight caption tracks in frame, and I think in-frame is a better experience for someone who’s actually trying to follow the content, because they can see the slides in the video and read in the same box, whereas with an iframed StreamText, you’re looking below. It really irks me when they don’t even embed the iframe and just have a link, so now somebody has to have their phone out and try to read the captions and like-

 

Ben:

oh my

 

Tim:

Or like a different browser window, but it’s just not [crosstalk 00:09:10] an ideal experience. Yeah, it sort of vaguely passes muster, but to me, you’re not trying very hard. Again, it’s because these platforms, even the biggest ones, really don’t support that. And then when it comes to other things, I’m not as much an expert in WCAG compliance in terms of actual compliance testing, so I can’t necessarily tell you what the big five are doing in terms of actual compliance, but reading their compliance statements, they’re all sort of “it might work.” That’s pretty much the accessibility statement from a lot of these: “we tried, kind of.”

 

Ben:

 

And trying is not enough. I actually have a call coming up with someone who’s talking specifically about WCAG 2.1 and 2.2 and how platforms can work to become a little more compliant, because, to the point you just made, hybrid allows accessibility geographically, but if the platform lets everyone down at the second step of accessibility, you might as well not bother.

 

Tim:

 

Right. One of the things is that, as a production vendor who’s not building a platform, you obviously have a lot more say in this when you build a platform. But as someone who’s often told by clients, “Hey, we’re using XYZ platform, make a video show up in it,” I often have very little control over that, and so it’s often figuring out how best to sort of shove the accessibility into the space that we’re given. And frankly, I’m embarrassed to say it, but a lot of my clients just don’t want to spend the money.

 

Tim:

 

A lot of the events I do don’t have live captions, as much as I tell them they should. These are groups for whom that’s maybe not a priority. And so, as you said, VOD captions: I’m pushing really hard to have everything that I do captioned on demand if it’s not captioned live. I cannot say that I have a perfect track record of event accessibility, based on client demands, but it’s something that I’m really interested in furthering as an industry. I’m really interested in working with other technical partners, companies like Bettercast, AV companies, technicians, on really building out these systems and having folks understand what the need is, because folks really need to know, at the very least, what captioning is and how it works, much less all these other levels of accessibility.

 

Tim:

 

I mean, I think we’re getting to the point where we’re going to start needing to do audio description. It’s already been on the horizon, and as we get into the higher WCAG standards, those things are required: audio description, sign language interpretation. And of course, you’re on a different continent than me, and sign language varies by language and continent and region, so it’s about finding partners, interpreting partners and production partners, that really understand the nuance of how to translate and interpret that into regionally specific, culturally competent translation and interpretation.

 

Ben:

Yeah, absolutely. So it’s obviously a passion of yours, which is a segue to something we should’ve done at the start of the call, but it was so good I just had to hit record. Tim, tell me about yourself. Where does this passion come from?

 

Tim:

 

Yeah, so I’m Tim Kerbavaz. I’m the owner of Talon Entertainment Audio Visual, and I’m based in California. I’ve been in the production space for, I guess, a little more than a decade, and I’ve been working for almost all of my professional career in what we now call hybrid events. Almost every event that I’ve done for years has had an in-person audience and an interactive web component, so the idea that hybrid or virtual events are a new thing is frustrating to me, because I’m like, well, I’ve been doing this for 10 years. What have you been doing? I don’t mean to be so snide, but in terms of the-

 

Ben:

It’s okay, you were doing it before it was cool.

 

Tim:

Yeah, exactly. This space very much interests me, the technology interests me, and as a geek, it’s fun for me to see all these tools mature: from when I was doing things with literal analog phone hybrids for remote audiences to full HD video at this point and beyond. When we talk about these models, I come from a fairly odd mix of academic, scientific, and hyper-specific industry groups. I bumped my mic there. I’ve done a lot of automotive events, and then work in a broader corporate and high-tech space, and being exposed to this pretty broad spectrum of clients and a very, very broad swath of audiences has given me an appreciation for the impact that events can have and the reach that events can have, and how much better that reach is when we do video well, and when we do video and events in an accessible manner. One of the things, oh, go ahead.

 

Ben:

No, I was going to say absolutely.

 

Tim:

 

Yeah. On accessibility specifically, I have this sort of anecdote about one of the reasons that accessibility is important to me. I was working an event many years ago where I was the production manager, and it was a very long weekend for me, or a week. It was a three- or four-day event with a load-in day, and there was also a performance in the general session on one of the days, so we did rehearsals overnight two nights before. This was a very small event, so I was the GS technician and the production manager, and I had techs in the breakouts. So I was working the day, and then we loaded in for the rehearsal.

 

Tim:

We did the rehearsal, I was there until midnight, and then back at six in the morning, and it was three days in a row of this. On the final day, halfway through, I went up to help someone fiddle with their laptop on the lectern, and I stepped backwards and fell off the stage. I twisted my ankle and lay on the floor in front of this audience for five minutes.

 

Tim:

I just couldn’t get up. I ended up finding someone to sit at the board for a while, while I went to a clinic and got my ankle x-rayed and splinted, and then came back to the event, because I had to see this event through. It was the last day; we had to tear it down.

 

Ben:

The show has got to go on.

 

Tim:

And the show was going to go on. One of the things I’ve come to believe is that the expression “the show must go on” is kind of a misnomer. The show is going on, and we’re all just keeping up-

 

Ben:

Regardless of what you want, the show is happening.

 

Tim:

 

Right. And obviously this was not so much a safety issue in terms of something being built wrong; the stage was not an inappropriately built stage. It was that I was exhausted and just didn’t realize I was on the edge of the stage, and I fell off, and thankfully I wasn’t hurt worse. It was only a two-foot riser. But I came back to this event that I had been walking around at for three days, and I came back on crutches, and I was in this venue that I’d been in hundreds of times, and I realized that the front door opener button doesn’t work. So I had to wait for somebody to come and open the door for me so I could get in.

 

Tim:

 

Then I went to sit down, and I realized that at this event they’d squeezed in so many chairs that the aisles were too narrow for me to get my crutches in. There’s just not enough space in the building now that I have this mobility aid, now that I can’t maneuver tight corners. And I realized we had cables coming out of floor boxes with covers over them, which was already a terrible idea, but the building didn’t have enough power, so we were using floor outlets. There were obstructions in the aisles that we had to put a bollard on top of, to keep people from tripping on them, but now the aisle is cut in half.

 

Tim:

 

And so I can’t get through. Then I went to use the restroom, and the restroom had no door openers. Whether the batteries worked or not, there just was no button to open the restroom door, so I couldn’t get into the restroom. There was just the realization that this space completely changed for me when I came back with a body that this space was no longer [crosstalk 00:18:41] built for. Right. Yeah.

 

Ben:

Yeah. And that’s such a minor change in mobility.

 

Tim:

 

Exactly. It was a minor injury, and it was temporary. I knew in a few weeks I would be on my feet again, but it was this wake-up call that for people who interact with the world differently, for whatever reason, than sort of the norm, the world is not built for anybody but people who move about the world the way that I mostly do. So I had that temporary perspective of an experience with a physical disability, and I extrapolated that to what the rest of the world is like for folks who have any number of disabilities, or even with captioning. I think one of the things that is really important to me about captioning particularly is that it helps so many people besides just a deaf audience.

 

Tim:

 

Certainly, captioning is needed for someone who’s deaf, although for someone who’s deaf, sign language might actually be more appropriate, because that might be their native language. But with captioning specifically, you are helping deaf audiences. You are helping audiences for whom the conference language is not their native language, because maybe they are fluent or mostly fluent, but reading the text lets them understand it better. You’re helping people who are in a crowded room and can’t have the sound very loud. You’re helping parents of small children who have to keep the volume low because their kids are napping, but who are attending this conference from home. You’re helping a really wide breadth of the audience, people who maybe aren’t who you think your audience for an accessibility feature is, but for whom you’re really improving the experience and providing access to your event in a way they really wouldn’t have had before.

 

Ben:

Yeah, absolutely, and it’s good that we got into captioning, because that’s actually how we started talking; it was about captioning.

 

Tim:

Yeah.

 

Ben:

 

And you did mention previously seeing the technology grow over the last decade, specifically stenography and the technology of stenography, which from what I can tell has not grown over the last two or three decades. For one, how is this not a growing market? Or is it a growing market, and why is the technology not keeping up?

 

Tim:

 

Right. So with regards to the technology, stenography is a tool set. I don’t know if you’d call it a language, but it’s a notation method for rapidly creating verbatim transcripts, and it’s primarily [crosstalk 00:21:40] Exactly. So stenography, the machine, the keystrokes, the mechanism behind stenographic captioning, is exactly the same mechanism that’s used for court reporting, and the reason it’s used for both court reporting and real-time captioning is that it is the most accurate and fastest way of creating a real-time transcript. The difference between a court reporter and a real-time captioner, who are both stenographers trained the same way, is that a court reporter is creating a transcript, which they then take home and edit and bring back the next day, while a captioner is doing steno live, and they can’t edit it. Once it goes live, it’s on the television and it’s out; they cannot edit it.

 

Tim:

 

And so that’s where there’s, in some ways, a skill set gap, because most stenographers are court reporters. That’s the biggest market; there are so many legal reasons that you would have a stenographer, and they are required by courts. For captioning, there’s a much smaller pool of people who are even interested in that realm, and then, of those, only some are really comfortable and have the skill set to do it really rapidly and accurately without being able to edit it, and kind of have the stomach for “once I ship it, it’s gone.” So there’s already a limited pool of a limited pool. Stenography, or court reporting, is a trained process that takes certification, several years of training and school. Once you’ve gone to court reporting school and have your certificate, you can start working in a courtroom, but to go into captioning, you really need to have worked in a courtroom for a few years, gotten comfortable, and gotten better at it.

 

Tim:

 

So I think we’re looking at a limited pool of an already limited pool of certified people. In terms of the technology itself, the software that stenographers use is fairly complex, and they’re building custom libraries, or custom dictionaries rather, of their custom language and of how they’re typing. With stenography, just a really brief primer: it’s chord-based typing. There’s this keyboard, and it’s like a piano, so they press multiple keys at a time. The keys represent essentially phonemes; they’re not actually letters, they represent sounds. They play the keys as if they were the sounds it takes to say the word, and then their software says, ah, this key combination translates to this word.
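The chord-to-word translation Tim describes can be sketched in miniature. The chords and dictionary entries below are invented purely for illustration; real steno theories (Plover’s open-source theory, for example) are far richer:

```python
# Toy sketch of stenographic translation: chords (simultaneous key
# presses representing phoneme-like strokes) are looked up in a
# dictionary to produce words. These mappings are made up.

STENO_DICT = {
    ("K", "A", "T"): "cat",
    ("K", "A", "P", "GS"): "caption",
    ("TH", "E"): "the",
}

def translate(strokes):
    """Translate a sequence of chords into text; unknown chords fall
    back to their raw keys, much as steno software shows untranslates."""
    words = []
    for chord in strokes:
        words.append(STENO_DICT.get(tuple(chord), "/".join(chord)))
    return " ".join(words)

print(translate([("TH", "E"), ("K", "A", "T")]))  # -> "the cat"
```

The per-captioner custom dictionary Tim mentions next is essentially additions and overrides to a table like `STENO_DICT`, built per client and per event.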

 

Tim:

 

So if you looked at a transcript from a stenographer that had not gone through this translation software, it would just be these letter combinations that represent the phonemes, and it would not be legible. Court reporters used to use a mechanical keyboard that typed on a paper strip, and they would then translate out loud to read it back. The software now does that translation for them in real time, but it’s a software translation. So they’ll have a custom dictionary: if there are two words that have the same set of key presses, they’ll have a modifier. While there are standards, they often have their own custom dictionaries, particularly when we get into things like captioning for Silicon Valley tech events. There are all these weird words that are not real words, or are real words spelled wrong, all these companies that have names that are like… or even things like Bettercast, with no space and a capital B and capital C,-

 

Ben:

TTR CSG.

 

Tim:

 

Right, exactly. But even things like conjunctions go into that. So they’ll build a custom library for their work and the kinds of clients they serve, and so when we give captioners prep, we tell them the identities of the speakers, here are the slides they’re going through, and they program their dictionary with the words they don’t already have in it.

 

Ben:

Okay. So there’s a lot of prep before an event. It’s not just come, listen, type.

 

Tim:

 

Sometimes it is that. For live TV, it often is: they’re just sitting there on the phone, listening, and they’re typing, and you get what you get. For events, particularly at the higher level where we expect a higher level of accuracy, the expectation is that we’re providing the captioners with advance material. We’re giving them, typically, the agenda, speaker names, speaker titles, slides if they’re available, and basically any kind of collateral that we have in existence for the event and for that talk specifically, so abstracts, speaker bios, so that they’re prepared to know that when Benjamin Powell comes on screen, he’s Benjamin Powell of Bettercast, and Powell has two Ls. That level of prep helps them be more accurate.

 

Tim:

 

And that’s how you get to the level of accuracy that human captioning can provide. Certainly, if you’re not providing that prep, or if you’re working with folks at the bottom of the captioner list, maybe folks who are newer to the industry or aren’t as experienced, you won’t get as good accuracy. I’ve definitely had events where the human captioning was not very good, but I’ve also had events where I have seen essentially miracles: these captioners keeping up and typing things that… they know how to spell words that I didn’t know existed.

 

Ben:

Nice. And it almost seems like there’s a potential new career opportunity for unemployed classical pianists.

 

Tim:

 

Right. I mean, it’s an interesting nexus, and it’s a very niche industry. As you point out, most people don’t really know it exists or how it works. But you asked why this technology hasn’t changed. The technology they use has changed in that it’s computerized and there are these automatic translation features, but the reality is that the underlying system of taking stenographic captions and getting them onto a broadcast is essentially based on NTSC TV standards. There are some innovations in that space, but the underlying connection is essentially a serial cable between the stenographer and a box that injects the captions onto an NTSC TV line.

 

Tim:

 

And so the systems that I’m using that are a little more advanced are effectively creating an internet serial cable between the captioner and the injection system. They’re literally emulating serial ports at that point, because these systems were built for TV technology where the stenographer was dialing in via a modem. That’s how remote captioning was done: they would listen to a phone call, a conference call, and dial in via a modem to write to the TV studio, the TV station. So there are newer technologies, but they’re based on this old legacy.

 

Ben:

 

The good thing is that a fair portion of the people who are going to be watching this know what a modem is, and probably still know what it sounds like.

 

Tim:

Right. And I recognize a lot of folks probably haven’t heard a modem at this point. Luckily, I have not used a modem in a very long time; I’m doing all this stuff over the internet. I say that in jest, but the technologies really are based around broadcast TV and therefore carry these legacies of, literally, NTSC standards.

 

Ben:

Well, that brings up a really good point then. So, if there is an assumption that 60% of events are going to be hybrid.

 

Tim:

Right.

 

Ben:

 

I see that assumption going forward post-COVID, and with it an expectation of a level of accessibility: that you need this captioning. I want to talk to you about AI versus human captioning [crosstalk 00:29:50] but there’s an expectation that captioning is part of the conference. Do you think that the technology that’s required for captioning is going to hold back the ability for platforms to easily… because, I mean, we’ve spoken at length about how I can integrate captioning technology into my platform, and it’s not easy. Is it going to hold it back, do you think?

 

Tim:

 

Yeah. I mean, I think that the basic mechanism by which captions get added to web video, in terms of live stenographic captions, is this stack of standards that all go back to these old TV standards. They’re all the standard that references the standard that references the standard that literally references NTSC, and I think there has been very little innovation in terms of really upending that method, at least, I should say, on a commercialized scale. The typical way that I’m getting captions into a YouTube video is with either a hardware or software captioning encoder that injects CEA-608 encoded data onto the caption data field of an RTMP H.264 stream. There are obviously ways to do that over an HLS stream as well; they’re slightly different, but basically that particular methodology is, like I said, literally taking TV technology, NTSC technology, and sort of shoving it into the web.

 

Ben:

Okay. So, is that injection into the RTMP done pre-encoding on site, or at the RTMP transcoder server?

 

Tim:

 

So it can be done two ways. There’s a company I use called EEG Enterprises. They’re on the east coast, upstate New York I think, the east coast of the U.S., I should say specifically, since I’m in the U.S., and they make an ecosystem called iCap. iCap is, as I alluded to earlier, this web-based internet serial cable: an applet that lives on the captioner’s computer creates a virtual serial port, sends its data over the internet to their ecosystem, and sends the audio back to the captioner to listen to on headphones. iCap talks to either a hardware appliance, where you plug SDI in with no captions and you get SDI out with captions, and what it does is take that SDI in, strip off the audio, send it to iCap, take the captions coming back, and put them into the CEA-608 data field.

 

Tim:

 

And 608 is actually line 21 on NTSC TV, on analog television. Of course, digital video, SDI, is a little different, but it’s effectively emulating that line 21 data in the digital realm. What happens with that output that now carries this 608 text data is that it goes to your streaming encoder, your Haivision Makito, your Teradek Cube, your AJA HELO, whatever you’re using. There are only a few at the mid price tier that support captions; a lot of the inexpensive encoders do not. But when you’re working with an encoder that supports captions, you plug that SDI in, it takes that SDI data, takes the video and the text data, and sends it on that caption or text data field.
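One small, concrete piece of the CEA-608 scheme Tim mentions can be shown in code: each 7-bit character byte carries an odd-parity bit in bit 7 before being placed on line 21 (or its digital equivalent). This is a simplified sketch of that single step only, not the full 608 protocol of control codes and field placement:

```python
# Simplified sketch of CEA-608 character parity: set bit 7 so that
# each transmitted byte has an odd number of 1 bits.

def add_odd_parity(byte7):
    """Return the 7-bit value with bit 7 set for odd parity."""
    ones = bin(byte7 & 0x7F).count("1")
    return (byte7 & 0x7F) | (0x80 if ones % 2 == 0 else 0x00)

def encode_text(text):
    """Encode an ASCII string as parity-protected 608-style bytes."""
    return bytes(add_odd_parity(ord(c)) for c in text)

encoded = encode_text("Hi")
print([hex(b) for b in encoded])
```

Parity is why a decoder can spot a corrupted caption byte on a noisy analog line and drop it rather than display garbage.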

 

Tim:

 

The other way to do this is with a web-based encoding product, and there are very few of these in the market. EEG makes one, and I use it all the time. Their product is called Falcon. What it does is, you send an RTMP stream to it. So rather than pushing your encoder to YouTube directly, you push your encoder to Falcon, and then Falcon takes the audio, sends it to the captioner, takes the caption data coming back, adds it to that caption text data field, and then forwards the RTMP stream on to your destination. So it essentially lives in the cloud as an intermediary. It’s essentially re-encoding. Actually, I don’t know if re-encoding is the right word; it’s basically adding that data. I don’t think it actually re-encodes the video; it’s essentially slipping in, injecting, the text data. So-
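Conceptually, an intermediary like Falcon passes the video through untouched and slips timed caption data into the stream’s caption field. The packet structure below is invented purely to illustrate that idea; a real implementation operates on actual RTMP/H.264 data:

```python
# Conceptual sketch of a cloud caption inserter: video payloads pass
# through unmodified while caption text is attached by timestamp.
# The (timestamp, payload) packet model is invented for illustration.

def inject_captions(packets, captions):
    """packets: [(timestamp, video_payload)]; captions: {timestamp: text}.
    Returns packets augmented with a caption field; video is untouched."""
    out = []
    for ts, video in packets:
        out.append({"ts": ts, "video": video, "caption": captions.get(ts)})
    return out

stream = [(0, b"frame0"), (1, b"frame1"), (2, b"frame2")]
caps = {1: "Hello, world."}
for pkt in inject_captions(stream, caps):
    print(pkt)
```

The key design point, matching Tim’s description, is that the video bytes are never decoded or re-encoded; only the caption side-channel is written.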

 

Ben:

So that’s going to work well with anyone using IVS, because you can ingest that additional stream; you’ve got extra latency, of course, because you’re adding that hop. Screwed for Mux.

 

Tim:

Screwed for Mux and-

 

Ben:

Come on, Mux get your stuff together.

 

Tim:

 

The other thing I will say is, there’s another way that platforms get captions. YouTube does this, some other platforms do this, and I’ve seen custom platforms where companies are building entirely custom workflows. So when I say commercial solutions, EEG kind of makes the one, but if you’re building an ecosystem, if you’re building out an IVS workflow, building cloud-hosted workflows, whether it’s Wowza or AWS Elemental MediaLive, building out cloud-encoded workflows: there are ways to take an HTTP POST text string and write it onto that caption data field. It’s not necessarily a trivial task, it’s not just a few clicks, but it is possible to inject that yourself if you’re writing software.

 

Tim:

 

So, for folks like you who are building platforms, or people who are building custom cloud encoding workflows in AWS or in Azure Media Services, there are ways to essentially add that data, and then however you get text into that HTTP POST is up to you: whether you create a web page that a stenographer is typing into, because they can sort of emulate a keyboard and just type really fast, or whether their software just lets them post strings to that URL. There's a product called StreamText, which basically creates a webpage with scrolling text on screen. I mentioned this earlier; it's a way to send somebody a link for an in-person event or a web event so they can open up their phone and have the captions scroll.
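As a rough sketch of that HTTP POST approach: the endpoint URL, API key, and JSON payload shape below are all hypothetical, not any real StreamText or EEG API, since each service defines its own contract. Pushing a line of caption text into a custom pipeline might look like this:

```python
# Hypothetical sketch of posting live caption text to a custom
# caption-injection endpoint. URL, key, and payload shape are
# illustrative only; a real service defines its own contract.
import json
import urllib.request

CAPTION_ENDPOINT = "https://example.com/ingest/captions"  # hypothetical
API_KEY = "replace-me"                                    # hypothetical

def build_caption_payload(text: str) -> bytes:
    """Encode one line of caption text as a JSON request body."""
    return json.dumps({"text": text}).encode("utf-8")

def post_caption(text: str) -> int:
    """POST a caption line and return the HTTP status code."""
    req = urllib.request.Request(
        CAPTION_ENDPOINT,
        data=build_caption_payload(text),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A stenographer-facing web page, or a forwarding tool in the StreamText mold, would call something like `post_caption` once per line of output, and the receiving system would write that text onto the caption data field of the stream.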

 

Tim:

 

So if you're in a ballroom, or in an arena where you couldn't necessarily have caption screens flanking the stage because somebody is up in the rafters, you can give them this link and they can watch the captions go by on their personal device. But StreamText will also do that same kind of HTTP POST of the text, so you can give it an API key, or some way of authenticating against a URL it can post to. So if you're building a system, you can create that endpoint and have StreamText post to it.

 

Ben:

So you could potentially have that as an overlay over the video player on the page, if you want. Of course, that's going to have additional issues with Safari and mobile, where you can't overlay. It's not an easy problem to fix. One of the ways that people are trying to fix that, of course, is with AI.

 

Tim:

Right

 

Ben:

I did want to get to AI. I imagine it's a bit of a topic for you, but I've had experience with AI and I do understand its limitations. In your experience, though, where are we with AI transcription?

 

Tim:

 

In the last year or so it has gotten so much better. I mean, I would say two years ago it was unusable. I would say at this point-

 

Ben:

In that short amount of time, it's made that big a jump?

 

Tim:

 

I would say so. I would say it's at the point where it may not quite meet legal compliance. So if you're in the U.S., a university or public institution that is compelled by statute to have captioning, it probably doesn't quite meet your legal requirement, because any errors could be held against you in a lawsuit. That said, for many clients... two years ago I would've said it was not just no better than nothing, it was worse than nothing, because it was so inaccurate that it was misleading. I would say now it is better than nothing. If your option is AI or a human steno, go with the human steno, but if your option is AI or nothing, I think the AI is better than nothing.

 

Tim:

 

Go with AI. And I think the other thing is, some of the systems have trainable language models where you can give it the phonetic pronunciation and the actual spelling of things like your product name, your CEO's name, your speakers' names. You can basically upload a CSV file, or manually enter the key words and phrases, the kind of stuff you would give to a human captioner, not the slides, just plain text. You essentially give it a phonetic spelling and an actual spelling, and every time it hears that phonetic pronunciation, it types it that way. So if your product is Betr, B-E-T-R, you would tell it every time it hears "better" to type it that way. Of course, you run the risk if something really is a "better option". [crosstalk 00:39:47] yeah.
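A minimal sketch of that phonetic-to-spelling substitution, using Tim's own Betr example. The word list and the function are illustrative only, not any vendor's actual training format:

```python
import re

# Map phonetic "hearings" to the canonical spelling, the kind of
# custom vocabulary you might upload as a CSV. Entries are examples.
PHONETIC_MAP = {
    "better": "Betr",  # product name, as in the example above
}

def apply_vocabulary(transcript: str, vocab: dict) -> str:
    """Replace each phonetic hearing with its canonical spelling."""
    out = transcript
    for heard, spelled in vocab.items():
        # Whole-word, case-insensitive replacement.
        out = re.sub(rf"\b{re.escape(heard)}\b", spelled, out,
                     flags=re.IGNORECASE)
    return out

apply_vocabulary("welcome to the better platform", PHONETIC_MAP)
# → "welcome to the Betr platform"

# And the failure mode Tim mentions, over-replacement:
apply_vocabulary("this is a better option", PHONETIC_MAP)
# → "this is a Betr option"
```

The second call shows exactly the trade-off from the conversation: a blunt substitution rule can't tell the product name from the ordinary word, which is why real trainable models weigh acoustic context rather than doing plain text replacement.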

 

Tim:

 

And I've talked to vendors about this, and they're like, well, would you rather it get that wrong, or get your product name wrong? But that's kind of where I'm at: it's better than nothing at this point, and it's getting better. Once you have systems that can be trained, if you're using those paid products where you're getting support in training them, they are better still. I still think humans are better, but it's a viable option for events that either don't have the money or, particularly, don't have the statutory requirements. So if you're worried about getting sued, use human captioners, but if you're trying to do the right thing and can't afford human captioners for your week-long event, I do think AI is better than nothing.

 

Ben:

So I imagine you've used or experienced most of the AIs, from Microsoft, Google, Amazon, of course. Is there anyone that stands out as the leader of the pack?

 

Tim:

 

I have been really impressed with Google's performance out of the box with very little training. That said, I think Google's AI works better in something like Google Meet, where it knows who the speakers are, so it can differentiate between voices. All of the AIs, no matter whose they are, really have trouble differentiating speaker voices in a mixed audio situation. So they're going to perform better in Google Meet than in a YouTube broadcast, because in Meet it can separate out who's talking when, and when people talk over each other it hears them separately, because it's essentially accessing the separate, isolated audio rather than the mix. So I've been impressed with Google's in Meet. I'm not amazingly impressed with it on YouTube, but like I said, it's gotten a lot better.

 

Tim:

 

I mean, when I've run stuff through YouTube, it's less terrible. I've done a lot of events with IBM's streaming and built-in captioning. IBM's offering, which is built into their video enterprise product, does not currently have the ability to be trained, as of a week or two ago when I last talked to them about this. They are working on it, so it should be on the roadmap; I don't know what the timeline is, but I expect that to be available. IBM already has some captioning products that are actually designed for broadcast, so for a lot of local news, IBM has a whole suite of products for TV news, including weather.

 

Tim:

 

I mean, a lot of TV weather products are actually powered by IBM, and they're not necessarily IBM branded, but IBM has a lot of stuff that sits quietly in the background and runs the things we do. One of the things they have is a system for captioning that integrates with their suite of TV products and ties into the teleprompter and the station automation software. So it's captioning based on the teleprompter until you go off and ad-lib, and then it uses ASR. It's much more accurate because, mostly, when you're reading the script, it knows what you're saying.
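As a toy illustration of that prompter-first logic (my own sketch, not IBM's actual product): caption from the clean script text while the speaker stays on it, and fall back to ASR output when they ad-lib. The word-overlap check here is a naive stand-in for real script alignment:

```python
# Toy sketch: prefer the teleprompter script text while the speaker
# is on script, fall back to ASR output when they ad-lib. The
# word-overlap threshold is a naive stand-in for real alignment.
def caption_source(spoken: str, prompter_line: str, asr_text: str) -> str:
    spoken_words = set(spoken.lower().split())
    script_words = set(prompter_line.lower().split())
    # If most spoken words appear in the script, trust the script text.
    if spoken_words and len(spoken_words & script_words) / len(spoken_words) >= 0.8:
        return prompter_line  # on script: use the clean script text
    return asr_text           # ad-libbing: use the ASR result
```

The payoff is exactly what Tim describes: while the anchor reads the prompter, the captions are effectively perfect, and the less accurate ASR path is only exercised for the ad-libbed moments.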

 

Ben:

Do you actually see a blend, then, of human captioning and AI, where the AI is listening and a human is fixing the spelling as it goes? Do you see that in the future?

 

Tim:

 

There already is a lot of that for on-demand captioning. A lot of companies, companies like Rev, companies like Ai-Media, who in fact just bought EEG, do that for live, where they've got robots doing speech-to-text and then people correcting it, and certainly for on-demand. Most on-demand captioning services, the reason they can get the price so low, under $2 a minute, is they have an ASR product run through the video and then a human go in and correct it. So you're still paying for some human time, but the robot essentially does the first pass. [crosstalk 00:44:04] Yeah. In terms of live, I would say when shopping for captioning services, there are several agencies that use a combination of AI and human, or something called voice writing.

 

Tim:

 

And I didn't talk about this earlier, but voice writing is a mechanism for doing live captioning where, essentially, if I'm a voice writer, I'm in a booth like this listening to the video, and I'm actually just repeating the words into a microphone connected to Dragon NaturallySpeaking. So it's using a voice model trained on that one speaker to improve the accuracy: it's a continuous, single voice, and when they need to add annotations, they speak those out loud too. The problem is that it isn't as accurate, because you're still relying on speech-to-text. You're improving the accuracy of the speech-to-text captioning, because it's a voice the software is already trained for, but it's still machine recognition.

 

Tim:

 

You're still running the risk of something misheard or mispronounced coming out wrong, and there was an issue fairly recently with a TV station in the U.S. that had used a service providing this voice writing. Essentially, a slur appeared accidentally in the captions, because the voice writer mispronounced or misspoke something, and it came out as an insult against the victim of the crime being covered, whoever was on screen. So it appeared in the captions that the reporter had said something really negative, which they had not said, but it was now captured on screen-

 

Ben:

out there.

 

Tim:

 

Out there. And so it actually led to this TV station firing the company, canceling their contracts with that captioning vendor over it. But there was a public outcry, and people were asking, why didn't you fire them sooner? It took a week or two, and it's like, well, because it takes time to find a new vendor. A replacement can't just step in [crosstalk 00:46:20] and particularly at the level of a TV station that operates 24/7. Even if you're only captioning the news live, and everything else is pre-recorded and pre-captioned, if you need that on-demand level of service, there are not a lot of vendors out there that do that at that scale.

 

Ben:

Well, that brings me onto the next question then. For platforms and AV teams who are maybe building their own systems, what would be some tips you could give them to help them become a little more accessible and deliver that level of accessibility?

 

Tim:

 

Yeah. I mean, as much as we discussed all the pitfalls and pain points of captioning, I think captioning is a great starting point, because it's pretty easy to understand as a concept: I want the words on the screen to match what people are saying. While there are obviously lots of technological challenges there, I think it's sort of step one. The other big thing is, at its core, when you're building a platform, or as a vendor, when you're buying a platform for your clients, it's a webpage. It's browser-based. So make sure that the product you're building, or the product you're buying, really does meet those WCAG standards. Obviously, as we discussed, there are several levels and layers and versions, and they have recently been updated, but universities and public institutions require a checklist from their vendors before they sign a contract, covering all of the accessibility points they meet.

 

Tim:

 

And so I think, download one of those templates or checklists, and when you're shopping, ask vendors if they have a copy of their own checklist of how they meet the standards. They probably have one if they've ever sold to a public institution in almost any country. And say, I'm not going to give you my custom form, maybe we're not big enough to negotiate that, but send me what you have in terms of an accessibility checklist and statement, because that makes it clear it's really a value for your event and your client. The more we ask vendors these questions, the better, and I make a point whenever I'm at a demo to really hit these questions hard: about accessibility, about captioning, about audio, the ability to do multiple audio channels for audio description and languages, the ability to adjust colors for color contrast.

 

Tim:

 

If we're asking these questions of the salespeople, who are often new to these companies right now, because some of these companies have hired a bunch of people who don't come from the events world and don't really know what we're talking about, then the more we ask them hard questions and make them define in absolute terms what they meet and don't meet, the more pressure we're putting on the industry to be better. The other thing I would say is, the reality is there are very few platforms that do accessibility very well. So there's a point at which, and I don't think this is the right moral position, but there's a point at which you're not going to get perfection, and it's part of the equation in picking a platform and designing things. The reality is that most people buying these services, most stakeholders, whether that's event planners or the people signing the check at these companies, are not disabled.

 

Tim:

 

So they don't necessarily have as much personal stake or buy-in to accessibility, and for most of these folks, it's not their number one priority. I'm really heartened to see big companies, Google, Microsoft, Salesforce, really pressing their vendors hard on this point, really enforcing those accessibility standards and making sure the public-facing work they do really does meet them. I think there will be pressure throughout the industry as more big players take on that requirement, and the rest of the industry will figure it out. It really does start with the companies that have that level of clout to say, we're Microsoft, and you have to do this for us-

 

Ben:

Or, you don’t get the contract.

 

Tim:

 

Exactly. And so I think that level of pressure will make the industry better, but if all of us, as we're shopping for these platforms, ask those questions, even if we can't find perfection, which I think is hard to find, it's better that everyone is clear that this is important, that as an industry we focus on this.

 

Ben:

So what you’re saying is it’s not really necessary for all of us to break our ankle and have to experience an event on crutches. We just have to be a little bit more considerate to those who do.

 

Tim:

 

Exactly. And I think, really, a lot of these are not hard concepts to understand; it's just that for a lot of us they're often invisible in our daily lives. So remember that a lot of your audience is going to interact with your events, in person or online, in a way that is different from the way you do, and engage with the voices of people who are outspoken about their needs, whether that's a disability, or being a parent, or all the other ways we can make our events more inclusive. Listening to the voices of disabled people, of single parents, of people who are not as wealthy as you might think your audience is, really is going to make your event more inclusive, because we can talk all day about what we think people need, but it's better to ask your audience what they actually need and how we can include them better, and then actually do that.

 

Tim:

 

It's one thing to ask, but you have to actually do what they ask you to do. I think that's how you really improve inclusivity. Meeting WCAG standards is important, but it's also important to understand what your community and your audience need, and to build those kinds of features, tools, and flexibility into your event even if folks aren't asking for them right out of the gate when they buy their ticket, so that somebody can come to your project and say, as an audience member, I know I can attend this without problems, because I trust this brand.

 

Ben:

 

So we'll start to wind up, but I want to widen the scope a little and draw on your experience again. What would you say are the trends, good or bad, that you're seeing now? I mean, you've been in it for over a decade, but hybrid is essentially an emerging industry. What are the trends you're seeing, good or bad?

 

Tim:

 

One of the things that stands out to me, and to be honest, I have very little experience with this, is that kind of VR experience, whether that's a goggle-based experience or a browser-based experience moving around a 3D world. I have very little experience with it, probably because I'm not a video gamer, so it's not a natural environment for me. But I do think about it: oh, we've reinvented Second Life.

 

Ben:

Well, I have to say, I do own some VR goggles. Yeah. I personally would never want to attend an event in them. Never.

 

Tim:

So why is that? I'll turn this interview around. Why don't you want to attend an event in VR?

 

Ben:

Spending too long in that environment is painful: headaches, eye strain. It's not as enjoyable or immersive an experience as it could be. And also, for me, it's a broadcast experience. I want to be watching the content and enjoying the content like I do YouTube or TV or movies. Do you know what I mean? VR is immersion. I don't want to have a conversation with a sales dude. It's just not for me.

 

Tim:

 

Yeah, I get it. And I think there's a lot of innovation in that space; I don't really think it's the right fit for a conference. As you said, I think the broadcast model works better. In terms of tangential fields, I do think AR is a really useful tool for virtual events, because of things like mobile-based AR, where I can point the camera at the table and see your product on my phone, virtually in the space. Whether that's literally trying IKEA furniture in my living room, or a medical device I can interact with in 3D space with my hands as part of a sales call: I can be on a conference call with you while looking at it and interacting with it, and maybe the salesperson can manipulate it.

 

Tim:

 

I feel like there's a future there, particularly on the trade show side. I think that's one of the things that would make a virtual trade show better. I do think XR is a rapidly growing space, and there are really only a few manufacturers working in it, but it's increasingly relevant. It's obviously super relevant in Hollywood. It's saving Hollywood, and even advertising producers, ungodly amounts of money, because they can essentially film five locations in one day on one set because-

 

Ben:

Like the Mandalorian-type set.

 

Tim:

 

So, The Mandalorian was originally projection and became direct-view LED in the second generation, but that kind of stage is typically powered by Unreal Engine and Disguise media servers. Disguise is really the player in that industry right now, and Disguise is incredibly expensive, mind-bogglingly expensive, but they produce a product that works quite well and really specializes in that space. So when your options are ship the crew to the desert or shoot the whole thing on a soundstage, it's cheaper to just buy the box and shoot on the soundstage.

 

Ben:

 

I was actually having a call with one of the Bettercast partners today who is in the process of using Unreal Engine to build a stage in his green-screen studio, a very fancy type of thing. And I love the fact that AV people are now getting the opportunity to be creative. It's not just a couple of par cans on a stage and cameras left and right, and that's it; there's a creative input that they're giving.

 

Tim:

 

But again, when we're filming a conference on a volumetric stage, we're making a movie. It's television again; it's not AV, it's broadcast. I was talking to somebody recently about this who works in theater. They're a production manager for a ballet company, and during the pandemic their boss basically said, hey, we're going to do this event online. And they told their boss, we can do that, but understand what you're asking me to make. It's not just online dance, it's television.

 

Ben:

The delivery method is irrelevant, the production is the same.

 

Tim:

 

Right. And so I think, again, it's this thing where we're taking whatever we call it and using a television production model to produce it. And as you point out, there's a lot of room for creativity and a lot of room for really new technical challenges. These volumetric stages and XR and LED backdrops are not technically insignificant. Even when we're talking about broadcasting from an in-person event and building in a broadcast that really provides a quality experience, again, it's not insignificant in terms of technology.

 

Tim:

 

I would say it's not necessarily new technology in that case, but it's still a bigger ask than throwing a camera at the back of the room and hitting record. So I think there's a challenge for those of us in the technical space to rise to that opportunity: learning new skills, helping our clients develop really creative, high-quality productions. Whether or not it's, in the end, still TV, I think there really is an opportunity for those of us in the production space to provide that level of creativity and storytelling to our clients.

 

Ben:

Couldn't agree more, and that's perfect. It's like you've read my questions before I started talking-

 

Tim:

I did not in fact read the question.

 

Ben:

But as there is this movement from traditional AV into essentially broadcast conferences, you've got a lot of teams that are just starting out, trying to learn as quickly as possible. Where would you recommend they start that learning process to be able to broadcast an event? Where do they go: forums, websites, Udemy?

 

Tim:

I mean, I think there are a lot of resources online. Honestly, I would start with YouTube; there are so many people. The reality is a lot of the technologies and techniques we're using in the events space now are things that game streamers have been using for a few years, and it's really funny: these 15-year-olds have a better broadcast than most of us. I look at the production value on some of these streams and think, this 15-year-old has a better set than I do.

 

Tim:

 

I think watching videos from folks who are giving tutorials: how to use OBS, how to use vMix, how to set up your camera and lighting. And particularly as we're talking about virtual events, I know, even as parts of the world start opening up, for a lot of folks in a lot of places things are not going to be open for a while. Even in the U.S., where we're mostly open, I think we're going to see fluctuation again; I expect things to tighten up, and events to cancel and go back online.

 

Tim:

 

And so I think it's still useful, if you're starting out in this space, to get the basics of how to set up a home studio and how to use these tools, even starting with the free tools like OBS, and to learn some macros to build on the Stream Deck and just get a feel for the workflow. The biggest thing I would stress, no matter whether you learn this by taking classes, watching videos, or just testing things out, is that the most important thing for people to understand is signal flow: how signals get from here to there to the web, particularly with things like conferencing. I see a lot of people, even very seasoned production technicians, who really have trouble understanding how to integrate conference calling into productions.

 

Tim:

 

And I think that's a very important and often missing skill set: really understanding the mix-minus and the routing. Again, a lot of this comes down to fundamentals. There are a lot of software tools, and you can pick one and dive into it and learn how to use it, but no matter what you're using, whether it's an ATEM Mini like I have here, whether it's vMix, whether it's any number of web-based products like StreamYard, it's really important to understand the underlying signal flow, how and why signals get from here to there, because that's the fundamental knowledge that, no matter what the technology is, is going to let you figure out how to use it. You can learn what buttons to push, but if you don't understand how the signal gets in, out, and through the system, you're really going to run into headache and heartache.
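The mix-minus itself is a simple idea: each remote participant's return feed is the full program mix minus their own audio, so they don't hear themselves echoed back with latency. A toy sketch, with per-source floats standing in for audio bus levels:

```python
# Each remote caller's return feed is the program mix minus their own
# source; otherwise they hear themselves back with latency (echo).
def mix_minus(sources: dict, exclude: str) -> float:
    """Sum every bus level except the excluded participant's own."""
    return sum(level for name, level in sources.items() if name != exclude)

buses = {"host_mic": 0.8, "remote_guest": 0.6, "video_playback": 0.4}

# The guest's return feed omits their own audio:
guest_return = mix_minus(buses, "remote_guest")  # host + playback only

# The program mix (what the stream hears) excludes nothing:
program_mix = mix_minus(buses, exclude="")
```

Real consoles do this with aux sends or dedicated mix-minus buses rather than arithmetic on floats, but the routing principle, every return feed equals program minus that participant, is exactly what this models.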

 

Tim:

 

When you have issues in the show... The basic function of a mix-minus on a conference call has been around since analog phone lines. Like I said, I've been doing conference calling since the days of getting under the table to plug into the phone line; I literally found a picture online of me under the tech table, on the floor, because the phone jack was in the floor. Anyway, stories. The tools will always evolve, the technologies will evolve, but the fundamental understanding of how signal flow for remote participation works has never been more important than now, and I think it will continue to be important no matter what the tools evolve into.

 

Ben:

So, to wind this up: we've given out-of-work classical pianists a new career, we've given Twitch streamers a new career in video production for corporate events, and I recently saw Gary Vaynerchuk on a live stream, funnily enough, talking about hybrid events and the future of events. It's not offline or online, it's offline and online.

 

Tim:

That’s the one.

 

Ben:

And that's, I think, the future of the industry as we know it.

 

Tim:

The internet is not going away.

 

Ben:

No, and events aren’t going away.

 

Tim:

 

Events aren't going away, and haven't gone away. One of the things that's impressive to me is just how creative and resilient our industry is: how people who have been in this industry for a long time, who got beat down in 2008 and then got beat down this last year, who lost their jobs or saw all the work dry up, have really stepped up to the plate in testing and trying new techniques, using new technologies, and getting up to speed. I have been really impressed with the level of camaraderie and knowledge-sharing in our industry, the number of Discord groups and Facebook groups I've joined where people are really sharing their knowledge, collaborating, and teaching everyone how to do what they've learned.

 

Tim:

 

I really do think that in our industry there is enough work for everyone, and I don't think gatekeeping knowledge is useful. The way we build a better industry where we all make more money is by making the industry bigger. The ways we grow as an industry, adopt these new technologies, meet our clients' needs, blow them away with amazing tools, techniques, and creativity, and tell their stories better than they've ever been told before: that's how we make everyone better. That rising-tide-raises-all-ships model is really how I view the industry, and I don't think there's a need to be competitive. Obviously we're always competing for business with other folks, but at an individual, person-to-person level, I don't think that competitiveness is useful.

 

Tim:

 

I think it's better to share knowledge, be open with how we build things, and really teach everyone to build the skills we've built. I don't necessarily think it's useful to just hand somebody a cheat sheet and say, this is how you do XYZ, but it is really important that we share those fundamental skills and the ways that we learn, and as people make an effort to learn things on their own, that we support them and provide guidance. When somebody calls for help, my view is, I answer the phone, because if they're calling with just a quick question now, next time they might be calling with work.

 

Ben:

Absolutely, I could not agree more, and on that incredibly positive and hopeful note: Tim, thank you so much for talking to me today on the Better Podcast. I appreciate it.

 

Tim:

Good to be here. Thanks.