OpenAI's Latest AI Model - Welcome Sora...πŸ‘‹

00:00:00: So, last week was a very exciting week.

00:00:03: By the way, welcome to the next episode of our podcast, because last week AI fought back

00:00:11: once again, you could say.

00:00:12: Yeah, I'm not sure if it's fighting back because it's basically battling us since the end of

00:00:16: 2022.

00:00:17: But you're right, last week, Sora was released, OpenAI's latest model, which

00:00:26: now also shows us that they absolutely dominate AI-generated video.

00:00:34: Because until Sora, it wasn't really a big thing, not too impressive.

00:00:38: But I guess Sora changed it all.

00:00:41: I think it did, yes.

00:00:42: And you already created a YouTube video about that.

00:00:45: That was last Friday, I think.

00:00:48: Yeah, right after it came out.

00:00:49: And then we thought, let's also talk about that a bit, because it affects us personally,

00:00:55: professionally, and also everybody on the planet.

00:00:58: What is your general opinion about this whole AI thing, or maybe specifically about Sora?

00:01:04: So how does this video or this text-to-video aspect, this option that now works

00:01:09: quite impressively, actually, change your view on AI, maybe?

00:01:14: Yeah, so I think we're getting fewer and fewer things that we can trust.

00:01:20: So to start with the downside right away, that's the negative impact

00:01:24: I immediately thought of when I saw it.

00:01:27: But at the same time, I also thought, well, that's really impressive.

00:01:31: If we look at it from a positive angle, it gives us a lot of new options.

00:01:37: For example, to kind of stick to an example from our bubble, it makes it easier to generate

00:01:43: B-roll or it essentially replaces or could replace the need for stock footage.

00:01:50: If you want to have some short clip in a video, some background video, that's something

00:01:56: Sora seems to be able to do just fine, so you no longer need to reach for expensive

00:02:04: and maybe not ideal stock footage.

00:02:07: So that's the positive side, I guess.

00:02:09: But at the same time, as I said, I also thought, wow, that opens up a lot of potentially bad

00:02:16: use cases.

00:02:18: And I'm rather on the negative side, I guess, to be honest, because for me, the development

00:02:25: is too fast.

00:02:26: It's really fast, yeah.

00:02:27: Because when I think about ChatGPT, which we heard about at the end of 2022, we were

00:02:33: quite late.

00:02:33: We were, but yeah, that was our bad.

00:02:37: And now we are a little bit more than one year later, and we can create videos,

00:02:41: one-minute videos only, but still, we, or rather OpenAI, can create these videos based on text prompts.

00:02:48: That's scary, in my opinion.

00:02:50: And yes, the trust side is one very important one that you mentioned.

00:02:54: Maybe I'm too old again, but I think about the work of other people, how it devalues

00:03:01: the work that people did.

00:03:03: Because if you needed B-roll, as you said, somebody had to record it, had to film it,

00:03:07: had to fly there, had to go there, had to plan it, had to turn the ideas he or she had

00:03:15: into reality by creating that scene.

00:03:17: And of course, as you said, now this can be done faster and in minutes, maybe.

00:03:21: We don't know how this evolves.

00:03:23: Maybe we can use it on our own in six months or something like that.

00:03:27: But this whole creativity that humans have and that helped humans to produce these amazing

00:03:33: videos, for example, now turns into simple computer-generated footage.

00:03:39: So I personally don't know how I should think about it at the moment.

00:03:44: Basically, for me it's more negative: impressive, but negative.

00:03:48: That's one of my points here to get started.

00:03:52: Still, I think this trust issue you mentioned is also a very big one because even now you

00:03:58: don't know what to believe on the Internet, whether it's text or images, and now you

00:04:03: can't trust videos anymore.

00:04:05: So basically, everything you see on the Internet has to be double-checked and you have to find

00:04:10: a way to double-check to make sure it really is the truth.

00:04:13: And I'm not sure how we or all these people using the Internet daily can find a way to

00:04:20: implement this double-check feature or whatever you want to call it.

00:04:25: Yeah, lots of good points.

00:04:27: I totally agree on the argument that it is a problem for all those people who created

00:04:36: the stock footage, for example.

00:04:39: And of course, stock footage is just one thing that could be replaced by Sora or by

00:04:45: text-to-video models like Sora.

00:04:48: It's absolutely valid.

00:04:50: I do think that at the same time – and I might be wrong here – but it looks like

00:04:55: these text-to-video models might not be able to replace Hollywood or movie studios in general

00:05:07: because, of course, it's one thing to replace some background footage, some clip where the

00:05:14: exact content might not be too important.

00:05:17: I mean, if you want, say, a dancing bunny, which was one of the examples OpenAI showed

00:05:22: on their website in a video generated by Sora.

00:05:26: If you need something like that and you want to have it as B-roll in your video, you might

00:05:31: not care too much about the exact details.

00:05:33: It might be fine if you have a prompt where you roughly describe the scene and that you

00:05:37: want a dancing bunny and some neon colors or something like this, but the details aren't

00:05:42: too important.

00:05:43: But of course, that would change if we were talking about a real movie.

00:05:46: That's, of course, way longer than a minute, and multiple clips would have

00:05:51: to be coherent and fit together.

00:05:53: And where we also have dialogue, which doesn't seem to be a thing with Sora at all.

00:05:59: So far we've only seen videos without people talking, without lip synchronization and

00:06:06: all these things.

00:06:07: So I think it's not the entire video industry that's in danger here, but, as it almost

00:06:16: always is with AI, I guess, some specific groups for whom this really could

00:06:23: be a big problem.

00:06:25: I got two points here.

00:06:27: The first one is that, of course, things have always changed.

00:06:32: And as you said, maybe certain jobs or certain tasks are replaced.

00:06:37: This has happened all the time throughout history, I guess.

00:06:40: Think about industrialization, where people probably thought, oh my God, I won't

00:06:45: have a job anymore.

00:06:46: And it turns out there is more work than ever that has to be done.

00:06:49: But the other point is the speed.

00:06:54: You mentioned it already, but you just said they can't do dialogues at the moment.

00:06:59: They can't replace Hollywood movies.

00:07:01: Hollywood movies are maybe also something different because you want to have the actors

00:07:04: and so on.

00:07:05: But I guess nobody knows where we are next year at the same point in time.

00:07:10: So next February.

00:07:12: And that's the maybe slightly scary thing that I see: at the moment, I would say

00:07:16: yes, this has some bad implications, but it can also be good.

00:07:20: But if it evolves that rapidly within the next 12 months, then we might talk about dialogues.

00:07:27: We might talk about 30-minute movies maybe.

00:07:30: And then things get really crazy, I guess.

00:07:32: Still I don't think that being scared or something like that helps in any way because

00:07:37: things will be as they are.

00:07:40: But yeah, I'm quite skeptical, as you see.

00:07:42: We also don't know how far they are already.

00:07:45: I'm not sure if OpenAI, for example, shows us all they can do because Sora just came

00:07:51: out of nothing, basically.

00:07:53: So maybe six months from now, the next thing comes out of nowhere.

00:07:57: So yeah, I don't know.

00:07:59: Yeah, it's of course possible, absolutely.

00:08:01: We don't know what we'll see in a year from now and what we'll see in two years and what

00:08:07: AI might be able to do then.

00:08:09: I totally agree.

00:08:10: I also agree that maybe if we're talking about big movies, it's a different thing there.

00:08:15: I would imagine it's also kind of hard to describe the complexity of a big movie in

00:08:23: a prompt, even if it could be a super long prompt, even if it could be an entire book.

00:08:28: I'm not sure.

00:08:29: But as I said, we don't know how things will turn out.

00:08:35: Of course, it's also worth noting that what we see with Sora, the videos we see, they

00:08:41: are not perfect.

00:08:43: They look pretty perfect, I have to say.

00:08:44: On first sight and even maybe the second time you look at one of those videos,

00:08:52: they look absolutely fine.

00:08:53: They really do.

00:08:55: Of course, there are all these people who like tell you, yeah, it's clearly AI generated,

00:08:59: but is it really that clear if you just take a brief look at it?

00:09:05: Sorry for interrupting, but especially the people who say that already know that this

00:09:10: is AI generated.

00:09:12: Exactly.

00:09:13: That's exactly the point I also wanted to make.

00:09:16: You can spot subtle errors like the hands of people, some movements which are a bit

00:09:23: strange.

00:09:24: Yeah, you can spot them because you know what you're looking at is AI generated and you're

00:09:30: looking for those hints.

00:09:32: But fast forward a year from now, let's say Sora or similar models are available to everyone

00:09:40: or to the majority or in five years, maybe we have models we can run on our own machines

00:09:45: or in our own data centers and we can achieve similar results.

00:09:49: And now let's think we have some bad actors.

00:09:52: We have some people who want to fake certain things, maybe with celebrities, maybe fake

00:09:59: some war videos, anything like that.

00:10:03: And you see that and you might not be prepared for the fact that it could be AI generated.

00:10:09: Of course, you know that there is a danger of that being the case, but we're coming from

00:10:14: a world where we take video for the truth, where we know if it's a video, it's very,

00:10:23: very likely not faked.

00:10:26: It might be staged or something like this, but it was recorded like that.

00:10:31: That is a relatively fair assumption.

00:10:33: It would take a lot of effort to really fake a video.

00:10:37: It's possible without AI, but it's basically a lot of work.

00:10:42: It's not easily done.

00:10:44: And we're coming from that world and adjusting to a different world where every video could

00:10:49: be fake and could be AI generated.

00:10:52: That'll be tricky.

00:10:54: There also is another side here, because we will, of course, probably have

00:11:00: the problem where we have to anticipate that videos might be AI generated and might not

00:11:08: show us the truth.

00:11:11: But at the same time, it could also mean that we don't trust any video sources anymore.

00:11:18: And of course, that means that if we have some media outlet, which we normally

00:11:23: would trust, let's say the New York Times or whatever.

00:11:29: And of course, there are reasons why you could mistrust them as well.

00:11:32: But let's say there are some media outlets whom we generally would trust.

00:11:37: And now they're showing a video of some, let's say, war crimes being committed by

00:11:42: whatever, anywhere in the world.

00:11:45: There will or there can be people in the future who say, yeah, that's fake.

00:11:50: That's not the reality.

00:11:52: So we have those two sides.

00:11:53: We have videos where we might see something horrible or might see some politicians say

00:11:59: something horrible and it's faked.

00:12:02: But it's hard to prove that it's fake.

00:12:04: But we also have the other side where we see something that happened but we have people

00:12:08: who can say, no, that didn't happen.

00:12:09: That is fake.

00:12:11: There might not really be a way of proving them wrong or it might be very difficult at

00:12:16: least because we're living in a world where video can't be trusted anymore.

00:12:20: And that's another big problem I see.

00:12:23: And that's really a problem because this means now we have another medium that people

00:12:28: can use to influence people regarding their agenda.

00:12:33: You know what I mean?

00:12:33: So if you think about the elections this year in the US, for example, I read that OpenAI

00:12:37: will do all they can to prevent any abuse of that technology.

00:12:41: And it's not publicly available yet.

00:12:43: So it might not matter for the elections this year, actually.

00:12:46: But considering this rapid development, and it's only February, who knows what happens in July,

00:12:51: for example.

00:12:51: Yeah, absolutely.

00:12:53: Who knows which people will have this technology, and it doesn't have to be OpenAI's technology.

00:12:57: Maybe other similar technologies will evolve in the next months.

00:13:02: So taking this into a global political context, it becomes really scary, I must say.

00:13:08: Absolutely.

00:13:08: Because your point is really valid.

00:13:09: No matter if you have a real source or a fake source, everybody can influence people or tell them,

00:13:16: no, don't trust this.

00:13:17: I know the truth.

00:13:18: This video has the truth, not this one.

00:13:20: But which one is fake in the end?

00:13:22: And this also brings me to my next point, actually.

00:13:25: How do we as humans, as tech people maybe, but also as ordinary people, deal

00:13:32: with that?

00:13:33: Because if I think about some friends of mine who are not in the tech industry, they are

00:13:38: not aware of all these things.

00:13:39: We read this on X, we see it on YouTube, whatever.

00:13:43: But many people have normal jobs, normal lives.

00:13:45: They have a mobile phone for WhatsApp, whatever.

00:13:48: But they don't read all these tech news.

00:13:50: They don't use AI at all in their daily lives.

00:13:53: So how should they be aware of the fact that the video they just watched might be fake?

00:13:58: I would guess that probably some techniques will evolve that can help with identifying AI-generated

00:14:07: videos.

00:14:08: But of course, as AI gets better, that might not really work.

00:14:13: I guess AI will always have the advantage here.

00:14:18: It's like the old problem.

00:14:19: The person who wants to validate something always has to react.

00:14:23: If you want to protect against something, you have to react.

00:14:26: And it's the other party, it's AI in this case, that's the actor, that's advancing and

00:14:32: innovating.

00:14:32: And therefore, it will really be hard.

00:14:36: And I don't know what the future will be there.

00:14:40: Maybe there are some technical hurdles which can't be overcome in certain areas,

00:14:47: so that creating super long videos is maybe still a long road before we get there.

00:14:54: So maybe that's the case.

00:14:56: But I wouldn't rely on that.

00:14:57: Maybe it becomes even more important who's spreading a video or who's publishing a video.

00:15:07: Maybe persons as trusted sources or publishers as trusted sources, maybe that becomes

00:15:15: more important, which of course means that people have to trust them in the first place.

00:15:19: But maybe that is one of the ways of establishing trust that you know, okay, if it's that outlet,

00:15:27: if it's that person that publishes a video, I can trust that person or that outlet.

00:15:33: But obviously, that's an open question and it's not the perfect

00:15:40: solution, I'd say.

00:15:41: And it also kind of runs contrary to the idea of the internet, right, that everybody can share their opinion.

00:15:49: And of course, if somebody has an opinion and wants to prove or make a point, then he

00:15:53: has to validate that point.

00:15:56: But this has become more difficult now with all this possible fake content that we have.

00:16:02: But still, we have to deal with that technology, I guess.

00:16:06: If we just deny it and say it won't be a big thing, maybe we are wrong.

00:16:10: But at the moment, it looks like AI won't be gone in a year.

00:16:14: It will be here and it will get bigger.

00:16:17: So we have to adapt, right?

00:16:19: And we have to find ways to adapt to this, as you said, by thinking about the sources

00:16:24: we trust, also being even more suspicious, maybe, if somebody tells you, hey, this is

00:16:29: the only truth and this is fake, so you have to do your homework, so to speak.

00:16:33: But blind trust was never a good idea in the past.

00:16:38: And it's an even worse idea in the future, I guess.

00:16:40: Absolutely.

00:16:40: So what this means in the end is that AI theoretically makes our lives easier, as you say, because

00:16:46: we can create B-roll easily, we can summarize text with ChatGPT, for example.

00:16:51: But in the end, it becomes more complex in other parts of our lives.

00:16:56: So once again, technology solves one problem but creates others.

00:17:01: So it's kind of funny, actually.

00:17:03: I guess it's always been like this, probably, but yeah, I totally agree.

00:17:09: And I'm not sure if it's a good or a bad thing that we have Sora and some other technologies.

00:17:16: Me neither.

00:17:16: So these are our thoughts after the Sora release last week, which also led to lots

00:17:23: of discussions between the two of us.

00:17:25: We would be happy to receive your feedback.

00:17:28: What do you think about AI, specifically about Sora in this case?

00:17:32: How does this impact your lives?

00:17:34: Maybe it doesn't have an impact on your life at all.

00:17:37: Maybe it's something you say, it's here, but I don't care.

00:17:40: And with that, we hope to see you in the next podcast episode.

00:17:44: Absolutely.

00:17:45: Bye.
