OpenAI's Latest AI Model - Welcome Sora...
Show notes
Sora in 4 Minutes: https://www.youtube.com/watch?v=3nyk9LjRTs
Max also has a podcast: https://maximilian-schwarzmueller.com/podcast/
Show transcript
00:00:00: So, last week was a very exciting week.
00:00:03: By the way, welcome to the next episode of our podcast, because last week AI fought back
00:00:11: once again, you could say.
00:00:12: Yeah, I'm not sure if it's fighting back, because it's basically been battling us since the end of
00:00:16: 2022.
00:00:17: But you're right, last week Sora was released, OpenAI's latest model, which
00:00:26: now also shows us how dramatically they've raised the bar for AI-generated video.
00:00:34: Because until Sora, it wasn't really a big thing, not too impressive.
00:00:38: But I guess Sora changed it all.
00:00:41: I think it did, yes.
00:00:42: And you already created a YouTube video about that.
00:00:45: It was last Friday, I think.
00:00:48: Yeah, right after it came out.
00:00:49: And then we thought, let's also talk about that a bit, because it affects us personally,
00:00:55: professionally, and also everybody on the planet.
00:00:58: What is your general opinion about this whole AI thing, or maybe specifically about Sora?
00:01:04: So how does this text-to-video aspect, this option that now works
00:01:09: quite impressively, actually, change your view on AI, maybe?
00:01:14: Yeah, so I think we're getting fewer and fewer things that we can trust.
00:01:20: So to start with the downside right away: that's the negative impact
00:01:24: I immediately thought of when I saw it.
00:01:27: But at the same time, I also thought, well, that's really impressive.
00:01:31: If we look at it from a positive angle, it gives us a lot of new options.
00:01:37: For example, to kind of stick to an example from our bubble, it makes it easier to generate
00:01:43: B-roll or it essentially replaces or could replace the need for stock footage.
00:01:50: If you want to have some short clip in a video, some background video, that's something
00:01:56: Sora seems to be able to do just fine, so you no longer need to resort to expensive
00:02:04: and maybe not ideal stock footage.
00:02:07: So that's the positive side, I guess.
00:02:09: But at the same time, as I said, I also thought, wow, that opens up a lot of potentially bad
00:02:16: use cases.
00:02:18: And I'm rather on the negative side, I guess, to be honest, because for me, the development
00:02:25: is too fast.
00:02:26: It's really fast, yeah.
00:02:27: Because when I think about ChatGPT, which we heard about at the end of 2022, we were
00:02:33: quite late.
00:02:33: We were, but yeah, that was our bad.
00:02:37: And now, a little more than one year later, we can create videos,
00:02:41: only one-minute videos, but still, we, or rather OpenAI, can create these videos based on text prompts.
00:02:48: That's scary, in my opinion.
00:02:50: And yes, the trust side is one very important one that you mentioned.
00:02:54: And maybe I'm too old again, but I think about the work of other people, about how it devalues
00:03:01: the work that people did.
00:03:03: Because if you needed B-roll, as you said, somebody had to record it, had to film it,
00:03:07: had to fly there, had to go there, had to plan it, had to turn the ideas he or she had
00:03:15: into reality by creating that scene.
00:03:17: And of course, as you said, now this can be done faster and in minutes, maybe.
00:03:21: We don't know how this evolves.
00:03:23: Maybe we can use it on our own in six months or something like that.
00:03:27: But this whole creativity that humans have and that helped humans to produce these amazing
00:03:33: videos, for example, now turns into simple computer-generated footage.
00:03:39: So I personally don't know how I should think about it at the moment.
00:03:44: Basically, my impression is more negative: impressive, but negative.
00:03:48: That's one of my points here to get started.
00:03:52: Still, I think this trust issue you mentioned is also a very big one because even now you
00:03:58: don't know what to believe in the Internet, if it's written, if it's images, and now you
00:04:03: can't trust videos anymore.
00:04:05: So basically, everything you see on the Internet has to be double-checked and you have to find
00:04:10: a way to double-check to make sure it really is the truth.
00:04:13: And I'm not sure how we or all these people using the Internet daily can find a way to
00:04:20: implement this double-check feature or whatever you want to call it.
00:04:25: Yeah, lots of good points.
00:04:27: I totally agree on the argument that it is a problem for all those people who created
00:04:36: the stock footage, for example.
00:04:39: And of course, stock footage is just one thing that could be replaced by Sora or by
00:04:45: text-to-video models like Sora.
00:04:48: It's absolutely valid.
00:04:50: I do think that at the same time, and I might be wrong here, it looks like
00:04:55: these text-to-video models might not be able to replace Hollywood or movie studios in general
00:05:07: because, of course, it's one thing to replace some background footage, some clip where the
00:05:14: exact content might not be too important.
00:05:17: I mean, if you want like a dancing bunny, which was one of the examples OpenAI showed
00:05:22: on their website, a video generated by Sora where we can see a dancing bunny.
00:05:26: If you need something like that and you want to have it as B-roll in your video, you might
00:05:31: not care too much about the exact details.
00:05:33: It might be fine if you have a prompt where you roughly describe the scene and that you
00:05:37: want a dancing bunny and some neon colors or something like this, but the details aren't
00:05:42: too important.
00:05:43: But of course, that would change if we were talking about a real movie.
00:05:46: That's, of course, way longer than a minute, so multiple clips would have
00:05:51: to be coherent and fit together.
00:05:53: And where we also have dialogues, which don't seem to be a thing with Sora at all.
00:05:59: So there we only had videos without people talking or without lip synchronization and
00:06:06: all these things.
00:06:07: So I think it's not the entire video industry that's in danger here, but it looks like there are some
00:06:16: specific groups, as it almost always is with AI, I guess, for whom this really could
00:06:23: be a big problem.
00:06:25: I got two points here.
00:06:27: The first one is that, of course, things have always changed.
00:06:32: And as you said, maybe certain jobs or certain tasks are replaced.
00:06:37: This has happened all the time throughout history, I guess.
00:06:40: So if we talk about industrialization, where people probably thought, oh my God, I won't
00:06:45: have a job anymore.
00:06:46: And it turned out there is more work than ever that has to be done.
00:06:49: But the other point is the speed.
00:06:54: You mentioned it already, but you just said they can't do dialogues at the moment.
00:06:59: They can't replace Hollywood movies.
00:07:01: Hollywood movies are maybe also something different because you want to have the actors
00:07:04: and so on.
00:07:05: But I guess nobody knows where we'll be next year at the same point in time.
00:07:10: So next February.
00:07:12: And that's the maybe slightly scary thing that I see: at the moment, I would say
00:07:16: yes, this has some bad implications, but it can also be good.
00:07:20: But if it evolves that rapidly within the next 12 months, then we might talk about dialogues.
00:07:27: We might talk about 30-minute movies maybe.
00:07:30: And then things get really crazy, I guess.
00:07:32: Still I don't think that being scared or something like that helps in any way because
00:07:37: things will be as they are.
00:07:40: But yeah, I'm quite skeptical, as you see.
00:07:42: We also don't know how far they are already.
00:07:45: I'm not sure if OpenAI, for example, shows us all they can do because Sora just came
00:07:51: out of nothing, basically.
00:07:53: So maybe six months from now, the next thing comes out of nowhere.
00:07:57: So yeah, I don't know.
00:07:59: Yeah, it's of course possible, absolutely.
00:08:01: We don't know what we'll see in a year from now and what we'll see in two years and what
00:08:07: AI might be able to do then.
00:08:09: I totally agree.
00:08:10: I also agree that maybe if we're talking about big movies, it's a different thing there.
00:08:15: I would imagine it's also kind of hard to describe the complexity of a big movie in
00:08:23: a prompt, even if it could be a super long prompt, even if it could be an entire book.
00:08:28: I'm not sure.
00:08:29: But as I said, we don't know how things will turn out.
00:08:35: Of course, it's also worth noting that what we see with Sora, the videos we see, they
00:08:41: are not perfect, of course.
00:08:43: They look pretty perfect.
00:08:44: I have to say, at first sight and maybe even the second time you look at one of those videos,
00:08:52: they look absolutely fine.
00:08:53: I have to say they do.
00:08:55: Of course, there are all these people who tell you, yeah, it's clearly AI generated,
00:08:59: but is it really that clear if you just take a brief look at it?
00:09:05: Sorry for interrupting, but especially, the people who say that already know that this
00:09:10: is AI generated.
00:09:12: Exactly.
00:09:13: That's exactly the point I also wanted to make.
00:09:16: You can spot subtle errors like the hands of people, some movements which are a bit
00:09:23: strange.
00:09:24: Yeah, you can spot them because you know what you're looking at is AI generated and you're
00:09:30: looking for those hints.
00:09:32: But fast forward a year from now, let's say Sora or similar models are available to everyone
00:09:40: or to the majority or in five years, maybe we have models we can run on our own machines
00:09:45: or in our own data centers and we can achieve similar results.
00:09:49: And now let's think we have some bad actors.
00:09:52: We have some people who want to fake certain things, maybe with celebrities, maybe fake
00:09:59: some war videos, anything like that.
00:10:03: And you see that and you might not be prepared for the fact that it could be AI generated.
00:10:09: Of course, you know that there is a danger of that being the case, but we're coming from
00:10:14: a world where we take video for the truth, where we know if it's a video, it's very,
00:10:23: very likely not faked.
00:10:26: It might be staged or something like this, but it was recorded like that.
00:10:31: That is a relatively fair assumption.
00:10:33: It would take a lot of effort to really fake a video.
00:10:37: It's possible without AI, but it's basically a lot of work.
00:10:42: It's not easily done.
00:10:44: And we're coming from that world and adjusting to a different world where every video could
00:10:49: be fake and could be AI generated.
00:10:52: That'll be tricky.
00:10:54: There also is another side here, because we will probably have
00:11:00: the problem where we have to anticipate that videos might be AI generated and might not
00:11:08: show us the truth.
00:11:11: But at the same time, it could also mean that we don't trust any video sources anymore.
00:11:18: And of course, that means that if we have some media outlet which we normally
00:11:23: would trust, let's say some big one, let's say the New York Times or whatever.
00:11:29: And of course, there are reasons why you could mistrust them as well.
00:11:32: But let's say there are some media outlets whom we generally would trust.
00:11:37: And now they're showing a video of some, let's say, war crimes being committed by
00:11:42: whatever, anywhere in the world.
00:11:45: There will or there can be people in the future who say, yeah, that's fake.
00:11:50: That's not the reality.
00:11:52: So we have those two sides.
00:11:53: We have videos where we might see something horrible or might see some politicians say
00:11:59: something horrible and it's faked.
00:12:02: But it's hard to prove that it's fake.
00:12:04: But we also have the other side where we see something that happened but we have people
00:12:08: who can say, no, that didn't happen.
00:12:09: That is fake.
00:12:11: There might not really be a way of proving them wrong or it might be very difficult at
00:12:16: least because we're living in a world where video can't be trusted anymore.
00:12:20: And that's another big problem I see.
00:12:23: And that's really a problem because this means now we have another medium that people
00:12:28: can use to influence others and push their agenda.
00:12:33: You know what I mean?
00:12:33: So if you think about the elections this year in the US, for example, I read that OpenAI
00:12:37: will do all they can to prevent any abuse of that technology.
00:12:41: And it's not publicly available yet.
00:12:43: So it might not matter for the elections this year, actually.
00:12:46: But considering this rapid development, and we're only in February, who knows what happens in July,
00:12:51: for example.
00:12:51: Yeah, absolutely.
00:12:53: Who knows which persons will have this technology, and it doesn't have to be OpenAI technology.
00:12:57: Maybe other similar technologies evolve in the next months.
00:13:02: So taking this into a global political context, it becomes really scary, I must say.
00:13:08: Absolutely.
00:13:08: Because your point is really valid.
00:13:09: No matter if you have a real source or a fake source, everybody can influence or tell people,
00:13:16: no, don't trust this.
00:13:17: I know the truth.
00:13:18: This video shows the truth, not that one.
00:13:20: But which one is fake in the end?
00:13:22: And this also brings me to my next point, actually.
00:13:25: How do we as humans, as tech people maybe, but also as ordinary people, deal
00:13:32: with that?
00:13:33: Because if I think about some friends of mine who are not in the tech industry, they are
00:13:38: not aware of all these things.
00:13:39: We read this on X, we see it on YouTube, whatever.
00:13:43: But many people have normal jobs, normal lives.
00:13:45: They have a mobile phone for WhatsApp, whatever.
00:13:48: But they don't read all this tech news.
00:13:50: They don't use AI at all in their daily lives.
00:13:53: So how should they be aware of the fact that the video they just watched might be fake?
00:13:58: I would guess that probably some techniques will evolve that can help with identifying AI-generated
00:14:07: videos.
00:14:08: But of course, as AI gets better, that might not really work.
00:14:13: I guess AI will always have the advantage here.
00:14:18: It's like the old problem.
00:14:19: The person who wants to validate something always has to react.
00:14:23: If you want to protect against something, you have to react.
00:14:26: And it's the other party, it's AI in this case, who's the actor, who's advancing and
00:14:32: who's innovating.
00:14:32: And therefore, it will really be hard.
00:14:36: And I don't know what the future will be there.
00:14:40: Maybe there are some technical hurdles which can't be overcome in certain areas.
00:14:47: So creating super long videos is maybe still a long road before we get there.
00:14:54: So maybe that's the case.
00:14:56: But I wouldn't rely on that.
00:14:57: Maybe it becomes even more important who's spreading a video or who's publishing a video.
00:15:07: Maybe persons as trust sources, or publishers as trust sources, become
00:15:15: more important, which of course means that people have to trust them in the first place.
00:15:19: But maybe that is one way of establishing trust: you know, okay, if it's that outlet,
00:15:27: if it's that person that publishes a video, I can trust that person or that outlet.
00:15:33: But obviously, it's an open question and it's not the perfect
00:15:40: solution, I'd say.
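To make the idea of "publishers as trust sources" a bit more concrete, here is a minimal sketch (not something discussed in the episode) of how a publisher could cryptographically sign a video file so viewers can check that it really came from that publisher and wasn't altered. The Ed25519 key pair, the placeholder video bytes, and the overall workflow are illustrative assumptions, not an existing Sora or OpenAI feature.

```python
# Illustrative sketch: a publisher signs a video file, a viewer verifies the signature.
# Requires the "cryptography" package (pip install cryptography).
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Publisher side: generate a key pair once; the public key is shared with viewers out of band.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# Placeholder for the raw bytes of the published video file.
video_bytes = b"...video file contents..."

# The signature is distributed alongside the video.
signature = private_key.sign(video_bytes)

# Viewer side: verify that the video is unchanged and really comes from this publisher.
try:
    public_key.verify(signature, video_bytes)
    print("Signature valid: the file is unchanged and comes from this publisher.")
except InvalidSignature:
    print("Signature invalid: the file was modified or is not from this publisher.")
```

Content-provenance efforts such as C2PA follow roughly this idea by attaching signed metadata to media files; the hard part remains getting viewers to know and trust the publishers' keys in the first place.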
00:15:41: And it also kind of is contrary to the idea of the internet, right, that everybody can share their opinion.
00:15:49: And of course, if somebody has an opinion and wants to prove or make a point, then he
00:15:53: has to validate that point.
00:15:56: But this became more difficult now with all this possible fake content that we have.
00:16:02: But still, we have to deal with that technology, I guess.
00:16:06: If we just deny it and say it won't be a big thing, maybe we are wrong.
00:16:10: But at the moment, it looks like AI won't be gone in a year.
00:16:14: It will be here and it will get bigger.
00:16:17: So we have to adapt, right?
00:16:19: And we have to find ways to adapt to this, as you said, by thinking about the sources
00:16:24: we trust, also being even more suspicious, maybe, if somebody tells you, hey, this is
00:16:29: the only truth and this is fake, so you have to do your homework, so to speak.
00:16:33: Blind trust was never a good idea in the past,
00:16:38: but it's an even worse idea in the future, I guess.
00:16:40: Absolutely.
00:16:40: So what this means in the end is that AI theoretically makes our lives easier, as you say, because
00:16:46: we can create B-roll easily, we can summarize text with ChatGPT, for example.
00:16:51: But in the end, it becomes more complex in other parts of our lives.
00:16:56: So once again, technology solves one problem but creates others.
00:17:01: So it's kind of funny, actually.
00:17:03: I guess it's always been like this, probably, but yeah, I totally agree.
00:17:09: And I'm not sure if it's a good or a bad thing that we have Sora and some other technologies.
00:17:16: Me neither.
00:17:16: So these are our thoughts after the Sora release last week, which also led to lots
00:17:23: of discussions between the two of us.
00:17:25: We would be happy to receive your feedback.
00:17:28: What do you think about AI, specifically about Sora in this case?
00:17:32: How does this impact your lives?
00:17:34: Maybe it doesn't have an impact on your life at all.
00:17:37: Maybe it's something you say, it's here, but I don't care.
00:17:40: And with that, we hope to see you in the next podcast episode.
00:17:44: Absolutely.
00:17:45: Bye.