New AI: Photos Go In, Reality Comes Out! 🌁
- Published on Nov 15, 2021
- ❤️ Check out Lambda here and sign up for their GPU Cloud: lambdalabs.com/papers
📝 The paper "ADOP: Approximate Differentiable One-Pixel Point Rendering" is available here:
arxiv.org/abs/2110.06635
❤️ Watch these videos in early access on our Patreon page or join us here on TheXvid:
- www.patreon.com/TwoMinutePapers
- thexvid.com/channel/UCbfY...
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Haro, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bryan Learn, Christian Ahlin, Eric Martel, Gordon Child, Ivo Galic, Jace O'Brien, Javier Bustamante, John Le, Jonas, Kenneth Davis, Klaus Busse, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Mark Oates, Michael Albrecht, Michael Tedder, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Steef, Taras Bobrovytsky, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: www.patreon.com/TwoMinutePapers
Thumbnail background design: Felícia Zsolnai-Fehér - felicia.hu
Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: discordapp.com/invite/hbcTJu2
Károly Zsolnai-Fehér's links:
Instagram: twominutepa...
Twitter: twominutepapers
Web: cg.tuwien.ac.at/~zsolnai/
This is literally the magic that I thought computers could do when I was a kid. Now it's a reality.
Thanks to you for presenting this to the world
I was about to write exactly the same thing you wrote. So I'll make your words my own. Amazing!
Technically, the computers from when you were a kid could do this; it would just take years, and you would have to manually input everything it requires to function.
Me too. There's a Star Trek: TNG episode where Geordi is on the holodeck and he asks the computer to extrapolate features of a very blurred figure based on other metrics. I knew one day we'd get that, but this is even more impressive than that!
@Lynn Harrod good, he is right
"Don't expect too much"
Ai: *makes a perfect video*
Honestly I expected a bunch of shifting and warping
Lol
I can really foresee this being refined and packaged for mobile use, when computing power gets a bit better. Samsung or Apple would pay a LOT of money for this feature if a phone could run it.
You take a few photos on holiday, and this app recreates the scene. Like a movie panorama, wow.
And hook it up to a VR headset and it would be perfect
Games where you can feed it pictures at runtime and it can make the environment as needed...
@Hemant K that's possible now
It will be even more useful for VR and AR
If this ever comes to a mobile, I guess it'll be an application that takes in the photos, runs the process on the backend servers, and produces the rendered video to the user.
I've been very proud of being able to hold onto my papers until now, but you have successfully made me drop them. I was expecting it to be good, because otherwise it wouldn't be here, but I was not expecting it to be *that* good. Stunning work.
Looking forward to visiting all places from my old photo album in 360° 3D VR soon...
I'd prefer some nice classic movie scenes
@Kliman Khmeron he means in the future
Unless you took 35 pictures of the same location
@Leonard Choo nightmare!
Now it's time to apologize to those friends or relatives who used to take a lot of pictures of a specific place, person, or building when we thought only one was enough. They were collecting training data for today's use.
There are just so many applications of AI in the CG world. Looking forward to creative AIs that are capable of learning and building 3D worlds, including cities, economies, etc.
@zdenek burian cool
It will be much cheaper to make movies, and we will actually have Hollywood-level effects for low-budget sitcoms
Imagine if these AI agents could search through online stores of every kind of data (photos, books, etc.) to gather the knowledge needed to rebuild the missing parts when you provide only a limited set of information; they could, for example, rebuild an entire walkthrough of a city as it was centuries ago by learning from historical research
The problem is that the more automatic it is, the harder it is to control the output and use it for professional work
Instead of rendering a whole CGI scene, render certain frames from storyboards, then run this AI to get a preview of the whole scene before you render it fully. Added bonus: you can probably use your final renders as training data to refine the network for next time
It's gonna be pretty cool when Google Earth implements this for full VR
I do research in this field, and my research team discussed this paper a week ago... it's nice to see Károly pick these great topics and share them with the general public. Love the work you are doing, Károly. One day I'll work hard enough to get my paper shown on this channel. I don't care if it makes it to SIGGRAPH Asia; if it makes it to Two Minute Papers, then it's game won :)
This will dramatically change and simplify certain workflows in the film industry. Imagine taking just a few high quality photos and getting this as an output. Unreal.
Also: Does this mean that we will soon be able to recreate full 3D environments from just a few pictures?
I’d guess that using the output from this one as input for a 3D scanning application would be possible with some tweaking.
Source: none, I have no idea what I'm talking about
Okay I'm definitely going to use this some day. This is awesome, I was really close to dropping that paper
2:10 It couldn't make an interior, but it managed to make the background; this AI is pure wizardry
IKR :D! How the hell did it figure out how to put the fence right there? That's so amazingly clever!
I've been waiting for years for AI to enter the photogrammetry field: both this filling-in of missing areas, and hopefully also AI to remove shadows and detect reflections. It will be very interesting to see this implemented in future Metashape or RealityCapture-type software.
I'd love to see this used in tandem with photoscanning. Usually to do it well you need a bunch of images, but if this can fairly accurately interpolate most of them, that could save a lot of time!
It's amazing that it really picks up on the surface/material qualities. Metal looks and reflects like metal, plenty of specular reflections and highlights on most surfaces where it should be. Would much rather have a bit of noise than those details be missing.
OK, so this has been assigned a fairly mechanical task, but the fact that it far exceeds my expectations does give me pause for thought. Watching this makes me realise that I really have no idea what AI will be capable of in the coming years.
Yea this one broke me lol
Nobody should invest time in a steep learning curve for a career in 2D graphics, 3D, or video until things become clearer
It could be used to improve current photogrammetry programs! Since it outputs a video, which is basically a group of pictures, it could be used to create even more photo references for the programs.
This is an AI that's able to understand and render a scene based on observations. It doesn't really help with the photogrammetry process; in fact, it needs it to function in the first place.
NeRF (Neural Radiance Fields) is a far better candidate if you're concerned with actual 3D reconstruction, as it generates a viewable volume of your subject, which contains depth information. However, a NeRF mesh reconstruction loses the data of view-dependent changes (transparency, reflections, lighting), which makes it a lot less useful. We're missing a way to convert that data into usable PBR values for meshes. That's where the real potential lies if you ask me.
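For anyone wondering what that "viewable volume" amounts to in practice, here is a minimal NumPy sketch of the standard NeRF volume-rendering step along one camera ray. The function and variable names (render_ray, sigmas, colors, deltas) are illustrative, not from either paper:

```python
import numpy as np

# Minimal sketch of NeRF-style volume rendering along one camera ray.
# sigmas: densities at the sampled points along the ray,
# colors: RGB values at those points, deltas: spacing between samples.
def render_ray(sigmas, colors, deltas):
    alphas = 1.0 - np.exp(-sigmas * deltas)                 # per-sample opacity
    trans = np.cumprod(np.append(1.0, 1.0 - alphas[:-1]))   # light surviving to each sample
    weights = alphas * trans                                # per-sample contribution
    return (weights[:, None] * colors).sum(axis=0)          # accumulated RGB
```

The view-dependent part comes from the network conditioning those colors on the viewing direction, which is exactly the information a baked mesh throws away.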
I am preparing to go shoot a somewhat complex site for photogrammetry next week, so I was wondering something similar. The idea is to give the photogrammetry system as much information as possible in order to produce as accurate a point cloud/mesh as possible. But the original set of photos only has a limited amount of information in it to build the spatial data from. One possibility is that feeding the output of this system into a photogrammetry system would result in a denser point cloud, but not necessarily a more accurate one. The other is that the existing standard photogrammetry systems are not extracting as much information as is possible from the initial data set. I hope someone with more knowledge of this puts this type of system together with photogrammetry and figures out whether we can get better end results for the point cloud from initial photo sets.
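One low-effort way to test that hypothesis, sketched below under the assumption that COLMAP is installed and the synthesized frames have been saved out as images (the folder names are made up), is to run the same sparse reconstruction on both input sets and compare the point counts:

```python
import os
import subprocess

# Run a standard COLMAP sparse reconstruction over a folder of images.
# Running it once on the original photos and once on photos plus synthesized
# frames lets you compare point-cloud density (though not accuracy) directly.
def reconstruct(image_dir, workspace):
    os.makedirs(os.path.join(workspace, "sparse"), exist_ok=True)
    db = os.path.join(workspace, "database.db")
    subprocess.run(["colmap", "feature_extractor",
                    "--database_path", db, "--image_path", image_dir], check=True)
    subprocess.run(["colmap", "exhaustive_matcher",
                    "--database_path", db], check=True)
    subprocess.run(["colmap", "mapper", "--database_path", db,
                    "--image_path", image_dir,
                    "--output_path", os.path.join(workspace, "sparse")], check=True)

reconstruct("photos_only", "run_a")         # baseline
reconstruct("photos_plus_synth", "run_b")   # densified input set
```

As the commenter notes, a denser cloud from run_b would not by itself prove the extra points are accurate; that still needs ground truth or careful inspection.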
It is absolutely incredible how it manages to even recreate the tree density from different perspectives without washing out the details.
And sure, in some particular scenes where there is nothing to draw detail from, it won't even attempt to, but I wouldn't even complain if this recreation had been done by hand.
Truly astounding. Can't wait to see how this could later be also applied to upscaling to refine image quality.
I remember watching a Doraemon episode where a gadget creates a full recreation of a scenery from a part of it in an image, which they use to find out who was walking a particular dog or something. To think that something similar exists in reality is mind-boggling!
For those wondering, the "consumer GPU" seems to be an RTX 2080 Ti rendering 1080p frames
This is amazing... we can now stockpile images of places/buildings that might not be here in another 10/20/30 years and then make a 3D representation of them in VR for future generations to see.
@J M So if it was built from a point cloud, it's just photogrammetry then >.>
Well, but at least they need much less input.
And I also quickly forgot that it emphasizes the in-between. Maybe it's just pseudo-3D.
At least that's what it seems like from the paper. It needs a point cloud as input
@The Potato It is real 3D because it builds those images from the point cloud
@J M No, you're talking about photogrammetry. This isn't that; this one is just pixels put together by AI to produce a 3D-looking video, not real 3D.
@Michael Leue It's a point cloud rendered as a video. It is 3D
This is so good. Can we directly export as 3D models or use photogrammetry on the new footage, maybe this can be used to enhance photogrammetry software? How are flight paths determined?
This is an AI that's able to understand and render a scene based on observations. It doesn't really help with the photogrammetry process; in fact, it needs it to function in the first place.
NeRF (Neural Radiance Fields) is a far better candidate if you're concerned with actual 3D reconstruction, as it generates a viewable volume of your subject, which contains depth information and can be converted into a mesh. However, a NeRF mesh reconstruction loses the data of view-dependent changes (transparency, reflections, lighting), which makes it a lot less useful. We're missing a way to convert that data into usable PBR values for meshes.
Glad that I take photos from time to time, whether of mundane or important events.
The photos that I took are the only legacy (historical records) that I can hand down to my future descendants.
It's up to them how they will use it, as long as I am remembered in the form of a digital reality.
This is incredible!!
The digitalization of cities is very close!
This could be used to improve Google Maps Street View. You would no longer have to teleport around and could instead just slowly drive along the road.
Google already uses photogrammetry, although not in Street View but when you zoom into a city. And like all photogrammetry (which this is), you need to see things from multiple angles, which Google Street View currently only does from the road. If they sent drones rather than cars, they could generate point clouds like this
If they can use this sort of tech to create a photorealistic 3D model, then we could move about within Google Street View in first person. This would be super useful for game development. Imagine racing games where you could drive across the entire planet :)
@Michael Leue Google can precompute all video footage needed and serve the videos in the player one after another seamlessly.
Would be better if Google started taking 360° videos instead. Although it would be pretty trippy seeing people when reversing the video, etc.
I mean, sure, if "slowly" means an hour per meter.
This feels like dreams, close to reality but with some bits missing.
Wow, this is amazing! The amount of potential this has for creating 3D assets and environments is UNREAL!!
This truly is incredible! I can't wait for the kind of applications that might come from this. You could integrate it into 3D modelling tools, video editing suites, image editors, so many things. And think about the applications in video compression :D.
I remember some paper from Google a few years back when they had a bunch of photos of the exterior of a building and people were losing their shit over how it was possible to move the camera around virtually to generate new images from new angles. In that paper they could only move like a few feet to the side before the model started to fall apart but that was impressive back then. This is a whole new world in comparison!
I wonder if the data it's using in the learning algorithm could be translated to a 3d model for use in normal 3d rendering... I'll just have to wait for 1-2 more papers down the line.
Really impressed by how such a smooth and high-quality video can be produced from so few pictures! Love learning this kind of stuff.
I wonder if this works well enough to feed the results into photogrammetry software? The progress is stunning
Amazing. Now I'd like to see something like that generate volumetric point clouds that can be viewed in VR.
The time is fast approaching when we will never know if what our senses tell us is based in reality or some augmented AI simulation.
I can see it being used as-is as an assistant for creating animations! You can take a few pictures on your phone, feed it into this, and have an entire video to rotoscope for fancy dynamic shots.
Holy moly, this is amazing!
Some immediate applications I can think of are:
• Google's Street View and similar services
• Spying and intelligence
• Virtual tours
• Reality-inspired/based maps (e.g for games)
• Architecture (modeling and exterior and interior design)
And I think it's amazing that it's so simple to train. Especially for Street View, I can imagine a Google car driving around, taking pictures, and processing them into these videos in or close to real time as it drives.
This is mind-blowing... I would really love to connect with you or one of the AI experts who developed and/or engineered the algorithms, because this type of intelligent "fill in the blanks" rendering could be precisely what I have been envisioning as a solution for a future iteration of a visualization product/production service that my company offers to our clients.
Would love to see how this could be applied to turning a series of drawings or architectural sketches into a 3D model for presentation to a client. The potential applications for design, photography, and 3D modelling work seem pretty incredible!
They should totally use this, plus a more thorough Google Street View sweep, to build a 3D roadmap of the entire lower 48 in the US, or even Europe.
MS flight sim has it, but what about truck simulator?
Would be curious to know the input resolution limits too. I have a dataset of half-terapixel scenes of the same city from multiple lookouts; it would be interesting as a way to explore the scene at intermediate angles
This would be amazingly cool to see used in larger image datasets such as... Google Maps Street View
I wonder how well this would work as frame interpolation to upscale 24/30 fps to 60
If the speed of this improves enough a few papers down the line, this would be great for rendering missing frames, or even reducing the rendered frames of a 3D scene and synthesising the rest.
I love your videos. But it would be great to have some sort of overview of how this is done; otherwise it really is just magic. AI research and programming is zooming off into the future so fast it's hard to keep up. Keep up the good work!!
Imagine this being used for seamless transitions in film/TV/etc.; that would extremely impress me. This is so ADVANCED when it comes to 3D scene creation.
This would work wonders for turning low-frame-rate animations, like anime or any film, into smooth 60 fps
Absolutely incredible! Literally jaw dropping for me.
What about letting the AI create a 3D space from the information in the pictures to get rid of artifacts? It probably needs to be combined with some other program, but it would be a great way to create games (and other experiences) in the future. What a time to be alive♡
@Michael Leue It is possible... we can do it. I work in the field, and one direction being explored is how we can build tools that handle the way these neural networks represent things, you know. But as you said, there is a long way to go. Another thing is that it's one AI model per scene, so things blow up when you want to design game scenes and all...
The previous paper on this channel, about generating a 3D mesh based on pictures of an object, could be used to do this, but comparing the processing requirements between the two shows that we're pretty far away from realistically doing THIS with THAT. Also, this requires a completely static environment, which wouldn't be a huge improvement over existing creation techniques in games.
this combination of AI and AR is going to make the metaverse very interesting
Wow. It starts looking like complete 3D scenes.
Glad I held on to my papers.
Incredible. Really looking forward to some future version of Google Earth using this.
This is actually scary. I'm incredibly amazed, and really enjoy looking at this. But where are we headed in the world at this rate??
Awesome! I wonder if this could help with getting cleaner photogrammetry data somehow?
So glad that this channel even exists. Always mind-blowing things to learn about, thanks a lot
This is insane for VFX. I'm a matte painter, and some of my work is reconstructing the environment and reprojecting photos onto it. This AI just does all of that for you. This is crazy
It seems to reproduce lighting behavior quite accurately as well. Very impressive.
Can't wait to see what's two papers down the line!
WOW. This would be insane to use for photogrammetry when you don't have enough data.
Need this to make 3D models for Blender, etc...
Interesting vid as always, going to keep an eye on this ;)
Unbelievable. Imagine where the future of mapping could lead with this
That is insane! Is it possible to export this not as a video but as a perfectly projected 3D model with applied textures?
So, presumably you could output a tumbling camera version to photogrammetry software that could build the scene in 3D and texture it?
That title is really promising, thanks for all the great content man!
This could be manually replicated with photogrammetry and classical 3D camera animation rendering, so it's nothing that was impossible before, but it's much simpler for the end user. Fantastic how it also improves detail.
Can't wait for someone to train an AI on 3D images, so you give it a photograph and it generates a 3D model. Awesome.
Very impressive. The only performance metric I wish you would have mentioned is the rendering speed. Like how many seconds per frame on the mentioned consumer graphics card.
technology is evolving too fast, not that it's a bad thing
i just wanna say thanks to/for our robot ai overlords.....ha ha just kidding ha ha ha no seriously they are here they win :)
I love you for bringing us the bleeding edge in AI!
I'm imagining combining this with photogrammetry to create 3d models from a few photos
Can these be turned into SDFs, is my question.
If so, then maybe some extra layered effects could be added, or virtual objects could be added directly to the scene and properly lit.
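For context, an SDF just maps a point to its distance from the nearest surface (negative inside), which is why compositing a virtual object into a captured scene would reduce to a min(). A toy sketch, purely illustrative and not derived from the paper:

```python
import numpy as np

# Toy signed distance functions: negative inside a surface, positive outside.
def sphere_sdf(p, center, radius):
    return np.linalg.norm(p - center) - radius

def box_sdf(p, center, half_size):
    q = np.abs(p - center) - half_size
    return np.linalg.norm(np.maximum(q, 0.0)) + min(q.max(), 0.0)

# Inserting a virtual object into a captured scene's SDF is just a min():
def scene_sdf(p):
    captured = sphere_sdf(p, np.zeros(3), 1.0)   # stand-in for the captured scene
    virtual = box_sdf(p, np.array([1.5, 0.0, 0.0]), np.array([0.5, 0.5, 0.5]))
    return min(captured, virtual)
```

With a unified distance field like that, the usual raymarching tricks (soft shadows, ambient occlusion) would light the virtual object consistently with the rest of the scene.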
Could you use this for video compression? Feed the algorithm keyframes, check the output, and then overwrite the places where it fails to do a good job. That way you'd only have to send the keyframes, plus smaller bits of video that fill in the messed-up places...
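Here is a back-of-the-envelope sketch of that codec idea, where synthesize is a hypothetical stand-in for an ADOP-like renderer and the error threshold is arbitrary:

```python
import numpy as np

# Sketch of the proposed codec: transmit keyframes, let the decoder's view
# synthesizer guess the in-between frames, and patch only the pixels where
# the guess deviates too far from the true frame.
def encode(frames, synthesize, keyframe_every=30, err_thresh=10.0):
    keyframes = frames[::keyframe_every]
    patches = []
    for i, frame in enumerate(frames):
        guess = synthesize(keyframes, i)                        # decoder-side guess
        err = np.abs(guess.astype(float) - frame.astype(float)).mean(axis=-1)
        mask = err > err_thresh                                 # pixels that failed
        if mask.any():
            patches.append((i, mask, frame[mask]))              # store corrections only
    return keyframes, patches

def decode(keyframes, patches, n_frames, synthesize):
    out = [synthesize(keyframes, i) for i in range(n_frames)]
    for i, mask, pixels in patches:
        out[i][mask] = pixels                                   # apply corrections
    return out
```

Whether this beats a conventional codec would hinge on the synthesizer being cheap and deterministic on the decoder side, which is exactly the rendering-speed question raised elsewhere in this thread.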
I think this is better than photogrammetry for the same small number of photos
It handles backgrounds and thin objects really well.
It uses a point cloud; I hope we can export a mesh from it, because it would look awesome for VR (they support VR view, but I don't know if it is 6DoF)
Absolutely cannot wait for what the next paper on this has in store
WHOA, this is certainly one of the really spectacular ones; mind blown. I thought I had been desensitized to dropping my papers, and I know photogrammetry has been around for a while, but yowza, that's clean!! And from so few pics! I'm heartbroken I can't easily jump in and play with it this second
Perfect, now we need an AI that recreates a complete 3D model of the environment from a bunch of photos!
You can do it with this one too...
This could be integrated into Google Maps.. would be amazing
Okay, so this could create AI assisted photogrammetry with a very limited set of source photos... crazy.
Amazing vid. What, specifically, do I have to learn to be able to do this to my pictures?
This is so great! When are we going to watch fully AI created movies?
Imagine this technology weaving together Google street view images seamlessly. You could drive down any road anywhere with photoreal results.
I was sitting on 2 legs of my chair when I saw this, and I literally lost balance and fell off. This is INSANE! Holy f***. WOW
This could also be used with the metaverse or future gaming applications with AR and VR. Imagine an artist making a few scenes and the gaps being filled in by this... Then other software converts it into 3D models and boom... we have the space for the game ready. Amazing
This will be gold for game development. Forget ray tracing. Forget map design, forget mesh design, forget materials. Just go outside, take photos => ready-to-play level. Hyper-realistic graphics on the way. Jokes aside, this is groundbreaking and will make a huge impact in a lot of domains.
@Jeroen Goyvaerts This is well different from photogrammetry. The only thing they have in common is the image acquisition process, and that's about it. In photogrammetry, you end up with actual geometry and textures. Here it's a different thing: you end up with a system that can synthesize a photo without actually knowing what it synthesizes. It's like a neural network having a dream in which you control the camera. There's nothing tangible that you can extract besides the renders themselves. For sure the system has an understanding of what it represents, but that understanding is deeply embedded in the networks' model data (e.g., the weights of the neural networks used in the system). That's why this paper is so groundbreaking. Perspective interpolation is a thing many have sought for decades, and now it's finally here. Even if not mature yet, the results are extremely promising.
Similar technology to this has already existed for quite some time; it's called photogrammetry, and it does make 3D models. Look at Quixel Megascans, for example: a lot of their 3D models are made using that technique.
@Polygonist Perhaps a game could be done without even using polygons. We are so deeply embedded in the classic rasterization approach to 3D computer graphics that we forget 3D data can be represented in other ways (e.g., SDFs), or perhaps not represented at all. If a game gives you the illusion that you can move freely in a 3D space, do you really care whether the rendering of that 3D space is actually based on meshes? For all I know, everything could be a hallucination of an ANN. And if I can interact and play, for me that's a game by all means. With technologies like the one presented in the paper, we're really stepping into a zone where the gap between videos (highly realistic, less interactive) and games (highly interactive, less realistic) could be shortened, and perhaps we'll get the best of both worlds some day. Also, cool name.
This doesn't generate 3D meshes, and therefore cannot be used to make a conventional game in any capacity. Combine this with newer, improved photogrammetry techniques and maybe you're on to something
@Michael Leue It looks to me like it forms a 3D environment with depth data. One can take that data and implement it into a game... no? Maybe not right now, but it's certainly possible.
Something I'd really like to see at some point is to be able to provide a single picture or a small group of pictures, and then just be able to explore an infinite AI-generated world.
Not just filling in spaces or smoothing video, but being able to delve deeper and deeper into a photo-based world with AI, including exploring entirely unseen areas that the AI generates.
Also, not just a regular generation algorithm with AI visuals slapped on.
I wonder if we could train this on cartoon objects/information as well; it would mean a lot for hand-drawn animation if one could create essentially 3D objects and scenes from a few select keyframes.
This is a godsend. Imagine the cost at movie and animation studios dropping dramatically as we create amazing and dynamic content from such a small number of keyframes.
Imagine reconstructing scenes from a family vacation or photos of an old family home. Amazing research.
This is a stunning result. Far better than I would have expected!
Amazing!!
Is it possible to change it into a 3D model?
This could really speed up the photogrammetry process!!
waiting for the day this can be used easily to pull the real world into VR worlds in near real-time
I've never experienced before how unrealistic photorealism can feel/look.
So basically, this is "Start with 1 frame per second, and get back 60 frames per second."
"SUPER INTERPOLATION"
very impressive interpolation I'd say
I hope stuff like this will be implemented in the gaming industry to make engines look absolutely real.
This would be useful for street view, you could view places where cars can't even go
I think it's called photogrammetry: where you take a TON of photos, then use software to 3D-model something. What this did with the playground slide is amazing compared to that
This looks too good to be true. I see many creative uses of this in the future.
This is a step towards a 3D scene from a picture (or several). This can be used in CG (imagine an artist sketches a scene and it's in 3D moments later) and, if performance allows, even in real or near-real-time applications like computer vision and autonomous robotics.
Also a great thing for the nascent metaverses of all kinds. A few photos of the area and you have your simulation environment to train more AIs.
Taking virtual house tours in real estate to a whole new level!
I wonder if such technology can be used to make a driving game from Google Earth images.
This would be a great idea!
Could be used to create highly accurate 3D models of streets anywhere
Damn, starting to want to test it on some fake photos, or photos of something with a really complex form...
Or rotate from inside, not outside
Or change the environment while taking the photos (then, could it be used as a scary transformation video effect?)
Or shoot drawings/3D images on a screen