Arrays vs Linked Lists - Computerphile

  • Published on Jul 11, 2017
  • Which is faster? The results *may* just surprise you. Dr 'Heartbleed' Bagley gives us an in-depth shoot-out - Arrays vs Linked Lists...
    Link to code can be found in Dr Bagley's tweet: t.co/4kg3AZDA57
    Sun Server: thexvid.com/video/c5qH-LW3tq8/video.html
    Megaprocessor: thexvid.com/video/lNa9bQRPMB8/video.html
    computerphile
    computer_phile
    This video was filmed and edited by Sean Riley.
    Computer Science at the University of Nottingham: bit.ly/nottscomputer
    Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Comments • 784

  • Mathcartney
    Mathcartney 3 days ago

    +1 for the Marshall reference

  • Jeffrey Morris
    Jeffrey Morris 12 days ago

    Fascinating. (said Mister Excitement)

  • Johns Accordion Channel

    The advantage of the linked list is that it is much faster when the data set is small, compared to the array, which has to be as large as it will ever need to be at all times!

  • Thijs Bruineman
    Thijs Bruineman Month ago

    That's all great, but this doesn't answer the question: it just checks how fast each one is, not in which cases you should use a linked list vs an array.
    You sort of talked about it, but not in depth: linked lists are variable in length, whereas arrays aren't.

  • David Brown
    David Brown Month ago

    Take the camera off the guy and keep it on what he's writing down.

  • azrul nizam
    azrul nizam 2 months ago

    “...which means I need a very high tech piece of equipment”
    *showed us a floppy disk*

  • Andrey Krasnov
    Andrey Krasnov 2 months ago

    nice video, and fun experiment on Atari ))
    reminds me of Stroustrup's talk about vector vs list

  • watashiwahatchi
    watashiwahatchi 2 months ago

    Loved this video! Thanks guys. Great takeaway: "if you're not sure, come up with some tests and collect the real data". This echoes Mike Acton's 'truths': 1. Hardware is the platform
    2. Design around the Data.

  • Patrick Wallwork
    Patrick Wallwork 3 months ago

    20:10 am I the only one who gets annoyed watching someone type the same command over and over? This is what bash history is for! Just tap your up arrow and go lol! Anyways, great video and very interesting, too. Thanks!

  • senator
    senator 3 months ago

    the TLB (translation lookaside buffer) should solve this issue entirely anyway

  • D P
    D P 3 months ago

    can we start a goFundMe to get this guy some pens and a notepad?

  • Valley Kid
    Valley Kid 3 months ago +2

    I'll be that guy...
    My obsessive compulsion wishes the list version of the first code example traversed using something more like
    for (p = first; p != null; p = p->next)
    It would just look more similar! =)

    • senator
      senator 3 months ago +1

      Valley Kid yet you still initialise the variable before the for loop

  • DeathToCockroaches
    DeathToCockroaches 3 months ago

    int sum = 0, i — what does this do? The comma after the 0 is what I don't understand

  • Graham Rice
    Graham Rice 4 months ago +1

    I do get excited for a decrement and branch if zero instruction 🤤

  • MJ_305
    MJ_305 5 months ago

    You can reverse traverse a linked list using recursion, no need to make another pointer point to the previous node. However, I can imagine this to be slower as calling functions is more expensive.

  • Arsen
    Arsen 5 months ago

    another important difference is locality

  • Rotten Brainz
    Rotten Brainz 5 months ago

    what about self indexing arrays, cross the two together.

  • Sonia Blanche
    Sonia Blanche 5 months ago

    what about arraylist vs linkedlist

  • Dr. Guy Madison
    Dr. Guy Madison 5 months ago +2

    Actually... you may want to look at the index calculation time for the array on the older machine. Some compilers didn't optimize for powers of two and used a multiply for the index. This was pretty painful back in the day... remember the ARM processor had a serial multiplier on some models, but a 5 or 6 clock multiply was also a possibility.
    People forget this today; typically it's 1 clock.
    IIRC the 80486 had a 3 clock integer multiply

  • Stefan Wiese
    Stefan Wiese 5 months ago

    I like this guy, his programming style, and his results! And I really had to laugh when I saw the empty bottles of coke in the background...\o/ A real programmer :)

  • Chubby Panda
    Chubby Panda 6 months ago +2

    Did he actually say "doog yrev"!? lmao

  • Kayla and Jim Bryant
    Kayla and Jim Bryant 6 months ago

    emacs ftw!
    btw: i woulda double-linked the list.

  • Ahmed Noman
    Ahmed Noman 6 months ago

    that was a nice shirt

  • Kyle McClay
    Kyle McClay 6 months ago

    TL;DW always use an array unless you have a PhD from Stanford and you're a lvl 99 wizard

  • M S
    M S 6 months ago

    Excellent video!!! I had Atari Falcon 030. It was extraordinary at its time. :)

  • Antonio Innocente
    Antonio Innocente 6 months ago

    I always wondered why Computer Science was called a science because most people who claim to be computer scientists just talk about writing code. However, this video is the first example I have seen where the scientific method is applied to computers. Well done!!

  • Rui Dian
    Rui Dian 7 months ago

    I actually wrote a program in Java to test arrays, linked lists, hash maps and hash sets with a data set of size 1000, each entry consisting of three letters.
    Here is the result on my laptop. Everything is measured in milliseconds.

    Operation    Insert      Retrieve    Determine is in collection    Retrieve Nth item    Remove
    ArrayList    0.023838    0.2400026   0.0724832                     0.013792             0.073589
    LinkedList   0.1600976   0.4727642   0.0730988                     0.0241258            0.0750658
    HashSet      0.6962236   0.4007606   0.0134912                     not available        0.0200704
    HashMap      0.6954188   0.503499    0.0157814                     not available        0.020857

    ps: Just for reference, to determine which collection suits the operation you're doing

  • Jeffrey Black
    Jeffrey Black 7 months ago

    When you say the array is optimised, what is included in that?
    For example, you have it referring to element i and incrementing i.
    Does it increment i and then perform the maths required for a random access of the i-th element of the array, or does it internally store a pointer to the element of the array you are looking at, and increment that pointer by the required amount when you increment i?
    i.e. starting with i=10, does it go i = 10+1 = 11,
    pointer to array pos[i] = i*size + offset = 11*size + offset,
    etc.?
    Or does it go i=10, current pointer = 10*size + offset, then i=11, next pointer = current pointer + size?

    • elodens4
      elodens4 7 months ago

      @Jeffrey Black The point is: if optimisation is your goal, think cache locality. This means thinking in terms of contiguous memory. It doesn't matter if the whole vector can't fit in the cache; it's still going to result in far fewer cache misses than the alternative.
      It's not uncommon for CPUs nowadays to spend 50% of their time waiting for data. A single memory reference costs about the same as 100 instructions. And if your program is vastly more instructions than data, that's usually an indication that you've designed your solution badly.

    • Jeffrey Black
      Jeffrey Black 7 months ago

      @elodens4 That highly depends on whether the array is small enough to fit into the available cache. It also isn't what I was addressing.
      Even with loading into memory being more expensive, modern CPUs can do that in the background while other instructions are running.

    • elodens4
      elodens4 7 months ago

      Arrays are optimised because they are stored in one contiguous block of memory. Because of this, they can be loaded into the CPU's cache from RAM in one operation, after which the CPU can read from its cache basically for free. Linked lists, on the other hand, require the machine to jump all over the place and load things into cache many times over. On modern CPUs nowadays the most expensive operation by far is fetching from RAM.

  • Dénes Sebestyén
    Dénes Sebestyén 8 months ago

    With pointer arithmetic I bet you could get at least the same speed with arrays. You don't have to keep taking the starting pointer and adding sizeof(record)*n to it; you can just add sizeof(record) to a pointer and iterate (and it's done by C, so instead of p[n] you can iterate q = p; q++ style)

  • Fredo FPV
    Fredo FPV 8 months ago

    Traversal is O(n) in both cases, so you shouldn't really care if one is marginally faster than the other.
    If you don't want to watch 30 minutes of video: *arrays are faster than linked lists*. The reason is that on modern computers you have a CPU cache, and since an array is contiguous it can be cached, while a linked list is scattered all over memory and cannot be (you have to fetch the next location from memory on every step of your loop).

  • Abhay Raj Singh Rathod
    Abhay Raj Singh Rathod 8 months ago +1

    Accessing the array through pointers will be even faster
    for (int *i = a; i < a + n; i++)

  • Jugginator 12
    Jugginator 12 9 months ago

    Woah! A floppy disc? What kind of budget do you guys have over there?!?! That is indeed very high tech.

    • David Brown
      David Brown Month ago

      Jugginator Did that model Atari have hard drive capability?

  • boboala1
    boboala1 9 months ago

    Yeah baby! Atari! LOL! Reminds me...I got a 1040ST and MegaST boxed up in the basement. Flashback of grad school in the 1980s....compiler disk in A: and data disk in B: on a Sanyo MBC-550...compiling C code (mmrrap-mmrrap-mmrrap...mmrrap-mmrrap......)

  • Nick1wasd
    Nick1wasd 9 months ago +13

    "We're not gonna run this on the iMac, we're not gonna run it on the Pi, we're gonna run it on the Atari behind me" I love this guy so much.

  • chrissx Media
    chrissx Media 9 months ago +1

    well... you should really re-run it with C++11 range-based for loops, because these have a kinda big impact on the CPU performance of arrays

  • vcokltfre
    vcokltfre 9 months ago

    And if you start coding in Lua, arrays don't start at 0, they start at 1... Just to make life harder...

  • Timothy Stewart
    Timothy Stewart 11 months ago

    How were you able to compile on your iMac's x86_64 CPU and then run it on the Atari's Motorola chip?

  • Prez Golez
    Prez Golez 11 months ago

    Null =/= 0

  • Alexandre Vieira
    Alexandre Vieira 11 months ago

    Sweet mother of electrons... o.O I have not seen a floppy disk loaded into a computer since, can't even remember?! 12-18 years??

  • Ruffian Eo
    Ruffian Eo 11 months ago

    He should have done a multi-core version of that. Then, arrays hands down beat singly linked lists as you have an easier way to delegate chunks to the cores.

  • Alastair Montgomery
    Alastair Montgomery 11 months ago

    Dream selection of retro kit in that office :)

  • Alex Oakley
    Alex Oakley Year ago

    WHY are you explaining how to sum elements in a data structure!? This video is heeeeeeeaps longer than it needs to be. I could have read an entire chapter on data structures by now.

    • Peterolen
      Peterolen Year ago

      If all you want to do is learn why are you even watching these videos?

  • mspenrice
    mspenrice Year ago

    Hmm... another thing with the Falcon is that it makes use of the 68030's burst mode memory transfer mode (sort of an early attempt at EDO, which naturally makes loading large blocks faster than true random-access), as well as the cache. It'd be interesting to see the code run again on the MSTe at full 16MHz speed with cache enabled, to see which program is faster in that case, or if it ends up being pretty much 50:50.

  • Giovanni Carovanello

    I wonder why both the array and the linked-list started off with 42... Must be a coincidence, I guess.

  • RangeRover RanYouOver

    Great vids and stuff, keep it up. Have just found this channel, it's a v good watch :)

  • kenj
    kenj Year ago

    The most important thing that everyone should learn is this: BOTH ARE FAST ENOUGH! Use whatever method works for your application.

  • Matt T
    Matt T Year ago +2

    My guess: Arrays! Cache locality.

  • tonyvice6661616
    tonyvice6661616 Year ago

    Dope shirt

  • Marc van Leeuwen
    Marc van Leeuwen Year ago

    While there is some amusement value to this video, it provides very little information about what is actually going on. We do not know how the nodes of the linked list are ordered in memory, and this is very relevant. The fact that the nodes were all generated in one swoop just before the test suggests they are probably linearly ordered, with each link pointing a fixed distance forward from itself; by contrast, in a linked list that has been used for some time in the ways linked lists are made for (insertions, deletions, permutations of links), the nodes would tend to be scrambled in memory in a way that is detrimental to locality and therefore to cache performance. We also don't know whether the address arithmetic implicit in finding the address of p[i] is actually performed each time, or whether (quite likely) the compiler has optimised this away by just incrementing the address of the previous array access. Unless we have some idea of what happens at the assembly level, we are just watching a race between contenders we hardly know at all.
    No mention is made of locality of access, though this is the unique feature that makes the data cache relevant: in the example, no item in memory is ever used more than once (I'm assuming the few local variables are in registers), so the _only_ gain from the cache is that it fetches not just the item requested by the CPU but a whole cache line, so that the next several requests might be cache hits. Caches (and many other things) have been designed specifically to optimise linearly repetitive access to arrays, so it is not so surprising that an array program benefits most from such optimisations. On the other hand, mention of the instruction cache is beside the point: both solutions are small loops that can reside entirely in the instruction cache.
    If you really want to compare the intrinsic efficiency of arrays and linked lists as data structures, I would suggest a test that forces the array to be accessed non-linearly, and the list to be non-linearly ordered in memory. For instance, just insist that the items be visited in increasing order of the q-component while using the p-components (maybe doing something other than summation, for which order matters). In the array case, leave everything in place and pre-compute the permutation giving the order of visits, while in the list case (where we cannot do random access) pre-sort the list by increasing q, with a sorting method that only adjusts the links. The net effect would be that the linked list is scrambled in memory in the same way the array accesses are scrambled. Then the playing field is levelled in several respects (neither has much locality, so caching has little influence; array indexing cannot be optimised away), and the tests would be a lot more interesting.

  • Piero Ulloa
    Piero Ulloa Year ago

    You could also add that on the iMac the array could fit entirely in cache, as opposed to the linked list, and optimising the code with loop unrolling gave it all the speed boost. So theoretically it might never have accessed memory at all, except to load the program from disk, when running as an array.

  • dragonore2009
    dragonore2009 Year ago +1

    8:11 "p inside p, were using p inside p, confusing variable names, but then I'm a C programmer" LOL.

  • Doru Min
    Doru Min Year ago +1

    Arrays could also be faster by dividing the work between multiple threads

  • MrCOPYPASTE
    MrCOPYPASTE Year ago +4

    MEMORY IS AN ARRAY!!!!!!!!!

  • MrCOPYPASTE
    MrCOPYPASTE Year ago +2

    Please give us the generated assembly for both cases... Thank you

  • MrCOPYPASTE
    MrCOPYPASTE Year ago

    One challenge for cp if I may... do the same thing in assembly... Thank you

  • Desmond Brown
    Desmond Brown Year ago

    Really, the main reason anyone uses linked lists is to save space, and to save time when inserting new data of unknown size into, or removing it from, an existing list. Compare that to an array and you will find it takes far longer to work with arrays in many such cases, since an array must be completely rebuilt in order to add new pieces of data. Also, arrays introduce wasted memory to protect the program from out-of-bounds exceptions.

  • Flavio Alarcon
    Flavio Alarcon Year ago

    I learned Java programming at an institute. There I basically learned the basics of how to program, develop websites and things like that, but they never even began to explain how things actually work. It still worries me how little I actually know about computers, since all I know is basically what years of progress in the computing world have achieved, and I'm just using the tools they gave me without questioning them. Videos like this are very important for someone like me; I appreciate it.

  • reda abakhti
    reda abakhti Year ago

    a+b = b+a watch numberphile so you can understand

  • JiveDadson
    JiveDadson Year ago

    Cache misses are far more important these days than instruction count. Caches love arrays and hate links. Running tests to find out how fast a program can do nothing proves nothing. Testing how fast a program that does nothing will run on ancient hardware proves... nothinger? Academics. Meh.

  • Learning Electronics Cheaply

    Another problem with arrays is that if you need a big one, it will be the same size whether you use one element or all of them. You have to guess how big you need it; you might make it 1 gig so you always have room, but then you would lose that memory to just that one array, let alone the rest of your program.