Arrays vs Linked Lists - Computerphile

  • Published on Jul 11, 2017
  • Which is faster? The results *may* just surprise you. Dr 'Heartbleed' Bagley gives us an in-depth shoot-out - Arrays vs Linked Lists...
    Link to code can be found in Dr Bagley's tweet: t.co/4kg3AZDA57
    Sun Server: thexvid.com/video/c5qH-LW3tq8/video.html
    Megaprocessor: thexvid.com/video/lNa9bQRPMB8/video.html
    facebook.com/computerphile
    twitter.com/computer_phile
    This video was filmed and edited by Sean Riley.
    Computer Science at the University of Nottingham: bit.ly/nottscomputer
    Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Comments • 771

  • Patrick Wallwork
    Patrick Wallwork 8 days ago

    20:10 am I the only one who gets annoyed watching someone type the same command over and over? This is what bash history is for! Just tap your up arrow and go lol! Anyways, great video and very interesting, too. Thanks!

  • senator
    senator 10 days ago

    The TLB (translation lookaside buffer) should solve this issue entirely anyway

  • D P
    D P 25 days ago

    can we start a goFundMe to get this guy some pens and a notepad?

  • Valley Kid
    Valley Kid 27 days ago +2

    I'll be that guy...
    My obsessive compulsion wishes the list version of the first code example traversed using something more like
    for (p = first; p != NULL; p = p->next)
    It would just look more similar! =)

    • senator
      senator 10 days ago +1

      Valley Kid yet you still initialise the variable before the for loop
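
    (A minimal sketch of that traversal style in C, for anyone following along; the struct and field names here are assumed, not taken from Dr Bagley's code:)

    struct node { int value; struct node *next; };

    /* Sum a singly linked list with the for-loop idiom suggested above,
       mirroring the shape of the array loop: init, test, step. */
    int sum_list(const struct node *first)
    {
        int sum = 0;
        for (const struct node *p = first; p != NULL; p = p->next)
            sum += p->value;
        return sum;
    }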

  • DeathToCockroaches
    DeathToCockroaches 29 days ago

    "int sum = 0, i;" What does this do? The comma after the 0 is what I don't understand.
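
    (To answer the question above: the comma lets one declaration introduce several variables. A small self-contained example, not from the video:)

    #include <stdio.h>

    int main(void)
    {
        int sum = 0, i;          /* one statement, two variables: sum starts at 0, i is uninitialised */

        for (i = 0; i < 10; i++) /* i gets its first value here */
            sum += i;

        printf("%d\n", sum);     /* prints 45 */
        return 0;
    }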

  • Graham Rice
    Graham Rice Month ago

    I do get excited for a decrement and branch if zero instruction 🤤

  • MJ_305
    MJ_305 2 months ago

    You can traverse a linked list in reverse using recursion; no need to make another pointer point to the previous node. However, I can imagine this being slower, as calling functions is more expensive.
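
    (A minimal sketch of the recursive reverse traversal described above, with an assumed node type; note that each element costs a stack frame, so a very long list risks overflowing the stack:)

    #include <stdio.h>

    struct node { int value; struct node *next; };

    /* Recurse to the end of the list first, then print on the way back up,
       so the values come out in reverse order without any back-pointers. */
    void print_reversed(const struct node *n)
    {
        if (n == NULL)
            return;
        print_reversed(n->next);
        printf("%d\n", n->value);
    }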

  • Arsen
    Arsen 2 months ago

    another important difference is locality

  • Rotten Brainz
    Rotten Brainz 2 months ago

    what about self indexing arrays, cross the two together.

  • Sonia Blanche
    Sonia Blanche 2 months ago

    what about arraylist vs linkedlist

  • Dr. Guy Madison
    Dr. Guy Madison 2 months ago +2

    Actually... you may want to look at the index calculation time on the array on the older machine. Some compilers didn't optimize for powers of two and used a multiply for the index. This was pretty painful back in the day... remember the ARM processor had a serial multiplier on some models, but a 5 or 6 clock multiply was also a possibility.
    People forget this today; typically it's 1 clock.
    IIRC the 80486 had a 3-clock integer multiply.
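
    (To illustrate the index-calculation point: a[i] is effectively *(base + i * sizeof *a), so the element size determines whether the scaling is a shift or a real multiply. The two record types below are hypothetical, not from the video:)

    #include <stdint.h>

    struct rec8  { int32_t p, q;    };   /* sizeof == 8: i can be scaled with a shift */
    struct rec12 { int32_t p, q, r; };   /* sizeof == 12: may need a multiply (or shift+add) on an old compiler */

    int32_t get_q8 (const struct rec8  *a, long i) { return a[i].q; }
    int32_t get_q12(const struct rec12 *a, long i) { return a[i].q; }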

  • Stefan Wiese
    Stefan Wiese 2 months ago

    I like this guy, his programming style, and his results! And I really had to laugh when I saw the empty bottles of Coke in the background... \o/ A real programmer :)

  • Chubby Panda
    Chubby Panda 3 months ago +1

    Did he actually say "doog yrev"!? lmao

  • Kayla and Jim Bryant
    Kayla and Jim Bryant 3 months ago

    emacs ftw!
    btw: i woulda double-linked the list.

  • Ahmed Noman
    Ahmed Noman 3 months ago

    that was a nice shirt

  • Kyle McClay
    Kyle McClay 3 months ago

    TL;DW: always use an array unless you have a PhD from Stanford and you're a lvl 99 wizard

  • M S
    M S 3 months ago

    Excellent video!!! I had Atari Falcon 030. It was extraordinary at its time. :)

  • Antonio Innocente
    Antonio Innocente 3 months ago

    I always wondered why Computer Science was called a science because most people who claim to be computer scientists just talk about writing code. However, this video is the first example I have seen where the scientific method is applied to computers. Well done!!

  • Rui Dian
    Rui Dian 4 months ago

    I actually wrote a program in Java to compare an array list, a linked list, a hash map and a hash set, using 1000 data items of three letters each.
    Here are the results on my laptop. Everything is measured in milliseconds.

    Collection    Insert      Retrieve    Is in collection   Retrieve Nth item   Remove
    ArrayList     0.023838    0.2400026   0.0724832          0.013792            0.073589
    LinkedList    0.1600976   0.4727642   0.0730988          0.0241258           0.0750658
    HashSet       0.6962236   0.4007606   0.0134912          not available       0.0200704
    HashMap       0.6954188   0.503499    0.0157814          not available       0.020857

    PS: Just for reference, to help decide which structure suits the operation you're doing.

  • Jeffrey Black
    Jeffrey Black 4 months ago

    When you say the array is optimised, what is included in that?
    For example, you have it referring to element i and then incrementing i.
    Does it increment i and then perform the maths required for a random access of the i-th element of the array, or does it internally store a pointer to the element of the array you are looking at and increment that pointer by the required amount when you increment i?
    i.e. starting with i=10, does it go i=10+1=11,
    pointer to array[i] = i*size+offset = 11*size+offset,
    etc.,
    or does it go i=10, current pointer = 10*size+offset, then i=11, next pointer = current pointer + size?

    • elodens4
      elodens4 4 months ago

      @Jeffrey Black The point is: if optimisation is your goal, think cache locality. That means thinking in terms of contiguous memory. It doesn't matter if the whole vector can't fit in the cache; it's still going to result in far fewer cache misses than the alternative.
      It's not uncommon for CPUs nowadays to spend 50% of their time waiting for data. A single memory reference costs about the same as 100 instructions. And if your program is vastly more instruction than data, that's usually an indication that you've designed your solution badly.

    • Jeffrey Black
      Jeffrey Black 4 months ago

      @elodens4 That highly depends on whether the array is small enough to fit into the available cache. It also isn't what I was addressing.
      Even with loads from memory being more expensive, modern CPUs can do them in the background while other instructions are running.

    • elodens4
      elodens4 4 months ago

      Arrays are optimised because they are stored in one contiguous block of memory. Because of this, they can be loaded into the CPU's cache from RAM in one operation, after which the CPU can read from its cache basically for free. Linked lists, on the other hand, require the machine to jump all over the place and load things into the cache many times over. On modern CPUs the most expensive operation by far is fetching from RAM.
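
    (A rough, self-contained sketch of the kind of timing comparison being discussed in this thread; it is not the video's code, and the names and sizes are made up. Note the list nodes are all malloc'd in one go, so they will still tend to sit close together in memory; see the longer comment further down about scrambled lists.)

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define NUM 1000000

    struct node { int value; struct node *next; };

    int main(void)
    {
        int *a = malloc(NUM * sizeof *a);
        struct node *head = NULL;
        for (long i = NUM - 1; i >= 0; i--) {                /* same values in both structures */
            a[i] = rand() % 100;
            struct node *n = malloc(sizeof *n);
            n->value = a[i];
            n->next = head;
            head = n;
        }

        clock_t t0 = clock();
        long sum_a = 0;
        for (long i = 0; i < NUM; i++)                       /* contiguous, cache-friendly walk */
            sum_a += a[i];
        clock_t t1 = clock();

        long sum_l = 0;
        for (struct node *p = head; p != NULL; p = p->next)  /* pointer chase through the list */
            sum_l += p->value;
        clock_t t2 = clock();

        printf("array %ld in %.4fs, list %ld in %.4fs\n",
               sum_a, (double)(t1 - t0) / CLOCKS_PER_SEC,
               sum_l, (double)(t2 - t1) / CLOCKS_PER_SEC);
        return 0;
    }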

  • Dénes Sebestyén
    Dénes Sebestyén 5 months ago

    With pointer arithmetic I bet you could get at least the same speed with arrays, because you don't have to keep taking the starting pointer and adding sizeof(record)*n to it; you can just add sizeof(record) to a pointer and iterate (and it's done by C, so instead of p[n] you can iterate q = p; q++ style).
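
    (A sketch of that pointer-walking style, with an assumed record type; most optimising compilers perform this strength reduction on the indexed version anyway:)

    #include <stddef.h>

    struct record { int p, q; };

    /* Walk a pointer across the array instead of recomputing
       base + i * sizeof(struct record) on every iteration. */
    int sum_records(const struct record *a, size_t num)
    {
        int sum = 0;
        for (const struct record *r = a; r != a + num; r++)
            sum += r->p;
        return sum;
    }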

  • Fredo Corleone
    Fredo Corleone 5 months ago

    Traversal is O(n) in both cases, so you shouldn't really care if one is marginally faster than the other.
    If you don't want to watch 30 minutes of video: *arrays are faster than linked lists*, the reason being that on modern computers you have a CPU cache, and since an array is contiguous it caches well, while a linked list is scattered in memory and caches poorly (you need to query the next location in memory at every step of your loop).

  • Abhay Raj Singh Rathod
    Abhay Raj Singh Rathod 5 months ago +1

    Accessing the array through pointers will be even faster
    for( int* i = a; i

  • Jugginator 12
    Jugginator 12 6 months ago

    Woah! A floppy disc? What kind of budget do you guys have over there?!?! That's indeed some very high tech.

  • boboala1
    boboala1 6 months ago

    Yeah baby! Atari! LOL! Reminds me...I got a 1040ST and MegaST boxed up in the basement. Flashback of grad school in the 1980s....compiler disk in A: and data disk in B: on a Sanyo MBC-550...compiling C code (mmrrap-mmrrap-mmrrap...mmrrap-mmrrap......)

  • Nick1wasd
    Nick1wasd 6 months ago +5

    "We're not gonna run this on the iMac, we're not gonna run it on the Pi, we're gonna run it on the Atari behind me" I love this guy so much.

  • chrissx Media
    chrissx Media 6 months ago +1

    Well... you should really re-run it with C++11 iterative for loops, because these have a kinda big impact on the CPU performance of arrays.

  • vcokltfre
    vcokltfre 6 months ago

    And if you start coding by using Lua, arrays don't start at 0, they start at 1... Just to make life harder...

  • Timothy Stewart
    Timothy Stewart 8 months ago

    How were you able to compile on your iMac's x86_64 CPU and run on the Atari's Motorola chips?

  • Prez Golez
    Prez Golez 8 months ago

    Null =/= 0

  • Alexandre Vieira
    Alexandre Vieira 8 months ago

    Sweet mother of electrons... o.O I have not seen a floppy disk loaded into a computer since, can't even remember?! 12-18 years??

  • Ruffian Eo
    Ruffian Eo 8 months ago

    He should have done a multi-core version of that. Then, arrays hands down beat singly linked lists as you have an easier way to delegate chunks to the cores.

  • Alastair Montgomery
    Alastair Montgomery 8 months ago

    Dream selection of retro kit in that office :)

  • Alex Oakley
    Alex Oakley 9 months ago

    WHY are you explaining how to sum elements in a data structure!? This video is heeeeeeeaps longer than it needs to be. I could have read an entire chapter on data structures by now.

    • Peterolen
      Peterolen 9 months ago

      If all you want to do is learn, why are you even watching these videos?

  • mspenrice
    mspenrice 9 months ago

    Hmm... another thing with the Falcon is that it makes use of the 68030's burst-mode memory transfers (sort of an early attempt at EDO, which naturally makes loading large blocks faster than true random access), as well as the cache. It'd be interesting to see the code run again on the MSTe at full 16MHz with the cache enabled, to see which program is faster in that case, or if it ends up being pretty much 50:50.

  • Giovanni Carovanello
    Giovanni Carovanello 9 months ago

    I wonder why both the array and the linked-list started off with 42... Must be a coincidence, I guess.

  • RangeRover RanYouOver
    RangeRover RanYouOver 9 months ago

    Great vids and stuff, keep it up. Have just found this channel and it's a v good watch :)

  • kenj
    kenj 9 months ago

    The most important thing that everyone should learn is this: BOTH ARE FAST ENOUGH! Use whatever method works for your application.

  • Matt T
    Matt T 9 months ago +2

    My guess: Arrays! Cache locality.

  • tonyvice6661616
    tonyvice6661616 10 months ago

    Dope shirt

  • Marc van Leeuwen
    Marc van Leeuwen 10 months ago

    While there is some amusement value to this video, it provides very little information of what is actually going on. We do not know how the nodes of the linked list are ordered in memory, and this is very relevant. The fact that the nodes were just all generated in one swoop just before the test would suggest they are probably linearly ordered, with each link pointing a fixed distance forward from itself; by contrast, in a linked list that has been used for some time in ways that linked lists are made for (insertions, deletions, permutations of links), the nodes would tend to be scrambled in memory in a way that is detrimental to locality and therefore to cache performance. We also don't know whether the address arithmetic implicit in finding the address of p[i] is actually performed each time, or that (quite likely) the compiler has optimised this away by just incrementing the address of the previous array access. Unless we have some idea of what happens at the assembly level, we are just watching a race between contenders we hardly know at all.
    No mention made of locality of access, though this is the unique feature that makes the data cache relevant: in the example, no item in memory is ever used more than once (I'm assuming the few local variables are in registers) so the _only_ gain from cache is that it fetches not only the item requested by the CPU, but a whole cache line, so that the next several requests might be cache hits. Caches (and many other things) have been designed specifically to optimise linearly repetitive access to arrays, so it is not so surprising that an array program benefits most from such optimisations. On the other hand, mention of the instruction cache is beside the point: both solutions are small loops that can reside entirely in the instruction cache.
    If you really want to compare intrinsic efficiency of arrays and linked lists as data structures, I would suggest a test that forces the array to be accessed non-linearly, and the list to be non-linearly ordered in memory. For instance, just insist that the items be visited in increasing order by the q-component while using the p-components (maybe doing something other than summation, for which order matters). In the array case, leave everything in place and pre-compute the permutation giving the order of visit, while in the list case (where we cannot do random access) pre-sort the list by increasing q, with a sorting method that only adjusts the links. The net effect would be that the linked list is scrambled in memory in the same way the array accesses are scrambled. Then the playing field is levelled in several respects (neither has much locality, so caching has little influence; array indexing cannot be optimised away), and the tests would be a lot more interesting.
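
    (A minimal sketch of the array half of the test proposed above: visiting elements in increasing q order through a precomputed permutation, so the memory accesses are no longer sequential. Names are illustrative; the list half would instead pre-sort its links by q.)

    #include <stdlib.h>

    struct item { int p, q; };

    static const struct item *g_items;            /* qsort's comparator has no context argument */
    static int cmp_by_q(const void *a, const void *b)
    {
        int qa = g_items[*(const size_t *)a].q;
        int qb = g_items[*(const size_t *)b].q;
        return (qa > qb) - (qa < qb);
    }

    long sum_p_in_q_order(const struct item *items, size_t n)
    {
        size_t *perm = malloc(n * sizeof *perm);
        for (size_t i = 0; i < n; i++)
            perm[i] = i;                          /* identity permutation */
        g_items = items;
        qsort(perm, n, sizeof *perm, cmp_by_q);   /* order of visit: increasing q */

        long sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += items[perm[i]].p;              /* scattered, cache-unfriendly reads */
        free(perm);
        return sum;
    }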

  • Piero Ulloa
    Piero Ulloa 10 months ago

    You could also add that on the iMac the array could fit entirely in cache, as opposed to the linked list, and optimising the code with loop unrolling gave it an extra speed boost. So theoretically it might never have touched main memory, except to load the program from disk, when running the array version.

  • dragonore2009
    dragonore2009 10 months ago +1

    8:11 "p inside p, were using p inside p, confusing variable names, but then I'm a C programmer" LOL.

  • Matt Cummings
    Matt Cummings 10 months ago +1

    Arrays could also be faster by dividing the work between multiple threads

  • MrCOPYPASTE
    MrCOPYPASTE 11 months ago +4

    MEMORY IS AN ARRAY!!!!!!!!!

  • MrCOPYPASTE
    MrCOPYPASTE 11 months ago +2

    Please give us the generated assembly for both cases... Thank you

  • MrCOPYPASTE
    MrCOPYPASTE 11 months ago

    One challenge for cp if I may... do the same thing in assembly... Thank you

  • Desmond Brown
    Desmond Brown 11 months ago

    Really the main reason anyone uses linked lists is to save space and save time when inserting new data where the size is unknown, and when removing items from an existing list. Compare that to an array and you will find it takes far longer to work with arrays in most of those cases, since arrays must be completely rebuilt in order to add new pieces of data. Also, arrays introduce wasted memory to protect the program from out-of-bounds errors.

  • Flavio Alarcon
    Flavio Alarcon Year ago

    I learned Java programming at an institute. There I basically learned the basics of how to program, develop websites and things like that, but they never even began to explain how things actually work. It still worries me how little I actually know about computers, since all I know is basically what years of progress in the computing world have achieved, and I'm just using the tools they gave me without questioning them. Videos like this are very important for someone like me; I appreciate it.

  • reda abakhti
    reda abakhti Year ago

    a+b = b+a watch numberphile so you can understand

  • JiveDadson
    JiveDadson Year ago

    Cache misses are far more important these days than instruction count. Caches love arrays and hate links. Running tests to find out how fast a program can do nothing proves nothing. Testing how fast a program that does nothing will run on ancient hardware proves... nothinger? Academics. Meh.

  • Learning Electronics Cheaply

    Another problem with arrays is that if you need a big one, it will be the same size whether you use one element or all of them. You guess how big you need it; you might make it 1 gig so you always have room, but then you've lost that memory to just that one array, let alone the rest of your program.

  • Hydra Zerg
    Hydra Zerg Year ago

    I don't think your evaluation is complete. Making the sum is indeed a point to consider but how about deleting or adding elements? Wouldn't that be important to consider?

  • Joseph Cote
    Joseph Cote Year ago

    An array of structures is not the same as an array of primitive values. I would guess a method using the primitive array would come in between 1/3rd and 1/2 the time of the list.

  • Brugse Zot
    Brugse Zot Year ago

    What editor is he using? (On the Mac)...

  • Nathan Imig
    Nathan Imig Year ago

    For the array summation the fastest way is to initialize a pointer to the first element, and dereference the pointer, add to sum, and post increment the pointer to the next element in the array. I would have just written the code like this, instead of assuming the compiler would do this for you.

  • ♫♪Ludwig van Beethoven♪♫

    Locality of reference.

  • ♫♪Ludwig van Beethoven♪♫

    My toilet runs faster

  • Larry Gall
    Larry Gall Year ago +1

    I thought the array would have been faster, even on the Atari.. I was surprised by this. I think it comes down to memory speed and the way the memory controller addresses memory registers. The reason I thought an array would be faster is because it points to a specific register immediately.. and a linked list has to search and compare. But in the end.. *cache* is king!

    • Joseph Cote
      Joseph Cote Year ago

      What it looks like he's done is create an array of data objects. Access via an index first, THEN the pointer at that index into the object. Had he created a direct array of primitive values the array method might have been a third of the time.
      I wouldn't have even considered this a fair comparison either way. You have some problem you need to solve and you use the data structure that is appropriate for it. Maybe a linked list is best, maybe a doubly linked list, maybe an array of objects, whatever. And you live with the constraints of the chosen data structure.

  • nberedim
    nberedim Year ago

    this was worth it just for the spinal tap reference.

  • Brek Martin
    Brek Martin Year ago

    The platform might support zero overhead looping. Checkmate Atheists!

  • EuclideanPrime Numbers

    Proper data structures > all.

  • Vali Zeth
    Vali Zeth Year ago

    C is a high level language? lmao

    • Vali Zeth
      Vali Zeth Year ago

      Peterolen Of course. But 99% of the time when you speak of a high-level language you mean Java, Python, etc., and low-level means C, C++, etc. It's like saying that $200 000 000 000 isn't much because $2 000 000 000 000 is more.

    • Peterolen
      Peterolen Year ago

      Compared to assembly, it is.

  • Garrett Lappe
    Garrett Lappe Year ago

    Why is the focus of this analysis the difference between machines rather than the differences between linked lists and arrays and their architecture and niches?

  • William Newing
    William Newing Year ago

    Ok now why was it faster backwards?

  • Q Squared
    Q Squared Year ago

    But, if you need to reference specific items then the array is faster than the list, because you can reference any point in the array directly.

  • auxiliary-character

    Well, there's another problem here. Is your routine that measures time between each add operation pushing anything out of cache?
    I'd like to see the test run again on the iMac, but measuring the total time to sum everything up, as opposed to the average time of each operation.

  • Bobby Page
    Bobby Page Year ago

    After 30 minutes not one comparison/benchmark on arbitrary insert/delete..? The entire reason linked lists were devised was to address that problem with arrays.
    The only thing this video did was test cache and compiler optimization on different pieces of hardware...
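
    (For the insert/delete case this comment is about, a minimal sketch with made-up names of why the two structures differ there: splicing into a list touches two pointers, while inserting into an array shifts everything after the insertion point.)

    #include <stdlib.h>
    #include <string.h>

    struct node { int value; struct node *next; };

    /* O(1): link a new node in after a node we already hold a pointer to. */
    void list_insert_after(struct node *pos, int value)
    {
        struct node *n = malloc(sizeof *n);
        n->value = value;
        n->next = pos->next;
        pos->next = n;
    }

    /* O(n): shift the tail of the array up by one to make room.
       a must have capacity for at least len + 1 elements. */
    void array_insert_at(int *a, size_t len, size_t pos, int value)
    {
        memmove(&a[pos + 1], &a[pos], (len - pos) * sizeof *a);
        a[pos] = value;
    }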

  • MrMalchore
    MrMalchore Year ago

    I guess... a lot depends on your cross compiler. It was compiled on a modern Mac but targeted at whatever CPU architecture is in the Atari, and on whether that cross compiler supports unrolling loops, etc. By the way, I really want to insert a snarky comment about floppy disks and Apple computers, but it eludes me.
    I've always loved linked lists as a data structure.

  • TheJaguar1983
    TheJaguar1983 Year ago

    Didn't even think of the cache. I did figure out why the reverse was faster on the Atari, though.

  • TheJaguar1983
    TheJaguar1983 Year ago

    First impression. Depends on the operation. I tend to use arrays when I need contiguous memory, or I have a known length, and if I need random access. I use linked lists when building lists with unknown length and where random access is not required.

  • gnarlin
    gnarlin Year ago +1

    Maybe a simpler explanation of arrays is to just say: an array is like a spreadsheet for values in memory and you have to specify how large the spreadsheet is when you create it.

    • mspenrice
      mspenrice 9 months ago

      ...including the maximum length of data that you can put in each and every cell, which is at least as restrictive a thing as defining the array size itself

  • Dana Adalaide
    Dana Adalaide Year ago

    you didn't try i=100; while(i--)

  • Michael Pelz
    Michael Pelz Year ago

    I'm curious about the Atari, since it has to fetch so much from memory with the array. Arrays are generally stored in one block of memory, so you can create a pointer to the beginning of the array (or the end) and increment the pointer through the array to add it up. Would that speed it up, match it, or slow it down? It's a question of how pointers are used and stored on that system and how the compiler handles it.

  • Buddy Cox
    Buddy Cox Year ago

    Unless I really need to insert/delete in the middle, I like a dynamic array a lot better than a linked list.

  • Mib
    Mib Year ago

    what atom skin is that?

  • ThePixelize
    ThePixelize Year ago

    This was ... fantastic!

  • Sourdoughbro
    Sourdoughbro Year ago

    Really nice video

  • MrSnowman
    MrSnowman Year ago

    14:00
    Well I'm certainly surprised.
    Good show.

  • firsy
    firsy Year ago

    wouldn't the index be in a register rather than the cache?

  • Mark Sims
    Mark Sims Year ago

    Love the Hobbit reference, time passes..................Nobody sits down and starts singing about gold though!

  • David Olsen
    David Olsen Year ago

    The effect of array optimisations is so pronounced that even schemes with fantastic guarantees, like cuckoo hashing, get passed over. Cuckoo hashing has the best guarantees of any kind of hashmap implementation, but because it can bounce an insert far away it practically guarantees a cache miss, so nobody uses it. Likewise, as nice as linked lists are, you are better off using arrays for everything except very extreme cases where hashmaps are out. We even implement things like queues and stacks in arrays, and queues and stacks are the quintessential things linked lists are introduced with. You are way better off with an array with specific head and tail locations, looping around when they exceed the array size and rebuilding the array when all slots are used. It's one of those odd forced moves: since the system caches well for arrays, you have to use the array, because the system is cached for the array.
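
    (A minimal sketch of the array-backed queue described above: a fixed-capacity ring buffer with wrapping head and tail indices. The capacity and names are made up, and a real version would grow or recycle the array when full.)

    #include <stdbool.h>
    #include <stddef.h>

    #define QCAP 1024

    struct queue {
        int    items[QCAP];
        size_t head;    /* index of the next item to pop */
        size_t tail;    /* index of the next free slot   */
        size_t count;
    };

    bool queue_push(struct queue *q, int v)
    {
        if (q->count == QCAP)
            return false;                  /* full */
        q->items[q->tail] = v;
        q->tail = (q->tail + 1) % QCAP;    /* wrap around the end of the array */
        q->count++;
        return true;
    }

    bool queue_pop(struct queue *q, int *out)
    {
        if (q->count == 0)
            return false;                  /* empty */
        *out = q->items[q->head];
        q->head = (q->head + 1) % QCAP;
        q->count--;
        return true;
    }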

  • val see
    val see Year ago

    Now, create an array, fill it, then sort it, and compare to a linked list; then do one of each, keeping the list sorted with each individual add. That's why linked lists are faster.

    • val see
      val see Year ago

      Peterolen source?... Cause it's not

    • Peterolen
      Peterolen Year ago

      The array seems to perform better in both situations.

  • Error
    Error Year ago

    You should be timing access to array and linked list elements, as well as timing insertion.

  • Jérémie Guichard

    I do not have an Atari to test this, but I believe this implementation would make the array faster in all cases.
    struct arrayItem* ptr=p;
    struct arrayItem* end=ptr+num;
    int sum=*ptr;
    while(++ptr < end)

  • Joey Gray
    Joey Gray Year ago

    Watch your p's and q's

  • Fernando Garibaldi

    This entire video was a statement on "you don't know until you know".

  • Andre Allen
    Andre Allen Year ago

    Wouldn't the random values in either the array or linked list skew the results? For example, it might be faster to add the number 2 to a running total than to add 257,300, or something like that. I think that to definitively say which is faster, the contents need to be the same.

  • REALSlutHunter
    REALSlutHunter Year ago

    Hmm, very interesting. I would not have thought that an array is faster when every element is processed. Most of the time I use linked lists, but when I write something that uses more than one thread I use arrays.

  • E La Voie
    E La Voie Year ago

    Awesome video

  • Robin Williams
    Robin Williams Year ago

    I spotted that "Adventure Game" reference in there! ;) :D
    Is that an Acorn Atom in the background?! :D

  • gnargnarhead
    gnargnarhead Year ago

    whoa

  • Transformers And Power Rangers Toy Reviews

    Being a C programmer is no excuse for confusing or non-descriptive variable names. As much as possible variable names should explain the purpose of their data unambiguously.
    I recognize there are times when this is not easy, but with modern IDEs there is no justification for bad variable names. Don't ignore the power of your tools.

    • Kenny Phillips
      Kenny Phillips 9 months ago

      I prefix my variables with type: datatypeVariableName

    • mspenrice
      mspenrice 9 months ago

      Maybe he's revealing a history where he cut his programming teeth using BASIC on the Spectrum. All your variables are single letters, there, and p and q were quite popular options for things which didn't need to be more obvious like x,y,z nor your primary operators (which got a,b,c..). Often i,j and k also.

    • Zak Unknown
      Zak Unknown 9 months ago

      I feel like you're not that fun at parties.
      Don't worry qt ;) I still love you

    • checka
      checka 10 months ago +2

      He said "asleep" programmer, not a C programmer, so he had an excuse

  • Gummans Gubbe
    Gummans Gubbe Year ago

    Memory management is a thing as well. In case of fragmented memory it can be difficult to allocate larger memory blocks.
    The old Macintosh used "Handles" to deal with fragmented memory, the os could move the arrays in memory without the programs knowing.

  • StormCloud
    StormCloud Year ago

    I'd like to see some tests regarding arrays and lists where you're frequently or constantly adding and removing entries.
    Because the array would have to rebuild every time, while the list should just be able to skip a value and allocate another.

  • Aaron Wood
    Aaron Wood Year ago

    Locality of reference!

  • Christian Ivarsson

    I'm certain Mr. Bagley already knows about this, but on 68k specifically an array will ALWAYS be faster than a linked list when coding in assembly, since you can predecrement/postincrement the address pointer.
    (as in direction doesn't matter)
    ie:
    lea.l "address of array", a0
    clr.l d0
    clr.l d1
    move.l #1048576, d2 (Number of iterations.)
    Loop:
    move.b (a0)+, d1
    add.l d1, d0
    subq.l #1, d2
    bne.b Loop
    d0 will contain a checksum32
    you could also add 1048576 to the address of the array and do "move.b -(a0), d1" inside the loop to read it in reverse. Speed should be more or less the same.
    With a linked list you also have to fetch a new address for every iteration so there's no chance that is faster on these specifically :)

  • stellarcircle9
    stellarcircle9 Year ago

    16:39 ya done now

  • 233kosta
    233kosta Year ago

    Given that most of us will need more complicated data structures, it's safe to assume that a linked list will invariably be more suitable for a given task...

  • Brett Raugust
    Brett Raugust Year ago

    Does the UK not have whiteboards?