🌞
🌝

Efficient zip function in JavaScript

How Python does it?

The Python zip function is a clever analogy of a real zipper: It allows you to iterate over multiple arrays element-wise without having to worry about indices. It is quite handy when you have to manually pair elements coming from two (or more) data sources:

>>> scoreList = [5, 3, 6, 8]
>>> playerList = ['Mary', 'John', 'Emma', 'Gavin']
>>> list(zip(scoreList, playerList))
# [(5, 'Mary'), (3, 'John'), (6, 'Emma'), (8, 'Gavin')]

Please note the list() call is needed to convert the result to a list, otherwise it returns something like

>>> zip(scoreList, playerList)
# <zip object at 0x109b19d00>

which is an object implementing the iterator interface. Why the hassle with this object? Why is it not just a list? The iterator interface allows lazy-evaluation, meaning it won't create a list, only if it's asked. This object instead allows generating elements on-demand, which can be more efficient. Let's say I want to find the first player with score 6:

for score, player in zip(scoreList, playerList):
  if score == 6:
    print(f'player {player} has score 6')
    break

In this case, the iteration stops with ((6, 'Emma')) and the last pair ((8, 'Gavin')) would never be constructed.

The implementation is quite interesting, because zip not only works with two lists, it can more than two arrays as input parameters. This can be particularly useful when trying to transpose a list of lists (a matrix). If you're not familiar with the term transpose, it simply means a list of list is simply flipped by its diagonal. The transposition of

[
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9]
]

is

>>> list(zip(*[
...  [1, 2, 3],
...  [4, 5, 6],
...  [7, 8, 9]
... ]))
[
  [1, 4, 7],
  [2, 5, 8],
  [3, 6, 9]
]

What happens if the input arrays do not have the same length? According to the Python docs it stops when the shortest input iterable is exhausted, meaning it will return as many elements as long the shortest array is:

>>> list(zip([1, 2, 3], [4, 5]))
[(1, 4), (2, 5)]

If all elements need to be returned, itertools.zip_longest should be used.

And just to spice things up the little, let me mention that as the zip function expects iterables as input, not specifically arrays. Iterables are more generic than arrays, without going too much into the details, they are objects that can be iterated over one element at the time, and they can signal if they are exhausted (iteration finished). In practice, it means that any object that can be iterated over can be used as inputs of zip: tuples, sets, dictionaries, range, or even results of other zips. Mind-blowing, right? 🤯 It is also possible to create an infinite zip. Let me use itertools.count to demonstrate it. It is very similar to range() except it has no stopping criteria, so if it is used in a for loop it keeps yielding values unless it is stopped.

>>> for a, b in zip(itertools.count(start=0, step=2), itertools.count(start=1, step=2)):
...     print(a, b)
1 2
3 4
5 6
...

I really hope I could convince you by now, how cool and versatile this Python standard library function is. Why can't we have nice things in JavaScript? Well we can, you just probably end up hunting for third-parties on npm1 or ready-made solutions on Stack Overflow. But is there anything more satisfying than using your home-grown utilities?

How JavaScript could do it?

Before jumping into the implementation, let's take a detour and inspect the differences between the Python and the JavaScript iterator interface.

In Python you can do:

>>> my_list = ['a', 'b', 'c']
>>> my_list_iterator = iter(my_list)
>>> next(my_list_iterator)
'a'
>>> next(my_list_iterator)
'b'
>>> next(my_list_iterator)
'c'
>>> next(my_list_iterator)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

The iterator of an iterable object can be obtained with the global iter() function, and the also globally available next() function can be used to fetch the current value and increment the iterator. The end of the iteration is signaled by the StopIteration built-in exception. The Python iterator mechanism is built with the side-effect of raising an error, which happens under the hod in every for-loop. This allows a very ergonomic workflow, although it might seem unorthodox to make use of exceptions in the expected behaviour of the code flow.

How would this look like in JavaScript?

> var myList = ['a', 'b', 'c'];
undefined
> var myListIterator = myList[Symbol.iterator]()
undefined
> myListIterator.next()
{ value: 'a', done: false }
> myListIterator.next()
{ value: 'b', done: false }
> myListIterator.next()
{ value: 'c', done: false }
> myListIterator.next()
{ value: undefined, done: true }

In JavaScript, in contrast to Python, there's no global function to get an iterator of an object. It can be obtained with the myList[Symbol.iterator](). The current value can be accessed and the iterator incremented with the iterator.next() method, which returns an object with two properties: value holds the current value and a done indicates the status of the iteration. When the iteration finishes, value is undefined and done is true. This interface can be somewhat more invconvenient to work with but this is a trade-off of not signaling the iteration state with an exception.

With the iterator interface under our belt, we can design a faithful implementation of the Python zip and itertools.zip_longest functions in JavaScript. Let's focus on supporting two elements first and stop as soon as one of the iterators is exhausted:

function zip(iterableA, iterableB) {
  // Get the iterators
  const iteratorA = iterableA[Symbol.iterator]();
  const iteratorB = iterableB[Symbol.iterator]();
  // Keep track of the current iteration state in `iterStateA` and `iterStateB`
  let iterStateA = iteratorA.next();
  let iterStateB = iteratorB.next();
  // result will hold the return values;
  const result = [];
  // Loop until none of the iterators are exhausted
  while (!iterStateA.done && !iterStateB.done) {
    // The current values of the iterators are stored in result
    result.push([iterStateA.value, iterStateB.value]);
    // The current iterator state is updated from the iterator
    iterStateA = iteratorA.next();
    iterStateB = iteratorB.next();
  }
  return result;
}

It seems to be working as expected:

> zip([1, 2, 3, 4], ['a', 'b', 'c'])
[ [ 1, 'a' ], [ 2, 'b' ], [ 3, 'c' ] ]

So far so good, but this implementation is eagerly-evaluated, which means no partial results are returned, both iterators are completely exhausted before the function returns. Lazy evaluation on the other hand requires returning partial results, that's where generator functions come handy. It's quite simple to change the previous implementation, a * needs to be added after the function keyword and instead of storing partial results to results they can be yielded:

function* zip(iterableA, iterableB) {
  // Get the iterators
  const iteratorA = iterableA[Symbol.iterator]();
  const iteratorB = iterableB[Symbol.iterator]();
  // Keep track of the current iteration state in `iterStateA` and `iterStateB`
  let iterStateA = iteratorA.next();
  let iterStateB = iteratorB.next();
  // Loop until none of the iterators are exhausted
  while (!iterStateA.done && !iterStateB.done) {
    // The current values of the iterators are yielded
    yield [iterStateA.value, iterStateB.value];
    // The current iterator state is updated from the iterator
    iterStateA = iteratorA.next();
    iterStateB = iteratorB.next();
  }
}

The same call to zip would now return a generator object:

> zip([1, 2, 3, 4], ['a', 'b', 'c'])
Object [Generator] {}

The actual results can be viewed by turning these results into an array with the Array.from method:

> Array.from(zip([1, 2, 3, 4], ['a', 'b', 'c']))
[ [ 1, 'a' ], [ 2, 'b' ], [ 3, 'c' ] ]

The implementation is getting really close to the Python one, but it still only supports two arrays. How could it be extended to work with an arbitrary number of arrays? The parameters of the function would have to be rolled up into a single iterables parameter, using rest parameters. Each operation on the iterators or on the iterStates is now wrapped in a map, except from the condition of the loop, which uses every to check if any of the iterators are exhausted.

function* zip(...iterables) {
  // Get the iterators
  const iterators = iterables.map((iterable) => iterable[Symbol.iterator]());
  // Keep track of the current iteration state in `iterStates`
  let iterStates = iterators.map((iterator) => iterator.next());
  // Loop until none of the iterators are exhausted
  while (iterStates.every(({ done }) => !done)) {
    // The current values of the iterators are yielded
    yield iterStates.map(({ value }) => value);
    // The current iterator states are updated from the iterator
    iterStates = iterators.map((iterator) => iterator.next());
  }
}

Matrix transposition is no longer an issue for zip:

> Array.from(zip(...[
...   [1, 2, 3],
...   [4, 5, 6],
...   [7, 8, 9]
... ]))
[ [ 1, 4, 7 ], [ 2, 5, 8 ], [ 3, 6, 9 ] ]

What would it take to also implement the zip_longest functionality? Let's change the implementation, such that there will be a zip and a zip_longest function exposed (exported), and an internal _zip function will capture the common functionality.

function* _zip(evaluator, iterables) {
  // Get the iterators
  const iterators = iterables.map((iterable) => iterable[Symbol.iterator]());
  // Keep track of the current iteration state in `iterStates`
  let iterStates = iterators.map((iterator) => iterator.next());
  // Loop until none of the iterators are exhausted
  while (evaluator.call(iterStates, ({ done }) => !done)) {
    // The current values of the iterators are yielded
    yield iterStates.map(({ value }) => value);
    // The current iterator states are updated from the iterator
    iterStates = iterators.map((iterator) => iterator.next());
  }
}

function* zip(...iterables) {
  yield* _zip(Array.prototype.every, iterables);
}

function* zipLongest(...iterables) {
  yield* _zip(Array.prototype.some, iterables);
}
> Array.from(zip(...[[1, 2, 3, 4, 5], [1, 2, 3], [1, 3, 6, 8]]))
[ [ 1, 1, 1 ], [ 2, 2, 3 ], [ 3, 3, 6 ] ]
> Array.from(zip_longest(...[[1, 2, 3, 4, 5], [1, 2, 3], [1, 3, 6, 8]]))
[
  [ 1, 1, 1 ],
  [ 2, 2, 3 ],
  [ 3, 3, 6 ],
  [ 4, undefined, 8 ],
  [ 5, undefined, undefined ]
]

And to reach feature parity with zip_longest, let's also add the fillValue:

function* _zip(evaluator, iterables, fillValue) {
  // Get the iterators
  const iterators = iterables.map((iterable) => iterable[Symbol.iterator]());
  // Keep track of the current iteration state in `iterStates`
  let iterStates = iterators.map((iterator) => iterator.next());
  // Loop until none of the iterators are exhausted
  while (evaluator.call(iterStates, ({ done }) => !done)) {
    // The current values of the iterators are yielded
    yield iterStates.map(({ value, done }) => (!done ? value : fillValue));
    // The current iterator states are updated from the iterator
    iterStates = iterators.map((iterator) => iterator.next());
  }
}

function* zip(...iterables) {
  yield* _zip(Array.prototype.every, iterables);
}

function* zipLongest(...iterables) {
  yield* _zip(Array.prototype.some, iterables);
}

function* zipLongestFill(fillValue = undefined, ...iterables) {
  yield* _zip(Array.prototype.some, iterables, fillValue);
}
> Array.from(zipLongestFill('a', ...[[1,2,3,4,5],[1,2,3],[1,3,6,8]]))
[
  [ 1, 1, 1 ],
  [ 2, 2, 3 ],
  [ 3, 3, 6 ],
  [ 4, 'a', 8 ],
  [ 5, 'a', 'a' ]
]

Thank you very much for reading my post. In case you have any suggestions or questions, ping me on Twitter!


1, The zip function is available in the most popular utility library, Lodash, as _.zip. There are two problems with the Lodash implementation:

  • It implicitly implements the itertools.zip_longest functionality without documenting it, or offering a padding value.
  • Not necessarily a problem, but it only operates on arrays and returns an array without any lazy evaluation. This probably serves an average JavaScript user, but not if you're hungry for scalability and performance.