﻿ Flattening nested lists or arrays in Python | Adam Dimech's Coding Blog

# Flattening nested lists or arrays in Python

You may need to flatten a list of lists in Python in order to extract useful information more easily. Here’s a procedure for flattening nested lists, with NumPy used to build the example data.

Let’s say that we have a nested structure in Python (built here as a NumPy array) that consists of six blocks, each containing three rows of three values:

import numpy as np
my_list = np.arange(54).reshape(6, 3, 3)

array([[[ 0,  1,  2],
[ 3,  4,  5],
[ 6,  7,  8]],

[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],

[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]],

[[27, 28, 29],
[30, 31, 32],
[33, 34, 35]],

[[36, 37, 38],
[39, 40, 41],
[42, 43, 44]],

[[45, 46, 47],
[48, 49, 50],
[51, 52, 53]]])

What I would like to do is collect all of these rows into a single flat list of tuples. The method for doing this is as follows:

flat_list = []
for sublist in my_list:          # each 3x3 block
    for item in sublist:         # each row of three values
        app = item[0], item[1], item[2]
        flat_list.append(app)


The result is as follows:

[(0, 1, 2),
(3, 4, 5),
(6, 7, 8),
(9, 10, 11),
(12, 13, 14),
(15, 16, 17),
(18, 19, 20),
(21, 22, 23),
(24, 25, 26),
(27, 28, 29),
(30, 31, 32),
(33, 34, 35),
(36, 37, 38),
(39, 40, 41),
(42, 43, 44),
(45, 46, 47),
(48, 49, 50),
(51, 52, 53)]
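
Since the example data is a NumPy array, the same result can probably be reached without an explicit loop. The following is a minimal sketch, not the method used above: it assumes the input is a regular (non-ragged) array and uses `reshape` to collapse the first two axes. The name `flat_array` is introduced here purely for illustration.

```python
import numpy as np

my_list = np.arange(54).reshape(6, 3, 3)

# Collapse the first two axes into one; -1 lets NumPy work out
# the number of rows (6 * 3 = 18) from the total size.
flat_array = my_list.reshape(-1, 3)

# Convert to a list of plain Python tuples, matching the format
# produced by the nested loop.
flat_list = [tuple(row) for row in flat_array.tolist()]

print(flat_array.shape)  # (18, 3)
print(flat_list[:2])     # [(0, 1, 2), (3, 4, 5)]
```

Calling `.tolist()` first converts NumPy scalars into ordinary Python numbers, so the tuples hold plain `int` values rather than NumPy integer types.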

This will also work with float data:

my_list_float = np.random.uniform(low=0.5, high=13.3, size=(6,3,3))

array([[[ 7.64772875,  7.32261442,  5.8919348 ],
[ 7.67959047,  7.28771447,  7.11708364],
[ 5.95404959, 11.36673445,  3.17166127]],

[[10.04863019, 11.08430968,  6.2974855 ],
[ 6.74310243,  6.68051813,  8.13872051],
[ 3.91462296, 11.86625555,  2.41639895]],

[[11.6504668 ,  3.8087696 ,  5.88143769],
[ 7.67669979,  1.02997986,  4.39433466],
[ 8.26025775,  7.49887085,  6.45701349]],

[[ 4.60047673,  0.95547007,  4.6866289 ],
[ 4.4085488 ,  6.1421004 ,  0.74564934],
[ 1.4756955 ,  6.10955761,  4.54754207]],

[[ 8.83032188, 10.30319833,  1.41106863],
[12.74891056,  5.15276292, 11.88460046],
[ 2.15324622,  0.83888124,  4.90028548]],

[[ 2.69061118,  7.49136435,  2.06279712],
[ 5.22991475,  2.02261103,  5.01043034],
[11.41747231, 11.05886522,  9.52995719]]])

The transformation is as follows:

flat_list_float = []
for sublist in my_list_float:    # each 3x3 block
    for item in sublist:         # each row of three floats
        app = item[0], item[1], item[2]
        flat_list_float.append(app)

Here’s the result:

[(7.647728749846483, 7.3226144221046745, 5.891934804813722),
(7.679590473720329, 7.287714467211416, 7.117083640535063),
(5.9540495865025385, 11.366734450071698, 3.1716612742146952),
(10.048630187737741, 11.084309676161517, 6.29748550064935),
(6.743102426532033, 6.680518126429604, 8.138720514358157),
(3.9146229614581163, 11.866255553685654, 2.416398947954619),
(11.650466798166732, 3.808769602203046, 5.881437693784979),
(7.676699788355214, 1.02997986458537, 4.394334660197896),
(8.260257754471812, 7.498870852521678, 6.457013490088335),
(4.600476727528558, 0.9554700683253913, 4.6866289044845955),
(4.408548797822865, 6.142100397755145, 0.7456493387147191),
(1.4756955015730724, 6.109557605565641, 4.547542069988677),
(8.830321877112546, 10.303198331307952, 1.4110686293869592),
(12.748910562861633, 5.15276292165281, 11.88460046102817),
(2.153246215503066, 0.8388812405187267, 4.900285481193625),
(2.690611175324618, 7.491364351392079, 2.0627971220114674),
(5.229914753619622, 2.022611031625753, 5.010430342792709),
(11.417472312739736, 11.058865218349133, 9.529957189459033)]

This approach relies on knowing the dimensions of the original data, because the number of items in each row (three here) is hard-coded in the loop. To check the dimensions, use the np.shape() function:

np.shape(my_list)
(6, 3, 3)

This shows that there are six blocks of three rows, each row holding three values. Hence each tuple in the flattened list has three items.
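
If the row length is not known in advance, it can be read from the shape rather than hard-coded. The sketch below assumes a regular array with at least one dimension; `flatten_to_tuples` is a hypothetical helper name, not something from NumPy.

```python
import numpy as np

def flatten_to_tuples(arr):
    """Return the innermost rows of an N-dimensional array as a list of tuples."""
    arr = np.asarray(arr)
    # arr.shape[-1] is the length of each innermost row,
    # so it does not need to be written into the code.
    rows = arr.reshape(-1, arr.shape[-1])
    return [tuple(row) for row in rows.tolist()]

example = np.arange(24).reshape(2, 3, 4)   # rows of four values this time
result = flatten_to_tuples(example)

print(len(result))  # 6
print(result[0])    # (0, 1, 2, 3)
```

The same helper should work for the (6, 3, 3) example above, where it would return 18 tuples of three values.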

## Flattening image data

Let’s say that I have loaded a 4000×6000 pixel RGB image and I want to examine the RGB values. The data structure for images can be somewhat complex:

img.shape
(4000, 6000, 3)

Let’s take a closer look:

array([[[123,  84,  46],
[124,  85,  47],
[122,  86,  46],
...,
[120,  85,  42],
[118,  82,  42],
[118,  79,  40]],

[[122,  84,  42],
[122,  84,  42],
[122,  86,  46],
...,
[125,  83,  46],
[123,  82,  43],
[123,  82,  43]],

[[124,  86,  44],
[125,  87,  45],
[123,  87,  47],
...,
[125,  84,  45],
[121,  85,  45],
[120,  83,  45]],

...,

[[127,  90,  52],
[124,  87,  49],
[126,  87,  48],
...,
[116,  81,  41],
[114,  81,  42],
[113,  80,  41]],

[[130,  89,  50],
[129,  88,  49],
[127,  88,  49],
...,
[118,  79,  40],
[117,  80,  42],
[116,  78,  43]],

[[125,  88,  50],
[125,  88,  50],
[130,  92,  50],
...,
[119,  79,  44],
[119,  79,  44],
[119,  79,  44]]], dtype=uint8)

Again, this can be flattened:

img_vals = []
for sublist in img:              # each row of pixels
    for item in sublist:         # each pixel (R, G, B)
        app = item[0], item[1], item[2]
        img_vals.append(app)

Let’s look at part of the result (the slice below returns nine rows, indices 1 to 9, so the very first pixel is skipped):

img_vals[1:10]

[(124, 85, 47),
(122, 86, 46),
(125, 89, 49),
(124, 89, 46),
(124, 89, 46),
(127, 88, 49),
(126, 87, 48),
(129, 88, 49),
(126, 85, 46)]

Having the data in this format can make it more amenable to analysis.
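
For an image of this size the nested loop visits 24 million pixels, which can be slow in pure Python. As a rough sketch of an alternative, `reshape` gives one row per pixel without copying values into tuples. The tiny random array below is a stand-in for a real image (hypothetical data, not the photo used above), and the summary statistics are only examples of what the flat layout makes easy.

```python
import numpy as np

# Stand-in for a loaded image: a tiny 4x6 RGB array of random values.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

# One row per pixel, one column per channel (R, G, B).
pixels = img.reshape(-1, 3)
print(pixels.shape)  # (24, 3)

# Example analyses on the flattened data:
mean_rgb = pixels.mean(axis=0)                      # average R, G and B
colours, counts = np.unique(pixels, axis=0,         # distinct colours and
                            return_counts=True)     # how often each occurs
print(mean_rgb)
print(len(colours), "distinct colours")
```

Pixels are laid out row by row, so `pixels[r * width + c]` corresponds to `img[r, c]`.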

## Arrays

The procedure works the same way on any NumPy array of this shape, such as one created with np.random.rand():

my_array = np.random.rand(6, 3, 3)

array([[[2.68938078e-01, 3.03029652e-01, 4.99954708e-04],
[3.87501333e-01, 3.14020133e-01, 9.08430133e-01],
[4.58669042e-01, 2.55191827e-01, 5.95984591e-01]],

[[3.26772058e-01, 5.32847684e-02, 6.95896513e-01],
[7.79180327e-01, 5.39470466e-01, 3.89746699e-01],
[8.85186760e-01, 1.68279158e-01, 8.22998418e-01]],

[[5.42985981e-01, 2.29909852e-01, 5.10764123e-01],
[1.62395196e-01, 2.64931714e-01, 3.30373878e-01],
[3.86584975e-01, 9.13907766e-01, 9.36258437e-01]],

[[2.65445225e-01, 5.34949759e-01, 6.73359483e-01],
[7.10202629e-01, 9.13562863e-01, 2.41559349e-01],
[4.31798210e-02, 2.76302599e-02, 2.08922124e-01]],

[[2.59812024e-01, 8.13658620e-01, 8.72425702e-01],
[1.87496490e-01, 5.32630002e-01, 3.97260842e-01],
[2.64988944e-01, 1.52673853e-01, 5.20502347e-01]],

[[8.54809065e-02, 7.51949747e-01, 8.17904626e-01],
[1.71576534e-01, 8.63536326e-01, 9.50612491e-01],
[3.89926892e-01, 2.73156473e-01, 5.16339903e-01]]])

The procedure is the same:

flat_list_array = []
for sublist in my_array:         # each 3x3 block
    for item in sublist:         # each row of three floats
        app = item[0], item[1], item[2]
        flat_list_array.append(app)

And the result is also flat:

[(0.26893807759106747, 0.3030296515665212, 0.0004999547077128019),
(0.3875013332390177, 0.31402013341015966, 0.908430132728318),
(0.4586690417466516, 0.25519182664367424, 0.595984591253244),
(0.3267720575588311, 0.053284768372558466, 0.6958965128312675),
(0.7791803270645966, 0.5394704662655452, 0.3897466991862212),
(0.8851867597853157, 0.16827915830897533, 0.822998417712234),
(0.5429859809720041, 0.22990985183666102, 0.5107641233910157),
(0.16239519577914163, 0.2649317139278563, 0.3303738780556117),
(0.3865849751251127, 0.913907766153359, 0.9362584368229232),
(0.26544522534351045, 0.5349497592723044, 0.6733594833336637),
(0.7102026289864715, 0.9135628630498965, 0.24155934896837317),
(0.043179821044459055, 0.027630259948245195, 0.20892212448339365),
(0.25981202363320566, 0.8136586203186404, 0.8724257017993466),
(0.1874964899309367, 0.5326300022759225, 0.39726084230189074),
(0.2649889435674385, 0.15267385292512803, 0.5205023466060287),
(0.08548090645755879, 0.7519497470401438, 0.817904625985884),
(0.17157653433182307, 0.8635363260966877, 0.9506124914553552),
(0.3899268922677569, 0.2731564734230294, 0.5163399026175653)]
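
All of the examples above start from NumPy arrays. For genuine nested Python lists, the standard library offers another option. This is a minimal sketch using `itertools.chain.from_iterable`, assuming every element at the level being flattened is itself iterable; the variable names are illustrative only.

```python
from itertools import chain

nested = [
    [[0, 1, 2], [3, 4, 5]],
    [[6, 7, 8], [9, 10, 11]],
]

# Remove one level of nesting: a flat list of rows, as tuples.
rows = [tuple(item) for item in chain.from_iterable(nested)]
print(rows)    # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11)]

# Remove both levels of nesting: a flat list of individual values.
values = list(chain.from_iterable(chain.from_iterable(nested)))
print(values)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Unlike the indexed loop, this version does not depend on each row having exactly three items.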
