Nhan Phan

Nhân Phan is an artist and educator whose practice is centered around the interplay between programming and other mediums, including photography, prints, and installation. Focusing on machine learning and critical algorithm studies, his work examines humanity through poetic computational interventions.

“I see computers as living entities—ones that can feel, create, and react. My role as an artist is to collaborate with them in the most poetic way. Together, we become a vessel for emotion to flow through.”

Nhân is a recipient of the Processing Fellowship and the founder of CodeSurfing, a science club for artists in Vietnam.

Gmail @ nhaninsummer
IG @ nhaninsummer

⛲️

⎯ Selected Works

Two agents.
Everyday, one wakes up and writes a love poem.
Six hours later, the other wakes up, erases the memory.

Love letters are generated with OpenAI API, re-trained on artist's personal conversations of his past relationship. /// The rate of forgeting (number of characters to be deleted everyday) is based on Ebbinghaus Forgetting Curve, a mathematical model that describes how quickly people forget information over time.

Fellow artist: thou

Technologist: Nhân Phan

Research advisor: Yui Nguyễn

Technical support: Khôi Nguyễn

để quá khứ sẽ (tiếp diễn) (loosely translated: for the past will (continue)) delves into the layered intersections of memory, language, and urban landscapes. Through an autogenerated poetry mechanism, shadows of trees cast on walls by sunlight serve as a metaphor for the way the past unfolds and lingers in the present. This work is a manifestation of tree’s “freedom of speech’; it allows the trees to

utter the things they cannot talk, about the tree lines that had fallen, for

the bridges and pillars to come, about

the scorching misery that bestowed, upon

the denuded solitude that casted its shadow, over

the trees enlisted to be chopped, off

the promises to re-plant, at

the promises that decaying on paper, about

the continuation of the past continuous.

Drawing from over 50 news articles about Metro Line No. 2, the artist selects 49 key words and phrases, constructing a lexicon of the trees. These words are inserted into a repetitive syntax—"để...sẽ...để...sẽ..." (to…[we] will…to…will) —that loops endlessly, pushing language to its breaking point, where meaning dissolves into oblivion and promises become empty shells. The work is a poetic confrontation between technology and nature, randomness and structure, where sunlit shadows evoke the violent history of urban deforestation. It highlights the tension between the natural world and the relentless forward march of urbanization, exposing the fragility of both memory and language in the process.

Article headlines about the project of Metro 2.

Keywords are randomized into a grid.

Generated poems in the syntax "để...sẽ...để...sẽ..."

thou explained the project to the audiences at 3nam Studio.

Marks of sun spot for performance at 3nam Studio (09/2024).

Audiences observed the poems at 3nam Studio (09/2024).

Go to app ⎯ https://tiengviet.netlify.app/

Typing is a key interaction between us and computer. Typing is a way of using language. Users forming the words in the mind and put them into life by physically pressing on keys. Typing reflects how we use language to rationale and navigate in digital era.

Early 2024, CodeSurfing hosted a research called "Our Vietnamese Project". From the research, we built a dataset of all the mono-syllables of Vietnamese, extracted from "Từ điển tiếng Việt" (Vietnamese dictionary) by Prof. Hoàng Phê. From 30.000 Vietnamese words appear in the dictionary, we created a dataset of roughly 6,000 mono-syllables. While analyzing that monosyllabic dataset, we recognized a structure of Vietnamese. Despite the alphabetic outerwear, the Vietnamese body is actually constructed by:

consonant + vowel + consonant

The current use of alphabetic keyboard (QWERTY) erases that structure and forces Vietnamese to become a character-based language, instead of a component-based language. Each part of a Vietnamese syllable carries certain pronunciation and certain feeling. For example: “phập” starts with “ph” that force us to breath out briefly, “ậ” carries the air back in to fill the mouth, then “p” force the lips and teeth to close abruptly. All of them together to create “phập”, which means a quick and exact bite. Think vampire. The QWERTY keyboard erases that syllabic functionality. And worse, with rules and automation, users are not allowed to form words that are possible to form and pronounce in Vietnamese but not yet appear in the dictionary. There are more than new 70,000 syllables like that, compared to 6,000 official syllables. This proves the abundance and flexibility of our language and suggest endless potential for the growth of Vietnamese.

Display of keyboard together with its research in Ho Chi Minh City (09/2024)

Python snippet to extract initial consonants (phu am dau), ending consonants (phu am cuoi), and group vowels (nguyen am) from all the words in Vietnamese dictionary.

A snapshot of the Vietnamese 30,000-word corpus from Vietnamese dictionary by Prof. Hoang Phe.

和 (hoà) is the state of this universe where everything aligns. — “Thiên thời, địa lợi, nhân hoà” (As above, so below).
和 (hoà) calls for the mediation between human and non-human.
和 (hoà) calls for peace.

"Thu ăn măng giá, đông ăn trúc.
Xuân tắm hồ sen, hạ tắm ao."

和 (hoà) is my ever-echoing chanting for us to immerse back into the nature.

First thing I said when a guy took my shirt off was always ⎯ “Am I too skinny?”

From 2019 to 2022, my best friends and I made occasional getaways to the beaches in Vietnam. There we swam, sunbathed, read, danced, and radiated under the sun. There I found comfort in my own skin. There I realized the beauty of our shapes. The whole process is a healing journey for me.

This pocket notebook includes all the portraits and self-portraits that I documented during the time.

“My boyfriend once said that I was so tiny
That he could carry me in his pocket anywhere
So put me in your pocket
Use me as your time goes by
Use my body as your late-night canvas
Write on me
Compose on me
Fast on me
Slow on me
Release on me
Spit on me
Piss on me
Bleed on me”

✣＊✣
Produced by wedogood.
64 pages on risograph using aqua, yellow, flourescent pink.

Every year when the cicadas start to sing, I miss Japan dearly, as if a part of myself had been buried under the Minami Ishigaki park, where we hung out by the swings, singing, and smoking.

This summer, as the cicadas are singing again, I invited Cao Mieu to join me in a conversation about our Japanese memoirs. But instead of texts, we would reply to each other with artworks. Every page is a response to the previous. All communication takes place only within these pages.

I lost my residence card years ago. Mieu still has hers, so she will hereby board the page first.

Back in April, wedogood invited me to join their zine with the theme of “Love Machine. Machine Love”. This poster is a stand-alone version of my work in the zine. More than a collection of quirky-looking toys, It reflects our current perception of sex toy design (dildos and butt plugs in particular) while suggesting new boundaries for toy design.

After being trained with 3000 photos of toys, the generative model clearly gets the idea that a sex toy needs to be pointed (of course). But it takes the idea further by re-imagining toys with multiple heads, and toys with irregular shapes or shapes that are different from cylinders. Several generated samples also include toys that are bound together since e-commerce often places their toys next to each other in product photos. If such an arrangement stimulates the buyer, then why not include them in the real product design? Many of the generated samples also propose getting rid of the inside of the toys as it is not a significant feature. They suggest void, disjoint parts, transparent material, and anything else but the common solid shape. Pleasure has its own curiosity. And maybe toys for pleasure should also be more suggestive, rather than adaptive.

This project is built on my custom GAN model, inspired by StyleGAN2. The StyleGAN2 architecture itself is gigantic. To afford training, I made multiple adjustments in the architecture, including downsizing the output image size to 128x128. This seriously damaged the print quality but Risograph helped me bypass that. I also divided the training into multiple sessions + used the Tensorflow Data Dataset & Tensorflow Record to optimize the whole training speed. All are for this project to be run on the free resource of Google Colab, which has a limited quota every day. So much engineering just to have more dicks while paying less 🥴

WHY RISOGRAPH?

Generative art is not for size queen. Artworks generated from ML model struggle to have a good resolution. A simple image of 300x300 would take 90,000 units when being flattened. It means that a larger output images come with a larger cost of computation. It often requires days of training on expensive GPU. When it comes to printing, this limit in output results in pixelating details, blurry edges, and inconsistent separation between object and background. Not only that, generative images oftentimes have the checkerboard effect, as a result that the machine “painted” each pixel independently and lack of perception of the image as a whole.

In order to produce this digital artwork in high-quality print (A3), we first put the 128x128 generated images through a half-toned treatment - a technique to simulate the image tone through dots. By carefully adjusting the dot size, we gave the pixelated images a sharper optic illusion in general. A subtly similar pair of aqua ink and purple paper were then chosen to let the half-toned dots blend smoothly with the background. The various size of dots + different % of ink embrace the blurry edge. The aqua ink also expands optically when we tilt the poster to different light direction. Object edges “fade” gradually into paper like chalk. The drawback of pixelating and not having sharp edges is now a compliment toward the initial inspiration of stains.

⎯ Early Programming Practice

The normal way of watching a movie involves experiencing it frame by frame, unfolding linearly over time as visual elements build upon one another. This project reimagines that approach, offering a single, comprehensive view to explore the visual essence of Wes Anderson's films.

To achieve this, each frame of the film was transformed from its original rectangular shape (720x1280) into a long, compressed strip (1x921,600 pixels). These strips were then stacked vertically to create the final artwork. As a result, the movie unfolds vertically from top to bottom, representing the progression from beginning to end. Horizontally, our eyes trace a zigzag pattern, moving through individual scenes from left to right and top to bottom.

Moonrise Kingdom (2012)

Moonrise Kingdom is divided into two distinctive palettes: before the storm and after the storm. The “before the storm” embraces the warm colors of yellow, green, and brown, with scenes mostly shot in bright sunlight while the “after the storm” rages in cooler shades of blue and teal, with lots of scenes without the sun or even in the dark.

The transition from the bright color to the darker one doesn’t follow the change of nature (the arrival of the storm) in the movie. It, however, follows the transition of the characters’ emotions. The change started right after Sam and Suzy got caught by the beach. The following scene of Suzy’s conversation with her mom immediately takes the sunlight out and drowns the movie in the cold tub. Perhaps, the movie’s real storm already raged after that conversation.

The Grand Budapest Hotel (2014)

The movie has many noticeable black columns that run vertically. Their widths vary in the beginning, then become consistent as the movie goes on. These black columns are created from the black margin of the frames. Different size of black margin signifies different screen ratios. In fact, Wes Anderson intentionally used different screen ratios to mimic different eras’ cinematic styles. The 80’s ⎯ 1.85 : 1, The 60’s ⎯ 2.40 : 1, The 30’s ⎯ 1.37:1.

The movie is clearly divided into blocks of colors. Each group of scenes is in one distinctive palette of color. The transition of color is both more extreme and playful than in his early works - Moonrise Kingdom and The Life Aquatic of Steve Zissou.

Isle of Dogs (2018)

Continued with the idea of using colors to define space for characters’ emotions, Isle of Dogs used extreme colors, black and white, to depict two different groups of scenes: the trash island and the city hall. Yellow strips that run horizontally between them are Tracy Walker. She brings light to the revolution of Atari and the dogs.

Several groups of scenes in Isle of Dogs maintain a fixed layout, with both characters and the camera making minimal moves. For example, in the white strip area in the middle, we can see that the black area (the characters) stays in place for several continuous scenes. This can be the effect of stop motion, where continuous scenes have very subtle changes, so the audiences can really focus on such change and the “stop-motion delay” between the change. For example, the making sushi scene. However, when considering other movies by Wes Anderson, The Grand Budapest Hotel also has this same pattern. Many scenes in the movie, especially scenes where characters discuss, have very minimal camera movement. So rather than highlighting the effect of stop motion, in these scenes of Isle of Dog, Wes Anderson is leveraging stop motion to achieve his own distinctive technique. They play out so well and compliment each other.

Isle of Dogs

The Grand Budapest Hotel

View project on Github

VISION2020 aims at recovering a high resolution image from a low resolution one. The project is based largely on the excellent research of Xintao Wang, et al. on ESRGAN (2018) and their implementation using Pytorch. Inspired from the research, my version of ESRGAN is optimized and built entirely on Tensorflow 2.0. It successfully resizes the image up to x64 on square area.

Single image super-resolution (SISR), as a fundamental low-level vision problem, has attracted increasing attention in the research community and AI companies. SISR aims at recovering a high-resolution (HR) image from a single low-resolution (LR) one. Since the pioneer work of SRCNN proposed by Dong et al., deep convolution neural network (CNN) approaches have brought prosperous development. Various network architecture designs and training strategies have continuously improved the SR performance.

The Super-Resolution Generative Adversarial Network (SRGAN) is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied with unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGAN - network architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN).

In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea from relativistic GAN to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality with more realistic and natural textures than SRGAN.

fig1 ⎯ (x4 per dimension) generated image successfully retains small detail like the strip at the shoulder area and the human head.

fig2 ⎯ (x4 per dimension) Natural features like eyes are well reconstructed.

fig3 ⎯ (x8 per dimension) Double challenging, then model successfully reconstruct pattern and lines.

fig4 ⎯ (x8 dimension) Letters are brought back to vision.

View project on Github 👾

Optical Character Recognition is one active field that bridges between computer vision and natural language processing. As much as the field emerges within machine learning community, it still performs poorly on local language, including Vietnamese with our distinctive symbol (ễ, ẩ, ứ for example). The lack of data is one of the main reason behind it. This project is a submission to a hackathon hosted by Cinnamon AI on that challenge of Vietnamese OCR. The dataset includes all the address written in Vietnamese. The model can be immediately apply in post service to alleviate the need of manual input.

⎯ EVALUATION

Character Error Rate: 0.04
Word Error Rate: 0.14
Sentence Error Rate: 0.82

The hackathon's leader score is 0.1x on the Word Error Rate. Other metric results were not disclosed.

⎯ SAMPLE PREDICTIONS (T: True Label, P: Prediction)

⎯ DATA PREPROCESSING: Using OpenCV with 3 phases

Thresholding
Resize to 128x1024
Remove Recursive (reference to A. Vinciarelli and J. Luettin)

Raw data

Processed data

⎯ MODELLING

CRNN + CTC Loss is used to solve this challenge.
CNN blocks with skip connections (inspired by ResNet50) are used to extract the features from input images.
The extracted feature map will be then passed through the LSTM layers.

Training Log

View project on Google Colaboratory 👾

This project is an entry of the corresponding Kaggle competition.

Every minute, the world loses an area of forest the size of 48 football fields. And deforestation in the Amazon Basin accounts for the largest share, contributing to reduced biodiversity, habitat loss, climate change, and other devastating effects. But better data about the location of deforestation and human encroachment on forests can help governments and local stakeholders respond more quickly and effectively.

From a dataset contains more than 40.000 images, this analysis uses Deep Learning to classify the spatial images of the Amazon forest taken by the satellite. From that, it hopes to shed a light on understanding how the forest has change naturally and manually. Thus, help preventing deforestation.

⎯ SAMPLE PREDICTION

Evaluation

⎯ SOME CHALLENGES

Multi-label: Each image is labeled with multiple tags (at least 2, at max 9). The tags fall into 17 categories, which are the forest landscape types. Since the tags in each label are mutually exclusive, they are treated as multiple binary classification problems. Thus, binary cross-entropy are chosen to be the loss function.
Imbalance: The dataset is severely imbalance with tags like Primary or Agriculture appear in 90% of the dataset. While other tags like Blooming or Conventional Mine can only be seen in less than 500 observations (even less than 100 for Burn Down).

In the first base-line experiment, the model was totally bias toward the major tags. It predicts the major tags to appear in every data and almost never made a prediction with the minor tags. To tackle the problem of imbalance dataset, evaluation metrics must be chosen carefully. F2 is chosen to be the main metrics to evaluate the training. It watches over the harmonic mean between the Precision and Recall while favors Recall specifically. In other word, it is the attempt to reduce the number of False Negative, where the model fails to identify the absence of a tag.
Optimization: 400.000 images, a CNN model, and Google Colab's limited resource do not seem to mix well together. The training was slow at first and interupted often. Several improvements, mostly on the Tensorflow pipeline, were conducted to speed up the training: Using TFRecord to convert the raw images into byte-like data to reduce the amount of time spending on reading data from their paths.

Using tf.data.Dataset with shuffle, map, batch, prefetch to optimize the reading data process by redistributing the tasks for agents to work concurrently, thus, avoid bottleneck. An attempt to use cache was also made but failed due to the limited RAM.

Processing image with Tensorflow: The dataset contains images in JPG - RGBA. The built-in decode function tf.io.decode_jpeg only works on 1 or 3-channel image. Attempt on encoding a JPG RGBA image returns black black and black. We need a tensorflow encoding function to work in this part because the pipeline is built entirely on Tensor for the optimization purpose. To tackle the problem, the raw images were first read by Matplotlib then converted into byte-like and wrote into TFRecords. When reading the data from TF Record, instead of using the built-in decode image function, we use tf.io.parse_tensor following with reshaping.

Nhân Phan

⎯ Selected Works

(2024) Erase.

(2024) để quá khứ (sẽ tiếp diễn)

(2024) Viet Syllable Keyboard

(2024) 和

(2022) Beach Pocket

(2022) ガイジャ別府

(2022) live. laugh. dick.