Cluster Animation

Week 18
Week 17
Week 16
Week 15
Week 13
Week 12
Week 11
Week 9
Week 8
Week 7
Week 5

Week 18

Final full-screen highlight to identify gene macropixel being changed

Test bullseye to better identify gene macropixel being changed

Animation updated visual running with water system

Updated animation visuals to better emphasize the gene macropixel getting changed and the color change
Adjusted the max number of bases in a sequence that can be changed by any one data input
Testing complete program (animation and water system)
- Fixed screens going out of sync due to time delays between Arduino data input and animation pauses
- Plan to adjust initial main tank water level to be above middle level slightly (to ensure cyan dye does not drain before other behaviors can occur)
- Note that starting the animation program causes the Arduino to reset.

Week 17

Arduino connected to raspberry pi by serial port

Isolated gene updated color (row 2, column 3 monitor)

Gene grid with updated gene color close-up

Animation with randomized gene selection

Running on 8-screen cluster

num_of_processes = 9

Randomized gene that gets updated for each data send

gene_rowid = random.randint(1, len(sequence_list)) #rank 0

Testing different animation background colors and room lighting for visibility (sticking with dark/neutral gray)

Week 16

Animation Composition Adjustment

Gene macropixel grid across full screen space (2x4 monitor set-up), centered, not subdivided per screen.

if rank != 0:
    #...#
    # create base pixel screen (separate surface)
    base_genome_surface = pygame.Surface(universe_size)
    base_genome_surface.fill(bg_color)
    for row in range(universe_grid_y): # left to right, top down ordering of genes from dataset
        for col in range(universe_grid_x):
            color_id = color_id + 1
            if color_id >= len(sequence_colors):
                color_id = color_id_start
            try:
                pygame.draw.circle(base_genome_surface, sequence_colors[color_id], (upper_left_corner[0]+(pixel_width//2)+(pixel_width*col), upper_left_corner[1]+(pixel_height//2)+(pixel_height*row)), pixel_radius)
            except:
                print("Exception: color_id set out of bounds of sequence_colors at index", color_id)


    # create screen/surface for showing just the updated element
    update_gene_surface = pygame.Surface(universe_size)
    update_gene_surface.fill(bg_color)

Periodic constant Arduino data sent by serial into animation, including tweaks for rate of sending due to difference between data sending speed and animation updating speed.

while True: # rank 0, code based on https://roboticsbackend.com/raspberry-pi-arduino-serial-communication/
        if ser.in_waiting > 0: # according to the arduino sends, should occur at the same rate as the arduino timer (5s, maybe make greater)
            line = ser.readline().decode('utf-8').rstrip()
            print(line)
            arduino_data = line.split(",") # ["#", "#", "#", "\n"]

// from SafeString V3 library https://www.forward.com.au/pfod/ArduinoProgramming/TimingDelaysInArduino.html#examples
#include "loopTimer.h"
#include "millisDelay.h"
#include "BufferedOutput.h"
createBufferedOutput(bufferedOut, 80, DROP_UNTIL_EMPTY);
millisDelay printOutTimer;


void setup() {
  Serial.begin(9600);
  bufferedOut.connect(Serial);
  printOutTimer.start(1000*10); // every 10 sec
}

void loop() {
  bufferedOut.nextByteOut();
  if(printOutTimer.justFinished()){
    printOutTimer.repeat();
    bufferedOut.print("400");
    bufferedOut.print(",");
    bufferedOut.print(220);
    bufferedOut.print(",");
    bufferedOut.print("600");
    bufferedOut.print(",");
    bufferedOut.println();
  }
}

Week 15

Animation Test on Cluster without serial data

Mapped sensor data ranges into colors and bases that directly replace bases in the given gene sequence of the genome data.

m_instances = int(np.interp(arduino_data[0], [0, 3000], [0, (len(gene_edit)-1)//3])) # number of bases to replace with M, direct map higher val = more bases
c_instances = int(np.interp(arduino_data[1], [120, 362], [0, (len(gene_edit)-1)//3])) # number of bases to replace with C, map lower water level w more bases (more dye added)
y_instances = int(np.interp(arduino_data[2], [0, 1023], [0, (len(gene_edit)-1)//3])) # number of bases to replace with Y, map only above threshold values; use if statement to make negatives into 0
print("# of bases to change m c y:",m_instances, c_instances, y_instances)
# get nonrepeating indices in sequence to change
replace_indices = random.sample(range(len(gene_edit)), k=(m_instances+c_instances+y_instances)) # gets nonrepeating set (equal to total bases to be replaced) of indices in sequence to change num
## https://stackoverflow.com/questions/1228299/changing-a-character-in-a-string/22149018#22149018 -> rec (for speed), splice a replacement in immutable string
# change the base at the specified index
for m in range(m_instances): # go over the first m_instances # of indices to replace, remove them each iteration to cont. use 0 index of sample list
     gene_edit = gene_edit[:replace_indices[0]] + "M" + gene_edit[replace_indices[0]+1:] # splices around the index to change (i), and concatenates the new "base" in between
     replace_indices.pop(0)
for c in range(c_instances): # go over the 1st(next) c_instances # of indices to replace, remove them each iteration to cont. use 0 index of sample list
     gene_edit = gene_edit[:replace_indices[0]] + "N" + gene_edit[replace_indices[0]+1:] # splices around the index to change (i), and concatenates the new "base" in between
     replace_indices.pop(0)
for y in range(y_instances): # go over the 1st(next) y_instances # of indices to replace, remove them each iteration to cont. use 0 index of sample list
     gene_edit = gene_edit[:replace_indices[0]] + "Y" + gene_edit[replace_indices[0]+1:] # splices around the index to change (i), and concatenates the new "base" in between
     replace_indices.pop(0)

Divided portion of genes to each screen (approximately 18k gene macropixels per screen).
- Ideating how to divide genes continously across all screens (instead of assigning a specific subset to individual screens).

color_id_start = (grid_x*grid_y)*(rank-1) # set the index in the list of gene sequence colors for the given screen to start on (rank)
color_id = color_id_start

Scattered gene sequences across cluster nodes to process gene colors faster.

if rank == 0:
    #.....#
    num_of_processes = 4 # CHANGE WHEN PROGRAM IS ADJUSTED FOR NUM OF PROCESSES IN MPIRUN
    split_gap = len(sequence_list) // num_of_processes
    sequence_list_scatter = []
    for i in range(num_of_processes): # 0 1 2 3 4 5 6 7 8
        if i != (num_of_processes-1):
            sequence_list_scatter.append(sequence_list[split_gap*i:(split_gap*i)+split_gap])
        else:
            sequence_list_scatter.append(sequence_list[split_gap*i:]) # to end


# scatter a 2d list of num_of_processes length
print("rank", rank, "waiting for sequence_list scatter")
sequence_list_scatter = comm.scatter(sequence_list_scatter, root=0)
print("rank:", rank, "received portion of sequence_list of length", len(sequence_list_scatter))

Using MPI broadcast function and sleep function to control speed of screen changes in animation.

print("new color:", sequence_colors[gene_rowid-1])
print("rank", rank, "sending new sequence_colors to ani")
comm.bcast(sequence_colors, root=0) # send out list again (could alternatively send out the sequence alone + rowid but idk if thats any faster)
# ^ at this point, the animation updates color for a moment on black screen still, then returns to normal full display; and rpi5 waits for new serial again
print("rank", rank, "sent new sequence_colors to ani")

Animation running while excluding the primary node from showing animation.
- 8-screen cluster will have the primary node without a screen, so the command separates it from the animation and makes it only handle data processing.

>mpirun --host node1,node2:2,node3:2,node4:2 python3 gene_animation.py

Week 13

Planned order of animation for MPI and adjusting database directly.

Notes on MPI code outline for updating animation page 9.

Tested broadcasting lists with MPI as potential format for distributing genome and Arduino data across all 5 computers.

Function for generating color of a gene sequence:

def sequence_color(sequence):
    # generates the sequence color based on % of certain nucleotides in the sequence
    len_sequence = len(sequence)
    red_count = 0
    green_count = 0
    blue_count = 0
    for n in sequence:
        if n == "A":
            red_count += 6
        elif n == "C":
            green_count += 6
        elif n == "G":
            blue_count += 6
        elif n == "T":
            red_count += 2
            green_count += 2
            blue_count += 2
        elif n == "M":
            red_count += 3
            blue_count += 3
        elif n == "N": # cyaN (since c is already used by genome)
            green_count += 3
            blue_count += 3
        elif n == "Y":
            red_count += 3
            green_count += 3
        else:
            pass # not one of the bases, could be U? otherwise an error
    # count of each nucleotide is worth 6 total so need to scale up sequence to 6 to get proper percentage i think..
        # percentage of each color to be part of the RGB color of the overall sequence
    red_per = red_count / (len_sequence*6)
    green_per = green_count / (len_sequence*6)
    blue_per = blue_count / (len_sequence*6)
    # print(red_per, green_per, blue_per)

    # actual color
    red_val = 255*red_per //1
    green_val = 255*green_per //1
    blue_val = 255*blue_per //1

    # return color
    # print(red_val, green_val, blue_val)
    return (red_val, green_val, blue_val)

Week 12

Notes on total number of pixels and number of sequences in data page 2.

Notes on gridding screen space for individual gene sequences page 3.

Notes on new visual mappings for each gene page 5.

Concept notes on new animation visuals.

Week 11

Using PySerial to communicate Arduino's sensor data to Raspberry Pi program.

Reference for PySerial: https://roboticsbackend.com/raspberry-pi-arduino-serial-communication/

Confirming that BufferedOutput library can be used to send data across serial port in place of Serial (Run PySerial program from roboticsbackend):

// from SafeString V3 library https://www.forward.com.au/pfod/ArduinoProgramming/TimingDelaysInArduino.html#examples
#include "loopTimer.h"
#include "millisDelay.h"
#include "BufferedOutput.h"
createBufferedOutput(bufferedOut, 80, DROP_UNTIL_EMPTY);
millisDelay printOutTimer;

void setup() {
  Serial.begin(9600);
  bufferedOut.connect(Serial);
  printOutTimer.start(5000); // every 5 sec
}

void loop() {
  bufferedOut.nextByteOut();
  if(printOutTimer.justFinished()){ // print out the millis every 5 seconds
    printOutTimer.repeat();
    bufferedOut.println(millis());
  }
}

Send Arduino sensor data (through the global behavior variables) to the Raspberry Pi in the main code (snippet) [Removed extreme cases created in Week 10 due to errors making it impossible to properly test PySerial otherwise, see Week 10 Water Environment notes]:

void printData(){
  if(printOutTimer.justFinished()){
    printOutTimer.repeat();
    bufferedOut.print(socialize);
    bufferedOut.print(",");
    bufferedOut.print(hydrate);
    bufferedOut.print(",");
    bufferedOut.print(create);
    bufferedOut.print(",");
    bufferedOut.println();
  }
}

Week 9

WIP Genome Animation on 8-screen cluster set up

import os
import os.path
import sys
import pygame
from pygame.locals import *
import time
import random
from mpi4py import MPI
import math
import sqlite3

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# set window position
os.environ['SDL_VIDEO_WINDOW_POS'] = "0,0"
# get this display
#print(dict(os.environ))
os.environ['DISPLAY'] = ':0.0'
# with caffeine installed (sudo apt install caffeine) we can wake the screen
# os.system('caffeinate sleep 1') # passes caffeinate out to shell to wake screensaver

# init clock and display
clock = pygame.time.Clock()
pygame.display.init()
pygame.mouse.set_visible(False)

# get the screen hight and width
disp_info = pygame.display.Info()
width = disp_info.current_w
height = disp_info.current_h
screen_size = (width,height) # adjust screen size manually is monitor is not the same dimensions as the main monitor
universe_size = (width*2, height*2)
screen = pygame.display.set_mode((screen_size), pygame.FULLSCREEN)

connection = sqlite3.connect('genome_human.db') # change to full path name?
cursor = connection.cursor()

cursor.execute("SELECT sequence FROM genome_human WHERE protein LIKE '%uncharacterized%'")
# SELECT * will return [(gene, protein, sequence),...]
    # () <- tuple so even if 1 its still (value,) at least
    # obtain thru tuple indexing: seq1 = var[0][0] (1st list item, index 0 of tuple)
var1 = cursor.fetchall()
connection.close()

def nucleotide_color(nucleotide):
    if nucleotide == "A":
        return (255,0,0) #red
    elif nucleotide == "C":
        return (0,255,0) #green
    elif nucleotide == "T":
        return (0,0,255) #blue
    elif nucleotide == "G":
        return (100,0,255) #purple?
    else:
        return (0,0,0)

def sine_wave(shift,sequence):
    len_sequence = len(sequence)
    for i in range(height):
        # sine transformations -> bsin(ax-w)+h (b increases amplitude, a gives inverse frequency increase, w shifts horizontally, h shifts vertically)
        # pygame.draw.line(screen,(50,100,255),(0,i),((width/2)+((width/2)*math.sin((height/4)*(i-shift))),i))
            # moves down the screen
        # pygame.draw.line(screen,(50,100,255),(0,i),((width/2)+((width/2)*math.sin((height/4)*i)),i))
            # this (width/2)+((width/2)*math.sin((height/4)*i)) did NOT do what i expected at all but its GORGEOUS
        if i >= len_sequence:
            pygame.draw.line(screen,(255,255,255),(0,i),(int((width/2)+((width/2)*math.sin((height/4)*i))),i))
        else:
            pygame.draw.line(screen,nucleotide_color(sequence[i]),(0,i),(int((width/2)+((width/2)*math.sin((height/4)*i))),i))
    return

framerate = 30
runtime = 10 #seconds
protein_index = 0
# time to run
# timeToRun = framerate*runtime
timeToRun = len(var1)

for f in range(timeToRun):
    # screen.fill((50,30,255))
    screen.fill((0,0,0))
    sine_wave(f,var1[protein_index][0]) #shift later to go thru each sequence, this would be per frame but i could try other methods later
    protein_index = protein_index + 1
    if protein_index > len(var1):
        protein_index = 0
    comm.Barrier()
    pygame.display.update()
    clock.tick(framerate)

Week 8

Preliminary animation of human genome uncharacterized proteins. Each frame is an individual sequence with the nucleotides mapped individually in color. (See code below)

# from oneball.py (should also copy onto cluster) of VIS 141A
import os
import os.path
import sys
import pygame
from pygame.locals import *
import time
import random
# from mpi4py import MPI
import math
import sqlite3

# comm = MPI.COMM_WORLD
# rank = comm.Get_rank()
# size = comm.Get_size()

# set window position
os.environ['SDL_VIDEO_WINDOW_POS'] = "0,0"
# get this display
#print(dict(os.environ))
os.environ['DISPLAY'] = ':0.0'
# with caffeine installed (sudo apt install caffeine) we can wake the screen
# os.system('caffeinate sleep 1') # passes caffeinate out to shell to wake screensaver

# init clock and display
clock = pygame.time.Clock()
pygame.display.init()
pygame.mouse.set_visible(False)

# get the screen hight and width
disp_info = pygame.display.Info()
width = disp_info.current_w
height = disp_info.current_h
screen_size = (width,height) # adjust screen size manually is monitor is not the same dimensions as the main monitor
# universe_size = (width*2, height*2)
screen = pygame.display.set_mode((screen_size), pygame.NOFRAME)

# sqlite search cross with csv --> get list of uncharacterized protein codes, find each of those in sqlite database
    # wondering if i should just put the protein names into sqlite db as well to better describe the protein and just search that way
    # updated database: genome_human (gene VARCHAR(14), protein VARCHAR(132), sequence VARCHAR(107976), changed_sequence INTEGER)
        # defaults changed_sequence to 0 (represents false), changes to 1 if serial data being read is giving into a change to the sequence (duplicate database or dataset to read)

connection = sqlite3.connect('genome_human.db') # change to full path name?
cursor = connection.cursor()

cursor.execute("SELECT sequence FROM genome_human WHERE protein LIKE '%uncharacterized%'")
# SELECT * will return [(gene, protein, sequence),...]
    # () <- tuple so even if 1 its still (value,) at least
    # obtain thru tuple indexing: seq1 = var[0][0] (1st list item, index 0 of tuple)
var1 = cursor.fetchall()
# cursor.execute("SELECT protein FROM genome_human WHERE protein LIKE 'uncharacterized%'")
# var2 = cursor.fetchall()
# print(len(var1)) #1638 (for protein)
# print(len(var2)) #1564 (for protein)

# check proteins that have augment before "uncharacterized"
# for p in var1:
#     if p not in var2:
#         print(p)
# described uncharacterized:
#     ('LOW QUALITY PROTEIN: uncharacterized protein',)
#     ('putative uncharacterized protein C3orf49',)
# do i want to include those above or even make them smth different based on being putative or low quality?
# print(var)
framerate = 24
runtime = 10 #seconds

def nucleotide_color(nucleotide):
    if nucleotide == "A":
        return 50
    elif nucleotide == "C":
        return 100
    elif nucleotide == "T":
        return 150
    elif nucleotide == "G":
        return 200
    else:
        return 255

def sine_wave(shift,sequence):
    len_sequence = len(sequence)
    for i in range(height):
        # sine transformations -> bsin(ax-w)+h (b increases amplitude, a gives inverse frequency increase, w shifts horizontally, h shifts vertically)
        # pygame.draw.line(screen,(50,100,255),(0,i),((width/2)+((width/2)*math.sin((height/4)*(i-shift))),i)) 
            # moves down the screen
        # pygame.draw.line(screen,(50,100,255),(0,i),((width/2)+((width/2)*math.sin((height/4)*i)),i))
            # this (width/2)+((width/2)*math.sin((height/4)*i)) did NOT do what i expected at all but its GORGEOUS
        if i >= len_sequence:
            pygame.draw.line(screen,(255,255,255),(0,i),((width/2)+((width/2)*math.sin((height/4)*i)),i))
        else:
            pygame.draw.line(screen,(50,100,nucleotide_color(sequence[i])),(0,i),((width/2)+((width/2)*math.sin((height/4)*i)),i))
    return

protein_index = 0
# time to run
# timeToRun = framerate*runtime
timeToRun = len(var1)

for f in range(timeToRun):
    # screen.fill((50,30,255))
    screen.fill((0,0,0))
    sine_wave(f,var1[protein_index][0]) #shift later to go thru each sequence, this would be per frame but i could try other methods later
    pygame.display.update()
    protein_index = protein_index + 1
    if protein_index > len(var1):
        protein_index = 0
    clock.tick(framerate)

connection.close()

Week 7

2x4 screen display setup with vertical screen placement

Rearranging screen display for cluster: 3x3 horizontal screens to 2x4 vertical screens. Intended to improve scaling and utilization of wall space.

Fixed genome .fna file reading: Wrote out gene ID, protein name, and sequence to SQLite database. (See code below)

Fixed PySerial communication and ran sample code to print out simple Serial output from Arduino:

https://roboticsbackend.com/raspberry-pi-arduino-serial-communication/

# reading the human genome fna file

# code ref from ch 19 https://coding4medicine.com/backup/Python/reading-fasta-files.html
import sys
import time
import sqlite3

f=open("ncbi_dataset/data/GCF_000001405.40/cds_from_genomic.fna", "r") # open file with the genome data

connection = sqlite3.connect('genome_human.db') # change to full path name?
cursor = connection.cursor()

cursor.execute('CREATE TABLE genome_human (gene VARCHAR(14), protein VARCHAR(132), sequence VARCHAR(107976), changed_sequence INTEGER)') #determine the appropriate varchar max for each of these by reading through the file
# https://dba.stackexchange.com/questions/190674/can-a-database-table-not-have-a-primary-key 
    # use rowid to see primary key
# changed_sequence is boolean as integer (0 false, 1 true) --> indicates whether the sequence item has been changed or not by external adjustments to its nucleotides

connection.commit() # commit to creating the file before trying to insert into it

for line in f:
    # read over each line of the file
    if(line.startswith(">")):
        if(new_gene):
            new_gene = False
            sequence_list = sequence_string.split("\n") # remove the \n from btwn each sequence subpart from the fna format and any blank lines
            sequence_string = "" #clear out sequence so that it doesnt concatenate old sequences w new sequences
            sequence_output = "".join(sequence_list) # join together that list to form one continuous sequence line

            cursor.execute('INSERT INTO genome_human VALUES (?,?,?,0);', (gene_id, protein_name, sequence_output))

        identity = line.split(" [") # split the elements of each into 1c1, gene], db], pro], proid], loc], gbkey] (since protein= can have spaces in btwn)
            # gene always gene=###] --> same way to obtain as protein_id with splice or split and cut 5 (after =) to length
            # protein_id will always be protein_id=#####] so can obtain number as split at = and get num minus ending bracket or splice index 11 to length (to keep out bracket)
        # get gene
        for i in identity:
            if(i.startswith("gene=")):
                gene_id = i
                break
        gene_id_len = len(gene_id)
        gene_id = gene_id[5:(gene_id_len-1)] # cuts out "gene=" and "]" ; alternately can put len("gene=") in place of 6

        for i in identity:
            if(i.startswith("protein=")):
                protein_name = i
                break
        protein_name_len = len(protein_name)
        protein_name = protein_name[len("protein="):(protein_name_len-1)]
        
    else:
        new_gene = True
        sequence_string = sequence_string + line # once starting a new gene, the string will be split at "\n" and pieced back together as a string

print("Done")
f.close()
connection.commit()
cursor.close()
connection.close()

Week 5

Researching FASTA format of human genome data:

Testing reading out genome data in Python:

Issue: CSV file ended up being over 9GB!! (from a 2GB FASTA .fna file)
Resolution: The variable holding each sequence was not being cleared for each new sequence and continuously concatenated the sequences (memory leak...).

Current issue: PySerial not installing on Raspberry Pi 5 (Spoiler Alert: It was already pre-installed)

Table of Contents