Q-Learning-APL


Warning: This attempt failed. A detailed description is here.

Source Code

Q

Concept

Prototype of an autonomous parking lot.

Development Environment

Followed two tutorials (link)

Toy car: Yahboom car (link) with Raspberry Pi 4B

GPU: GTX 1080 Ti 11GB

Summary

simple final

The parked state is marked by a sign (a red triangle with 4 blue stripes).
The car moves in a 10 * 10 virtual space, with four actions:

up
down
right
left

concept_painting

Code Explanation

Developed in three parts:

Operation of the car (direct control of the motors)
Q-learning part
Informing that the car is parked (terminal state)

car operation

link to code line

The method below moves the car.

def Car_Action(self, act_num):
    # Map the action index to a motor command (motor calls omitted here)
    if act_num == 0:
        pass  # forward
    elif act_num == 1:
        pass  # back
    elif act_num == 2:
        pass  # left
    elif act_num == 3:
        pass  # right

Q-learning

import numpy as np

EPISODES = 15
MAX_STEPS = 10

LEARNING_RATE = 0.81
GAMMA = 0.96

Q = np.zeros((100, 4))  # 10 * 10 states, 4 actions
for episode in range(EPISODES):
    state = 5  # initial state

    for _ in range(MAX_STEPS):
        # action selection and the reward computation are omitted in this simplified version
        car.Car_Action(action)
        next_state = step(state, action)

        # Q-learning update: move Q[state, action] toward reward + GAMMA * max Q[next_state]
        Q[state, action] = Q[state, action] + LEARNING_RATE * (reward + GAMMA * np.max(Q[next_state, :]) - Q[state, action])

        state = next_state

Simplified Q-learning code (full code here).

The reward is explained below.
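The simplified loop above leaves out action selection and depends on the real car, so it cannot run on its own. As a rough illustration only, here is a minimal simulated sketch of the same update rule. Everything that replaces the hardware is an assumption and not taken from the project: the simulated `step` function, the action layout in `MOVES`, the `GOAL` cell, the 0/1 reward, epsilon-greedy exploration with `EPSILON`, and the larger episode and step counts.

```python
import numpy as np

GRID = 10                                # 10 * 10 virtual space
N_STATES, N_ACTIONS = GRID * GRID, 4
LEARNING_RATE, GAMMA = 0.81, 0.96        # values from the post
EPSILON = 0.2                            # assumed exploration rate
GOAL = 99                                # hypothetical parked state (last cell)

# Hypothetical action layout: 0=up, 1=down, 2=left, 3=right
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


def step(state, action):
    """Simulated stand-in for the car: move one cell, staying inside the grid."""
    row, col = divmod(state, GRID)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), GRID - 1)
    col = min(max(col + d_col, 0), GRID - 1)
    return row * GRID + col


def train(episodes=200, max_steps=500, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        state = 5  # initial state, as in the post
        for _ in range(max_steps):
            if rng.random() < EPSILON:
                # Explore: pick a random action
                action = int(rng.integers(N_ACTIONS))
            else:
                # Exploit: pick a best-known action, breaking ties at random
                best = np.flatnonzero(Q[state] == Q[state].max())
                action = int(rng.choice(best))
            next_state = step(state, action)
            reward = 1.0 if next_state == GOAL else 0.0  # assumed reward
            Q[state, action] += LEARNING_RATE * (
                reward + GAMMA * np.max(Q[next_state]) - Q[state, action]
            )
            state = next_state
            if state == GOAL:  # terminal state: the car is "parked"
                break
    return Q


Q = train()
```

Ties are broken at random because an all-zero table would otherwise always pick action 0 and pin the simulated car against the top wall.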

parked state

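from tflite_runtime.interpreter import Interpreter

# Load the TFLite detection model and read its expected input size
interpreter = Interpreter('detect.tflite')
interpreter.allocate_tensors()
_, input_height, input_width, _ = interpreter.get_input_details()[0]['shape']

# Run detection with a 0.8 threshold (detect_objects and img are defined elsewhere)
res = detect_objects(interpreter, img, 0.8)

The sign is detected using a TFLite model.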

Parked_State_Reward = 188200

reward = ((resMat[small][0] - resMat[small][1]) **2 + (resMat[small][2] - resMat[small][3]) ** 2) / Parked_State_Reward

The car is assumed to be parked when the detected sign appears at its largest.

“resMat” can contain multiple detections (rows), because of the model’s inaccuracy or the 0.8 threshold. The model’s most accurate detection was the smallest one.
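The reward line above can be read as the squared diagonal of the detected box, divided by a constant. A minimal sketch of that calculation follows, assuming that each row of `resMat` is one detection, that columns (0, 1) and (2, 3) hold pairs of opposite box edges in pixels, and that `small` is the index of the smallest detection. The function name `parking_reward` and the 0.0 returned when nothing is detected are hypothetical.

```python
import numpy as np

Parked_State_Reward = 188200  # normalizing constant from the post


def parking_reward(resMat):
    """Size-based reward for the smallest detection, or 0.0 if there is none.

    Assumption: each row is one detection, and columns (0, 1) and (2, 3)
    hold the pairs of opposite box edges.
    """
    resMat = np.asarray(resMat, dtype=float)
    if resMat.size == 0:
        return 0.0  # assumed behavior when no sign is detected
    # Squared box diagonal for every detection (row)
    sizes = (resMat[:, 0] - resMat[:, 1]) ** 2 + (resMat[:, 2] - resMat[:, 3]) ** 2
    small = int(np.argmin(sizes))  # index of the smallest detection
    return float(sizes[small] / Parked_State_Reward)
```

Under these assumptions, the reward grows as the sign fills more of the frame, which matches the idea that the car is parked when the sign looks largest.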

Material

TensorFlow 2.0 Complete Course - Python Neural Networks for Beginners Tutorial

Tensorflow Object Detection in 5 Hours with Python

https://github.com/nicknochnack/TFODCourse

Afterthoughts

Why it failed

First, driving is a continuous action, so an approach built on a Q-table is doomed to fail. Second, the car can’t spin in place. Last, it can’t operate more than one car.

All of the above problems stem from using Q-learning with a Q-table.
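One way to see the first and last points is to count how many entries a Q-table needs. The sketch below is only a back-of-the-envelope illustration: the function `q_table_entries`, the extra heading dimension, the bin counts, and the idea of controlling several cars through a single joint table are all assumptions, not part of the project.

```python
def q_table_entries(bins_per_dim, n_dims, n_actions, n_cars=1):
    """Entries in one joint Q-table when every car's state is discretized."""
    states_per_car = bins_per_dim ** n_dims
    return (states_per_car ** n_cars) * (n_actions ** n_cars)


# The setup in this post: a 10 * 10 grid (2 dimensions, 10 bins each), 4 actions
print(q_table_entries(10, 2, 4))             # 400
# Hypothetical: add a heading dimension and use 100 bins per dimension
print(q_table_entries(100, 3, 4))            # 4000000
# Hypothetical: two cars sharing one joint table on the 10 * 10 grid
print(q_table_entries(10, 2, 4, n_cars=2))   # 160000
```

A finer grid or a second car multiplies the table size, while the original run had only 15 episodes of 10 steps each to fill it.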

Why, why it failed

I should’ve focused on the problem, not the solution.

In other words, I was so excited to have learned something that I became some kind of zealot.

Such as “Q-learning is the one and only best solution, and it can be applied to every problem!” or “Even considering another approach is HERESY!”

J, Park