Understanding the Problem and Stack (Litecoin Inference)
I decided to rip off the band-aid, use my Python skills for data science, and retire R, which really wasn't helping me learn neural networks. The first thing I had to do was figure out which Python libraries to use. This is what I settled on: a client library for Twitter, a current stock data library, and a stock technical analysis library. After that, I found a great open API for NLP sentiment analysis. Next, I needed libraries for data storage and manipulation: I used a database library to store the data and, finally, NumPy and pandas to manipulate it while experimenting.
This is what I built to solve the problem.
I pulled around 1,000 tweets a day matching the search term Litecoin and ran sentiment analysis on them. That works out to roughly 42 tweets per hour; for each hour I counted how many were positive, negative, or neutral, and stored the counts in a MySQL database table.
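The hourly bucketing step can be sketched roughly like this. It is a minimal illustration, not the code I actually ran: it assumes each tweet has already been classified and arrives as a (timestamp, label) pair, and the function name, label strings, and sample timestamps are made up for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Assumed label names; the real sentiment model may spell them differently.
LABELS = ("positive", "negative", "neutral")

def hourly_sentiment_counts(labeled_tweets):
    """Count sentiment labels per hour.

    labeled_tweets: iterable of (datetime, label) pairs.
    Returns {hour_start: {"positive": n, "negative": n, "neutral": n}}.
    """
    buckets = defaultdict(Counter)
    for created_at, label in labeled_tweets:
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        # Truncate the timestamp to the start of its hour.
        hour = created_at.replace(minute=0, second=0, microsecond=0)
        buckets[hour][label] += 1
    # Fill in zeros so every hour has all three columns.
    return {
        hour: {name: counts[name] for name in LABELS}
        for hour, counts in sorted(buckets.items())
    }

# Made-up sample data for illustration only.
sample = [
    (datetime(2023, 5, 1, 9, 5), "positive"),
    (datetime(2023, 5, 1, 9, 40), "neutral"),
    (datetime(2023, 5, 1, 9, 59), "positive"),
    (datetime(2023, 5, 1, 10, 2), "negative"),
]
hourly = hourly_sentiment_counts(sample)
```

Each entry in the result maps naturally onto one row of the sentiment table: the hour as the timestamp column plus three count columns.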
Next, I took the stock data from Yahoo Finance and ran it through technical analysis software, looking for candlestick chart patterns like three white soldiers. I saved all the standard stock data, like open price, closing price, and volume, along with the five candlestick patterns I was looking for, into a table that shares a timestamp column name with the sentiment table.
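To give a feel for what a candlestick pattern check does, here is a deliberately simplified sketch of the three white soldiers pattern in plain Python. This is not TA-Lib's algorithm (TA-Lib exposes the pattern as CDL3WHITESOLDIERS and applies stricter rules about body and shadow sizes); the rules, function names, and sample prices below are assumptions chosen for illustration.

```python
def is_bullish(candle):
    """A candle is bullish when it closes above its open."""
    return candle["close"] > candle["open"]

def three_white_soldiers(candles):
    """Return one 0/1 flag per candle; 1 marks the third candle of a match.

    Simplified rule set (an assumption, not TA-Lib's exact definition):
    three bullish candles in a row, each closing higher than the last,
    with each open falling inside the previous candle's body.
    """
    flags = [0] * len(candles)
    for i in range(2, len(candles)):
        first, second, third = candles[i - 2], candles[i - 1], candles[i]
        all_bullish = all(is_bullish(c) for c in (first, second, third))
        higher_closes = first["close"] < second["close"] < third["close"]
        opens_in_body = (
            first["open"] < second["open"] < first["close"]
            and second["open"] < third["open"] < second["close"]
        )
        if all_bullish and higher_closes and opens_in_body:
            flags[i] = 1
    return flags

# Made-up prices for illustration only.
candles = [
    {"open": 10.0, "close": 9.5},   # bearish
    {"open": 9.5, "close": 10.5},   # soldier 1
    {"open": 10.0, "close": 11.0},  # soldier 2
    {"open": 10.6, "close": 11.6},  # soldier 3
    {"open": 11.8, "close": 11.2},  # bearish
]
flags = three_white_soldiers(candles)
```

A flag column like this is the kind of thing that sits next to open, close, and volume in the stock table, one column per pattern.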
I would then build the model on this data, selecting the most relevant features.
The following are the libraries I used to collect the data and build the model.
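Before any features can be selected, the sentiment and stock rows have to be lined up on the shared timestamp column. In the database this would be a plain SQL join; the sketch below shows the same idea in Python. The column names (ts, positive, three_white_soldiers, and so on) and the values are hypothetical, not my actual schema.

```python
def join_on_timestamp(sentiment_rows, stock_rows, key="ts"):
    """Inner-join two lists of dicts on a shared timestamp column."""
    stock_by_ts = {row[key]: row for row in stock_rows}
    joined = []
    for sentiment in sentiment_rows:
        stock = stock_by_ts.get(sentiment[key])
        if stock is not None:
            # Merge both rows into one feature row for the model.
            joined.append({**stock, **sentiment})
    return joined

# Made-up rows for illustration only.
sentiment_rows = [
    {"ts": "2023-05-01 09:00", "positive": 2, "negative": 0, "neutral": 1},
    {"ts": "2023-05-01 10:00", "positive": 0, "negative": 1, "neutral": 0},
]
stock_rows = [
    {"ts": "2023-05-01 09:00", "open": 88.1, "close": 88.6,
     "volume": 1200, "three_white_soldiers": 0},
    {"ts": "2023-05-01 11:00", "open": 88.6, "close": 88.2,
     "volume": 900, "three_white_soldiers": 0},
]
features = join_on_timestamp(sentiment_rows, stock_rows)
```

Only timestamps present in both tables survive the join, so hours with tweets but no stock data (or the reverse) drop out before modeling.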
The Python Stack:
TensorFlow: A full-featured ML/AI stack from Google to build and run models in Python.
Transformers: An open ML library from Hugging Face, an open-source AI company, with the open twitter-roberta-base-sentiment-latest model and API to perform sentiment analysis.
Tweepy: Twitter client for Python that allows searches via API v2 (worked until mid-June).
Yahoo Finance: Solid stock data from Yahoo Finance
TA-Lib framework: Technical analysis framework for finding candlestick patterns in stock data. It took some compiling but worked great. https://ta-lib.org/
PyMySQL/MariaDB: Libraries used to work with my MariaDB database.
pandas / NumPy: Data and ETL handling.
The biggest challenge was getting TensorFlow working on my ARM Kubernetes hardware. I built my stack as Docker containers, using docker build to create the images, and then pushed them to a private repo on Docker Hub. Next, we will get into my ARM K3s Kubernetes cluster, the platform I chose for this project.