## PREDICTING MARKET MOVEMENT USING MACHINE LEARNING

When J.P. Morgan was asked what the market would do, he responded in three simple words: “It will fluctuate.” He has been proven correct time and time again. The question we all want answered is whether it will fluctuate up or down. This article looks at using machine learning to get one step closer to that answer.

Collecting large quantities of data for free is no simple task, yet a machine learning algorithm generally performs better the more data it is fed. For this reason I decided to find the most readily available data and try to build a meaningful machine learning algorithm around it. The data I selected is simple: historical open and close prices for the S&P 500 from the 1950s to this past week. Yahoo Finance provides this data as a CSV for all stocks and indices.
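As a rough sketch of the loading step, the snippet below reads a Yahoo-style CSV with pandas. The column layout (`Date, Open, High, Low, Close, Adj Close, Volume`) is an assumption about the export format, and the three rows are made-up placeholder values, not real S&P 500 prices. With a real download you would pass the file path to `pd.read_csv` instead of the in-memory sample.

```python
import io

import pandas as pd

# Placeholder rows standing in for a downloaded CSV (values are invented).
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2000-01-05,101.0,103.0,100.0,102.5,102.5,1200\n"
    "2000-01-03,100.0,101.0,99.0,100.5,100.5,1000\n"
    "2000-01-04,100.5,102.0,100.0,101.0,101.0,1100\n"
)

# Parse the dates and use them as the index so rows can be ordered by time.
prices = pd.read_csv(sample_csv, parse_dates=["Date"], index_col="Date")

# Make sure the oldest row comes first; later steps compare each day
# with the day before it.
prices = prices.sort_index()
```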

A graph of the S&P 500 is essentially all the data we have to work with. So what insights can we draw from it? The first and second derivatives are commonly used as indicators when dealing with functions. The first derivative is the instantaneous slope at any given point. The second derivative indicates concavity. If velocity is the first derivative, acceleration is the second. With daily data, the first derivative can be approximated by the change in price from the previous day. The second derivative is then the difference between consecutive price changes.
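To make the two differences concrete, here is a minimal sketch in plain Python. The function names and the five closing prices are made up for illustration; the only idea taken from the text is that the first derivative is the day-over-day price change and the second is the change in that change.

```python
def first_difference(values):
    """Day-over-day change: a discrete stand-in for the first derivative."""
    return [today - yesterday for yesterday, today in zip(values, values[1:])]


def second_difference(values):
    """Change in the day-over-day change: a stand-in for the second derivative."""
    return first_difference(first_difference(values))


# Invented closing prices for five consecutive days.
closes = [100.0, 102.0, 101.0, 104.0, 108.0]

velocity = first_difference(closes)       # [2.0, -1.0, 3.0, 4.0]
acceleration = second_difference(closes)  # [-3.0, 4.0, 1.0]
```

Note that each difference shortens the series by one day, so the first two days of any dataset have no second difference.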

We will use one simple benchmark to judge how well the algorithm is performing: an entirely bullish outlook. Since 1950, if you assumed the S&P 500 would go up every day, you would have been correct 52.98% of the time. So for this algorithm, anything above 52.98% will be considered a win. That being said, let's shoot for the moon and try to be right 60% of the time.
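The bullish benchmark is just the fraction of days on which the index rose. The sketch below assumes an "up" day means a close above the previous day's close; the article does not say whether that or close-versus-open was used, and the prices here are invented, so it will not reproduce the 52.98% figure.

```python
def always_up_accuracy(closes):
    """Accuracy of predicting 'up' every day, given a list of closing prices."""
    changes = [today - yesterday for yesterday, today in zip(closes, closes[1:])]
    up_days = sum(1 for change in changes if change > 0)
    return up_days / len(changes)


# Invented closing prices: the changes are +2, -1, +3, +4, -1.
closes = [100.0, 102.0, 101.0, 104.0, 108.0, 107.0]

baseline = always_up_accuracy(closes)  # 3 up days out of 5 -> 0.6
```

Any model's accuracy on the same days can then be compared directly against `baseline`.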

The machine learning algorithm we will be using is the K-Nearest Neighbors classifier from scikit-learn (sklearn), a Python library. For those unfamiliar, the K-Nearest Neighbors classifier works by taking previously collected data points, each labeled with a binary value, and plotting them in an N-dimensional space. When new data is presented, the algorithm plots the new point and identifies the K closest points. The new point is then assigned the value that the majority of those neighbors hold.
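A small sketch of that voting behavior, using scikit-learn's `KNeighborsClassifier` on toy two-dimensional points. The points, the labels (1 for an "up" day, 0 for a "down" day), and the choice of K = 3 are all assumptions made for illustration, not the settings used in this project.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: two well-separated clusters in a 2-D feature space.
X_train = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],       # labeled 1 ("up")
    [-1.0, -1.0], [-1.1, -0.9], [-0.8, -1.2],  # labeled 0 ("down")
])
y_train = np.array([1, 1, 1, 0, 0, 0])

# K = 3: each new point takes the majority label of its 3 nearest neighbors.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# One new point near each cluster.
new_points = np.array([[0.8, 0.9], [-0.9, -1.0]])
predictions = model.predict(new_points)
```

An odd K avoids tied votes when there are only two classes, which is one reason small odd values are a common starting point.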

For this project I decided to start with information derived from the adjusted close and a 30-day moving average. Below is a graph showing what the slopes of both look like over a one-year period. The idea is that the adjusted close, illustrated in red, would more accurately capture the actual market movements, while the moving average, in blue, would highlight the overall market trend.
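The two slopes could be computed along these lines with pandas. This is a sketch under assumptions: the moving average is taken over the adjusted close, the slope is the one-day difference, and the series is a made-up straight line rather than real prices. The variable names are illustrative only.

```python
import pandas as pd

# Invented adjusted closes: 1.0, 2.0, ..., 40.0 (a straight line).
adj_close = pd.Series([float(day) for day in range(1, 41)])

# 30-day moving average; the first 29 entries are NaN because the
# window is not yet full.
moving_avg = adj_close.rolling(window=30).mean()

# One-day slope of each series.
close_slope = adj_close.diff()
avg_slope = moving_avg.diff()

# Keep only the days where both slopes are defined.
features = pd.DataFrame(
    {"close_slope": close_slope, "avg_slope": avg_slope}
).dropna()
```

Because the moving average needs 30 days of history and the slope needs one more, the first 30 rows drop out, leaving 10 usable rows from this 40-day sample.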