My journey from stock trading to real estate investing

I was an avid stock investor for a limited time (2009 to 2015). Early on (prior to 2012) I followed growth stock investing, using Mark Minervini's SEPA methodology and IBD strategies. I then subscribed to investing in biotech companies ahead of phase 1 to 3 clinical trial results and FDA approval review dates, where the 3 to 6 months leading up to the event date attracted buyers and drove stock price appreciation (mini bubble cycles). The goal was to buy within 3 to 6 months of a clinical trial or FDA approval date, using technical analysis (studying price and volume action), and sell for a hopeful profit before the trial result or FDA review date.

I had success with this type of medium-term trading strategy, growing a $10k account to close to $20k at its peak for a +98% profit. This occurred from July 2013 to early 2014:

returns

The Excel file below tracks my trading numbers, daily diary, and post-trade analysis reviews. It is a very specific and disciplined process. There are some gold nuggets in here for any aspiring biotechnology investor, and my hat is off to anyone who delves into this most difficult profession of stock speculation!

Andrew_J_Bannerman_June_24_2013_April_07_2014

Catalyst Run Up Calander and DD Revise NEED NEW PLAN

What happened next? 

After this success I began to trade crude oil futures with too much trading size (too many contracts relative to account balance – leverage!) and gave back what I had made. I ended up closing my account down after this period.

Now that I was out of the market, I began researching and developing quantitative trading strategies and designing data-driven models. I learned two programming languages during this time, R and Julia, and developed the Julia package MarketCycles.jl. The direction and remains of my quantitative research can be found on this blog.

Why did I decide to invest in real estate?

While designing intraday (day trading) futures strategies for the ES mini using evolutionary machine learning (grammatical evolution / genetic programming) algorithms, the probabilities of success and the level of risk became apparent. In most cases I was taking on a large degree of uncertainty and risk for potentially a $10,000 per year profit from any single ES mini trading strategy. That is on a good performance year, and at any given time the strategy could go out of style, stop working, or lose money altogether. On the probability side, it appeared random whether a strategy would hold up or generate returns at all. Post-learning note: to manage both risk and probabilities, it would be prudent to layer multiple trading strategies diversified over anti-correlated markets.

In any case, my wife mentioned investing in real estate, and we had recently flipped the house we lived in. We lived in the house (a single family home) while we renovated it. Post renovation, we assessed our annual expenses against comparable market rental income and learned that if we rented out our house we would net $6,000 per year. This ignited further research into real estate. We thought we could purchase another single family home renovation project and repeat the process.

One day, while lying on the couch talking to my wife, multi-family apartments came to mind. Compared to single family homes they offer more doors per property, so perhaps this was a viable course to take. This was March 2018. We started to look at duplex to fourplex properties (conventional loans apply).

While searching the public real estate listings we saw a triplex on the market listed for $265k. We messaged the broker for the income and expense statement and ran the numbers with a 25% down payment: after all insurance, mortgage, property tax and operating costs we would net $12k per year. The 25% down payment is 0.25 * 265,000 = $66,250, which plus closing costs gave a total investment of $73,518. A net cash flow of $12,000 gives 12,000 / 73,518 ≈ 16% annual cash-on-cash return. We did not purchase as we needed $10k more to invest, and by the time we had the money the seller had removed the property from the market.
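
As a quick illustration of the math, here is a minimal Julia sketch of that cash-on-cash calculation using the numbers quoted above (the closing cost figure is simply the difference implied by the $73,518 total):

# Cash-on-cash return for the triplex quoted above
purchase_price = 265_000.0
down_payment   = 0.25 * purchase_price           # 66,250
closing_costs  = 73_518.0 - down_payment         # 7,268 implied by the total investment
total_invested = down_payment + closing_costs    # 73,518
net_cash_flow  = 12_000.0                        # after mortgage, tax, insurance, operating costs
cash_on_cash   = net_cash_flow / total_invested  # ≈ 0.163, i.e. roughly 16% per year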

A 16% annual cash flow return is an amazing return on this single triplex property. The risk profile was totally different from the stock market. This was a real asset which historically goes up in value over time, and where we could have vacancy for a portion of the year and still meet our expense obligations (the breakeven point for the property is covered by the combined revenue of the 3 doors). Note: if a real estate investment has negative cash flow that you must cover out of pocket, it is a liability (it may turn into an asset over time as the mortgage is paid down). We also noted that rents tend to be unchanged during economic recessions. We came to understand the power of real estate investments. A more detailed look can be found in the post Stocks vs Real estate: Post

What happened next? 

In March 2019 we invested in a triplex that was priced below market on a price per square foot and rent per square foot basis. We found the property listed on LoopNet and it was in need of repairs and upgrades. We brought the property up to modern standards and subsequently raised the rents, thus increasing net operating income and the valuation of the property.

The before and after photos can be found in the .pdf below as well as the financials of the deal.

Triplex Repositioning Summary

Our initial investment (25% down payment, closing costs and renovation costs) was $100,000; a net cash flow of $18,000 yields an annual cash flow return of 18%.

What did I learn about investing in value-add real estate from my stock trading journey?

  1. Always respect risk – buy for cash flow from day 1. If the deal cash flows from day 1 there is no pressure to pay any expense obligations out of pocket.
  2. Be conservative in investment analysis – set profit expectations below market comparables by some margin of error, so there is a buffer to protect against any oversight during the investment selection process.
  3. Have a cash cushion for any unexpected surprises during the renovation process.

If you manage 1 to 3 well, you will more than likely come out ahead.

From this point forward we are 100% committed to real estate investment. If you are interested in our analysis of the advantages of real estate investing versus stock index investing, see the following post: Is stock index investing or real estate a better investment?

Best to you and continued success!

If you want to discuss real estate or stock investing, do not hesitate to reach out.

Is stock index investing or real estate a better investment?

Let's look at some important metrics of each investment vehicle so one can determine for themselves which fits their needs.

# SP500 Returns (Stock Index)

First let's look at the long term average of the SP500 using daily data downloaded free from Yahoo Finance under ticker symbol ^GSPC. This data set begins on 1/3/1950 and, whilst it is not the full continuous sample, it gives us a general idea of the returns over the past 69 years:

Rplot03

rolling_trading_year_drawdown

The average annual return of the SP500 over the 69 year period from 1/3/1950 to present (9/10/2019) is 7.74%, with a maximum drawdown of -56%. Periodic downturns amounted to an average drawdown of -16%. Based on the data we see that the 7.74% annualized return comes with gut wrenching drawdowns.
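
For readers who want to reproduce these figures, here is a minimal Julia sketch of how an annualized return and maximum drawdown can be computed from a vector of daily closes (assuming `closes` holds the ^GSPC series in chronological order; 252 trading days per year is an assumption):

# Annualized return and maximum drawdown from a vector of daily closes
function annualized_return(closes::Vector{Float64}; trading_days_per_year::Int = 252)
    years = length(closes) / trading_days_per_year
    return (closes[end] / closes[1])^(1 / years) - 1
end

function max_drawdown(closes::Vector{Float64})
    peak  = closes[1]
    worst = 0.0
    for c in closes
        peak  = max(peak, c)              # running high-water mark
        worst = min(worst, c / peak - 1)  # deepest decline from the peak so far
    end
    return worst
end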

# Real Estate

Let us now view real estate.

First, let's understand how real estate generates investor returns:

  1. Real estate appreciation – inflation adjusted long term rise, local market appreciation or forced appreciation.
  2. Cash flow from rental income.
  3. Mortgage amortization (rental income pays down mortgage).
  4. Property depreciation.

# 1. Real estate appreciation

Let's plot the long term national home price index to get a sense of how home values have risen over time:

case_shiller

Home prices have retained an upward curve, with the exception of the downturn from 2007 to 2010, during which the home value index lost 22% of its value. Besides this, the index has on average remained steady, with an average loss of -0.12%. What is striking is that, unless real estate was the cause of the economic collapse, real estate largely retained its value during other economic recessions.

We may observe that on an annual basis the home price index has increased on average +3.4%. This compares to an annual inflation rate of +2.51% using the Consumer Price Index (All Items). This means that on average real estate grows with, or exceeds, the national average inflation rate. This is a net positive, as static money tends to become worth less as inflation compounds over time.

# Multi family appreciation

Let's also discuss how multi family real estate is valued. You may have heard of a valuation metric called the cap rate. The cap rate is simply the % return one will earn on a real estate investment exclusive of mortgage debt. It is the free and clear return after all operating expenses. Property valuations are based on the net operating income the property produces: increase the net operating income through renovations and upgrades and the valuation of the property will rise – also known as forced appreciation.
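
As a minimal sketch of the income approach (the NOI figures below are hypothetical examples, not from any of the deals in this post):

# Value = net operating income / cap rate
noi      = 20_000.0        # hypothetical annual net operating income
cap_rate = 0.08            # e.g. a Class B/C cap rate (see the table below)
value    = noi / cap_rate  # 250,000

# Forced appreciation: raise NOI through renovations / rent increases and the
# value rises at the same cap rate.
new_value = 24_000.0 / cap_rate   # 300,000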

To obtain a sense of possible cap rates and thus the risk/reward profiles per multi family building class we may observe Class A through D building class designations and typical associated cap rates.

Class A buildings are new, have high income earning tenants and command the highest rental rates; often cited are 4 to 6% cap rates, depending on geographical location. On the other end of the spectrum, Class D buildings, typically over 30 years old, are of low construction quality with deferred maintenance, commanding lower market rents and around 9%+ cap rates. This short table is a quick summary of often cited cap rates:

Class A = 4 to 6%

Class B/C  = 6 to 8%

Class D  = >9%

The Class A to Class D designations and associated cap rates highlight the reward/risk profile. Investors can earn 4 to 6% on a Class A building with less vacancy risk due to higher quality tenants. The flip side holds for Class D buildings, where the risk profile is higher due to the additional vacancy risk.

To touch on forced appreciation: one may purchase units with below market rents and perform modifications, upgrades and renovations, raising rents to comparable local rental values. Raising the rents increases the net operating income and thus the property valuation.

The key difference between real estate forced appreciation and stock market appreciation is that the stock market is an external force independent of anyone's control. Forced appreciation is 100% in your hands. You control the entry point (below market) and the exit point (up to rental comparables). When renovating multi family properties, the resulting appreciation is down to your own hard work and determination and 100% within your control. What better way to realize profits!

# 2. Cash flow from rental income

Cash flow is the goose that lays the golden eggs. It is the profit remaining after all operating expenses and mortgage payments, and is generally a passive income stream. As cash flow depends on monthly rental values, we may observe the HUD fair market rent (FMR) data set as our proxy for rent values:

hud_mean_national_fmr

The data set consists of fair market rent data from 1983 to present for all counties in the US for 0 (efficiency) to 4 bedroom units. We simply took the average across all counties to obtain the national average rent value.

We see an uptrend in the fair market rent value over time. The recession dates are overlaid, and we may see that rental values, on average, do not decline even in times of economic recession. Conversely, dips in the fair market rent value seem to be independent of economic recessions.

To put this in context: suppose one acquires a property using a conventional 30 year fixed loan, locks in a 4% interest rate, and is cash flow positive, generating an 8% annual return on the investment. As rents do not decline during economic recessions, and provided the property meets its occupancy targets, the 8% return will still be realized even while other asset classes are losing value. We learned the SP500 lost -56% of its value during the 2007/2008 recession. This is massive! The cash flow generates the annual % return even during economic recessions. People still need somewhere to live regardless of economic activity (demand dynamic)!

Let's quickly look at average annual rent increases ($, %) and compare them to the average rate of inflation over the same period:

Rplot

Rplot01

We see average % rental increases within a range of 2.5 to 2.91%. This compares to a CPI inflation rate of 2.55% over the same period, so rents, like home price appreciation, rise with inflation. This is good news as we do not wish to see our future cash flow returns eroded by inflation.

# 3. Mortgage Amortization

When applying a mortgage to a property we typically have conventional or commercial loans which are amortized over the mortgage term. The principal/interest schedule of the mortgage is called the amortization schedule. We may enter a deal with 25% down and finance the other 75%, meaning the bank/financier has the largest equity stake in the deal! Over time, as the mortgage principal is paid down, our share of the equity (which started at 25%) increases and the bank/financier who loaned the money slowly exits the deal (having started at 75%). At the end of the mortgage term we have 100% of the deal and the bank/financier has 0%.

Here is an example of the process; we may see how the owner's equity stake grows relative to the bank/financier's stake over time:

ownership_in_deal

The cash flow of the property pays off the loan and principal is accumulated over the loan term. We may see this mechanism by example:

# Details of the deal 

Property Value = 100,000 ($)

Loan Term = 360 months (30 years)

Down Payment = 25% (0.25)

Loan Amount = 75% (0.75)

Annual Interest Rate = 6% (0.06)

Let's plot the cumulative principal payments accrued over the loan term (months):

total_equity_in_deal

We borrowed $75,000 in this deal. Through amortization, this $75,000 debt is paid by the cash flow of the property, and over the loan term the full $75,000 is realized as equity.
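
A minimal Julia sketch of that amortization mechanism, using the deal parameters above ($75,000 loan, 6% annual rate, 360 monthly payments):

# Cumulative principal paid over the loan term
function amortization_equity(loan::Float64, annual_rate::Float64, n_months::Int)
    r   = annual_rate / 12
    pmt = loan * r / (1 - (1 + r)^-n_months)   # fixed monthly payment
    balance = loan
    cum_principal = zeros(n_months)
    for m in 1:n_months
        interest  = balance * r
        principal = pmt - interest
        balance  -= principal
        cum_principal[m] = loan - balance      # equity accumulated so far
    end
    return cum_principal
end

equity = amortization_equity(75_000.0, 0.06, 360)
equity[end]   # ≈ 75,000: the full loan amount realized as equity over the term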

# 4. Property Depreciation 

The IRS deems the useful life of a rental property as 27.5 years. After 1 year of ownership, the cost of the building,  not the land, can be depreciated over a 27.5 year period.

The crux of depreciation is as follows: 

Step 1: Determine the building cost and useful life of the property.

Step 2: Calculate the partial first year depreciation value, calculate the full year depreciation value (building cost / 27.5 useful life) and calculate the partial last year depreciation value.

Step 3: Multiply the depreciation value per year by the marginal tax rate

We can see the tax savings over the useful life of the property at a 24% marginal tax rate:

tax_savings_depreciation
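
As a minimal sketch of that calculation (the building cost below is a hypothetical figure, and the partial first and last year proration from Step 2 is ignored):

# Straight-line depreciation tax savings for a residential rental
building_cost = 275_000.0            # hypothetical cost of the building, excluding land
useful_life   = 27.5                 # years, per the IRS for residential rentals
marginal_rate = 0.24                 # marginal tax rate used in the chart above

annual_depreciation = building_cost / useful_life          # 10,000 per full year
annual_tax_saving   = annual_depreciation * marginal_rate  # 2,400 per full year
total_tax_saving    = building_cost * marginal_rate        # 66,000 over the full 27.5 years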

Let's summarize the investments in both assets:

Stock Index Investing: a 7 to 8% annual return that comes with hefty intermediate drawdowns, on average -16%, with gut wrenching maximum periods of -56%. Appreciation relies on external factors outwith anyone's control. Appreciation is inflation adjusted.

Real Estate: does not carry the same volatility as stocks. Forced appreciation can be controlled by the individual. There are 4x investor return streams: appreciation, cash flow, amortization and depreciation. Real estate displays return robustness during economic recessions, where cash flow, amortization and depreciation remain intact. We also observed that during economic recessions, if the real estate sector was not the cause of the recession, then real estate valuations retained their value.

# Assessing the performance of $100,000 growth in both assets

Stock Index Investing: let's assess the growth of $100,000 in the SP500 index. The annualized return from 1986 to present day is 8.16%, so $100,000 grows to $1,331,068 over the 33 year period.

Real Estate: let us consider purchasing a property that was worth $400,000 in 1984 with a 25% down payment, equating to a $100,000 initial investment (excluding closing costs).

  1. A single family home appreciates on average 3.4% a year – the increase in value = $489,600
  2. The rental income on the $100,000 invested returned an initial cash flow return of 8%, with rents raised in line with the average increase of 2.5% per year = $304,650
  3. The 30 year mortgage has recently been paid off and the added equity in the deal = $300,000
  4. At a marginal tax rate of 24%, the tax savings on the property = $66,000

The total $ return = $1,160,250
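
The arithmetic behind both totals, as a quick Julia sketch mirroring the assumptions above:

# Stock index: 100,000 compounded at 8.16% for 33 years
stock_total = 100_000 * 1.0816^33      # ≈ 1,331,068 as quoted above

# Real estate: sum of the four return streams listed above
appreciation = 489_600
cash_flow    = 304_650
amortization = 300_000
tax_savings  =  66_000
re_total = appreciation + cash_flow + amortization + tax_savings   # 1,160,250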

# Final thoughts 

In our $100k investment example, stock index investing in the SP500 over 33 years yielded $1,331,068 vs. $1,160,250 for real estate. The difference is $170,818.

So which investment vehicle is superior? That all depends on the investor: if one can emotionally handle gut wrenching drawdowns of up to 56%, or 16% on average, then index investing does a fine job. Note that index investing is entirely different from speculating in individual stocks on short and medium term time frames. That type of trading or shorter term investing lends itself more to random chance in results, and the law of large numbers is at work – the house typically wins!

Overall, real estate is a solid performer, yielding a return of $1,160,250 over the course of 33 years. It achieved this return with very little drawdown, because during economic recessions fair market rents continued to rise while the mortgage payments/expenses were fixed, securing the annual cash flow returns. Depreciation tax savings and amortization were also fixed return streams in the face of recessions.

In terms of the ratio between risk and reward – real estate is less volatile and produces great long term returns. 

Code for this study can be found on my github

Parameter Sensitivity Analysis

In this post we demonstrate ideas to test for parameter sensitivity.

Here we have a strategy with 5x parameters: 3x are look back periods for a specific indicator, and the other 2x are an entry level threshold and an exit level threshold.

I decided to change the original parameters by up to 50% in either direction of the original. It might look something like this:

# Original 
param1 = 116
param2 = 10
param3 = 69
entry_level = 6
exit_level= 85

So multiply any of the params by 0.5 to obtain the size of the allowed change. In this example let's take the first parameter, param1: 116 * 0.5 = 58. Subtracting and adding this to the original provides the new lower and upper limits. The range now becomes:

58 to 174.

This creates a range of parameters with up to a maximum 50% change in either direction from the original. Do this for all parameters. Note that if your indicator only goes to 100, for example, then 100 would be your upper limit rather than the full +50% range; the same applies in the opposite direction, with a lower limit of 0. A sketch of this range construction is shown below.
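
A minimal Julia sketch of the range construction (parameter names as in the snippet above; the 0 to 100 bounds on the two threshold levels are an assumption about the indicator scale):

# Build ±50% parameter ranges, capped at the indicator bounds
originals = Dict(:param1 => 116, :param2 => 10, :param3 => 69,
                 :entry_level => 6, :exit_level => 85)
bounded = (:entry_level, :exit_level)   # parameters limited to the 0-100 indicator scale

limits = Dict(k => (max(0.0, v - 0.5 * v),
                    k in bounded ? min(100.0, v + 0.5 * v) : v + 0.5 * v)
              for (k, v) in originals)
# limits[:param1] == (58.0, 174.0)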

Next, run a Monte Carlo simulation sampling randomly through the parameter ranges. I tend to do 5000 iterations and then look at the distribution. An example output of this process is below:

(Distribution plots: average trade profit, win %, net ending balance)

Here we show some key metrics. First, average trade profit: this is an important value because we want to know what we make per trade on average, and it has to be high enough to withstand slippage and commissions.

Next we have the win %, which, depending on the money management system you are using, is often an input variable in position sizing calculations. An example would be the half Kelly formula.

And of course we have the net ending balance of the strategy itself, less commissions (I didn't add slippage in this example).

The plots above show my original parameter setting results by marking a straight vertical red line on the plot.

This red line is an optimal combination found earlier through a random search. As we can see, it is not THE optimum, as to the right side of the red line we have parameter sets which beat the original.

Besides this, what can we learn? We see that by running 5000 iterations and randomly selecting each parameter from a range which includes up to a 50% change in either direction from the original, we land in positive expectancy across our selected trading metrics. We may observe the mean values below:

params

Needless to say, the random parameter selection might not choose the most extreme 50% parameter change at any given iteration. Thus, across the 5 parameters we would be combining a range of changes in either direction up to a 50% maximum. One iteration might look like:

Param1 = +/- 10% change from original

Param2 = No change from original

Param 3 = +/- 40% change from original

Param 4 = +/- 12 % change from original

Param 5 = +/- 5% change from original

And run another:

Param1 = +/- 45% change from original

Param2 = +/- 8% change from original

Param 3 = +/- 2% change from original

Param 4 = +/- 22 % change from original

Param 5 = +/- 34% change from original

So on and so forth for 5000 iterations.
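
A sketch of that Monte Carlo loop, reusing the `limits` dictionary from the earlier sketch; `run_backtest` is a hypothetical stand-in for whatever back test function produces the three metrics:

# 5000 iterations, each drawing every parameter uniformly from its range
n_iter  = 5000
results = Vector{NamedTuple}(undef, n_iter)
for i in 1:n_iter
    params = Dict(k => lo + rand() * (hi - lo) for (k, (lo, hi)) in limits)
    avg_trade, win_pct, net_balance = run_backtest(params)   # hypothetical back test call
    results[i] = (avg_trade = avg_trade, win_pct = win_pct, net = net_balance)
end
# Then inspect the distributions / means of avg_trade, win_pct and net across all iterations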

How do you obtain confidence that you can protect your initial strategy investment when going live?

This type of study essentially says we may place a strategy live and have a degree of confidence across changing parameters.

What would I do in real life?

I would trade 3 parameter bands live. Whichever parameter set triggers first is the trade that is taken. Thus I am covering a parameter band, and I am protected if the market drifts away from my original parameters.

Food for thought!

If you have an interest in this type of research feel free to drop me a line.

Follow me: @flare9x / Github

SPY Mean Reversion With John Ehlers Adaptive RSI

It has been a busy few months. I have been exploring market indicators that John Ehlers created, which he made publicly available in his book Cycle Analytics for Traders: Advanced Technical Trading Concepts.

The key theme of his book is applying digital signal processing filters to better process market data. He utilizes various high and low pass filters which only allow certain frequencies to pass at specified look back periods, and thus improve the signal-to-noise ratio and perhaps extract a dominant cycle from the data.

Key ideas from the book:

  1. Traditional indicators have lag and assume normal probability distribution
  2. His indicators reduce lag
  3. Extracting the dominant cycle over a range of periods can be used to calibrate traditional indicators such as RSI, stochastic etc….
  4. Methods are provided to convert indicator output to something close to a normal probability distribution

I'm in the process of making the indicators from his Cycle Analytics for Traders book available on my GitHub in a package called MarketCycles.jl. Please refer to the package README.md for further information.

The package will be used in the next part of this post where we use an Adaptive RSI to time the mean reversion tendency of the daily S&P500.

First install the package:

Pkg.clone("https://github.com/flare9x/MarketCycles.jl")

The Adaptive RSI is adaptive because the indicator extracts the dominant cycle using the autocorrelation periodogram. At any given point in time [i] half the dominant cycle period is used to calibrate the lookback length of the RSI.

I want to point out a few stark differences in the strategy back test code below. First, the code is set up to BUY NEXT BAR AT OPEN. This means that when a signal is triggered at the close of a bar, a trade is entered at the next bar's open price.

Secondly, rather than calculating cumulative % close-to-close returns as so many do, we specify an initial starting capital and add the $ gain to a running total, adjusting for trading commissions, which provides a result more in line with how it might work in the real world.

Please see top of code snippet in order to adjust the following input variables for the strategy:

# Strategy input variables
starting_capital = 10000
comms = 10
entry_level = .3
exit_level = .8

The rest is best described by following the Julia code below. Note that if you do not have a data source, I have made this post easy to follow by providing code to download data from the AlphaVantage API. You just need to edit the AlphaVantage link and add your own API key:

# Ehlers mean reversion

using HTTP
using DataFrames
using CSV
using Gadfly
using MarketCycles

# Multiple symbols
adjusted = 1  # if want adjusted closes set to 1 else non adjusted set it to 0
t = 1
tickers = ["SPY"]
for t in 1:length(tickers)  # Note it might get stuck!! if it fails.. print the t value... then restart the for loop for t in 4:length(tickers)  instead of 1:length(tickers) for example
# Using HTTP package
sleep(10)  # sleep between API calls
if adjusted == 1
res = HTTP.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=$(tickers[t])&outputsize=full&apikey=your_api_key&datatype=csv")
else
    res = HTTP.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=$(tickers[t])&outputsize=full&apikey=your_api_key&datatype=csv")
end
mycsv = readcsv(res.body)
x = convert(DataFrame, mycsv)
x = x[2:nrow(x),:]  # subset remove header row
# Rename Columns
if adjusted == 1
    colnames = ["Date","Open","High","Low","Close","Adjusted_Close","Volume","Dividend_Amount","split_coefficient"]
else
colnames = ["Date","Open","High","Low","Close","Volume"]
end
names!(x.colindex, map(parse, colnames))
# Convert String Date to Date format
x[:Date] = Date.(x[:Date],Dates.DateFormat("yyy-mm-dd"))
# Sort Date Frame By Date
x = sort!(x, [order(:Date)], rev=(false))
# Convert OHLC to Float64 and Volume to Int64
if adjusted == 1
for i in 2:length(x)-2
    x[i] = convert(Array{Float64,1},x[i])
end
    x[7] = convert(Array{Int64,1},x[7])
    x[8] = convert(Array{Float64,1},x[8])
else
    for i in 2:length(x)-1
        x[i] = convert(Array{Float64,1},x[i])
    end
        x[6] = convert(Array{Int64,1},x[6])
end

if adjusted == 1
CSV.write(joinpath("CSV_OUT_ADJUSTED_"tickers[t]".csv"), x)
else
CSV.write(joinpath("CSV_OUT_"tickers[t]".csv"), x)
end
end

# Load Data
SPY = CSV.read("C:/Users/Andrew.Bannerman/Desktop/Julia/Indicators/CSV_OUT_ADJUSTED_SPY.csv",types=[String; fill(Float64, 5); Int;Float64],header=true)
SPY[:Date] = DateTime.(SPY[:Date],Dates.DateFormat("mm/dd/yyyy"))

# Pull arrays
Date_index = SPY[:Date]
Spy_close = SPY[:Close]
Spy_open = SPY[:Open]

# Ehlers Adaptive RSI
Adaptive_RSI = AdaptiveRSI(Spy_close)

# Strategy input variables
starting_capital = 10000.00
comms = 10
entry_level = .3
exit_level = .8

# Cross under entry level and sell when cross over exit level
entry = zeros(Adaptive_RSI)
exit = zeros(Adaptive_RSI)
for i = 2:size(Adaptive_RSI,1)
    if (Adaptive_RSI[i] <= entry_level) && (Adaptive_RSI[i-1] > entry_level)
        entry[i] = 1
    else
        entry[i] = 0
    end
    if (Adaptive_RSI[i] >= exit_level) && (Adaptive_RSI[i-1] < exit_level)
        exit[i] = 2
    else
        exit[i] = 0
    end
end

# combine entry and exit
signal = exit .+ entry


# Fill the trade between entry and exit with 1's
all_signal = zeros(signal)
for i = 2:length(signal)
    print(i)
    if signal[i] == 1.0
        all_signal[i] = 1.0
    end
    if signal[i] == 0.0 && all_signal[i-1] == 1.0
        all_signal[i] = 1
    end
    if signal[i-1] == 2.0 && signal[i] == 0.0
        all_signal[i-1] = 1.0
    end
end

# ID the start and end of the trade
trade_start = zeros(all_signal)
trade_end = zeros(all_signal)
for i = 2:size(trade_end,1)
if all_signal[i] == 1.0 && all_signal[i] != all_signal[i-1]
    trade_start[i] = 1
end
if all_signal[i-1] == 1.0 && all_signal[i-1] != all_signal[i]
    trade_end[i-1] = 2
end
end

# Calculate the % return buying at open and selling at the close
open_price = Float64[]
close_price = Float64[]
strategy_returns = zeros(size(all_signal,1))
for i = 1:size(all_signal,1)-1
    if trade_start[i] == 1
        open_price = Spy_open[i+1]
    elseif trade_end[i] == 2
        close_price = Spy_close[i]
        # place the close return on same index position as the end of the trade
    strategy_returns[i] = close_price /  open_price - 1
end
end


# Work out gain on starting investment amount
dollar_gain = zeros(size(strategy_returns,1))
dollar_gain[1] = starting_capital

i=1
for i=2:size(strategy_returns,1)
    # if the returns are 0 then use the previous $ amount
if strategy_returns[i] == 0.0
    dollar_gain[i] = dollar_gain[i-1]
elseif strategy_returns[i] != 0.0
    dollar_gain[i] = dollar_gain[i-1] + (dollar_gain[i-1] * strategy_returns[i]) - comms
end
end

# plot final result
white_panel = Theme(
	panel_fill="white",
    default_color="blue",
    background_color="white"
)
p = plot(x=Date_index,y=dollar_gain,Geom.line,Guide.title("Ehlers Adaptive RSI"),white_panel,Scale.y_continuous(format=:plain))
         draw(PNG("C:/Users/Andrew.Bannerman/Desktop/Julia/ehlers_rsi.png", 1500px, 800px), p);

And we may plot the output, which is the growth of $10,000 inclusive of the round trip commission (`comms`) set above:

ehlers_rsi

It should be noted that the default settings from Ehlers' book were used for the high and low pass filters, as well as for the range of lags used in the autocorrelation periodogram to calculate the dominant cycle.

The function I created has many inputs that may be calibrated:

AdaptiveRSI(x::Array{Float64}; min_lag::Int64=1, max_lag::Int64=48,LPLength::Int64=10, HPLength::Int64=48, AvgLength::Int64=3)::Array{Float64}

Thus the default settings might not be best tuned to SPY daily data. One would be advised to iterate through the input variables to learn how each variable affects the output as well as find a correct tuning of the indicator.
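
As a starting point, here is a small sketch of such an iteration using the keyword arguments listed above (the grid values are arbitrary examples, and `Spy_close` is the close array from the back test code):

# Sweep a small grid of AdaptiveRSI settings and collect the indicator series
rsi_by_setting = Dict()
for hp in (36, 48, 60), lp in (8, 10, 12)
    rsi_by_setting[(hp, lp)] = AdaptiveRSI(Spy_close; HPLength = hp, LPLength = lp)
end
# Each series can then be run through the entry/exit loop above to compare ending balances.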

If anyone wishes to have a more professional back test script with varying entry/exit methods that accounts for slippage and commissions, do not hesitate to contact me.

Follow me on twitter: @flare9x

Github

Julia – Build any time resolution using 1 minute data


Reliable data makes for more accurate models. It is not the end of the world if there are minor discrepancies although data does need to be representative to build models and make good assumptions.

Common data errors are known to be found at market closing times. We want the auction price, not the last price; the last price might be some fluff trade with a 1 lot. Using the real auction close increases the accuracy of the daily close data.

To achieve this we can create any resolution using 1 minute bars, sampling them at whichever time resolution one wishes. For example, if we want to create more accurate daily close data using the auction close price at 15:15 for ES, we may simply sample the 1 minute close at each 15:15 time stamp.

If we want to build models that avoid a period of volatility, such as the last 15 minutes of trade, we may sample at every 15:00 time stamp.

So in order to have more control over the creation of data I created the code attached to this blog post.

If we build 15 minute data, we may sample the 1 minute close price at each :00, :15, :30, :45 time increment.

For 30 minute data, we may sample the 1 minute close price at each :00 and :30 time increment.
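
A minimal sketch of that sampling step in Julia, assuming a DataFrame `bars` of 1 minute data with a DateTime column `:Date_Time` (this uses current DataFrames syntax rather than the older style in the code further below):

using DataFrames, Dates

# Keep only rows whose minute stamp falls on a 15 minute boundary (:00, :15, :30, :45)
bars_15min = bars[Dates.minute.(bars.Date_Time) .% 15 .== 0, :]

# Daily "auction close" series for ES: keep only the 15:15 stamp
daily_close = bars[(Dates.hour.(bars.Date_Time) .== 15) .& (Dates.minute.(bars.Date_Time) .== 15), :]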

# Handling missing data

Where missing data was experienced, it was dealt with by forward filling the closest value. Suppose the time stamps are as follows: 09:01, 09:02, 09:03, 09:05, 09:07, 09:08, where 09:04 and 09:06 are missing. To fill the missing 09:04 we forward fill the 09:03 data points, and to fill the missing 09:06 we forward fill 09:05.
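
A compact sketch of that same-day forward fill (a stand-alone illustration; the full version for open/high/low/close is in the code at the end of this post). `closes` and `dates` are assumed to be parallel vectors, with 0.0 marking a missing bar:

# Forward fill missing closes, but only within the same trading day
function ffill_same_day!(closes::Vector{Float64}, dates::Vector{String})
    for i in 2:length(closes)
        if closes[i] == 0.0 && dates[i] == dates[i-1]
            closes[i] = closes[i-1]   # carry the previous minute's value forward
        end
    end
    return closes
end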

This methodology seems consistent with how TradeStation builds its larger resolution data from 1 minute data, although if the data point is simply too far away, TS has an ignore feature, so it would drop that point entirely from the new time resolution sample. (After studying further: TS forward or back fills missing data only if it is on the same day; I am now looking to make this edit as it makes complete sense!)

Despite this, I feel the procedure deployed and the low frequency of missing data makes the data set good enough to build models on.

# Data Accuracy

A point to note pertaining to data accuracy: when building models on OHLC price data it makes sense to form statistical distributions by altering the original reported OHLC prices by some % of the day's ATR over many iterations. Thus we may observe how sensitive the models are to changes in the reported OHLC prices. In the absence of expensive bid/ask data this is a good alternative.

# How to have confidence

The data of course has to be representative of and accurate to how the market unfolded in real time; artificial signals would ensue if the data was of terrible quality. For this reason, by forming distributions we may land in some place of confidence. If a trading model can take a hammering over the random daily % ATR adjustments, future confidence can be gained, suggesting the model is fit to signal and that noise/errors in OHLC reporting do not throw off the model and make it redundant.

The code for building any resolution may be found on my GitHub, with the steps as follows:

  1. Load ES daily data to index the days of trading. This removes all market holiday closures.
  2. Create a date and time index and join original 1 minute data to the date/time index.
  3. Forward fill missing data. For loop within acts like na.locf.
  4. Find half day market closures. Here we load 30 minute data and loop through looking for early closures and late market open (1x, sep 11, 2002)
  5. Save market holiday dates and times and loop through the series filling in the holiday times with “” or 0.0
  6. Reduce to remove all 0 and “”
  7. Build any time frame resolution. 10min, 15 min, 30 min etc…

Hope you enjoy and can save time vs. downloading each and every time resolution 🙂

See the bar plot of the number of errors in the re-sampled data vs. TradeStation:

Errors

julia> out_all
6×2 DataFrames.DataFrame
│ Row │ x1    │ x2 │
├─────┼───────┼────┤
│ 1   │ Date  │  0 │
│ 2   │ Time  │  0 │
│ 3   │ Open  │ 58 │
│ 4   │ High  │  9 │
│ 5   │ Low   │ 13 │
│ 6   │ Close │  0 │

58 discrepancies involved with the open price. There are a total of 67387 30 min bars in the example. This represents 0.0861% of the total sample.

TradeStation had missing data around the opening 1 minute prices. I forward filled the previous day's close in this case, where it would be more accurate to backfill the next available quote on the same day; I will likely update this pretty soon. There were other special closes, such as 1 minute silences, which I didn't account for in my data set. A list may be found here:

Nyse-Closings.pdf

Thanks for reading.

using DataFrames
using CSV
using Missings
# Load 1 minute data – close data sampled every 1 minute
main_data1_1min = CSV.read("C:/Users/Andrew.Bannerman/Desktop/Julia/GE/ES Data/1.min.reg.sess.es_0830_to_1500.txt",types=[String; String; fill(Float64, 4); Int;Int],header=true)
main_data2_1min = CSV.read("C:/Users/Andrew.Bannerman/Desktop/Julia/GE/ES Data/1.min.reg.sess.es_0830_to_1500.txt", header=true)
main_data3_1min = CSV.read("C:/Users/Andrew.Bannerman/Desktop/Julia/GE/ES Data/1.min.reg.sess.es_0830_to_1500.txt", header=true)
test_one = CSV.read("C:/Users/Andrew.Bannerman/Desktop/Julia/GE/ES Data/10.min.reg.sess.es.txt", header=true)
test = convert(Array{Float64},test_one[:Close])
test_date = convert(Array{String},test_one[:Date])
test_time = convert(Array{String},test_one[:Time])
# Make Date_Time column
# Data 1
a = main_data1_1min[:Date]
b = main_data1_1min[:Time]
out = String[]
temp = String[]
for i in 1:length(a)
temp = map(join,zip([a[i]],[b[i]]), " ")
append!(out,temp)
end
main_data1_1min[:Date_Time] = out
String.(out)
# Convert to date time
main_data1_1min[:Date_Time] = DateTime.(main_data1_1min[:Date_Time],Dates.DateFormat("mm/dd/yyyy H:M"))
# Data 2
a = main_data2_1min[:Date]
b = main_data2_1min[:Time]
out = String[]
temp = String[]
for i in 1:length(a)
temp = map(join,zip([a[i]],[b[i]]), " ")
append!(out,temp)
end
main_data2_1min[:Date_Time] = out
# Convert to date time
main_data2_1min[:Date_Time] = DateTime.(main_data2_1min[:Date_Time],Dates.DateFormat("mm/dd/yyyy H:M"))
# Data 3
a = main_data3_1min[:Date]
b = main_data3_1min[:Time]
out = String[]
temp = String[]
for i in 1:length(a)
temp = map(join,zip([a[i]],[b[i]]), " ")
append!(out,temp)
end
main_data3_1min[:Date_Time] = out
# Convert to date time
main_data3_1min[:Date_Time] = DateTime.(main_data3_1min[:Date_Time],Dates.DateFormat("mm/dd/yyyy H:M"))
# Create index and left join to known 1 minute index
# Index Data 1
data1_time_index = CSV.read("C:/Users/Andrew.Bannerman/Desktop/Julia/GE/ES Data/time_index/es_reg_sess_time_index.txt", header=true)
data1_time_index = convert(Array{String},data1_time_index[:header])
data1_time_index_1 = data1_time_index
data1_date_index = CSV.read("C:/Users/Andrew.Bannerman/Desktop/Julia/GE/ES Data/1.day.reg.sess.es.txt", header=true) ####////////////////////////////// Replace with es_D
data1_date_index = convert(Array{String},data1_date_index[:Date])
len = length(data1_date_index)
data1_date_index = repeat(data1_date_index, inner = 390)
# Repeat times
data1_time_index = repeat(data1_time_index, inner=1, outer=len)
# Join Date Time
out = String[]
temp = String[]
for i = 1:length(data1_time_index)
temp = map(join,zip([data1_date_index[i]],[data1_time_index[i]]), " ")
append!(out,temp)
end
data1_date_time_index = out
# Convert to date time format
data1_date_time_index = DateTime.(data1_date_time_index,Dates.DateFormat("mm/dd/yyyy H:M"))
# Make df to prepare to inner join
index_dim = hcat(data1_date_time_index)
data1_date_time_index = DataFrame(index_dim)
data1_date_time_index = hcat(data1_date_time_index,data1_date_index,data1_time_index)
colnames = ["Date_Time","Date_index","Time_index"]
names!(data1_date_time_index.colindex, map(parse, colnames))
# Data 1 Data Frame
main_data1_1min = join(data1_date_time_index,main_data1_1min, on= :Date_Time, kind = :left,makeunique=true)
# Index Data 2
data2_time_index = CSV.read("C:/Users/Andrew.Bannerman/Desktop/Julia/GE/ES Data/time_index/es_reg_sess_time_index.txt", header=true)
data2_time_index = convert(Array{String},data2_time_index[:header])
data2_time_index_1 = data2_time_index
data2_date_index = CSV.read("C:/Users/Andrew.Bannerman/Desktop/Julia/GE/ES Data/1.day.reg.sess.es.txt", header=true) ####////////////////////////////// Replace with es_D
data2_date_index = convert(Array{String},data2_date_index[:Date])
len = length(data2_date_index)
data2_date_index = repeat(data2_date_index, inner = 390) # Es has 390 1 minute bars from 08:31 to 15:00
# Repeat times
data2_time_index = repeat(data2_time_index, inner=1, outer=len)
# Join Date Time
out = String[]
temp = String[]
for i = 1:length(data2_time_index)
temp = map(join,zip([data2_date_index[i]],[data2_time_index[i]]), " ")
append!(out,temp)
end
data2_date_time_index = out
# Convert to date time format
data2_date_time_index = DateTime.(data2_date_time_index,Dates.DateFormat("mm/dd/yyyy H:M"))
# Make df to prepare to inner join
index_dim = hcat(data2_date_time_index)
data2_date_time_index = DataFrame(index_dim)
data2_date_time_index = hcat(data2_date_time_index,data2_date_index,data2_time_index)
colnames = ["Date_Time","Date_index","Time_index"]
names!(data2_date_time_index.colindex, map(parse, colnames))
# Data 2 Data Frame
main_data2_1min = join(data2_date_time_index,main_data2_1min, on= :Date_Time, kind = :left,makeunique=true)
# Index Data 3
data3_time_index = CSV.read("C:/Users/Andrew.Bannerman/Desktop/Julia/GE/ES Data/time_index/es_reg_sess_time_index.txt", header=true)
data3_time_index = convert(Array{String},data3_time_index[:header])
data3_time_index_1 = data3_time_index
data3_date_index = CSV.read("C:/Users/Andrew.Bannerman/Desktop/Julia/GE/ES Data/1.day.reg.sess.es.txt", header=true) ####////////////////////////////// Replace with es_D
data3_date_index = convert(Array{String},data3_date_index[:Date])
len = length(data3_date_index)
data3_date_index = repeat(data3_date_index, inner = 390) # Es has 390 1 minute bars from 08:31 to 15:00
# Repeat times
data3_time_index = repeat(data3_time_index, inner=1, outer=len)
# Join Date Time
out = String[]
temp = String[]
for i = 1:length(data3_time_index)
temp = map(join,zip([data3_date_index[i]],[data3_time_index[i]]), " ")
append!(out,temp)
end
data3_date_time_index = out
# Convert to date time format
data3_date_time_index = DateTime.(data3_date_time_index,Dates.DateFormat("mm/dd/yyyy H:M"))
# Make df to prepare to inner join
index_dim = hcat(data3_date_time_index)
data3_date_time_index = DataFrame(index_dim)
data3_date_time_index = hcat(data3_date_time_index,data3_date_index,data3_time_index)
colnames = ["Date_Time","Date_index","Time_index"]
names!(data3_date_time_index.colindex, map(parse, colnames))
# Data 3 Data Frame
main_data3_1min = join(data3_date_time_index,main_data3_1min, on= :Date_Time, kind = :left,makeunique=true)
# Build arrays
# Data 1
data1_date_time_1min_i = main_data1_1min[:Date_Time]
data1_date_1min_i = String.(collect(Missings.replace(main_data1_1min[:Date_index], "")))
data1_time_1min_i = String.(collect(Missings.replace(main_data1_1min[:Time_index], "")))
data1_open_1min_i = Float64.(collect(Missings.replace(main_data1_1min[:Open], 0.0)))
data1_high_1min_i = Float64.(collect(Missings.replace(main_data1_1min[:High], 0.0)))
data1_low_1min_i = Float64.(collect(Missings.replace(main_data1_1min[:Low], 0.0)))
data1_close_1min_i = Float64.(collect(Missings.replace(main_data1_1min[:Close], 0.0)))
# Data 2
data2_date_time_1min_i = main_data2_1min[:Date_Time]
data2_date_1min_i = String.(collect(Missings.replace(main_data2_1min[:Date_index], "")))
data2_time_1min_i = String.(collect(Missings.replace(main_data2_1min[:Time_index], "")))
data2_open_1min_i = Float64.(collect(Missings.replace(main_data2_1min[:Open], 0.0)))
data2_high_1min_i = Float64.(collect(Missings.replace(main_data2_1min[:High], 0.0)))
data2_low_1min_i = Float64.(collect(Missings.replace(main_data2_1min[:Low], 0.0)))
data2_close_1min_i = Float64.(collect(Missings.replace(main_data2_1min[:Close], 0.0)))
# Data 3
data3_date_time_1min_i = main_data3_1min[:Date_Time]
data3_date_1min_i = String.(collect(Missings.replace(main_data3_1min[:Date_index], "")))
data3_time_1min_i = String.(collect(Missings.replace(main_data3_1min[:Time_index], "")))
data3_open_1min_i = Float64.(collect(Missings.replace(main_data3_1min[:Open], 0.0)))
data3_high_1min_i = Float64.(collect(Missings.replace(main_data3_1min[:High], 0.0)))
data3_low_1min_i = Float64.(collect(Missings.replace(main_data3_1min[:Low], 0.0)))
data3_close_1min_i = Float64.(collect(Missings.replace(main_data3_1min[:Close], 0.0)))
# Forward Fill Missing Data 1
# Performs like na.locf
###### Note only forward fill if missing data is on the same day!
data1_date_1min_i
data1_open_1min = zeros(data1_open_1min_i)
data1_high_1min = zeros(data1_high_1min_i)
data1_low_1min = zeros(data1_low_1min_i)
data1_close_1min = zeros(data1_close_1min_i)
i=1
for i in 2:length(data1_close_1min)
if i == 2
# fill index 1
data1_open_1min[i-1] = data1_open_1min_i[i-1]
data1_high_1min[i-1] = data1_high_1min_i[i-1]
data1_low_1min[i-1] = data1_low_1min_i[i-1]
data1_close_1min[i-1] = data1_close_1min_i[i-1]
end
if (data1_close_1min_i[i] == 0.0 && data1_close_1min_i[i-1] != 0.0) & (data1_date_1min_i[i] == data1_date_1min_i[i-1])
data1_open_1min[i] = data1_open_1min_i[i-1]
data1_high_1min[i] = data1_high_1min_i[i-1]
data1_low_1min[i] = data1_low_1min_i[i-1]
data1_close_1min[i] = data1_close_1min_i[i-1]
else
data1_open_1min[i] = data1_open_1min_i[i]
data1_high_1min[i] = data1_high_1min_i[i]
data1_low_1min[i] = data1_low_1min_i[i]
data1_close_1min[i] = data1_close_1min_i[i]
end
if (data1_close_1min_i[i] == 0.0 && data1_close_1min_i[i-1] == 0.0) & (data1_date_1min_i[i] == data1_date_1min_i[i-1])
data1_open_1min[i] = data1_open_1min[i-1]
data1_high_1min[i] = data1_high_1min[i-1]
data1_low_1min[i] = data1_low_1min[i-1]
data1_close_1min[i] = data1_close_1min[i-1]
end
end
# Reverse loop to back fill
# Note julia does not have reverse iteration
data1_open_1min_rev = reverse(data1_open_1min)
data1_high_1min_rev = reverse(data1_high_1min)
data1_low_1min_rev = reverse(data1_low_1min)
data1_close_1min_rev = reverse(data1_close_1min)
# Data 1 reverse
data1_open_1min_rev_out = zeros(data1_open_1min_rev)
data1_high_1min_rev_out = zeros(data1_high_1min_rev)
data1_low_1min_rev_out = zeros(data1_low_1min_rev)
data1_close_1min_rev_out = zeros(data1_close_1min_rev)
i=1
for i in 2:length(data1_close_1min_rev)
if i == 2
# fill index 1
data1_open_1min_rev[i-1] = data1_open_1min_rev[i-1]
data1_high_1min_rev[i-1] = data1_high_1min_rev[i-1]
data1_low_1min_rev[i-1] = data1_low_1min_rev[i-1]
data1_close_1min_rev[i-1] = data1_close_1min_rev[i-1]
end
if (data1_close_1min_rev[i] == 0.0 && data1_close_1min_rev[i-1] != 0.0)
data1_open_1min_rev[i] = data1_open_1min_rev[i-1]
data1_high_1min_rev[i] = data1_high_1min_rev[i-1]
data1_low_1min_rev[i] = data1_low_1min_rev[i-1]
data1_close_1min_rev[i] = data1_close_1min_rev[i-1]
else
data1_open_1min_rev[i] = data1_open_1min_rev[i]
data1_high_1min_rev[i] = data1_high_1min_rev[i]
data1_low_1min_rev[i] = data1_low_1min_rev[i]
data1_close_1min_rev[i] = data1_close_1min_rev[i]
end
if (data1_close_1min_rev[i] == 0.0 && data1_close_1min_rev[i-1] == 0.0)
data1_open_1min_rev[i] = data1_open_1min_rev[i-1]
data1_high_1min_rev[i] = data1_high_1min_rev[i-1]
data1_low_1min_rev[i] = data1_low_1min_rev[i-1]
data1_close_1min_rev[i] = data1_close_1min_rev[i-1]
end
end
# Reverse to normal index position
data1_open_1min = reverse(data1_open_1min_rev)
data1_high_1min = reverse(data1_high_1min_rev)
data1_low_1min = reverse(data1_low_1min_rev)
data1_close_1min = reverse(data1_close_1min_rev)
# Validation
all = hcat(data1_date_1min_i[1:144130],data1_time_1min_i[1:144130],data1_open_1min[1:144130])
all
# Forward Fill Missing Data 2
# Performs like na.locf
###### Note only forward fill if missing data is on the same day!
data2_date_1min_i
data2_open_1min = zeros(data2_open_1min_i)
data2_high_1min = zeros(data2_high_1min_i)
data2_low_1min = zeros(data2_low_1min_i)
data2_close_1min = zeros(data2_close_1min_i)
i=1
for i in 2:length(data2_close_1min)
if i == 2
# fill index 1
data2_open_1min[i-1] = data2_open_1min_i[i-1]
data2_high_1min[i-1] = data2_high_1min_i[i-1]
data2_low_1min[i-1] = data2_low_1min_i[i-1]
data2_close_1min[i-1] = data2_close_1min_i[i-1]
end
if (data2_close_1min_i[i] == 0.0 && data2_close_1min_i[i-1] != 0.0) & (data2_date_1min_i[i] == data2_date_1min_i[i-1])
data2_open_1min[i] = data2_open_1min_i[i-1]
data2_high_1min[i] = data2_high_1min_i[i-1]
data2_low_1min[i] = data2_low_1min_i[i-1]
data2_close_1min[i] = data2_close_1min_i[i-1]
else
data2_open_1min[i] = data2_open_1min_i[i]
data2_high_1min[i] = data2_high_1min_i[i]
data2_low_1min[i] = data2_low_1min_i[i]
data2_close_1min[i] = data2_close_1min_i[i]
end
if (data2_close_1min_i[i] == 0.0 && data2_close_1min_i[i-1] == 0.0) & (data2_date_1min_i[i] == data2_date_1min_i[i-1])
data2_open_1min[i] = data2_open_1min[i-1]
data2_high_1min[i] = data2_high_1min[i-1]
data2_low_1min[i] = data2_low_1min[i-1]
data2_close_1min[i] = data2_close_1min[i-1]
end
end
# Reverse loop to back fill
# Note julia does not have reverse iteration
data2_open_1min_rev = reverse(data2_open_1min)
data2_high_1min_rev = reverse(data2_high_1min)
data2_low_1min_rev = reverse(data2_low_1min)
data2_close_1min_rev = reverse(data2_close_1min)
# Data 1 reverse
data2_open_1min_rev_out = zeros(data2_open_1min_rev)
data2_high_1min_rev_out = zeros(data2_high_1min_rev)
data2_low_1min_rev_out = zeros(data2_low_1min_rev)
data2_close_1min_rev_out = zeros(data2_close_1min_rev)
i=1
for i in 2:length(data2_close_1min_rev)
if i == 2
# fill index 1
data2_open_1min_rev[i-1] = data2_open_1min_rev[i-1]
data2_high_1min_rev[i-1] = data2_high_1min_rev[i-1]
data2_low_1min_rev[i-1] = data2_low_1min_rev[i-1]
data2_close_1min_rev[i-1] = data2_close_1min_rev[i-1]
end
if (data2_close_1min_rev[i] == 0.0 && data2_close_1min_rev[i-1] != 0.0)
data2_open_1min_rev[i] = data2_open_1min_rev[i-1]
data2_high_1min_rev[i] = data2_high_1min_rev[i-1]
data2_low_1min_rev[i] = data2_low_1min_rev[i-1]
data2_close_1min_rev[i] = data2_close_1min_rev[i-1]
else
data2_open_1min_rev[i] = data2_open_1min_rev[i]
data2_high_1min_rev[i] = data2_high_1min_rev[i]
data2_low_1min_rev[i] = data2_low_1min_rev[i]
data2_close_1min_rev[i] = data2_close_1min_rev[i]
end
if (data2_close_1min_rev[i] == 0.0 && data2_close_1min_rev[i-1] == 0.0)
data2_open_1min_rev[i] = data2_open_1min_rev[i-1]
data2_high_1min_rev[i] = data2_high_1min_rev[i-1]
data2_low_1min_rev[i] = data2_low_1min_rev[i-1]
data2_close_1min_rev[i] = data2_close_1min_rev[i-1]
end
end
# Reverse to normal index position
data2_open_1min = reverse(data2_open_1min_rev)
data2_high_1min = reverse(data2_high_1min_rev)
data2_low_1min = reverse(data2_low_1min_rev)
data2_close_1min = reverse(data2_close_1min_rev)
# Forward Fill Missing Data 3
# Performs like na.locf
###### Note only forward fill if missing data is on the same day!
data3_date_1min_i
data3_open_1min = zeros(data3_open_1min_i)
data3_high_1min = zeros(data3_high_1min_i)
data3_low_1min = zeros(data3_low_1min_i)
data3_close_1min = zeros(data3_close_1min_i)
i=1
for i in 2:length(data3_close_1min)
if i == 2
# fill index 1
data3_open_1min[i-1] = data3_open_1min_i[i-1]
data3_high_1min[i-1] = data3_high_1min_i[i-1]
data3_low_1min[i-1] = data3_low_1min_i[i-1]
data3_close_1min[i-1] = data3_close_1min_i[i-1]
end
if (data3_close_1min_i[i] == 0.0 && data3_close_1min_i[i-1] != 0.0) & (data3_date_1min_i[i] == data3_date_1min_i[i-1])
data3_open_1min[i] = data3_open_1min_i[i-1]
data3_high_1min[i] = data3_high_1min_i[i-1]
data3_low_1min[i] = data3_low_1min_i[i-1]
data3_close_1min[i] = data3_close_1min_i[i-1]
else
data3_open_1min[i] = data3_open_1min_i[i]
data3_high_1min[i] = data3_high_1min_i[i]
data3_low_1min[i] = data3_low_1min_i[i]
data3_close_1min[i] = data3_close_1min_i[i]
end
if (data3_close_1min_i[i] == 0.0 && data3_close_1min_i[i-1] == 0.0) & (data3_date_1min_i[i] == data3_date_1min_i[i-1])
data3_open_1min[i] = data3_open_1min[i-1]
data3_high_1min[i] = data3_high_1min[i-1]
data3_low_1min[i] = data3_low_1min[i-1]
data3_close_1min[i] = data3_close_1min[i-1]
end
end
# Reverse loop to back fill
# Note julia does not have reverse iteration
data3_open_1min_rev = reverse(data3_open_1min)
data3_high_1min_rev = reverse(data3_high_1min)
data3_low_1min_rev = reverse(data3_low_1min)
data3_close_1min_rev = reverse(data3_close_1min)
# Data 1 reverse
data3_open_1min_rev_out = zeros(data3_open_1min_rev)
data3_high_1min_rev_out = zeros(data3_high_1min_rev)
data3_low_1min_rev_out = zeros(data3_low_1min_rev)
data3_close_1min_rev_out = zeros(data3_close_1min_rev)
i=1
for i in 2:length(data3_close_1min_rev)
if i == 2
# fill index 1
data3_open_1min_rev[i-1] = data3_open_1min_rev[i-1]
data3_high_1min_rev[i-1] = data3_high_1min_rev[i-1]
data3_low_1min_rev[i-1] = data3_low_1min_rev[i-1]
data3_close_1min_rev[i-1] = data3_close_1min_rev[i-1]
end
if (data3_close_1min_rev[i] == 0.0 && data3_close_1min_rev[i-1] != 0.0)
data3_open_1min_rev[i] = data3_open_1min_rev[i-1]
data3_high_1min_rev[i] = data3_high_1min_rev[i-1]
data3_low_1min_rev[i] = data3_low_1min_rev[i-1]
data3_close_1min_rev[i] = data3_close_1min_rev[i-1]
else
data3_open_1min_rev[i] = data3_open_1min_rev[i]
data3_high_1min_rev[i] = data3_high_1min_rev[i]
data3_low_1min_rev[i] = data3_low_1min_rev[i]
data3_close_1min_rev[i] = data3_close_1min_rev[i]
end
if (data3_close_1min_rev[i] == 0.0 && data3_close_1min_rev[i-1] == 0.0)
data3_open_1min_rev[i] = data3_open_1min_rev[i-1]
data3_high_1min_rev[i] = data3_high_1min_rev[i-1]
data3_low_1min_rev[i] = data3_low_1min_rev[i-1]
data3_close_1min_rev[i] = data3_close_1min_rev[i-1]
end
end
# Reverse to normal index position
data3_open_1min = reverse(data3_open_1min_rev)
data3_high_1min = reverse(data3_high_1min_rev)
data3_low_1min = reverse(data3_low_1min_rev)
data3_close_1min = reverse(data3_close_1min_rev)
# Data 1 find half day holidays
# Note I do not include times when market was closed for
# Find days the market closed early using 30 min data as reference point
hol_ref = CSV.read("C:/Users/Andrew.Bannerman/Desktop/Julia/GE/ES Data/30.min.reg.sess.es.txt", types=[String; String; fill(Float64, 4); Int;Int],header=true)
hol_ref_time = hol_ref[:Time]
hol_ref_date = hol_ref[:Date]
holiday_dates = fill("",length(hol_ref_time))
holiday_times = fill("",length(hol_ref_time))
# Extract dates and last time print
for i = 2:size(holiday_times,1)
if (hol_ref_time[i-1] != "15:00" && hol_ref_time[i] == "09:00")
holiday_times[i-1] = hol_ref_time[i-1]
holiday_dates[i-1] = hol_ref_date[i-1]
end
end
# Late market opens
# http://s3.amazonaws.com/armstrongeconomics-wp/2013/07/NYSE-Closings.pdf
# Only 09/11/2002 Opening delayed until 12:00 noon out of respect for the memorial events commemorating the one-year
# anniversary of the attack on the World Trade Center.
# Other 03/08/1999 was missing 0900 data that TS didnt fill
holiday_dates_late_o = fill("",length(hol_ref_time))
holiday_times_late_o = fill("",length(hol_ref_time))
for i = 2:size(holiday_times,1)
if (hol_ref_time[i-1] == "15:00" && hol_ref_time[i] != "09:00")
holiday_times_late_o[i-1] = hol_ref_time[i]
holiday_dates_late_o[i-1] = hol_ref_date[i]
end
end
# Reduce to remove ""
start_times = holiday_times[holiday_times .!= ""]
holiday_dates = holiday_dates[holiday_dates .!= ""]
end_times = fill("15:00",length(start_times))
# Late opens
dates_late_o = holiday_dates_late_o[holiday_dates_late_o .!= ""]
times_late_o = holiday_times_late_o[holiday_times_late_o .!= ""]
# join date time to one column and concert to datetime format
holiday_date_start_times = String[]
holiday_date_end_times = String[]
temp = String[]
temp1 = String[]
for i in 1:length(holiday_dates)
temp = map(join,zip([holiday_dates[i]],[start_times[i]]), " ")
append!(holiday_date_start_times,temp)
temp1 = map(join,zip([holiday_dates[i]],[end_times[i]]), " ")
append!(holiday_date_end_times,temp1)
end
# Set start and end times as date format
start_date_time = DateTime.(holiday_date_start_times,Dates.DateFormat("mm/dd/yyyy H:M"))
end_date_time = DateTime.(holiday_date_end_times,Dates.DateFormat("mm/dd/yyyy H:M"))
# Late open
# Tag, remember exclude prior to start date, so start is end
#late_open_end_date_time = map(join,zip([dates_late_o[2]],[times_late_o[2]]), " ")
late_open_end_date_time = "09/11/2002 11:00"
late_o_s = "09/11/2002 08:31"
late_open_end_date_time = DateTime(late_open_end_date_time,Dates.DateFormat("mm/dd/yyyy H:M"))
late_open_start_date_time = DateTime(late_o_s,Dates.DateFormat("mm/dd/yyyy H:M"))
# Adjust for holidays
# As indexed from ES daily bars, major closes already included
# This adjusts early closes and late opens
# Data 1
data1_date_1min_i_f = fill("",length(data1_date_1min_i))
data1_time_1min_i_f = fill("",length(data1_time_1min_i))
data1_open_1min_f = zeros(data1_open_1min)
data1_high_1min_f = zeros(data1_high_1min)
data1_low_1min_f = zeros(data1_low_1min)
data1_close_1min_f = zeros(data1_close_1min)
for i in 1:length(data1_close_1min_f)
# Keep only bars that fall outside every holiday early-close window (before the start or after the end of each window)
if all((data1_date_time_1min_i[i] <= start_date_time[k]) || (data1_date_time_1min_i[i] > end_date_time[k]) for k in 1:length(start_date_time))
# Add all days
data1_date_1min_i_f[i] = data1_date_1min_i[i]
data1_time_1min_i_f[i] = data1_time_1min_i[i]
data1_open_1min_f[i] = data1_open_1min[i]
data1_high_1min_f[i] = data1_high_1min[i]
data1_low_1min_f[i] = data1_low_1min[i]
data1_close_1min_f[i] = data1_close_1min[i]
end
end
# Loop to mark late open with "" 0.0
data1_date_1min_i_filt = fill("",length(data1_date_1min_i))
data1_time_1min_i_filt = fill("",length(data1_time_1min_i))
data1_open_1min_filt = zeros(data1_open_1min)
data1_high_1min_filt = zeros(data1_high_1min)
data1_low_1min_filt = zeros(data1_low_1min)
data1_close_1min_filt = zeros(data1_close_1min)
for i in 1:length(data1_close_1min_filt)
if (data1_date_time_1min_i[i] < late_open_start_date_time) | (data1_date_time_1min_i[i] >= late_open_end_date_time)
data1_date_1min_i_filt[i] = data1_date_1min_i_f[i]
data1_time_1min_i_filt[i] = data1_time_1min_i_f[i]
data1_open_1min_filt[i] = data1_open_1min_f[i]
data1_high_1min_filt[i] = data1_high_1min_f[i]
data1_low_1min_filt[i] = data1_low_1min_f[i]
data1_close_1min_filt[i] = data1_close_1min_f[i]
end
end
# Reduce To Remove all holiday half days ("" or 0.0)
# Data 1
data1_date_1min_i = data1_date_1min_i_filt[data1_date_1min_i_filt .!= ""]
data1_time_1min_i = data1_time_1min_i_filt[data1_time_1min_i_filt .!= ""]
data1_open_1min = data1_open_1min_filt[data1_open_1min_filt .!= 0]
data1_high_1min = data1_high_1min_filt[data1_high_1min_filt .!= 0]
data1_low_1min = data1_low_1min_filt[data1_low_1min_filt .!= 0]
data1_close_1min = data1_close_1min_filt[data1_close_1min_filt .!= 0]
# Validation checks
# Test that the holiday dates have been shortened to half days
#time_out = fill("",length(data1_date_1min_i_filt))
#test_date = data1_date_1min_i_filt[data1_date_1min_i_filt .== "09/11/2002"]
#for i in 1:length(data1_date_1min_i_filt)
#if data1_date_1min_i_filt[i] == "09/11/2002"
# time_out[i] = data1_time_1min_i_filt[i]
#else
# time_out[i] = ""
#end
#end
#time_out = time_out[time_out .!= ""] # reduce to remove 0
# Adjust for holidays
# As indexed from ES daily bars, major closes already included
# This adjusts early closes and late opens
# Data 2
data2_date_1min_i_f = fill("",length(data2_date_1min_i))
data2_time_1min_i_f = fill("",length(data2_time_1min_i))
data2_open_1min_f = zeros(data2_open_1min)
data2_high_1min_f = zeros(data2_high_1min)
data2_low_1min_f = zeros(data2_low_1min)
data2_close_1min_f = zeros(data2_close_1min)
for i in 1:length(data2_close_1min_f)
# Keep only bars that fall outside every holiday early-close window (before the start or after the end of each window)
if all((data2_date_time_1min_i[i] <= start_date_time[k]) || (data2_date_time_1min_i[i] > end_date_time[k]) for k in 1:length(start_date_time))
# Add all days
data2_date_1min_i_f[i] = data2_date_1min_i[i]
data2_time_1min_i_f[i] = data2_time_1min_i[i]
data2_open_1min_f[i] = data2_open_1min[i]
data2_high_1min_f[i] = data2_high_1min[i]
data2_low_1min_f[i] = data2_low_1min[i]
data2_close_1min_f[i] = data2_close_1min[i]
end
end
# Loop to mark late open with "" 0.0
data2_date_1min_i_filt = fill("",length(data2_date_1min_i))
data2_time_1min_i_filt = fill("",length(data2_time_1min_i))
data2_open_1min_filt = zeros(data2_open_1min)
data2_high_1min_filt = zeros(data2_high_1min)
data2_low_1min_filt = zeros(data2_low_1min)
data2_close_1min_filt = zeros(data2_close_1min)
for i in 1:length(data2_close_1min_filt)
if (data2_date_time_1min_i[i] < late_open_start_date_time) | (data2_date_time_1min_i[i] >= late_open_end_date_time)
data2_date_1min_i_filt[i] = data2_date_1min_i_f[i]
data2_time_1min_i_filt[i] = data2_time_1min_i_f[i]
data2_open_1min_filt[i] = data2_open_1min_f[i]
data2_high_1min_filt[i] = data2_high_1min_f[i]
data2_low_1min_filt[i] = data2_low_1min_f[i]
data2_close_1min_filt[i] = data2_close_1min_f[i]
end
end
# Reduce To Remove all holiday half days ("" or 0.0)
# Data 2
data2_date_1min_i = data2_date_1min_i_filt[data2_date_1min_i_filt .!= ""]
data2_time_1min_i = data2_time_1min_i_filt[data2_time_1min_i_filt .!= ""]
data2_open_1min = data2_open_1min_filt[data2_open_1min_filt .!= 0]
data2_high_1min = data2_high_1min_filt[data2_high_1min_filt .!= 0]
data2_low_1min = data2_low_1min_filt[data2_low_1min_filt .!= 0]
data2_close_1min = data2_close_1min_filt[data2_close_1min_filt .!= 0]
# Validation checks
# Test that the holiday dates have been shortened to half days
#time_out = fill("",length(data2_date_1min_i_filt))
#test_date = data2_date_1min_i_filt[data2_date_1min_i_filt .== "09/11/2002"]
#for i in 1:length(data2_date_1min_i_filt)
#if data2_date_1min_i_filt[i] == "09/11/2002"
# time_out[i] = data2_time_1min_i_filt[i]
#else
# time_out[i] = ""
#end
#end
#time_out = time_out[time_out .!= ""] # reduce to remove 0
# Adjust for holidays
# As indexed from ES daily bars, major closes already included
# This adjusts early closes and late opens
# Data 3
data3_date_1min_i_f = fill("",length(data3_date_1min_i))
data3_time_1min_i_f = fill("",length(data3_time_1min_i))
data3_open_1min_f = zeros(data3_open_1min)
data3_high_1min_f = zeros(data3_high_1min)
data3_low_1min_f = zeros(data3_low_1min)
data3_close_1min_f = zeros(data3_close_1min)
for i in 1:length(data3_close_1min_f)
# Keep only bars that fall outside every holiday early-close window (before the start or after the end of each window)
if all((data3_date_time_1min_i[i] <= start_date_time[k]) || (data3_date_time_1min_i[i] > end_date_time[k]) for k in 1:length(start_date_time))
# Add all days
data3_date_1min_i_f[i] = data3_date_1min_i[i]
data3_time_1min_i_f[i] = data3_time_1min_i[i]
data3_open_1min_f[i] = data3_open_1min[i]
data3_high_1min_f[i] = data3_high_1min[i]
data3_low_1min_f[i] = data3_low_1min[i]
data3_close_1min_f[i] = data3_close_1min[i]
end
end
# Loop to mark late open with "" 0.0
data3_date_1min_i_filt = fill("",length(data3_date_1min_i))
data3_time_1min_i_filt = fill("",length(data3_time_1min_i))
data3_open_1min_filt = zeros(data3_open_1min)
data3_high_1min_filt = zeros(data3_high_1min)
data3_low_1min_filt = zeros(data3_low_1min)
data3_close_1min_filt = zeros(data3_close_1min)
for i in 1:length(data3_close_1min_filt)
if (data3_date_time_1min_i[i] < late_open_start_date_time) | (data3_date_time_1min_i[i] >= late_open_end_date_time)
data3_date_1min_i_filt[i] = data3_date_1min_i_f[i]
data3_time_1min_i_filt[i] = data3_time_1min_i_f[i]
data3_open_1min_filt[i] = data3_open_1min_f[i]
data3_high_1min_filt[i] = data3_high_1min_f[i]
data3_low_1min_filt[i] = data3_low_1min_f[i]
data3_close_1min_filt[i] = data3_close_1min_f[i]
end
end
# Reduce To Remove all holiday half days ("" or 0.0)
# Data 3
data3_date_1min_i = data3_date_1min_i_filt[data3_date_1min_i_filt .!= ""]
data3_time_1min_i = data3_time_1min_i_filt[data3_time_1min_i_filt .!= ""]
data3_open_1min = data3_open_1min_filt[data3_open_1min_filt .!= 0]
data3_high_1min = data3_high_1min_filt[data3_high_1min_filt .!= 0]
data3_low_1min = data3_low_1min_filt[data3_low_1min_filt .!= 0]
data3_close_1min = data3_close_1min_filt[data3_close_1min_filt .!= 0]
# Validation checks
# Test that the holiday dates have been shortened to half days
#time_out = fill("",length(data3_date_1min_i_filt))
#test_date = data3_date_1min_i_filt[data3_date_1min_i_filt .== "09/11/2002"]
#for i in 1:length(data3_date_1min_i_filt)
#if data3_date_1min_i_filt[i] == "09/11/2002"
# time_out[i] = data3_time_1min_i_filt[i]
#else
# time_out[i] = ""
#end
#end
#time_out = time_out[time_out .!= ""] # reduce to remove 0
# Carry over high, low and open to same index position as the close
# Data 1 index marker
data1_time_index = zeros(size(data1_close_1min,1))
for i = 1:size(data1_close_1min,1)
temp = data1_time_1min_i[i]
if temp[4:5] == "01" || temp[4:5] == "31"
data1_time_index[i] = 1.0
else
data1_time_index[i] = 0.0
end
end
# verify
#all = hcat(data1_time_index[487140:487700],data1_time_1min_i[487140:487700],data1_date_1min_i[487140:487700],data1_open_1min[487140:487700],data1_high_1min[487140:487700],data1_low_1min[487140:487700],data1_close_1min[487140:487700],data1_h_carry[487140:487700],data1_l_carry[487140:487700])
#all = hcat(data1_time_index[1:32],data1_time_1min_i_filt[1:32])
#all = DataFrame(all)
# Data 2 index marker
data2_time_index = zeros(size(data2_close_1min,1))
for i = 1:size(data2_close_1min,1)
temp = data2_time_1min_i[i]
if temp[4:5] == "01" || temp[4:5] == "16" || temp[4:5] == "31" || temp[4:5] == "46"
data2_time_index[i] = 1.0
else
data2_time_index[i] = 0.0
end
end
# Data 3 index marker
data3_time_index = zeros(size(data3_close_1min,1))
for i = 1:size(data3_close_1min,1)
temp = data3_time_1min_i[i]
if temp[4:5] == "01" || temp[4:5] == "11" || temp[4:5] == "21" || temp[4:5] == "31" || temp[4:5] == "41" || temp[4:5] == "51"
data3_time_index[i] = 1.0
else
data3_time_index[i] = 0.0
end
end
# Forward carry high
# Data 1
data1_h_carry = zeros(size(data1_high_1min,1))
temp = data1_high_1min[1] # Set static variable to compare to
tempsave = zeros(data1_high_1min) # Track temp to see what is happening with it
for i = 2:size(data1_high_1min,1) # start on 2nd i as using [i-1]
if data1_high_1min[i] > temp # current high exceeds temp (initialized to the first high, reset at each new bar below)
data1_h_carry[i] = data1_high_1min[i] # new high: store it
temp = data1_h_carry[i] # update temp with the new high
tempsave[i] = temp # track temp
else
data1_h_carry[i] = data1_h_carry[i-1] # not a new high: carry the previous high forward
temp = data1_h_carry[i-1] # update temp with the previous high
tempsave[i] = temp # track temp
end
if data1_time_index[i] == 1.0 # if time index is 1
temp = data1_high_1min[i] # reset temp with dummy[i]
data1_h_carry[i] = temp # add dummy[i] to running
end
end
# Forward carry low
# Data 1
data1_l_carry = zeros(size(data1_low_1min,1))
temp = data1_low_1min[1] # Set static variable to compare to
tempsave = zeros(data1_low_1min) # Track temp to see what is happening with it
for i = 2:size(data1_low_1min,1) # start on 2nd i as using [i-1]
if data1_low_1min[i] < temp # current low is below temp (initialized to the first low, reset at each new bar below)
data1_l_carry[i] = data1_low_1min[i] # new low: store it
temp = data1_l_carry[i] # update temp with the new low
tempsave[i] = temp # track temp
else
data1_l_carry[i] = data1_l_carry[i-1] # not a new low: carry the previous low forward
temp = data1_l_carry[i-1] # update temp with the previous low
tempsave[i] = temp # track temp
end
if data1_time_index[i] == 1.0 # if time index is 1
temp = data1_low_1min[i] # reset temp with dummy[i]
data1_l_carry[i] = temp # add dummy[i] to running
end
end
# Forward carry high
# Data 2
data2_h_carry = zeros(size(data2_high_1min,1))
temp = data2_high_1min[1] # Set static variable to compare to
tempsave = zeros(data2_high_1min) # Track temp to see what is happening with it
for i = 2:size(data2_high_1min,1) # start on 2nd i as using [i-1]
if data2_high_1min[i] > temp # current high exceeds temp (initialized to the first high, reset at each new bar below)
data2_h_carry[i] = data2_high_1min[i] # new high: store it
temp = data2_h_carry[i] # update temp with the new high
tempsave[i] = temp # track temp
else
data2_h_carry[i] = data2_h_carry[i-1] # not a new high: carry the previous high forward
temp = data2_h_carry[i-1] # update temp with the previous high
tempsave[i] = temp # track temp
end
if data2_time_index[i] == 1.0 # if time index is 1
temp = data2_high_1min[i] # reset temp with dummy[i]
data2_h_carry[i] = temp # add dummy[i] to running
end
end
# Forward carry low
# Data 2
data2_l_carry = zeros(size(data2_low_1min,1))
temp = data2_low_1min[1] # Set static variable to compare to
tempsave = zeros(data2_low_1min) # Track temp to see what is happening with it
for i = 2:size(data2_low_1min,1) # start on 2nd i as using [i-1]
if data2_low_1min[i] < temp # current low is below temp (initialized to the first low, reset at each new bar below)
data2_l_carry[i] = data2_low_1min[i] # new low: store it
temp = data2_l_carry[i] # update temp with the new low
tempsave[i] = temp # track temp
else
data2_l_carry[i] = data2_l_carry[i-1] # not a new low: carry the previous low forward
temp = data2_l_carry[i-1] # update temp with the previous low
tempsave[i] = temp # track temp
end
if data2_time_index[i] == 1.0 # if time index is 1
temp = data2_low_1min[i] # reset temp with dummy[i]
data2_l_carry[i] = temp # add dummy[i] to running
end
end
# Forward carry high
# Data 3
data3_h_carry = zeros(size(data3_high_1min,1))
temp = data3_high_1min[1] # Set static variable to compare to
tempsave = zeros(data3_high_1min) # Track temp to see what is happening with it
for i = 2:size(data3_high_1min,1) # start on 2nd i as using [i-1]
if data3_high_1min[i] > temp # current high exceeds temp (initialized to the first high, reset at each new bar below)
data3_h_carry[i] = data3_high_1min[i] # new high: store it
temp = data3_h_carry[i] # update temp with the new high
tempsave[i] = temp # track temp
else
data3_h_carry[i] = data3_h_carry[i-1] # not a new high: carry the previous high forward
temp = data3_h_carry[i-1] # update temp with the previous high
tempsave[i] = temp # track temp
end
if data3_time_index[i] == 1.0 # if time index is 1
temp = data3_high_1min[i] # reset temp with dummy[i]
data3_h_carry[i] = temp # add dummy[i] to running
end
end
# Forward carry low
# Data 3
data3_l_carry = zeros(size(data3_low_1min,1))
temp = data3_low_1min[1] # Set static variable to compare to
tempsave = zeros(data3_low_1min) # Track temp to see what is happening with it
for i = 2:size(data3_low_1min,1) # start on 2nd i as using [i-1]
if data3_low_1min[i] < temp # current low is below temp (initialized to the first low, reset at each new bar below)
data3_l_carry[i] = data3_low_1min[i] # new low: store it
temp = data3_l_carry[i] # update temp with the new low
tempsave[i] = temp # track temp
else
data3_l_carry[i] = data3_l_carry[i-1] # not a new low: carry the previous low forward
temp = data3_l_carry[i-1] # update temp with the previous low
tempsave[i] = temp # track temp
end
if data3_time_index[i] == 1.0 # if time index is 1
temp = data3_low_1min[i] # reset temp with dummy[i]
data3_l_carry[i] = temp # add dummy[i] to running
end
end
# Validate output is as should be
all = hcat(data1_low_1min[1:31],data1_time_index[1:31],data1_l_carry[1:31],tempsave[1:31])
all = DataFrame(all)
print(all)
# Make DateTime
# Thus we can ignore the condition when market opened late 09/11/2002
# Data 1
data1_date_time = String[]
temp = String[]
for i in 1:length(data1_date_1min_i)
temp = map(join,zip([data1_date_1min_i[i]],[data1_time_1min_i[i]]), " ")
append!(data1_date_time,temp)
end
data1_date_time = DateTime.(data1_date_time,Dates.DateFormat("mm/dd/yyyy H:M"))
# Date 2
data2_date_time = String[]
temp = String[]
for i in 1:length(data2_date_1min_i)
temp = map(join,zip([data2_date_1min_i[i]],[data2_time_1min_i[i]]), " ")
append!(data2_date_time,temp)
end
data2_date_time = DateTime.(data2_date_time,Dates.DateFormat("mm/dd/yyyy H:M"))
# Data 3
data3_date_time = String[]
temp = String[]
for i in 1:length(data3_date_1min_i)
temp = map(join,zip([data3_date_1min_i[i]],[data3_time_1min_i[i]]), " ")
append!(data3_date_time,temp)
end
data3_date_time = DateTime.(data3_date_time,Dates.DateFormat("mm/dd/yyyy H:M"))
# Define Data1, Data2, Data3 time samplings
# 30 min, 45min etc..
data1_date = fill("",length(data1_date_1min_i))
data1_time = fill("",length(data1_time_1min_i))
data1_o = zeros(size(data1_close_1min,1))
data1_h = zeros(size(data1_close_1min,1))
data1_l = zeros(size(data1_close_1min,1))
data1_c = zeros(size(data1_close_1min,1))
i=1
for i in 1:length(data1_close_1min)
temp = data1_time_1min_i[i]
if (temp[4:5] == "00" || temp[4:5] == "30") & (data1_date_time[i] != late_open_end_date_time) # set close price at end of 30 minute bar
data1_date[i] = data1_date_1min_i[i] # set date time index to close bar
data1_time[i] = data1_time_1min_i[i] # set date time index to close bar
data1_c[i] = data1_close_1min[i]
data1_h[i] = data1_h_carry[i]
data1_l[i] = data1_l_carry[i]
else
data1_date[i] = ""
data1_time[i] = ""
data1_c[i] = 0.0
end
if temp[4:5] == "01" || temp[4:5] == "31" # Set open price
data1_o[i] = data1_open_1min[i]
else
data1_o[i] = 0.0
end
end
# Data 2
data2_date = fill("",length(data2_date_1min_i))
data2_time = fill("",length(data2_time_1min_i))
data2_o = zeros(size(data2_close_1min,1))
data2_h = zeros(size(data2_close_1min,1))
data2_l = zeros(size(data2_close_1min,1))
data2_c = zeros(size(data2_close_1min,1))
i=1
for i in 1:length(data2_close_1min)
temp = data2_time_1min_i[i]
if (temp[4:5] == "00" || temp[4:5] == "15" || temp[4:5] == "30" || temp[4:5] == "45") & (data2_date_time[i] != late_open_end_date_time) # set close price at end of 30 minute bar
data2_date[i] = data2_date_1min_i[i] # set date time index to close bar
data2_time[i] = data2_time_1min_i[i] # set date time index to close bar
data2_c[i] = data2_close_1min[i]
data2_h[i] = data2_h_carry[i]
data2_l[i] = data2_l_carry[i]
else
data2_date[i] = ""
data2_time[i] = ""
data2_c[i] = 0.0
end
if (temp[4:5] == "01" || temp[4:5] == "16" || temp[4:5] == "31" || temp[4:5] == "46") # Set open price
data2_o[i] = data2_open_1min[i]
else
data2_o[i] = 0.0
end
end
# Data 3
data3_date = fill("",length(data3_date_1min_i))
data3_time = fill("",length(data3_time_1min_i))
data3_o = zeros(size(data3_close_1min,1))
data3_h = zeros(size(data3_close_1min,1))
data3_l = zeros(size(data3_close_1min,1))
data3_c = zeros(size(data3_close_1min,1))
i=1
for i in 1:length(data3_close_1min)
temp = data3_time_1min_i[i]
if (temp[4:5] == "00" || temp[4:5] == "10" || temp[4:5] == "20" || temp[4:5] == "30" || temp[4:5] == "40" || temp[4:5] == "50") & (data3_date_time[i] != late_open_end_date_time)
data3_date[i] = data3_date_1min_i[i] # set date time index to close bar
data3_time[i] = data3_time_1min_i[i] # set date time index to close bar
data3_c[i] = data3_close_1min[i]
data3_h[i] = data3_h_carry[i]
data3_l[i] = data3_l_carry[i]
else
data3_date[i] = ""
data3_time[i] = ""
data3_c[i] = 0.0
end
if (temp[4:5] == "01" || temp[4:5] == "11" || temp[4:5] == "21" || temp[4:5] == "31" || temp[4:5] == "41" || temp[4:5] == "51") # Set open price
data3_o[i] = data3_open_1min[i]
else
data3_o[i] = 0.0
end
end
# Reduce To Remove all 0
# Data 1
data1_date = data1_date[data1_date .!= ""]
data1_time = data1_time[data1_time .!= ""]
data1_o = data1_o[data1_o .!= 0]
data1_h = data1_h[data1_h .!= 0]
data1_l = data1_l[data1_l .!= 0]
data1_c = data1_c[data1_c .!= 0]
# Check open is same length as the HLC
if length(data1_o) != length(data1_c)
fill_no = length(data1_c) - length(data1_o)
data1_o = push!(data1_o,fill_no)
else
data1_o = data1_o
end
# Data 2
data2_date = data2_date[data2_date .!= ""]
data2_time = data2_time[data2_time .!= ""]
data2_o = data2_o[data2_o .!= 0]
data2_h = data2_h[data2_h .!= 0]
data2_l = data2_l[data2_l .!= 0]
data2_c = data2_c[data2_c .!= 0]
# Check open is same length as the HLC
if length(data2_o) != length(data2_c)
fill_no = length(data2_c) - length(data2_o)
data2_o = push!(data2_o,fill_no)
else
data2_o = data2_o
end
# Data 3
data3_date = data3_date[data3_date .!= ""]
data3_time = data3_time[data3_time .!= ""]
data3_o = data3_o[data3_o .!= 0]
data3_h = data3_h[data3_h .!= 0]
data3_l = data3_l[data3_l .!= 0]
data3_c = data3_c[data3_c .!= 0]
# Check open is same length as the HLC
if length(data3_o) != length(data3_c)
fill_no = length(data3_c) - length(data3_o)
data3_o = push!(data3_o,fill_no)
else
data3_o = data3_o
end
# Export as .csv
using DataFrames
using CSV
all = hcat(data3_date,data3_time,data3_o,data3_h,data3_l,data3_c)
all = DataFrame(all)
CSV.write("C:/Users/Andrew.Bannerman/Desktop/Julia/check_data3_ohlc.csv", all; delim=',')

https://gist.github.com/flare9x/c60c4d938ea432101b28cf6993e8972f

Estimating the Hurst Exponent Using R/S Range

In this post we will estimate the Hurst exponent using the R/S method. The Hurst exponent measures the long range memory of a time series (more below). If a series has no memory, i.e. each point in time is independent of previous points in time, then it is said to be more of a random process. Examples of random processes are Markov processes, Brownian motion and white noise.

A series which trends has autocorrelation where one point in time is correlated with a lagged version of itself. The autocorrelation of a series may be calculated with the following formula:

autocorrelation at lag k: ρ(k) = Cov(Yi, Yi+k) / Var(Yi)

In plain English, the top part of the equation (numerator) is the covariance, where Yi is the series and Yi+k is a lagged version of itself. The bottom part of the equation (denominator) is the variance.

So we simply calculate the covariance of the series and a lagged version of itself and divide by the variance.
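
As a minimal sketch of that calculation (my own, written in current Julia with the Statistics standard library rather than the 0.6-era syntax used elsewhere in this post), the lag-k autocorrelation can be computed directly:

using Statistics

# Lag-k sample autocorrelation: covariance of the series with a k-lagged
# copy of itself, divided by the variance of the series.
function autocor_lag(y::AbstractVector{<:Real}, k::Integer)
    n = length(y)
    return cov(y[1:n-k], y[1+k:n]) / var(y)
end

# Example: a trending (cumulative sum) series shows high lag-1 autocorrelation,
# while white noise shows close to none.
trend = cumsum(randn(1000))
noise = randn(1000)
println((autocor_lag(trend, 1), autocor_lag(noise, 1)))

Note this divides by the overall sample variance, which is the same simplification used in the rolling autocorrelation code later in this post.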

We may plot the autocorrelation of the SP500 futures contract close prices.
source: https://www.quandl.com/data/CHRIS/CME_SP1-S-P-500-Futures-Continuous-Contract-1-SP1-Front-Month

I cannot validate the accuracy of this data, but it is good enough for illustration purposes.

auto_cor.png

This is far from random: close prices have a high degree of correlation with lagged versions of themselves.

Now we may estimate the Hurst exponent of an SP futures return series using the rescaled range (R/S) method. This is the original method devised by Harold Edwin Hurst, who developed the formula while studying the long-term flow behavior of the Nile river.

We shall estimate H for a 1 bar return series and a 100 bar return series.

The R/S calculation steps are depicted on Wikipedia.
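
As a minimal sketch (my own, in current Julia with the Statistics standard library), the R/S statistic for a single block of returns can be written as:

using Statistics

# Rescaled range (R/S) of a single block of returns.
function rescaled_range(block::AbstractVector{<:Real})
    dev = block .- mean(block)      # deviations from the block mean
    z = cumsum(dev)                 # cumulative sum of the deviations
    R = maximum(z) - minimum(z)     # range of the cumulative deviations
    S = std(block)                  # standard deviation of the block
    return R / S
end

# Example: R/S of a random 64-observation block
println(rescaled_range(randn(64)))

Repeating this across blocks of each size and regressing log(mean R/S) on log(block size) gives the H estimate, which is what the rolling implementation further below does.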

The Hurst exponent is more of an estimate than an absolute calculation. Corresponding H values signal:

H values less than 0.5 = anti persistent behavior, mean reverting
H values of 0.5 = random process
H values greater than 0.5 = persistent behavior, trending

The procedure is as follows; it calculates the R/S range and then estimates the Hurst exponent by regressing log(R/S) against log(n), where n is the block size (lag):

1. Calculate a return series. We will estimate the Hurst exponent on a 1 bar return series. The thesis is that if each bar is independent from the last (no autocorrelation), it will be close to a random process, H = 0.5. If we pick a longer return series, such as 100 bars, we would expect higher H values and higher autocorrelation coefficients.
2. The Hurst exponent is estimated by regressing a power law, log(R/S) vs log(n). The slope is the Hurst exponent. For that reason, logarithmically spaced block sizes were chosen at lags: [8,16,32,64,128,256,512,1024].
3. Calculate the rolling mean for each block size, i.e. over windows i-n+1:i (1:8, 2:9, etc.). Do this for each block size.
4. For each block size, subtract the mean from each value in the return series.
5. Sum the deviations from the mean.
6. Find the maximum and minimum of the summed deviations from the mean.
7. Find R, the range of the summed deviations from the mean, by subtracting the minimum from the maximum.
8. Calculate the standard deviation of the deviations from the mean.
9. Calculate R/S, the rescaled range, by dividing R by the standard deviation of the deviations from the mean.
10. After rolling through the series calculating the R/S value for each block size, take the mean R/S value per block size, so that each lag (block size) has one mean value.
11. Perform a regression of log2(mean_RS) vs log2(n_lag). The slope is the Hurst exponent. The procedure above is implemented below in the Julia language:


####################################################################################
# Rolling Hurst
####################################################################################

# initialize outputs
m = Array{Float64}(length(d_es_close),0)
log_n_out = m
out_RS = m

# Set lags (range or specific)
max_lag = 100
n = 200
lags = n:max_lag
# or specific lags #comment out where necessary
lags = [8,16,32,64,128,256,512,1024]
# Specify return series lag
n_lag = 2000

i=1
j=1
c=1
    for j = lags
        n=lags[c]
    # Calculate returns of the series
    #n_lag = lags[c] # set n_lag 1 for 1 day returns
    rets = zeros(d_es_close)
    n_lag = n_lag
    for i in n_lag+1:length(d_es_close)
        rets[i] = ((d_es_close[i] / d_es_close[i-n_lag])-1) # rets[i] = ((d_es_close[i] / d_es_close[i-n_lag+1])-1)
    end
    #rets = d_es_close
    # Find mean width of lookback
    mean_out = zeros(rets)
    for i = n:size(rets,1)
                mean_out[i] = mean(rets[i-n+1:i])
            end
    # Subtract deviations from the mean
    dev_mean_out = zeros(mean_out)
    for i = n:size(mean_out,1)
                dev_mean_out[i] = (rets[i] - mean_out[i])
            end
    # Roll sum the deviations from the mean
    sum_out = zeros(dev_mean_out)
    for i = n:size(dev_mean_out,1)
            sum_out[i] = sum(dev_mean_out[i-n+1:i])
        end
    # Find the maximum and minimum of sum of the mean deviations
    max_out = zeros(sum_out)
    for i = n:size(sum_out,1)
                max_out[i] = maximum(sum_out[i-n+1:i])
            end
    min_out = zeros(sum_out)
    for i = n:size(sum_out,1)
                min_out[i] = minimum(sum_out[i-n+1:i])
            end
    # R = Range, max - min
    R_out = zeros(dev_mean_out)
    for i= n:size(dev_mean_out,1)
        R_out[i] = max_out[i] - min_out[i]
    end
    # Rolling standard deviation of (returns) the mean
    stdev_out = zeros(rets)
    for i = n:size(rets,1)
            stdev_out[i] = sqrt(var(dev_mean_out[i-n+1:i]))
        end
    # Calculate rescaled range (Range (R_out) / stdev of returns )
    RS_out = zeros(R_out)
    for i = n:size(R_out,1)
            RS_out[i] = R_out[i] / stdev_out[i]
        end
    # Calculate log_n (n)
    index = fill(n,length(rets))
    log_n = zeros(rets)
    for i =n:size(index,1)
        log_n[i] = log2(index[i])
    end

# out
log_n_out = hcat(log_n_out, log_n)
#log_rs_out = hcat(log_rs_out, log_RS)
out_RS = hcat(out_RS, RS_out) # re-scaled range

c = c+1
end

# access dims of the matrix
# row ,col
#log_rs_out[20,:]

# Calculate expected value for R/S over various n
# Take the rolling mean over each varying n lag at n width
expected_RS = zeros(size(out_RS,1), size(out_RS,2))
n=100
for j = 1:size(out_RS,2)  # loop on column dimension
    for i = n:size(out_RS,1) # loop on row dimension
            expected_RS[i,j] =  mean(out_RS[i-n+1:i,j])
            end
        end

# log 2
expected_log_RS = zeros(size(out_RS,1), size(out_RS,2))
c=1
for j = 1:size(expected_log_RS,2)  # loop on column dimension
    for i = n:size(expected_log_RS,1) # loop on row dimension
            expected_log_RS[i,j] =  log2(expected_RS[i,j])
            end
                        c=c+1
        end

    # Regression slope of log(n) and expected value of log(R/S)
    # x = log(n), y = log(R/S)
    b_slope = zeros(size(out_RS,1))
    A_intercept = zeros(size(out_RS,1))

    for i = n:size(out_RS,1)
        xi = log_n_out[i,:]  # grab the row for varying lags of log_n
        yi = expected_log_RS[i,:] # grab the row for varying lags of r/s value
        # Mean of X (Mx)
        Mx = mean(xi)
        # Mean of Y (My)
        My = mean(yi)
        # Standard deviation of X (sX)
        sX = sqrt(var(xi))
        # Standard Deviation of Y (sY)
        sY = sqrt(var(yi))
        # Correlation of x and y  (r)
        r = cor(xi,yi)
        # slope (b) = r * (sY/sX)
        b = r * (sY/sX)
    # find intercept A = MY - bMX
    A = My - (b*Mx)

# out
b_slope[i] = b
A_intercept[i] = A
end

https://gist.github.com/flare9x/76fe49368f14e652f6de3a130344ea4e#file-rolling_rs_hurst-jl

Using the code above we may plot the Hurst exponent for a 1 bar return series.

Hurst (2)

If we have a random process, successive points in time are independent from each other. We see Hurst values fluctuating around the 0.50 area, with H reaching roughly 0.75 in 2009 during a strong trending period.

We may see the effect of analyzing a longer return stream, let's say a 100 bar return series:

H_close.png

We see Hurst values closer to 1 for a 100 bar return series.

Furthermore to see how the Hurst exponent and auto correlation are related, we may calculate a rolling auto correlation:

# Rolling Autocorrelation
# Sliding window (k width)
mean_out = zeros(size(d_es_close,1))
var_out = zeros(size(d_es_close,1))
cov_out = zeros(size(d_es_close,1))
autocor_out = zeros(size(d_es_close,1))
devmean_out = zeros(size(d_es_close,1))
n=1000
k = 1 # set lag
n_lag = 100 # return series lag

# Calculate 1 bar return series
rets = zeros(size(d_es_close,1))
for i in n_lag+1:length(d_es_close)
    rets[i] = ((d_es_close[i] / d_es_close[i-n_lag])-1) # rets[i] = ((d_es_close[i] / d_es_close[i-n_lag+1])-1)
end

# lag the series by k
lagged_series = [fill(0,k); rets[1:end-k]]

# Calculate the mean of the rolling  sample
for i = n:size(rets,1)
    mean_out[i] = mean(rets[i-n+1:i])
end

# Calculate deviations from the mean
for i = n:size(rets,1)
devmean_out[i] = rets[i] - mean_out[i]
end

# Calculate rolling variance
for i = n:size(rets,1)
    var_out[i] = var(rets[i-n+1:i])
end

# Calculate rolling covariance
for i = n:size(rets,1)
    if i+k >= size(rets,1)
        break
    end
    cov_out[i] = cov(rets[i-n+1:i],lagged_series[i-n+1:i])
end

# Rolling Autocorrelation
for i =n:size(rets,1)
    autocor_out[i] = cov_out[i] / var_out[i]
end

For a 1 bar return series at lag k=1:

auto_cor_1_bar

We see very little correlation from one point in time to the next. The respective Hurst exponent is in the 0.40 to 0.55 range.

And the autocorrelation for a 100 bar return series at lag k=1:

auto_cor_100_bar.png

We see strong correlation for the 100 bar return series, with correlation coefficients greater than 0.96 and respective Hurst exponents closer to 1.0.

So what's the point of all this?

As depicted above, information about the (non) randomness of a series may be of some benefit when designing models to fit the data. The autocorrelation / Hurst exponent may be used to complement existing strategies, signalling the type of regime the market is currently in so that strategies better suited to current conditions can be favoured.
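
As a rough, hypothetical sketch of that idea (the function name, thresholds and symbols are mine, not from this post): given a vector of rolling Hurst estimates, one could route each bar to a mean reversion or trend following rule and stay flat in the indeterminate zone.

# Hypothetical regime filter driven by a rolling Hurst estimate.
# `hurst` is assumed to be a vector of rolling H values, for example the
# b_slope output of the rolling R/S code above, aligned bar-for-bar with the series.
function regime_signal(hurst::AbstractVector{<:Real}; lo::Real=0.45, hi::Real=0.55)
    sig = fill(:flat, length(hurst))
    for i in eachindex(hurst)
        if hurst[i] < lo
            sig[i] = :mean_revert   # anti-persistent regime
        elseif hurst[i] > hi
            sig[i] = :trend_follow  # persistent / trending regime
        end
    end
    return sig
end

A strategy layer could then take mean reversion entries only on bars tagged :mean_revert and trend entries only on bars tagged :trend_follow.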

As a note of interest, on a 1 bar daily return series for the SPY ETF at lags 2:20:

H_two_twenty

Values of H are below 0.50 and in mean reversion territory. Let's shorten the lags one more time, to 2:10.

H_two_ten

We see significantly mean reverting H values. At holding periods below 20 bars, a mean reversion model would be best suited.

A way I like to think of this is: how quickly does something diffuse from the mean? A mean reverting series is almost stuck in the mud, with prices centered around a mean. I like to think of it as a compressed coil, or, in terms of the fractal dimension, the roughness of a surface, where mean reversion is most certainly rough. On the other side, a strong trend has a faster rate of diffusion from the mean, or in the fractal dimension a smooth surface. The S&P500 is a blend of this action: on shorter holding periods there is a mean reversion tendency (rough), but on longer time horizons the market displays trending behavior (smooth).

In closing, and to recap: we see how the Hurst exponent and the autocorrelation of a series are related to each other. A random process shows no dependence between one point in time and the next, i.e. no correlation, with a corresponding Hurst value of 0.5. On the other side, a series showing persistent behavior, a trending series, displays high correlation coefficients, where successive points in time are correlated with each other, and H values closer to 1.

Code may be found on my github.

Julia – Download Free Data Using Alphavantage API

For those of you wanting to get started with, or become familiar with, the Julia language, I wrote a small script to import .csv data using the Alphavantage API.

In this example we demonstrate how to download a single .csv file as well as batch download multiple .csv files and export as .csv.

First let's download a single .csv of AAPL 1 minute data using the Alphavantage API. You will need to insert your own API key.

single__test

And plot the output:

AAPL_1min

If we wish to download multiple .csv files and export to .csv we may:

multiple

Those of you using R may visit: Free Data: Alternative to Yahoo Finance

Full Julia code on my github

# Download Data Using Alphavantage
using HTTP
using StatPlots
res = HTTP.get("https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=AAPL&interval=1min&apikey=your_api_key&datatype=csv")
mycsv = readcsv(res.body)
x = convert(DataFrame, mycsv)
x = x[2:nrow(x),:] # subset remove header row
# Rename Columns
colnames = ["Date","Open","High","Low","Close","Volume"]
names!(x.colindex, map(parse, colnames))
# Convert String Date to Date format
x[:Date] = DateTime.(x[:Date],Dates.DateFormat("yyy-mm-dd H:M:S"))
# Sort Dataframe by Date Column
x = sort!(x, cols = [order(:Date)], rev=(false))
# Convert OHLC to Float64 and Volume to Int64
for i in 2:length(x)-1
x[i] = convert(Array{Float64,1},x[i])
end
x[6] = convert(Array{Int64,1},x[6])
# Plot Data
gr(size=(1500 ,1000))
@df x plot(:Date, [:Close],title = "AAPL Intraday", xlab = "Date", ylab = "Close",colour = [:red],legend = :topleft)
savefig("AAPL_1min.png")
# Multiple symbols
#start_time = Dates.now(Dates.UTC)
t=1
tickers = ["DDM","MVV","QLD","SAA","SSO","TQQQ","UDOW","UMDD","UPRO","URTY","UWM", "BIB", "FINU","LTL","ROM",
"RXL", "SVXY","UBIO","UCC","UGE","UPW","URE","USD","UXI","UYG","UYM","DOG","DXD","MYY","MZZ","PSQ","QID","RWM","SBB",
"SDD","SDOW","SDS","SH","SPXU","SQQQ","SRTY","TWM","SMDD","UVXY","VIXM","VIXY"]
for t in 1:length(tickers)
# Using HTTP package
#sleep(5)
res = HTTP.get(joinpath(
"https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol="tickers[t]"&outputsize=full&apikey=your_api_key&datatype=csv"))
mycsv = readcsv(res.body)
x = convert(DataFrame, mycsv)
x = x[2:nrow(x),:] # subset remove header row
# Rename Columns
colnames = ["Date","Open","High","Low","Close","Volume"]
names!(x.colindex, map(parse, colnames))
# Convert String Date to Date format
x[:Date] = Date.(x[:Date],Dates.DateFormat("yyy-mm-dd"))
# Sort Date Frame By Date
x = sort!(x, cols = [order(:Date)], rev=(false))
# Convert OHLC to Float64 and Volume to Int64
i=1
for i in 2:length(x)-1
x[i] = convert(Array{Float64,1},x[i])
end
x[6] = convert(Array{Int64,1},x[6])
writetable(joinpath("CSV_OUT_"*tickers[t]*".csv"), x)
end


Speed Check! Julia Vs R Back Test Script

In a quest for speed enhancements over R, I opted to look at the Julia language. It is a high level programming language touting speed similar to C. I find the syntax not all that different from Python and R: if you know 'how' to solve problems in those languages, the same logic applies to Julia, you just have to learn a slightly new but similar syntax.

I created a very simple back test script. The strategy is to stay long the ES mini when the close price is over the 200 period moving average applied to 30 minute bars (200 * 30 minutes is a 6000 minute moving average). Close the position when the ES crosses under the 6000 minute moving average.

I kept the functions and methods of calculation mostly similar between the two languages as stepped below:

1. Load .csv data, 30 minute ES time
2. Create a date and time column, convert to Date format
3. Create 200 Bar SMA
4. Create Long Signals
5. Lag the Long Signal forward +1 to avoid look ahead bias
6. For loop to calculate bar to bar returns
7. Subset the data to remove the 200 bars of missing data from the SMA 200 creation
8. Calculate strategy returns and buy and hold returns

I excluded any plotting processes, as right now I am plotting within the IDE. For R I'm using RStudio and for Julia I am using Atom / Juno.

Let's now get to the code showing the back test script for both R and Julia:

Load Packages R:

require(TTR)
require(lubridate)
require(dplyr)

Load Packages Julia:

using DataFrames
using Indicators

Load .txt Data R

df <- read.csv("C:/Users/Andrew.Bannerman/Desktop/Julia/30.min.es.txt", header=TRUE,stringsAsFactors = FALSE)

Load txt Data Julia:

df = readtable("30.min.es.txt", header=true)

Lets check to see how many rows the ES 30 minute data has:

julia> nrow(df)
223571

Next lets make a Date Time Column and convert to Date Time format in R:

# Make date time column
df$Date_Time <- paste(df$Date,df$Time)
df$Date_Time <- mdy_hm(df$Date_Time)

Make the Date Time column in Julia (I couldn't find a clean Julia equivalent of R's paste() function!) and convert to DateTime format:

a = df[:Date]
b = df[:Time]
c = map(join,zip(a,b), " ")
out = String[]
temp = String[]
for i in 1:length(a)
temp = map(join,zip([a[i]],[b[i]]), " ")
append!(out,temp)
end
df[:Date_Time] = out
df[:Date_Time] = DateTime.(df[:Date_Time],Dates.DateFormat("mm/dd/yyyy H:M"))

Next we can create the 200SMA and Calculate the Long Signal, first R:

# Create Sma
df$sma_200 <- SMA(df$Close,200)
# Create long signal
df$Long_Signal <- ifelse(df$Close > df$sma_200,1,0)
df$Long_Signal <- dplyr::lag(df$Long_Signal,1) # lag forward avoid look ahead bias

And Julia:

# Create simple moving average
# using Indicators
Close = convert(Array, df[:Close])
sma_200 = sma(Close,n=200)
df[:Close_200sma] = sma_200
# Create Signals
# Stay long over 200sma
# Exit positions below 200sma
# use ifelse() function see - https://en.wikibooks.org/wiki/Introducing_Julia/Controlling_the_flow
# remember . in front of the (.>) for vectorization!
df[:Signal_Long] = ifelse(df[:Close] .> df[:Close_200sma],1,0)
# Lag data +1 forward
# Avoid look ahead bias
df[:Signal_Long] = [0; df[1:end-1,:Signal_Long]]

Next we can calculate Close to Close Returns. From this we multiply the returns by the strategy signal 1 or 0.

First R:

# For loop for returns
out <- vector()
for (i in 2:nrow(df)){
out[i] = df$Close[i]/df$Close[i-2+1] - 1.0
}
df <- cbind(df,out)
colnames(df)[12] = "Close_Ret"
# Calculate strategy Returns
df$Sig_Rets <- df$Long_Signal * df$Close_Ret
df[is.na(df)] <- 0

And same for Julia:

# Calculate Close to Close Returns
Close = df[:Close]
x = convert(Array, Close)
out = zeros(x)
for i in 2:size(Close,1)
out[i] = Close[i]/Close[i-2+1] - 1.0
end
df[:Close_Rets] = out
# Calculate signal returns
df[:Signal_Rets] = df[:Signal_Long] .* df[:Close_Rets]

And finally we calculate cumulative returns:

First R:

# Calculate Cumulative Returns
# Buy and hold and Strategy returns
# Subset Data To start after SMA creation
df = df[201:nrow(df),]
df$Signal_cum_ret <- cumprod(1+df$Sig_Rets)-1
df$BH_cum_ret <- cumprod(1+df$Close_Ret)-1

And Julia:

# Calculate Cumulative Returns
df = df[201:end,:]
df[:Cum_Rets] = cumprod(1+df[1:end, :Signal_Rets])-1
df[:BH_Cum_Rets] = cumprod(1+df[1:end, :Close_Rets])-1

Next let's wrap the script in a for loop, run it 100 times and take the mean time (full code on my github).
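
A minimal sketch of the Julia side of that timing loop might look roughly like this, where run_backtest() is a hypothetical stand-in for the back test script above:

# Time 100 iterations of the back test and record each run in milliseconds.
function run_backtest()
    # ... the Julia back test script from above would go here ...
    return nothing
end

final_out = Int[]
for _ in 1:100
    t0 = time_ns()
    run_backtest()
    push!(final_out, round(Int, (time_ns() - t0) / 1_000_000)) # ns -> ms
end

println("mean time (ms): ", sum(final_out) / length(final_out))
println("total time (ms): ", sum(final_out))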

The mean time result for a 100 iterations using R:

out_results
Time
1 4.881509
2 4.550159
3 4.762161
4 4.847419
5 5.260049
6 4.715544
7 4.617849
8 4.642842
9 4.933652
10 4.660920

mean(out_results$Time)
[1] 4.582826

And the mean time result for 100 iterations Julia:

julia> final_out
100-element Array{Int64,1}:
 2321
 1974
 2123
    ⋮
 1943
 1933
 2083

julia> print(mean(final_out))
1957.93
julia> 1957.93/1000  # Convert milliseconds to seconds
1.9579300000000002

We see that on average Julia took 1.95 seconds to complete each back test iteration. The Julia script contained two for loops vs one for loop in R, so I didn't play to R's vectorized strengths in this regard. But on an almost exact code-for-code speed check, Julia comes out on top, beating R on average by 2.624896 seconds per script iteration.

After 100 iterations R total time for completion:

> sum(out_results$Time)
[1] 458.2826

or 7.6380433333 minutes.

And total Time for Julia:

julia> print(sum(final_out))
195793
julia> 195793 / 1000
195.793

or 3.263216667 minutes.

In this example after running a back test script 100 times and taking the average time + sum time for completion we see Julia is 2.34 times faster than R.
It should be noted that each function is pretty standard to each language. I used Julia's DataFrames package rather than straight Arrays; using Arrays might be faster than working with dataframes. We see no slowdown at all using for loops in Julia. My hunch is that removing the for loop in R would get the time closer to Julia, but I'm too lazy to check this 🙂 (OK, I'm not: if we play to the vectorized theme of R and replace the slow for loop for calculating returns with data.table:

require(data.table)
df = data.table(df)
df[, Close_Ret := (Close / shift(Close))-1]

Speed improves, with a single script run taking:

Time difference of 2.614989 secs
)

This is my first Julia script, so if you spot anywhere I can make the code more efficient, drop me a line.

A similar package to TTR for financial indicators is Julia's Indicators package.

I like working with RStudio, and a similar IDE for Julia is Juno on Atom.

atom_juno

Finally:

Here are the back test results from R / Julia:

plot(df$Signal_cum_ret,type="l",main="R 200SMA Back Test Result")

Rplot431

# Plot
using StatPlots
gr(size=(1500 ,1000))
@df df plot(:Date_Time, [:Cum_Rets :BH_Cum_Rets], title = "SPY Long Over 200sma", xlab = "Date", ylab = "Cumulative Returns",colour = [:lightgreen :pink],legend = :topleft)
savefig("myplot.png")

myplot.png

R Code = https://gist.github.com/flare9x/2d73e73218967699c035d6d70fa4ae8a
Julia Code = https://gist.github.com/flare9x/7d1d41856ffbe3106983d15885d8a0cc

R – Quantifying Trend Days

In this post I will be using R and data.table to extract all trend up / down days. The following method was used to quantify a trend day:

1. Trend up = the close price closes within 25% of the day's high
2. Trend down = the close price closes within 25% of the day's low
3. Gaps are excluded: if the open is above yesterday's high or below yesterday's low, the day is excluded
4. The daily return must be over .75% (trend up) or below -.75% (trend down)

Other methods come to mind:
1. Select those days that close within 25% of the low / high AND where the daily gain / loss is above or below 1%.
2. Linear regression, low r2, see this blog post: http://quantgorilla.com/blog/quantifying-trend-days/

My shell of a code can be altered to include these methods, but for now let's stick to closing within the top / bottom 25% of the day's range, excluding gap ups. The code to do this (check the GitHub code at the bottom of the page; WordPress has a hard time displaying the code snippets correctly):

# Es Up / Down Trend Isolation
# 2.26.2018
# Andrew Bannerman 

# Packages
require(lubridate)
require(dplyr)
require(ggplot2)
require(xts)
require(quantmod)
require(data.table)
require(zoo)
require(reshape2)

# Load Daily ES Data
es_d <- read.csv("C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/ES Data/1.day.reg.sess.es.txt",header=TRUE,stringsAsFactors = FALSE)
es_d$Date <- mdy(es_d$Date)  # Change to date format 

# Find relative position of the closing price  (Close - Low) / (High - Low)
es_d$pos <- (es_d$Close - es_d$Low) / (es_d$High - es_d$Low)

# Calculate returns
es_d$rets <- ((es_d$Close / lag(es_d$Close,1)) - 1) * 100

# Find gap up / down days 
es_d$gap_up <- ifelse(es_d$Open > lag(es_d$High,1),1,0)
es_d$gap_dn <- ifelse(es_d$Open < lag(es_d$Low,1),1,0)

# Find trend up / down days 
es_d$trend_up <- ifelse(es_d$pos >= .75,1,0)
es_d$trend_dn <- ifelse(es_d$pos <= .25,1,0)

# Subset all trend up / down days 
# Trend day definition: close within 25% of the day high / low and daily return over +.75% / under -.75% to exclude quiet days 
# Exclude gap up / down days where the open is beyond yesterday's high / low
trend_up <- es_d[es_d$trend_up == 1 & es_d$rets >= .75 & es_d$gap_up == 0, ]
trend_dn <- es_d[es_d$trend_dn == 1 & es_d$rets <= -.75 & es_d$gap_dn == 0, ]

# Count total trend up / down days
total_up <- nrow(trend_up)
total_dn <- nrow(trend_dn)

# Percentage trend days of sample
total <- nrow(es_d)
perc.up <- total_up / total
perc.dn <- total_dn / total

# Save dates in order to use for subsetting the intraday bars
trend_up_dates <- trend_up$Date
trend_dn_dates <- trend_dn$Date

There are a total of 5167 days in the sample. Of those, approximately 11% are up trend days and 10% are down trend days. Next we extract each trend day and save a 1 minute intraday plot:

# Load 1 minute bars
es_1 <- read.csv("C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/ES Data/1.min.reg.sess.es.txt",header=TRUE,stringsAsFactors = FALSE)
es_1$Date_new <- paste(es_1$Date,es_1$Time)
es_1$Date_new <- mdy_hm(es_1$Date_new) # Change date to date format
es_1$Date <- mdy(es_1$Date)
# Save up trend plots
# initialize list
t_UP <- list()
i=1
for (i in 1:length(trend_up_dates)) {
tryCatch({
ptm0 <- proc.time()
temp <- subset(es_1 , Date == trend_up_dates[i]) # Conditionl subset == grab trend day on intraday level
temp$range <- temp$High - temp$Low
t_UP[[i]] <- temp
name_date <- as.numeric(gsub("-", "", head(temp$Date,1)))
temp <- temp[3:10]
head(temp)
colnames(temp)[7] <- "Date"
df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
head(df)
open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
all_ts <- cbind(open,high,low,close)
names <- c("Open","High","Low","Close")
colnames(all_ts) <- names

# Save up trend plots
dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_up_plots"
mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
png(file=mypath,width= 1400, height=1000)
mytitle = paste(name_date[1])
chartSeries(all_ts, type = "bars", ,bar.type = "ohlc",theme = chartTheme("white"),name=paste("es_mini — Trend Up Day"))
dev.off()
ptm1=proc.time() - ptm0
time=as.numeric(ptm1[3])
cat('\n','Iteration',i,'took', time, "seconds to complete")
}, error = function(e) { print(paste("i =", i, "failed:")) })
}

# General Trend Up Stats
trend_up_out <- do.call(rbind,t_UP)
head(trend_up_out)
t_up_mean_r <- mean(trend_up_out$range)

# Save intraday down trend plots
# initialize list
t_DN <- list()
i=1
for (i in 1:length(trend_dn_dates)) {
tryCatch({
ptm0 <- proc.time()
Sys.sleep(0.1)
temp <- subset(es_1 , Date == trend_dn_dates[i])
temp$range <- temp$High - temp$Low
t_DN[[i]] <- temp
name_date <- as.numeric(gsub("-", "", head(temp$Date,1))) # replace "-" with ""
temp$chg <- ifelse(temp$Close > temp$Open, "up", "dn")
temp <- temp[3:10]
colnames(temp)[7] <- "Date"
df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
all_ts <- cbind(open,high,low,close)
names <- c("Open","High","Low","Close")
colnames(all_ts) <- names

# Save down trend plots
dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_down_plots"
mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
png(file=mypath,width= 1400, height=1000)
mytitle = paste(name_date[1])
chartSeries(all_ts, type = "bars", ,bar.type = "ohlc",theme = chartTheme("white"),name=paste("es_mini — Trend Down Day"))
dev.off()
ptm1=proc.time() - ptm0
time=as.numeric(ptm1[3])
cat('\n','Iteration',i,'took', time, "seconds to complete")
}, error = function(e) { print(paste("i =", i, "failed:")) })
}

# General Trend Down Stats
trend_dn_out <- do.call(rbind,t_DN)
t_dn_mean_r <- mean(trend_dn_out$range)

With example output:

20180126

20020611

Next we can plot daily bars, 5 days prior to each trend day:

##############################################################################
# Extract 5 days prior to trend up / down day
##############################################################################
dates_df <- data.frame(Date = es_d$Date)
dates_df$wkdays <- weekdays(as.Date(dates_df$Date)) # extract day of week

# Grab trend up start dates
es_d_prior_m <- ifelse(dates_df$wkdays == "Monday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_tue <- ifelse(dates_df$wkdays == "Tuesday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_w <- ifelse(dates_df$wkdays == "Wednesday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_th <- ifelse(dates_df$wkdays == "Thursday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_f <- ifelse(dates_df$wkdays == "Friday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 5),NA)
# Remove NA
es_d_prior_m <- es_d_prior_m[!is.na(es_d_prior_m)]
es_d_prior_tue <- es_d_prior_tue[!is.na(es_d_prior_tue)]
es_d_prior_w <- es_d_prior_w[!is.na(es_d_prior_w)]
es_d_prior_th <- es_d_prior_th[!is.na(es_d_prior_th)]
es_d_prior_f <- es_d_prior_f[!is.na(es_d_prior_f)]
t_up_all_prior <- c(es_d_prior_m,es_d_prior_tue,es_d_prior_w,es_d_prior_th,es_d_prior_f)
# sort dates
t_up_all_prior <- sort(t_up_all_prior)
t_up_all_prior <- ymd(t_up_all_prior)

# up trend subsetting
up_list <- list()
for (i in 1:length(trend_up_dates)) {
  up_list[[i]] <- es_d[es_d$Date >= t_up_all_prior[i] & es_d$Date <= trend_up_dates[i],]
}

# Plot 5 days prior to uptrend
t_UP_Prior <- list()
i=1
for (i in 1:length(up_list)) {
  tryCatch({
    ptm0 <- proc.time()
    temp <- up_list[[i]] # Conditionl subset == grab trend day on intraday level
    name_date <- as.numeric(gsub("-", "", head(temp$Date,1)))
    df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
    open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    all_ts <- cbind(open,high,low,close)
    names <- c("Open","High","Low","Close")
    colnames(all_ts) <- names

    # Save up trend plots
    dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_up_prior"
    mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
    png(file=mypath,width= 1400, height=1000)
    mytitle = paste(name_date[1])
    chartSeries(all_ts, type = "bars", ,bar.type = "ohlc",theme = chartTheme("white"),name=paste("es_mini -- Trend Up Day"))
    dev.off()
    ptm1=proc.time() - ptm0
    time=as.numeric(ptm1[3])
    cat('\n','Iteration',i,'took', time, "seconds to complete")
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

# Grab trend down start dates
es_d_prior_m <- ifelse(dates_df$wkdays == "Monday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_tue <- ifelse(dates_df$wkdays == "Tuesday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_w <- ifelse(dates_df$wkdays == "Wednesday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_th <- ifelse(dates_df$wkdays == "Thursday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_f <- ifelse(dates_df$wkdays == "Friday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 5),NA)
# Remove NA
es_d_prior_m <- es_d_prior_m[!is.na(es_d_prior_m)]
es_d_prior_tue <- es_d_prior_tue[!is.na(es_d_prior_tue)]
es_d_prior_w <- es_d_prior_w[!is.na(es_d_prior_w)]
es_d_prior_th <- es_d_prior_th[!is.na(es_d_prior_th)]
es_d_prior_f <- es_d_prior_f[!is.na(es_d_prior_f)]
t_up_all_prior <- c(es_d_prior_m,es_d_prior_tue,es_d_prior_w,es_d_prior_th,es_d_prior_f)
# sort dates
t_up_all_prior <- sort(t_up_all_prior)
t_up_all_prior <- ymd(t_up_all_prior)

# down trend subsetting
dn_list <- list()
for (i in 1:length(trend_dn_dates)) {
  dn_list[[i]] <- es_d[es_d$Date >= t_up_all_prior[i] & es_d$Date <= trend_dn_dates[i],]
}

# Plot 5 days prior to down trend
i=1
for (i in 1:length(dn_list)) {
  tryCatch({
    ptm0 <- proc.time()
    temp <- dn_list[[i]] # Conditionl subset == grab trend day on intraday level
    name_date <- as.numeric(gsub("-", "", head(temp$Date,1)))
    df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
    open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    all_ts <- cbind(open,high,low,close)
    names <- c("Open","High","Low","Close")
    colnames(all_ts) <- names

    # Save up trend plots
    dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_dn_prior"
    mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
    png(file=mypath,width= 1400, height=1000)
    mytitle = paste(name_date[1])
    chartSeries(all_ts, type = "bars", ,bar.type = "ohlc",theme = chartTheme("white"),name=paste("es_mini -- Trend Down Day"))
    dev.off()
    ptm1=proc.time() - ptm0
    time=as.numeric(ptm1[3])
    cat('\n','Iteration',i,'took', time, "seconds to complete")
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

With output:

19990204

20110914

There are two ways to attempt to predict a trend day with a certain degree of probability:

1. Use patterns of daily bars
2. Use intraday patterns within the trend day

The daily bars plotted above let us view the daily bars leading into a trend up / down day. This code can be extended to look at specific features, which I will leave to the reader (a rough sketch follows this list); these may include:

1. Prior to the trend day, was the previous day down or up?
2. Where did the price close relative to the day's range, (close - low) / (high - low)?
3. What did the 3 day pattern look like prior to the trend day?
- 3 tight closes?
- middle bar low and close lower than the bars on each side?

etc etc…
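
As a rough sketch of how such features could be tabulated (assuming the es_d data frame and trend_up_dates vector from the code above are in memory, and using an arbitrary cutoff for what counts as a "tight" close):

# Hypothetical feature sketch - not part of the original back test
require(data.table)
feat <- as.data.table(es_d)
feat[, up_day := Close > Open]                                      # up or down day
feat[, close_pos := (Close - Low) / (High - Low)]                   # where the close sits in the day's range
feat[, chg := abs(Close - shift(Close))]                            # close to close move
feat[, tight_3 := frollapply(chg, 3, max) < 0.25 * frollapply(High - Low, 3, mean)]  # 3 "tight" closes (arbitrary cutoff)
# lag by one day so each row describes the day before it
feat[, `:=`(prev_up = shift(up_day), prev_close_pos = shift(close_pos), prev_tight_3 = shift(tight_3))]
feat[, is_trend_up := Date %in% trend_up_dates]
# compare feature frequencies on the day before trend up days versus all other days
feat[, .(pct_prev_up = mean(prev_up, na.rm = TRUE),
         avg_prev_close_pos = mean(prev_close_pos, na.rm = TRUE),
         pct_prev_tight_3 = mean(prev_tight_3, na.rm = TRUE)), by = is_trend_up]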

Now we move on to an attempt to quantify a trend day within the day itself. This involves a simple count of the number of new 1 minute highs (or lows) made before a certain time of day; in this example I count them prior to 10am Central time.

The code that achieves this:

##########################################################################################
# Intraday Trend up / down
##########################################################################################
es_1 <- read.csv("C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/ES Data/1.min.reg.sess.es.txt",header=TRUE,stringsAsFactors = FALSE)
es_1$Date_new <- paste(es_1$Date,es_1$Time)
es_1$Date_new  <- mdy_hm(es_1$Date_new)  # Change date to date format
es_1$Date <- mdy(es_1$Date)
# make data.table for speed
es_1 <- data.table(es_1)
# rolling day high
es_1[, roll_max := cummax(High), by = Date] # rolling day high
es_1[, close_over_high := ifelse(Close > shift(roll_max), 1,0),by = Date] # place 1 when close makes new day high
# rolling day low
es_1[, roll_min := cummin(Low), by = Date] # rolling day low
es_1[, close_below_low := ifelse(Close < shift(roll_min), 1,0),by = Date] # place 1 when close makes new day low
# replace NA with 0 in all numeric columns
for(j in seq_along(es_1)){
  set(es_1, i = which(is.na(es_1[[j]]) & is.numeric(es_1[[j]])), j = j, value = 0)
}
# count number of new 1 minute highs / lows before 10am central
es_1[, before_10_UCT_up := ifelse(Time >= "08:31" & Time <= "10:00", cumsum(close_over_high),0),by = Date]
es_1[, before_10_UCT_dn := ifelse(Time >= "08:31" & Time <= "10:00", cumsum(close_below_low),0),by = Date]
# find maximum number of new highs / lows per day
es_1[, count_max := max(before_10_UCT_up), by = Date]
es_1[, count_min := max(before_10_UCT_dn), by = Date]
# keep only days that made at least one new high or low before 10am
plot.df <- subset(es_1, count_max > 0 | count_min > 0)
# plot distributions
ggplot(data=plot.df, aes(plot.df$count_max)) +
  geom_histogram(binwidth = .5,fill="darkgreen",alpha = 0.5,col="grey") +
  scale_x_continuous(breaks = seq(plot.df$count_max), max(plot.df$count_max),name="No. New Highs Prior 10am Central")+
  scale_y_continuous(breaks = seq(0, 600000,100000))+
  ggtitle("Number of new 1 minute highs prior to 10am Central")+
  ylab("Total New Highs")

# Plot new lows
ggplot(data=plot.df, aes(plot.df$count_min)) +
  geom_histogram(binwidth = .5,fill="red",alpha = 0.5,col="grey") +
  scale_x_continuous(breaks = seq(plot.df$count_min), max(plot.df$count_min),name="No. New Lows Prior 10am Central")+
  #scale_y_continuous(breaks = seq(0, 35500,2000))+
  ggtitle("Number of new 1 minute lows prior to 10am Central")+
  ylab("Total New Low")
graphics.off()

We may plot the distribution of new highs / lows prior to 10am central time:

Rplot305

Rplot306

We will back test the less frequent (higher) new high / new low counts prior to 10am Central.

Rules of back test:

1. Go long when the number of new highs prior to 10am Central is >= 8
2. Go short when the number of new lows prior to 10am Central is >= 8
3. Exit at the end of the trading day

No stops; slippage and commissions are not included.

The code to do this:

# Back test
# Trading Signal long and short
es_1[, sig_buy := ifelse(before_10_UCT_up >= 8, 1,0)] # long signal
es_1[, sig_exit := Time < "15:15"] # long exit time
es_1[, sig_short := ifelse(before_10_UCT_dn >= 8, -1,0)] # short signal
es_1[, sig_s_exit := Time < "15:15"] # short exit time
#es_1[is.na(es_1)] <- 0
es_1[, sig_end_l := ifelse(sig_exit == FALSE,0,NA)] # mark end of trade retain NA
es_1[, sig_end_s := ifelse(sig_s_exit == FALSE,0,NA)] # mark end of trade retain NA
# Combine NA + signal for long trades
es_1$z_l <- ifelse(es_1$sig_buy == 1,
               rowSums(es_1[, c("sig_buy", "sig_end_l")], na.rm=TRUE),
               rowSums(es_1[, c("sig_buy", "sig_end_l")]))
es_1$z_l[1] <- 0
# Combine NA + signal for short trades
es_1$z_s <- ifelse(es_1$sig_short == -1,
                   rowSums(es_1[, c("sig_short", "sig_end_s")], na.rm=TRUE),
                   rowSums(es_1[, c("sig_short", "sig_end_s")]))
es_1$z_s[1] <- 0
# Forward fill 1's to end of the trade using package zoo, function na.locf
es_1[, final.signal_long := na.locf(z_l)] # long trades
es_1[, final.signal_short := na.locf(z_s)] # short trades
# lag the signal by one bar so the position is entered on the next bar (avoids look-ahead)
es_1$final.signal_long <- lag(es_1$final.signal_long,1)
es_1$final.signal_short <- lag(es_1$final.signal_short,1)
# Close to close returns
es_1[, one_min_rets := (Close / shift(Close)) - 1]
# Calculate strategy returns
es_1[, sig_long_rets := final.signal_long * one_min_rets]
es_1[, sig_short_rets := final.signal_short * one_min_rets]
es_1$sig_long_rets[1] <- 0
es_1$sig_short_rets[1] <- 0
# Combine equity curves
es_1[, combined_rets := sig_long_rets + sig_short_rets]
# Cumulative returns
es_1[, cum_rets_l := cumprod(1 + sig_long_rets) - 1]
es_1[, cum_rets_s := cumprod(1 + sig_short_rets) - 1]
es_1[, cum_rets_comb := cumprod(1 + combined_rets) - 1]
plot(es_1$Date_new,es_1$cum_rets_l,type="l")
graphics.off()
# Plot equity curves
line_plot_df <- data.frame(es_1$cum_rets_l,es_1$cum_rets_s,es_1$cum_rets_comb)
line.plot.df <- melt(line_plot_df)
date <- rep(es_1$Date,3)
line.plot.df <- cbind(line.plot.df,date)
head(line.plot.df)
ggplot(data=line.plot.df, aes(x=date, y=value, group = variable, colour = variable)) +
  geom_line()

With the final equity curves:

Rplot307

There was no optimization of the cut-off time or the number of new highs / lows; I chose each arbitrarily. A rough sketch of how a simple threshold sweep might look follows.
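
As an illustration only (not part of the original test), a sweep of the new high threshold could look like the sketch below. It assumes the es_1 data.table built above (regular session bars starting 08:31, with Time, Close and count_max columns) and measures, per day, the move from the last price at or before 10:00 to the day's close:

# Hypothetical threshold sweep - assumes es_1 from the code above
require(data.table)
daily <- es_1[, .(new_highs = max(count_max),
                  px_10am   = last(Close[Time <= "10:00"]),
                  px_close  = last(Close)), by = Date]
daily[, fwd_ret := px_close / px_10am - 1]
# try a range of thresholds and record sample size and average forward return
thresholds <- 4:12
sweep_out <- rbindlist(lapply(thresholds, function(k)
  daily[new_highs >= k, .(threshold = k, n_days = .N, mean_fwd_ret = mean(fwd_ret))]))
print(sweep_out)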

To conclude, predicting trend days both ahead of time and within the day itself is an interesting idea, and this very simple example may be extended.

Full code (es.trend.days.R) can be found on my GitHub:

# Es Up / Down Trend Isolation
# 2.26.2018
# Andrew Bannerman
###############################################################################################
# Steps
# 1. Define a trend day when the close price was within 25% of the high / low
# 2. Save all dates
# 3. Subset 5 minutes bars, save each day into a list
# 4. Output price plots
# 5. Study characteristics
# 6. Add moving averages, TICK etc.. anything else to help cast a +2D view on the trend data
###############################################################################################
# Packages
require(lubridate)
require(dplyr)
require(ggplot2)
require(xts)
require(quantmod)
require(data.table)
require(zoo)
require(reshape2)
# Load Daily ES Data
es_d <- read.csv("C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/ES Data/1.day.reg.sess.es.txt",header=TRUE,stringsAsFactors = FALSE)
es_d$Date <- mdy(es_d$Date) # Change to date format
# Find relative position of the closing price (Close - Low) / (High - Low)
es_d$pos <- (es_d$Close - es_d$Low) / (es_d$High - es_d$Low)
# Calculate returns
es_d$rets <- ((es_d$Close / lag(es_d$Close,1)) - 1) * 100
# Find gap up / down days
es_d$gap_up <- ifelse(es_d$Open > lag(es_d$High,1),1,0)
es_d$gap_dn <- ifelse(es_d$Open < lag(es_d$Low,1),1,0)
# Find trend up / down days
es_d$trend_up <- ifelse(es_d$pos >= .75,1,0)
es_d$trend_dn <- ifelse(es_d$pos <= .25,1,0)
# Subset all trend up / down days
# Trend day definition: close within 25 % of day high / low and close over .75% to exclude quiet days
# Exclude gap up / down days where open closes over high , low
trend_up <- es_d[es_d$trend_up == 1 & es_d$rets >= .75 & es_d$gap_up == 0, ]
trend_dn <- es_d[es_d$trend_dn == 1 & es_d$rets <= -.75 & es_d$gap_dn == 0, ]
# Count total trend up / down days
total_up <- nrow(trend_up)
total_dn <- nrow(trend_dn)
# Save Dates in order to use for susetting 5 min bars
trend_up_dates <- trend_up$Date
trend_dn_dates <- trend_dn$Date
###############################################################################
# Load 1 minute bars
###############################################################################
es_1 <- read.csv("C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/ES Data/1.min.reg.sess.es.txt",header=TRUE,stringsAsFactors = FALSE)
es_1$Date_new <- paste(es_1$Date,es_1$Time)
es_1$Date_new <- mdy_hm(es_1$Date_new) # Change date to date format
es_1$Date <- mdy(es_1$Date)
# Save up trend plots
# initialize list
t_UP <- list()
i=1
for (i in 1:length(trend_up_dates)) {
tryCatch({
ptm0 <- proc.time()
temp <- subset(es_1 , Date == trend_up_dates[i]) # Conditionl subset == grab trend day on intraday level
temp$range <- temp$High - temp$Low
t_UP[[i]] <- temp
name_date <- as.numeric(gsub("-", "", head(temp$Date,1)))
temp <- temp[3:10]
head(temp)
colnames(temp)[7] <- "Date"
df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
head(df)
open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
all_ts <- cbind(open,high,low,close)
names <- c("Open","High","Low","Close")
colnames(all_ts) <- names
# Save up trend plots
dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_up_plots"
mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
png(file=mypath,width= 1400, height=1000)
mytitle = paste(name_date[1])
chartSeries(all_ts, type = "bars", ,bar.type = "ohlc",theme = chartTheme("white"),name=paste("es_mini — Trend Up Day"))
dev.off()
ptm1=proc.time() - ptm0
time=as.numeric(ptm1[3])
cat('\n','Iteration',i,'took', time, "seconds to complete")
}, error = function(e) { print(paste("i =", i, "failed:")) })
}
# General Trend Up Stats
trend_up_out <- do.call(rbind,t_UP)
head(trend_up_out)
t_up_mean_r <- mean(trend_up_out$range)
# Save intraday down trend plots
# initialize list
t_DN <- list()
i=1
for (i in 1:length(trend_dn_dates)) {
tryCatch({
ptm0 <- proc.time()
Sys.sleep(0.1)
temp <- subset(es_1 , Date == trend_dn_dates[i])
temp$range <- temp$High - temp$Low
t_DN[[i]] <- temp
name_date <- as.numeric(gsub("-", "", head(temp$Date,1))) # replace "-" with ""
temp$chg <- ifelse(temp$Close > temp$Open, "up", "dn")
temp <- temp[3:10]
colnames(temp)[7] <- "Date"
df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
all_ts <- cbind(open,high,low,close)
names <- c("Open","High","Low","Close")
colnames(all_ts) <- names
# Save down trend plots
dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_down_plots"
mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
png(file=mypath,width= 1400, height=1000)
mytitle = paste(name_date[1])
chartSeries(all_ts, type = "bars", ,bar.type = "ohlc",theme = chartTheme("white"),name=paste("es_mini — Trend Down Day"))
dev.off()
ptm1=proc.time() - ptm0
time=as.numeric(ptm1[3])
cat('\n','Iteration',i,'took', time, "seconds to complete")
}, error = function(e) { print(paste("i =", i, "failed:")) })
}
# General Trend Down Stats
trend_dn_out <- do.call(rbind,t_DN)
t_dn_mean_r <- mean(trend_dn_out$range)
##############################################################################
# Extract 5 days prior to trend up / down day
##############################################################################
dates_df <- data.frame(Date = es_d$Date)
dates_df$wkdays <- weekdays(as.Date(dates_df$Date)) # extract day of week
# Grab trend up start dates
es_d_prior_m <- ifelse(dates_df$wkdays == "Monday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_tue <- ifelse(dates_df$wkdays == "Tuesday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_w <- ifelse(dates_df$wkdays == "Wednesday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_th <- ifelse(dates_df$wkdays == "Thursday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_f <- ifelse(dates_df$wkdays == "Friday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 5),NA)
# Remove NA
es_d_prior_m <- es_d_prior_m[!is.na(es_d_prior_m)]
es_d_prior_tue <- es_d_prior_tue[!is.na(es_d_prior_tue)]
es_d_prior_w <- es_d_prior_w[!is.na(es_d_prior_w)]
es_d_prior_th <- es_d_prior_th[!is.na(es_d_prior_th)]
es_d_prior_f <- es_d_prior_f[!is.na(es_d_prior_f)]
t_up_all_prior <- c(es_d_prior_m,es_d_prior_tue,es_d_prior_w,es_d_prior_th,es_d_prior_f)
# sort dates
t_up_all_prior <- sort(t_up_all_prior)
t_up_all_prior <- ymd(t_up_all_prior)
# up trend subsetting
up_list <- list()
for (i in 1:length(trend_up_dates)) {
up_list[[i]] <- es_d[es_d$Date >= t_up_all_prior[i] & es_d$Date <= trend_up_dates[i],]
}
# Plot 5 days prior to uptrend
t_UP_Prior <- list()
i=1
for (i in 1:length(up_list)) {
tryCatch({
ptm0 <- proc.time()
temp <- up_list[[i]] # Conditionl subset == grab trend day on intraday level
name_date <- as.numeric(gsub("-", "", head(temp$Date,1)))
df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
all_ts <- cbind(open,high,low,close)
names <- c("Open","High","Low","Close")
colnames(all_ts) <- names
# Save up trend plots
dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_up_prior"
mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
png(file=mypath,width= 1400, height=1000)
mytitle = paste(name_date[1])
chartSeries(all_ts, type = "bars", ,bar.type = "ohlc",theme = chartTheme("white"),name=paste("es_mini — Trend Up Day"))
dev.off()
ptm1=proc.time() - ptm0
time=as.numeric(ptm1[3])
cat('\n','Iteration',i,'took', time, "seconds to complete")
}, error = function(e) { print(paste("i =", i, "failed:")) })
}
# Grab trend down start dates
es_d_prior_m <- ifelse(dates_df$wkdays == "Monday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_tue <- ifelse(dates_df$wkdays == "Tuesday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_w <- ifelse(dates_df$wkdays == "Wednesday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_th <- ifelse(dates_df$wkdays == "Thursday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_f <- ifelse(dates_df$wkdays == "Friday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 5),NA)
# Remove NA
es_d_prior_m <- es_d_prior_m[!is.na(es_d_prior_m)]
es_d_prior_tue <- es_d_prior_tue[!is.na(es_d_prior_tue)]
es_d_prior_w <- es_d_prior_w[!is.na(es_d_prior_w)]
es_d_prior_th <- es_d_prior_th[!is.na(es_d_prior_th)]
es_d_prior_f <- es_d_prior_f[!is.na(es_d_prior_f)]
t_up_all_prior <- c(es_d_prior_m,es_d_prior_tue,es_d_prior_w,es_d_prior_th,es_d_prior_f)
# sort dates
t_up_all_prior <- sort(t_up_all_prior)
t_up_all_prior <- ymd(t_up_all_prior)
# down trend subsetting
dn_list <- list()
for (i in 1:length(trend_dn_dates)) {
dn_list[[i]] <- es_d[es_d$Date >= t_up_all_prior[i] & es_d$Date <= trend_dn_dates[i],]
}
# Plot 5 days prior to down trend
i=1
for (i in 1:length(dn_list)) {
tryCatch({
ptm0 <- proc.time()
temp <- dn_list[[i]] # Conditionl subset == grab trend day on intraday level
name_date <- as.numeric(gsub("-", "", head(temp$Date,1)))
df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
all_ts <- cbind(open,high,low,close)
names <- c("Open","High","Low","Close")
colnames(all_ts) <- names
# Save up trend plots
dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_dn_prior"
mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
png(file=mypath,width= 1400, height=1000)
mytitle = paste(name_date[1])
chartSeries(all_ts, type = "bars", ,bar.type = "ohlc",theme = chartTheme("white"),name=paste("es_mini -- Trend Down Day"))
dev.off()
ptm1=proc.time() - ptm0
time=as.numeric(ptm1[3])
cat('\n','Iteration',i,'took', time, "seconds to complete")
}, error = function(e) { print(paste("i =", i, "failed:")) })
}
str(es_1$Time)
##########################################################################################
# Intraday Trend up / down
##########################################################################################
es_1 <- read.csv("C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/ES Data/1.min.reg.sess.es.txt",header=TRUE,stringsAsFactors = FALSE)
es_1$Date_new <- paste(es_1$Date,es_1$Time)
es_1$Date_new <- mdy_hm(es_1$Date_new) # Change date to date format
es_1$Date <- mdy(es_1$Date)
# make data.table
# data.table for speed
es_1 <- data.table(es_1)
# rolling day high
es_1[, roll_max := cummax(High), by = Date] # rolling day high
es_1[, close_over_high := ifelse(Close > shift(roll_max), 1,0),by = Date] # place 1 when close makes new day high
# rolling day low
es_1[, roll_min := cummin(Low), by = Date] # rolling day high
es_1[, close_below_low := ifelse(Close < shift(roll_min), 1,0),by = Date] # place 1 when close makes new day low
for(j in seq_along(es_1)){
set(es_1, i = which(is.na(es_1[[j]]) & is.numeric(es_1[[j]])), j = j, value = 0)
}
# count number of new 1 minute highs before 10am central
es_1[, before_10_UCT_up := ifelse(Time >= "08:31" & Time <= "10:00", cumsum(close_over_high),0),by = Date]
# count number of new 1 minute lows before 10am central
es_1[, before_10_UCT_dn := ifelse(Time >= "08:31" & Time <= "10:00", cumsum(close_below_low),0),by = Date]
for(j in seq_along(es_1)){
set(es_1, i = which(is.na(es_1[[j]]) & is.numeric(es_1[[j]])), j = j, value = 0)
}
# find maximum new highs / low per day
es_1[, count_max := max(before_10_UCT_up), by = Date]
es_1[, count_min := max(before_10_UCT_dn), by = Date]
# Plot distribution of number of new highs / lows prior to 10am
options(scipen=999)
plot.df <- subset(es_1, count_max > 0 | count_min > 0)
ggplot(data=plot.df, aes(plot.df$count_max)) +
geom_histogram(binwidth = .5,fill="darkgreen",alpha = 0.5,col="grey") +
scale_x_continuous(breaks = seq(plot.df$count_max), max(plot.df$count_max),name="No. New Highs Prior 10am Central")+
scale_y_continuous(breaks = seq(0, 600000,100000))+
ggtitle("Number of new 1 minute highs prior to 10am Central")+
ylab("Total New Highs")
# Plot new lows
ggplot(data=plot.df, aes(plot.df$count_min)) +
geom_histogram(binwidth = .5,fill="red",alpha = 0.5,col="grey") +
scale_x_continuous(breaks = seq(plot.df$count_min), max(plot.df$count_min),name="No. New Lows Prior 10am Central")+
#scale_y_continuous(breaks = seq(0, 35500,2000))+
ggtitle("Number of new 1 minute lows prior to 10am Central")+
ylab("Total New Low")
graphics.off()
# Back test
# Trading Signal long and short
es_1[, sig_buy := ifelse(before_10_UCT_up >= 8, 1,0)] # long signal
es_1[, sig_exit := Time < "15:15"] # exit time
es_1[, sig_short := ifelse(before_10_UCT_dn >= 8, -1,0)] # short signal
es_1[, sig_s_exit := Time < "15:15"] # exit time
#es_1[is.na(es_1)] <- 0
es_1[, sig_end_l := ifelse(sig_exit == FALSE,0,NA)] # mark end of trade retain NA
es_1[, sig_end_s := ifelse(sig_s_exit == FALSE,0,NA)] # mark end of trade retain NA
# Combine NA + signal for long trades
es_1$z_l <- ifelse(es_1$sig_buy == 1,
rowSums(es_1[, c("sig_buy", "sig_end_l")], na.rm=TRUE),
rowSums(es_1[, c("sig_buy", "sig_end_l")]))
es_1$z_l[1] <- 0
# Combine NA + signal for short trades
es_1$z_s <- ifelse(es_1$sig_short == -1,
rowSums(es_1[, c("sig_short", "sig_end_s")], na.rm=TRUE),
rowSums(es_1[, c("sig_short", "sig_end_s")]))
es_1$z_s[1] <- 0
# Forward fill 1's to end of the trade using package zoo, function na.locf
es_1[, final.signal_long := na.locf(z_l)] # long trades
es_1[, final.signal_short := na.locf(z_s)] # long trades
# lag signal by one forward day to signal entry next day (Trade at market open)
es_1$final.signal_long <- lag(es_1$final.signal_long,1) # Note k=1 implies a move *forward*
es_1$final.signal_short <- lag(es_1$final.signal_short,1) # Note k=1 implies a move *forward*
# Close to close returns
es_1[, one_min_rets := (Close / shift(Close)) - 1]
# Calculate strategy returns
es_1[, sig_long_rets := final.signal_long * one_min_rets]
es_1[, sig_short_rets := final.signal_short * one_min_rets]
es_1$sig_long_rets[1] <- 0
es_1$sig_short_rets[1] <- 0
# Combine equity curves
es_1[, combined_rets := sig_long_rets + sig_short_rets]
# Cumulative returns
es_1[, cum_rets_l := cumprod(1 + sig_long_rets) - 1]
es_1[, cum_rets_s := cumprod(1 + sig_short_rets) - 1]
es_1[, cum_rets_comb := cumprod(1 + combined_rets) - 1]
plot(es_1$Date_new,es_1$cum_rets_l,type="l")
graphics.off()
# Plot equity curves
line_plot_df <- data.frame(es_1$cum_rets_l,es_1$cum_rets_s,es_1$cum_rets_comb)
line.plot.df <- melt(line_plot_df)
date <- rep(es_1$Date,3)
line.plot.df <- cbind(line.plot.df,date)
head(line.plot.df)
ggplot(data=line.plot.df, aes(x=date, y=value, group = variable, colour = variable)) +
geom_line()


Thanks for reading!

Stress Testing an Intraday Strategy Through Monte Carlo Methods

This is an intraday ES strategy I was testing for a client. The client was not interested in it due to the low frequency of trades, so I am able to post it for others to view. It shows how a strategy was stress tested and how optimal conditions for applying it were sought. The strategy development steps are below:

Strategy Development Example – Andrew Bannerman – 1.28.2018
Bannerman1985@gmail.com

Introduction:
Follow a scientific process in which a strategy is developed and stress tested before live incubation /
trading.

Tools used: R Statistical Programming Language

Strategy Development Procedure:
1. General view, general statistics, tail risk
2. Initial Back Testing
3. Walk Forward Analysis (cross validation), rolling and fixed
4. Parameter Sensitivity Analysis (random adjustments of 0 to 50% per parameter, n times)
5. Draw down expectation (sample the net trade results without replacement, n times; see the sketch after this list)
6. Re-sample the original time series (Maximum Entropy Bootstrapping, n times) and run the strategy over the new series
7. Market Random Test (Test if strategy beats buying randomly)
8. Noise Test (Adding random fixed % to open, high, low, close, simulate back test n times)
9. Strategy Seasonality
10. Layering non-correlated strategies
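
For example, step 5 (draw down expectation) can be sketched as below; trade_pnl is a hypothetical numeric vector of net per-trade results from the back test, and the number of simulations is arbitrary:

# Hypothetical sketch of step 5 - trade_pnl is assumed to hold net per-trade results
set.seed(42)
n_sims <- 1000
max_drawdown <- function(pnl) {
  equity <- cumsum(pnl)
  max(cummax(equity) - equity)  # largest peak-to-trough give-back, as a positive number
}
# shuffle the trade order (sample without replacement) n times and record each max draw down
dd_dist <- replicate(n_sims, max_drawdown(sample(trade_pnl, replace = FALSE)))
quantile(dd_dist, probs = c(0.50, 0.95, 0.99))  # typical and tail draw down expectations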

Please see attached .pdf.

Download – Intraday Trading Strategy Development Using R – pdf

Let me know if you have any questions!