Thursday, 6 September 2018

SQL Server Paging - The Holy Grail

DECLARE @startRow INT
SET @startRow = 50
;WITH cols
AS
(
    SELECT table_name, column_name,
        ROW_NUMBER() OVER(ORDER BY table_name, column_name) AS seq,
        ROW_NUMBER() OVER(ORDER BY table_name DESC, column_name desc) AS totrows
    FROM [INFORMATION_SCHEMA].columns
)
SELECT table_name, column_name, totrows + seq -1 as TotRows
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq

More info on this at SqlServerCentral

Interpreting the query plan and speeding it up

  • Bookmark Lookup - An index was used, but as the index does not 'cover' the query, the query processor must use the row locator from the index to look up the actual data row. Unless the index can be altered to 'cover' the columns of the query, this cannot be optimised further.
  • Clustered Index Scan - Similar to a table scan (it's just that the table has a clustered index) and similarly undesirable, particularly for large tables. If you see these, then the columns in the query are either not indexed or the distribution statistics have led the query optimiser to decide the scan is as efficient or more so. Restructuring the query, or adding an index, can sometimes eliminate these operations.
  • Clustered Index Seek - The clustered index is used to find matching rows. This is optimal behaviour.
  • Compute Scalar - As it says - a scalar is being computed. If it is genuinely needed, it can't be optimised; if it's not needed, remove it!
  • Constant Scan - The query requires a constant value in some (or all) rows.
  • Hash Match - The query processor builds a hash table from the input rows. As subsequent rows are processed, their hash values are computed and compared to the hash table for matches. Queries with DISTINCT, UNION, or aggregates often require a hash table to remove duplicates. If such operations are required there is little you can do to optimise.
  • Index Scan - The non-clustered index is being used to locate a large number of data rows which must then be scanned. Unless you can restructure the query to return fewer rows this cannot be optimised.
  • Index Seek - A non-clustered index is being used and only a small part of the index is required. You've chosen your indexes well as this is one of the most efficient operations.
  • Index Spool - As rows are scanned tempdb is used to store a 'spool' table that can be used rather than re-reading input rows. This is an internal optimization for complex queries and cannot be changed.
  • Merge Join - Occurs when the two inputs contain sorted data and the query processor can merge them together, or when two or more indexes are used to query a table. Very efficient but to occur the joins must have access to sorted data – achieved by indexing on join or ORDER BY columns.
  • Nested Loop - In this join, one table is chosen as the inner and scanned for each row of the outer. This is only efficient for small numbers of rows. Restructuring the query can remove these joins. Note that if available memory is low these are more likely to occur.
  • Remote Query - As it says – a query occurring on a remote data source so you either need to optimise on the remote data source or move the necessary data to your local SQLServer.
  • Sort - Expensive, but often necessary.
  • Table Scan - Performed when the table has no clustered index (it's a heap). This is probably the least desirable operation to see, as the query processor must read each row into memory in order to decide whether it should be returned by the query. For tables of more than a few hundred rows you should add an appropriate index.

Some of the gains in performance come from writing well-structured, efficient queries that return the minimum amount of information needed, but most gains are made by choosing good indexes. Hence the basic recommendation is to ensure that there are indexes on all tables and that the statistics for those indexes are up to date. Not coincidentally, these are the two main factors that influence the query optimiser when it chooses an execution plan. At this stage I would recommend you run a few multi-table queries from the Northwind database, or your own, in Query Analyzer and assess whether the execution plans generated are optimal based on the above information.

Optimising queries

As just stated, indexes and rewriting queries are the most common options for optimisation, but you can also use query hints, though care should be taken to ensure you're not forcing the server to choose a sub-optimal execution plan. Generally query hints are not recommended - SQLServer knows best!

Considering indexes: for SQLServer to make use of an index, the first indexed column must be included in the WHERE clause as part of a qualification, or the index will not be considered. However, not all of the columns in a WHERE clause need to be included in the index for it to be chosen. There can only be one clustered index per table and, as a result, it should be chosen carefully. By default the primary key is the clustered index, but often this is not the best choice unless it is the only index that will exist on the table. Clustered indexes are best used for range queries, e.g. dates or numerical ranges. Non-clustered indexes work best on large tables when very few rows are being returned. The query optimiser will often choose a highly selective non-clustered index when the index columns are included in the join statements; when this occurs SQLServer can find the row needed in the index very quickly and get to the actual data quickly as well. If the index is not selective the process is much less efficient and the index may not be chosen. An exception is if the non-clustered index is a covering index. Foreign keys are usually good choices for non-clustered indexes as these columns are usually used in joins.

Regardless of the type of index, there are a few general guidelines for creating indexes:
    1. Index those columns often used by joins
    2. Be sure that the first column of the index is the column specified in most joins.
    3. Analyze your queries and ensure your indexes are being used; if not, they are an unnecessary overhead and/or could be better placed
    4. Consider filegroups: placing non-clustered indexes in a different filegroup, on a separate physical device from the data itself, can lead to performance gains

How to compare one query against another to see which is best

Highlight both queries, and press CTRL-L to view the query plans of both.

The two percentages will total 100%. In this case, the first query at 98% sucks, and the 2% one wins by a mile. The lower the % cost, the better.

 

What is the true cost of a query?

Click on the FIRST box of the query plan, and hover your mouse over it. A pop up window will appear. The cost of the query is the "Estimated Subtree Cost".

If the Estimated Subtree Cost is < 20, it's OK. If it's > 20, you really need to change the query, add an index, etc.

 

My query is slow, how to easily speed it up

I used to get asked this a lot, and it usually boils down to either an "OR" clause or a non-sargable query.

 

Get rid of 'OR' clauses

If your query contains an OR in the WHERE clause, split it into two queries: one using the left side of the OR clause, and the other the right side. Then combine the results with a UNION ALL. Don't use UNION, as it is slower due to having to remove duplicates between the two result sets.

Example:

Slow query
SELECT  *
FROM    FieldPerson AS FP
        INNER JOIN FieldCompany AS FC
            ON FC.id = FP.field_company_id
WHERE   FP.forename = 'Simon'
        OR FC.name LIKE 'acme%';

Faster query
SELECT  *
FROM    FieldPerson AS FP
        INNER JOIN FieldCompany AS FC
            ON FC.id = FP.field_company_id
WHERE   FP.forename = 'Simon'
UNION ALL
SELECT  *
FROM    FieldPerson AS FP
        INNER JOIN FieldCompany AS FC
            ON FC.id = FP.field_company_id
WHERE   FC.name LIKE 'acme%';

Even in a simple case like the two above, the query plan comparison is 55% vs 45%.

 

Non-Sargable queries

The most common thing that will make a query non-sargable is wrapping a column in a function in the WHERE clause:

SELECT ... FROM ... WHERE Year(myDate) = 2008

The SQL optimizer can't use an index on myDate, even if one exists. It will literally have to evaluate this function for every row of the table. Much better to use:
 
WHERE myDate >= '01-01-2008' AND myDate < '01-01-2009'

Some other examples:
 
Bad:  Select ... WHERE ISNULL(FullName,'Simon Hughes') = 'Simon Hughes'
Fixed: Select ... WHERE ((FullName = 'Simon Hughes') OR (FullName IS NULL))

Bad:  Select ... WHERE SUBSTRING(Forename, 1, 3) = 'Sim'
Fixed: Select ... WHERE Forename Like 'Sim%'

Bad:  Select ... WHERE DateDiff(mm,OrderDate,GetDate()) >= 30
Fixed: Select ... WHERE OrderDate < DateAdd(mm, -30, GetDate())

Wednesday, 18 July 2018

MSMQ Best Practices

Performance recommendations

  1. Avoid unnecessary database lookups before deciding if a message can be dropped.
  2. Prioritization should be given in code for dropping the message as quickly and efficiently as possible.
  3. If database lookups are required in deciding if a message can be dropped, can these be cached (with a refresh timeout)? See the sketch after this list.
  4. If you are dropping messages, is there a better way to only receive the correct messages?
  5. If you are subscribed to an event processor publisher, does that publisher provide any kind of subscription filtering? Filtering can be achieved via database event subscription tables (a list of subscribers with their input queues, and a list of events each input queue is interested in).
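
As a rough illustration of point 3, the lookup data can be held in an in-memory cache and refreshed on a timeout, rather than hitting the database for every message. This is only a sketch using System.Runtime.Caching; IEventSubscriptionLookup and LoadWantedEventIds are hypothetical stand-ins for whatever data access you already have.

using System;
using System.Collections.Generic;
using System.Runtime.Caching;

// Hypothetical data-access interface; replace with whatever lookup code you already have.
public interface IEventSubscriptionLookup
{
    IEnumerable<int> LoadWantedEventIds();
}

public class CachedEventFilter
{
    private static readonly MemoryCache Cache = MemoryCache.Default;
    private static readonly TimeSpan RefreshTimeout = TimeSpan.FromMinutes(5);

    private readonly IEventSubscriptionLookup lookup;

    public CachedEventFilter(IEventSubscriptionLookup lookup)
    {
        this.lookup = lookup;
    }

    // True if this endpoint cares about the event; the database is hit at most
    // once per RefreshTimeout, so dropping unwanted messages stays cheap.
    public bool IsWanted(int eventId)
    {
        const string cacheKey = "WantedEventIds";

        var wanted = Cache.Get(cacheKey) as HashSet<int>;
        if (wanted == null)
        {
            // Cache miss (or expired): hit the database once, then cache the result.
            wanted = new HashSet<int>(lookup.LoadWantedEventIds());
            Cache.Set(cacheKey, wanted, new CacheItemPolicy
            {
                AbsoluteExpiration = DateTimeOffset.UtcNow.Add(RefreshTimeout)
            });
        }

        return wanted.Contains(eventId);
    }
}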

Separate event types

Publishers should translate event types into separate System.Types, with hierarchies to support grouping.
For example, with some kind of delivery event data, there could be group interfaces such as ProofOfDeliveryGroup, ProofOfCollectionGroup, etc.
Inherited from ProofOfDeliveryGroup would be the specific interface types such as ManualProofOfDelivery, DeliveredByCourierToSite, etc.
This allows subscribers to use NServiceBus idioms for subscribing to only the messages they need, and removes the need for the specialised publication filtering described in step 5 above.
NServiceBus recommends interfaces for events because you can effectively do multiple inheritance, which isn't possible with classes; this allows handling "groups" of events, as well as gentle evolution of events.
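
As a minimal sketch of that hierarchy (only the interface names come from the example above; the namespace and the properties are invented for illustration):

using NServiceBus;

namespace Messages.Events
{
    // Group interface: subscribe to this to receive every proof-of-delivery style event.
    public interface ProofOfDeliveryGroup : IEvent
    {
        int ConsignmentId { get; set; } // illustrative property
    }

    // Specific event types inherit from the group, so a subscriber can choose
    // the whole group or just the individual events it cares about.
    public interface ManualProofOfDelivery : ProofOfDeliveryGroup
    {
        string SignedBy { get; set; } // illustrative property
    }

    public interface DeliveredByCourierToSite : ProofOfDeliveryGroup
    {
        string CourierName { get; set; } // illustrative property
    }
}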

Separate the handlers

Favour multiple event handlers over a small number of more complex handlers.
Each handler should do one thing and be named after that one thing. The order of operation of the event handlers can be specified (see below) and this will start to read like a pseudo specification of what happens when an event of that type arrives.
As with any unit of code, message handlers should aim to follow the Single Responsibility Principle and only have one reason to change, so favour multiple small handlers over fewer large ones. However, this only holds if there is no implicit coupling between the handlers (e.g. through bespoke handler ordering); if there is, look for other ways to accomplish the same thing.
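
For example, rather than one OrderReceivedEventHandler that does everything, the same event can have several single-purpose handlers. This is only a sketch, assuming an NServiceBus version where handlers implement a void Handle method; OrderReceivedEvent and the handler bodies are illustrative placeholders that line up with the ordering example further down.

using NServiceBus;

public class OrderReceivedEvent : IEvent
{
    public int OrderId { get; set; }
}

public class ValidateOrderEventHandler : IHandleMessages<OrderReceivedEvent>
{
    public void Handle(OrderReceivedEvent message)
    {
        // Validates the incoming order, and nothing else.
    }
}

public class SendOrderEmailConfirmationEventHandler : IHandleMessages<OrderReceivedEvent>
{
    public void Handle(OrderReceivedEvent message)
    {
        // Sends the confirmation email, and nothing else.
    }
}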

General recommendations

  1. There should be only one place to subscribe to any given event, though publishers can be scaled out if necessary, as long as each instance shares the same subscription storage database.
  2. To avoid coupling, either:
    1. Publish a new message when a message is handled (successfully or otherwise), so another handler (potentially in a different endpoint) can handle it in a new transaction.
    2. Use NServiceBus Sagas.
  3. Using separate messages and/or sagas allows implementation of the "no business reason to change" philosophy, where all failures are technical failures and attempts can be made to overcome them with automatic retries etc., using separate, chained transactions. This is especially helpful when dealing with resources such as email or the file system that do not participate in the ambient distributed transaction while a message is being handled.
  4. It is possible to perform validation/mutation of messages before any handlers are invoked (message mutators). Again, prefer this over handler ordering. A sketch of such a mutator follows this list.
    1. Mutators are not automatically registered using dependency injection.
    2. Mutators are registered using:
      endpointConfiguration.RegisterMessageMutator(new MyIncomingMessageMutator());

      endpointConfiguration.RegisterMessageMutator(new MyOutgoingTransportMessageMutator());
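
For completeness, here is a minimal sketch of what an incoming mutator such as MyIncomingMessageMutator might look like, assuming an NServiceBus version whose mutators implement IMutateIncomingMessages with a context parameter (as implied by the RegisterMessageMutator call above). SomeMessage and its Reference property are invented for illustration.

using System.Threading.Tasks;
using NServiceBus.MessageMutator;

// Placeholder message type used by the sketch below.
public class SomeMessage
{
    public string Reference { get; set; }
}

public class MyIncomingMessageMutator : IMutateIncomingMessages
{
    public Task MutateIncoming(MutateIncomingMessageContext context)
    {
        // Validate or tweak the message before any handler is invoked.
        var message = context.Message as SomeMessage;
        if (message != null && string.IsNullOrWhiteSpace(message.Reference))
        {
            message.Reference = "UNKNOWN";
        }

        return Task.CompletedTask;
    }
}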

Handler ordering

NSB documentation
Multiple classes may implement IHandleMessages for the same message. In this scenario, all handlers will execute in the same transaction scope. These handlers can be invoked in any order but the order of execution can be specified in code.
The way NServiceBus works is:
  1. Find the list of possible handlers for a message.
  2. If an order has been specified for any of those handlers, move them to the start of the list.
  3. Execute the handlers.
The remaining handlers (i.e. ones not specified in the ordering) are executed in a non-deterministic order.

Specifying one handler to run first

public class SpecifyMessageHandlerOrder : ISpecifyMessageHandlerOrdering
{
    public void SpecifyOrder(Order order)
    {
        order.SpecifyFirst<HandlerB>();
    }
}

Specifying multiple handlers to run in order

public class SpecifyMessageHandlerOrder : ISpecifyMessageHandlerOrdering
{
    public void SpecifyOrder(Order order)
    {
        order.Specify(
            typeof(HandlerB),
            typeof(HandlerA),
            typeof(HandlerC));
    }
}

Example

public class OrderReceivedEventHandlerOrdering : ISpecifyMessageHandlerOrdering
{
    public void SpecifyOrder(Order order)
    {
        order.Specify(
            typeof(ValidateOrderEventHandler),
            typeof(CheckForDuplicateOrderEventHandler),
            typeof(PlaceOrderEventHandler),
            typeof(SendOrderEmailConfirmationEventHandler));
    }
}

With the configuration API

This is typically done within the EndpointConfig class.
configuration.LoadMessageHandlers(
    First<HandlerB>
    .Then<HandlerA>()
    .AndThen<HandlerC>());

Preferred method

Using the interface ISpecifyMessageHandlerOrdering is the preferred method, as these can be placed within the area of concern. This makes it easier to maintain as you don't have to go searching for the ordering of the handlers.

Dropping messages

If you are not going to process all messages, but decide to drop/filter some out, have a separate handler for these and make this the first handler:
public class SpecifyMessageHandlerOrder : ISpecifyMessageHandlerOrdering
{
    public void SpecifyOrder(Order order)
    {
        order.Specify(
            typeof(MessageFiltering), // FilterMessage, IgnoreInapplicableEvents, IgnoreIfNotLatestEvent, etc
            typeof(HandleSomeMessage));
    }
}
You may wish to have several message filters, each with their own criteria: order.Specify(First<FilterOne>.Then<FilterTwo>()); etc. (the generic arguments here are placeholders for your own filter handlers).
The takeaway is to be as efficient as possible in dropping messages that are of no interest.

Event processor filtering

In general, though, it's preferable that publishers simply publish all events that are subscribed to. This leaves the subscribers in charge of which messages they receive, which of those to ignore, and how. Filtering of published messages increases coupling between publisher and subscriber and should only be used as a last resort, when subscribers have not been implemented/deployed in a scalable way.

However, if your events come from CDC (Change data capture) and you want to take advantage of event filtering, you need something similar to the below:

CREATE TABLE EventProcessor
(
 Id INT NOT NULL IDENTITY(1,1),
 [Name] VARCHAR(200) NOT NULL,
 [Description] VARCHAR(512) NULL,
 [EndpointAddress] VARCHAR(512) NULL, -- Name of msmq queue
 [Enabled] BIT NOT NULL,
 CONSTRAINT [PK_EventProcessor] PRIMARY KEY CLUSTERED (Id)
);
CREATE TABLE EventProcessorEventFilter
(
 Id INT NOT NULL IDENTITY(1,1),
 EventProcessorId INT NOT NULL,
 WantedEventId INT NOT NULL,
 CONSTRAINT [PK_EventProcessorEventFilter] PRIMARY KEY CLUSTERED (Id)
);
GO
CREATE UNIQUE INDEX [IX_EventProcessorEventFilter] ON EventProcessorEventFilter (EventProcessorId, WantedEventId); 
 
ALTER TABLE EventProcessorEventFilter ADD CONSTRAINT [FK_EventProcessorEventFilter__EventProcessor] FOREIGN KEY (EventProcessorId) REFERENCES EventProcessor (Id); 
 
ALTER TABLE EventProcessorEventFilter ADD CONSTRAINT [FK_EventProcessorEventFilter__SomeEventEnumTable] FOREIGN KEY (WantedEventId) REFERENCES SomeEventEnumTable (Id);
GO

Tuesday, 22 May 2018

AI Reading comprehension scores

In January 2018 there were a great many blog posts about AI surpassing human reading comprehension scores.
Human comprehension of a piece of text scores 82.304% (Human Performance, Stanford University, Rajpurkar et al. '16).

As of March 19 2019, AI is now hitting scores of 83.877, by QANet (ensemble) Google Brain & CMU.

Plotting the scores on a graph:
 

Extrapolating the scores, we should see an AI hit 100% sometime in 2020.
Spreadsheet I created for the graph above: AI-Comprehension-score.xlsx

Original data source: https://rajpurkar.github.io/SQuAD-explorer/

Thursday, 7 December 2017

Stock prediction using Azure Machine Learning

I like dabbling in the stock market, and wondered if artificial intelligence could help predict when I should buy/sell stock.

Using one of the FANG stocks (Facebook, Amazon, Netflix, Google) I'll choose Netflix for the purposes of this blog post.

Here is what Netflix has done in the last 3 months:
The top graph is Fast Stochastics (12,26)
As you can see, predicting the future is not easy.

So my first step was to not use AI at all, but to write a program to help me decide which values of %K and %D I should use for the Fast Stochastics graph, using the last 5 years of stock history for Netflix.

Historical data

This can be obtained from Yahoo finance.
Change the Time period to 5Y and press Apply. Then click the download data link to obtain your CSV file.



Best %K %D Fast Stochastics values for Netflix

The program I wrote takes into account that buying stock and selling stock incur a fee, and that the buy and sell values are not the same (bid/ask difference).
The fee for buying/selling stock is £10 (Motley Fool), and Netflix shares incur a currency fee of £49. So each Netflix trade would cost me £59.

I will use £15,000 as an example investment. If I had simply bought the stock 5 years ago and sold it 5 years later, I would have £219,342.80.

The program I wrote iterates through %K values of 3 to 100; for each value of %K it iterates %D values of 2 to 100; and for each value of %D it iterates a buy/sell trigger on the difference between %K and %D from 1 to 100. That's 941094 iterations in all. Each buy/sell incurs fees, and each time we sell, the sell price is below the price we could buy back at. A sketch of this search loop is shown after the results below.
So the target to beat is £219,342.80, and the top results are:
k: 4, d: 4, kdDiff 55, profit: £166,469.40 loss: -£8,350.28, gain: £174,819.70, money £180,997.40
k: 52, d: 40, kdDiff 42, profit: £161,173.20 loss: -£3,937.38, gain: £165,110.60, money £175,583.20
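
A rough sketch of that brute-force search is below. The Simulate method stands in for the back-testing logic (computing the fast stochastic lines, applying the trade fee and the bid/ask spread, and tracking cash); it, the PricePoint type, and all the names here are placeholders, not the actual program.

using System;
using System.Collections.Generic;

public static class StochasticParameterSearch
{
    // Placeholder for one daily price record loaded from the Yahoo Finance CSV.
    public class PricePoint
    {
        public DateTime Date { get; set; }
        public decimal High { get; set; }
        public decimal Low { get; set; }
        public decimal Close { get; set; }
    }

    public static void FindBestParameters(IList<PricePoint> history)
    {
        decimal bestMoney = 0m;
        int bestK = 0, bestD = 0, bestKdDiff = 0;

        // Brute-force the parameter ranges described above.
        for (var k = 3; k <= 100; k++)
        {
            for (var d = 2; d <= 100; d++)
            {
                for (var kdDiff = 1; kdDiff <= 100; kdDiff++)
                {
                    // Back-test this combination over the full 5-year history,
                    // starting with £15,000 and paying £59 per trade.
                    var finalMoney = Simulate(history, k, d, kdDiff, startingMoney: 15000m, feePerTrade: 59m);
                    if (finalMoney > bestMoney)
                    {
                        bestMoney = finalMoney;
                        bestK = k;
                        bestD = d;
                        bestKdDiff = kdDiff;
                    }
                }
            }
        }

        Console.WriteLine("k: {0}, d: {1}, kdDiff {2}, money £{3:N2}", bestK, bestD, bestKdDiff, bestMoney);
    }

    private static decimal Simulate(IList<PricePoint> history, int k, int d, int kdDiff,
        decimal startingMoney, decimal feePerTrade)
    {
        // Placeholder: calculate the fast stochastic %K/%D lines for (k, d), buy when
        // %K rises above %D by at least kdDiff, sell when it falls below by at least kdDiff,
        // apply the trade fee and bid/ask spread on every trade, and return the final cash value.
        throw new NotImplementedException();
    }
}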

Here is what a fast stochastic (4,4) looks like:
The difference between the %K (black line) and %D (Red line) must be 55 in order to trigger a buy/sell.

The 2nd best graph looks like:
The difference between the %K (black line) and %D (Red line) must be 42 in order to trigger a buy/sell.

Azure Machine Learning and AI


First off, we need the historical data, along with the correct buy/sell flag depending on what tomorrow brings. This is easy to work out as looking back in time is easy.
Using SQL Server, I imported the historical data into a table. I did this by using SSMS and right-clicking on the database --> Tasks --> Import Data. Select the NFLX.csv file as the source, and "SQL Server Native Client" as the destination. Press Next a few times and Finish. And voila, the data is now in a table called NFLX. I left everything as string data, which is fine for my import.

Create a table with:
CREATE TABLE StockPrediction
(
    id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [name] VARCHAR(50) NOT NULL,
    [date] DATETIME NOT NULL,
    [open] DECIMAL(10,4) NOT NULL,
    [high] DECIMAL(10,4) NOT NULL,
    [low] DECIMAL(10,4) NOT NULL,
    [close] DECIMAL(10,4) NOT NULL,
    adj_close DECIMAL(10,4) NOT NULL,
    volume int NOT NULL,
    buy BIT null
)


Feel free to import any other data you wish into separate tables. Hence the need for the name discriminator.

The insert is as follows:
INSERT INTO StockPrediction (name,date,[open],high,low,[close],adj_close,volume)
    SELECT 'NFLX',[Date],[Open], High, Low, [Close], [Adj Close], Volume FROM nflx
    UNION ALL
    SELECT 'PPH',[Date],[Open], High, Low, [Close], [Adj Close], Volume FROM PPH


Next, we need to correctly set the buy column by looking at the next day's close. That should tell us whether we should buy or sell. This is done via:
UPDATE  StockPrediction
SET     buy = CASE WHEN (future.[close] > StockPrediction.[close]) THEN 1
                   ELSE 0
              END
FROM    StockPrediction
        JOIN StockPrediction future
            ON future.id = StockPrediction.id + 1;
UPDATE  StockPrediction
SET     buy = 0
WHERE   buy IS NULL;


OK, so now we have historical data, and we have a buy/sell flag that I'd like the AI to try to learn to forecast.

Let's export this data into a CSV file.
Run this:
SELECT * FROM StockPrediction


Right click on the results and select "Save results as". I chose StockPrediction.csv on the desktop.
Edit the file and add in the header row: id,name,date,open,high,low,close,adj_close,volume,buy

Fire up your Azure portal and create yourself a Machine learning studio. Once that has been created, head over to Microsoft Azure Machine Learning Studio.

Import your dataset by clicking on "Datasets", and clicking +New icon.

I won't bore you with all the steps, and jump straight to it. Create an experiment, and drop in your StockPrediction.csv file.

Because I have several sets of stock data in the CSV file and only want the Netflix data, in the "Apply SQL Transformation" module I have:
select * from t1
where name='NFLX'
order by id;


For Split Data I have 0.95, meaning I want it to predict the final 5% of the data, which is about 2 months' worth. This is usually set to 0.7 for most models, so the last 30% is predicted. Predicting 2 months in advance is plenty for me, so I'm going with 0.95.

For Train Models, select the column 'buy'. This is what we want to predict.

I am using 2 AI engines, competing against each other, to see which is best for this particular problem. At the bottom, click the Run button. Once complete, right-click on "Evaluate Model" and select Evaluation results --> Visualise.
ROC
The results are better than guessing as they are over the midway line. The best graph to see is one that goes straight up the left hand side to the top, then across the top to the top right.

Precision/Recall
As you can see, it's about 50% accurate.

Trying with the "Two-Class Neural Network", and a few tweaked values for the AI settings, I get the following results:

Summary

Given the 5-year history of Netflix stock data, day-by-day prices alone are not good enough for the AI to predict with accurately. However, it's not bad. More fine-grained data (per minute) and many more columns are required to train it better. The extra columns could be how many times Netflix appears in tweets that day (and the sentiment of those tweets), whether directors bought/sold shares and, if so, how many, etc. The more, the better. It doesn't matter if YOU think a column is not relevant, as the AI training will work out what data works best. Always remove the human bias, as the Google chess/Go-playing AI showed when it beat the human-biased AI every time.
It was a good experiment, and one I will keep playing with. It made me realise that more data, perhaps even extraneous data, is also required to help train the AIs to give better predictions.

Thursday, 15 January 2015

Calling SQL Server stored procedures from Entity Framework

Using the methods below, you can obtain the stored procedure return value, along with your data, which I've not seen in other blogs.
A stored proc should return 0 for success, and any other value for a failure.

Here is an example table, filled with data:

CREATE TABLE DemoData
(
    id INT NOT NULL PRIMARY KEY,
    someValue DECIMAL(4,3) NOT NULL
)
GO
INSERT INTO DemoData(id, someValue)
VALUES (1, 1.23), (2, 2.34), (3, 3.45), (4, 4.56)

Here are our example stored procedures:
CREATE PROCEDURE GetDemoData(@maxId INT)
AS
BEGIN
    SET NOCOUNT ON;
    SELECT id, someValue FROM DemoData WHERE id <= @maxId
END
GO

CREATE PROCEDURE AddTwoValues(@a INT, @b INT)
AS
BEGIN
    SET NOCOUNT ON;
    RETURN @a + @b -- Don't do this. Stored procs should return
                   -- 0 for success, and any other value for failure.
END
GO

CREATE PROCEDURE AddTwoValuesWithResult(@a INT, @b INT, @result INT OUTPUT, @result2 INT OUTPUT)
AS
BEGIN
    SET NOCOUNT ON;
    SET @result = @a + @b
    SET @result2 = @b - @a
END
GO

CREATE PROCEDURE ConvertToString(@someValue INT, @someString VARCHAR(20) OUTPUT)
AS
BEGIN
    SET NOCOUNT ON;
    SET @someString = '*' + CAST(@someValue AS VARCHAR(20)) + '*'
END
GO


Here is the C# code to call the above:
public class DemoData
{
    public Int32 Id { get; set; }
    public Decimal? SomeValue { get; set; }
}

public class MyDbContext : DbContext
{
    static MyDbContext()
    {
        Database.SetInitializer<MyDbContext>(null);
    }

    public MyDbContext()
        : base("Name=MyDbContext")
    {
    }

    public MyDbContext(string connectionString)
        : base(connectionString)
    {
    }

    public MyDbContext(string connectionString, 
        System.Data.Entity.Infrastructure.DbCompiledModel model)
        : base(connectionString, model)
    {
    }

    // Stored Procedures
    public int AddTwoValues(int a, int b)
    {
        var procResult = new SqlParameter
        {
            ParameterName = "@procResult", 
            SqlDbType = SqlDbType.Int, 
            Direction = ParameterDirection.Output
        };

        Database.ExecuteSqlCommand(
            "exec @procResult = AddTwoValues @a, @b", 
            new object[]
        {
            new SqlParameter
            {
                ParameterName = "@a", 
                Value = a, 
                SqlDbType = SqlDbType.Int, 
                Direction = ParameterDirection.Input
            }, 
            new SqlParameter
            {
                ParameterName = "@b", 
                Value = b, 
                SqlDbType = SqlDbType.Int, 
                Direction = ParameterDirection.Input
            }, 
            procResult
        });

        return (int)procResult.Value;
    }

    public int AddTwoValuesWithResult(int a, int b, out int result, out int result2)
    {
        var resultParam = new SqlParameter
        {
            ParameterName = "@result", 
            SqlDbType = SqlDbType.Int, 
            Direction = ParameterDirection.Output
        };
        var result2Param = new SqlParameter 
        { 
            ParameterName = "@result2", 
            SqlDbType = SqlDbType.Int, 
            Direction = ParameterDirection.Output };
        var procResult = new SqlParameter
        {
            ParameterName = "@procResult", 
            SqlDbType = SqlDbType.Int, 
            Direction = ParameterDirection.Output
        };

        Database.ExecuteSqlCommand(
            "exec @procResult = AddTwoValuesWithResult @a, @b, @result OUTPUT, @result2 OUTPUT", 
            new object[]
        {
            new SqlParameter
            {
                ParameterName = "@a", 
                Value = a, 
                SqlDbType = SqlDbType.Int, 
                Direction = ParameterDirection.Input
            }, 
            new SqlParameter
            {
                ParameterName = "@b", 
                Value = b, 
                SqlDbType = SqlDbType.Int, 
                Direction = ParameterDirection.Input
            }, 
            resultParam, 
            result2Param, 
            procResult
        });

        result = (int)resultParam.Value;
        result2 = (int)result2Param.Value;

        return (int)procResult.Value;
    }

    public int ConvertToString(int someValue, out string someString)
    {
        var someStringParam = new SqlParameter
        {
            ParameterName = "@someString", 
            SqlDbType = SqlDbType.VarChar, 
            Size = 20, 
            Direction = ParameterDirection.Output
        };
        var procResult = new SqlParameter 
        { 
            ParameterName = "@procResult", 
            SqlDbType = SqlDbType.Int, 
            Direction = ParameterDirection.Output 
        };

        Database.ExecuteSqlCommand(
            "exec @procResult = ConvertToString @someValue, @someString OUTPUT", 
            new object[]
        {
            new SqlParameter
            {
                ParameterName = "@someValue", 
                Value = someValue, 
                SqlDbType = SqlDbType.Int, 
                Direction = ParameterDirection.Input
            }, 
            someStringParam,
            procResult
        });

        someString = (string)someStringParam.Value;
            
        return (int)procResult.Value;
    }


    public List<DemoData> GetDemoData(int maxId, out int procResult)
    {
        var procResultParam = new SqlParameter
        {
            ParameterName = "@procResult", 
            SqlDbType = SqlDbType.Int, 
            Direction = ParameterDirection.Output
        };
            
        var sqlQuery = Database.SqlQuery<DemoData>(
            "exec @procResult = GetDemoData @maxId", 
            new object[]
        {
            new SqlParameter
            {
                ParameterName = "@maxId", 
                Value = maxId, 
                SqlDbType = SqlDbType.Int, 
                Direction = ParameterDirection.Input
            }, 
            procResultParam
        }).ToList();

        procResult = (int) procResultParam.Value;

        return sqlQuery;
    }
}

To call the above and return the results:
using (var db = new MyDbContext())
{
    Console.WriteLine("AddTwoValuesWithResult");
    int result, result2;
    int procResult = db.AddTwoValuesWithResult(3, 7, out result, out result2);
    Console.WriteLine("result = " + result);
    Console.WriteLine("result2 = " + result2);
    Console.WriteLine("procResult = " + procResult);
    Console.WriteLine("");

    Console.WriteLine("ConvertToString");
    string someString;
    procResult = db.ConvertToString(56, out someString);
    Console.WriteLine("someString = " + someString);
    Console.WriteLine("procResult = " + procResult);
        Console.WriteLine("");

    Console.WriteLine("AddTwoValues");
    procResult = db.AddTwoValues(5, 10);
    Console.WriteLine("procResult = " + procResult);
    Console.WriteLine("");

    Console.WriteLine("GetDemoData");
    var list = db.GetDemoData(3, out procResult);
    Console.WriteLine("procResult = " + procResult);
    foreach (var i in list)
    {
        Console.WriteLine(i.Id + ", " + i.SomeValue);
    }
}

Yields the following output:
AddTwoValuesWithResult
result = 10
result2 = 4
procResult = 0

ConvertToString
someString = *56*
procResult = 0

AddTwoValues
procResult = 15

GetDemoData
procResult = 0
1 1.230
2 2.340
3 3.450

This will be available in the EntityFramework Reverse POCO Generator, available at https://visualstudiogallery.msdn.microsoft.com/ee4fcff9-0c4c-4179-afd9-7a2fb90f5838

Wednesday, 10 December 2014

Pregenerated Views for Entity Framework 6

When Entity Framework starts, it first builds a set of internal views that describe the database in an agnostic way. All further processing EF does (queries, updates, etc.) is performed against these views. Generating views, however, can be costly and will impact the start-up time of the application.
The start-up time for a 1,000-table context can be in excess of minutes.


However, this can be worked around by generating the views at design time using the EFInteractiveViews NuGet package. It allows you to pre-generate the views and save them to a file. Neat!

Install the NuGet package: https://www.nuget.org/packages/EFInteractiveViews
Source code and documentation: https://github.com/moozzyk/EFInteractiveViews
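
Typical usage, going by the package's documentation, is a single call made against the context before its first query, so the views are written to and subsequently read from a file. The namespace, helper class, and file path below are assumptions for illustration; check the package README for the exact details.

using System.Data.Entity;
using InteractiveViews; // assumption: adjust to the namespace the EFInteractiveViews package actually exposes

public static class ViewCacheBootstrap
{
    public static void Configure(DbContext context)
    {
        // First run: EF generates the views and they are written to the XML file.
        // Subsequent runs: the views are loaded from the file, skipping the expensive generation step.
        InteractiveViews.SetViewCacheFactory(
            context,
            new FileViewCacheFactory(@"C:\Temp\MyViews.xml"));
    }
}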

Thursday, 7 August 2014

I'm in MSDN Magazine this month :-)

I am in the August 2014 edition of MSDN Magazine, in the Data Points section.
The article is written by Julie Lerman on the EntityFramework Reverse POCO Generator I created.
MSDN link: http://msdn.microsoft.com/en-us/magazine/dn759438.aspx

Thursday, 10 July 2014

SQL Datetime comparison gotcha

I discovered a problem when inserting data into our database. My insert statement was checking for the existence of the data in the WHERE clause, to prevent duplicate data being inserted. None was detected, and the INSERT happened. However, the unique constraint rejected the data as it already existed in the database.

The problem was the data to be inserted was DATETIMEOFFSET(2) and the database field being inserted into was DATETIME.

To show you what I'm talking about, run the following:

DECLARE @dt  DATETIME          = '2014-07-07 09:49:33.000';
DECLARE @dto DATETIMEOFFSET(2) = '2014-07-07 09:49:33.00 +07:00';

PRINT CASE WHEN @dt = @dto THEN 'Equals matches'
           ELSE 'Equals does not match'
      END

PRINT CASE WHEN @dt = CAST(@dto AS DATETIME) THEN 'Cast matches'
           ELSE 'Cast does not match'
      END


Results in the following:
Equals does not match
Cast matches

The comparison (=) operator does not behave the same way as the implicit cast that happens when you insert the data. The CAST/CONVERT to DATETIME actually throws away the offset! Madness.

Tuesday, 1 April 2014

A new tutorial video

I have released a new and updated tutorial video for Entity Framework Reverse POCO generator.
It is available to stream or download at: www.reversepoco.com

Wednesday, 5 March 2014

EntityFramework Reverse POCO Generator v2.4 released

Download: here

What's new in v2.4.0:
  1. Removed use of System.Data.Entity.DLL from the installation template as it is no longer required for EF 6.
  2. Moved spatial types from System.Data.Spatial to System.Data.Entity.Spatial for EF 6.
  3. Singular names and camel casing configuration were accidentally combined by using the UseCamelCase boolean. Thanks to Rune Gulbrandsen.
  4. Added new flag IncludeComments. This controls the generation of comments in the output.
  5. Fixed bug in constructor where a UNIQUEIDENTIFIER column had default value. Thanks to gonglei.

Wednesday, 13 November 2013

EntityFramework Reverse POCO Generator v2.1 released

What's new in v2.1.0:
  1. Fixed bug if default constraint was not in expected format. Thanks to Filipe Fujiy.
  2. Now detects name clashes with C# keywords.
  3. Added "System." to DateTime.Now(), etc to prevent clashes with table field names.
  4. "Configuration" class names are now configurable as "Mapping", "Map", etc.
  5. Added support for Spatial.DbGeometry and Spatial.DbGeography. Thanks to Simply Foolish and Jorge Bustos.
  6. Can now have custom collection type for Navigation Properties. "ObservableCollection" for example. Thanks to Simply Foolish.
Watch the video and Download at visualstudiogallery

Tuesday, 12 November 2013

NServiceBus and encrypting strings in messages, plus how to remove NServiceBus from your messages library

Summary

There are a few blog posts about using NServiceBus and encrypting strings using the WireEncryptedString type. However, you can also mark an ordinary string to be encrypted via configuration, which is more desirable as it lets you remove the dependency on NServiceBus from your common Messages library.
There is a gotcha though: if you ever mix the two, WireEncryptedString fields go back to being unencrypted in MSMQ. This blog post explains how to easily fix that, and also how to remove NServiceBus from your Messages library. Win win.

Source code is available here. Compiling the project will automatically download packages via NuGet.

Initial project setup

Always have a separate Messages project to keep your messages in. In this example, the project is called Messages and contains the following class:
using NServiceBus;
 
namespace Messages
{
    public class MyMessage : IMessage
    {
        public WireEncryptedString Secret1 { get; set; }
        public string PlainString { get; set; }
    }
}

Note that it implements IMessage, and therefore has a dependency on NServiceBus. It also makes use of WireEncryptedString, which again has a dependency on NServiceBus.

Nothing special in the app.config other than to specify the message endpoint as SomeQueueName.
<configuration>
    <configSections>
        <section name="UnicastBusConfig" 
                 type="NServiceBus.Config.UnicastBusConfig, NServiceBus.Core" />
    </configSections>
    
    <UnicastBusConfig>
        <MessageEndpointMappings>
            <add Messages="Messages" Endpoint="SomeQueueName" />
        </MessageEndpointMappings>
    </UnicastBusConfig>
</configuration> 

The endpoint configuration specifies that NServiceBus use Rijndael as the encryption service:
public class EndpointConfig : 
    IConfigureThisEndpoint, 
    AsA_Client, 
    IWantCustomInitialization
{
    public void Init()
    {
        Configure.With()
                 .DefaultBuilder()
                 .RijndaelEncryptionService();
    }
}

We also need to setup an encryption key. This is done by the following class:
public class ConfigOverride : 
    IProvideConfiguration<RijndaelEncryptionServiceConfig>
{
    public RijndaelEncryptionServiceConfig GetConfiguration()
    {
        return new RijndaelEncryptionServiceConfig
            {
                // This key could be fetched from a REST/WS call, 
                // Database, or a common xml/settings file.
                Key = "blah.blah.blah.blah.blah.blah..."
            };
    }
}

The final piece is to create a class that will actually send a message:
public class MessageSender : IWantToRunWhenBusStartsAndStops
{
    public IBus Bus { get; set; }

    public void Start()
    {
        var msg = new MyMessage
            {
                Secret1 = "You should not be able to read this in MSMQ",
                PlainString = "This is an ordinary string"
            };
        Bus.Send(msg);
        LogManager.GetLogger("MessageSender").Info("Sent message.");
    }

    public void Stop()
    {
    }
}

The resultant message looks like this:
<MyMessage>
    <Secret1>
        <EncryptedValue>
            <EncryptedBase64Value>QcJJfYP2SlOuDYNgFTQ8t8XI5D7zlwQzQXEFk8hcYT7gKUvqxN2Jg1UT7Q1CO929</EncryptedBase64Value>
            <Base64Iv>Q9Dc+k4V4LLPUc5+lEAtOQ==</Base64Iv>
        </EncryptedValue>
    </Secret1>
    <PlainString>This is an ordinary string</PlainString>
</MyMessage>

So far so good.

Now let's mark the PlainString to be encrypted using configuration instead of WireEncryptedString.
The first thing we do is to create a custom attribute we can use to mark strings to be encrypted:
namespace Messages
{
    public class MyEncryptionAttribute : Attribute
    {
    }
}

Then mark the strings to be encrypted:
public class MyMessage : IMessage
{
    public WireEncryptedString Secret1 { get; set; }
    
    [MyEncryption]
    public string PlainString { get; set; }
}

Next, we need to tell NServiceBus to treat all fields that use MyEncryption attributes to be encrypted:

namespace Sender
{
    public class ConventionsConfiguration : 
        IWantToRunBeforeConfiguration
    {
        public void Init()
        {
            Configure
                .Instance
                .DefiningEncryptedPropertiesAs(type => 
                    type.GetCustomAttributes(true)
                    .Any(t => t.GetType().Name == "MyEncryptionAttribute"));
        }
    }
}

So, run it and lets see what we get:
<MyMessage>
    <Secret1>
        <Value>You should not be able to read this in MSMQ</Value>
    </Secret1>
    <PlainString>k9FJRIfCMmyOW4ev8wytLIg56wEKsSbg1stmSo8Wu44=@XEx7Gkaks5nfhrQAuS2kZw==</PlainString>
</MyMessage>

Whoops!

The PlainString is now encrypted as we want, but in doing so the Secret1 string has become plain text. The Secret1 string is still a WireEncryptedString type, and yet it now insists on being plain text, just because we've used the DefiningEncryptedPropertiesAs convention on NServiceBus.


Let's fix it

We do this by changing ConventionsConfiguration class to also include the WireEncryptedString type as follows:
public class ConventionsConfiguration : 
    IWantToRunBeforeConfiguration
{
    public void Init()
    {
        Configure
            .Instance
            .DefiningEncryptedPropertiesAs(type =>
                type.PropertyType.Name == "WireEncryptedString" ||
                type.GetCustomAttributes(true)
                .Any(t => t.GetType().Name == "MyEncryptionAttribute"));
    }
}

Running this, the message now appears as:
<MyMessage>
 <Secret1>
  <EncryptedValue>
   <EncryptedBase64Value>4PHC9vcbOID08b8wbCl9tHdv8VH0aW8lFRFYGyHodwFkaV+YzF07aPPeRS3PVSCY</EncryptedBase64Value>
   <Base64Iv>/oFdQX9eTv7WWPJZIRwYsw==</Base64Iv>
  </EncryptedValue>
 </Secret1>
 <PlainString>P5vhzqd3NibLovHYMk8cVI/9k60jVDPERMHRPMMuCmg=@qbwJ2hrdeXegHbkisO4J8w==</PlainString>
</MyMessage>

Voila!

Both strings are now encrypted. You can see they are slightly different in the message.
To access Secret1, you do it via the .Value property on Secret1. To access the plain unencrypted text of PlainString, you simply access the string as normal. Both strings are encrypted and decrypted for you by NServiceBus.

How to remove NServiceBus from our Messages library

Now that we have a way to tag strings to be encrypted, we can alter our message class to be:
public class MyMessage : IMessage
{
    [MyEncryption]
    public string Secret1 { get; set; }
    
    [MyEncryption]
    public string PlainString { get; set; }
}

Next we need to remove the IMessage interface, and remove all dependencies on NServiceBus. In a similar fashion to marking strings with attributes, we can mark message classes with an attribute.
public class MessageAttribute : Attribute
{
}

Then we tag the message as follows:
[Message]
public class MyMessage
{
    [MyEncryption]
    public string Secret1 { get; set; }
    
    [MyEncryption]
    public string PlainString { get; set; }
}

NServiceBus now needs to be told which messages it can use. This is done in our ConventionsConfiguration class by adding a call to .DefiningMessagesAs() as follows:
public class ConventionsConfiguration : 
    IWantToRunBeforeConfiguration
{
    public void Init()
    {
        Configure
            .Instance
            .DefiningEncryptedPropertiesAs(type =>
                type.PropertyType.Name == "WireEncryptedString" ||
                type.GetCustomAttributes(true)
                    .Any(t => t.GetType().Name == "MyEncryptionAttribute"))
            .DefiningMessagesAs(type =>
                type.GetCustomAttributes(true)
                    .Any(t => t.GetType().Name == "MessageAttribute"));
    }
}

Let's run this to see what the message now looks like:
<MyMessage>
    <Secret1>Rv7L0pDwDlY9KOZQqXuoRPGnD8MxXUJ7dPCOfXnBpP337Egz59aGMY/z6yhW03op@CAeXrzjYBwXYiN2R1iGKrw==</Secret1>
    <PlainString>obwa5exdHIt7AFoQKypzSZIRQR/Q2srJgP3Valej54o=@T09tOdiPjMxFntdI1FJouw==</PlainString>
</MyMessage>

That's it

  1. I've removed all references to NServiceBus from my messages library. So no more messy versioning problems for other projects.
  2. Strings to be encrypted are tagged with an attribute, which we tell NServiceBus about.
  3. Message classes are tagged with an attribute, which we tell NServiceBus about.

Friday, 8 November 2013

What's new roundup for EntityFramework Reverse POCO Code First Generator

What's in the next version:
  1. Fixed bug if default constraint was not in expected format. Thanks to Filipe Fujiy.
What's new in v2.0:
  1. Fixed issue when running 'Transform All T4 Templates' from Visual Studio Build Menu. Thanks to JRoselle.
  2. Changing mappings during runtime is not possible/expensive. A specific DbModelBuilder can be used for each needed database schema. Thanks to meixger.
  3. Added ability to detect and use .IsRowVersion().
  4. Added many-to-many mappings. The generated code now includes calls to .Map(). Therefore the generated code will be different to what you had previously in v1 if you have many-to-many table mappings in your database. Hence the revision change to v2.0.0 as I'm using semantic versioning.
What's new in v1.12:
  1. VARBINARY(MAX) is now correctly assigned to byte[]. Thanks to Luke91577.
  2. Extending partial class support. Allow specification of file extension for partial classes (i.e. ".generated.cs"). Thanks to AB_dreeve.
  3. Fixes issues when targeting .NET 4. .NET 4.0 doesn't include the System.ComponentModel.DataAnnotations.Schema namespace, allow specification of TargetFrameworkVersion. Thanks to AB_dreeve.
What's new in v1.11:
  1. Including views is now working. For a view to be included, at least one column must not be nullable. Thanks to Delmo Carruzzo.
  2. Added selective generation of components (Poco, Context, UnitOfWork, PocoConfiguration). You can now build entities in your Model project, and context, configuration, unit of work in your Data project.  Thanks to kscelfo.
  3. Added PrependSchemaName flag. You can now control if the schema name is prepended to the table name.  Thanks to kscelfo.
What's new in v1.10:
  1. Added UseCamelCase flag to the tt file. You can now control if you want your table and column names CamelCase or left alone: i.e. FieldPersonOrder or field_person_order
What's new in v1.9.2:
  1. Removed the Linq de-duplication code as it is now being done in SQL. The Linq version did not include the schema name when de-duplicating FK's.
What's new in v1.9.1:
  1. Added DISTINCT to the foreign key SQL. This prevents duplicates from appearing.
    They can appear with SQL such as:
    ALTER TABLE [Mars].[Table2] WITH CHECK ADD CONSTRAINT [FK_Table2_Table1]
        FOREIGN KEY([Table1Id]) REFERENCES [Mars].[Table1] ([Id])
    ALTER TABLE [Mars].[Table2] CHECK CONSTRAINT [FK_Table2_Table1] 
    
What's new in v1.9.0:
  1. It now has the ability to read connection strings from other config files/projects.
    It starts by looking at the local project for the named connection string in the following files:
    • app.config
    • web.config
    • app.config.transform
    • web.config.transform
    • or edit the above list in the .tt and specify which file(s) to use
    If not found, it then looks for those files in all the projects within the solution.
What's new in v1.8.0:
  1. Can now optionally generate separate files. See the new boolean flag GenerateSeparateFiles.
  2. Removed the hard coded 300 second timeout. The timeout can be specified in the connection string.
What's new in v1.7.1:
  1. Add support for DateTime2
  2. Add support for UnitOfWork and the repository pattern. This means you can now unit test your repositories. See the source code for examples.
  3. Add IDisposable to db context.
What's new in v1.7.0:
  1. Add .HasPrecision(precision, scale) for fields that have scale. Thanks to @choudeshell
What's new in v1.6.0:
  1. Enhance ReadSchema performance with early table exclusion.
  2. Now supports columns with symbols and punctuation.
  3. Handles decimal default values.
What's new in v1.5.1:
  1. A bugfix for WCF. See issue https://efreversepoco.codeplex.com/workitem/4
What's new in v1.5.0:
Ability to add WCF (DataMember, DataContract attributes) support on entities. Requested by spatemp.