Tuesday, 11 December 2018

Amazon Alexa skill account linking using IdentityServer4

It took a lot of reading and quite some time to wade through exactly what was required to get Amazon Alexa account linking working with our Identity Server 4 OAuth server. Most of the material out there covers account linking with Amazon's own OAuth server, not IdentityServer4.

Well, I finally got to the bottom of it all, and to save you devs valuable time and frustrations, I've laid it all out below:

  1. Create your Asp.Net Core API
  2. Configure Identity Server 4 for account linking
  3. Create your Alexa Skill
  4. Account link your Alexa skill to Identity Server 4. Amazon takes care of calling your Identity Server 4 to obtain a token, and manages refresh tokens for you.
  5. Call your API from Alexa.

Alexa voice command → Amazon Lambda function → [Hidden Identity Server 4 call] → Asp.Net Core API → Return Speech back to Alexa to say aloud.

Asp.Net Core API

The controller for your Alexa API should look something like this:

The IDataService is used solely for accessing the database and creating a return dto class.
The ISpeechServer takes the dto class and creates speech from it. For example:

Notice that the Controller is protected with
[Authorize(Policy = AuthSecrets.CadAlexaApi)]
That policy is declared in the Startup.cs



Configure Identity Server 4 for account linking

I've separated Identity Server 4 from the API; it lives in a separate solution.
Nothing special in the Program.cs:

Startup.cs:

AuthConfig.cs

AuthSecrets.cs

Clone the IdentityServer4 samples source code from GitHub and copy the Quickstarts, Views and wwwroot folders to your identity server implementation. I previously tried other Quickstarts from other IdentityServer repos, and found this one to be the best. Your mileage may vary...

Nothing special in SeedData.cs

Account link your Alexa skill to Identity Server 4

In https://developer.amazon.com/alexa/console/ask/
Click on Build, and the ACCOUNT LINKING tab on the left




Select the "Auth Code Grant" with the following options:
Authorization URI: https://url-to-your-identity-server/connect/authorize
Access Token URI: https://url-to-your-identity-server/connect/token
Client ID: ALEXA
Client Secret: take the raw unencrypted string from AuthSecrets[CadAlexaApi].Secret
Client Authentication Scheme: HTTP Basic (Recommended)
Scopes:
  • email
  • openid
  • AlexaApi
  • offline_access
Domain List: is empty
Default Access Token Expiration Time: 31536000 (one year, in seconds). I'm not sure whether this can be left blank.
The Redirect URLs shown on your screen are what you need to use in the Client configuration above.
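
With the HTTP Basic scheme, Alexa sends the Client ID and Client Secret to the Access Token URI in an Authorization header rather than in the request body. If linking fails at the token step, it can help to reproduce that header and compare it with what your identity server receives. The following is only a rough Node.js sketch: the function names are made up for illustration and the secret is a placeholder, not a real value.

```javascript
// Builds the Authorization header value used by HTTP Basic client authentication.
// 'ALEXA' is the Client ID from the settings above; the secret is a placeholder.
function basicAuthHeader(clientId, clientSecret) {
  const raw = `${clientId}:${clientSecret}`;
  return 'Basic ' + Buffer.from(raw, 'utf8').toString('base64');
}

// Reverses the encoding, e.g. to inspect a header captured in your server logs.
function decodeBasicAuthHeader(header) {
  const encoded = header.replace(/^Basic\s+/i, '');
  const raw = Buffer.from(encoded, 'base64').toString('utf8');
  const idx = raw.indexOf(':'); // the secret itself may contain ':'
  return { clientId: raw.slice(0, idx), clientSecret: raw.slice(idx + 1) };
}

console.log(basicAuthHeader('ALEXA', 'placeholder-secret'));
```

If the decoded secret does not match the raw string your client is configured with, the token request will be rejected.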

Call your API from Alexa

I've kept the lambda function as Node.JS.
Install the NPM package node-fetch
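
Once the skill is linked, Alexa includes the access token in each request, and the lambda passes it to the API as a bearer token. This is a minimal sketch of that idea, not the exact function used here: the API URL, the `speech` field of the API response and the helper names are all assumptions, and node-fetch v2 is assumed because it can be loaded with require().

```javascript
// Sketch of a Lambda handler that calls the API with the linked account's token.
// API_URL and the `speech` field of the API response are hypothetical.
const API_URL = 'https://url-to-your-api/api/alexa';

// Alexa places the token obtained through account linking in the request envelope.
function getAccessToken(event) {
  const system = event && event.context && event.context.System;
  const user = system && system.user;
  return (user && user.accessToken) || null;
}

// Plain-text speech response in the Alexa response format.
function speechResponse(text) {
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text },
      shouldEndSession: true
    }
  };
}

// Asks the user to link their account in the Alexa app when no token is present.
function linkAccountResponse() {
  const res = speechResponse('Please link your account in the Alexa app.');
  res.response.card = { type: 'LinkAccount' };
  return res;
}

// fetchImpl is injected so the logic can be exercised without network access.
async function handle(event, fetchImpl) {
  const token = getAccessToken(event);
  if (!token) return linkAccountResponse();

  const apiResponse = await fetchImpl(API_URL, {
    method: 'GET',
    headers: { Authorization: `Bearer ${token}`, Accept: 'application/json' }
  });
  if (!apiResponse.ok) return speechResponse('Sorry, I could not reach the service.');

  const body = await apiResponse.json();
  return speechResponse(body.speech);
}

// Lambda entry point; node-fetch is only loaded when the handler is invoked.
if (typeof exports !== 'undefined') {
  exports.handler = (event) => handle(event, require('node-fetch'));
}
```

Keeping the fetch function as a parameter means the token handling can be tried locally with a stub before zipping and uploading.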

Zip the folder up, and upload it to the amazon lambda associated with your skill.

Wednesday, 5 December 2018

Azure Sql Server Profiling

As you may have already guessed, you cannot use SQL Server Profiler on an Azure database. However, you can use the following code to find out what SQL was executed:

In order to get the real parameter values, you need to enable sensitive data logging by using the DbContextOptionsBuilder.EnableSensitiveDataLogging method:
Enables application data to be included in exception messages, logging, etc. This can include the values assigned to properties of your entity instances, parameter values for commands being sent to the database, and other such data. You should only enable this flag if you have the appropriate security measures in place based on the sensitivity of this data.

Thursday, 6 September 2018

SQL Server Paging - The Holy Grail

For more info on the above, see SqlServerCentral.

Another version:

Interpreting the query plan and speeding it up

  • Bookmark Lookup - An index was used, but as the index does not 'cover' the query, the query processor must use the ROWID from the index to 'lookup' the actual data row. Unless the index can be altered to 'cover' the columns of the query, this cannot be optimised further.
  • Clustered Index Scan - Similar to a table scan (it's just that the table has a clustered index) and similarly undesirable, particularly for large tables. If you see these, then the columns in the query are either not indexed or the distribution statistics have led the query optimiser to decide the scan is as efficient or more efficient. Restructuring the query can sometimes eliminate these operations; or add an index.
  • Clustered Index Seek - The clustered index is used to find matching rows. This is optimal behaviour.
  • Compute Scalar - As it says – a scalar is being computed. Unless it is not really needed, it can't be optimised. If it's not needed, remove it!
  • Constant Scan - The query requires a constant value in some (or all) rows.
  • Hash Match - The query processor builds a hash table for each row being processed. As subsequent rows are processed, the hash is computed and compared to the hash table for matches. Queries with DISTINCT, UNION, or aggregates often require a hash table to remove duplicates. If such operations are required there is little you can do to optimise.
  • Index Scan - The non-clustered index is being used to locate a large number of data rows which must then be scanned. Unless you can restructure the query to return fewer rows this cannot be optimised.
  • Index Seek - A non-clustered index is being used and only a small part of the index is required. You've chosen your indexes well as this is one of the most efficient operations.
  • Index Spool - As rows are scanned tempdb is used to store a 'spool' table that can be used rather than re-reading input rows. This is an internal optimization for complex queries and cannot be changed.
  • Merge Join - Occurs when the two inputs contain sorted data and the query processor can merge them together, or when two or more indexes are used to query a table. Very efficient but to occur the joins must have access to sorted data – achieved by indexing on join or ORDER BY columns.
  • Nested Loop - In this join, one table is chosen as the inner and scanned for each row of the outer. This is only efficient for small numbers of rows. Restructuring the query can remove these joins. Note that if available memory is low these are more likely to occur.
  • Remote Query - As it says – a query occurring on a remote data source so you either need to optimise on the remote data source or move the necessary data to your local SQLServer.
  • Sort - Is expensive but tends also to be necessary.
  • Table Scan - Performed when the table has no clustered index (it's a heap). Probably the least desirable operation to see chosen, as the query processor will read each row into memory in order to decide whether it should be returned by the query. For tables of more than a few hundred rows you should add an appropriate index.

Some of the gains in performance come from writing well structured, efficient queries that return the minimum amount of information needed, but most gains in performance are made by choosing good indexes. Hence the basic recommendation is to ensure that there are indexes on all tables and that the statistics for those indexes are up to date. Not coincidentally, these are the two main factors that influence the query optimiser in making decisions about the execution plan. At this stage I would recommend you run a few multi-table queries from the Northwind database, or your own, in Query Analyzer and assess whether the execution plans generated are optimal based on the above information.

Optimising queries

As just stated, indexes and rewriting queries are the most common options for optimisation, but you can also use query hints, though care should be taken to ensure you're not forcing the server to choose a sub-optimal execution plan. Generally query hints are not recommended – SQLServer knows best!

Considering indexes: in order for SQLServer to make use of an index, the first indexed column must be included in the WHERE clause as part of a qualification, or it will not be considered. However, not all columns in a WHERE clause need to be included in the index for it to be chosen.

There can only be one clustered index per table and, as a result, it should be chosen carefully. By default the primary key is the clustered index, but often this is not the best choice unless it is the only index that will exist on the table. Clustered indexes are best used for range queries, e.g. dates or numerical ranges.

Non-clustered indexes work best on large tables when very few rows are being returned. The query optimiser will often choose a non-clustered index that is highly selective when the index columns are included in the join statements. When this occurs SQLServer can find the row needed in the index very quickly and get to the actual data quickly as well. If the index is not selective, the process is much less efficient and the index may not be chosen. An exception is if the non-clustered index is a covering index. Foreign keys are usually good choices for non-clustered indexes, as these columns are usually used in joins.

Regardless of the type of index there are a few general guidelines for creating indexes:
    1. Index those columns often used by joins
    2. Be sure that the first column of the index is the column specified in most joins.
    3. Analyze your queries and ensure your indexes are being used; if not, they are an unnecessary overhead and/or could be better placed.
    4. Consider filegroups: placing non-clustered indexes on a different filegroup, on a distinct physical device from the data itself, can lead to performance gains.

How to compare one query against another to see which is best

Highlight both queries, and press CTRL-L to view the query plans of both.

The two percentages will total 100%. In this case, the first query at 98% sucks, and the 2% one wins by a mile. The lower % the cost, the better.

 

 What is the true cost of a query?

Click on the FIRST box of the query plan, and hover your mouse over it. A pop up window will appear. The cost of the query is the "Estimated Subtree Cost".

If Estimated Subtree Cost is < 20, it's OK. If it's > 20, you really need to change the query, or add an index, etc.

 

 My query is slow, how to easily speed it up

I used to get this a lot. And it either boils down to an "OR" clause, or a non-sargable query.

 

Get rid of 'OR' clauses

If your query contains an OR clause, split it into two queries, one using the left side of the OR and the other the right side, then combine the results with UNION ALL. Don't use UNION, as it is slower because it has to remove duplicates between the two result sets. Be aware that UNION ALL returns a row twice if it matches both conditions, so use this only where the conditions cannot both be true for the same row, or where duplicates are acceptable.

Example:

Slow query
SELECT  *
FROM    FieldPerson AS FP
        INNER JOIN FieldCompany AS FC
            ON FC.id = FP.field_company_id
WHERE   FP.forename = 'Simon'
        OR FC.name LIKE 'acme%';

Faster query
SELECT  *
FROM    FieldPerson AS FP
        INNER JOIN FieldCompany AS FC
            ON FC.id = FP.field_company_id
WHERE   FP.forename = 'Simon'
UNION ALL
SELECT  *
FROM    FieldPerson AS FP
        INNER JOIN FieldCompany AS FC
            ON FC.id = FP.field_company_id
WHERE   FC.name LIKE 'acme%';

Even in a simple case such as the two queries above, the query plan comparison is 55% vs 45%.

 

Non-Sargable queries

The most common thing that will make a query non-sargable is to include a field inside a function in the where clause:

SELECT ... FROM ... WHERE Year(myDate) = 2008

The SQL optimizer can't use an index on myDate, even if one exists. It will literally have to evaluate this function for every row of the table. Much better to use:
 
WHERE myDate >= '20080101' AND myDate < '20090101'

Some other examples:
 
Bad:  Select ... WHERE ISNULL(FullName,'Simon Hughes') = 'Simon Hughes'
Fixed: Select ... WHERE ((FullName = 'Simon Hughes') OR (FullName IS NULL))

Bad:  Select ... WHERE SUBSTRING(Forename, 1, 3) = 'Sim'
Fixed: Select ... WHERE Forename Like 'Sim%'

Bad:  Select ... WHERE DateDiff(mm,OrderDate,GetDate()) >= 30
Fixed: Select ... WHERE OrderDate < DateAdd(mm, -30, GetDate())

Wednesday, 18 July 2018

MSMQ Best Practices

Performance recommendations

  1. Avoid unnecessary database lookups before deciding if a message can be dropped.
  2. Prioritization should be given in code for dropping the message as quickly and efficiently as possible.
  3. If database lookups are required in deciding if a message can be dropped, can these be cached (with a refresh timeout)?
  4. If you are dropping messages, is there a better way to only receive the correct messages?
  5. If you are subscribed to an event processor publisher, does that publisher provide any kind of subscription filtering? Filtering can be achieved via database event subscription tables (a list of subscribers with their input queues, and a list of events each input queue is interested in).

Separate event types

Publishers should translate event types into separate System.Types, with hierarchies to support grouping.
For example, with some kind of delivery event data, these could be group interfaces such as ProofOfDeliveryGroup, ProofOfCollectionGroup, etc.
Inherited from ProofOfDeliveryGroup would be the specific interface types such as ManualProofOfDelivery, DeliveredByCourierToSite, etc.
This allows subscribers to use NServiceBus idioms for subscribing to only the messages they need and removes the need for specialised publication filtering as seen in the above step 5 publisher.
NServiceBus recommends interfaces for events because you can effectively do multiple inheritance, which isn’t possible for classes and allows for handling “groups” of events, as well as gentle evolution of events.

Separate the handlers

Favour multiple event handlers over a small number of more complex handlers.
Each handler should do one thing and be named after that one thing. The order of operation of the event handlers can be specified (see below) and this will start to read like a pseudo specification of what happens when an event of that type arrives.
As with any unit of code, message handlers should aim to follow the 'Single Responsibility Principle' and have only one reason to change, so one should favour multiple small handlers over fewer large ones. However, do this only if it introduces no implicit coupling between handlers (e.g. through bespoke handler ordering); if it does, look for other ways to accomplish this.

General recommendations

  1. There should be only one place to subscribe to any given event, though publishers can be scaled out if necessary, as long as each instance shares the same subscription storage database.
  2. To avoid coupling, either:
    1. Publish a new message when a message is handled (successfully or otherwise), so another handler (potentially in a different endpoint) can handle it in a new transaction.
    2. Use NServiceBus Sagas.
  3. Using separate messages and/or sagas allows implementation of the “no business reason to change” philosophy, where all failures are technical failures, and attempts can be made to overcome them with automatic retries etc, using separate, chained transactions. This is especially helpful when dealing with resources such as email or the file system that do not participate in the ambient distributed transaction while a message is being handled.
  4. It is possible to perform validation/mutation of messages before any handlers are invoked (Message Mutators). Again, prefer this over handler ordering.
    1. Mutators are not automatically registered using dependency injection.
    2. Mutators are registered using:
      endpointConfiguration.RegisterMessageMutator(new MyIncomingMessageMutator());

      endpointConfiguration.RegisterMessageMutator(new MyOutgoingTransportMessageMutator());

Handler ordering

NSB documentation
Multiple classes may implement IHandleMessages for the same message. In this scenario, all handlers will execute in the same transaction scope. These handlers can be invoked in any order but the order of execution can be specified in code.
The way NServiceBus works is:
  1. Find the list of possible handlers for a message.
  2. If an order has been specified for any of those handlers, move them to the start of the list.
  3. Execute the handlers.
The remaining handlers (i.e. ones not specified in the ordering) are executed in a non-deterministic order.

Specifying one handler to run first

public class SpecifyMessageHandlerOrder : ISpecifyMessageHandlerOrdering
{
    public void SpecifyOrder(Order order)
    {
        order.SpecifyFirst<HandlerB>();
    }
}

Specifying multiple handlers to run in order

public class SpecifyMessageHandlerOrder : ISpecifyMessageHandlerOrdering
{
    public void SpecifyOrder(Order order)
    {
        order.Specify(
            typeof(HandlerB),
            typeof(HandlerA),
            typeof(HandlerC));
    }
}

Example

public class OrderReceivedEventHandlerOrdering : ISpecifyMessageHandlerOrdering
{
    public void SpecifyOrder(Order order)
    {
        order.Specify(
            typeof(ValidateOrderEventHandler),
            typeof(CheckForDuplicateOrderEventHandler),
            typeof(PlaceOrderEventHandler),
            typeof(SendOrderEmailConfirmationEventHandler));
    }
}

With the configuration API

This is typically done within the EndpointConfig class.
configuration.LoadMessageHandlers(
    First<HandlerB>
    .Then<HandlerA>()
    .AndThen<HandlerC>());

Preferred method

Using the interface ISpecifyMessageHandlerOrdering is the preferred method, as these can be placed within the area of concern. This makes it easier to maintain as you don't have to go searching for the ordering of the handlers.

Dropping messages

If you are not going to process all messages, but decide to drop/filter some out, have a separate handler for these and make this the first handler:
public class SpecifyMessageHandlerOrder : ISpecifyMessageHandlerOrdering
{
    public void SpecifyOrder(Order order)
    {
        order.Specify(
            typeof(MessageFiltering), // FilterMessage, IgnoreInapplicableEvents, IgnoreIfNotLatestEvent, etc
            typeof(HandleSomeMessage));
    }
}
You may wish to have several message filters, all with their own criteria: order.Specify(First.Then()); etc.
The takeaway is to be as efficient as possible in dropping messages that are of no interest.

Event processor filtering

However, it’s generally preferable that publishers simply publish all events that are subscribed to. This leaves the subscribers in charge of what messages they receive and which of those to ignore and how. Filtering of published messages increases coupling between publisher and subscriber and should only be used as a last resort when subscribers have not been implemented/deployed in a scalable way.

However, if your events come from CDC (Change data capture) and you want to take advantage of event filtering, you need something similar to the below:

CREATE TABLE EventProcessor
(
 Id INT NOT NULL IDENTITY(1,1),
 [Name] VARCHAR(200) NOT NULL,
 [Description] VARCHAR(512) NULL,
 [EndpointAddress] VARCHAR(512) NULL, -- Name of msmq queue
 [Enabled] BIT NOT NULL,
 CONSTRAINT [PK_EventProcessor] PRIMARY KEY CLUSTERED (Id)
);
CREATE TABLE EventProcessorEventFilter
(
 Id INT NOT NULL IDENTITY(1,1),
 EventProcessorId INT NOT NULL,
 WantedEventId INT NOT NULL,
 CONSTRAINT [PK_EventProcessorEventFilter] PRIMARY KEY CLUSTERED (Id)
);
GO
CREATE UNIQUE INDEX [IX_EventProcessorEventFilter] ON EventProcessorEventFilter (EventProcessorId, WantedEventId); 
 
ALTER TABLE EventProcessorEventFilter ADD CONSTRAINT [FK_EventProcessorEventFilter__EventProcessor] FOREIGN KEY (EventProcessorId) REFERENCES EventProcessor (Id); 
 
ALTER TABLE EventProcessorEventFilter ADD CONSTRAINT [FK_EventProcessorEventFilter__SomeEventEnumTable] FOREIGN KEY (WantedEventId) REFERENCES SomeEventEnumTable (Id);
GO

Tuesday, 22 May 2018

AI Reading comprehension scores

In January 2018 there were a great many blog posts about AI surpassing human reading comprehension scores.
The human comprehension score for a piece of text is 82.304% (Human Performance Stanford University, Rajpurkar et al. '16)

As of March 19 2018, AI is now hitting scores of 83.877, by QANet (ensemble) Google Brain & CMU.

Plotting the scores on a graph:
 

Extrapolating the scores, we should see an AI hit 100% sometime in 2020.
Spreadsheet I created for the graph above: AI-Comprehension-score.xlsx

Original data source: https://rajpurkar.github.io/SQuAD-explorer/