Monday, 11 November 2019

AI Reading comprehension scores

In September 2019 there was a great many blog posts about AI surpassing human reading comprehension scores.
Humans comprehension of a piece of text is 89.452%

As of November 6 2019, AI is now hitting scores of 92.425, by ALBERT + DAAF + Verifier (ensemble) PINGAN Omni-Sinitic

Plotting the scores on a graph:


Extrapolating the scores, we should see an AI hit 100% sometime in 2020.
Spreadsheet I created for the graph above: AI-Comprehension-score.xlsx

Original data source: https://rajpurkar.github.io/SQuAD-explorer/

Monday, 23 September 2019

Entity Framework - Things you should be aware of…

1. Entity Cache

First thing to be aware of is entity caching
  • In this example, the first query reads data from the repository and materialize the data to a category entity, and update its Name. Then the repository is queried again by Name.
  • After reading the data, Entity Framework founds the primary key is the same as the cached entity, so Entity Framework does not materialize the data just read, it reuses the previous category entity. Performance is improved by skipping the materialization, but tricky results can happen.
  • When using DbSet.SqlQuery to directly execute SQL query in the repository, Entity Framework still looks up cache before materializing.


2. Materializing a new entity (no caching)

Entity is not cached when tracking is turned off, or entity is not queried from the repository. Each of the following queries materializes a new entity:


3. DbSet.Find

DbSet.Find accepts the primary keys and returns an entity. Calling Find can improve the performance, because it looks up cache before querying the database.
  • Calling Find can improve the performance
  • It looks up cache before querying the database


4a. LINQ query translation cache

  • Entity Framework translates a LINQ to Entities query in 2 steps:
    • Converts .NET expression tree to database command tree
    • Generate SQL from database command tree
  • To improve the performance, the generated SQL is automatically cached for each database command tree. Take the following query as example:

  • These first LINQ query builds expression trees with a ConstantExpression node representing int value 1.
  • The second query builds similar expression tree but with a different ConstantExpression node representing int value 10.
  • They are converted to 2 different database command trees, with 2 different DbConstantExpression nodes.

4b. Cached translations

  • To resolve this problem, these queries can be parameterized by simply replace the constants with variables:

  • The predicate lambda expressions capture variable minLength with the closure syntactic sugar.
  • The variable access is compiled to field access. So in the LINQ queries’ expression trees, there are no longer ConstantExpression nodes, but FieldExpression nodes. Entity Framework converts these FieldExpression nodes to DbParameterReference nodes, representing int parameters. As a result, these 2 LINQ queries are converted to identical database command trees.

5a. Skip & Take

  • If a query method accepts values instead of lambda expression, this parameterization approach does not work. For example, Skip and Take accept int values as parameters:

  • The above LINQ queries access to variable skip and take, but these variable access are also represented by ConstantExpression nodes. So their expression trees are different, and converted database command trees are different, and their translations cannot be reused for each other.

5b. Lambda expression

  • To resolve this problem, Entity Framework provides a lambda expression version for these methods:


5c. Skip & Take (optimal solution)

  • Now Skip and Take can access variables via closure:

  • These LINQ queries have FieldExpression nodes again. Entity Framework can convert them to identical parameterized database command trees. Now their translations can be reused for each other.

So what?

  • It doesn’t actually hurt anything, does it?
  • Unfortunately, it hurts SQL Server.

SQL query plan cache

  • LINQ queries with different constants are translated to different SQL queries.
  • The Previous queryWithConstant1 and queryWithConstant2 are translated to:


Multiple query plans :-(

  • They have different query plans in SQL database, which cannot be reused for each other:



Single query plan

  • With parameterization, queryWithClosure1 and queryWithClosure2 are translated to identical SQL queries, with different parameter values:


Cached SQL query plans :-)

  • queryWithClosure1’s query plan is now cached and reused for queryWithClosure2:

Final take-way...

If you forget everything, and can only remember two take-aways from this blog:
  • Remember to use lambda’s for your Skip & Take functions.


  • Use .Find() as this can skip hitting the database altogether.
    Instead of
    • .Where(x => x.Id == SomeId);
    Use
    • DbSet<T>.Find();