If you thought all you had to do was to declare a few indexes here and there and MySQL would magically be fast, you’ll be surprised reading this excellent article.
MySQL 8.0.0 has a few new UUID functions! Read all about this here!
Sure, there are tools and scripts to help you fine-tune your MySQL server, but there’s nothing like using your brain.
Pythian has done a wonderful job of explaining what all those InnoDB-related variables do in a series of articles…
Exposing Innodb Internals via System Variables: Part 1, Memory
Exposing Innodb Internals via System Variables: Part 2, I/O (structure and logs)
Exposing Innodb Internals via System Variables: Part 3, I/O (Table data)
Exposing Innodb Internals via System Variables: Part 4, Concurrency
Exposing Innodb Internals via System Variables: Part 5, Consistency / Statistics handling
As you can see, Freewill now supports 17 different selection policies. At this point, all of them are coded but only half of them have been tested.
The 11 available termination policies are coded, half of them tested.
So far, only 2 mutation policies are available. Both of them are coded and tested. I will probably need a few extras for TSP-type problems as well as numerically parametrized problems (e.g. De Jong functions with a domain for each variable). I’ll probably add 3-4 others specific to the problem that started all this adventure!
Only one immigration policy (no immigration!) is available and it will stay that way for a long time. I’ll wait until I am hyper confident that this framework is rock solid before introducing parallelism and the exchange of individuals between “islands” (i.e. simulations). This one is faaaaaaar away!
Six crossover policies are available as of now. This area will require some changes (minor, I think, at first glance) for TSP-type problems: I haven’t quite decided on the approach I will take to solve this. Since crossover is often very problem/chromosome specific, I’ll probably delay those changes until the end, once I have all examples coded and ready to be tested, to have a better idea of what is needed. But I will definitely add a few (3-4) crossover policies tailored for the Ruzzle problem.
I have solved the discrepancy (see here and here) between my results and the TSPLIB ones regarding the tour length of the Burma14 problem. I’ll probably add a much bigger TSP problem to see how the framework handles a huge search space! Oh! And I need to clean up all the crap I added/modified while looking for that “distance difference” problem: 2 classes were butchered in the process!
I need to add a few “crash test dummy” classes to test all those different selection policies (and crossover) in a simpler and more efficient manner! Or I should kick myself in the %*&#$!@ and code the “bits” example classes…
I will soon work on a customizable display of statistics. All that’s needed is already there, it’s just a matter of gluing everything together!
Once I’m done with the 8 queens problem, I’ll attack the numerically parametrized problems. I’ll probably have 2-3 examples (from the De Jong functions) as well as the INSANE Griewank function.
The classes used for randomly choosing the next parent chromosomes, as well as for scaling/ranking, can be optimized. But since they’ve worked great since day 1, I’ll keep that for the very end. I know they can be a lot faster than they are right now.
I also plan on having a very basic export mechanism so I can dump all those ruzzle chromosomes in a MySQL database to be able to do some reporting and study the various policies and their effects.
I started adding comments to the classes, mostly to keep references, maintain a todo list per class and add some notes for myself to quickly remember why things work that way!
I’ll probably have an image by tomorrow that will run simulations for the ruzzle problem full-time. I wanna beat that record!
Note: this article was originally posted in French here.
You! Yes! Yep, you! I’m talking to you my friend! You, the one that I help out in forums, on IRC, on mailing lists!
Each and every day, I answer questions related to MySQL, mostly about SQL queries or optimization problems. Whether it’s on dBForums, Stack Overflow, MySQL.com, developpez.net or on IRC (irc.freenode.net, channel #mysql), it’s always the same thing! If you want me to help you, make it easy for me!
So, here’s a quick guide explaining what you have to do if you want us/me to help you solve your problem as quickly as possible.
Explain what your problem is as simply and clearly as possible
If you’re unable to explain, in one sentence, what you are trying to do, we’re off to a very bad start. You have to know what you’re trying to do before anyone else can help you do it!
A good example: “I’m trying to find the best salesman in every store located in a district that sells less than the national district average”. This is clear and simple to understand!
A bad example: “My SQL query is slow, here it is”:
SELECT COUNT(*) AS x1k1, w.nndt, w.n01_d FROM p_usr_htgg w WHERE EXISTS (SELECT w2.id FROM puw_user w2 LEFT JOIN temp_eoms te ON te.mt = w2.mt WHERE w2.k1 = w.k1 AND w.sales = MAX(w2.sales) ) GROUP BY w.nndt, w.n01_d HAVING COUNT(*) > 3 ORDER BY x1k1 DESC LIMIT 0, 100
Can you see the difference between these two examples? In one case, the problem is clear. In the second case, just seeing the SQL without any further explanation makes me want to puke! How can I help with all that SQL garbage?
Start small! If you have no clue about how to use JOINs, don’t focus on the ROLLUP or aggregate stuff!!! Learn the basics while you’re at it! Learn the lingo as well. If you don’t know what something means, ask! Don’t pretend you know when you don’t!
One more thing: pick meaningful names for tables and columns! Acronyms and cryptic abbreviations tell me nothing! Same thing for table and column names in Italian, Greek, Spanish or German! Use English!
“I have a huge database with an enormous number of concurrent users and my query is very slow and it is executed many many times.”
What is huge for you is probably a joke for Google, Amazon and GitHub or for someone working in bioinformatics: give me numbers (number of records, tablespace size, etc.), not your impressions!
“Enormous” is often relative! Fifty concurrent users and a table of 100 million rows on a server with 2GB of RAM is an enormous amount of users! But on a server with 16 processors and 128GB of RAM, 50 concurrent users is not a problem at all: once again, give me numbers, not adjectives!
“Many many times” doesn’t mean anything to me! What does that mean exactly? 200 times per minute? 15 times per minute? What? Give me precise numbers! And what do you mean by slow? 3 seconds or 0.2 seconds? Be precise!
Don’t make me google for you
Please, make some minimal effort! Oftentimes, I can give you a URL explaining your problem faster than the time it took you to write your question! You would have saved everyone’s precious time if you had googled it in the first place!
Give me some test data and expected results
“It doesn’t work! I should have six xyz records for every abcd record but I only get five.”
This tells me nothing useful! I don’t know your database, your tables or your domain. Even worse, I don’t know what’s in those tables, how the data is spread out, the selectivity of some columns, etc. Creating a test example with some data only takes two minutes on SQLFiddle!
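As a sketch, a complete test case can be as small as this (the tables, columns and data below are hypothetical, just to show the level of detail needed):

```sql
-- Hypothetical schema and data: small, but complete and reproducible
CREATE TABLE abcd (id INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE xyz  (id INT PRIMARY KEY, abcd_id INT, val INT);

INSERT INTO abcd VALUES (1, 'first'), (2, 'second');
INSERT INTO xyz  VALUES (1, 1, 10), (2, 1, 20), (3, 2, 30);

-- The query you are struggling with...
SELECT a.id, COUNT(x.id) AS nb_xyz
FROM abcd a
LEFT JOIN xyz x ON x.abcd_id = a.id
GROUP BY a.id;

-- ...and the expected result, stated explicitly:
-- "for abcd.id = 1 I expect a count of 2, for abcd.id = 2 a count of 1"
```

With that, anyone can paste your case into SQLFiddle and start helping right away.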
Show me the EXPLAIN
Your query is slow? So what! Why? I have no way to know! Unless you give me the EXPLAIN of your query, I cannot help you at all, nor even know why MySQL picks a particular access plan over a more promising one!
Show me the SHOW CREATE TABLE
Even if you provided me with the EXPLAIN of your SQL query, I still need to know what the table looks like. One is useless without the other! I also need the SHOW CREATE TABLE; it’s as simple as that!
Are both keys referenced in your JOIN of the same type and length? Is the LIKE in your query way too slow because we have to scan 4MB of text in every record? Are we trying to match records by looking into a 1GB BLOB column? Is it an InnoDB table, a MyISAM table or another storage engine? What are the character set and collation?
Learn to use SHOW VARIABLES
In some cases, your query is fine and the problems you encounter are caused by session/server/client parameters! You have to know how to use SHOW VARIABLES!
Learn to use SHOW STATUS
Sometimes, you have to be able to check the status of your server. This is essential! In that case, SHOW STATUS is another command you must know!
Use SHOW TABLE STATUS
SHOW TABLE STATUS will allow you, at a glance, to have a precise picture of what’s in your database. You’ll be able to see, in one shot, table names, their engine, their size, collation info, etc. A very useful command: for you and for me!
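To recap the last few commands, here are some concrete invocations (the variable, status and table names are only examples):

```sql
-- Session/server parameters
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Server counters and health
SHOW STATUS LIKE 'Threads_connected';

-- Full definition of one table: columns, indexes, engine, charset
SHOW CREATE TABLE mytable;

-- One-shot overview of all tables: engine, rows, size, collation
SHOW TABLE STATUS;
```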
Give me some volumetrics
I just said it earlier and I’ll say it again: I need numbers and measures to help you! Give me numbers, not fuzzy impressions!
Most of your queries execute in 0.1 second but your server cannot keep up? Why? Perhaps, it would have been useful to know that you have 3500 concurrent connections at any moment hitting this poor server! Or that one particular query is executed 900 times per minute!
Your query takes 27 seconds to complete? Perhaps you should have told me that your table has 728 million records!
You wonder why the EXPLAIN of your query shows a full table scan even though you’ve indexed just about everything on that table? Had I known your table only has 8 records, I would have told you that in that case a full table scan is perfectly normal!
If you want me to help you, I need a minimum of context! Only numbers can tell me the real story! Not “enormous”, “many” or “slow”!
Take a deep breath: do not panic!
Premature optimization is the root of all evil.
“I have this HYPER-HUGE-BIGGEST-OH-MY-GOD database and my query, with 2 INNER JOINs, is executed every 15 seconds and we add 2500 records to the main table every month…”
Take it easy! Relax! MySQL is not Excel: it can handle lots of data! For my own amusement, my server at home has 10000+ tables! Performance is mostly a matter of hardware, and every database is different!
Unless you’re trying to compete with Google, or you have a very complex query causing real problems, don’t try to optimize every SQL query to make it execute in less than 100 milliseconds! In some cases, it’s just not possible. Focus on the real problems!
Don’t make me sweat over a query for 4 hours only to end up telling me that this SQL runs once a year, on January 1st at 2:15 AM, takes 5 minutes to complete, and only 2 users are connected at the time! Pick your battles! I don’t mind helping you out but I don’t like to waste my time!
Don’t try to optimize everything! Work on what counts!
Use SQLFiddle
It’s easy to use and it works great when debugging SQL queries! And it’s free! And it’s here! And it allows us to work on exactly the same thing!
Use a pastebin
Use the MySQL client
Some problems are extremely hard to reproduce and difficult to understand. The last thing we need is some PHP code on top of your query! Another thing: please, don’t use crap like phpMyAdmin or any other clumsy tool when trying to debug or fix things! All you need is the MySQL client! It works just fine on every platform supported by MySQL, it’s rock solid and, honestly, it’s all you need! If you’re on Windows, you can always use HeidiSQL.
Instead of forcing me to scroll to the right for eons just to see the complete result of your query, an EXPLAIN or a SHOW TABLE STATUS, add \G at the end of your SQL statement to force the client to display results vertically and limit them to 80 chars per line if necessary!
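For example, compare the output of these two forms in the MySQL client:

```sql
-- Wide, horizontal output: you'll scroll right for eons
SHOW TABLE STATUS;

-- Vertical output: one "column: value" pair per line, easy to read and paste
SHOW TABLE STATUS\G
```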
Format your code
SELECT count(*) as x1k1, w.nndt, w.n01_d from P_USR_HTGG w where exists ( SELECT w2.id FROM puw_user w2 left join temp_eoms te ON te.mt = w2.mt where w2.k1 = w.k1 AND w.sales = MAX(w2.sales)) group by w.nndt, w.n01_d having count(*) > 3 order BY x1k1 desc limit 0, 100
Can you quickly see what’s going on in that query? Not me! And it’s the exact same query that was at the beginning of this post! Which one is easier to read?
Write reserved words and functions in uppercase: SELECT, FROM, WHERE, INNER JOIN, SUBSTRING, HAVING, GROUP BY, ORDER BY, IN, EXISTS, COUNT, etc. Indent your code and regroup the “logical” parts of the work done by your query (for instance, sub-selects). If I have to read your query with a magnifier to identify what I’m looking for, it doesn’t help!
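Applied to the query shown above, those rules give something like this (the same query, just reformatted):

```sql
SELECT COUNT(*) AS x1k1,
       w.nndt,
       w.n01_d
FROM p_usr_htgg w
WHERE EXISTS (SELECT w2.id
              FROM puw_user w2
              LEFT JOIN temp_eoms te ON te.mt = w2.mt
              WHERE w2.k1 = w.k1
                AND w.sales = MAX(w2.sales))
GROUP BY w.nndt, w.n01_d
HAVING COUNT(*) > 3
ORDER BY x1k1 DESC
LIMIT 0, 100;
```

Now the structure (the EXISTS sub-select, the grouping, the ordering) jumps out immediately.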
Don’t be overexcited
We’ve been working on your query for 30 minutes and now you’re satisfied with the speed and results? Turns out you only needed an extra index?
Don’t rush to add that index in production! What takes 3 seconds on test data can sometimes take hours on a production database! Besides, when applying changes to a production database, you should always make sure you have a recent backup first!
This advice can sound somewhat useless and stupid, but not so long ago, a poor guy I was helping had the very bright idea of adding the missing index to his production database right away… only to realize that his production table had 94 million records! The index creation took a few hours… during peak hours!
Be open to criticism
Lots of times, a poor design or a bad choice of data types is the problem, not the convoluted query you’re trying to optimize. Your query is unnecessarily complex because of this! Be open to suggestions: we’re just trying to help you!
I help people with their MySQL/SQL problems because I like it, and the diverse nature of the problems out there helps me stay mentally sharp and up-to-date with MySQL. If I spend 2 hours of my time helping you, a “thank you” is not much to ask in return!
This was originally posted in French here.
There’s no string function in MySQL (nor in many other databases!) to help you find the number of occurrences of a string within another string. For example, how many times does “abc” appear in “abcbcbabcbacbcabcababcabacb”?
I was asked this question on IRC a long time ago. Some poor soul was trying to count the occurrences of a particular subsequence (for instance “TAT”) in a genomic string like the following:
ATTGGTGGGCTCTACTAAGATATCAACGGGACTTCGGAGCGTGCCGCACTATTT
Obviously, you can use your favorite programming language and do this kind of search programmatically but is there a way to do it in SQL?
Luckily, the answer is yes! The solution is simple and looks like this:
SELECT FLOOR((LENGTH(source) - LENGTH(REPLACE(source, target, ''))) / LENGTH(target)) AS occ
Coming back to our example, “source” being the genomic sequence and “target” being “TAT”, you’d have:
SELECT FLOOR((LENGTH('ATTGGTGGGCTCTACTAAGATATCAACGGGACTTCGGAGCGTGCCGCACTATTT') - LENGTH(REPLACE('ATTGGTGGGCTCTACTAAGATATCAACGGGACTTCGGAGCGTGCCGCACTATTT', 'TAT', ''))) / LENGTH('TAT')) AS occ
And here’s the answer: the sequence is 54 characters long, and 48 once both occurrences of “TAT” are removed, so (54 - 48) / 3 = 2!
Fortunately, in life there are many more solutions than problems! And sometimes, the solution is a long SQL query!
After a major data loss (I haven’t given up on getting back all my data, mostly code repositories and databases!), I had to restart all my pet projects from scratch. Luckily, it’s easier the second time around, as they say! And, lucky me, I store all my personal stuff on the web! So here’s a list of what’s coming up on this blog.
Even though I had a decent working version of the genetic algorithm program to find the best ruzzle grid (original posts in French here, here and here), I wasn’t satisfied with the code. It had slowly evolved from a bunch of code snippets into something I could somehow call a genetic algorithm. The problem was that my solution was tailored for this specific problem only! Since I lost all the Smalltalk code, I redid the whole thing from scratch: better design, simpler API, more flexible framework. I can currently solve a TSP problem, the best ruzzle grid search and a diophantine equation.
I also plan to provide examples of the 8 queens problem, the knapsack problem, a quadratic equation problem, a resource-constrained problem and a simple bit-based example with the GA framework. Besides, there are now more selection operators, more crossover operators, more termination detectors (as well as support for sets of termination criteria!), cleaner code, and the list goes on! So I’ll soon publish a GA framework for Pharo.
As most of you know, the Rush fan in me had to pick a project name somehow related to my favorite band! So the framework will be called Freewill, for the lyrics in the song:
Each of us
A cell of awareness
Imperfect and incomplete
Genetic blends
With uncertain ends
On a fortune hunt that’s far too fleet
A stupid quest I’ll address after the first version of my GA framework is published. It all started with a simple question related to the game of bingo (don’t ask!): can we estimate the number of bingo cards sold in an event based on how many numbers it takes for each card configuration to have a winner? So it’s just a matter of generating millions of draws and cards à la Monte Carlo and averaging how many numbers it takes for every configuration. Why am I doing that? Just because I’m curious!
There’s been a lot of action on the Pharo and Glorp side. I plan on having a serious look at the latest Glorp/Pharo combo and even participating in the development!
I’ll translate my articles (in French here, here and here) on the SQL sudoku solver in English and test the whole thing on the latest MySQL server. Besides, db4free has upgraded to a new MySQL server version!
I had done a port of NeoCSV to Dolphin right before losing all my code. It wasn’t hard to port, so I’ll redo it as soon as I reinstall Dolphin!
It’s time to reinstall VisualAge, VisualWorks, Squeak, ObjectStudio and Dolphin and see what’s new in each environment! From what I saw, there’s a lot of new and interesting stuff on the web side. Add to that the fact that most social media platforms have had significant changes in their respective APIs recently, so there’s a lot to learn there!
That’s a wrap folks!