Unlike many articles on Hungred Dot Com where I share valuable web development information with my readers, this article is something I want everyone to debate on. Every system requires a logging system (unless it is a crappy system). Whether it is a transaction log, result log, database log, error log or anything else, there is always a need for a quick, secure and reliable place to store this information for further investigation. Logging usually falls into one of two categories: file or database. There are three important things to look at when considering a medium to log our details: performance, security and reliability. Let me elaborate on the importance of each point.
Log Performance
Performance, performance, performance! This is something we all want to know: is file based logging or database logging better? We will be looking at the long run, where our logs get really huge, delays and performance problems may arise, and one medium will be more resistant to such problems than the other. Another thing to weigh up between these two media is the extra cost of a database request compared to a plain file read and write, and the delays that come with sheer size. We don't want to consider the alternative medium only after the problem appears, do we?
Log Security
Another thing every hacker will be interested in is the log file. Valuable information is stored in our logs, and it is necessary to consider how much security either medium can give us. Logs may even carry sensitive details of our customers, recorded by our program. Hence, weighing the security risk of a plain text file against that of a database is important to prevent security holes in our system environment. Each medium has its own ways of being secured further, but which is better?
Log Reliability
Why bother to have a log at all if it is unreliable? Reliability is essential for a system that must keep track of important transactions. An unreliable log might miss entries for various reasons, such as manual query termination, file locks, or the database going down during logging. We need every entry in our log in order to capture important incidents.
Other Log Criteria
Scalability and flexibility are other things some of you might want to mention. Ease of server migration and ease of searching are also important points to consider, as a log whose details cannot be found is a useless log.
Database Logging
Performance wise, a database might be slower while the log volume is small, but once the log grows huge, database logging can really be much faster. The problem I can see is that log queries will fight with other, more urgent queries that have higher priority to be executed, and cause table locking. This is usually resolved with MySQL's INSERT DELAYED operation. Another issue is latency, which delays the logging operation. In terms of searching, database logging surely has the upper hand. Security of the log depends solely on the security of the server and the database. There is some risk of SQL injection, but that should already be taken care of by the developers.
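As a rough sketch of what that looks like (the access_log table and its columns are made up for illustration, and note that INSERT DELAYED only works on MyISAM-style tables), the client hands the row to MySQL and returns immediately instead of waiting for the write:

$sql = "INSERT DELAYED INTO access_log (logged_at, level, message)
        VALUES (NOW(), 'error', 'payment gateway timeout')";
mysql_query($sql); // returns as soon as MySQL queues the row; the actual write happens later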
In terms of reliability, using INSERT DELAYED risks losing log entries, especially on a very active system. On a busy system, new queries arrive every few milliseconds and keep the database so busy that the delayed inserts pile up, waiting for a quiet moment to run. Hence, in any accident, such as MySQL dying or being forcefully terminated, those queued log queries are gone. Furthermore, the extra overhead of delaying the inserts degrades MySQL's performance a little.
Log File
A log file is the simplest way to achieve a logging system. It is basically just a few lines of code (depending on how paranoid a logger you are). While its greatest advantage is simplicity, the worst problem of file based logging is searching. Most developers who move to file based logging end up not relying on their logs, but this can usually be overcome with some formatting and regular expressions. Performance wise, it should be the direct opposite of database logging: better while the log is small and worse as it grows. Nonetheless, opening and closing the file should theoretically cost the same regardless of size, and the cost of writes is easily absorbed by buffering. In terms of security, file based logging usually uses a plain text file, so knowing the name of the log file is equivalent to it being exposed to the public (especially in open source apps). This is usually resolved with file permission settings.
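To show just how few lines that is, here is a minimal sketch (the log path and line format are placeholders; adjust to taste):

function logLine($message) {
    // append one timestamped line; FILE_APPEND spares us an explicit fopen()/fclose()
    file_put_contents('/var/log/myapp.log', date('Y-m-d H:i:s') . ' ' . $message . "\n", FILE_APPEND);
}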
Unlike database logging, file based logging doesn't require a call to the database. Everything is done by the server side scripting language you are using, and the operation completes regardless of whether the database connection is down (as long as the request from client to server completes).
The other, more critical issue when choosing file based logging is file locking, where only one process is allowed to write to the log file at a time. In an active system where logging is done intensively, this can really pose a big problem. The most expensive part of file based logging is searching, which is where regular expressions can be really handy (or a pain in the ass).
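One common way to deal with the write contention (a sketch, not a full solution; the path is a placeholder) is to take an exclusive lock so each writer simply waits its turn:

$fp = fopen('/var/log/myapp.log', 'a');
if ($fp && flock($fp, LOCK_EX)) { // block until we hold the exclusive lock
    fwrite($fp, date('c') . " something happened\n");
    flock($fp, LOCK_UN); // release so the next writer can proceed
}
if ($fp) fclose($fp);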
Summary
Some use both file based logging and database logging, with a little help from an external batch program. But it really depends on the needs and requirements of your logging system. My job here is done; I have started the fire. Now it's time to heat it up. 😀
There is no good reason to put a log in the database. Split files by time or purpose to reduce size.
Nice post... I love this type of post, hopefully more people will join 🙂
on-topic: I'll vote for logfile.
Why?
Reason N° 1: It can be secured, IF you don't log into the public_html, htdocs or whatever directory is available to the Web. If somebody has gained access to your server, they can check your database as well, so that wouldn't be a concern. If you are creative (example: social engineering), they could even miss your logfiles. You can create a logfile named "homework-for-school-01.doc" and that wouldn't get my interest at first view.
Reason N° 2: Plain text is sooooo last week... Use XML! 😀 (OK, I know XML is also a sort of structured plain text, but I always wanted to use that expression xD).
SimpleXML, available in PHP5, makes working with XML a charm. I agree it isn't the fastest solution, but it works. Do you want to list only fatal errors? Use SimpleXML and filter only that type. After all:
$xmlobj = simplexml_load_file('cache/'.$file); // parse the whole XML log into an object tree
print_r($xmlobj); // dump it to inspect the structure
Filtering an array isn't so difficult, right? TIP: use foreach 😉 It is faster than while(list($key,$value) = each($my_array)). Processing 1,000,000 records can take even 2 seconds more with a while.
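For example, a quick sketch (assuming each record in the log is an <entry> node with a "type" attribute; adapt the names to your own XML layout):

foreach ($xmlobj->entry as $entry) {
    if ((string)$entry['type'] === 'fatal') { // keep only the fatal ones
        echo $entry->message, "\n";
    }
}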
Reason N° 3: Speed. Maybe this one is arguable, but MySQL depends primarily on RAM, and on heavy sites a lot of it. Why should you be loading the RDBMS with more queries when you could balance the load, using not only RAM but also disk? If you have a good OS (*cofcoflinuxcofcof*) and/or a good disk controller (not the cheap on-board software based SATA controller of your personal computer's mainboard), this should REALLY not be a problem. In fact, a *very* good disk controller (US$1500+) can work so fast you'll think your disk array is RAM-based.
Reason N° 4: Consistency. If your RDBMS goes down or has a problem (MySQL tends to corrupt large and very intensive databases)... how could you possibly log? A logfile just keeps logging; you won't lose anything.
Anti-reason N° 1: Lack of space. Well, technically this isn't your program's fault; it is the admin's fault for not checking at least once a day whether the disks are running out of space. IMHO, it doesn't seem a powerful anti-reason to me.
So... that's 4 good reasons vs 1 weak reason. However, reason N° 3 may not always hold, because it depends on a lot of factors. Those factors are mainly:
1.- Disk controller used (Yes... VERY important!)
2.- Load on MySQL
3.- Average Load of the machine
4.- Database structure (if you have a lot of blob columns, the database ends up going to disk anyway)
5.- Installed OS. I've seen some very busy *BSD machines go down because of the bad software disk controller they have. (I've seen this case 3 times in 6 years; in two of the 3 cases the machine was used primarily for database purposes, the other one ran Samba. According to the OS, the disk had physical damage; however, after formatting, the machine could easily serve its purpose on the same disk for 2 more years with no problems. Next year it'll be 3 years.) Similar problems happen in Windows: some weeks ago, I had to recover data from a Windows 2003 Server Enterprise Premium Gold-Platinum-and-Bronze Edition because Windows didn't recognize the NTFS partition after an electrical short in the datacenter. The disks were unharmed, but the NTFS partition was damaged.
6.- Other factors I may omit because I'm very tired.
Oh, and another thing: in order to not have huge log files, check (before you log) the size of the file. If it is over xxxx MB, simply create a new one. That way you won't overload your log file (and possibly PHP's memory limit).
Greetings 😉
@unreal4u:
1. I love this. "homework-for-school-01.doc" lol!
2. Yup, XML is more structured. Some use the JSON format too 🙂
3. Quality of disk is really important in every server.
4. Something bad about logging in an RDBMS... but what about servers with a master and a slave, where the slave takes over once the master DB goes down?
All your points are very interesting factors of a file based log. But I would be interested to know how you would approach the file lock problem? The banker's algorithm to avoid deadlock?
@moo: hmmm... splitting the file to reduce size largely depends on the individual implementation. I don't think it reduces size that much: 10 log files compared to 1 big log file add up to approximately the same total size (unless you delete them on a per-period basis). Nonetheless, this is a good suggestion for monthly/weekly/daily logging that developers will appreciate.
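As a tiny sketch of the daily variant (the path is made up), each day simply gets its own small file, and old ones can be archived or deleted:

$logFile = '/var/log/myapp/app-' . date('Y-m-d') . '.log'; // e.g. app-2010-03-15.log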
Clay
Deadlock... Right. Like I said, I was just too tired yesterday.
Just thinking of some quick solution, this is what came out:
function logMe($LogThisObjects) {
    clearstatcache(); // so filesize() reports the real size, not a stale cached value
    $flag = TRUE; $i = 2;
    // if homework1.doc can't be opened or is past 10000 bytes, hunt for the next file in the sequence
    if (!($fp = fopen('homework1.doc', 'a')) OR filesize('homework1.doc') > 10000) {
        while ($flag === TRUE) {
            if (($fp = fopen('homework'.$i.'.doc', 'a')) AND filesize('homework'.$i.'.doc') < 10000) $flag = FALSE;
            $i++;
        }
    }
    WriteTheLog($fp, $LogThisObjects); // the author's own write helper
    fclose($fp);
}
The basic idea is to first clear the stat cache so that filesize() returns the actual current size and not an old value from PHP's internal cache. After that, if we can't open the file (deadlock or whatever reason) or the filesize is above 10000 bytes (re-reading before posting: I think that's actually 10KB and not 10MB xD), we enter a loop trying to open a log file in numeric sequence.
When we finally manage to open a log file, we write the log and then do a proper close of the open logfile. (Important!)
The only problem with this quick and dirty solution is file permissions: if Apache can't log to a certain directory, it will endlessly try to open a log file; don't say you haven't been warned xD Maybe you can put a limit on $i: if it's > 99, send an email to warn an administrator.
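Something like this inside the while loop, say (mail() and the address are just placeholders for however you alert people):

if ($i > 99) { // give up instead of looping forever
    mail('admin@example.com', 'Logging stuck', 'Could not open any log file; check the directory permissions.');
    return;
}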
Indeed, you can further extend the log utility by checking whether you can open a file and logging all failed attempts to the new log. This could be important, because if a deadlock happens it means your server is a *very* busy one, and it would be nice to at least log that.
However, don't log the filesize limit: you will end up filling the new logfile with entries where 50% of the log says: "Sorry, I couldn't open the previous logfile because it has surpassed the filesize limit". Imagine the case where $i == 4 because of that limit; then 75% of the logfile will say the same (for every log entry, there were 3 previous attempts to open a logfile). What I've seen in some logs is that when a certain filesize is reached, the new logfile says in its first line: "[NOTICE] (datetime) - Opened a new log. Previous one (homework2.doc) exceeded max filesize: 10214 bytes. Welcome my master!", but in the end it all depends on what events you log.
Greetings !!
@unreal4u:
Great explanation! The problem I see is the delay brought about by the loop checking for an available file. This can be solved with register_shutdown_function in PHP, which registers the named function to be executed when script processing is complete. Hence, the user will be able to view the content while the logging is done after the page has been loaded 🙂
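In other words, something along these lines (logMe() being the function from the comment above):

register_shutdown_function('logMe', $LogThisObjects); // PHP calls logMe($LogThisObjects) after the script finishes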
However, using register_shutdown_function may also mean the function does not execute if the page was not fully loaded and was terminated midway. Nonetheless, we should not worry too much about this, since strictly speaking a page should be executed fully by PHP (or else we wouldn't see anything on the page), even if the images and styling haven't been applied yet. Still, if a loop runs long enough and the user exits the page before the instruction has a chance to use the resource (the file), the log fails.
Balance the user against the logging system and see which is more important for your web system 🙂
Thanks again, unreal4u 🙂
I'm using the special __destruct function in my db_class... basically it does exactly the same; it is invoked when the class instance destroys itself...
Something like this:
define('LOG_TO_XML', true);
$dbErrors = array(); // collects query errors as they happen

class bla {
    function __destruct() {
        global $dbErrors; // pull the global array into the method's scope
        if (LOG_TO_XML === true AND !empty($dbErrors)) $this->dbLog($dbErrors);
    }
}
where LOG_TO_XML is a constant, and $dbErrors is a global array.
----------------------------------
NOTE: I just discovered (live testing xD) that with this method, if you have any unexpected fatal error, it also executes the function. However, if you don't want a "failed to open stream: permission denied" error, you MUST give an absolute path to the logfile:
I have this code:
$dbLink = new DB();
// (some work done with this class; in it I'm executing an invalid query to generate a log)
trigger_error('This is an artificial fatal error before unsetting the $dbLink class', E_USER_ERROR);
die(); // just in case the above error doesn't stop the script from executing
exit(2); // just in case the above error doesn't stop the script from executing
This returns:
Warning: SimpleXMLElement::asXML(./db-log.xml) [simplexmlelement.asxml]: failed to open stream: Permission denied in /home/unreal4u/blabla/db.class.php on line XXX
and db-log.xml has 777 permissions.
However, when the filename is an absolute path, it logs everything, and that includes the error when MySQL has too many connections (I think that would be a fatal error), which would be impossible to log with a database based log. If you can't connect to the database, how could you even log into it?
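So instead of a relative './db-log.xml', build the path from the file's own location (a one-line sketch; during shutdown and destructors PHP's working directory may have changed, which is why the relative path fails):

$logFile = dirname(__FILE__) . '/db-log.xml'; // absolute path keeps working no matter what the current directory is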
Well, it was a nice piece of research, thanks!
Greetings !!
@unreal4u: Thanks for sharing again! Great information 🙂