Wed, 11 Jun 2008
Encoding ampersands with Python
I need to replace ampersands in a text file with the HTML entity '&amp;'. I could simply use Python's string replace
method, but that would mangle my text if some of the ampersands have already been turned into HTML entities. The same is true
if I use a regular expression that matches a single '&'. What I really need to do is replace an ampersand only if it is
not followed by 'amp;'.
A negative lookahead assertion in our regular
expression is the answer. Negative lookahead is used when you want to match something not followed by something else. It starts
with (?! and finishes at the ).
Our expression now becomes: &(?!amp;) and means the text the assertion contains, amp;, must not follow the
expression that precedes it.
In this example I also added a second assertion, (?!#), so that HTML entity numbers such as &#38; are not matched either.
>>> import re
>>> s = "<Title>Eugene's Software Emporium & Arcade</Title>"
>>> pattern = re.compile('&(?!#)(?!amp;)')
>>> if pattern.search(s):
...     iterator = pattern.finditer(s)
...     for match in iterator:
...         print(match.span())
...
(34, 35)
>>> s[match.start():match.end()]
'&'
>>>
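The session above only locates the bare ampersand. A minimal sketch of the replacement itself follows, assuming the same pattern; the helper name encode_ampersands is my own and purely illustrative.

```python
import re

# Same pattern as above: an ampersand not followed by '#' or 'amp;'.
BARE_AMP = re.compile(r'&(?!#)(?!amp;)')

def encode_ampersands(text):
    """Replace every bare '&' with '&amp;', leaving '&amp;' and '&#...' alone."""
    return BARE_AMP.sub('&amp;', text)

print(encode_ampersands("Emporium & Arcade &amp; Caf&#233;"))
# -> Emporium &amp; Arcade &amp; Caf&#233;
```

Note that other named entities, such as &lt;, are not covered by this pattern and would still be re-encoded.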
posted: 22:13 | 0 comments | tags: python, programming, html
Mon, 26 May 2008
Python - Web application frameworks
The PythonInfo Wiki defines a web framework as,
a collection of packages or modules which allow developers
to write Web applications or services without having to handle such low-level
details as protocols, sockets or process/thread management.
As a testament to Python's power and simplicity, it would seem that many developers have created
their own frameworks rather than use a solution already in existence. As a result, one will find solutions
in various stages of development and feature implementation.
I have always tried to subscribe to the basic principle of using the right tool for the job.
With that in mind I have embarked on an exploratory journey to investigate some of Python's existing
Web frameworks with hopes of finding one that will work for a couple of big projects I have in the
works. My requirements are fairly simple: I do not want to learn a behemoth of an API that will take
months to figure out, yet I do not want something so simplistic that it expects me to handle too many
low-level details. Finally, until I can bring my own server back online, the chosen framework needs to work
with my current hosting provider, DreamHost.
The five high-level frameworks I am looking at include:
- Django is a high-level Python Web framework that encourages
rapid development and clean, pragmatic design. Because Django was developed in a fast-paced newsroom
environment, it was designed to make common Web-development tasks fast and easy.
- TurboGears builds on other open source projects. In TurboGears,
CherryPy controllers sit at the hub of your project. This is the
biggest area for integration. Providing tools that allow the controllers to more easily work
with SQLObject databases, answer asynchronous calls from
MochiKit and render out completed
Kid templates is where the big win will come.
- Pylons combines the very best ideas from the worlds of Ruby,
Python and Perl, providing a structured but extremely flexible Python web framework. It's also one
of the first projects to leverage the emerging WSGI standard, which allows extensive re-use and
flexibility - but only if you need it. Out of the box, Pylons aims to make web development fast,
flexible and easy.
- Webware is a suite of Python packages and tools
for developing object-oriented, web-based applications. The suite uses well known design patterns
and includes a fast Application Server, Servlets, Python Server Pages (PSP), Object-Relational Mapping,
Task Scheduling, Session Management, and many other features. Webware is very modular and easily extended.
- Zope is an open source application server for building content
management systems, intranets, portals, and custom applications. The Zope community consists of hundreds
of companies and thousands of developers all over the world, working on building the platform and Zope
applications. Zope is written in Python.
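To give a sense of the layer these frameworks sit on, here is a minimal sketch of a bare WSGI application, the standard mentioned under Pylons. It uses only the standard library's wsgiref module. The function names and the port are arbitrary choices for illustration and are not taken from any of the frameworks above.

```python
from wsgiref.simple_server import make_server

def application(environ, start_response):
    # environ carries the request details; start_response sends status and headers.
    body = ("You requested %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

def serve(port=8000):
    # Hypothetical helper: serve the application locally until interrupted.
    make_server("localhost", port, application).serve_forever()
```

Calling serve() starts a development server. Everything beyond this, such as routing, templates and database access, is what a framework adds.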
posted: 13:48 | 0 comments | tags: python, programming, frameworks
Tue, 06 Nov 2007
Python - Recursive Directory Crawl Using Generators
I was looking for an os.walk example to crawl through a file system and found
the locate function below on ActiveState's Python Cookbook site.
I incorporated it into a simple routine that dumps the output to an XML file that can then be
transformed using XSLT to sort and tally the results.
#!/usr/bin/env python
import os
import fnmatch
import time
from xml.dom import minidom

def locate(pattern, root=os.curdir):
    # Yield the full path of every file under root whose name matches pattern.
    for path, dirs, files in os.walk(os.path.abspath(root)):
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(path, filename)

def main():
    doc = minidom.Document()
    files = doc.createElement("files")
    doc.appendChild(files)
    comment = doc.createComment("Size attribute is reported in bytes.")
    files.appendChild(comment)
    for i, file in enumerate(locate("*.*", "\\\\SERVER\\Share")):
        try:
            item = doc.createElement("filename")
            item.setAttribute("id", "%s" % (i))
            item.setAttribute("path", file)
            item.setAttribute("ext", os.path.splitext(file)[1].lower())
            item.setAttribute("size", "%s" % os.stat(file).st_size)
            item.setAttribute("last_modified", time.ctime(os.stat(file).st_mtime))
            files.appendChild(item)
        except OSError as e:
            # Report files that cannot be read and carry on with the crawl.
            print("%s => %s" % (file, e.strerror))
    fp = open('myfiles.xml', 'w')
    doc.writexml(fp, "", " ", "\n", "iso-8859-1")
    fp.close()
    return

if __name__ == "__main__":
    main()
The locate function takes two parameters; the first is a file pattern
to match and the second is the directory to start the crawl from.
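As a quick illustration of how the generator behaves, here is a small sketch that runs locate against a throwaway directory tree. The file names and the demo function are made up for the example, and it assumes Python 3 for tempfile.TemporaryDirectory.

```python
import fnmatch
import os
import tempfile

def locate(pattern, root=os.curdir):
    # Same generator as above: yield matching paths one at a time.
    for path, dirs, files in os.walk(os.path.abspath(root)):
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(path, filename)

def demo():
    # Build a tiny tree with hypothetical file names, then search it.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sub"))
        for name in ("a.txt", os.path.join("sub", "b.txt"), "c.log"):
            open(os.path.join(root, name), "w").close()
        return sorted(os.path.relpath(p, root) for p in locate("*.txt", root))

print(demo())
```

Because locate is a generator, paths are produced as the walk proceeds, so a large share can be processed without first building the whole list in memory.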
posted: 23:25 | 0 comments | tags: programming, python, xml
Sat, 24 Mar 2007
Python - SOAPpy and HTTP Authentication
The other week I wanted to use a SOAP web service that was protected by HTTP
basic authentication. I could not find a way to do the authentication with SOAPpy.
I looked everywhere for an example before I stumbled upon a version of the
code below in an archived newsgroup post.
from SOAPpy import Config, HTTPTransport, SOAPAddress, WSDL

class myHTTPTransport(HTTPTransport):
    username = None
    passwd = None

    @classmethod
    def setAuthentication(cls, u, p):
        cls.username = u
        cls.passwd = p

    def call(self, addr, data, namespace, soapaction=None, encoding=None,
             http_proxy=None, config=Config):
        if not isinstance(addr, SOAPAddress):
            addr = SOAPAddress(addr, config)
        if self.username is not None:
            addr.user = self.username + ":" + self.passwd
        return HTTPTransport.call(self, addr, data, namespace, soapaction,
                                  encoding, http_proxy, config)

if __name__ == '__main__':
    wsdlFile = 'http://localhost/soap/wsdl/'
    myHTTPTransport.setAuthentication('gollum', 'myprecious')
    server = WSDL.Proxy(wsdlFile, transport=myHTTPTransport)
    print server.ApiVersion()
It works because you can pass your own transport to WSDL.Proxy as a keyword
argument (Python's **kw feature). The original author subclassed the default transport
in Client.HTTPTransport and added a class method to supply the basic
authentication credentials.
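For context, HTTP basic authentication boils down to sending the "user:password" string, base64-encoded, in an Authorization header. The sketch below shows that encoding with the standard library only. It does not use SOAPpy, and I am assuming, not asserting, that the transport does something equivalent with addr.user. The function name is made up for illustration.

```python
import base64

def basic_auth_header(username, password):
    # Join the credentials with a colon and base64-encode them, as HTTP basic auth requires.
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": "Basic " + token}

print(basic_auth_header("gollum", "myprecious"))
```

Base64 is an encoding, not encryption, so basic authentication should only be trusted over an encrypted connection.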
posted: 23:09 | 0 comments | tags: programming, python, soap, soappy
Thu, 15 Mar 2007
IBM's developerWorks - JavaScript and Ajax Tutorial Series
One of my often-visited bookmarks is IBM's developerWorks
site. The site is a virtual library of technical information and tutorials.
In September 2006, Brett McLaughlin, Author and Editor with O'Reilly Media Inc, concluded his six-part
series on JavaScript, Ajax and the Document Object Model (DOM). Part one of the series starts with a
quick-paced introduction to what Ajax is and how it works, follows with the use of the XMLHttpRequest object
for Web requests and understanding the HTTP status codes it returns. The remaining parts of the series focus
on how to mix JavaScript and the DOM to create interactive Ajax applications.
It is a great series that I still reference now and then when in the midst of a project.
posted: 21:23 | 0 comments | tags: ajax, javascript, programming
Tue, 14 Mar 2006
Formatting Numbers in a Text Box
I recently needed to control what was being entered into a text box on a form. I needed the user to enter a currency amount, but I did not want dollar signs, commas or decimal places to be entered. I simply wanted the raw number.
This is one way to do it.
<html>
<head>
<title>Number Format</title>
<script language="JavaScript">
function formatNumber(e, field) {
    var key = e.keyCode;
    // Allow use of return, left and right cursor keys
    if ((key==13)||(key==37)||(key==39)) return;
    // Strip anything that is not a digit, plus any leading zeros
    var temp = field.value.replace(/[^0-9]|^0+/g,"");
    field.value = temp;
}
</script>
</head>
<body>
<form name="form1" method="post" action="">
    Enter Salary:
    <input type="text" name="amount" id="amount"
        onKeyUp="formatNumber(event, this)">
</form>
</body>
</html>
The script is called on the "onKeyUp" event of the input field and strips out any character that is not a digit, along with any leading zeros.
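To see exactly what the pattern removes, the same regular expression can be tried outside the browser. This is a small Python sketch; the function name and sample values are made up for illustration.

```python
import re

def strip_to_digits(value):
    # Same pattern as the JavaScript above: drop non-digits and a leading run of zeros.
    return re.sub(r'[^0-9]|^0+', '', value)

print(strip_to_digits("$45,000"))      # -> 45000
print(strip_to_digits("0012,500.75"))  # -> 1250075
```

The second result shows a side effect worth knowing about: because the decimal point is simply removed, any digits typed after it are merged into the whole number.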
posted: 00:38 | 0 comments | tags: javascript, programming
Mon, 13 Mar 2006
URL Rewriting in JavaScript
At work I am constantly flipping between a production and a development web server. The live site would have a URL like, "www.mysite.com" and the development site one like, "dev.mysite.com". I was getting tired of removing the "www" from the address and replacing it with "dev" and back again for all the different URLs. This script is a result of that. By simply clicking a saved bookmark on my browser toolbar I can automatically switch back and forth between the production and development servers with ease.
Right-click on the following link, DevSwitch, and select either "Bookmark This Link..." in Firefox or "Add to Favorites..." in Internet Explorer.
The bookmarklet code is outlined below.
javascript:(function(){
    var x=location.href;
    var y=(x.indexOf('www')!=-1)?x.replace('www','dev'):x.replace('dev','www');
    location.href=y;
})()
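The bookmarklet swaps the first "www" or "dev" it finds anywhere in the address. As a variation, not the bookmarklet's own logic, the sketch below shows the same toggle in Python restricted to the host name, so a "www" or "dev" in the path is left alone. The function name is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

def switch_host(url):
    # Toggle a leading "www." or "dev." on the host name only.
    parts = urlsplit(url)
    host = parts.netloc
    if host.startswith("www."):
        host = "dev." + host[4:]
    elif host.startswith("dev."):
        host = "www." + host[4:]
    return urlunsplit(parts._replace(netloc=host))

print(switch_host("http://www.mysite.com/dev/page?x=1"))
# -> http://dev.mysite.com/dev/page?x=1
```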
posted: 23:58 | 0 comments | tags: bookmarklet, javascript, programming
Sun, 16 Oct 2005
XSLT to convert from a 24-hour timestamp
I had an XML file in which the timestamp for the file creation was in a 24-hour GMT format. I needed to convert the display date and time to the format "XXX as of 12:00 a.m., on Oct 17, 2005".
I found some of the code for converting the month name on the internet and the rest I hacked together to handle the 24-hour and GMT conversion.
Longer lines are split with a "\" character which should be removed before you use the code.
<!-- Start: XSLT date and time formatting template -->
<!-- e.g. 2005-07-18T17:59:30.187-08:00 -->
<xsl:template name="format-date-time">
  <xsl:param name="date" />
  <xsl:variable name="year" select="substring($date, 1, 4)" />
  <xsl:variable name="month" select="substring($date, 6, 2)" />
  <xsl:variable name="day" select="substring($date, 9, 2)" />
  <!-- number() drops a leading zero without touching days such as 10, 20 or 30 -->
  <xsl:variable name="day2" select="number($day)" />
  <xsl:variable name="monthName" \
    select="substring('JanFebMarAprMayJunJulAugSepOctNovDec', \
    substring-before(substring-after($date,'-'),'-')*3-2,3)" />
  <!-- Shift the hour by 8 to move from GMT to local time -->
  <xsl:variable name="hour24" select="substring($date, 12, 2) - 8" />
  <xsl:variable name="minute" select="substring($date, 15, 2)" />
  <xsl:variable name="second" select="substring($date, 18, 2)" />
  <xsl:variable name="hour12">
    <xsl:choose>
      <xsl:when test="$hour24 &lt; 0">
        <xsl:value-of select="12 + $hour24" />
      </xsl:when>
      <xsl:when test="$hour24 = 0">
        <xsl:value-of select="12" />
      </xsl:when>
      <xsl:when test="$hour24 = 12">
        <xsl:value-of select="$hour24" />
      </xsl:when>
      <xsl:when test="$hour24 &gt; 12">
        <xsl:value-of select="$hour24 - 12" />
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$hour24" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:variable name="meridiem">
    <xsl:choose>
      <xsl:when test="$hour24 &lt; 0">p.m.</xsl:when>
      <xsl:when test="$hour24 &gt;= 12">p.m.</xsl:when>
      <xsl:otherwise>a.m.</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:value-of select="concat($hour12, ':', $minute, ':', $second, ' ', \
    $meridiem, ' on ', $monthName, ' ', $day2, ', ', $year)" />
</xsl:template>
<!-- End: XSLT date and time formatting template -->
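The hour arithmetic is the fiddly part of the template, so here is the same idea as a short Python sketch. It mirrors the substring positions and the fixed 8-hour shift, but folds the negative-hour cases into a modulo. The function name and the offset_hours parameter are my own for illustration. Like the template, it does not roll the date back when the shift crosses midnight.

```python
MONTHS = "JanFebMarAprMayJunJulAugSepOctNovDec"

def format_date_time(stamp, offset_hours=-8):
    # Slice the fixed-width timestamp, e.g. 2005-07-18T17:59:30.187-08:00
    year, month, day = stamp[0:4], int(stamp[5:7]), int(stamp[8:10])
    hour24 = (int(stamp[11:13]) + offset_hours) % 24   # wrap negative hours
    minute, second = stamp[14:16], stamp[17:19]
    hour12 = hour24 % 12 or 12                         # 0 and 12 both display as 12
    meridiem = "p.m." if hour24 >= 12 else "a.m."
    month_name = MONTHS[month * 3 - 3: month * 3]
    return "%d:%s:%s %s on %s %d, %s" % (
        hour12, minute, second, meridiem, month_name, day, year)

print(format_date_time("2005-07-18T17:59:30.187-08:00"))
# -> 9:59:30 a.m. on Jul 18, 2005
```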
posted: 00:24 | 0 comments | tags: programming, xml, xslt
Fri, 06 May 2005
C# XML data export from MySQL
I originally wrote this in C# at work under Visual Studio to connect to a Microsoft SQL Server on my workstation. One of the tables contained 117 records of archived articles for the web site. We needed to extract the data and output it into an XML file. Once I got it working I decided to see how well it would compile and run under Mono, an open source development platform based on the .NET framework, and have it connect instead to a MySQL database.
In order to have my .NET application connect to MySQL instead of the Microsoft SQL Server I needed to download and install the MySQL Connector/Net driver. The MySQL Connector/Net is a fully-managed ADO.NET driver written in 100% pure C#. It was then simply a case of adding the new MySQL namespace to the code and replacing the Microsoft-specific classes with the MySQL ones.
This is the resulting code:
// Filename: ExportXMLData.cs
using System;
using System.Data;
using MySql.Data;
using MySql.Data.MySqlClient;

namespace ExportXMLData
{
    class ExportXML
    {
        static void Main(string[] args)
        {
            ExportXML exportXML = new ExportXML();
            exportXML.Run();
        }

        private void Run()
        {
            // Change the variables to reflect values needed for
            // your computer and database properties.
            string Database = "";
            string Server = "localhost";
            string User = "";
            string Pass = "";
            string TableName = "";
            string XMLRootNodeName = "Root";
            string OutputFileName = "output.xml";

            string conn =
                "Database=" + Database + ";" +
                "Server=" + Server + ";" +
                "Uid=" + User + ";" +
                "Pwd=" + Pass;

            MySqlConnection connection = new MySqlConnection(conn);
            MySqlDataAdapter adapter = new MySqlDataAdapter();
            adapter.TableMappings.Add("Table", TableName);
            connection.Open();

            MySqlCommand query = new MySqlCommand("SELECT * FROM "
                + TableName, connection);
            query.CommandType = CommandType.Text;
            adapter.SelectCommand = query;

            DataSet ds = new DataSet(XMLRootNodeName);
            adapter.Fill(ds);
            connection.Close();

            ds.WriteXml(OutputFileName, XmlWriteMode.WriteSchema);
        }
    }
}
You will need to compile the code as follows so it finds the necessary libraries:
mcs ExportXMLData.cs -r System.Data -r MySql.Data
You can then run the C# program with:
mono ExportXMLData.exe
The result should be an XML file in your working directory containing your database table structure and data.
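For readers who want to see the shape of the data part of the output without setting up Mono and MySQL, here is a rough Python sketch of the same idea: one element per row, named after the table, with one child element per column. It swaps in the standard library's sqlite3 and ElementTree in place of MySQL and DataSet.WriteXml, omits the inline schema, and uses a made-up table and function name.

```python
import sqlite3
import xml.etree.ElementTree as ET

def table_to_xml(connection, table_name, root_name="Root"):
    # Build <Root><table><column>value</column>...</table>...</Root>
    root = ET.Element(root_name)
    cursor = connection.execute('SELECT * FROM "%s"' % table_name)
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        record = ET.SubElement(root, table_name)
        for column, value in zip(columns, row):
            if value is not None:          # skip NULL columns
                ET.SubElement(record, column).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical in-memory table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER, title TEXT)")
conn.execute("INSERT INTO articles VALUES (1, 'Hello')")
xml_text = table_to_xml(conn, "articles")
print(xml_text)
# -> <Root><articles><id>1</id><title>Hello</title></articles></Root>
```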
In a later post I will show you how you can use XSL Transformations (XSLT) to apply a stylesheet and change the display formatting of the XML file itself.
posted: 23:37 | 0 comments | tags: mono, mysql, programming, xml