I need to replace ampersands in a text file with the HTML entity '&amp;'. I could simply use Python's string replace method; however, this would mangle my text if some of the ampersands have already been turned into HTML entities. The same is true if I use a regular expression that matches a single '&'. What I really need is to replace an ampersand only when it is not followed by 'amp;'.
A negative lookahead assertion in the regular expression is the answer. Negative lookahead is used when you want to match something that is not followed by something else. It starts with (?! and ends at the closing ).
The expression now becomes &(?!amp;), which means that the text inside the assertion, amp;, must not follow the expression that precedes it.
In this example I also added a second assertion, (?!#), so that numeric HTML entities such as &#39; are not matched either.
>>> import re
>>> s = "<Title>Eugene&#39;s Software Emporium & Arcade</Title>"
>>> pattern = re.compile('&(?!#)(?!amp;)')
>>> if pattern.search(s):
...     iterator = pattern.finditer(s)
...     for match in iterator:
...         print(match.span())
...
(38, 39)
>>> s[match.start():match.end()]
'&'
>>>
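To actually perform the replacement, the same pattern can be handed to re.sub. The following is a minimal sketch rather than a definitive routine. The name escape_ampersands is my own, and note that named entities other than &amp; (for example &lt;) would still have their ampersand escaped by this pattern.

```python
import re

# Match an ampersand that starts neither a numeric entity (&#...) nor "&amp;".
BARE_AMPERSAND = re.compile(r'&(?!#)(?!amp;)')


def escape_ampersands(text):
    """Replace each bare ampersand in text with the entity &amp;."""
    return BARE_AMPERSAND.sub('&amp;', text)


if __name__ == '__main__':
    s = "<Title>Eugene&#39;s Software Emporium & Arcade</Title>"
    print(escape_ampersands(s))
```

Because ampersands that are already escaped are skipped, running the function a second time on its own output should leave the text unchanged.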
The PythonInfo Wiki defines a web framework as:
a collection of packages or modules which allow developers to write Web applications or services without having to handle such low-level details as protocols, sockets or process/thread management.
As a testament to Python's power and simplicity, it seems that many developers have created their own frameworks rather than use an existing one. As a result, one finds solutions in various stages of development and feature implementation.
I have always tried to subscribe to the basic principle of using the right tool for the job. With that in mind, I have embarked on an exploratory journey to investigate some of Python's existing Web frameworks in the hope of finding one that will work for a couple of big projects I have in the works. My requirements are fairly simple: I do not want to learn a behemoth of an API that will take months to figure out, yet I do not want something so simplistic that it expects me to handle too many low-level details. Finally, until I can bring my own server back online, the chosen framework needs to work with my current hosting provider, DreamHost.
The five high-level frameworks I am looking at are:
- Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design. Because Django was developed in a fast-paced newsroom environment, it was designed to make common Web-development tasks fast and easy.
- TurboGears builds on other open source projects. In TurboGears,
CherryPy controllers sit at the hub of your project. This is the
biggest area for integration. Providing tools that allow the controllers to more easily work with SQLObject databases, answer asynchronous calls from
MochiKit and render out completed Kid templates is where the big win will come.
- Pylons combines the very best ideas from the worlds of Ruby, Python and Perl, providing a structured but extremely flexible Python web framework. It's also one of the first projects to leverage the emerging WSGI standard, which allows extensive re-use and flexibility - but only if you need it. Out of the box, Pylons aims to make web development fast, flexible and easy.
- Webware is a suite of Python packages and tools
for developing object-oriented, web-based applications. The suite uses well known design patterns
and includes a fast Application Server, Servlets, Python Server Pages (PSP), Object-Relational Mapping,
Task Scheduling, Session Management, and many other features. Webware is very modular and easily extended.
- Zope is an open source application server for building content management systems, intranets, portals, and custom applications. The Zope community consists of hundreds of companies and thousands of developers all over the world, working on building the platform and Zope applications. Zope is written in Python.
While at LinuxFest Northwest 2008 in Bellingham, WA this past weekend, I attended a session on natural language processing in Python presented by Sean Boisen. His slide presentation was done not with PowerPoint but with HTML Slidy, a browser-based XHTML presentation framework. HTML Slidy is cross-browser compatible, uses simple XHTML, JavaScript and CSS, operates like PowerPoint and, best of all, is accessible.
The next time I have a presentation to make, HTML Slidy is definitely something I am going to try.
This is one of the sites I did for my company, Effigy Interactive, back in 2002. I broke every usability and accessibility rule creating it, but then most self-promotional sites do.
The Flash portion of the site is meant to be atmospheric and to encourage the viewer to explore and discover. I never got around to finishing the HTML portion as I was too busy with customer sites and other paying projects.
Last month I picked myself up an 8GB Insignia Pilot video MP3 player from Best Buy for Christmas.
I looked at a number of MP3 players before deciding on the Insignia Pilot. I had two main requirements: it had to work with Linux and it had to support Ogg Vorbis audio. The Insignia Pilot does both of these and more. It supports an impressive list of formats: MP3, WMA, WMA Lossless, WMA DRM, WMA Pro, OGG, WAV, Audible, MPEG4 (30 fps), WMV (30 fps) and JPEG.
The Insignia Pilot supports 320x240 MPEG4 video at 30 fps. Very nice. The installation CD includes a Windows-based application that will convert video clips and images into a format compatible with the Pilot. This is great if you are running Windows, but not so helpful if you are running Linux.
Over the next few days I learned more about video codecs and encoding than I had planned to. The Anything But iPod site was a great source of information. Thanks in part to its initial thread on supported video formats, MPlayer, hours of reading the documentation for MEncoder and Xvid, and plenty of trial and error, I have a workable solution.
First, copy the DVD (which you own) to your hard drive using the following:
mplayer dvd://1 -dumpstream -dumpfile dump.vob
Next is the encoding process. The Pilot's screen is 320x240; however, if the DVD is in widescreen format and you want to preserve the aspect ratio, you need to scale the video to 320x176 instead.
mencoder dump.vob -aid 128 -oac mp3lame -lameopts cbr:br=96 -srate 44100 -af resample=44100:0:0 -af volume=20 \
-ovc lavc -lavcopts vcodec=mpeg4:mbd=1:vbitrate=384 -sws 2 -vf scale=320:176,harddup \
-noskip -skiplimit 1 -ffourcc XVID -ofps 29.97 -o output.avi
With the above settings I can encode The Matrix to 481.1 MB. For me, these settings provide a reasonable trade-off between size and quality. If you want slightly higher quality, you can change the audio and video bitrates to cbr:br=128 and vbitrate=512 respectively.
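The 320x176 figure used in the scale filter can be derived with a little arithmetic. The sketch below assumes a 16:9 widescreen source and assumes the height is rounded to the nearest multiple of 16, a common convention for MPEG-4 encoders. The function name scaled_height is hypothetical and is not part of MEncoder.

```python
def scaled_height(width, aspect_w, aspect_h, multiple=16):
    """Return the height that matches width at aspect_w:aspect_h,
    rounded to the nearest multiple of `multiple`."""
    exact = width * aspect_h / float(aspect_w)
    return int(round(exact / multiple)) * multiple


if __name__ == '__main__':
    print(scaled_height(320, 16, 9))  # widescreen source
    print(scaled_height(320, 4, 3))   # full-screen source
```

At a width of 320 an exact 16:9 frame would be 180 pixels high, and the nearest multiple of 16 is 176. A 4:3 source fills the whole 320x240 screen.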
I was looking for an os.walk example to crawl through a file system and found the locate function below on ActiveState's Python Cookbook site.
I incorporated it into a simple routine that dumps the output to an XML file that can then be transformed using XSLT to sort and tally the results.
#!/usr/bin/env python
import os
import fnmatch
import time
from xml.dom import minidom


def locate(pattern, root=os.curdir):
    """Yield the full path of every file under root whose name matches pattern."""
    for path, dirs, files in os.walk(os.path.abspath(root)):
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(path, filename)


def main():
    doc = minidom.Document()
    files = doc.createElement("files")
    doc.appendChild(files)
    comment = doc.createComment("Size attribute is reported in bytes.")
    files.appendChild(comment)
    # Add one <filename> element per matching file found on the share.
    for i, file in enumerate(locate("*.*", "\\\\SERVER\\Share")):
        try:
            info = os.stat(file)
            item = doc.createElement("filename")
            item.setAttribute("id", "%s" % i)
            item.setAttribute("path", file)
            item.setAttribute("ext", os.path.splitext(file)[1].lower())
            item.setAttribute("size", "%s" % info.st_size)
            item.setAttribute("last_modified", time.ctime(info.st_mtime))
            files.appendChild(item)
        except OSError as e:
            # Report files that cannot be read and carry on with the rest.
            print("%s => %s" % (file, e.strerror))
    with open('myfiles.xml', 'w') as fp:
        doc.writexml(fp, "", " ", "\n", "utf-8")


if __name__ == "__main__":
    main()
The locate function takes two parameters: the first is a file pattern to match and the second is the directory to start the crawl from.
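If only a tally is needed, the XML and XSLT steps could be skipped and the counting done directly in Python. This is a rough sketch under that assumption: tally_extensions is a name I made up, and the locate generator is repeated so that the snippet stands on its own.

```python
import fnmatch
import os
from collections import Counter


def locate(pattern, root=os.curdir):
    """Yield the full path of every file under root whose name matches pattern."""
    for path, dirs, files in os.walk(os.path.abspath(root)):
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(path, filename)


def tally_extensions(pattern="*.*", root=os.curdir):
    """Count the matching files under root by lower-cased extension."""
    counts = Counter()
    for filename in locate(pattern, root):
        counts[os.path.splitext(filename)[1].lower()] += 1
    return counts


if __name__ == '__main__':
    # Print the extensions found under the current directory, most common first.
    for ext, count in tally_extensions().most_common():
        print("%s %d" % (ext or "(none)", count))
```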
Steve and I once again made the yearly trek to Bellingham, WA for
Linuxfest Northwest. This year's Fest spanned two days rather than the usual one and, in my opinion, it continues to get better each time.
The presentations I attended were varied. My favorite sessions were by Linden Lab (Second Life), Google (Up and Running), Red Hat (OLPC) and MySQL (How Sites Scale Out).
Although I do not consider myself much of a virtual socialite, what Linden Lab is doing with Second Life appeals to a part of me. It is a social medium and, like any medium, it gives creative people an outlet to express themselves and, hopefully, reach a receptive audience.
Listening to what Andy Carrel had to say about Google and the daily issues they face with vast amounts of hardware and data was mind-boggling. One thing he said that set me thinking concerned programmer effectiveness. Google engineers create services that run on building-sized computing platforms. Their computer is made up of thousands of CPUs, lots of DRAM, networking devices, and disk drives. I consider myself fortunate if I have a second server to help load balance a service.
Jesse Keating of Red Hat gave a great presentation on the One Laptop Per Child (OLPC) initiative. The little green laptop is a marvel of engineering and a testament to what they are trying to accomplish, considering their target audience: children in the third world and underdeveloped countries. Each child gets their own laptop to take home and bring back to school. The short- and long-term implications of what this could mean for their development as individuals and as a nation are awe-inspiring.
Next year's Linuxfest is already on my calendar. I can't wait.
Flickr is an online photo management and sharing application.
The service lets you make photos available to others, publicly or privately, and offers collaborative ways of organizing images by allowing others to add comments and tags.
I have uploaded a few images to my Flickr account. Some are digital and others are scans from my 35mm Minolta X-700, which I still use as my primary camera.
The other week I wanted to use a SOAP web service that was protected by HTTP basic authentication. I could not find a way to do the authentication with SOAPpy. I looked everywhere for an example before I stumbled upon a version of the code below in an archived newsgroup post.
from SOAPpy import Config, HTTPTransport, SOAPAddress, WSDL


class myHTTPTransport(HTTPTransport):
    username = None
    passwd = None

    @classmethod
    def setAuthentication(cls, u, p):
        cls.username = u
        cls.passwd = p

    def call(self, addr, data, namespace, soapaction=None, encoding=None,
             http_proxy=None, config=Config):
        if not isinstance(addr, SOAPAddress):
            addr = SOAPAddress(addr, config)
        # Attach the credentials to the address when they have been set.
        if self.username is not None:
            addr.user = self.username + ":" + self.passwd
        return HTTPTransport.call(self, addr, data, namespace, soapaction,
                                  encoding, http_proxy, config)


if __name__ == '__main__':
    wsdlFile = 'http://localhost/soap/wsdl/'
    myHTTPTransport.setAuthentication('USERNAME', 'PASSWORD')
    server = WSDL.Proxy(wsdlFile, transport=myHTTPTransport)
    print(server.ApiVersion())
It works because you can pass your own transport to WSDL.Proxy through Python's **kw keyword arguments. The original author subclassed the default transport, Client.HTTPTransport, and added a class method to supply the credentials for basic authentication.
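For context, HTTP basic authentication boils down to sending the user:password pair, base64-encoded, in an Authorization header, which is why the transport above joins the two values with a colon. The sketch below illustrates the header format only. basic_auth_header is a hypothetical helper of my own and is not part of SOAPpy.

```python
import base64


def basic_auth_header(username, passwd):
    """Build the value of an HTTP Basic Authorization header."""
    credentials = ("%s:%s" % (username, passwd)).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


if __name__ == '__main__':
    print("Authorization: " + basic_auth_header('USERNAME', 'PASSWORD'))
```

Keep in mind that base64 is an encoding, not encryption, so basic authentication is only as private as the connection it travels over.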
One of my most visited bookmarks is IBM's developerWorks site. The site is a virtual library of technical information and tutorials.
In September 2006, Brett McLaughlin, author and editor with O'Reilly Media Inc, concluded his six-part series on JavaScript, Ajax and the Document Object Model (DOM). Part one of the series starts with a quick-paced introduction to what Ajax is and how it works, then covers the use of the XMLHttpRequest object for Web requests and the HTTP status codes it returns. The remaining parts of the series focus on how to mix JavaScript and the DOM to create interactive Ajax applications.
It is a great series that I still reference now and then when in the midst of a project.