5
Nov

Terrain Collision

   Posted by: Foxtox   in Uncategorized, terrain

Added terrain collision last week, but as I’m not using heightfields I had to use
something slightly different from the norm.

First I tried a pure triangle-mesh approach. I knew this would be slow as
hell and use a ton of memory, but I wanted a working baseline to compare
against.

Initially for physics I used Havok, but as I don’t have $100,000 lying around to waste on a
physics engine I only had access to the binary version, and Havok only supplies
libs for VS2008, not VS2010 which is what I am using, although I was able to get
the 2008 libs working. After getting the basic triangle mesh collision up and running
in Havok I decided I didn’t much care for not having access to the source, so I switched
to Bullet.

I’d used Bullet before so it was easy to get it switched over, and once I had the triangle
mesh collision set up I gave it a trial run.

The triangle mesh collision used approximately one gig of memory, although generation speed
for the btBvhTriangleMeshShape was fairly quick. I created a task
to generate collision and spread the work across the cores which made generation faster.

I added spheres and boxes that I could drop onto the terrain to test the accuracy and
performance of the collision detection.

A gig of memory for terrain collision was obviously out of the question so I began testing
convex hulls.

Bullet has a btConvexHullShape which takes an array of
vertices in floating point format. This worked and reduced memory usage by more than
half. Still wasn’t good enough though.

I wrote my own convex hull shape which I called btCompressedConvexHullShape.
As the name implies, it uses compressed verts (about 1/4 the memory per vert).
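The post doesn’t say how the verts are compressed, so this is only a guess at the idea. A btVector3 is four floats (16 bytes), so "about 1/4" suggests roughly 4 bytes per vert. One way to get there is to quantize each axis to 8 bits inside the hull’s bounding box. All names below (PackedVert, QuantizationBox, Pack, Unpack) are made up for the sketch and are not part of Bullet.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Hypothetical 4-byte vertex: each axis quantized to 8 bits inside the hull's AABB.
struct PackedVert { std::uint8_t x, y, z, pad; };
static_assert(sizeof(PackedVert) == 4, "packed vertex should be 4 bytes");

// Bounding box the quantized values are relative to (stored once per hull).
struct QuantizationBox { Vec3 min, max; };

inline std::uint8_t QuantizeAxis(float v, float lo, float hi) {
    assert(hi > lo);
    float t = (v - lo) / (hi - lo);            // normalize to 0..1
    t = std::min(1.0f, std::max(0.0f, t));     // clamp values outside the box
    return static_cast<std::uint8_t>(std::lround(t * 255.0f));
}

inline float DequantizeAxis(std::uint8_t q, float lo, float hi) {
    return lo + (hi - lo) * (static_cast<float>(q) / 255.0f);
}

inline PackedVert Pack(const Vec3& v, const QuantizationBox& box) {
    PackedVert p;
    p.x = QuantizeAxis(v.x, box.min.x, box.max.x);
    p.y = QuantizeAxis(v.y, box.min.y, box.max.y);
    p.z = QuantizeAxis(v.z, box.min.z, box.max.z);
    p.pad = 0;
    return p;
}

inline Vec3 Unpack(const PackedVert& p, const QuantizationBox& box) {
    return Vec3{DequantizeAxis(p.x, box.min.x, box.max.x),
                DequantizeAxis(p.y, box.min.y, box.max.y),
                DequantizeAxis(p.z, box.min.z, box.max.z)};
}
```

The trade-off is precision: with 8 bits per axis the worst-case error is half of 1/255 of the box extent, which is tolerable only because each terrain chunk is small.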

I also started using Bullet’s utility class btShapeHull. This class takes in
an array of vertices and produces a convex tri mesh with a greatly reduced number of
vertices. Feed it 2000 verts and get back a 14 vert convex mesh, that type of thing.

I feed the results of the btShapeHull back into btCompressedConvexHullShape or
a btConvexTriangleMeshShape(favoring btCompressedConvexHullShape as they
both seem to produce the same results and it uses less memory).

Memory usage for the physics simulation was greatly reduced at this point, down to
about 100 megs. There are still a few optimizations I’d like to do, mostly to reduce the allocations
taking place in the btShapeHull step, but overall the performance and
memory usage is fairly good at this point.

I’ve also got the physics simulation running as its own task, with adding and
removing of objects done asynchronously. This helps because as you move through
the world a great many terrain chunks(each as a convex hull) are being added and removed.
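A common way to do asynchronous add/remove is a command queue that other tasks push into and the physics task drains between steps. The sketch below shows that pattern only. PhysicsWorld and ChunkCommandQueue are hypothetical names, and a plain set of chunk ids stands in for the real Bullet world.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the physics world; a real one would own Bullet bodies.
struct PhysicsWorld {
    std::set<int> chunks;  // ids of terrain chunks currently in the simulation
};

class ChunkCommandQueue {
public:
    enum class Op { Add, Remove };

    // Safe to call from any thread, e.g. from terrain generation tasks.
    void Push(Op op, int chunkId) {
        assert(chunkId >= 0);
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.emplace_back(op, chunkId);
    }

    // Called only by the physics task, between simulation steps.
    // Returns the number of commands applied.
    std::size_t Flush(PhysicsWorld& world) {
        std::vector<std::pair<Op, int>> batch;
        {
            // Hold the lock only long enough to grab the pending commands.
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
        }
        for (const auto& cmd : batch) {
            if (cmd.first == Op::Add) world.chunks.insert(cmd.second);
            else world.chunks.erase(cmd.second);
        }
        return batch.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::pair<Op, int>> pending_;
};
```

Because the world is only touched inside Flush, the simulation step itself never needs a lock.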

Collision seems to be fairly accurate as long as I don’t have it too far off
from the visual representation.

My gravity is currently just set using Bullet’s built-in system, which is directional.
This means if I navigate to the side of the planet I can start dropping objects
and watch them bounce along through mountains and valleys for miles as they
travel along the edge of the planet.
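For a planet the usual fix is to point gravity at the planet’s center for each body instead of using one world-wide direction. A minimal sketch of that calculation, with a made-up Vec3 and function name; in Bullet the result could be fed to each body through btRigidBody::setGravity, though nothing here is taken from the actual project.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Gravity vector for a body at bodyPos, pulling toward planetCenter.
// `strength` is the acceleration magnitude (e.g. 9.8); constant with distance
// here, which is an assumption made to keep the sketch short.
inline Vec3 RadialGravity(const Vec3& bodyPos, const Vec3& planetCenter, double strength) {
    assert(strength >= 0.0);
    const Vec3 d{planetCenter.x - bodyPos.x,
                 planetCenter.y - bodyPos.y,
                 planetCenter.z - bodyPos.z};
    const double len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    if (len < 1e-9) return Vec3{0.0, 0.0, 0.0};  // at the center: no direction to pull in
    const double s = strength / len;             // normalize and scale in one step
    return Vec3{d.x * s, d.y * s, d.z * s};
}
```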

Need to add a character control system soon.

13
Aug

Compile time string hashing

   Posted by: Foxtox   in Uncategorized

Here is a working version of compile time string hashing in C++,
pretty tedious to make this so perhaps I’ll save someone some time.

You cannot currently do this with templates, at least not without
resorting to specifying one character at a time or using multi-byte character
literals for 4 at a time, so instead this version massages the compiler into
optimizing the string hash out of existence.

Another way of doing it is with pure macros.

It was written with VS2010, so results may not be the same on other compilers
(though I’d expect GCC and Intel should be able to pull it off). In debug(without
optimizations on), it will not be optimized out so you may want to force it to use
the const char* path in debug.

Also the __forceinline calls are necessary, without them once the compiler starts
seeing more than one string literal of the same length it switches to calling a function.

Tested it with string literals up to 64 characters in length, and the resulting
assembly is just a single assignment.

Usage is like this:

size_t id = ptl::string_id("hello what is my hash?");

You can also pass a const char* or a std::string, but this will result in a run
time hash(using same hash function so results should be the same).

Additionally you can use it with std::unordered_map/set like so

std::unordered_map<ptl::string_id, int, ptl::string_id::hash> PreHashed;

And a lookup like this:

auto it = PreHashed.find("hello");

Won’t actually hash anything, it will just use the compile-time calculated hash value.

The hash function used is called One-at-a-Time, see here for description.
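For reference, here is the One-at-a-Time hash itself, plus a tiny pre-hashed key type. This is not the VS2010 code from the post: it leans on C++14 constexpr, which did not exist back then, and StringId is only a hypothetical stand-in for ptl::string_id.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Bob Jenkins' One-at-a-Time hash. constexpr (C++14) lets the compiler
// evaluate it for string literals at compile time.
constexpr std::uint32_t OneAtATime(const char* s, std::size_t len) {
    std::uint32_t h = 0;
    for (std::size_t i = 0; i < len; ++i) {
        h += static_cast<unsigned char>(s[i]);
        h += h << 10;
        h ^= h >> 6;
    }
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}

// Hypothetical stand-in for ptl::string_id: stores only the hash value.
class StringId {
public:
    // String literals: length is known from the array type (N includes the '\0').
    template <std::size_t N>
    constexpr StringId(const char (&literal)[N]) : value_(OneAtATime(literal, N - 1)) {}

    // Run-time path: same hash function, so results match the literal path.
    StringId(const std::string& s) : value_(OneAtATime(s.data(), s.size())) {}

    constexpr std::uint32_t value() const { return value_; }

    friend constexpr bool operator==(const StringId& a, const StringId& b) {
        return a.value_ == b.value_;
    }

    // Hasher for unordered containers: just hands back the stored value.
    struct hash {
        std::size_t operator()(const StringId& id) const { return id.value_; }
    };

private:
    std::uint32_t value_;
};

// Checked by the compiler, so the literal path really is compile-time here.
static_assert(StringId("a").value() == 0xca2e9442u, "One-at-a-Time test vector");
```

Note that a key type like this compares hashes only, so two different strings that collide would be treated as the same id.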

Of course, it probably isn’t a good idea to actually use something like this without thoroughly testing
to verify that the compiler generates the correct code in all situations, which is pretty difficult
to do :(

1
Aug

progress

   Posted by: Foxtox   in Uncategorized, terrain

I’ve been porting my old terrain system to my new code base, and
from OpenGL to D3D11. As of today it is rendering at more
or less visual parity with the OpenGL version. It still lacks any form
of culling and has no shadows, but overall a major upgrade from a few
days ago when it looked like a couple of broken triangles.

While porting it, one system I ditched was my old threaded job
system–which was used heavily during terrain generation.

Instead I’ve decided to use Intel’s thread building blocks. Initially I
was using Microsoft’s concurrency runtime, but a few problems compared to
TBB drove me away. One major annoyance is that MCRT won’t compile
in C++/cli(while TBB will); amusing considering C++/cli is
a MS product.

It took a bit, but eventually I had TBB performing on par with my
old job system, and TBB should be a win in the end since it is
much more flexible.

Files

One useful feature I added recently was a file monitoring system–
mostly intended to make development easier–it notes any files that
my program opens, and if and when any of them change while the program
is running it sends out a message so that any interested system can
reload the file.
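The post doesn’t describe the mechanism, so here is one simple way such a monitor could work: remember the last-write time of every opened file and poll for changes. This sketch uses C++17 std::filesystem, which is newer than the code in the post; FileMonitor, NoteOpened and Poll are invented names. An OS notification API would avoid the polling.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <functional>
#include <map>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

class FileMonitor {
public:
    using Callback = std::function<void(const fs::path&)>;

    explicit FileMonitor(Callback onChanged) : onChanged_(std::move(onChanged)) {
        assert(onChanged_);
    }

    // Call whenever the program opens a file; remembers its current timestamp.
    void NoteOpened(const fs::path& file) {
        std::error_code ec;
        const fs::file_time_type stamp = fs::last_write_time(file, ec);
        if (!ec) watched_[file] = stamp;
    }

    // Call periodically (e.g. once per frame). Fires the callback for every
    // watched file whose timestamp moved and returns how many changed.
    int Poll() {
        int changed = 0;
        for (auto& entry : watched_) {
            std::error_code ec;
            const fs::file_time_type now = fs::last_write_time(entry.first, ec);
            if (ec || now == entry.second) continue;  // missing or unchanged
            entry.second = now;
            onChanged_(entry.first);
            ++changed;
        }
        return changed;
    }

private:
    Callback onChanged_;
    std::map<fs::path, fs::file_time_type> watched_;
};
```

The callback is where a message would be sent out so that the shader or lua system can reload the file.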

So far just using this with lua files and shader files, but it is very nice
to be able to modify a shader, hit save, and have it immediately reflected
in the game.

To make this work generically for all D3D resources I had to add an
extra level of indirection. I did wrap it up in a way that just changing
a typedef will allow me to toggle between direct and indirect access, though on a modern
PC it probably makes little to no difference.

22
Jul

Tool windows

   Posted by: Foxtox   in Uncategorized

I have been trying to come up with a good way to handle editors/tools and
I think I’ve finally found a method that satisfies me.

Some projects use external tools, while others build them directly into the application.

The external tool method bothers me for a few reasons

1) potential problems with tool not being in sync with engine/game
2) some tools will need access to engine code
3) end up duplicating functionality already found in the engine/game
4) won’t be able to really see what it looks like in the game(not easily)

The method of directly embedding(preferably as DLL) it into the game is somewhat
more appealing to me, but this method has one big issue at least…

1) Since your game is likely using DirectX/OpenGL, there really aren’t any great UI
solutions readily available, so you will likely use whatever method your game uses
for in-game UI (scaleform for in game editor? Ugh.. no thanks)
2) Also if the game is slow to load and has no methods to reduce what is loaded,
lots of time can be wasted

Then I heard about methods such as this D3D10 and WPF
and was interested because I thought I might be able to use this to somehow display
WPF with my D3D11 app.
Unfortunately it seems to work the other way, you can display D3D in the WPF app,
but not vice versa(at least not that I’ve been able to discover).
Dropping the D3D into WPF means you are at the mercy of WPF for updating the window,
and if you have multiple tools each one must have the D3D interop code added. BLEH.

My codebase is primary native C++, but supports dynamic loading of .Net assemblies,
and I’ve written a fair amount of interop code to allow these assemblies to communicate
with the rest of the game.

I began to wonder if it was possible to make a WPF assembly, dynamically load it like any
other .net assembly and have it pop up a separate window. Perhaps this isn’t surprising
to people more familiar with managed code/WPF, but for me it was somewhat surprising that
this did indeed work– although I had to overcome a few odd problems.

Steps:
1) Create the WPF project
2) Go to project properties and change output type to class library
3) the project defaults to application, to change this click on app.xaml, go to properties,
and change “Build Action” to “Page”

(may be other/better ways of handling some of this next part)
4) The way I do it is: create an instance of the window (called MainWindow by default)
within my startup class, but I have to do it a special way to avoid .net tossing
exceptions such as this lovely one

“The calling thread must be STA, because many UI components require this.”

_Thread = new Thread(LaunchWindow);
_Thread.SetApartmentState(ApartmentState.STA); //Many WPF UI elements need to be created inside STA
_Thread.Start();

This calls the “LaunchWindow” function on a STA approved thread, the function
just does this currently

_Window = new MainWindow();
_Window.ShowDialog();

This causes a 2nd window to pop up; it acts independently of the app’s
primary window.

ToolWindow

I’ve heard of other methods where the game acts as a server and each tool a
client, and communication is done via networking. Sounds good, but I don’t feel
like spending time to implement that( and I already have a generic event system which
allows for communication between unrelated modules in lua, C++, or .net), also that method
does prevent direct access(which could be seen as good.. or bad…).

2
May

Functional

   Posted by: Foxtox   in Uncategorized

Sometimes I find that I need to use delegates in C++. In the past I almost always used the fast delegate library available on codeproject, but over the last year or so I’ve been using std::tr1::function often enough instead. It is somewhat more powerful as you can
bind functors containing state.

I was curious if the std::tr1::function that ships with VS2010 has any sort of SBO.

(small buffer optimization- this allows it to avoid any dynamic memory allocation if the bound object is small enough).

Turns out it does, although it isn’t terribly large, in the header xxxfunction I found this union…

union _Space_union
{   // storage for small wrappers
    _Pfnty _Pfn[3];
    void *_Pobj[3];
    long double _Ldbl; // for maximum alignment
    char _Alias[3 * sizeof (void *)]; // to permit aliasing
} _Space;

Only 12 bytes for storage(in 32 bit mode).

//member fxn takes up 16 bytes and thus causes dynamic allocation
auto memfxn = std::tr1::bind(&RenderModule::UpdateDisplay, this);

//free fxn is only 4 bytes
auto freeFxn = std::tr1::bind(Hello);

//lambda capturing this, is only 4 bytes also
auto lambda = [this](){this->UpdateDisplay();};

std::tr1::function also seems to tack on an extra 4 bytes when you pass the bind result to it,
so the mem fxn ends up requiring a 20 byte SBO to avoid dynamic allocation.

So to avoid dynamic allocation on member functions you can either go into xxxfunction and
simply change the size of the SBO(changing the 3 to a 5 works for mem fxn),
or use a lambda which in turn calls the desired mem fxn, the disadvantage being it involves an extra function call.

The fast delegate library uses less static memory(sizeof shows only 8 bytes)
and does not dynamically allocate, also as the name implies– it is quite fast.

So I’ll continue using fast delegate wherever speed is paramount.
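The lambda workaround, written out as a small sketch. It uses std::function from a current standard library rather than std::tr1::function, and this RenderModule is a cut-down stand-in for illustration. Whether a given library stores the closure without allocating depends on its buffer size, so treat that part as something to measure rather than a guarantee.

```cpp
#include <cassert>
#include <functional>

struct RenderModule {
    int updates = 0;

    void UpdateDisplay() { ++updates; }

    // Instead of binding the member function directly, wrap the call in a
    // lambda that captures only `this`. The closure then holds one pointer,
    // which is small enough for typical small-buffer storage.
    std::function<void()> MakeUpdateCallback() {
        return [this]() { this->UpdateDisplay(); };
    }
};
```

The cost mentioned above still applies: the call goes through the lambda first and then into the member function, unless the compiler inlines it.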

11
Apr

Visual Studio 2010

   Posted by: Foxtox   in Uncategorized

VS2010 is supposedly coming out tomorrow– I’ve been using the release candidate for the last few months and have found it to be a great improvement over VS2008.

VS2010 supports 6 new language level features from C++0x, which is not a lot, but they did add some of the important ones at least.

New Language Features:

Lambdas: Instead of writing incredibly verbose functors/function objects (and often enough not in the location where they are actually used), you can now use the very succinct lambda syntax to declare an anonymous function that behaves like a functor. While this doesn’t allow you to do anything you couldn’t already do in C++, the resulting code is much shorter and you no longer have to write tedious functor constructors to accept external state. I’ve found lambda support in C++ to be quite awesome.

R-Value references: A new reference type which will bind to temporaries. This allowed for the introduction of move constructors and move assignment operators, which steal their resources from R-value references. This feature allows for faster code by removing needless copying.
One of the remaining issues with C++ STL is that it often results in a good amount of data copying going on– the addition of R-Value references allows for any STL implementation that takes advantage of them to reduce the amount of copying that occurs(which the VS2010 Dinkumware STL does).

decltype: can deduce the type of an expression– for example this can be used to get the return type of a function. Another useful tool for generic programming.

static_assert: compile time asserts that will print whatever message you specified. This is nice because you can add a useful description of the problem, better than using C++ hackery to get a description with lots of underscores.

nullptr: so you can say float* ptr = nullptr instead of 0. One problem with this is that it uses the same keyword as the C++/cli version, but has a different meaning, so MS has added a non-standard __nullptr which will always act as the native nullptr, even in C++/cli. I like this addition, but the conflict with C++/cli is annoying.

auto: you can now declare an object’s type to be auto and the compiler will deduce which type it is from the initializer. This allows for more concise code (especially when it comes to iterators) and is useful for generic programming. Nice feature.
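A short tour of the six features in one place. It sticks to syntax that should be within VS2010’s subset (no initializer lists, no range-based for), though it has not been checked against that compiler; Buffer, CountAbove and FeatureTour are names invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// static_assert: compile-time check with a readable message.
static_assert(sizeof(int) >= 2, "int must be at least 16 bits");

struct Buffer {
    std::vector<int> data;

    explicit Buffer(const std::vector<int>& d) : data(d) {}

    // R-value references: the move constructor steals the other buffer's storage.
    Buffer(Buffer&& other) : data(std::move(other.data)) {}
};

inline int CountAbove(const std::vector<int>& values, int threshold) {
    // Lambda: the predicate lives right where it is used and captures `threshold`.
    // auto: no need to spell out the iterator difference type.
    auto n = std::count_if(values.begin(), values.end(),
                           [threshold](int v) { return v > threshold; });
    return static_cast<int>(n);
}

inline int FeatureTour() {
    int* ptr = nullptr;                 // nullptr instead of 0 or NULL
    assert(ptr == nullptr);

    std::vector<int> values;
    values.push_back(1);
    values.push_back(2);
    values.push_back(3);

    auto it = values.begin();           // auto deduces std::vector<int>::iterator
    decltype(*it + 0) first = *it;      // decltype deduces int from the expression

    Buffer a(values);
    Buffer b(std::move(a));             // moves instead of copying the vector

    // 1 (first) + 3 (size) + 2 (values above 1)
    return first + static_cast<int>(b.data.size()) + CountAbove(b.data, 1);
}
```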

C++0x features that didn’t make it include variadic templates, strongly typed enums(so tired of wrapping enums in classes/namespaces), initializer lists, template typedefs, and foreach.

New stuff in the standard library

forward_list – singly-linked list– cool, but I rarely use list so I doubt I’ll use this much either
unique_ptr – now this I find very useful: it takes advantage of R-Value references to allow it to be stored in STL containers(unlike auto_ptr), and includes support for a custom deleter similar to shared_ptr

The IDE

The new UI allows you to break off parts of the UI and then drag them outside of the primary window, I’m using a multi-monitor setup at home, so I can drag my output window & search window to the 2nd monitor. Very nice..

Intellisense: from the brief period I attempted to use VS2010 without Visual Assist it did appear to be improved

Concurrency Runtime: Microsoft has been busy working on two C++ systems for handling concurrency– the Asynchronous Agents Library and the Parallel Patterns Library, so far I’ve only dabbled around with the PPL, but I do like what I’ve seen. It does pretty much stick you with windows though, so perhaps forking over $300 for intel’s TBB might be a better route.

F#: I don’t know this language yet, but it was added as a first class language with full intellisense support etc. I’m gonna spend some time learning the language as I’d like to know a functional language.

And boo to Perforce, I’d have expected perforce would have a working plugin for VS2010 by now, but alas they have nothing.


13
Jun

Few Experiments

   Posted by: Foxtox   in noise

Played around with a few methods for modifying the terrain, so far just doing simple
modifications to the fractional part of the noise input.

The input is from 0.0-1.0, so to keep the terrain continuous it helps if the output also is
(more or less); but at first I was getting terrain with boxy shapes because I wasn’t doing this
correctly.

One very interesting result was this: a world composed of the outlines of boxes.

Modification:
_mu = _mu*2.0*1.33134 -1.33134;
_mu = cos(_mu*_mu*_mu*_mu);

Where they grow..
Tinker toys

Another example where the signal is not smooth…
Cool but pattern can be seen

Realized what I was doing wrong, fixed it, and implemented a sin filter; it only looks
subtly different from the standard appearance so I won’t bother posting a picture.

More interesting was a sin(x^2) filter:

_mu *= 1.2533141373155002512078826424054;
_mu = sin(_mu*_mu);
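The magic constants are easier to follow once you notice what they are close to: 1.33134 is about the fourth root of pi, and 1.2533141373155 is sqrt(pi/2). The sketch below wraps the two modifications as functions (the function names are invented) so the end points can be checked. The cos filter swings between 1 and about -1, while the sin(x^2) filter maps 0..1 back onto 0..1.

```cpp
#include <cassert>
#include <cmath>

// mu is the fractional part of the noise input, in 0..1.

// The "outline of boxes" modification: remap 0..1 to -k..k, then cos(mu^4).
// k ~ pi^(1/4), so mu^4 reaches ~pi at both ends and the result is ~-1 there,
// and exactly 1 in the middle (mu = 0.5).
inline double BoxOutlineFilter(double mu) {
    assert(mu >= 0.0 && mu <= 1.0);
    const double k = 1.33134;
    mu = mu * 2.0 * k - k;
    return std::cos(mu * mu * mu * mu);
}

// The sin(x^2) filter: k = sqrt(pi/2), so mu^2 runs from 0 to pi/2 and the
// output runs from 0 to 1, matching the input range.
inline double SinSquaredFilter(double mu) {
    assert(mu >= 0.0 && mu <= 1.0);
    const double k = 1.2533141373155002512078826424054;
    mu *= k;
    return std::sin(mu * mu);
}
```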

Lots of poker type things appeared…
Poke

And some odd looking rounded shapes occasionally..

Smooth

Dark twisted area I’m posting for no particular reason.
darks


Super Shape?

I saw this about Super Shapes and thought I might see if they could be made to
work within my generation process– but the results aren’t so great, and it is also rather
slow since it involves a great many calls to sin/cos.
Not so Super
-Problem being the super shape function expects as input angles, and returns positions;
but I already have a position & just want to modify the density. What I did was calculate an
angle based on an arbitrary center point, and then use the returned position to modify
the existing density dependent upon how far it was from the real position… but
it tended to blob together and stretch/tear in ways that the proper super shapes don’t.

Point Rendering:

Awhile back I was reading the Atom journal and began wondering if points might
be a nice solution for my terrain.

Good:
+ Would greatly reduce memory usage since index buffers wouldn’t be necessary.
+ The poor triangulation MC generates could be avoided and likely point interpolation
would look superior.
+ Could skip most of the MC steps altogether for faster generation
Bad:
– Could be issues if points aren’t dense enough
– The hole filling step doesn’t sound particularly fast, and I would have to do this
multiple times(shadow maps too).
– Might not work so great with occlusion culling

Here is the terrain rendered using points(looks kinda crap from jpg compression)
point cloud

Visual Studio 2010
Also downloaded the VS2010 beta 1. It supports lambda functions and
various other new C++ features, but mainly what I noticed was:

1) Unresponsive UI — often around a 1 second delay
2) It crashed within the first 5 minutes
3) Intellisense can’t compete with visual assist, though it does have some new
abilities I have not seen before(red squiggle lines under things it doesn’t understand;
hover over said items and it actually says in English why).

I look forward to the new C++ language features, but MS really needs to improve
the UI responsiveness; considering I’m running an i7 there is no reason
for it to be so slow.

Bindless Graphics
Also tested out NVidia’s new bindless graphics extensions, but for whatever reason
this resulted in crashes once enough terrain chunks had been generated.
An Nvidia guy told me it was a driver bug, so hopefully I can re-enable this down the road.
Using occlusion culling also seemed to anger it greatly as any attempt to run with OC
enabled would result in a crash within seconds of startup.

I’d like to start using dx11(or 10) more, OGL is just getting old.

5
May

Links

   Posted by: Foxtox   in Uncategorized

Game Dev Blogs:

Atom Engine Old: Interesting tech journal, has some pretty cool ideas
Atom Engine New: New location for above
Real Time Rendering: Lots of good links can be found here
C0DE517E: Another cool programming blog
Ignacio Castaño: Has a very useful website, lots of knowledge about procedural/noise/demo scene related topics.
Wolfgang: Light pre-pass
Infinity: Ysaneya’s procedural universe game
Level of Detail: Graphics talk
frostbite: dx10 and graphics
Anteru: Another graphics blog(voxels/points)

API/Code Links:

physfs: multi-platform fileIO & file packing
Squirrel: scripting language
Simple-fast media layer: Input, other basic stuff

24
Mar

Occlusion

   Posted by: Foxtox   in terrain

Haven’t updated this website in far too long, took a break from this project(moved/new job).
Back on track now; purchased a new computer, primarily intended for development purposes,
it has an i7 920 quad core so I can really stress test my threading.

i7 Results:
Chunk generation on this processor makes my old Pentium D look like a slug,
I have hyper threading enabled, which currently results in 7 job threads being created.
Didn’t measure it scientifically or anything, but eye balling the printout(prints out every
512 chunks), it appears to be creating 1k-1.5k chunks per second(old Pentium-D gave
100-200/sec).

Also have a new monitor which supports up to 1920×1200 so I can finally run this at
something higher than 1280×1024. I was surprised to find that maxing the settings for
the terrain(uses horizontal screen width) actually ran relatively smoothly.

Occlusion Culling:
Last weekend I implemented occlusion culling for the terrain system, this has given a
really nice boost to rendering speed, and has also helped out the generation process,
as I can use occlusion information when assigning a priority to chunks that need to be
generated(as in, if parent is occluded, reduce score, if grandparent is also occluded,
reduce it even more etc).

My implementation works like this(using a 3 frame delay):
-Initially we have no OC’ing info, so render terrain front to back, issuing an occlusion
query for each chunk.
-If a chunk fails OC, switch to rendering its bounding box(AABB are rendered after all
actual terrain chunks, and with depth/color writes disabled).
-Newly created chunks take OC’ing state of parent.
-If all children of a parent have failed OC test, fall back to parent(this will result in a
chain reaction of fall backs in portions of scene that are deeply occluded – and more
importantly, fewer objects to render), once parent becomes visible again, fall forward to
children. This results in an interesting deblurring effect; when you pop your head
around a corner, the world very rapidly comes into focus.
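The fall back / fall forward rule above can be sketched on a bare chunk tree. This is one reading of the description, not the actual implementation: the GPU queries and the 3 frame delay are left out, `occluded` simply holds whatever the latest query said, and all names are invented.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Chunk {
    bool occluded = false;  // latest (delayed) occlusion query result
    std::vector<std::unique_ptr<Chunk>> children;

    // Newly created chunks take the occlusion state of their parent.
    Chunk* AddChild() {
        children.emplace_back(new Chunk());
        children.back()->occluded = occluded;
        return children.back().get();
    }
};

// A leaf is occluded if its query failed; an inner chunk counts as occluded
// only when every child below it is.
inline bool FullyOccluded(const Chunk& c) {
    if (c.children.empty()) return c.occluded;
    for (const auto& child : c.children)
        if (!FullyOccluded(*child)) return false;
    return true;
}

// Decide what to submit this frame. `draw` gets visible leaves; `boxes` gets
// chunks that are only tested via their bounding box. A fully occluded
// subtree collapses to a single box test on its topmost chunk (fall back).
inline void Gather(Chunk& c, std::vector<Chunk*>& draw, std::vector<Chunk*>& boxes) {
    if (FullyOccluded(c)) { boxes.push_back(&c); return; }
    if (c.children.empty()) { draw.push_back(&c); return; }
    for (auto& child : c.children) Gather(*child, draw, boxes);
}

// Apply a query result for a chunk that was tested as a whole. A visible
// result pushes visibility down to the children again (fall forward).
inline void ApplyQueryResult(Chunk& c, bool occluded) {
    c.occluded = occluded;
    for (auto& child : c.children) ApplyQueryResult(*child, occluded);
}
```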

Few issues I ran into with occlusion culling:

-Fast movement over the ledge of a cliff will result in the sudden appearance of new
chunks, but since their occlusion results are delayed three frames, they still think they
are occluded and don’t show up immediately.

-Another problem was with chunks whose AABB test indicated they were in the frustum,
but upon rendering, none of the actual triangles are truly within the frustum, resulting in
a failed query. These chunks then promptly disappear(switch to invisible AABB test).
The result of this, is that as the screen pans around, small holes appear on the side of
the screen nearest the direction of movement.

Turns out there is a really simple solution, NV_conditional_render. Just issue the
occlusion queries like normal, then in a later stage attempt to render those
objects you think are occluded with a conditional render and the GPU handles
the rest. I still read back the occlusion results a few frames later, since it
is useful for LOD and chunk generation.

Overall I’m very satisfied with the speedup(not exactly sure what it was, I’ll measure
it at some point, but it was quite noticeable).

Noise code to SSE:
Awhile back(couple months ago) I also rewrote the majority of the noise code
in SSE2. Used AMD CodeAnalyst to find the hotspots and focused on reducing
whichever part it indicated as being the slowest. Another nice performance boost
was noticed. At some point I think I will go back and add 64 bit double support
to the entire pipeline since I would really like to be able to use more than 20 octaves
before running into the damn stair step effect from precision loss, so I’ll
have to write another SSE version using doubles, eventually.
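Why the steps show up around 20 octaves can be seen from float spacing alone. The sketch assumes the frequency doubles each octave and that the base coordinate is around 1.0; neither is stated in the post. Under those assumptions the input at octave 20 is about 2^20, where neighbouring floats are 0.125 apart, so the fractional part of the noise input can only take 8 distinct values per lattice cell. Doubles push the same problem out to a spacing of 2^-32.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Gap between x and the next representable value above it.
template <typename T>
T SpacingAt(T x) {
    assert(x > T(0));
    return std::nextafter(x, std::numeric_limits<T>::infinity()) - x;
}

// Smallest input step that survives at a given octave, assuming the
// coordinate is multiplied by 2 per octave (an assumption, see above).
template <typename T>
T StepAtOctave(T coord, int octave) {
    return SpacingAt(std::ldexp(coord, octave));
}
```

For example, StepAtOctave(1.0f, 20) gives 0.125f, while StepAtOctave(1.0, 20) gives 2^-32 (about 2.3e-10).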

Planet View

18
Nov

Shadows

   Posted by: Foxtox   in Uncategorized, shadows, terrain

I’ve added PSSM(Parallel-Split shadow maps) based shadows, this technique uses
multiple shadow maps each one progressively closer to the viewer. I went with PSSM
because of the size of my terrain, even so, using 4 splits I have to limit how far away
objects can be and still receive shadows.
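For reference, the split distances in PSSM are usually computed with the "practical split scheme" from the original PSSM paper, which blends a logarithmic and a uniform distribution. The post doesn’t say which scheme or blend weight is used here, so the function name and the lambda parameter below are just for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns splits + 1 distances from nearZ to farZ.
// lambda = 1 gives purely logarithmic splits, lambda = 0 purely uniform ones.
inline std::vector<float> ComputeSplitDistances(float nearZ, float farZ,
                                                int splits, float lambda) {
    assert(nearZ > 0.0f && farZ > nearZ && splits > 0);
    std::vector<float> d(static_cast<std::size_t>(splits) + 1);
    for (int i = 0; i <= splits; ++i) {
        const float t = static_cast<float>(i) / static_cast<float>(splits);
        const float logSplit = nearZ * std::pow(farZ / nearZ, t);  // n * (f/n)^(i/m)
        const float uniSplit = nearZ + (farZ - nearZ) * t;         // n + (f-n) * i/m
        d[static_cast<std::size_t>(i)] = lambda * logSplit + (1.0f - lambda) * uniSplit;
    }
    d.front() = nearZ;  // pin the ends against rounding
    d.back() = farZ;
    return d;
}
```

With 4 splits this yields five distances; each consecutive pair bounds the slice of the view frustum covered by one shadow map. Limiting how far away objects can receive shadows amounts to passing a smaller farZ.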

Having to render the scene an extra four times is pretty unpleasant, hopefully
once I add some more optimizations things won’t be so slow.

It still has some kinks, I need to eventually take into account the visible
objects when constructing the matrices for the shadow map, and add
some much better filtering. There is also some perspective aliasing that
needs fixing. ShaderX6 apparently has an article covering some of these
things but I don’t have a copy of the book.

Eventually I would like to use some sort of deferred rendering technique since
it would allow for a cleaner shadow system, but not until I can get AA with deferred..
though the light pre-pass method might work.

Some screens with shadows: this made my 6800 cry

In the Shade

A big rock says hi

a moon

A lonely hanger
